Benchmark Data
Online Supporting Information A. This learning dataset contains 8,897 protein sequences (7,766 distinct proteins) classified into 22 eukaryotic subcellular locations. Of the 7,766 distinct proteins, 6,687 occur in one subcellular location, 1,029 in two, 48 in three, and 2 in four. Both the accession number and the sequence of each protein are given. No protein has more than 25% sequence identity to any other protein in the same subset (subcellular location). See the text of the paper for further explanation. The dataset is provided for download as Supp-A.
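The gap between 8,897 sequences and 7,766 distinct proteins can be reconciled with a short calculation. The sketch below assumes that a protein found in k locations is listed once in each of those k location subsets. This is an interpretation, not something the description states outright, but it matches the counts given above exactly. The variable names are illustrative only.

```python
# Distinct proteins grouped by how many subcellular locations each occupies
# (figures from the dataset description).
proteins_by_location_count = {1: 6687, 2: 1029, 3: 48, 4: 2}

# Every protein is counted once, however many locations it has.
distinct_proteins = sum(proteins_by_location_count.values())

# Assumption: a protein with k locations appears once in each of its
# k location subsets, so it contributes k sequence entries.
sequence_entries = sum(k * n for k, n in proteins_by_location_count.items())

print(distinct_proteins)  # 7766
print(sequence_entries)   # 8897
```

Here 6,687 × 1 + 1,029 × 2 + 48 × 3 + 2 × 4 = 8,897, so the multi-location proteins fully account for the 1,131 extra sequences.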