Benchmark Data
Online Supporting Information A. This learning dataset contains 3134 protein sequence entries (2750 distinct proteins) classified into 14 human subcellular locations. Of the 2750 distinct proteins, 2396 belong to exactly 1 location, 325 belong to 2 locations, 28 belong to 3 locations, and 1 belongs to 4 locations. A protein with several locations is counted once per location, which gives 2396 + 2×325 + 3×28 + 4×1 = 3134 entries. Both the accession numbers and the sequences are given. Within each subset (subcellular location), no protein has more than 25% sequence identity to any other. See the reference given on the top page of the web-server for further explanation. Click Supp-A to download the dataset.
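The two totals quoted above can be reconciled with a short calculation. The sketch below is only an illustrative check and is not part of the dataset or the web-server: the dictionary and function names are hypothetical, and it assumes that a protein annotated with k locations contributes k entries to the sequence total.

```python
# Distinct proteins grouped by how many subcellular locations they belong to,
# using the counts reported in the dataset description.
# (Variable and function names are illustrative assumptions.)
proteins_by_location_count = {1: 2396, 2: 325, 3: 28, 4: 1}


def count_distinct_proteins(counts):
    """Total number of distinct proteins, regardless of location count."""
    return sum(counts.values())


def count_sequence_entries(counts):
    """Total entries, assuming a protein with k locations is listed k times."""
    return sum(k * n for k, n in counts.items())


distinct_proteins = count_distinct_proteins(proteins_by_location_count)
sequence_entries = count_sequence_entries(proteins_by_location_count)

print(distinct_proteins)  # 2750
print(sequence_entries)   # 3134
```

Under that assumption both reported figures are reproduced, and the 384 extra entries (3134 minus 2750) come entirely from the 354 proteins with more than one location.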