Online Supporting Information A. This learning dataset contains 3,681 protein sequences from 3,106 different proteins, classified into 14 human subcellular locations; a protein that occurs in more than one location is counted once for each location. Of the 3,106 different proteins, 2,580 belong to exactly 1 location, 480 to 2 locations, 43 to 3 locations, and 3 to 4 locations. Both the accession numbers and the sequences are given. Within each subset (subcellular location), no protein has more than 25% sequence identity to any other. See the text of the paper for further explanation.
Click Supp-A to download the dataset.
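The two totals quoted above are related by simple arithmetic: a protein found in k locations contributes k sequence entries. The short Python sketch below checks that relationship using only the figures stated in the description. The variable and function names are hypothetical and chosen for illustration; nothing here reads the actual Supp-A file, whose format is not described on this page.

```python
# Number of different proteins that belong to exactly k locations,
# copied from the dataset description (keys are k, values are protein counts).
proteins_by_location_count = {1: 2580, 2: 480, 3: 43, 4: 3}


def count_distinct(by_count):
    """Total number of different proteins, regardless of location count."""
    return sum(by_count.values())


def count_entries(by_count):
    """Total number of sequence entries, counting each protein once per location."""
    return sum(k * n for k, n in by_count.items())


distinct = count_distinct(proteins_by_location_count)
entries = count_entries(proteins_by_location_count)

# 2580 + 480 + 43 + 3 = 3106 different proteins
# 2580*1 + 480*2 + 43*3 + 3*4 = 3681 sequence entries
print(distinct, entries)  # prints "3106 3681"
```

Both results agree with the totals given in the description, which confirms that the 3,681 figure counts multi-location proteins once per location.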