Benchmark Data
Online Supporting Information A. The benchmark dataset (HumB) consists of human proteins extracted from the SWISS-PROT release of January 2012. It includes 4,229 protein sequences (3,129 different proteins), classified into 12 human subcellular locations. Of the 3,129 different proteins, 2,306 belong to only 1 location, 595 belong to 2 locations, 186 belong to 3 locations, 36 belong to 4 locations, 5 belong to 5 locations, and 1 belongs to 6 locations. Both the accession numbers and the sequences are given. None of the proteins has more than 25% sequence identity to any other protein in this benchmark. See the text of the paper for further explanation. Click Supp-A to download the benchmark dataset (HumB).
Online Supporting Information B. The independent dataset (HumT) consists of human proteins extracted from the SWISS-PROT release of May 2015. It includes 541 protein sequences (379 different proteins), classified into 12 human subcellular locations. Of the 379 different proteins, 259 belong to 1 location, 83 belong to 2 locations, 32 belong to 3 locations, and 5 belong to 4 locations. Both the accession numbers and the sequences are given. None of the proteins has more than 25% sequence identity to any protein in the benchmark dataset (HumB). See the text of the paper for further explanation. Click Supp-B to download the independent dataset (HumT).
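The two totals quoted for each dataset agree with the per-location breakdowns if each "protein sequence" entry is read as one protein-location pair, so that a protein assigned to k locations is counted k times. That reading is an assumption inferred from the arithmetic, not something stated in the descriptions. The short Python check below sketches the calculation; the names `HUMB`, `HUMT`, and `summarize` are illustrative only and are not part of the downloadable files.

```python
# Breakdowns copied from the dataset descriptions.
# Key: number of subcellular locations; value: number of distinct proteins.
HUMB = {1: 2306, 2: 595, 3: 186, 4: 36, 5: 5, 6: 1}
HUMT = {1: 259, 2: 83, 3: 32, 4: 5}


def summarize(breakdown):
    """Return (distinct proteins, protein-location pairs) for a breakdown.

    Assumes a protein with k locations contributes k entries to the
    sequence total (an inferred reading, not stated in the source).
    """
    distinct = sum(breakdown.values())
    pairs = sum(k * n for k, n in breakdown.items())
    return distinct, pairs


print("HumB:", summarize(HUMB))  # HumB: (3129, 4229)
print("HumT:", summarize(HUMT))  # HumT: (379, 541)
```

Under this reading, both datasets reproduce the stated figures: 3,129 proteins and 4,229 entries for HumB, and 379 proteins and 541 entries for HumT.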