Benchmark Datasets
Online Supporting Information A:   The accession numbers and sequences of the 3,051 proteases in the benchmark dataset S+, classified into 6 types (subsets) according to the MEROPS database (Rawlings, N. D.; Tolle, D. P.; Barrett, A. J. Nucleic Acids Research 2004, 32, D160-D164): (1) aspartic, (2) cysteine, (3) glutamic, (4) metallo, (5) serine, and (6) threonine proteases. None of the proteins included here has more than 25% pairwise sequence identity to any other in the same subset. See the text of the paper for further explanation. To download the data in the Online Supporting Information A, click Supp-A.
 
Online Supporting Information B:   List of the 3,278 accession numbers and protein sequences in the non-protease benchmark dataset S-, randomly picked from the Swiss-Prot databank (version 55.3, released on 29-Apr-2008). None of the proteins included here has more than 25% pairwise sequence identity to any other. See the text of the paper for further explanation. To download the data in the Online Supporting Information B, click Supp-B.
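
The 25% pairwise identity condition stated for both datasets can be illustrated with a minimal sketch. It is not the procedure used to build S+ or S-: the function names, the toy sequences, and the ungapped identity measure are all assumptions made for illustration, and a real check would compute identity from a proper sequence alignment. For S+ the check would be applied within each of the 6 subsets separately; for S- it would be applied across the whole set.

```python
from itertools import combinations

CUTOFF = 0.25  # the 25% pairwise sequence identity threshold stated above


def ungapped_identity(a, b):
    """Toy placeholder (assumption): fraction of identical residues,
    compared position by position over the shorter sequence, with no
    alignment. A real check would use an alignment-based identity."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def pairs_over_cutoff(seqs, identity=ungapped_identity, cutoff=CUTOFF):
    """Return the accession pairs whose identity exceeds the cutoff.

    `seqs` maps accession number -> sequence. An empty result means the
    set satisfies the redundancy condition under the given measure."""
    return [
        (p, q)
        for (p, s), (q, t) in combinations(seqs.items(), 2)
        if identity(s, t) > cutoff
    ]


# Hypothetical accessions and sequences, not taken from Supp-A or Supp-B.
toy = {"P1": "ACDEFGHIKL", "P2": "ACDEFGHIKV", "P3": "WYWYWYWYWY"}
print(pairs_over_cutoff(toy))
```

Passing a different `identity` callable is enough to swap in an alignment-based measure without changing the pairwise check itself.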