SCOPe Dataset

FoldExplorer uses SCOPe v2.07 (published in March 2018) as the benchmark dataset. We use only the subset with less than 40% sequence identity, which contains 14323 protein structure domains. After removing multi-chain domains, 13265 protein domains remain. The sids (SCOPe domain identifiers) of the remaining structures are provided here. The structures themselves can be downloaded from the SCOPe website.
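The multi-chain filtering step can be illustrated with a minimal Python sketch. It relies on the SCOP/SCOPe sid convention, in which the sixth character of the seven-character identifier encodes the chain and is `.` when a domain spans more than one chain. The function names and the example sids are illustrative assumptions; this is not necessarily how the FoldExplorer authors performed the filtering.

```python
from typing import Iterable, List


def is_multichain_sid(sid: str) -> bool:
    """Return True if a SCOP(e) sid marks a domain that spans several chains.

    A sid has 7 characters: a prefix letter, the 4-character PDB ID,
    a chain character, and a domain character. A chain character of '.'
    denotes a multi-chain domain.
    """
    sid = sid.strip()
    if len(sid) != 7:
        raise ValueError(f"unexpected sid format: {sid!r}")
    return sid[5] == "."


def filter_single_chain(sids: Iterable[str]) -> List[str]:
    """Keep only sids that refer to single-chain domains, preserving order."""
    return [s.strip() for s in sids if not is_multichain_sid(s)]


if __name__ == "__main__":
    # Illustrative sid strings, used only to show the naming convention.
    example = ["d1dlwa_", "d1a0h.1", "d2gkma_"]
    print(filter_single_chain(example))
```

Filtering on the identifier alone avoids parsing the structure files, but it assumes the sid list follows the standard naming convention.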

Independent Dataset

To assess the generalization capability of FoldExplorer and evaluate its performance on out-of-distribution (OOD) data, we curated a set of protein structural domains from the PDB as an independent test set called indDomain. Detailed information is available in the indDomain file, and the corresponding structure files can be downloaded from the PDB website.
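As a hedged sketch of fetching the underlying PDB entries, the snippet below builds download URLs for the RCSB file server (`files.rcsb.org`) from classic four-character PDB IDs. The helper names are hypothetical, and the layout of the indDomain file is not assumed here: extracting PDB IDs from it and cutting out the domain boundaries are left to the reader. Note that this retrieves whole entries, not individual domains.

```python
import re
import urllib.request
from pathlib import Path

# Classic PDB IDs: one digit (1-9) followed by three alphanumeric characters.
_PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")


def rcsb_download_url(pdb_id: str, fmt: str = "cif") -> str:
    """Build the RCSB download URL for a whole PDB entry (hypothetical helper)."""
    if not _PDB_ID.fullmatch(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    if fmt not in ("cif", "pdb"):
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.{fmt}"


def download_entry(pdb_id: str, out_dir: str = ".", fmt: str = "cif") -> Path:
    """Download one entry into out_dir and return the local path (needs network)."""
    target = Path(out_dir) / f"{pdb_id.lower()}.{fmt}"
    urllib.request.urlretrieve(rcsb_download_url(pdb_id, fmt), str(target))
    return target
```

For example, `download_entry("1abc", "structures")` would save the entry as `structures/1abc.cif`, provided the `structures` directory already exists.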

AlphaFold Protein Structure Dataset

FoldExplorer also supports searching larger structure databases. We provide access to the SwissProt database, as well as databases for various species (continually updated). For details, please refer to UniProt and the AlphaFold Protein Structure Database.