To evaluate the effectiveness of ZeroBind, we construct three independent test sets: 1) Transductive test set, where both the drugs and the proteins are present in the training set, but the interactions between them are absent from it; 2) Semi-inductive test set, where only the drugs are present in the training set; 3) Inductive test set, where both the drugs and the proteins are absent from the training set.
Download datasets
1. Three independent test sets for model evaluation
Because the meta-learning-based framework requires sufficient data, we first extract the proteins that have more than 20 associated molecules. We then randomly select 90% of these proteins; for each selected protein, 90% of its DTIs form the training set, and the remaining 10% of its DTIs are used to construct the Transductive test set. The DTIs of the remaining 10% of proteins, together with those of proteins that have fewer than 20 associated molecules, are divided into the Semi-inductive test set and the Inductive test set using the same strategy.
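The splitting procedure above can be sketched roughly as follows. This is an illustration only, not the script used to build the released files: the function name `split_dtis`, its parameters, and the `(protein_id, molecule_id)` input layout are hypothetical, and it assumes that held-out DTIs are assigned to a test set according to whether their molecule also appears in the training set, as the test set definitions suggest.

```python
import random
from collections import defaultdict

def split_dtis(dtis, min_molecules=20, protein_frac=0.9, dti_frac=0.9, seed=0):
    """Split (protein_id, molecule_id) pairs into train and three test sets.

    Hypothetical helper: names, defaults, and the molecule-based assignment
    of held-out DTIs are assumptions, not the authors' implementation.
    """
    rng = random.Random(seed)
    by_protein = defaultdict(list)
    for protein, molecule in dtis:
        by_protein[protein].append((protein, molecule))

    # Proteins with more than `min_molecules` distinct associated molecules.
    rich = sorted(p for p, rows in by_protein.items()
                  if len({m for _, m in rows}) > min_molecules)
    rng.shuffle(rich)
    train_proteins = set(rich[:int(len(rich) * protein_frac)])

    train, held_out_seen, unseen = [], [], []
    for protein, rows in by_protein.items():
        if protein in train_proteins:
            rows = rows[:]
            rng.shuffle(rows)
            cut = int(len(rows) * dti_frac)
            train.extend(rows[:cut])          # 90% of this protein's DTIs
            held_out_seen.extend(rows[cut:])  # remaining 10%
        else:
            unseen.extend(rows)               # protein never seen in training

    train_molecules = {m for _, m in train}
    # Seen protein, seen molecule, unseen interaction.
    transductive = [r for r in held_out_seen if r[1] in train_molecules]
    # Unseen protein, seen molecule.
    semi_inductive = [r for r in unseen if r[1] in train_molecules]
    # Unseen protein, unseen molecule.
    inductive = [r for r in unseen if r[1] not in train_molecules]
    return train, transductive, semi_inductive, inductive
```

Fixing `seed` keeps the split reproducible. Note that in this sketch a held-out DTI of a training protein is dropped from the Transductive set if its molecule does not occur in the training set.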
2. Few-shot test sets for model evaluation
In addition, we construct a few-shot test set by combining the Semi-inductive test set and the Inductive test set to evaluate the few-shot learning ability of ZeroBind. For each protein, we randomly select 5 positive and 5 negative DTIs as the few-shot fine-tuning set and use the remaining DTIs of that protein as the few-shot test set. Proteins that do not have enough positive or negative DTIs are excluded from both the few-shot fine-tuning set and the few-shot test set.
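A minimal sketch of this per-protein selection is shown below. The function name `few_shot_split` and the `(protein_id, molecule_id, label)` input layout (label 1 for positive, 0 for negative) are hypothetical, and the sketch assumes "enough" means at least 5 DTIs of each class; how labels are derived from the DTI values is not covered here.

```python
import random
from collections import defaultdict

def few_shot_split(dtis, k=5, seed=0):
    """Pick k positive and k negative DTIs per protein for fine-tuning.

    Hypothetical helper: `dtis` holds (protein_id, molecule_id, label)
    tuples with label 1 (positive) or 0 (negative). Proteins with fewer
    than k DTIs in either class are skipped (an assumption).
    """
    rng = random.Random(seed)
    by_protein = defaultdict(lambda: {0: [], 1: []})
    for protein, molecule, label in dtis:
        by_protein[protein][label].append((protein, molecule, label))

    fine_tune, test = [], []
    for protein, groups in by_protein.items():
        if len(groups[1]) < k or len(groups[0]) < k:
            continue  # not enough positives or negatives: exclude protein
        picked = rng.sample(groups[1], k) + rng.sample(groups[0], k)
        picked_set = set(picked)
        fine_tune.extend(picked)
        # Everything not picked for fine-tuning goes to the few-shot test set.
        test.extend(r for r in groups[1] + groups[0] if r not in picked_set)
    return fine_tune, test
```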
Dataset format:
In each txt file, each line contains the following five fields:
Protein_PDBID, Molecule SMILES, Molecule InChIKey, standard_type/nM, and DTI value
The five fields are separated by commas, for example:
Q16790,CCCC(CCC)C(=O)NC1Cc2ccc(cc2C1)S(N)(=O)=O,XBYJCVDSFWJBSM-UHFFFAOYSA-N,Ki (nM),282
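A line in this format could be read with a small helper like the one below. The names `DTIRecord` and `parse_dti_line` are hypothetical, and the sketch assumes that no field contains a comma and that the DTI value is numeric, as in the example line.

```python
from typing import NamedTuple

class DTIRecord(NamedTuple):
    protein_id: str
    smiles: str
    inchikey: str
    standard_type: str  # measurement type with unit, e.g. "Ki (nM)"
    value: float

def parse_dti_line(line: str) -> DTIRecord:
    """Parse one comma-separated line into a DTIRecord (hypothetical helper)."""
    fields = line.strip().split(",")
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    protein_id, smiles, inchikey, standard_type, value = fields
    return DTIRecord(protein_id, smiles, inchikey, standard_type, float(value))

record = parse_dti_line(
    "Q16790,CCCC(CCC)C(=O)NC1Cc2ccc(cc2C1)S(N)(=O)=O,"
    "XBYJCVDSFWJBSM-UHFFFAOYSA-N,Ki (nM),282"
)
print(record.protein_id, record.standard_type, record.value)
```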