Dataset description:
We constructed benchmark datasets from BioLip, a semi-manually curated
database of biologically relevant protein-ligand interactions.
Five types of ligands, i.e., Ca2+, Mg2+, Mn2+, ATP and HEME,
were considered in the present study. We constructed a training dataset
and a corresponding independent testing dataset for each of them except ATP.
Table 1. Composition of the training datasets and the testing datasets for the 4 types of ligands

| Ligand Category | Ligand Type | Training Dataset: No. of Proteins | Training Dataset: (numP, numN) | Testing Dataset: No. of Proteins | Testing Dataset: (numP, numN) | Total No. of Proteins |
|---|---|---|---|---|---|---|
| Metal Ion | Ca2+ | 1022 | (4830, 255917) | 515 | (2958, 186678) | 1537 |
| Metal Ion | Mg2+ | 1194 | (4147, 320736) | 651 | (2321, 244088) | 1845 |
| Metal Ion | Mn2+ | 440 | (1931, 150299) | 144 | (612, 50838) | 584 |
| | HEME | 175 | (3851, 44477) | 96 | (2012, 26341) | 271 |

In each 2-tuple (numP, numN), numP and numN are the numbers of positive
samples (binding residues) and negative samples (non-binding residues),
respectively.
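
The (numP, numN) counts in Table 1 imply a strong class imbalance between binding and non-binding residues. As a rough illustration only, the following Python sketch computes the fraction of positive samples and the negative-to-positive ratio for each ligand type. The counts are copied from Table 1; the names `TABLE1`, `positive_fraction`, `neg_pos_ratio` and `summarize` are hypothetical helpers introduced here and are not part of the distributed datasets.

```python
# Counts copied from Table 1.
# Layout (an assumption of this sketch):
#   ligand -> ((train numP, train numN), (test numP, test numN))
TABLE1 = {
    "Ca2+": ((4830, 255917), (2958, 186678)),
    "Mg2+": ((4147, 320736), (2321, 244088)),
    "Mn2+": ((1931, 150299), (612, 50838)),
    "HEME": ((3851, 44477), (2012, 26341)),
}


def positive_fraction(num_p, num_n):
    """Fraction of all residues that are labelled as binding."""
    return num_p / (num_p + num_n)


def neg_pos_ratio(num_p, num_n):
    """Number of non-binding residues per binding residue."""
    return num_n / num_p


def summarize(table=TABLE1):
    """Return (ligand, train fraction, test fraction, train neg:pos ratio) rows."""
    rows = []
    for ligand, (train, test) in table.items():
        rows.append(
            (
                ligand,
                positive_fraction(*train),
                positive_fraction(*test),
                neg_pos_ratio(*train),
            )
        )
    return rows


if __name__ == "__main__":
    for ligand, train_frac, test_frac, ratio in summarize():
        print(
            f"{ligand:5s} train positives: {train_frac:.2%}  "
            f"test positives: {test_frac:.2%}  "
            f"train neg:pos = {ratio:.1f}:1"
        )
```

Ratios of this kind may be useful when choosing evaluation measures or sampling strategies, since the metal-ion datasets contain far fewer binding residues relative to non-binding residues than the HEME dataset.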
Dataset format:
The training dataset and the testing dataset for each type of ligand
are contained in the training and testing directories, respectively.