Improving accuracy of protein contact prediction using balanced network deconvolution
Introduction
The residue contact map is essential to determining the three-dimensional structure of a protein.
However, most current contact prediction methods based on residue co-evolution suffer from
a high false-positive rate caused by indirect and transitive contacts
(i.e. residues A-B and B-C are in contact but A-C are not). Building on the work of Feizi et al. (2013),
which demonstrated a general network model for distinguishing direct dependencies by network deconvolution,
we present a new balanced network deconvolution (BND) algorithm that identifies an optimized dependency matrix
without any limit on the eigenvalue range of the applied network systems. The algorithm was used to
filter the contact predictions of five widely used co-evolution methods. On proteins from
three benchmark datasets, from the CASP9, CASP10 and PSICOV database experiments, BND improves
the medium- and long-range contact predictions at the L/5 cutoff by 55.59% and 47.68%,
respectively, without additional CPU cost. The improvement is statistically significant, with a
p-value < 5.93×10⁻³ in Student's t-test. A further comparison with the ab initio
structure predictions in the CASP experiments showed that the usefulness of current co-evolution-based contact
prediction for three-dimensional structure modeling depends on the number of homologous sequences
available in the sequence databases. BND can be used as a general contact refinement method.
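
As a rough illustration of the deconvolution idea that BND builds on, the sketch below applies the closed-form network deconvolution step in the spirit of Feizi et al. (2013), G_dir = G_obs (I + G_obs)^-1, to a toy three-residue chain. It is not the BND algorithm itself. The function name `deconvolve`, the toy matrix, and the assumption that the observed matrix is symmetric with no eigenvalue equal to -1 are all illustrative choices, and the eigenvalue rescaling used in practice is omitted.

```python
import numpy as np


def deconvolve(g_obs):
    """Closed-form network deconvolution: G_dir = G_obs (I + G_obs)^-1.

    Illustrative sketch only (not BND). Assumes g_obs is a square,
    symmetric matrix with no eigenvalue equal to -1.
    """
    g = np.asarray(g_obs, dtype=float)
    g = (g + g.T) / 2.0  # enforce symmetry so eigh is applicable
    eigvals, eigvecs = np.linalg.eigh(g)
    if np.any(np.isclose(eigvals, -1.0)):
        raise ValueError("I + G_obs is singular; rescale the input first")
    # Each observed eigenvalue lam maps to a direct eigenvalue lam / (1 + lam).
    direct_eigvals = eigvals / (1.0 + eigvals)
    return (eigvecs * direct_eigvals) @ eigvecs.T


# Toy chain A-B-C: A-B and B-C are direct contacts, A-C is not.
g_dir = np.array([
    [0.0, 0.4, 0.0],
    [0.4, 0.0, 0.4],
    [0.0, 0.4, 0.0],
])

# Observed signal = direct + all indirect paths: G_dir (I - G_dir)^-1.
g_obs = g_dir @ np.linalg.inv(np.eye(3) - g_dir)

recovered = deconvolve(g_obs)
print("observed A-C signal: ", round(float(g_obs[0, 2]), 3))
print("recovered A-C signal:", round(float(abs(recovered[0, 2])), 3))
```

In this toy case the observed matrix carries a spurious A-C signal created by the A-B and B-C contacts, and the deconvolution step removes it. Real co-evolution matrices are noisy, so the recovery is only approximate there.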

Fig. 1. Flow chart of the experiments. The top L/2 predictions are drawn for the T0525 protein in CASP9: green dots are benchmark contacts in the protein, red dots are correct predictions, and blue dots are wrong predictions.
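
The top-L/k evaluation behind the figure and the L/5 results can be sketched as follows, where L is the protein length. This is a minimal sketch under stated assumptions, not the authors' evaluation script: the function name `top_precision` and the toy data are hypothetical, and the sequence-separation thresholds (12 to 23 residues for medium range, 24 or more for long range) follow the common CASP convention rather than anything stated on this page.

```python
import numpy as np


def top_precision(scores, native, divisor=5, min_sep=12, max_sep=None):
    """Fraction of the top L/divisor predicted pairs that are native contacts.

    scores : (L, L) predicted contact scores (upper triangle is used)
    native : (L, L) boolean matrix of benchmark contacts
    min_sep, max_sep : assumed sequence-separation window |i - j|
    """
    scores = np.asarray(scores, dtype=float)
    native = np.asarray(native, dtype=bool)
    length = scores.shape[0]

    # Candidate pairs i < j inside the requested separation window.
    i, j = np.triu_indices(length, k=1)
    sep = j - i
    keep = sep >= min_sep
    if max_sep is not None:
        keep &= sep <= max_sep
    i, j = i[keep], j[keep]
    if i.size == 0:
        return float("nan")

    # Rank candidates by score and keep the top L/divisor of them.
    n_top = max(1, length // divisor)
    order = np.argsort(scores[i, j])[::-1][:n_top]
    return float(native[i[order], j[order]].mean())


# Toy protein of length 30, so L/5 keeps the 6 highest-scoring pairs.
L = 30
native = np.zeros((L, L), dtype=bool)
scores = np.zeros((L, L))
for a, b in [(0, 12), (1, 14), (2, 20), (3, 25), (0, 1)]:
    native[a, b] = True
for (a, b), s in {(5, 29): 0.95, (0, 12): 0.9, (1, 14): 0.8,
                  (2, 20): 0.7, (4, 28): 0.6, (6, 27): 0.5,
                  (0, 1): 1.0}.items():
    scores[a, b] = s

precision = top_precision(scores, native, divisor=5, min_sep=12)
print("medium+long range precision at L/5:", precision)
```

Passing `divisor=2` gives the L/2 cutoff used in the figure, and `min_sep=12, max_sep=23` or `min_sep=24` restricts the evaluation to the assumed medium-range or long-range window.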
BND online server
The BND online server provides a query-driven service: it accepts a raw matrix submitted online,
and the calculator is then triggered to run the contact map optimization immediately.
Code and datasets
Code
Datasets
- The CASP 9 and 10 datasets can be found here.
- The PSICOV datasets can be found here.
- The gene regulatory networks can be found here.
- The co-authorship networks can be found here.
Reference
Hai-Ping Sun, Yan Huang, Xiao-Fan Wang, Yang Zhang, and Hong-Bin Shen, Improving accuracy of protein contact prediction using balanced network deconvolution, PROTEINS: Structure, Function, and Bioinformatics, 2015, 83: 485-496.