Improving accuracy of protein contact prediction using balanced network deconvolution



Introduction


The residue contact map is essential to determining the three-dimensional structure of a protein. However, most current contact prediction methods based on residue co-evolution suffer from a high false-positive rate caused by indirect, transitive contacts (i.e., residues A-B and B-C are in contact but A-C are not). Building on the work of Feizi et al. (2013), which demonstrated a general network model for distinguishing direct dependencies by network deconvolution, we present a new balanced network deconvolution (BND) algorithm that identifies an optimized dependency matrix without restricting the eigenvalue range of the network systems it is applied to. The algorithm was used to filter the contact predictions of five widely used co-evolution methods. On proteins from three benchmark datasets (CASP9, CASP10, and the PSICOV database), BND improves medium- and long-range contact predictions at the L/5 cutoff by 55.59% and 47.68%, respectively, without additional CPU cost. The improvement is statistically significant, with a p-value < 5.93×10⁻³ in Student's t-test. A further comparison with the ab initio structure predictions in the CASP experiments showed that the usefulness of current co-evolution-based contact prediction for three-dimensional structure modeling depends on the number of homologous sequences available in the sequence databases. BND can be used as a general contact refinement method.
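To make the idea of removing transitive effects concrete, here is a minimal sketch of the closed-form network deconvolution that BND builds on. It is not the BND algorithm itself, whose balancing step is not described on this page. It assumes that the observed score matrix is the sum of all powers of a direct matrix, so each eigenvalue maps as λ_dir = λ_obs / (1 + λ_obs). The function name, the `beta` bound, and the `alpha` scaling parameter are illustrative choices, not taken from the BND code.

```python
import numpy as np

def network_deconvolution(g_obs, beta=0.9, alpha=None):
    """Closed-form network deconvolution (illustrative sketch, not BND).

    Assumes g_obs = g_dir + g_dir^2 + g_dir^3 + ..., which gives
    g_dir = g_obs (I + g_obs)^-1, applied here eigenvalue by eigenvalue.
    """
    g = np.asarray(g_obs, dtype=float)
    g = (g + g.T) / 2.0                      # enforce symmetry
    eigvals, eigvecs = np.linalg.eigh(g)     # real spectrum of a symmetric matrix

    if alpha is None:
        # Hypothetical scaling: pick the largest alpha that keeps every
        # deconvolved eigenvalue within [-beta, beta], with beta < 1.
        bounds = []
        lam_pos, lam_neg = eigvals.max(), eigvals.min()
        if lam_pos > 0:
            bounds.append(beta / ((1.0 - beta) * lam_pos))
        if lam_neg < 0:
            bounds.append(beta / ((1.0 + beta) * -lam_neg))
        alpha = min(bounds) if bounds else 1.0

    scaled = alpha * eigvals
    dir_vals = scaled / (1.0 + scaled)       # lambda_dir = lambda / (1 + lambda)
    # Rebuild the matrix: V diag(dir_vals) V^T
    return (eigvecs * dir_vals) @ eigvecs.T
```

With `alpha=1.0` the function inverts the series exactly, so a matrix generated from a known chain A-B-C returns a zero A-C entry. The default scaling is the step that constrains the eigenvalue range, which is the restriction the abstract says BND is designed to lift.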


Fig. 1. Flow chart of the experiments. The top L/2 predictions are drawn for protein T0525 from CASP9: green dots are benchmark (native) contacts in the protein, red dots are correct predictions, and blue dots are wrong predictions.
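The split into correct and wrong top-ranked predictions can be scored as a precision value. The sketch below ranks residue pairs by predicted score, keeps the top L/k, and counts how many are native contacts. The function name and arguments are hypothetical, and the default sequence-separation range (12 to 23 residues for medium range, 24 or more for long range) follows the convention commonly used in CASP assessments, which this page does not state explicitly.

```python
def top_contact_precision(scores, native_contacts, k=5, min_sep=12, max_sep=None):
    """Precision of the top L/k predicted contacts in a separation range.

    scores          -- L x L nested list (or array) of predicted contact scores
    native_contacts -- set of (i, j) pairs with i < j that are true contacts
    k               -- use the top L/k predictions (k=5 gives the L/5 cutoff)
    min_sep/max_sep -- inclusive bounds on sequence separation j - i
    """
    length = len(scores)
    candidates = []
    for i in range(length):
        for j in range(i + min_sep, length):
            if max_sep is not None and j - i > max_sep:
                break                        # separation only grows with j
            candidates.append((scores[i][j], i, j))

    # Highest-scoring pairs first
    candidates.sort(key=lambda item: item[0], reverse=True)
    top = candidates[:max(1, length // k)]
    if not top:
        return 0.0
    hits = sum((i, j) in native_contacts for _, i, j in top)
    return hits / len(top)
```

For medium-range precision at L/5 under the assumed convention, call it with `min_sep=12, max_sep=23`; for long-range precision, use `min_sep=24` and leave `max_sep` unset.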



BND online server

The BND online server is query-driven: submit a raw contact matrix online, and the calculator immediately runs the contact map optimization on it.


  Please upload raw residue contact map (Example):  
  Email address:  
    

Code and datasets

Code
Datasets
  • The CASP9 and CASP10 datasets can be found here.
  • The PSICOV datasets can be found here.
  • The gene regulatory networks can be found here.
  • The co-authorship networks can be found here.

Reference

Hai-Ping Sun, Yan Huang, Xiao-Fan Wang, Yang Zhang, and Hong-Bin Shen, Improving accuracy of protein contact prediction using balanced network deconvolution, PROTEINS: Structure, Function, and Bioinformatics, 2015, 83: 485-496.