ToxDL 2.0: Protein toxicity prediction based on pretrained language model with graph neural networks



Introduction

In this study, we present ToxDL 2.0, a novel multimodal deep learning model that integrates evolutionary and structural information for protein toxicity prediction. The ToxDL 2.0 model consists of three modules: (1) a Graph Convolutional Network (GCN) module that generates protein graph embeddings, (2) a domain embedding module that captures protein domain representations, and (3) a dense module that combines these embeddings and predicts toxicity with a multilayer perceptron. We first construct a large toxicity benchmark dataset; experimental results on both the test and independent test sets show that ToxDL 2.0 outperforms existing state-of-the-art methods. Furthermore, we apply Integrated Gradients to identify known motifs associated with protein toxicity.
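To make the attribution step concrete, here is a minimal sketch of Integrated Gradients in plain Python. It is not the ToxDL 2.0 implementation: the scorer toy_model, its analytic gradient toy_gradient, the weights, and the all-zero baseline are hypothetical stand-ins chosen so the example runs without a deep learning framework. The method attributes a prediction to each input feature by averaging the gradient along a straight path from a baseline to the input and scaling by the difference between input and baseline.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def toy_model(x, w, bias):
    """Hypothetical stand-in for a toxicity scorer: logistic regression."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + bias)


def toy_gradient(x, w, bias):
    """Analytic gradient of toy_model with respect to each input feature."""
    p = toy_model(x, w, bias)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x, baseline, w, bias, steps=200):
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    attribution_i = (x_i - baseline_i) * mean gradient_i along the
    straight path from baseline to x.
    """
    n = len(x)
    avg_grad = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [b0 + alpha * (xi - b0) for xi, b0 in zip(x, baseline)]
        grad = toy_gradient(point, w, bias)
        for i in range(n):
            avg_grad[i] += grad[i] / steps
    return [(xi - b0) * g for xi, b0, g in zip(x, baseline, avg_grad)]


# Illustrative values only (assumed, not taken from the study).
x = [1.0, 0.5, -2.0]
baseline = [0.0, 0.0, 0.0]
w = [2.0, -1.0, 0.1]
bias = -0.5

attributions = integrated_gradients(x, baseline, w, bias)
# Completeness check: attributions should sum to F(x) - F(baseline).
gap = sum(attributions) - (toy_model(x, w, bias) - toy_model(baseline, w, bias))
```

In a residue-level setting, the features would be per-residue inputs, and positions with large attributions would be the candidates to compare against known motifs. The completeness property (attributions summing to the change in model output) is a useful sanity check on the number of integration steps.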


Availability: ToxDL 2.0 is available at www.csbio.sjtu.edu.cn/bioinf/ToxDL2.




Figure 1. Flowchart of the proposed ToxDL 2.0 model.
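The three-module design described in the introduction can be sketched as a forward pass in plain Python. This is an illustration under stated assumptions, not the authors' code: the single GCN layer with symmetric normalization, the mean pooling, the layer sizes, the random weights, and all function names (normalized_adjacency, gcn_layer, predict_toxicity, and so on) are hypothetical. Node features stand in for per-residue embeddings, and the adjacency matrix stands in for a residue contact map.

```python
import math
import random


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, the usual GCN propagation matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_norm, h, w):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    z = matmul(matmul(a_norm, h), w)
    return [[max(0.0, v) for v in row] for row in z]


def mean_pool(h):
    """Average node embeddings into one graph-level embedding (assumed readout)."""
    return [sum(col) / len(h) for col in zip(*h)]


def dense(x, w, b):
    """Affine layer: w has shape (inputs, outputs), b has one entry per output."""
    return [sum(xi * wi for xi, wi in zip(x, col)) + bj for col, bj in zip(zip(*w), b)]


def predict_toxicity(adj, node_feats, domain_emb, params):
    # (1) GCN module: protein graph embedding.
    h = gcn_layer(normalized_adjacency(adj), node_feats, params["w_gcn"])
    graph_emb = mean_pool(h)
    # (2) Domain embedding module: taken here as a precomputed vector.
    # (3) Dense module: concatenate both embeddings and apply an MLP.
    joint = graph_emb + list(domain_emb)
    hidden = [max(0.0, v) for v in dense(joint, params["w1"], params["b1"])]
    logit = dense(hidden, params["w2"], params["b2"])[0]
    return 1.0 / (1.0 + math.exp(-logit))


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Toy inputs (all values are illustrative assumptions).
rng = random.Random(0)
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]  # 4-residue chain
node_feats = random_matrix(4, 8, rng)   # stand-in for residue embeddings
domain_emb = [rng.uniform(-1.0, 1.0) for _ in range(3)]
params = {
    "w_gcn": random_matrix(8, 6, rng),
    "w1": random_matrix(6 + 3, 5, rng),
    "b1": [0.0] * 5,
    "w2": random_matrix(5, 1, rng),
    "b2": [0.0],
}
score = predict_toxicity(adj, node_feats, domain_emb, params)
```

Because the readout averages over nodes, the score does not depend on the order in which residues are listed in the graph, which is a convenient property to check when wiring up a graph-based predictor.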


If you are interested in our previous version of ToxDL, you can access it via http://www.csbio.sjtu.edu.cn/bioinf/ToxDL



© 2017 Computational Systems Biology/Shen Group.