lncLocator 2.0: an end-to-end lncRNA subcellular localization predictor based on deep learning



Introduction

Long non-coding RNAs (lncRNAs) are expressed in a tissue-specific manner, and their subcellular localizations depend on the tissues or cell lines in which they are expressed. Previous computational methods for predicting lncRNA subcellular localization do not take this characteristic into account: they train a single machine learning model on lncRNAs pooled from all available cell lines. It is therefore important to develop a cell-line-specific computational method that predicts lncRNA locations in different cell lines.
In this study, we present lncLocator 2.0, an updated cell-line-specific predictor that trains one deep model per cell line to predict lncRNA subcellular localization from sequence. We first construct benchmark datasets of lncRNA subcellular localizations for 15 cell lines. We then learn word embeddings using natural language models, and feed these learned embeddings into a convolutional neural network, a long short-term memory network and a multilayer perceptron to classify subcellular localizations. The effectiveness of lncLocator 2.0 varies across cell lines, which demonstrates the necessity of training cell-line-specific models. Furthermore, we adopt Integrated Gradients to explain the proposed model in lncLocator 2.0 and find some potential patterns that determine the subcellular localizations of lncRNAs, suggesting that the subcellular localization of lncRNAs is linked to some specific nucleotides.
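
To illustrate how a sequence can be turned into "words" before embeddings are learned, the following is a minimal Python sketch. It assumes that the words are overlapping k-mers and that k = 4; neither the tokenization scheme nor the value of k is stated above, and the helper names kmer_tokens and build_vocab are hypothetical and not part of lncLocator 2.0.

```python
def kmer_tokens(sequence, k=4):
    """Split a nucleotide sequence into overlapping k-mers ("words").

    Assumption: k-mer words with k=4; the text above does not specify this.
    U is mapped to T so RNA and cDNA spellings share one vocabulary.
    """
    seq = sequence.upper().replace("U", "T")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(token_lists):
    """Assign an integer id to each distinct k-mer, in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab))
    return vocab


# Toy sequence for illustration only (not taken from the benchmark datasets).
tokens = kmer_tokens("ACGUACGA", k=4)
vocab = build_vocab([tokens])
ids = [vocab[t] for t in tokens]  # integer ids that an embedding layer could look up
```

The resulting integer ids are what an embedding lookup would consume; the learned embedding vectors would then be passed to the downstream network.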


Availability: lncLocator 2.0 is available at www.csbio.sjtu.edu.cn/bioinf/lncLocator2.




Figure 1. Flowchart of the proposed lncLocator 2.0.


The output consists of three parts: 1) the CNRCI predicted by lncLocator 2.0; 2) the heatmap generated by Integrated Gradients; 3) the sequence logo generated by Integrated Gradients.
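
As a rough illustration of how Integrated Gradients produces the per-position scores behind such a heatmap, the sketch below approximates the attribution integral with a midpoint Riemann sum. The model is a toy stand-in (tanh of a weighted sum) with made-up weights, not the lncLocator 2.0 network, and the all-zero baseline and the number of steps are assumptions chosen for illustration.

```python
import math


def model(x, w):
    """Toy stand-in for a trained network: tanh of a weighted sum."""
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))


def model_grad(x, w):
    """Analytic gradient of the toy model with respect to each input."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    d = 1.0 - math.tanh(s) ** 2
    return [d * wi for wi in w]


def integrated_gradients(x, baseline, w, steps=200):
    """Midpoint Riemann-sum approximation of Integrated Gradients.

    Gradients are averaged along the straight path from baseline to x,
    then scaled by (x - baseline) for each input feature.
    """
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = model_grad(point, w)
        total = [t + g for t, g in zip(total, grad)]
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


# Hypothetical weights and input, for illustration only.
weights = [0.8, -0.5, 0.3, 0.1]
x = [1.0, 1.0, 0.0, 1.0]
baseline = [0.0, 0.0, 0.0, 0.0]  # assumption: all-zero baseline
attributions = integrated_gradients(x, baseline, weights)
```

A useful sanity check is the completeness property: the attributions should sum to approximately model(x) minus model(baseline). In a sequence setting, one attribution per position would be plotted as the heatmap.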

© 2017 Computational Systems Biology/Shen Group.