Additional Information
We have developed several online applications to advance research on the NLS universe. The NLS candidate library is built by leveraging the exploration capabilities of NLSExplorer and includes both known NLSs and potential ones. In the library's map, neighborhood relationships are based on the cosine similarity of segment embeddings, so neighboring segments reflect similarity in sequence and structure. This helps build a comprehensive landscape for each prediction by automatically searching for its nearest neighbors, with customizable parameters to meet various usage requirements.
NLSExplorer-SCNLS highlights the core amino acids of NLSs and discovers discontinuous NLS patterns, which facilitates finding NLS templates and uncovering novel types of NLS patterns. The Nuclear Transport pattern map, mined by the SCNLS algorithm, provides a reference for potential NLS patterns and other segments important for nuclear transport. The map helps characterize potential NLSs and uncover the evolutionary relationships of NLSs among species. It may also help advance applications such as targeted drug delivery, novel treatments for diseases involving nuclear proteins, and the development of new nuclear proteins for biological research.
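The nearest-neighbor search in the NLS candidate library can be pictured with a minimal sketch, assuming each segment is already represented by a fixed-length embedding vector. The function names, the dictionary layout of the library, and the parameters `k` and `min_similarity` are hypothetical stand-ins for the server's customizable parameters, not NLSExplorer's actual interface.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def nearest_neighbors(
    query: Sequence[float],
    library: Dict[str, Sequence[float]],
    k: int = 5,                     # hypothetical: how many neighbors to return
    min_similarity: float = -1.0,   # hypothetical: drop neighbors below this score
) -> List[Tuple[str, float]]:
    """Rank library segments by cosine similarity to the query embedding."""
    scored = [(seg_id, cosine_similarity(query, emb)) for seg_id, emb in library.items()]
    scored = [pair for pair in scored if pair[1] >= min_similarity]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A brute-force scan like this is enough to show the idea; a library of realistic size would more likely use an approximate nearest-neighbor index, which this sketch does not attempt.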
For a given protein segment, the significance of each signal peptide fragment varies depending on the function of interest. Suppose first that an expert already has sufficient knowledge and understanding. When presented with a set of materials, the expert's gaze naturally settles on the areas that interest them. Suppose also that a recorder logs how often the expert's gaze falls on each pattern; these frequencies then reflect how the expert's attention is distributed throughout the task.
Now, let’s consider a different scenario: the expert's gaze is directed according to specific requirements. For example, if the task is to determine whether a protein is localized within the nucleus, the expert's attention will shift to focus on nucleus-related information.
Our model assumes that language models possess a substantial amount of knowledge. In this analogy:
The knowledgeable individual is replaced by a language model.
The task presented to the language model is to identify proteins localized in the nucleus.
The tools used to record the patterns are A2KA and SCNLS.
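The recorder in this analogy can be made concrete with a toy sketch. It is not A2KA or SCNLS; it assumes per-residue attention scores are already available and simply groups consecutive high-attention residues into segments, then tallies how often each segment recurs across proteins. The names `attended_segments` and `record_patterns`, the fixed `threshold`, and the example inputs are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def attended_segments(sequence: str, attention: Sequence[float], threshold: float) -> List[str]:
    """Group consecutive residues whose attention is >= threshold into segments."""
    if len(sequence) != len(attention):
        raise ValueError("sequence and attention must have the same length")
    segments: List[str] = []
    current: List[str] = []
    for residue, weight in zip(sequence, attention):
        if weight >= threshold:
            current.append(residue)
        elif current:
            segments.append("".join(current))
            current = []
    if current:  # close a segment that runs to the end of the sequence
        segments.append("".join(current))
    return segments


def record_patterns(
    samples: Iterable[Tuple[str, Sequence[float]]], threshold: float = 0.5
) -> Counter:
    """The 'recorder': count how often each attended segment appears across samples."""
    recorder: Counter = Counter()
    for sequence, attention in samples:
        recorder.update(attended_segments(sequence, attention, threshold))
    return recorder
```

For example, `record_patterns([("MAPKKKRKVE", [0.1, 0.1] + [0.9] * 7 + [0.1])])` counts the segment `PKKKRKV` once. Segments that recur across many proteins would then stand out in the counts, mirroring how the recorder reveals where the expert's attention concentrates.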
If you use NLSExplorer, please cite the following paper:
Yi-Fan Li, Xiaoyong Pan, and Hong-Bin Shen*. "Discovering nuclear localization signal universe through a novel deep learning model with interpretable attention units." bioRxiv preprint, https://doi.org/10.1101/2024.08.10.606103
The software is free for academic use ONLY. For commercial use, please contact us.
(Google Chrome, Safari, and Firefox are recommended for better experience)
Contact: Hongbin Shen (hbshen@sjtu.edu.cn)