SOMRuler: A Novel Interpretable Transmembrane Helices Predictor


Identification of transmembrane helices (TMHs) is one of the most important steps in membrane protein structure prediction. Existing TMH predictors tend to pursue accurate computational models without carefully considering the interpretability of those models, and thus act as black boxes. In this paper, a novel TMH predictor called SOMRuler is presented that offers excellent interpretability together with high prediction accuracy. SOMRuler uses a self-organizing map (SOM) to learn knowledge about the distribution of helices from the training samples; this knowledge is encoded in the codebook vectors of the trained SOM. Human-interpretable fuzzy rules are then extracted from these codebook vectors. Extracting fuzzy rules from the learned knowledge rather than from the original training samples has two benefits. First, the computational burden of rule extraction is greatly reduced. Second, the extracted rules are more reliable, since noise in the original samples is smoothed out by the SOM learning procedure. The validity of the fuzzy rules extracted by SOMRuler is analyzed both qualitatively and quantitatively. Experimental results on the benchmark dataset show that SOMRuler outperforms most existing popular TMH predictors and is flexible enough to suit a wide variety of problems in bioinformatics.
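The two-stage idea described above (train a SOM, then read rules off its codebook vectors) can be illustrated with a small sketch. This is not the SOMRuler implementation: the one-dimensional map, the two toy features f0 and f1, the low/medium/high fuzzy sets, the majority-vote labelling of units, and all function names are assumptions made purely for illustration.

```python
import math
import random

# Hypothetical fuzzy sets on features scaled to [0, 1]; not taken from the paper.
TERMS = {"low": 0.0, "medium": 0.5, "high": 1.0}


def make_toy_data(n_per_class=30, seed=0):
    """Build two synthetic clusters standing in for helix / non-helix samples."""
    rng = random.Random(seed)
    samples, labels = [], []
    for centre, label in (((0.9, 0.8), "helix"), ((0.1, 0.2), "non-helix")):
        for _ in range(n_per_class):
            samples.append([c + rng.uniform(-0.05, 0.05) for c in centre])
            labels.append(label)
    return samples, labels


def best_matching_unit(codebook, x):
    """Index of the codebook vector closest to x (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda j: sum((c - v) ** 2 for c, v in zip(codebook[j], x)))


def train_som(samples, n_units=4, epochs=50, lr0=0.5, radius0=2.0, seed=0):
    """Train a 1-D SOM and return its codebook vectors, one per map unit."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Initialise each codebook vector from a randomly chosen training sample.
    codebook = [list(rng.choice(samples)) for _ in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 0.5)  # neighbourhood shrinks too
        order = samples[:]
        rng.shuffle(order)
        for x in order:
            bmu = best_matching_unit(codebook, x)
            for j in range(n_units):
                # Gaussian neighbourhood around the winning unit on the map.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for d in range(dim):
                    codebook[j][d] += lr * h * (x[d] - codebook[j][d])
    return codebook


def membership(value, centre, width=0.5):
    """Triangular membership of value in the fuzzy set centred at centre."""
    return max(0.0, 1.0 - abs(value - centre) / width)


def linguistic_term(value):
    """Name of the fuzzy set in which value has the highest membership."""
    return max(TERMS, key=lambda t: membership(value, TERMS[t]))


def extract_rules(codebook, samples, labels):
    """Turn each codebook vector into one IF-THEN rule (illustrative scheme)."""
    votes = [{} for _ in codebook]
    for x, y in zip(samples, labels):
        j = best_matching_unit(codebook, x)
        votes[j][y] = votes[j].get(y, 0) + 1
    rules = []
    for vec, vote in zip(codebook, votes):
        if not vote:
            continue  # this unit attracted no training samples
        antecedent = " AND ".join(
            f"f{d} is {linguistic_term(v)}" for d, v in enumerate(vec))
        rules.append(f"IF {antecedent} THEN {max(vote, key=vote.get)}")
    return rules


samples, labels = make_toy_data()
codebook = train_som(samples)
for rule in extract_rules(codebook, samples, labels):
    print(rule)
```

In this sketch, rule extraction touches only the four codebook vectors rather than all sixty samples, which mirrors the computational saving described above. Each codebook vector is also a weighted average of many samples, so per-sample noise is damped before any rule is read off.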

SOMRuler software package

Figure 1 shows the flowchart of SOMRuler. The whole software package, the benchmark dataset, and the mined fuzzy rules of SOMRuler are available for download.

Figure 1. Flowchart of SOMRuler.

Mined Fuzzy Rules

Figure 2 shows four fuzzy rules mined by SOMRuler for predicting TMHs. More results are available for download.

Figure 2. Fuzzy rules mined by SOMRuler for TMH prediction.


MemBrain: Improving the Accuracy of Predicting Transmembrane Helices.


Dong-Jun Yu, Hong-Bin Shen, Jing-Yu Yang, SOMRuler: A Novel Interpretable Transmembrane Helices Predictor, IEEE Transactions on NanoBioscience, 2011, 10: 121-129.