MemBrain Tutorial
1. Introduction
   MemBrain is a web server for transmembrane protein structure prediction. It currently provides three prediction functions: transmembrane helix (TMH) prediction, TMH-TMH residue contact prediction, and relative accessible surface area (rASA) prediction.

Figure 1. Flowchart of TMH prediction in MemBrain.

1.1 TMH prediction

   Predicting TMHs in alpha-helical membrane proteins provides valuable information about protein topology when high-resolution structures are not available. To improve the accuracy of TMH detection, we developed a machine learning-based predictor that integrates several bioinformatics approaches: sequence representation by a multiple sequence alignment matrix, the optimized evidence-theoretic k-nearest neighbor prediction algorithm, fusion of multiple prediction window sizes, and classification by dynamic threshold. The results show improved prediction of TMH ends and of TMHs shorter than 15 residues. The predictor can also detect N-terminal signal peptides. Figure 1 illustrates the flowchart of TMH prediction.
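
   To make the ideas of window fusion and thresholding concrete, the following is a minimal sketch only. It assumes that each window size yields one propensity score per residue, fuses them by a plain average, and uses a fixed cut-off. The function names, the averaging rule, and the fixed threshold are illustrative assumptions; they are not MemBrain's actual fusion rule or its dynamic threshold.

```python
from typing import List, Tuple


def fuse_propensities(per_window: List[List[float]]) -> List[float]:
    """Average per-residue scores obtained with different window sizes.

    Plain averaging is an assumption made for illustration only.
    """
    n = len(per_window[0])
    if any(len(scores) != n for scores in per_window):
        raise ValueError("all score lists must cover the same residues")
    return [sum(column) / len(per_window) for column in zip(*per_window)]


def call_segments(scores: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs where score >= threshold.

    A fixed threshold stands in for the dynamic threshold described above.
    """
    segments = []
    start = None
    for position, score in enumerate(scores, start=1):
        if score >= threshold and start is None:
            start = position                      # a new run begins here
        elif score < threshold and start is not None:
            segments.append((start, position - 1))  # the run ended one residue back
            start = None
    if start is not None:                         # run reaches the last residue
        segments.append((start, len(scores)))
    return segments
```

   For example, call_segments(fuse_propensities(scores_by_window)) returns the residue ranges whose fused score reaches the cut-off, where scores_by_window is a hypothetical list holding one score list per window size.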

1.2 TMH-TMH residue contact prediction

   Predicted TMH-TMH residue contacts provide crucial constraints for accurately constructing 3D structures of membrane proteins. For contact prediction, TMH locations are taken from the TMH prediction embedded in MemBrain. The TMH-TMH contact map predictor works directly from the primary sequence and is a newer component of the MemBrain protocol. It combines statistical machine learning algorithms with evolutionary analysis of multiple sequence alignments, as shown in Figure 2. The machine learning-based prediction engine was trained by applying multiple algorithms to multiple random under-samplings, so that different learning methods operating in different spaces generate strong diversity. The evolutionary analysis of the multiple sequence alignments is performed by the PSICOV algorithm. MemBrain is a useful sequence-based analysis tool for functional and structural characterization of helical membrane proteins.

Figure 2. Flowchart of TMH-TMH residue contact prediction in MemBrain.
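
   The random under-sampling mentioned in Section 1.2 can be illustrated with a short sketch. It assumes that contacting residue pairs (positives) are much rarer than non-contacting pairs (negatives) and that each training subset keeps all positives plus an equally sized random draw of negatives. The function name, the 1:1 ratio, and the seed handling are assumptions for illustration, not details taken from MemBrain.

```python
import random
from typing import List, Sequence, Tuple


def undersampled_subsets(
    positives: Sequence,
    negatives: Sequence,
    n_subsets: int,
    seed: int = 0,
) -> List[Tuple[list, list]]:
    """Build n_subsets balanced training sets by random under-sampling.

    Every subset keeps all positives and draws an equal number of
    negatives without replacement (the 1:1 ratio is an assumption).
    """
    rng = random.Random(seed)
    negative_pool = list(negatives)
    k = min(len(positives), len(negative_pool))
    subsets = []
    for _ in range(n_subsets):
        sampled_negatives = rng.sample(negative_pool, k)
        subsets.append((list(positives), sampled_negatives))
    return subsets
```

   Each subset could then be used to train a separate learner, so that the ensemble members differ in both algorithm and training sample.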

1.3 TMH-TMH residue contact prediction (new model)

   The new MemBrain is a hierarchical two-stage residue contact predictor. The first stage is a conventional perceptron with two hidden layers: 1084-dimensional sequence-based features are fed into the network, each hidden layer has 150 units, and the single output indicates the contact potential of a given residue pair. The second stage fuses three CNNs with one, two, and three convolution layers, respectively. On top of each CNN, a fully connected layer with 150 hidden units predicts the final contact probability. Figure 3 illustrates the flowchart of the MemBrain protocol.

Figure 3. Flowchart of TMH-TMH residue contact prediction in new MemBrain.
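
   The shape of the first-stage network in Section 1.3 (1084 inputs, two hidden layers of 150 units, one output) can be sketched as a plain forward pass. The layer sizes come from the text; the sigmoid activations, the random placeholder weights, and all function names are assumptions for illustration, since the trained parameters and activation functions are not given here.

```python
import math
import random
from typing import List, Tuple

N_FEATURES = 1084  # input dimension stated in the text
N_HIDDEN = 150     # units in each of the two hidden layers

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (activation choice is assumed)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dense(inputs: List[float], layer: Layer) -> List[float]:
    """Apply one fully connected layer followed by a sigmoid."""
    weights, biases = layer
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def random_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Placeholder weights; a real model would load trained parameters."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_network(seed: int = 0) -> List[Layer]:
    rng = random.Random(seed)
    return [
        random_layer(N_FEATURES, N_HIDDEN, rng),
        random_layer(N_HIDDEN, N_HIDDEN, rng),
        random_layer(N_HIDDEN, 1, rng),
    ]


def contact_potential(features: List[float], network: List[Layer]) -> float:
    """Forward pass for one residue pair; returns a single score in (0, 1)."""
    if len(features) != N_FEATURES:
        raise ValueError(f"expected {N_FEATURES} features, got {len(features)}")
    activations = features
    for layer in network:
        activations = dense(activations, layer)
    return activations[0]
```

   With untrained placeholder weights the returned score carries no biological meaning; the sketch only shows how the dimensions fit together.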

1.4 rASA prediction

   Predicting the relative accessible surface area (rASA) of residues in alpha-helical transmembrane proteins indicates the relative positions of the residues, which is helpful for 3D structure prediction. To improve rASA prediction, we present a sequence-based method (MemBrain-Rasa) that predicts rASA from the primary sequence. MemBrain-Rasa features a newly developed segment structural similarity-based prediction engine, which is combined with a machine learning engine. We locally constructed a comprehensive database of residue rASA values, which is searched for segments expected to be structurally similar to segments of the query sequence. The segment structural similarity-based prediction is then fused with the support vector regression outputs using a designed knowledge rule.

2. Inputs
(A). Protein sequence
   Enter the protein sequence in the input box without an ID line. The sequence must contain more than 30 amino acids (aa) and no invalid characters. Click the Example hyperlink to see a valid input.
(B). Prediction function
   If you select "TMH prediction", MemBrain will only predict TMHs (default).
   If you select "TMH-TMH residue contact prediction", MemBrain will first predict TMHs and then predict TMH-TMH residue contacts.
   If you select "Rasa prediction", MemBrain will predict the real-valued relative accessible surface area (rASA) of each amino acid.
(C). N-terminal signal peptide information
   MemBrain has the capability to detect N-terminal signal peptides.
   If you select "I know there is NO N-terminal signal peptide", MemBrain will not detect the N-terminal signal peptide (default).
   If you select "I do NOT know whether there is signal peptide in the N-terminal or not", MemBrain will apply the Signal-3L predictor to identify the N-terminal signal peptide automatically. In this case, you must also select the species of the query sequence below.
   Note: N-terminal signal peptide information does not affect the prediction of TMH-TMH residue contacts.
(D). Email address
   Enter your email address to receive an email notification of your prediction results.
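
   The sequence requirements in (A) can be checked locally before submission. The sketch below is one possible check, not the server's own validation. It assumes that valid characters are the 20 standard one-letter amino acid codes, that a line starting with ">" is an ID line, and that whitespace and letter case can be normalized; the server may accept or reject a different character set.

```python
# Assumed alphabet: the 20 standard amino acids in one-letter code.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")
MIN_LENGTH = 31  # "more than 30 aa"


def check_sequence(raw: str) -> str:
    """Return a cleaned sequence, or raise ValueError describing the problem."""
    if raw.lstrip().startswith(">"):
        raise ValueError("remove the ID line; paste the sequence only")
    # Drop spaces and line breaks, then normalize to upper case (assumption).
    sequence = "".join(raw.split()).upper()
    invalid = sorted(set(sequence) - VALID_RESIDUES)
    if invalid:
        raise ValueError("invalid characters: " + ", ".join(invalid))
    if len(sequence) < MIN_LENGTH:
        raise ValueError(f"sequence has {len(sequence)} aa; more than 30 are required")
    return sequence
```

   Calling check_sequence on the pasted text either returns the cleaned sequence or reports the first problem found.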
3. Outputs
Case 1. TMH prediction
   A detailed text description of the prediction results will be sent to your email, including N-terminal signal peptide information, predicted TMHs, and the TMH propensity of each residue.
   A plot of the TMH propensity of each residue is also provided, as shown in Figure 4.



Figure 4. Output of MemBrain for TMH prediction

Case 2. TMH-TMH residue contact prediction
   In addition to the TMH prediction outputs described above, MemBrain also outputs detailed results for TMH-TMH residue contact prediction, including the predicted TMHs, the predicted contact map, and detailed information. [Example]
Case 3. Rasa prediction
   A detailed text description of the rASA predictions will be sent to your email. Most rASA values lie in the range [0, 100]. A value greater than 100 probably arises because the residue is next to a chain break, or because real proteins contain unusual bond angles, bond lengths, and distorted geometry. A value lower than 0 probably arises because the machine learning engine can generate predictions below 0.
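
   If a downstream analysis requires rASA values within [0, 100], one option is to clip the out-of-range predictions after receiving the results. The sketch below shows such a post-processing step. It is a user-side convenience under that assumption, not something MemBrain applies, and the function name is hypothetical.

```python
from typing import List, Tuple


def clip_rasa(
    values: List[float], low: float = 0.0, high: float = 100.0
) -> Tuple[List[float], int]:
    """Clip rASA values to [low, high] and count how many were changed."""
    clipped = [min(max(value, low), high) for value in values]
    n_changed = sum(1 for value, new in zip(values, clipped) if value != new)
    return clipped, n_changed
```

   Reporting the number of changed values makes it easy to notice when a large share of predictions falls outside the expected range.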