A Brief Introduction to LabCaS

We present a novel method for predicting calpain cleavage sites from the flanking sequences of substrates. The method uses Conditional Random Fields (CRFs), a supervised machine learning technique for sequence labeling, to identify the cleavage sites. An advantage of CRFs is that they are insensitive to the ratio between positive and negative samples, so all negative samples can be used to build the model, which avoids information loss. Two fusion strategies are tested in this study: feature-level fusion and decision-level fusion. Experimental results show that decision-level fusion is the better choice.
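
The difference between the two fusion strategies can be illustrated with a minimal sketch. It is not the LabCaS implementation: the function names are hypothetical, and averaging the per-model probabilities is only an assumed combination rule, since the rule actually used is documented in the original paper rather than here. Feature-level fusion merges the feature vectors before a single model is trained, whereas decision-level fusion trains one model per feature type and merges their outputs.

```python
from typing import List


def feature_level_fusion(feature_sets: List[List[float]]) -> List[float]:
    """Concatenate the feature vectors of one residue into a single vector.

    A single model would then be trained on the fused vector.
    """
    fused: List[float] = []
    for features in feature_sets:
        fused.extend(features)
    return fused


def decision_level_fusion(probabilities: List[float]) -> float:
    """Combine the cleavage probabilities that separate models assign to one residue.

    Assumption: a plain mean is used here purely for illustration.
    """
    if not probabilities:
        raise ValueError("need at least one probability")
    return sum(probabilities) / len(probabilities)
```

In this sketch, `feature_level_fusion([[1.0, 2.0], [3.0]])` yields one three-element vector for a single model, while `decision_level_fusion` takes the outputs of several already-trained models.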

The web server LabCaS (Labeling Calpain substrate cleavage Sites) was developed by integrating different types of feature information with conditional random fields.

For a query protein sequence, LabCaS predicts whether each amino acid residue is a cleavage site and reports the probability that each residue is a cleavage site.
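
As a rough illustration of how such per-residue probabilities might be read, the sketch below lists the residues whose probability reaches a cutoff. The function name, the example probabilities, and the cutoff value are all hypothetical; the thresholds that LabCaS itself applies are not stated on this page.

```python
from typing import List, Tuple


def label_cleavage_sites(
    sequence: str, probabilities: List[float], threshold: float
) -> List[Tuple[int, str, float]]:
    """Return (1-based position, residue, probability) for predicted sites.

    A residue is reported when its probability is at least `threshold`
    (a hypothetical cutoff chosen by the caller).
    """
    if len(sequence) != len(probabilities):
        raise ValueError("one probability is required per residue")
    return [
        (index + 1, residue, prob)
        for index, (residue, prob) in enumerate(zip(sequence, probabilities))
        if prob >= threshold
    ]


# Made-up probabilities for a four-residue fragment.
sites = label_cleavage_sites("MKTA", [0.1, 0.8, 0.4, 0.95], threshold=0.5)
```

Raising the cutoff keeps only the most confident predictions, which is the usual trade-off between fewer false positives and fewer reported sites.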

For more information, refer to the original paper describing the predictor.

Caveat

To obtain predictions with the expected success rate, submit the query proteins in FASTA format; see the example for reference. Keep the following points in mind when using LabCaS:


    1. LabCaS is focused on the prediction of calpain cleavage sites only.

    2. Each input sequence should be longer than 50 amino acid residues and no longer than 6000 amino acid residues. No more than 3 sequences should be submitted at a time.

    3. Of the three output formats, "Short without Graphics" outputs only the predicted results at the high threshold; "Long without Graphics" outputs the results at the high, middle and low thresholds, without the corresponding graphics; "Short with Graphics" outputs the results at the high threshold together with the corresponding graphics.
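
The input limits in point 2 can be checked locally before a submission. The following is a minimal sketch, not part of LabCaS: the function and constant names are hypothetical, and the lower bound is read as "strictly longer than 50 residues", which is an interpretation of the wording above.

```python
from typing import List, Tuple

MIN_LENGTH_EXCLUSIVE = 50  # assumed reading: sequences must be longer than this
MAX_LENGTH = 6000          # sequences must not be longer than this
MAX_SEQUENCES = 3          # at most this many sequences per submission


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs."""
    records: List[Tuple[str, List[str]]] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            records.append((line[1:].strip(), []))
        elif not records:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[-1][1].append(line.upper())
    return [(header, "".join(parts)) for header, parts in records]


def check_submission(text: str) -> List[str]:
    """Return a list of problems; an empty list means the limits are met."""
    records = parse_fasta(text)
    problems: List[str] = []
    if not records:
        problems.append("no sequences found")
    if len(records) > MAX_SEQUENCES:
        problems.append(f"{len(records)} sequences given, at most {MAX_SEQUENCES} allowed")
    for header, sequence in records:
        if not MIN_LENGTH_EXCLUSIVE < len(sequence) <= MAX_LENGTH:
            problems.append(f"{header}: length {len(sequence)} is outside the allowed range")
    return problems
```

Calling `check_submission` on the text of a FASTA file returns a list of human-readable problems, so an empty list indicates that the sequence count and lengths fall within the stated limits. The sketch does not check that the residues are valid amino acid letters.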