Knowing a protein's subcellular location provides valuable information for inferring its functions, and even when a protein's functions are already known,
it is still critical to find out where in the cell the protein acts. Computational prediction of protein subcellular locations has attracted much
attention in recent years, and considerable progress has been made. However, one of the most challenging problems in this field is that many proteins are found
to exist simultaneously at, or to move between, two or more different subcellular locations. Such multiplex proteins could be extremely important in future
drug discovery because of their specific characteristics. Approximately 20% of human proteins are found to be multiplex proteins. To predict human
protein subcellular locations effectively while incorporating samples with multiple sites, this study proposes a novel multi-label (ML) learning and prediction framework called ML-PLoc,
which decomposes the multi-label human protein subcellular location prediction problem into multiple independent binary classification problems.
ML-PLoc is built on the support vector machine (SVM) and sequential evolution information. Experimental results show that ML-PLoc achieves an overall accuracy
of 64.6% and a recall ratio of 67.2% on a benchmark dataset covering 14 human subcellular locations, and that it is well suited to dealing with multiplex proteins. A stand-alone
ML-PLoc software package is available for download.
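
The decomposition described above, one independent binary problem per subcellular location, can be illustrated with a minimal sketch. This is not the ML-PLoc implementation: the feature vectors, location names, and all function names below are hypothetical, and a simple nearest-centroid rule stands in for the SVM so that the example needs no external library. It only shows how a multi-label training set might be split into per-location binary targets and how the independent binary decisions might be recombined into a set of predicted locations.

```python
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]


def decompose(labels: List[Set[str]], locations: List[str]) -> Dict[str, List[int]]:
    """Turn multi-label annotations into one 0/1 target vector per location."""
    return {loc: [1 if loc in y else 0 for y in labels] for loc in locations}


def centroid(rows: List[Vector]) -> List[float]:
    """Mean feature vector of a list of rows (assumed non-empty)."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class CentroidBinary:
    """Stand-in binary classifier (NOT an SVM): nearest class centroid.

    Assumes the training data contains at least one positive and one
    negative sample for the location being modelled.
    """

    def fit(self, X: List[Vector], y: List[int]) -> "CentroidBinary":
        self.pos = centroid([x for x, t in zip(X, y) if t == 1])
        self.neg = centroid([x for x, t in zip(X, y) if t == 0])
        return self

    def predict_one(self, x: Vector) -> int:
        return 1 if sq_dist(x, self.pos) < sq_dist(x, self.neg) else 0


def fit_binary_relevance(
    X: List[Vector], labels: List[Set[str]], locations: List[str]
) -> Dict[str, CentroidBinary]:
    """Train one independent binary model per location."""
    targets = decompose(labels, locations)
    return {loc: CentroidBinary().fit(X, targets[loc]) for loc in locations}


def predict_locations(models: Dict[str, CentroidBinary], x: Vector) -> Set[str]:
    """Collect every location whose binary model votes 'present'."""
    return {loc for loc, model in models.items() if model.predict_one(x) == 1}


# Toy, made-up data: 2-D features and two hypothetical location names.
X = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (0.5, 0.5)]
labels = [
    {"nucleus"},
    {"nucleus"},
    {"cytoplasm"},
    {"cytoplasm"},
    {"nucleus", "cytoplasm"},  # a multiplex sample with two sites
]
locations = ["nucleus", "cytoplasm"]
models = fit_binary_relevance(X, labels, locations)
```

Because each binary model decides independently, a query can be assigned zero, one, or several locations, which is what lets a decomposition of this kind represent multiplex proteins. In this toy setting, `predict_locations(models, (0.5, 0.5))` returns both locations.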