Similarity

Motivation

Benefiting from high-throughput experimental technologies, whole-genome analysis of microRNAs (miRNAs) has become increasingly common for uncovering the important regulatory roles of miRNAs and identifying miRNA biomarkers for disease diagnosis. As a complement to high-throughput experimental data, domain knowledge such as the Gene Ontology (GO) and KEGG pathways is commonly used to guide gene function analysis. However, functional annotation of miRNAs is scarce in public databases. To date, only a few methods have been proposed for measuring the functional similarity between miRNAs based on public annotation data, and these methods cover a very limited number of miRNAs, which makes them unsuitable for large-scale miRNA analysis.

Results

In this paper, we propose a new method for measuring the functional similarity of miRNAs, called miRGOFS, which has two notable features: I) it adopts a new GO semantic similarity metric that considers both the common ancestors and the descendants of GO terms; II) it computes the similarity between GO term sets asymmetrically and weights each GO term by its statistical significance. The miRGOFS-based predictor achieves an F1 score of 61.1% on a benchmark dataset of miRNA subcellular localization, and AUC values of 87.7% and 81.1% on two benchmark sets of miRNA-disease associations, respectively. Compared with existing functional similarity measures for miRNAs, miRGOFS offers higher accuracy and larger coverage of human miRNAs (over 1000 miRNAs).
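To illustrate what "asymmetric" and "weighted" mean for a set-to-set similarity, here is a minimal sketch of a generic weighted best-match aggregation. It is not the exact miRGOFS formula: the term-level metric `term_sim` and the per-term `weights` (standing in for statistical significance) are hypothetical placeholders supplied by the caller, and the term IDs in the example are made up.

```python
from typing import Callable, Dict, Set


def asymmetric_set_similarity(
    terms_a: Set[str],
    terms_b: Set[str],
    term_sim: Callable[[str, str], float],
    weights: Dict[str, float],
) -> float:
    """Weighted average, over terms in A, of each term's best match in B.

    Hypothetical sketch: `weights` stands in for per-term significance
    weights (default 1.0); the real miRGOFS weighting and term metric differ.
    """
    if not terms_a or not terms_b:
        return 0.0
    weighted_sum = 0.0
    total_weight = 0.0
    for a in terms_a:
        w = weights.get(a, 1.0)
        best = max(term_sim(a, b) for b in terms_b)  # best match of a within B
        weighted_sum += w * best
        total_weight += w
    return weighted_sum / total_weight if total_weight > 0 else 0.0


def exact_match(a: str, b: str) -> float:
    """Toy term metric: 1.0 for identical terms, 0.0 otherwise."""
    return 1.0 if a == b else 0.0


A = {"t1", "t2"}  # made-up GO term IDs
B = {"t1"}
print(asymmetric_set_similarity(A, B, exact_match, {}))  # 0.5
print(asymmetric_set_similarity(B, A, exact_match, {}))  # 1.0
```

Because each direction averages only over the terms of its first argument, sim(A, B) and sim(B, A) generally differ, which is the property the asymmetric design exploits.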

Availability

http://www.csbio.sjtu.edu.cn/bioinf/MiRGOFS/

Code and Datasets

Click here to download the introduction to the datasets.

Click here to download the dataset for microRNA subcellular localization prediction.

Click here to download the similarity scores of miRGOFS (WED).

Click here to download the normalized similarity scores of miRGOFS (WED).

Source Code: https://github.com/yangy09/MiRGOFS

Dataset Details

1. The benchmark dataset of microRNA subcellular localization prediction (dataset.csv)

There are a total of 813 microRNAs covering 6 subcellular locations, i.e., exosome, cytoplasm, mitochondrion, microvesicle, circulating, and nucleus. The binary codes (1/0) denote whether the miRNA has the label shown in the corresponding column header.
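A minimal sketch for reading the 1/0 label matrix, assuming that the first column of dataset.csv holds the miRNA name and every remaining column is one location label. The column name "miRNA" and the two rows below are invented for illustration, not taken from the real file.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def load_localization_labels(handle: TextIO) -> Tuple[List[str], Dict[str, List[str]]]:
    """Return (location names, {miRNA name: locations coded as 1}).

    Assumes column 0 is the miRNA name and columns 1.. are 0/1 labels.
    """
    reader = csv.reader(handle)
    header = next(reader)
    locations = header[1:]
    labels: Dict[str, List[str]] = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        name, codes = row[0], row[1:]
        labels[name] = [loc for loc, code in zip(locations, codes) if code.strip() == "1"]
    return locations, labels


# Toy input in the assumed layout (rows are made up).
sample = io.StringIO(
    "miRNA,exosome,cytoplasm,mitochondrion,microvesicle,circulating,nucleus\n"
    "hsa-miR-toy-1,1,0,0,0,1,0\n"
    "hsa-miR-toy-2,0,0,0,0,0,1\n"
)
locations, labels = load_localization_labels(sample)
print(labels["hsa-miR-toy-1"])  # ['exosome', 'circulating']
```

For the real file, pass `open("dataset.csv", newline="")` instead of the `StringIO` sample. A miRNA may carry several labels at once, so the task is multi-label rather than multi-class.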

2. The similarity scores of miRGOFS (WED)

These are the pairwise similarity scores for 2588 human miRNAs obtained by the miRGOFS_WED method; they measure the functional correlation between miRNAs. The scores are calculated by integrating the semantic similarities of the GO terms annotated to the miRNAs' target genes.

3. The range of the similarity score

In the similarity.csv file, all scores are in the range [1.454, 16.188], and in the normalized.csv file, all scores are scaled into the range [0, 1].

4. The normalization strategy of the similarity score

The normalized pairwise similarity scores for the 2588 human miRNAs are obtained by the following equation:

x_{\mathrm{norm}} = \min\left( S_{i,j} / S_{i,i},\ 1.0 \right),

where S_{i,j} represents the similarity score between the i-th and j-th miRNAs, and S_{i,i} is the self-similarity of the i-th miRNA.
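A minimal sketch of this normalization on an in-memory score matrix, assuming the raw scores are held as a square list of lists with positive diagonal entries (true for the stated raw range [1.454, 16.188]). The 2x2 matrix is a toy example, not real data.

```python
from typing import List


def normalize_scores(S: List[List[float]]) -> List[List[float]]:
    """Apply x_norm = min(S[i][j] / S[i][i], 1.0) to every entry.

    Each row is divided by its own diagonal (self-similarity) entry and
    capped at 1.0, so the result lies in [0, 1] but is not symmetric.
    """
    n = len(S)
    return [[min(S[i][j] / S[i][i], 1.0) for j in range(n)] for i in range(n)]


toy = [[4.0, 2.0],
       [2.0, 1.6]]
print(normalize_scores(toy))  # [[1.0, 0.5], [1.0, 1.0]]
```

The cap matters because an off-diagonal score can exceed a miRNA's own self-similarity (here 2.0 > 1.6), and dividing by the row's diagonal makes the normalized matrix asymmetric even when the raw matrix is symmetric.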

Usage

Input: A human microRNA name starting with 'hsa-miR' or 'hsa-let'.

(Note: we use the names/identifiers from miRBase. If the input microRNA name is not found, suggested aliases will be returned.)

If you have multiple queries, please use a semicolon (';') as the separator, e.g., hsa-miR-132-3p;hsa-miR-134-5p;hsa-miR-136-3p.
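If you prepare query strings programmatically, a client-side pre-check like the sketch below can split a semicolon-separated input and flag names that lack the 'hsa-miR' or 'hsa-let' prefix. The function name `parse_queries` is hypothetical, and it does not reproduce the server's miRBase lookup or alias suggestions.

```python
from typing import List, Tuple

# Prefixes accepted by the query box, per the Usage notes.
VALID_PREFIXES = ("hsa-miR", "hsa-let")


def parse_queries(raw: str) -> Tuple[List[str], List[str]]:
    """Split on ';' and return (well-formed names, rejected names)."""
    valid: List[str] = []
    invalid: List[str] = []
    for part in raw.split(";"):
        name = part.strip()
        if not name:
            continue  # skip empty entries, e.g. from a trailing ';'
        if name.startswith(VALID_PREFIXES):
            valid.append(name)
        else:
            invalid.append(name)
    return valid, invalid


print(parse_queries("hsa-miR-132-3p;hsa-miR-134-5p;hsa-miR-136-3p"))
```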

Output: the 20 nearest neighbors of the query microRNA(s), together with the similarity scores calculated by the miRGOFS method.
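Offline, a similar top-k lookup can be run against the downloaded scores. The sketch below assumes the scores have already been loaded into a nested dictionary `scores[query][other] = similarity`; that layout, the function name, and the toy values are assumptions, and the web server's own ranking and tie-breaking rules are not documented here.

```python
import heapq
from typing import Dict, List, Tuple


def nearest_neighbors(
    query: str,
    scores: Dict[str, Dict[str, float]],
    k: int = 20,
) -> List[Tuple[str, float]]:
    """Return the k highest-scoring miRNAs for `query`, excluding itself."""
    row = scores.get(query, {})
    candidates = ((name, s) for name, s in row.items() if name != query)
    return heapq.nlargest(k, candidates, key=lambda item: item[1])


# Toy scores (made-up names and values).
toy_scores = {"a": {"a": 1.0, "b": 0.3, "c": 0.9, "d": 0.5}}
print(nearest_neighbors("a", toy_scores, k=2))  # [('c', 0.9), ('d', 0.5)]
```

`heapq.nlargest` avoids sorting the full row, which helps when each query has roughly 2588 candidates and only 20 are needed.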

Read Me

MiRGOFS: A GO-based functional similarity measure for miRNAs, with applications to the prediction of miRNA subcellular localization and miRNA-disease association