Benchmark datasets for RBP-binding linear RNAs
Click here (722M) to download the RBPSuite benchmark dataset for RBP-binding linear RNAs. This benchmark dataset covers 353 RBPs, and their binding sites (RNAs) are derived from the POSTAR3 database.
Steps for constructing the benchmark dataset for linear RNAs:
We downloaded peaks from the POSTAR3 database for 154 RBPs from the K562 and HepG2 cell lines, mapped to the human genome (hg38), and for 199 RBPs from six other species.
For human, these narrow peaks were produced by the eCLIP-seq Processing Pipeline v2.0 of ENCODE; for the other six species, we directly downloaded the processed peaks from the POSTAR3 database. The positive and negative RBP-binding training sets were prepared in the following steps:
1) We merged the peak files of each RBP.
2) We selected the regions that overlap reference genes using intersectBed from bedtools.
3) Gene-overlapping regions shorter than 101 bp were extended with upstream and downstream flanking regions of equal length, giving the positive regions for each RBP.
4) Negative RBP-binding regions were generated with shuffleBed from bedtools; all negative regions are 101 bp long.
5) The FASTA files of the positive and negative regions were retrieved with fastaFromBed from bedtools.
6) For each RBP, we kept only 300,000 positive sites and 300,000 negative sites when more than 300,000 positive or negative samples, respectively, were extracted. Otherwise we used all the extracted samples.
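Steps 3 and 6 above can be illustrated with a minimal Python sketch. This is not the RBPSuite code. The function names extend_to_length and cap_samples are hypothetical, and several details are assumptions: the padding is split evenly around the original region, regions already at least 101 bp long are left unchanged, coordinates are 0-based half-open as in BED files, and the 300,000 retained samples are drawn at random.

```python
import random


def extend_to_length(start, end, length=101):
    """Pad a BED-style interval [start, end) symmetrically up to `length` bp.

    Assumption: intervals already at least `length` bp long are returned
    unchanged, since the text only describes extending shorter regions.
    """
    width = end - start
    if width >= length:
        return start, end
    pad = length - width
    left = pad // 2                 # upstream share of the padding
    new_start = max(0, start - left)  # do not run off the chromosome start
    return new_start, new_start + length


def cap_samples(samples, limit=300_000, seed=0):
    """Keep at most `limit` samples; use all of them if there are fewer.

    Assumption: the retained subset is chosen uniformly at random.
    """
    samples = list(samples)
    if len(samples) > limit:
        return random.Random(seed).sample(samples, limit)
    return samples
```

For example, extend_to_length(100, 151) pads a 51 bp region by 25 bp on each side and yields a 101 bp interval. Chromosome end coordinates are not checked here; a real pipeline would also clip against chromosome sizes.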
Benchmark datasets for RBP-binding circular RNAs
Click here to download the RBPSuite benchmark dataset for RBP-binding circular RNAs. This benchmark dataset covers 37 RBPs, and their binding sites are derived from CircInteractome.
Steps for constructing the benchmark dataset for RBP-binding circular RNAs from CircInteractome:
We downloaded the bound circRNAs associated with 37 RBPs from CircInteractome.
1) The bound sequences were extracted from the circRNA Interactome database (https://circinteractome.nia.nih.gov/), which houses over 120,000 human circRNAs.
2) Sequence segments spanning 50 nt upstream and 50 nt downstream of each binding site (corresponding to the read peaks) were extracted, so each sample is a 101-nt segment.
3) The negative samples were extracted from the remaining fragments of the circRNAs, with the same length as the positive samples.
Benchmark datasets for RBP-binding linear RNAs in iDeepS
Click here to download the benchmark dataset for RBP-binding RNAs from iDeepS. This benchmark dataset consists of 31 experiments, and their binding sites are derived from DoRiNA (https://dorina.mdc-berlin.de/) and iCount (http://icount.biolab.si/), as used in the iONMF paper.
Benchmark datasets for RBP-binding circular RNAs in DeCban
Click here to download the benchmark dataset for RBP-binding circRNAs for DeCban (https://pubmed.ncbi.nlm.nih.gov/33552144/).