Benchmark datasets for RBP-binding linear RNAs
Click here (722M) to download the RBPSuite benchmark dataset for RBP-binding linear RNAs. This benchmark dataset covers 154 RBPs, whose binding sites (RNAs) are derived from ENCODE.
The steps for constructing the benchmark dataset for linear RNAs from ENCODE:
We downloaded eCLIP-seq peaks for 154 RBPs in the K562 and HepG2 cell lines from ENCODE, mapped to the human genome assembly hg19. These narrow peaks were produced by the ENCODE eCLIP-seq Processing Pipeline v2.0. To prepare the positive and negative RBP-binding training sets, we performed the following steps:
1) We merged the peak files of each RBP.
2) We selected the regions overlapping reference genes using intersectBed of bedtools.
3) Gene-overlapping regions shorter than 101 bp were extended with upstream and downstream regions of the same length, yielding the positive regions for each RBP.
4) Negative RBP-binding regions were generated with shuffleBed of bedtools; all regions are 101 bp long.
5) The FASTA sequences of the positive and negative regions were retrieved with fastaFromBed of bedtools.
6) For each RBP, we kept at most 60,000 positive sites and 60,000 negative sites. When fewer than 60,000 samples of either kind were extracted, we used all of them.
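Steps 3 and 6 can be illustrated with a minimal Python sketch. It is not the RBPSuite code. The function names `extend_to_target` and `cap_samples` are hypothetical, coordinates are assumed to be 0-based half-open as in BED files, and the sketch assumes that "extended" means padding both sides about equally until the region reaches 101 bp. The text does not say how the 60,000 sites are chosen, so the random subsampling here is also an assumption.

```python
import random

TARGET_LEN = 101    # region length stated in the steps above
MAX_SITES = 60_000  # per-class cap stated in step 6


def extend_to_target(start, end, target=TARGET_LEN):
    """Pad a 0-based half-open interval [start, end) to `target` bp.

    Assumption: padding is split as evenly as possible between the
    upstream and downstream sides. Regions already at least `target`
    bp long are returned unchanged. Chromosome ends are not checked.
    """
    length = end - start
    if length >= target:
        return start, end
    left = (target - length) // 2
    # Clamp at the chromosome start, then fix the length at `target`.
    new_start = max(0, start - left)
    return new_start, new_start + target


def cap_samples(records, max_n=MAX_SITES, seed=0):
    """Keep at most `max_n` records.

    Assumption: the kept records are a reproducible random subsample.
    """
    if len(records) <= max_n:
        return list(records)
    return random.Random(seed).sample(records, max_n)
```

For example, `extend_to_target(100, 150)` pads a 50 bp region with 25 bp upstream and 26 bp downstream, so the result spans exactly 101 bp.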
Benchmark datasets for RBP-binding circular RNAs
Click here to download the RBPSuite benchmark dataset for RBP-binding circular RNAs. This benchmark dataset covers 37 RBPs, whose binding sites are derived from CircInteractome.
The steps for constructing the benchmark dataset for RBP-binding circular RNAs from CircInteractome:
We downloaded the circRNAs bound by 37 RBPs from CircInteractome.
1) The bound sequences are extracted from the circRNA Interactome database (https://circinteractome.nia.nih.gov/), which houses over 120,000 human circRNAs.
2) Sequence segments spanning 50 nt upstream and 50 nt downstream of the binding sites corresponding to the read peaks are extracted, so each sample is a segment of 101 nt.
3) The negative samples are extracted from the remaining fragments of the circRNAs, with the same length as positive samples.
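Steps 2 and 3 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation. The names `positive_window` and `negative_windows` are hypothetical, and the binding site is assumed to be given as a single 0-based position within the circRNA sequence. Sites closer than 50 nt to either end of the linearised sequence are skipped instead of wrapped around the back-splice junction, and negatives are taken as non-overlapping segments that avoid every positive window.

```python
FLANK = 50  # nt taken on each side of the binding site


def positive_window(seq, center, flank=FLANK):
    """Return the (2 * flank + 1)-nt segment centred on `center`.

    Assumption: sites too close to either end of the linearised
    sequence yield None rather than wrapping around the circle.
    """
    start = center - flank
    end = center + flank + 1
    if start < 0 or end > len(seq):
        return None
    return seq[start:end]


def negative_windows(seq, positive_centers, flank=FLANK):
    """Cut non-overlapping segments of the same length as the positives
    from the parts of `seq` that no positive window covers."""
    size = 2 * flank + 1
    # Mark every position covered by a positive window.
    covered = [False] * len(seq)
    for c in positive_centers:
        for i in range(max(0, c - flank), min(len(seq), c + flank + 1)):
            covered[i] = True
    negatives = []
    i = 0
    while i + size <= len(seq):
        if any(covered[i:i + size]):
            i += 1  # slide past positions that touch a positive window
        else:
            negatives.append(seq[i:i + size])
            i += size  # keep negative segments non-overlapping
    return negatives
```

With `FLANK = 50`, every returned segment is 101 nt long, matching the sample length described above.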
Benchmark datasets for RBP-binding linear RNAs in iDeepS
Click here to download the benchmark dataset for RBP-binding RNAs from iDeepS. This benchmark dataset consists of 31 experiments, and their binding sites are derived from DoRiNA (https://dorina.mdc-berlin.de/) and iCount (http://icount.biolab.si/).
Benchmark datasets for RBP-binding circular RNAs in DeCban
Click here to download the benchmark dataset for RBP-binding circRNAs for DeCban (https://pubmed.ncbi.nlm.nih.gov/33552144/)