  • 1 Introduction
  • 2 Input Settings
  • 3 Annotation Summary for Query Regions
    • 3.1 Distribution of query regions across gene features
    • 3.2 Interactive table of genes that overlap query regions
    • 3.3 Distribution of query regions in the genome grouped by gene types
    • 3.4 Coverage profile of query regions at/around Transcription Start/End Sites
    • 3.5 Coverage profile of query regions at Exon - Intron Boundaries
    • 3.6 Coverage profile of query regions across the length of different gene features
  • 4 motifRG analysis results
    • 4.1 Top motifs discovered using motifRG
    • 4.2 motifRG motif discovery statistics
  • 5 GO Term Analysis Results
    • 5.1 Biological Processes
    • 5.2 Molecular Functions
    • 5.3 Cellular Compartments
  • 6 Gene Set Enrichment Analysis Results
  • 7 Acknowledgements
  • 8 Session Information

1 Introduction

RCAS is an automated system that provides dynamic genome annotations for custom input files containing transcriptomic regions. Such regions could be, for instance, peak regions detected by CLIP-Seq analysis of protein-RNA interactions, sites of RNA modifications (also known as the epitranscriptome), CAGE-tag locations, or any other collection of target regions at the level of the transcriptome.

RCAS is designed as a reporting tool for the functional analysis of RNA-binding sites detected by high-throughput experiments. It takes as input a BED file containing the genomic coordinates of the RNA-binding sites and a GTF file containing genomic annotation features, usually obtained from public databases such as Ensembl and UCSC. RCAS performs overlap operations between the coordinates of the binding sites and the annotation features and produces in-depth annotation summaries, such as the distribution of binding sites with respect to gene features (exons, introns, 5’/3’ UTR regions, exon-intron boundaries, promoter regions, and whole transcripts). Moreover, by detecting the collection of targeted transcripts, RCAS can generate functional annotation tables for enriched gene sets (annotated by the Molecular Signatures Database) and GO terms. Because finding the sequence motifs enriched in the targeted regions of the transcriptome is one of the most important questions in protein-RNA interaction analysis, RCAS also has a module for motif discovery. The final RCAS report consists of high-quality dynamic figures and tables that are readily applicable for publications and other academic use.
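RCAS runs in R (see the session information at the end of this report); the Python sketch below is only an illustration of the input handling described above, not RCAS code. It shows one detail that any BED/GTF overlap step has to get right: BED coordinates are 0-based and half-open, whereas GTF coordinates are 1-based and inclusive. The names `Interval`, `parse_bed_line`, `parse_gtf_line` and `overlaps` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Interval:
    """Genomic interval in 0-based, half-open coordinates (the BED convention)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    strand: str = "."


def parse_bed_line(line: str) -> Interval:
    # BED columns: chrom, chromStart (0-based), chromEnd (exclusive), name, score, strand
    fields = line.rstrip("\n").split("\t")
    strand = fields[5] if len(fields) > 5 else "."
    return Interval(fields[0], int(fields[1]), int(fields[2]), strand)


def parse_gtf_line(line: str) -> Tuple[str, Interval]:
    # GTF columns: seqname, source, feature, start, end (1-based, both inclusive),
    # score, strand, frame, attributes
    fields = line.rstrip("\n").split("\t")
    # Shift the 1-based start down by one to obtain a 0-based, half-open interval
    return fields[2], Interval(fields[0], int(fields[3]) - 1, int(fields[4]), fields[6])


def overlaps(a: Interval, b: Interval) -> bool:
    # Two half-open intervals overlap when each one starts before the other ends
    # (strand is ignored in this check)
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end
```

For example, a BED peak `chr1 99 200` covers the 1-based bases 100 to 200, so it overlaps a GTF exon that starts at base 200 but not one that starts at base 201.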

2 Input Settings

3 Annotation Summary for Query Regions

3.1 Distribution of query regions across gene features

Figure 1 : The number of query regions that overlap each kind of gene feature is counted. The ‘y’ axis denotes the types of gene features included in the analysis, and the ‘x’ axis denotes the percentage of query regions (out of the total number of query regions, denoted ‘n’) that overlap at least one genomic interval hosting the corresponding feature. Note that the percentages for the different features do not add up to 100%, because a query region may overlap several kinds of features.

[Plot: y axis, gene features (cds, exons, fiveUTRs, introns, promoters, threeUTRs, transcripts); x axis, percentage of query regions, n = 10000]
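The Figure 1 caption defines each bar as the share of query regions that overlap at least one interval of a feature type, which is why the bars can sum to more than 100%. A minimal Python sketch of that computation, assuming simplified `(chrom, start, end)` tuples in 0-based half-open coordinates and a hypothetical `feature_percentages` helper; it uses a naive quadratic scan, whereas real tools rely on indexed interval overlap (for example, the Bioconductor GenomicRanges package listed in the session information):

```python
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def feature_percentages(queries: List[Region],
                        features: Dict[str, List[Region]]) -> Dict[str, float]:
    """Percentage of query regions that overlap at least one interval of each feature type."""
    n = len(queries)
    percentages = {}
    for name, intervals in features.items():
        # A query is counted once per feature type, however many intervals it hits
        hits = sum(
            1 for qc, qs, qe in queries
            if any(c == qc and s < qe and qs < e for c, s, e in intervals)
        )
        percentages[name] = 100.0 * hits / n
    return percentages
```

Because every feature type is evaluated independently, a query lying in both an exon and a transcript contributes to both bars.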

3.2 Interactive table of genes that overlap query regions

Table 1 : Interactive table of top 100 genes that overlap query regions, grouped by gene features such as introns, exons, UTRs, etc.

    tx_name          transcripts  exons  promoters  fiveUTRs  introns  cds  threeUTRs
1   ENST00000342788  39           0      0          0         39       0    0
2   ENST00000402597  39           0      0          0         39       0    0
3   ENST00000436443  39           0      0          0         39       0    0
4   ENST00000260943  38           0      0          0         38       0    0
5   ENST00000484594  38           0      0          0         38       0    0
6   ENST00000347735  26           0      0          0         26       0    1
7   ENST00000355446  26           0      0          0         26       0    0
8   ENST00000360905  26           0      0          0         26       0    0
9   ENST00000397203  26           0      0          0         26       0    0
10  ENST00000440394  26           0      0          0         26       0    1
Showing 1 to 10 of 100 entries
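Table 1 reports, for each transcript, how many query regions overlap each of its features; the rows shown appear to be ordered by the transcripts column. A minimal Python sketch of that tabulation, assuming the overlap step has already produced hypothetical `(query_id, tx_name, feature)` triples; the helper name, input format and tie-breaking rule are illustrative assumptions, not RCAS internals:

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

FEATURES = ["transcripts", "exons", "promoters", "fiveUTRs", "introns", "cds", "threeUTRs"]


def per_transcript_table(hits: List[Tuple[str, str, str]],
                         top: int = 100) -> List[Tuple[str, Dict[str, int]]]:
    """Count distinct query regions per transcript and feature, keep the `top` transcripts."""
    seen: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for query_id, tx_name, feature in hits:
        # Sets make a query count once even if it is reported twice for the same feature
        seen[tx_name][feature].add(query_id)
    rows = [(tx, {f: len(seen[tx][f]) for f in FEATURES}) for tx in seen]
    # Rank by the transcript-level count (descending); break ties by name for stable output
    rows.sort(key=lambda row: (-row[1]["transcripts"], row[0]))
    return rows[:top]
```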

3.3 Distribution of query regions in the genome grouped by gene types

Figure 2 : The number of query regions that overlap each gene type is counted. The ‘x’ axis denotes the gene types included in the analysis, and the ‘y’ axis denotes the percentage of query regions (out of the total number of query regions, denoted ‘n’) that overlap at least one genomic interval hosting the corresponding gene type. Query regions that do not overlap any known gene are classified as ‘Unknown’.

[Plot: x axis, gene types (antisense, lincRNA, miRNA, misc_RNA, Mt_rRNA, Mt_tRNA, processed_transcript, protein_coding, pseudogene, rRNA, sense_intronic, sense_overlapping, snoRNA, snRNA); y axis, percentage of query regions, n = 10000]
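The Figure 2 caption adds one rule to the feature counting used for Figure 1: a query region that overlaps no known gene falls into an ‘Unknown’ class. A minimal Python sketch of that rule, assuming hypothetical `(region, gene_type)` pairs for the genes and a naive overlap scan:

```python
from collections import Counter
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def gene_type_percentages(queries: List[Region],
                          genes: List[Tuple[Region, str]]) -> Dict[str, float]:
    """Percentage of query regions overlapping at least one gene of each type;
    queries that overlap no gene are counted as 'Unknown'."""
    counts: Counter = Counter()
    for qc, qs, qe in queries:
        # Collect each gene type at most once per query region
        types = {gtype for (c, s, e), gtype in genes if c == qc and s < qe and qs < e}
        for gtype in types or {"Unknown"}:
            counts[gtype] += 1
    n = len(queries)
    return {gtype: 100.0 * k / n for gtype, k in counts.items()}
```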

3.4 Coverage profile of query regions at/around Transcription Start/End Sites

Figure 3 : The depth of coverage of query regions at and around Transcription Start/End Sites

[Plot, two panels: mean coverage score (y axis) versus distance (bp) to the 5' boundary and distance (bp) to the 3' boundary (x axis). Series: transcripts 5' end coverage and transcripts 3' end coverage, each with a standard error ribbon (95% conf. int.).]
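Figure 3 plots the mean coverage of query regions as a function of distance from transcript boundaries. A minimal Python sketch of a strand-aware mean profile around transcript 5' ends, assuming a per-base coverage array for one chromosome; the window sizes `upstream` and `downstream` are hypothetical parameters, and the confidence ribbon is omitted:

```python
from typing import List, Tuple


def boundary_profile(coverage: List[int], boundaries: List[Tuple[int, str]],
                     upstream: int, downstream: int) -> Tuple[List[int], List[float]]:
    """coverage: per-base query coverage along one chromosome (index = 0-based position).
    boundaries: (position, strand) of each transcript's 5' end.
    Returns the offsets (5'->3' relative to the boundary) and the mean coverage at each."""
    offsets = list(range(-upstream, downstream))
    windows = []
    for pos, strand in boundaries:
        # On the minus strand, moving downstream means moving to lower coordinates
        sign = 1 if strand == "+" else -1
        idx = [pos + sign * o for o in offsets]
        if min(idx) < 0 or max(idx) >= len(coverage):
            continue  # skip windows that run off the chromosome
        windows.append([coverage[i] for i in idx])
    if not windows:
        return offsets, []
    means = [sum(column) / len(windows) for column in zip(*windows)]
    return offsets, means
```

The same function applied to transcript 3' ends gives the second panel.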

3.5 Coverage profile of query regions at Exon - Intron Boundaries

Figure 4 : The depth of coverage of query regions at exon - intron junctions

[Plot, two panels: mean coverage score (y axis) versus distance (bp) to the 5' boundary and distance (bp) to the 3' boundary (x axis). Series: Internal Exons 5' end coverage and Internal Exons 3' end coverage, each with a standard error ribbon (95% conf. int.).]
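The Figure 4 legend refers to internal exons. Assuming the usual definition (every exon except the first and last in transcription order), a minimal Python sketch that extracts the 5' and 3' boundary positions of internal exons for one transcript; the helper name and the choice of the first and last exonic base as boundary positions are assumptions:

```python
from typing import List, Tuple


def internal_exon_boundaries(exons: List[Tuple[int, int]],
                             strand: str) -> Tuple[List[int], List[int]]:
    """exons: (start, end) pairs of one transcript, 0-based half-open, in any order.
    Returns (five_prime, three_prime): the first and last exonic base of each
    internal exon, read in the transcript's 5'->3' direction."""
    # Order exons as they are transcribed: descending coordinates on the minus strand
    ordered = sorted(exons, reverse=(strand == "-"))
    internal = ordered[1:-1]  # drop the first and last exon
    if strand == "+":
        five_prime = [start for start, end in internal]
        three_prime = [end - 1 for start, end in internal]
    else:
        five_prime = [end - 1 for start, end in internal]
        three_prime = [start for start, end in internal]
    return five_prime, three_prime
```

Coverage windows around these positions can then be averaged in the same way as for transcript ends.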

3.6 Coverage profile of query regions across the length of different gene features

Figure 5 : The query regions are overlaid with the genomic coordinates of features. Each feature is divided into 100 bins of equal length, and for each bin the number of query regions that cover it is counted; features shorter than 100 bp are excluded. This yields a coverage profile based on the distribution of the query regions. The mean coverage score for each bin is shown as a ribbon whose thickness indicates the 95% confidence interval (mean ± 1.96 × standard error of the mean). The strandedness of the features is taken into account, and the coverage profile is plotted in the 5’ to 3’ direction.

[Plot: meanCoverage (y axis) versus bins (x axis) for transcripts, exons, promoters, fiveUTRs, introns, cds and threeUTRs]
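The Figure 5 caption fully specifies the profile: 100 equal bins per feature, features under 100 bp dropped, bins read 5' to 3', and a 95% interval of mean ± 1.96 × SEM. A minimal Python sketch of those steps, assuming all features and queries lie on one chromosome and are given as 0-based half-open tuples; the helper names are hypothetical:

```python
import math
import statistics
from typing import List, Tuple

Feature = Tuple[int, int, str]  # (start, end, strand), 0-based half-open


def bin_coverage(feature: Feature, queries: List[Tuple[int, int]],
                 n_bins: int = 100) -> List[int]:
    """Number of query regions covering each of n_bins equal-length bins of a feature."""
    start, end, strand = feature
    length = end - start
    counts = []
    for i in range(n_bins):
        b_start = start + (i * length) // n_bins
        b_end = start + ((i + 1) * length) // n_bins
        counts.append(sum(1 for qs, qe in queries if qs < b_end and b_start < qe))
    # Report bins 5'->3': reverse the order for minus-strand features
    return counts[::-1] if strand == "-" else counts


def coverage_profile(features: List[Feature], queries: List[Tuple[int, int]],
                     n_bins: int = 100, min_length: int = 100):
    """Per-bin (mean, lower, upper), with the 95% interval as mean +/- 1.96 * SEM."""
    rows = [bin_coverage(f, queries, n_bins) for f in features if f[1] - f[0] >= min_length]
    profile = []
    for column in zip(*rows):
        mean = statistics.fmean(column)
        sem = statistics.stdev(column) / math.sqrt(len(column)) if len(column) > 1 else 0.0
        profile.append((mean, mean - 1.96 * sem, mean + 1.96 * sem))
    return profile
```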

4 motifRG analysis results

4.1 Top motifs discovered using motifRG

Figure 6 : Top motifs discovered in the sequences of the query regions

Motif 1 : Consensus: ACTAAC

Motif 2 : Consensus: ATTAAC

Motif 3 : Consensus: ACTAAT

Motif 4 : Consensus: TTTAAC

4.2 motifRG motif discovery statistics

Table 2 : motifRG motif discovery statistics. fg: foreground; bg: background; hits: number of motif hits; seq: number of sequences that contain the motif; frac: fraction of all sequences that contain the motif; ratio: ratio of foreground to background motif hits (fgHits / bgHits)

   patterns  scores  fgHits  bgHits  fgSeq  bgSeq  ratio  fgFrac  bgFrac
1  ACTAAC    28.8    1702    144     1620   138    11.8   0.162   0.0138
2  ATTAAC    30.5    1938    158     1811   153    12.3   0.1811  0.0153
3  ACTAAT    26.2    1496    162     1368   156    9.2    0.1368  0.0156
4  TTTAAC    23.3    1210    204     1145   199    5.9    0.1145  0.0199
Showing 1 to 4 of 4 entries
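The derived columns of Table 2 can be recomputed from its counts. Assuming the foreground and background sets each contain 10,000 sequences (matching n = 10000 in the figures), fgFrac and bgFrac equal fgSeq and bgSeq divided by 10,000, and the ratio column is reproduced by fgHits / bgHits. A small Python check, with `TABLE2`, `N_SEQ` and `motif_stats` as hypothetical names:

```python
# Counts transcribed from Table 2: (fgHits, bgHits, fgSeq, bgSeq)
TABLE2 = {
    "ACTAAC": (1702, 144, 1620, 138),
    "ATTAAC": (1938, 158, 1811, 153),
    "ACTAAT": (1496, 162, 1368, 156),
    "TTTAAC": (1210, 204, 1145, 199),
}
N_SEQ = 10000  # assumption: foreground and background sets each hold 10,000 sequences


def motif_stats(fg_hits, bg_hits, fg_seq, bg_seq, n_fg=N_SEQ, n_bg=N_SEQ):
    return {
        "fgFrac": fg_seq / n_fg,    # fraction of foreground sequences with the motif
        "bgFrac": bg_seq / n_bg,    # fraction of background sequences with the motif
        "ratio": fg_hits / bg_hits, # foreground versus background motif hits
    }


for motif, counts in TABLE2.items():
    s = motif_stats(*counts)
    print(f"{motif}: ratio={s['ratio']:.1f} fgFrac={s['fgFrac']:.4f} bgFrac={s['bgFrac']:.4f}")
```

With equal-sized sets, the hit ratio is the same as the ratio of hits per sequence in foreground versus background.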

5 GO Term Analysis Results

5.1 Biological Processes

Table 3 : Significant Biological Process GO terms (FDR < 0.1) enriched for genes that overlap query regions

GO ID       Term                                         Significant  Expected  bonferroni  bh        foldEnrichment
GO:0007064  mitotic sister chromatid cohesion            17           5.14      9.29e-04    4.38e-06  3.31
GO:0031440  regulation of mRNA 3'-end processing         13           4.47      0.235       6.68e-04  2.91
GO:0072698  protein localization to microtubule cyto...  18           6.26      0.0102      3.96e-05  2.88
GO:0044380  protein localization to cytoskeleton         19           6.93      0.0155      5.80e-05  2.74
GO:0032878  regulation of establishment or maintenan...  13           4.92      0.929       0.00219   2.64
GO:0033119  negative regulation of RNA splicing          14           5.37      0.664       0.00163   2.61
GO:0034047  regulation of protein phosphatase type 2...  14           5.37      0.664       0.00163   2.61
GO:0035195  gene silencing by miRNA                      32           12.3      4.38e-05    2.59e-07  2.6
GO:0000729  DNA double-strand break processing           12           4.7       1           0.00527   2.55
GO:0043555  regulation of translation in response to...  12           4.7       1           0.00527   2.55
Showing 1 to 10 of 945 entries
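In Tables 3 to 5, foldEnrichment is consistent with Significant / Expected (for example, 17 / 5.14 ≈ 3.31), and the bonferroni and bh columns are multiple-testing adjustments of per-term p-values. The statistical test behind the raw p-values is not stated in this report, so the Python sketch below covers only the adjustment step, mirroring the "bonferroni" and "BH" methods of R's `p.adjust`; the function names are hypothetical:

```python
from typing import List


def bonferroni(pvalues: List[float]) -> List[float]:
    # Multiply each p-value by the number of tests, capped at 1
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    # p * m / rank, made monotone by a running minimum from the largest p-value down
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def fold_enrichment(significant: int, expected: float) -> float:
    # Observed number of annotated target genes relative to the number expected by chance
    return significant / expected
```

The running minimum is why several terms can share the same bh value even when their raw p-values differ.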

5.2 Molecular Functions

Table 4 : Significant Molecular Function GO terms (FDR < 0.1) enriched for genes that overlap query regions

GO ID       Term                                         Significant  Expected  bonferroni  bh        foldEnrichment
GO:0005487  nucleocytoplasmic transporter activity       18           6.32      0.00207     2.66e-05  2.85
GO:0008601  protein phosphatase type 2A regulator ac...  13           4.74      0.0922      9.22e-04  2.74
GO:0070273  phosphatidylinositol-4-phosphate binding     12           4.74      0.484       0.00384   2.53
GO:0051721  protein phosphatase 2A binding               15           6.09      0.154       0.00141   2.46
GO:0004709  MAP kinase kinase kinase activity            12           4.97      0.852       0.00613   2.41
GO:0003730  mRNA 3'-UTR binding                          28           11.74     6.99e-04    9.57e-06  2.39
GO:0000993  RNA polymerase II core binding               11           4.74      1           0.0134    2.32
GO:0061631  ubiquitin conjugating enzyme activity        14           6.09      0.645       0.00485   2.3
GO:0036002  pre-mRNA binding                             13           5.64      0.952       0.00667   2.3
GO:0003725  double-stranded RNA binding                  32           13.99     4.22e-04    6.03e-06  2.29
Showing 1 to 10 of 215 entries

5.3 Cellular Compartments

Table 5 : Significant Cellular Compartment GO terms (FDR < 0.1) enriched for genes that overlap query regions

GO ID       Term                                         Significant  Expected  bonferroni  bh        foldEnrichment
GO:0010494  cytoplasmic stress granule                   23           7.62      1.55e-05    2.92e-07  3.02
GO:0000159  protein phosphatase type 2A complex          13           4.35      0.0215      2.05e-04  2.99
GO:0005844  polysome                                     21           8.49      0.00662     7.36e-05  2.47
GO:0090568  nuclear transcriptional repressor comple...  16           6.53      0.0828      7.14e-04  2.45
GO:0008287  protein serine/threonine phosphatase com...  25           10.23     0.00127     1.67e-05  2.44
GO:1903293  phosphatase complex                          25           10.23     0.00127     1.67e-05  2.44
GO:0090544  BAF-type complex                             12           5         0.723       0.00520   2.4
GO:0005871  kinesin complex                              28           11.97     9.94e-04    1.36e-05  2.34
GO:1990752  microtubule end                              11           4.79      1           0.0118    2.3
GO:0031519  PcG protein complex                          21           9.36      0.0431      4.02e-04  2.24
Showing 1 to 10 of 204 entries

6 Gene Set Enrichment Analysis Results

Table 6 : Significant MSigDB Gene Sets (FDR < 0.1) enriched for genes that overlap query regions

Gene set                                                              treatment  expectedInTreatment  BH       bonferroni  foldEnrichment
REACTOME G2 M DNA DAMAGE CHECKPOINT                                   7          1.7                  0.0441   1           4.12
BIOCARTA SRCRPTP PATHWAY                                              8          2                    0.0397   1           4
REACTOME E2F ENABLED INHIBITION OF PRE REPLICATION COMPLEX FORMATION  7          1.8                  0.0566   1           3.89
BIOCARTA EIF2 PATHWAY                                                 7          2                    0.0664   1           3.5
BIOCARTA PTC1 PATHWAY                                                 7          2                    0.0664   1           3.5
REACTOME SOS MEDIATED SIGNALLING                                      9          2.6                  0.0409   1           3.46
BIOCARTA G2 PATHWAY                                                   15         4.4                  0.00831  0.598       3.41
BIOCARTA RB PATHWAY                                                   8          2.4                  0.0575   1           3.33
BIOCARTA TEL PATHWAY                                                  11         3.3                  0.0274   1           3.33
PID ATM PATHWAY                                                       21         6.3                  0.00153  0.0528      3.33
Showing 1 to 10 of 320 entries
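Table 6 does not state which test produces the gene set p-values; foldEnrichment is consistent with treatment / expectedInTreatment (for example, 7 / 1.7 ≈ 4.12). A common choice for this kind of over-representation analysis is a one-sided hypergeometric test, sketched below in Python purely as an assumption about the general technique, not as the method RCAS uses. All parameter names are hypothetical:

```python
from math import comb
from typing import Tuple


def hypergeom_enrichment(universe: int, set_size: int, targeted: int,
                         overlap: int) -> Tuple[float, float, float]:
    """universe: number of genes considered; set_size: genes in the gene set;
    targeted: genes that overlap query regions; overlap: targeted genes in the set.
    Returns (expected, fold_enrichment, p_value) for over-representation."""
    # Expected overlap if targeted genes were drawn at random from the universe
    expected = targeted * set_size / universe
    # P(X >= overlap) for X ~ Hypergeometric(universe, set_size, targeted)
    upper = min(set_size, targeted)
    tail = sum(comb(set_size, k) * comb(universe - set_size, targeted - k)
               for k in range(overlap, upper + 1))
    p_value = tail / comb(universe, targeted)
    return expected, overlap / expected, p_value
```

The resulting p-values across all gene sets would then be adjusted with the Bonferroni and BH procedures, as in the BH and bonferroni columns above.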

7 Acknowledgements

RCAS is developed in the group of Altuna Akalin (head of the Scientific Bioinformatics Platform) by Bora Uyar (Bioinformatics Scientist), Dilmurat Yusuf (Bioinformatics Scientist) and Ricardo Wurmus (System Administrator) at the Berlin Institute of Medical Systems Biology (BIMSB) at the Max-Delbrueck-Center for Molecular Medicine (MDC) in Berlin.

RCAS is developed as a bioinformatics service as part of the RNA Bioinformatics Center, which is one of the eight centers of the German Network for Bioinformatics Infrastructure (de.NBI).

8 Session Information

## R version 3.3.2 (2016-10-31)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: macOS Sierra 10.12.1
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
##  [1] grid      stats4    parallel  methods   stats     graphics  grDevices
##  [8] utils     datasets  base     
## 
## other attached packages:
##  [1] org.Hs.eg.db_3.4.0                RCAS_1.1.1                       
##  [3] motifRG_1.18.0                    BSgenome.Hsapiens.UCSC.hg19_1.4.0
##  [5] BSgenome_1.42.0                   rtracklayer_1.34.1               
##  [7] GenomicRanges_1.26.1              GenomeInfoDb_1.10.1              
##  [9] seqLogo_1.40.0                    Biostrings_2.42.0                
## [11] XVector_0.14.0                    topGO_2.26.0                     
## [13] SparseM_1.74                      GO.db_3.4.0                      
## [15] AnnotationDbi_1.36.0              IRanges_2.8.1                    
## [17] S4Vectors_0.12.0                  Biobase_2.34.0                   
## [19] graph_1.52.0                      BiocGenerics_0.20.0              
## [21] data.table_1.9.8                  DT_0.2                           
## [23] plotly_4.5.6                      ggplot2_2.2.0                    
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.8                lattice_0.20-34           
##  [3] tidyr_0.6.0                Rsamtools_1.26.1          
##  [5] assertthat_0.1             rprojroot_1.1             
##  [7] digest_0.6.10              gridBase_0.4-7            
##  [9] R6_2.2.0                   plyr_1.8.4                
## [11] backports_1.0.4            RSQLite_1.0.0             
## [13] evaluate_0.10              httr_1.2.1                
## [15] zlibbioc_1.20.0            GenomicFeatures_1.26.0    
## [17] lazyeval_0.2.0             Matrix_1.2-7.1            
## [19] rmarkdown_1.2              BiocParallel_1.8.1        
## [21] readr_1.0.0                stringr_1.1.0             
## [23] htmlwidgets_0.8            RCurl_1.95-4.8            
## [25] biomaRt_2.30.0             munsell_0.4.3             
## [27] base64enc_0.1-3            htmltools_0.3.5           
## [29] SummarizedExperiment_1.4.0 tibble_1.2                
## [31] matrixStats_0.51.0         XML_3.98-1.5              
## [33] viridisLite_0.1.3          dplyr_0.5.0               
## [35] GenomicAlignments_1.10.0   bitops_1.0-6              
## [37] jsonlite_1.1               gtable_0.2.0              
## [39] DBI_0.5-1                  magrittr_1.5              
## [41] scales_0.4.1               KernSmooth_2.23-15        
## [43] stringi_1.1.2              impute_1.48.0             
## [45] reshape2_1.4.2             RColorBrewer_1.1-2        
## [47] tools_3.3.2                seqPattern_1.6.0          
## [49] purrr_0.2.2                yaml_2.1.14               
## [51] plotrix_3.6-3              colorspace_1.3-1          
## [53] genomation_1.6.0           knitr_1.15.1