  • 1 Introduction
  • 2 Input Settings
  • 3 Annotation Summary for Query Regions
    • 3.1 Distribution of query regions across gene features
    • 3.2 Interactive table of genes that overlap query regions
    • 3.3 Distribution of query regions in the genome grouped by gene types
    • 3.4 Coverage profile of query regions at/around Transcription Start/End Sites
    • 3.5 Coverage profile of query regions at Exon - Intron Boundaries
    • 3.6 Coverage profile of query regions across the length of different gene features
  • 4 motifRG analysis results
    • 4.1 Top motifs discovered using motifRG
    • 4.2 motifRG motif discovery statistics
  • 5 GO Term Analysis Results
    • 5.1 Biological Processes
    • 5.2 Molecular Functions
    • 5.3 Cellular Compartments
  • 6 Gene Set Enrichment Analysis Results
  • 7 Acknowledgements
  • 8 Session Information

1 Introduction

RCAS is an automated system that provides dynamic genome annotations for custom input files that contain transcriptomic regions. Such regions could be, for instance, peak regions detected by CLIP-Seq analysis (which detects protein-RNA interactions), RNA modification sites (also known as the epitranscriptome), CAGE-tag locations, or any other collection of target regions at the level of the transcriptome.

RCAS is designed as a reporting tool for the functional analysis of RNA-binding sites detected by high-throughput experiments. It takes as input a BED format file containing the genomic coordinates of the RNA-binding sites and a GTF file containing the genomic annotation features usually provided by publicly available databases such as Ensembl and UCSC. RCAS performs overlap operations between the genomic coordinates of the RNA-binding sites and the genomic annotation features and produces in-depth annotation summaries, such as the distribution of binding sites with respect to gene features (exons, introns, 5’/3’ UTR regions, exon-intron boundaries, promoter regions, and whole transcripts). Moreover, by detecting the collection of targeted transcripts, RCAS can produce functional annotation tables for enriched gene sets (annotated by the Molecular Signatures Database) and GO terms. Because finding sequence motifs is one of the most important questions that arise during protein-RNA interaction analysis, RCAS also has a module for detecting sequence motifs enriched in the targeted regions of the transcriptome. The final report of RCAS consists of high-quality dynamic figures and tables, which are readily applicable for publications or other academic usage.

2 Input Settings

3 Annotation Summary for Query Regions

3.1 Distribution of query regions across gene features

Figure 1 : The number of query regions that overlap each kind of gene feature is counted. The ‘y’ axis denotes the types of gene features included in the analysis and the ‘x’ axis denotes the percentage of query regions (out of the total number of query regions, denoted by ‘n’) that overlap at least one genomic interval hosting the corresponding feature. Note that the percentage values for the different features don’t add up to 100%, because a query region may overlap multiple kinds of features.

[Interactive plot not shown. Axes: features (cds, exons, fiveUTRs, introns, promoters, threeUTRs, transcripts) versus percentage of query regions (0 to 100), n = 10000.]
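
The counting described in the caption of Figure 1 can be illustrated with a minimal Python sketch. This is not RCAS code: the function names, the half-open coordinate convention, the toy intervals, and the decision to ignore strand are all assumptions made for illustration. It only shows why the percentages can sum to more than 100%.

```python
from collections import defaultdict

def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end

def feature_overlap_percentages(queries, features):
    """Percentage of query regions that overlap at least one interval of each feature type.

    queries:  list of (chrom, start, end)
    features: list of (chrom, start, end, feature_type)
    Strand is ignored here (an assumption of this sketch).
    """
    hit_counts = defaultdict(int)
    for q_chrom, q_start, q_end in queries:
        # Collect each feature type at most once per query region.
        types_hit = {
            f_type
            for f_chrom, f_start, f_end, f_type in features
            if f_chrom == q_chrom and overlaps(q_start, q_end, f_start, f_end)
        }
        for f_type in types_hit:
            hit_counts[f_type] += 1
    n = len(queries)
    return {f_type: 100.0 * count / n for f_type, count in hit_counts.items()}

# Hypothetical toy data, not taken from the report.
queries = [("chr1", 100, 150), ("chr1", 400, 450), ("chr2", 10, 60), ("chr2", 500, 550)]
features = [
    ("chr1", 0, 1000, "transcripts"),
    ("chr1", 90, 200, "exons"),
    ("chr1", 200, 500, "introns"),
    ("chr2", 0, 100, "transcripts"),
    ("chr2", 40, 100, "exons"),
]
percentages = feature_overlap_percentages(queries, features)
```

In this toy case three of the four query regions fall inside a transcript, and each of those also overlaps an exon or an intron, so the per-feature percentages sum to 150%. The nested loop is quadratic; a real implementation would use an interval index.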

3.2 Interactive table of genes that overlap query regions

Table 1 : Interactive table of top 100 genes that overlap query regions, grouped by gene features such as introns, exons, UTRs, etc.

     tx_name          transcripts  exons  promoters  fiveUTRs  introns  cds  threeUTRs
 1   ENST00000262854           17     17          0         0        3   16          1
 2   ENST00000342160           17     17          0         0        3   16          1
 3   ENST00000397910           17     17          0         0        0   17          0
 4   ENST00000427052           16     16          0         0        3   15          1
 5   ENST00000328090           15     14          0         2        2   14          0
 6   ENST00000397786           14     12          0         0        2    8          4
 7   ENST00000406875           14     14          0         0        0    2         12
 8   ENST00000429829           14     13          0         0        1    0          0
 9   ENST00000394800           13      9          0         4        4    6          3
10   ENST00000361059           12      8          0         4        5    6          2

Showing 1 to 10 of 100 entries

3.3 Distribution of query regions in the genome grouped by gene types

Figure 2 : The number of query regions that overlap each gene type is counted. The ‘x’ axis denotes the types of genes included in the analysis and the ‘y’ axis denotes the percentage of query regions (out of the total number of query regions, denoted by ‘n’) that overlap at least one genomic interval hosting the corresponding gene type. Query regions that don’t overlap any known gene are classified as ‘Unknown’.

[Interactive plot not shown. Axes: gene types (3prime_overlapping_ncrna, antisense, lincRNA, miRNA, misc_RNA, Mt_rRNA, Mt_tRNA, polymorphic_pseudogene, processed_transcript, protein_coding, pseudogene, rRNA, sense_intronic, sense_overlapping, snoRNA, snRNA) versus percentage of query regions (0 to 80), n = 10000.]

3.4 Coverage profile of query regions at/around Transcription Start/End Sites

Figure 3 : The depth of coverage of query regions at and around Transcription Start/End Sites

[Interactive plot not shown. Two panels: distance (bp) to 5' boundary and distance (bp) to 3' boundary, each from −1000 to 500, versus mean coverage score (0 to 0.015). Series: transcripts 5' end coverage and transcripts 3' end coverage, each with a standard error band (95% conf. int.).]

3.5 Coverage profile of query regions at Exon - Intron Boundaries

Figure 4 : The depth of coverage of query regions at exon - intron junctions

[Interactive plot not shown. Two panels: distance (bp) to 5' boundary and distance (bp) to 3' boundary, each from −1000 to 500, versus mean coverage score (0 to 0.01). Series: internal exons 5' end coverage and internal exons 3' end coverage, each with a standard error band (95% conf. int.).]

3.6 Coverage profile of query regions across the length of different gene features

Figure 5 : The query regions are overlaid with the genomic coordinates of features. Each feature is divided into 100 bins of equal length, and for each bin the number of query regions that cover the bin is counted. Features shorter than 100 bp are excluded. This yields a coverage profile based on the distribution of the query regions. The mean coverage score for each bin is drawn as a ribbon whose thickness indicates the 95% confidence interval (mean ± 1.96 × standard error of the mean). The strandedness of the features is taken into account: the coverage profile is plotted in the 5’ to 3’ direction.

[Interactive plot not shown. Axes: bins (1 to 100) versus meanCoverage (0 to 0.035). Series: transcripts, exons, promoters, fiveUTRs, introns, cds, threeUTRs.]
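
The binning and the confidence band described in the caption of Figure 5 can be sketched in a few lines of Python. This is an illustrative reconstruction, not RCAS code. It assumes that a query region "covers" a bin when the two overlap by at least one base, that coordinates are half-open, and that all intervals lie on one chromosome. The function names and toy intervals are hypothetical.

```python
import math

def bin_coverage(feature_start, feature_end, strand, queries, n_bins=100):
    """Count the query regions overlapping each of n_bins equal-width bins of one feature.

    Returns None for features shorter than n_bins bases (they are excluded).
    For '-' strand features the bins are reversed so index 0 is always the 5' end.
    """
    length = feature_end - feature_start
    if length < n_bins:
        return None
    counts = []
    for i in range(n_bins):
        b_start = feature_start + i * length / n_bins
        b_end = feature_start + (i + 1) * length / n_bins
        counts.append(sum(1 for q_start, q_end in queries
                          if q_start < b_end and b_start < q_end))
    if strand == "-":
        counts.reverse()
    return counts

def mean_and_ci(values):
    """Mean and 95% interval: mean ± 1.96 × standard error of the mean."""
    n = len(values)
    mean = sum(values) / n
    if n < 2:
        return mean, mean, mean
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    half_width = 1.96 * math.sqrt(variance / n)
    return mean, mean - half_width, mean + half_width

# Hypothetical toy data: (start, end, strand) features and (start, end) query regions.
features = [(0, 1000, "+"), (2000, 3000, "-"), (5000, 5050, "+")]
queries = [(0, 100), (2000, 2100)]

profiles = [p for p in (bin_coverage(s, e, strand, queries) for s, e, strand in features)
            if p is not None]
# One (mean, lower, upper) triple per bin, computed across all retained features.
summary = [mean_and_ci(column) for column in zip(*profiles)]
```

The third toy feature is only 50 bp long and is dropped. The query region at the genomic start of the minus-strand feature ends up in the last ten bins, because for that feature the genomic start is the 3’ end.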

4 motifRG analysis results

4.1 Top motifs discovered using motifRG

Figure 6 : Top motifs discovered in the sequences of the query regions

Motif 1 : Consensus: AACTCA

Motif 2 : Consensus: CTTCAT

Motif 3 : Consensus: TCAACA

Motif 4 : Consensus: CACATC

4.2 motifRG motif discovery statistics

Table 2 : motifRG motif discovery statistics. fg: foreground; bg: background; hits: number of motif hits; seq: number of sequences that contain the motif; frac: fraction of all sequences that contain the motif; ratio: ratio of the foreground motif fraction to the background motif fraction.

    patterns  scores  fgHits  bgHits  fgSeq  bgSeq  ratio  fgFrac  bgFrac
1   AACTCA       8.4     734     440    703    425    1.7  0.0703  0.0425
2   CTTCAT       9.4     945     564    878    532    1.7  0.0878  0.0532
3   TCAACA       9.1     707     394    679    378    1.8  0.0679  0.0378
4   CACATC       8.4     581     319    555    308    1.8  0.0555  0.0308

Showing 1 to 4 of 4 entries
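
The frac and ratio columns of Table 2 follow from the seq columns, as the short Python check below illustrates. It is a sketch, not motifRG code. The set sizes of 10000 foreground and 10000 background sequences are not stated in the table; they are an assumption inferred from fgSeq/fgFrac and bgSeq/bgFrac (for example 703 / 0.0703). The function name is hypothetical.

```python
def motif_fractions(fg_seq, bg_seq, n_fg, n_bg):
    """Return (fgFrac, bgFrac, ratio) from the number of sequences containing the motif."""
    fg_frac = fg_seq / n_fg
    bg_frac = bg_seq / n_bg
    return fg_frac, bg_frac, fg_frac / bg_frac

# Assumed set sizes, inferred from the table rather than stated in it.
N_FG = 10000
N_BG = 10000

# (fgSeq, bgSeq) per motif, copied from Table 2.
rows = {
    "AACTCA": (703, 425),
    "CTTCAT": (878, 532),
    "TCAACA": (679, 378),
    "CACATC": (555, 308),
}
stats = {pattern: motif_fractions(fg, bg, N_FG, N_BG) for pattern, (fg, bg) in rows.items()}
```

Rounded to the precision shown in the table, these values reproduce the reported fgFrac, bgFrac and ratio columns.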

5 GO Term Analysis Results

5.1 Biological Processes

Table 3 : Significant Biological Process GO terms (FDR < 0.1) enriched for genes that overlap query regions

            Term                                         Significant  Expected  bonferroni  bh                    foldEnrichment
GO:0000083  regulation of transcription involved in ...           20      6.52  0.00018585  5.88132911392405e-7             3.07
GO:0033962  cytoplasmic mRNA processing body assembl...           15      5.02  0.0172575   0.0000416847826086957           2.99
GO:0006409  tRNA export from nucleus                              23      7.78  0.000057525 1.90480132450331e-7             2.96
GO:0071431  tRNA-containing ribonucleoprotein comple...           23      7.78  0.000057525 1.90480132450331e-7             2.96
GO:0051031  tRNA transport                                        24      8.28  0.0000531   1.78187919463087e-7             2.9
GO:0007077  mitotic nuclear envelope disassembly                  31     10.79  5.31e-7     2.2987012987013e-9              2.87
GO:0030397  membrane disassembly                                  33     11.54  1.6815e-7   7.57432432432432e-10            2.86
GO:0051081  nuclear envelope disassembly                          33     11.54  1.6815e-7   7.57432432432432e-10            2.86
GO:0042762  regulation of sulfur metabolic process                14      5.02  0.13275     0.000274276859504132            2.79
GO:0007064  mitotic sister chromatid cohesion                     16      5.77  0.0402675   0.0000906925675675676           2.77

Showing 1 to 10 of 1,156 entries
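
The foldEnrichment column in the GO tables is consistent with the ratio Significant / Expected, which the Python sketch below checks against three rows of Table 3. The `expected_count` helper reflects the usual definition of an expected count in over-representation tests (term size scaled by the fraction of selected genes). That definition, the function names and the toy numbers are assumptions for illustration; the report itself does not state how Expected is computed.

```python
def expected_count(annotated, n_selected, n_universe):
    """Expected number of selected genes carrying a term if selection were random.

    annotated:  genes in the universe annotated with the term
    n_selected: genes that overlap query regions
    n_universe: all genes considered
    (Usual over-representation definition; an assumption of this sketch.)
    """
    return annotated * n_selected / n_universe

def fold_enrichment(significant, expected):
    """Observed count divided by expected count."""
    return significant / expected

# Hypothetical toy numbers: 50 annotated genes, 2000 of 20000 genes selected.
toy_expected = expected_count(annotated=50, n_selected=2000, n_universe=20000)
toy_fold = fold_enrichment(significant=15, expected=toy_expected)

# (GO ID, Significant, Expected, reported foldEnrichment) copied from Table 3.
table_rows = [
    ("GO:0000083", 20, 6.52, 3.07),
    ("GO:0007077", 31, 10.79, 2.87),
    ("GO:0042762", 14, 5.02, 2.79),
]
recomputed = {go_id: round(fold_enrichment(sig, exp), 2) for go_id, sig, exp, _ in table_rows}
```

For these rows the recomputed values match the reported ones to two decimal places.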

5.2 Molecular Functions

Table 4 : Significant Molecular Function GO terms (FDR < 0.1) enriched for genes that overlap query regions

            Term                                         Significant  Expected  bonferroni   bh                     foldEnrichment
GO:0008536  Ran GTPase binding                                    21      7.3   0.00009216   0.00000164571428571429           2.88
GO:0061650  ubiquitin-like protein conjugating enzym...           19      7.05  0.0018432    0.0000239376623376623            2.7
GO:0061631  ubiquitin conjugating enzyme activity                 18      6.8   0.0050688    0.0000610698795180723            2.65
GO:0005487  nucleocytoplasmic transporter activity                18      7.05  0.010752     0.000118153846153846             2.55
GO:0051721  protein phosphatase 2A binding                        17      6.8   0.028416     0.000281346534653465             2.5
GO:0001098  basal transcription machinery binding                 18      7.3   0.021504     0.000219428571428571             2.47
GO:0001099  basal RNA polymerase II transcription ma...           18      7.3   0.021504     0.000219428571428571             2.47
GO:0004402  histone acetyltransferase activity                    34     13.85  0.000007296  1.58608695652174e-7              2.45
GO:0008139  nuclear localization sequence binding                 16      6.55  0.071424     0.000649309090909091             2.44
GO:0046966  thyroid hormone receptor binding                      16      6.55  0.071424     0.000649309090909091             2.44

Showing 1 to 10 of 237 entries

5.3 Cellular Compartments

Table 5 : Significant Cellular Compartment GO terms (FDR < 0.1) enriched for genes that overlap query regions

            Term                                  Significant  Expected  bonferroni  bh                      foldEnrichment
GO:0010494  cytoplasmic stress granule                     26      8.56  4.14e-7     6.36923076923077e-9               3.04
GO:1904115  axon cytoplasm                                 22      7.82  0.0000828   9.51724137931035e-7               2.81
GO:0000123  histone acetyltransferase complex              54     20.54  6.072e-12   1.44571428571429e-13              2.63
GO:0030137  COPI-coated vesicle                            14      5.38  0.06072     0.000424615384615385              2.6
GO:0031248  protein acetyltransferase complex              57     23.23  9.384e-11   1.99659574468085e-12              2.45
GO:1902493  acetyltransferase complex                      57     23.23  9.384e-11   1.99659574468085e-12              2.45
GO:1902562  H4 histone acetyltransferase complex           24      9.78  0.0009936   0.00000964660194174757            2.45
GO:0070461  SAGA-type complex                              17      7.09  0.049128    0.000361235294117647              2.4
GO:0005643  nuclear pore                                   45     19.31  3.5328e-7   5.52e-9                           2.33
GO:0032838  cell projection cytoplasm                      29     12.47  0.00040848  0.000004255                       2.33

Showing 1 to 10 of 256 entries

6 Gene Set Enrichment Analysis Results

Table 6 : Significant MSigDB Gene Sets (FDR < 0.1) enriched for genes that overlap query regions

                                                                     treatment  expectedInTreatment  BH                      bonferroni            foldEnrichment
REACTOME DOWNREGULATION OF SMAD2 3 SMAD4 TRANSCRIPTIONAL ACTIVITY           18                  3.6  0.0000675458125790122   0.00479575269310986             5
BIOCARTA AKAP95 PATHWAY                                                     11                  2.3  0.00265834997241548     0.47052794511754                4.78
BIOCARTA RANMS PATHWAY                                                       9                  1.9  0.00737558515110043     1                               4.74
REACTOME PURINE RIBONUCLEOSIDE MONOPHOSPHATE BIOSYNTHESIS                    9                  2.1  0.0102497789989389      1                               4.29
REACTOME TRANSCRIPTIONAL ACTIVITY OF SMAD2 SMAD3 SMAD4 HETEROTRIMER         29                  6.8  0.00000174020189360934  0.0000626472681699362           4.26
REACTOME NEP NS2 INTERACTS WITH THE CELLULAR EXPORT MACHINERY               21                  5.1  0.0000837218993098071   0.00644658624685515             4.12
REACTOME TRANSPORT OF RIBONUCLEOPROTEINS INTO THE HOST NUCLEUS              21                  5.1  0.0000837218993098071   0.00644658624685515             4.12
REACTOME INTEGRATION OF PROVIRUS                                             6                  1.5  0.0460881843176006      1                               4
BIOCARTA VDR PATHWAY                                                         9                  2.3  0.0138605343910592      1                               3.91
REACTOME DEADENYLATION OF MRNA                                              14                  3.6  0.00219742767143522     0.364772993458246               3.89

Showing 1 to 10 of 586 entries
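
The BH and bonferroni columns hold p-values adjusted for multiple testing. The Python sketch below implements the textbook Bonferroni and Benjamini-Hochberg adjustments to illustrate how the two columns relate. Treating the report's columns as these standard procedures is an assumption based on the column names. The function names and the toy p-values are hypothetical.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjustment (controls the false discovery rate).

    Each p-value is scaled by m / rank, then a running minimum taken from the
    largest p-value downwards keeps the adjusted values monotone.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values, not taken from the report.
raw = [0.001, 0.02, 0.03, 0.5]
bonf = bonferroni(raw)
bh = benjamini_hochberg(raw)

# Entries that pass the FDR < 0.1 threshold used for the tables.
significant = [i for i, q in enumerate(bh) if q < 0.1]
```

Bonferroni is the more conservative of the two, which is why several gene sets in Table 6 have a bonferroni value of 1 while their BH value is still below 0.1.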

7 Acknowledgements

RCAS is developed in the group of Altuna Akalin (head of the Scientific Bioinformatics Platform) by Bora Uyar (Bioinformatics Scientist), Dilmurat Yusuf (Bioinformatics Scientist) and Ricardo Wurmus (System Administrator) at the Berlin Institute of Medical Systems Biology (BIMSB) at the Max-Delbrueck-Center for Molecular Medicine (MDC) in Berlin.

RCAS is developed as a bioinformatics service as part of the RNA Bioinformatics Center, which is one of the eight centers of the German Network for Bioinformatics Infrastructure (de.NBI).

8 Session Information

## R version 3.3.2 (2016-10-31)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: macOS Sierra 10.12.1
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
##  [1] grid      stats4    parallel  methods   stats     graphics  grDevices
##  [8] utils     datasets  base     
## 
## other attached packages:
##  [1] org.Hs.eg.db_3.4.0                RCAS_1.1.1                       
##  [3] motifRG_1.18.0                    BSgenome.Hsapiens.UCSC.hg19_1.4.0
##  [5] BSgenome_1.42.0                   rtracklayer_1.34.1               
##  [7] GenomicRanges_1.26.1              GenomeInfoDb_1.10.1              
##  [9] seqLogo_1.40.0                    Biostrings_2.42.0                
## [11] XVector_0.14.0                    topGO_2.26.0                     
## [13] SparseM_1.74                      GO.db_3.4.0                      
## [15] AnnotationDbi_1.36.0              IRanges_2.8.1                    
## [17] S4Vectors_0.12.0                  Biobase_2.34.0                   
## [19] graph_1.52.0                      BiocGenerics_0.20.0              
## [21] data.table_1.9.8                  DT_0.2                           
## [23] plotly_4.5.6                      ggplot2_2.2.0                    
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.8                lattice_0.20-34           
##  [3] tidyr_0.6.0                Rsamtools_1.26.1          
##  [5] assertthat_0.1             rprojroot_1.1             
##  [7] digest_0.6.10              gridBase_0.4-7            
##  [9] R6_2.2.0                   plyr_1.8.4                
## [11] backports_1.0.4            RSQLite_1.0.0             
## [13] evaluate_0.10              httr_1.2.1                
## [15] zlibbioc_1.20.0            GenomicFeatures_1.26.0    
## [17] lazyeval_0.2.0             Matrix_1.2-7.1            
## [19] rmarkdown_1.2              BiocParallel_1.8.1        
## [21] readr_1.0.0                stringr_1.1.0             
## [23] htmlwidgets_0.8            RCurl_1.95-4.8            
## [25] biomaRt_2.30.0             munsell_0.4.3             
## [27] base64enc_0.1-3            htmltools_0.3.5           
## [29] SummarizedExperiment_1.4.0 tibble_1.2                
## [31] matrixStats_0.51.0         XML_3.98-1.5              
## [33] viridisLite_0.1.3          dplyr_0.5.0               
## [35] GenomicAlignments_1.10.0   bitops_1.0-6              
## [37] jsonlite_1.1               gtable_0.2.0              
## [39] DBI_0.5-1                  magrittr_1.5              
## [41] scales_0.4.1               KernSmooth_2.23-15        
## [43] stringi_1.1.2              impute_1.48.0             
## [45] reshape2_1.4.2             RColorBrewer_1.1-2        
## [47] tools_3.3.2                seqPattern_1.6.0          
## [49] purrr_0.2.2                yaml_2.1.14               
## [51] plotrix_3.6-3              colorspace_1.3-1          
## [53] genomation_1.6.0           knitr_1.15.1