Differential Methylation Report

Calling Differentially Methylated Cytosines

For each treatment vector specified in the settings file, methylation in the control (i.e. the first entry) is compared with methylation in the treatment (i.e. the second entry) at each covered base across the genome. Logistic regression is then applied to model the log-odds of methylation as a function of treatment, giving for each location a p-value (the probability of observing a difference at least this large by chance at that location) and a q-value (the p-value adjusted for testing many locations across the genome). The resulting values are then tabulated below.

After q-value calculation, differentially methylated bases are extracted based on q-value and percent methylation difference cutoffs; the cutoffs used for this run are listed in the parameter table below. Furthermore, the selected bases are divided into hyper-methylated and hypo-methylated bases. Overdispersion, which occurs when there is more variability in the data than the assumed distribution allows for, is included in the differential methylation calculation. For more details about the calculateDiffMeth() and getMethylDiff() functions see (Akalin et al. 2012), and for details about the logistic regression and overdispersion see (Wreczycka et al. 2017).
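
The selection step described above can be sketched in a few lines. This is an illustrative Python sketch, not the methylKit R implementation: the `Site` record, the function names and the toy values are all hypothetical, and the sign convention (methylation difference taken as treatment minus control, so positive values are hyper-methylated) is an assumption. The cutoffs are the ones listed in the parameter table of this report.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """One tested cytosine (hypothetical record layout)."""
    chrom: str
    pos: int
    qvalue: float
    meth_diff: float  # percent methylation difference, assumed treatment minus control


def select_differential(sites, q_cutoff, min_diff):
    """Keep sites with q-value below q_cutoff and |meth_diff| above min_diff."""
    return [s for s in sites if s.qvalue < q_cutoff and abs(s.meth_diff) > min_diff]


def split_hyper_hypo(sites):
    """Split sites by the sign of the methylation difference."""
    hyper = [s for s in sites if s.meth_diff > 0]
    hypo = [s for s in sites if s.meth_diff < 0]
    return hyper, hypo


# Toy data, invented for illustration only.
sites = [
    Site("chr1", 100, 0.001, 40.0),   # passes both cutoffs, hyper
    Site("chr1", 250, 0.200, 60.0),   # fails the q-value cutoff
    Site("chr2", 300, 0.004, -30.0),  # passes both cutoffs, hypo
    Site("chr2", 900, 0.003, 10.0),   # fails the difference cutoff
]

selected = select_differential(sites, q_cutoff=0.05, min_diff=25.0)
hyper, hypo = split_hyper_hypo(selected)
print(len(selected), len(hyper), len(hypo))
```

Applying both cutoffs before splitting by sign means the hyper- and hypo-methylated sets always partition the selected set, which is what the per-chromosome tables later in this report rely on.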

Input files were taken from the source directory /scratch/AG_Akalin/bosberg/pigx_usecase/in and were compared based on the parameters described in the following table. Each sample is denoted by a sample.id that consists of the original sample ID followed by suffixes denoting operations that were carried out in the course of the pipeline (e.g. sorting, deduplication, alignment using Bowtie 2 (bt2)).

Sample.id                    Treatment  Assembly  Qvalue  Min.meth.difference
WT_se_bt2.sorted.deduped     0          mm10      0.05    25
tet2_se_bt2.sorted.deduped   1          mm10      0.05    25

Output files:

Format                             Location
BEDfile                            [out]/09_differential_methylation/0_1.deduped__diffmeth.bed
RDSfile (diff. methyl. Cs)         [out]/09_differential_methylation/0_1.deduped__diffmeth.RDS
RDSfile_hyper (hyper-methyl. Cs)   [out]/09_differential_methylation/0_1.deduped__diffmethhyper.RDS
RDSfile_hypo (hypo-methyl. Cs)     [out]/09_differential_methylation/0_1.deduped__diffmethhypo.RDS

Export Differentially Methylated Cytosines

We export differentially-methylated CpG sites to a BED file; it can be loaded into a genome browser such as IGV or UCSC to allow for further analysis, annotation and visualisation.
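
As a rough illustration of what such an export involves, the Python sketch below formats single cytosine positions as six-column BED lines. The exact columns written by the pipeline are not stated in this report, so the choice of name and score fields here is an assumption, and `to_bed_line` is a hypothetical helper. The one fixed point is the BED coordinate convention: start is 0-based and end is exclusive, so a 1-based position p becomes the interval [p-1, p).

```python
def to_bed_line(chrom, pos, strand, meth_diff):
    """Format one 1-based cytosine position as a six-column BED line."""
    start = pos - 1  # BED start coordinates are 0-based
    end = pos        # BED end coordinates are exclusive
    name = f"{meth_diff:+.1f}"  # assumption: store the signed difference as the name
    score = min(1000, round(abs(meth_diff) * 10))  # BED scores must lie in 0..1000
    return "\t".join([chrom, str(start), str(end), name, str(score), strand])


# Toy records, invented for illustration: (chrom, 1-based position, strand, meth_diff)
records = [("chr1", 100, "+", 40.0), ("chr2", 300, "-", -32.5)]
bed_lines = [to_bed_line(*r) for r in records]
print("\n".join(bed_lines))
```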

Annotation of Differentially Methylated Cytosines

Format                               Location
Input RDS Data (diff. methyl. Cs)    [out]/09_differential_methylation/0_1.deduped__diffmeth.RDS
Input RDS Data (hyper-methyl. Cs)    [out]/09_differential_methylation/0_1.deduped__diffmethhyper.RDS
Input RDS Data (hypo-methyl. Cs)     [out]/09_differential_methylation/0_1.deduped__diffmethhypo.RDS
Assembly                             mm10

A summary of these findings is presented below:

Differentially Methylated Cytosines per Chromosome

Chromosome Number of diff. meth. cytosines
chr1 2680
chr2 351
chr3 161
chr4 238
chr5 225
chr6 727
chr7 182
chr8 169
chr9 706
chr10 171
chr11 391
chr12 930
chr13 402
chr14 265
chr15 307
chr16 310
chr17 736
chr18 155
chr19 135
chrX 17
chrY 1

Chromosome Number of hypermethylated cytosines
chr1 124
chr2 197
chr3 90
chr4 94
chr5 118
chr6 329
chr7 98
chr8 123
chr9 83
chr10 92
chr11 263
chr12 73
chr13 119
chr14 89
chr15 259
chr16 58
chr17 98
chr18 73
chr19 50
chrX 4
chrY 1

Chromosome Number of hypomethylated cytosines
chr1 2556
chr2 154
chr3 71
chr4 144
chr5 107
chr6 398
chr7 84
chr8 46
chr9 623
chr10 79
chr11 128
chr12 857
chr13 283
chr14 176
chr15 48
chr16 252
chr17 638
chr18 82
chr19 85
chrX 13
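
Because every differentially methylated cytosine is either hyper- or hypo-methylated, the three tables above should agree chromosome by chromosome. The short Python sketch below checks this using the counts copied from the tables; the variable names are illustrative, and a chromosome missing from a table (chrY has no hypomethylated entry) is treated as a count of zero.

```python
# Counts copied from the three per-chromosome tables above.
total = {
    "chr1": 2680, "chr2": 351, "chr3": 161, "chr4": 238, "chr5": 225,
    "chr6": 727, "chr7": 182, "chr8": 169, "chr9": 706, "chr10": 171,
    "chr11": 391, "chr12": 930, "chr13": 402, "chr14": 265, "chr15": 307,
    "chr16": 310, "chr17": 736, "chr18": 155, "chr19": 135, "chrX": 17,
    "chrY": 1,
}
hyper = {
    "chr1": 124, "chr2": 197, "chr3": 90, "chr4": 94, "chr5": 118,
    "chr6": 329, "chr7": 98, "chr8": 123, "chr9": 83, "chr10": 92,
    "chr11": 263, "chr12": 73, "chr13": 119, "chr14": 89, "chr15": 259,
    "chr16": 58, "chr17": 98, "chr18": 73, "chr19": 50, "chrX": 4,
    "chrY": 1,
}
hypo = {
    "chr1": 2556, "chr2": 154, "chr3": 71, "chr4": 144, "chr5": 107,
    "chr6": 398, "chr7": 84, "chr8": 46, "chr9": 623, "chr10": 79,
    "chr11": 128, "chr12": 857, "chr13": 283, "chr14": 176, "chr15": 48,
    "chr16": 252, "chr17": 638, "chr18": 82, "chr19": 85, "chrX": 13,
}

# Chromosomes where hyper + hypo does not add up to the total.
mismatches = [c for c in total if hyper.get(c, 0) + hypo.get(c, 0) != total[c]]
genome_wide = (sum(total.values()), sum(hyper.values()), sum(hypo.values()))

print("mismatching chromosomes:", mismatches)
print("total, hyper, hypo:", genome_wide)
```

For the counts in this report the check passes on every chromosome.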

Distribution of Differential Methylation

Overview of Hyper- and Hypo-Methylated CpGs Over the Genome

Below is a histogram of statistically significant differential CpG methylation, alongside CpG sites without statistically significant differences in methylation. Each group is normalized independently, as the latter are generally far more numerous than the former.
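
To make the independent normalization concrete, the Python sketch below bins two groups of methylation differences over the same bin edges and divides each group's counts by that group's own size, so the two histograms can be compared on one scale even when one group is much larger. The function name, bin edges and values are hypothetical, and whether the report normalizes to fractions (as here) or to densities is an assumption.

```python
def normalized_histogram(values, bin_edges):
    """Return the fraction of values in each bin [edge_i, edge_i+1); the last bin is closed."""
    counts = [0] * (len(bin_edges) - 1)
    last = len(counts) - 1
    for v in values:
        for i in range(len(counts)):
            lo, hi = bin_edges[i], bin_edges[i + 1]
            if lo <= v < hi or (i == last and v == hi):
                counts[i] += 1
                break
    n = sum(counts)
    return [c / n if n else 0.0 for c in counts]


# Toy percent methylation differences, invented for illustration.
edges = [-100, -50, 0, 50, 100]
significant = [-80.0, -60.0, 40.0, 75.0]
background = [-10.0, -5.0, -2.0, 0.0, 1.0, 3.0, 4.0, 8.0, 12.0, 30.0]

sig_hist = normalized_histogram(significant, edges)
bg_hist = normalized_histogram(background, edges)
print(sig_hist)
print(bg_hist)
```

Each returned list sums to 1 regardless of how many sites the group contains, which is what keeps the smaller, significant group visible next to the background.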

Session Information

## R version 3.5.0 (2018-04-23)
## Platform: x86_64-unknown-linux-gnu (64-bit)
## Running under: CentOS Linux 7 (Core)
## 
## Matrix products: default
## BLAS/LAPACK: /gnu/store/ccad09zgj85251ksp5xd71ds3cz3f7gp-openblas-0.2.20/lib/libopenblasp-r0.2.20.so
## 
## locale:
## [1] en_US.UTF-8
## 
## attached base packages:
##  [1] grid      parallel  stats4    stats     graphics  grDevices utils    
##  [8] datasets  methods   base     
## 
## other attached packages:
##  [1] ggbio_1.28.0         AnnotationHub_2.12.0 ggplot2_2.2.1       
##  [4] rtracklayer_1.40.3   genomation_1.12.0    jsonlite_1.5        
##  [7] DT_0.4               methylKit_1.6.1      GenomicRanges_1.32.3
## [10] GenomeInfoDb_1.16.0  IRanges_2.14.10      S4Vectors_0.18.3    
## [13] BiocGenerics_0.26.0 
## 
## loaded via a namespace (and not attached):
##   [1] colorspace_1.3-2              mclust_5.4                   
##   [3] rprojroot_1.3-2               biovizBase_1.28.0            
##   [5] qvalue_2.12.0                 htmlTable_1.12               
##   [7] XVector_0.20.0                base64enc_0.1-3              
##   [9] dichromat_2.0-0               rstudioapi_0.7               
##  [11] bit64_0.9-7                   interactiveDisplayBase_1.18.0
##  [13] AnnotationDbi_1.42.1          splines_3.5.0                
##  [15] R.methodsS3_1.7.1             impute_1.54.0                
##  [17] knitr_1.20                    Formula_1.2-3                
##  [19] seqPattern_1.12.0             Rsamtools_1.32.0             
##  [21] gridBase_0.4-7                cluster_2.0.7-1              
##  [23] R.oo_1.22.0                   graph_1.58.0                 
##  [25] shiny_1.1.0                   readr_1.1.1                  
##  [27] compiler_3.5.0                httr_1.3.1                   
##  [29] backports_1.1.2               assertthat_0.2.0             
##  [31] Matrix_1.2-14                 lazyeval_0.2.1               
##  [33] limma_3.36.1                  later_0.7.3                  
##  [35] prettyunits_1.0.2             acepack_1.4.1                
##  [37] htmltools_0.3.6               tools_3.5.0                  
##  [39] coda_0.19-1                   gtable_0.2.0                 
##  [41] GenomeInfoDbData_0.99.1       reshape2_1.4.3               
##  [43] Rcpp_0.12.17                  bbmle_1.0.20                 
##  [45] Biobase_2.40.0                Biostrings_2.48.0            
##  [47] stringr_1.3.1                 fastseg_1.26.0               
##  [49] mime_0.5                      ensembldb_2.4.1              
##  [51] gtools_3.5.0                  XML_3.98-1.11                
##  [53] zlibbioc_1.26.0               MASS_7.3-50                  
##  [55] scales_0.5.0                  BSgenome_1.48.0              
##  [57] VariantAnnotation_1.26.0      BiocInstaller_1.30.0         
##  [59] ProtGenerics_1.12.0           hms_0.4.2                    
##  [61] promises_1.0.1                RBGL_1.56.0                  
##  [63] SummarizedExperiment_1.10.1   AnnotationFilter_1.4.0       
##  [65] RColorBrewer_1.1-2            curl_3.2                     
##  [67] yaml_2.1.19                   memoise_1.1.0                
##  [69] gridExtra_2.3                 emdbook_1.3.10               
##  [71] biomaRt_2.36.1                rpart_4.1-13                 
##  [73] reshape_0.8.7                 latticeExtra_0.6-28          
##  [75] stringi_1.2.3                 RSQLite_2.1.1                
##  [77] highr_0.7                     plotrix_3.7-2                
##  [79] checkmate_1.8.5               GenomicFeatures_1.32.0       
##  [81] BiocParallel_1.14.1           rlang_0.2.1                  
##  [83] pkgconfig_2.0.1               matrixStats_0.53.1           
##  [85] bitops_1.0-6                  evaluate_0.10.1              
##  [87] lattice_0.20-35               labeling_0.3                 
##  [89] GenomicAlignments_1.16.0      htmlwidgets_1.2              
##  [91] bit_1.1-14                    GGally_1.4.0                 
##  [93] plyr_1.8.4                    magrittr_1.5                 
##  [95] R6_2.2.2                      Hmisc_4.1-1                  
##  [97] DelayedArray_0.6.1            DBI_1.0.0                    
##  [99] pillar_1.2.3                  foreign_0.8-70               
## [101] survival_2.42-3               RCurl_1.95-0.1.2             
## [103] nnet_7.3-12                   tibble_1.4.2                 
## [105] crayon_1.3.4                  KernSmooth_2.23-15           
## [107] OrganismDbi_1.22.0            rmarkdown_1.10               
## [109] progress_1.2.0                data.table_1.11.4            
## [111] blob_1.1.1                    digest_0.6.15                
## [113] xtable_1.8-2                  httpuv_1.4.4.1               
## [115] numDeriv_2016.8-1             R.utils_2.6.0                
## [117] munsell_0.5.0

References

Akalin, Altuna, Matthias Kormaksson, Sheng Li, Francine E. Garrett-Bakelman, Maria E. Figueroa, Ari Melnick, and Christopher E. Mason. 2012. “MethylKit: A Comprehensive R Package for the Analysis of Genome-Wide DNA Methylation Profiles.” Genome Biology 13 (10):R87.

Wreczycka, Katarzyna, Alexander Gosdschan, Dilmurat Yusuf, Bjoern Gruening, Yassen Assenov, and Altuna Akalin. 2017. “Strategies for Analyzing Bisulfite Sequencing Data.” bioRxiv. Cold Spring Harbor Labs Journals. https://doi.org/10.1101/109512.