For each treatment vector specified in the settings file, the ratio of methylation between the control (i.e. the first entry) and the treatment (i.e. the second entry) is calculated across the genome. Logistic regression is then applied to model the log-odds of methylation and to estimate the probability of observing this ratio by chance at any given location (p-value), as well as the corresponding probability after adjusting for the many locations tested across the genome (q-value). The resulting probabilities are then tabulated below. After q-value calculation, differentially methylated bases are extracted based on q-value and percent methylation difference cutoffs. Here, bases are selected using the q-value and minimum percent methylation difference cutoffs listed in the parameter table below. Furthermore, the selected bases are classified as hyper-methylated or hypo-methylated. Overdispersion occurs when there is more variability in the data than assumed by the distribution; it is accounted for here in the differential methylation calculation. For more details about the calculateDiffMeth() and getMethylDiff() functions see (Akalin et al. 2012), and for details about the logistic regression and overdispersion see (Wreczycka et al. 2017).
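The selection step described above can be illustrated with a minimal sketch. This is not the methylKit implementation: the `Site` record, the `select_diff_meth` helper and the toy values are hypothetical, and the sign convention (a positive difference meaning more methylation in the treatment) is an assumption. The cutoffs used in the call are the ones listed in the parameter table of this report.

```python
from typing import NamedTuple


class Site(NamedTuple):
    chrom: str
    pos: int
    qvalue: float
    meth_diff: float  # percent methylation difference (assumed: treatment minus control)


def select_diff_meth(sites, qvalue_cutoff, min_diff):
    """Split sites that pass both cutoffs into hyper- and hypo-methylated lists."""
    hyper, hypo = [], []
    for s in sites:
        # A site is kept only if it is significant AND the difference is large enough.
        if s.qvalue < qvalue_cutoff and abs(s.meth_diff) > min_diff:
            (hyper if s.meth_diff > 0 else hypo).append(s)
    return hyper, hypo


# Toy records (hypothetical, not taken from the pipeline output).
sites = [
    Site("chr1", 1000, 0.001, 40.0),   # significant, large gain
    Site("chr1", 2000, 0.200, 60.0),   # large gain but not significant
    Site("chr2", 3000, 0.010, -30.0),  # significant, large loss
    Site("chr2", 4000, 0.001, 10.0),   # significant but difference too small
]

hyper, hypo = select_diff_meth(sites, qvalue_cutoff=0.05, min_diff=25.0)
print("hyper:", [s.pos for s in hyper])
print("hypo:", [s.pos for s in hypo])
```

Only the first and third toy sites pass both cutoffs, which shows why a base can have a small q-value and still be excluded when its methylation difference is modest.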
Input files were taken from the source directory /scratch/AG_Akalin/bosberg/pigx_usecase/in and were compared based on the parameters described in the following table. Here, each sample is denoted by a sample.id that consists of the original sample ID followed by suffixes denoting operations that were carried out in the course of the pipeline (e.g. sorting, deduplication, alignment using bowtie-2 (bt2), etc.).
Sample.id | Treatment | Assembly | Qvalue | Min.meth.difference |
---|---|---|---|---|
WT_se_bt2.sorted.deduped | 0 | mm10 | 0.05 | 25 |
tet2_se_bt2.sorted.deduped | 1 | mm10 | 0.05 | 25 |
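As a rough illustration of how such a sample.id can be taken apart, the sketch below splits an ID into a leading name and the list of suffixes. The `split_sample_id` helper is hypothetical, and it assumes that the original sample ID contains neither underscores nor dots; IDs that do would need a different rule.

```python
def split_sample_id(sample_id):
    """Return (original_id, suffixes) for an ID such as 'WT_se_bt2.sorted.deduped'.

    Assumption: the original ID is the text before the first underscore, and
    every later underscore- or dot-separated token is a pipeline suffix.
    """
    stem, *dot_suffixes = sample_id.split(".")
    original_id, *underscore_suffixes = stem.split("_")
    return original_id, underscore_suffixes + dot_suffixes


for sid in ("WT_se_bt2.sorted.deduped", "tet2_se_bt2.sorted.deduped"):
    print(split_sample_id(sid))
```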
Output files:
Format | Location |
---|---|
BEDfile | [out]/09_differential_methylation/0_1.deduped__diffmeth.bed |
RDSfile (diff. methyl. Cs) | [out]/09_differential_methylation/0_1.deduped__diffmeth.RDS |
RDSfile_hyper (hyper-methyl. Cs) | [out]/09_differential_methylation/0_1.deduped__diffmethhyper.RDS |
RDSfile_hypo (hypo-methyl. Cs) | [out]/09_differential_methylation/0_1.deduped__diffmethhypo.RDS |
Parameter | Value |
---|---|
Input RDS Data (diff. methyl. Cs) | [out]/09_differential_methylation/0_1.deduped__diffmeth.RDS |
Input RDS Data (hyper-methyl. Cs) | [out]/09_differential_methylation/0_1.deduped__diffmethhyper.RDS |
Input RDS Data (hypo-methyl. Cs) | [out]/09_differential_methylation/0_1.deduped__diffmethhypo.RDS |
Assembly | mm10 |
A summary of these findings is presented below:
Chromosome | Number of diff. meth. cytosines |
---|---|
chr1 | 2680 |
chr2 | 351 |
chr3 | 161 |
chr4 | 238 |
chr5 | 225 |
chr6 | 727 |
chr7 | 182 |
chr8 | 169 |
chr9 | 706 |
chr10 | 171 |
chr11 | 391 |
chr12 | 930 |
chr13 | 402 |
chr14 | 265 |
chr15 | 307 |
chr16 | 310 |
chr17 | 736 |
chr18 | 155 |
chr19 | 135 |
chrX | 17 |
chrY | 1 |
Chromosome | Number of hypermethylated cytosines |
---|---|
chr1 | 124 |
chr2 | 197 |
chr3 | 90 |
chr4 | 94 |
chr5 | 118 |
chr6 | 329 |
chr7 | 98 |
chr8 | 123 |
chr9 | 83 |
chr10 | 92 |
chr11 | 263 |
chr12 | 73 |
chr13 | 119 |
chr14 | 89 |
chr15 | 259 |
chr16 | 58 |
chr17 | 98 |
chr18 | 73 |
chr19 | 50 |
chrX | 4 |
chrY | 1 |
Chromosome | Number of hypomethylated cytosines |
---|---|
chr1 | 2556 |
chr2 | 154 |
chr3 | 71 |
chr4 | 144 |
chr5 | 107 |
chr6 | 398 |
chr7 | 84 |
chr8 | 46 |
chr9 | 623 |
chr10 | 79 |
chr11 | 128 |
chr12 | 857 |
chr13 | 283 |
chr14 | 176 |
chr15 | 48 |
chr16 | 252 |
chr17 | 638 |
chr18 | 82 |
chr19 | 85 |
chrX | 13 |
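The three per-chromosome tables above can be cross-checked against each other, since every differentially methylated cytosine should be either hypermethylated or hypomethylated. The sketch below does this for a few rows copied from the tables; the `inconsistent` helper is hypothetical, and the hypomethylated count for chrY is assumed to be 0 because chrY does not appear in that table.

```python
# (total, hyper, hypo) counts copied from the tables above for a few chromosomes.
counts = {
    "chr1": (2680, 124, 2556),
    "chr2": (351, 197, 154),
    "chrX": (17, 4, 13),
    "chrY": (1, 1, 0),  # assumption: absent from the hypomethylated table means 0
}


def inconsistent(counts):
    """Return chromosomes where hyper + hypo does not equal the total."""
    return [
        chrom
        for chrom, (total, hyper, hypo) in counts.items()
        if hyper + hypo != total
    ]


print(inconsistent(counts))
```

An empty list means the selected rows agree, which is the case for the values shown here.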
### Distribution of differential methylation
Below is a histogram of statistically significant differential CpG methylation, alongside CpG sites without statistically significant differences in methylation. Each group is normalized independently, as the latter are generally far more numerous than the former.
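To make the independent normalization concrete, here is a minimal sketch in which each group is binned separately and divided by its own total, so that both histograms sum to 1 regardless of group size. The `normalized_histogram` helper, the bin edges and the toy values are hypothetical and are not taken from the report's plotting code.

```python
def normalized_histogram(values, bin_edges):
    """Return the fraction of values in each bin.

    Bins are [lo, hi), except the last one, which also includes its upper edge.
    Values outside the overall range are ignored.
    """
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            in_bin = bin_edges[i] <= v < bin_edges[i + 1]
            if in_bin or (is_last and v == bin_edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    # Each group is divided by its own total, not by a shared one.
    return [c / total if total else 0.0 for c in counts]


edges = [-100, -50, 0, 50, 100]  # percent methylation difference
significant = [-80.0, -60.0, 70.0, 30.0]
not_significant = [-5.0, 3.0, 1.0, -2.0, 8.0, 12.0, -40.0, 45.0, 0.5, -0.5]

sig = normalized_histogram(significant, edges)
nonsig = normalized_histogram(not_significant, edges)
print(sig)
print(nonsig)
```

With a shared normalization, the smaller significant group would be visually dwarfed by the larger one; normalizing each group on its own keeps the shapes of the two distributions comparable.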
## R version 3.5.0 (2018-04-23)
## Platform: x86_64-unknown-linux-gnu (64-bit)
## Running under: CentOS Linux 7 (Core)
##
## Matrix products: default
## BLAS/LAPACK: /gnu/store/ccad09zgj85251ksp5xd71ds3cz3f7gp-openblas-0.2.20/lib/libopenblasp-r0.2.20.so
##
## locale:
## [1] en_US.UTF-8
##
## attached base packages:
## [1] grid parallel stats4 stats graphics grDevices utils
## [8] datasets methods base
##
## other attached packages:
## [1] ggbio_1.28.0 AnnotationHub_2.12.0 ggplot2_2.2.1
## [4] rtracklayer_1.40.3 genomation_1.12.0 jsonlite_1.5
## [7] DT_0.4 methylKit_1.6.1 GenomicRanges_1.32.3
## [10] GenomeInfoDb_1.16.0 IRanges_2.14.10 S4Vectors_0.18.3
## [13] BiocGenerics_0.26.0
##
## loaded via a namespace (and not attached):
## [1] colorspace_1.3-2 mclust_5.4
## [3] rprojroot_1.3-2 biovizBase_1.28.0
## [5] qvalue_2.12.0 htmlTable_1.12
## [7] XVector_0.20.0 base64enc_0.1-3
## [9] dichromat_2.0-0 rstudioapi_0.7
## [11] bit64_0.9-7 interactiveDisplayBase_1.18.0
## [13] AnnotationDbi_1.42.1 splines_3.5.0
## [15] R.methodsS3_1.7.1 impute_1.54.0
## [17] knitr_1.20 Formula_1.2-3
## [19] seqPattern_1.12.0 Rsamtools_1.32.0
## [21] gridBase_0.4-7 cluster_2.0.7-1
## [23] R.oo_1.22.0 graph_1.58.0
## [25] shiny_1.1.0 readr_1.1.1
## [27] compiler_3.5.0 httr_1.3.1
## [29] backports_1.1.2 assertthat_0.2.0
## [31] Matrix_1.2-14 lazyeval_0.2.1
## [33] limma_3.36.1 later_0.7.3
## [35] prettyunits_1.0.2 acepack_1.4.1
## [37] htmltools_0.3.6 tools_3.5.0
## [39] coda_0.19-1 gtable_0.2.0
## [41] GenomeInfoDbData_0.99.1 reshape2_1.4.3
## [43] Rcpp_0.12.17 bbmle_1.0.20
## [45] Biobase_2.40.0 Biostrings_2.48.0
## [47] stringr_1.3.1 fastseg_1.26.0
## [49] mime_0.5 ensembldb_2.4.1
## [51] gtools_3.5.0 XML_3.98-1.11
## [53] zlibbioc_1.26.0 MASS_7.3-50
## [55] scales_0.5.0 BSgenome_1.48.0
## [57] VariantAnnotation_1.26.0 BiocInstaller_1.30.0
## [59] ProtGenerics_1.12.0 hms_0.4.2
## [61] promises_1.0.1 RBGL_1.56.0
## [63] SummarizedExperiment_1.10.1 AnnotationFilter_1.4.0
## [65] RColorBrewer_1.1-2 curl_3.2
## [67] yaml_2.1.19 memoise_1.1.0
## [69] gridExtra_2.3 emdbook_1.3.10
## [71] biomaRt_2.36.1 rpart_4.1-13
## [73] reshape_0.8.7 latticeExtra_0.6-28
## [75] stringi_1.2.3 RSQLite_2.1.1
## [77] highr_0.7 plotrix_3.7-2
## [79] checkmate_1.8.5 GenomicFeatures_1.32.0
## [81] BiocParallel_1.14.1 rlang_0.2.1
## [83] pkgconfig_2.0.1 matrixStats_0.53.1
## [85] bitops_1.0-6 evaluate_0.10.1
## [87] lattice_0.20-35 labeling_0.3
## [89] GenomicAlignments_1.16.0 htmlwidgets_1.2
## [91] bit_1.1-14 GGally_1.4.0
## [93] plyr_1.8.4 magrittr_1.5
## [95] R6_2.2.2 Hmisc_4.1-1
## [97] DelayedArray_0.6.1 DBI_1.0.0
## [99] pillar_1.2.3 foreign_0.8-70
## [101] survival_2.42-3 RCurl_1.95-0.1.2
## [103] nnet_7.3-12 tibble_1.4.2
## [105] crayon_1.3.4 KernSmooth_2.23-15
## [107] OrganismDbi_1.22.0 rmarkdown_1.10
## [109] progress_1.2.0 data.table_1.11.4
## [111] blob_1.1.1 digest_0.6.15
## [113] xtable_1.8-2 httpuv_1.4.4.1
## [115] numDeriv_2016.8-1 R.utils_2.6.0
## [117] munsell_0.5.0
Akalin, Altuna, Matthias Kormaksson, Sheng Li, Francine E. Garrett-Bakelman, Maria E. Figueroa, Ari Melnick, and Christopher E. Mason. 2012. “MethylKit: A Comprehensive R Package for the Analysis of Genome-Wide DNA Methylation Profiles.” Genome Biology 13 (10):R87.
Wreczycka, Katarzyna, Alexander Gosdschan, Dilmurat Yusuf, Bjoern Gruening, Yassen Assenov, and Altuna Akalin. 2017. “Strategies for Analyzing Bisulfite Sequencing Data.” bioRxiv. Cold Spring Harbor Labs Journals. https://doi.org/10.1101/109512.