Increased correlation between methylation sites in epigenome-wide replication studies: impact on analysis and results

    Maja Popovic*
    Department of Medical Sciences, University of Turin & CPO Piemonte, Turin, Italy
    *Author for correspondence: Tel.: +39 (0) 116334628; E-mail: maja_popovic@hotmail.com

    Francesca Fasanelli
    Department of Medical Sciences, University of Turin & CPO Piemonte, Turin, Italy

    Valentina Fiano
    Department of Medical Sciences, University of Turin & CPO Piemonte, Turin, Italy

    Annibale Biggeri
    Department of Statistics, Computer Science, Applications «G. Parenti», University of Florence, Florence, Italy

    Lorenzo Richiardi
    Department of Medical Sciences, University of Turin & CPO Piemonte, Turin, Italy

    Published Online: https://doi.org/10.2217/epi-2017-0073

    Aim: To show that an increased correlation between CpGs after selection through an epigenome-wide association study (EWAS) might translate into biased replication results.

    Methods: In the NINFEA cohort data, pairwise correlation coefficients were calculated between CpGs selected in two published EWAS, and the top hits were tested for replication using Bonferroni p-values, the Benjamini–Hochberg (BH) false discovery rate (FDR) and directional FDR r-values. Random permutations of the exposures were performed to show the empirical p-value distributions.

    Results: The average pairwise correlation coefficients between CpGs were higher after selection for replication (e.g., 0.26 among the selected CpGs versus 0.12 at the genome-wide level), which affected the empirical p-value distributions and the usual multiple testing control.

    Conclusion: Bonferroni and BH FDR are inappropriate for the EWAS replication phase, and methods that account for the underlying correlation need to be used.
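The quantities named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names are hypothetical, the use of the absolute Pearson correlation between exposure and methylation as the test statistic is an assumption made for brevity, and directional FDR r-values are not covered. It only shows how an average pairwise correlation, Bonferroni and BH adjustments, and permutation-based empirical p-values might be computed for a small set of selected CpGs.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_pairwise_correlation(cpgs):
    """Average Pearson r over all pairs of CpGs (one sequence per CpG)."""
    pairs = list(combinations(cpgs, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def bonferroni(pvals):
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """BH-adjusted p-values (step-up), returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def permutation_pvalues(exposure, cpgs, n_perm=1000, seed=0):
    """Empirical per-CpG p-values from random permutations of the exposure.

    Assumption: |Pearson r| between exposure and methylation is the
    test statistic; the same shuffled exposure is reused for every CpG,
    so the correlation between CpGs is preserved within each permutation.
    """
    rng = random.Random(seed)
    observed = [abs(pearson(exposure, c)) for c in cpgs]
    exceed = [0] * len(cpgs)
    shuffled = list(exposure)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        for j, c in enumerate(cpgs):
            if abs(pearson(shuffled, c)) >= observed[j]:
                exceed[j] += 1
    # Add-one correction so that no empirical p-value is exactly zero.
    return [(e + 1) / (n_perm + 1) for e in exceed]
```

Neither `bonferroni` nor `benjamini_hochberg` uses the correlation between CpGs. The permutation routine does keep it, because each shuffled exposure is applied to all CpGs at once, and this is the kind of dependence the abstract argues must be accounted for in the replication phase.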
