Abstract
Aim: To show that an increased correlation between CpGs after selection through an epigenome-wide association study (EWAS) might translate into biased replication results. Methods: Pairwise correlation coefficients between CpGs selected in two published EWAS were calculated in the NINFEA cohort data, the top hits were tested for replication, and Bonferroni-adjusted p-values, Benjamini–Hochberg (BH) false discovery rate (FDR) values and directional FDR r-values were computed. Random permutations of the exposures were performed to show the empirical p-value distributions. Results: The average pairwise correlation coefficient between CpGs increased after selection for replication (e.g., from 0.12 at the genome-wide level to 0.26 among the selected CpGs), affecting the empirical p-value distributions and the usual multiple testing control. Conclusion: Bonferroni and Benjamini–Hochberg FDR corrections are inappropriate for the EWAS replication phase, and methods that account for the underlying correlation need to be used.
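The quantities named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code (the study was run in R), and the function names, the use of plain Pearson correlation, and the choice of a max-statistic permutation adjustment as one example of a correlation-aware method are all assumptions made for illustration. `X` is assumed to be a samples-by-CpGs matrix of methylation values and `exposure` a numeric vector with one value per sample.

```python
import numpy as np


def mean_pairwise_correlation(X):
    """Mean of the off-diagonal Pearson correlations between columns (CpGs)."""
    r = np.corrcoef(X, rowvar=False)
    upper = np.triu_indices_from(r, k=1)  # each pair counted once
    return r[upper].mean()


def bonferroni_adjust(p):
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    p = np.asarray(p, dtype=float)
    return np.minimum(p * p.size, 1.0)


def bh_adjust(p):
    """Benjamini-Hochberg adjusted p-values (step-up procedure)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity from the largest p-value downwards
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


def maxt_adjusted_pvalues(X, exposure, n_perm=1000, seed=0):
    """Permutation-based adjustment using the maximum |correlation|.

    Assumption: this single-step max-statistic scheme stands in for
    "a method that accounts for the underlying correlation"; the exposure
    is permuted while the CpG matrix (and so its correlation structure)
    is left intact.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    exposure = np.asarray(exposure, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)

    def abs_corr(e):
        ez = (e - e.mean()) / e.std()
        return np.abs(Xz.T @ ez) / e.size

    observed = abs_corr(exposure)
    max_null = np.array(
        [abs_corr(rng.permutation(exposure)).max() for _ in range(n_perm)]
    )
    exceed = (max_null[None, :] >= observed[:, None]).sum(axis=1)
    return (1 + exceed) / (n_perm + 1)
```

Because the permutation keeps the CpG columns together, the null distribution of the maximum statistic reflects whatever correlation the selected CpGs carry, which Bonferroni ignores entirely and the BH procedure accommodates only under specific forms of dependence.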

