Genome-wide DNA methylation in saliva and body size of adolescent girls

Trine B Rounge*,1,2, Christian M Page3,4, Maija Lepistö5, Pekka Ellonen5, Bettina K Andreassen2 & Elisabete Weiderpass1,2,6,7 Genetic Epidemiology Group, Folkhälsan Research Center, Helsinki, Finland Department of Research, Cancer Registry of Norway, Institute of Population-Based Cancer Research, Oslo, Norway Department of Neurology, Institute of Clinical Medicine, University of Oslo, Oslo, Norway Department of Noncommunicable Diseases, Norwegian Institute of Public Health, Oslo, Norway Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland Department of Medical Epidemiology & Biostatistics, Karolinska Institutet, Stockholm, Sweden Department of Community Medicine, Faculty of Health Sciences, University of Tromsø, The Arctic University of Norway, Tromsø, Norway *Author for correspondence: trine.rounge@kreftregisteret.no Research Article

The worldwide prevalence of obesity has more than doubled since 1980, and in 2013 there were 2.1 billion overweight and obese people worldwide [1].Obesity rates have increased in most populations and in all age groups worldwide, and Finland is no exception [2,3].Overweight and obese children are likely to stay obese into adulthood, and they tend to develop conditions such as Type 2 diabetes and cardiovascular diseases more frequently and at a younger age [4].Heritability of obesity has been estimated at about 50-70% [5], and more than 90 genetic loci have been associated with obesity-related traits [6].However, these genetic loci account for only 2.7% of variation in BMI [6].This suggests that nongenetic factors, such as epigenetics, may be important determinants of obesity.DNA methylation (DNAm) varies by genotype, sex, age [7,8], cell type [9] and environmental influences, such as smoking [10].DNAm is more plastic in utero and early life, potentially in response to environ-mental factors [11,12].DNAm is cell type specific and can vary in samples of mixed cell types, such as saliva and blood samples [13,14].However, most sites in the genome have a common methylation profile across cell types [15][16][17].
Associations between epigenetic marks and adiposity have been shown in animal models [18,19] and humans [14,20], although it is not always clear if these epigenetic changes preceded obesity, or vice versa [14].Differential DNAm between lean and heavy individuals in metabolically important genes has also been identified in several different cell types [14], and consistent findings have been shown for altered methylation in the IGF2/H19 imprinted region [21,22].Genomewide studies of peripheral blood cells have shown extensive, but small DNAm alterations in genes related to obesity, immune response, cell differentiation and regulation of transcription [14,[23][24].Tissue-specific differential DNAm has been found in leptin, For reprint orders, please contact: reprints@futuremedicine.com future science group Research Article Rounge, Page, Lepistö, Ellonen, Andreassen & Weiderpass pro-opiomeanocortin, glucose transporter-4 and insulin [20,25,26].A large epigenome-wide association study (EWAS) on whole blood from European cohorts showed obesity-associated DNAm in HIF3A, which was replicated in other studies and tissues [27,28].
In adolescents, five gene regions showed differential DNAm between those that lost weight in an intervention study and those that did not [29].One study reported that DNAm at the RXRA locus at birth accounted for 26% of the variation in obesity at age 9 years [30].Moreover, DNAm in cord blood has been associated with body size and composition in childhood, although few CpG sites were significant after correcting for multiple testing [31].
We performed an EWAS study of saliva DNAm in lean and heavy adolescent girls from the Finnish Health in Teens (Fin-HIT) cohort with the aim to identify differential DNAm and its association with body size.In addition, we aimed to show the usefulness of saliva as a source of DNA in EWAS studies.

Participants
The Fin-HIT study included about 11,000 adolescents aged 9-12 years from schools throughout Finland.Within the framework of the Fin-HIT study, all adolescents received a measuring tape and written detailed instructions, including pictures, on how to measure and report height, weight and waist circumference at home with the assistance of an adult.We used this information to calculate BMI (weight in kg/height in m 2 ).For the present analysis, we randomly selected 100 Fin-HIT cohort members aged 11 years: 50 from the tenth percentile of BMI (<15 kg/m 2 ) and 50 from the 90th percentile of BMI (>21.9 kg/m 2 ), referred to as lean and heavy girls, respectively (Table 1).The Fin-HIT participants reported their menarcheal status in the baseline electronic questionnaires, answered at school on tablets.Of the girls included in this study 60 had reached puberty (menarche = yes), 35 had not (menarche = no) and five did not answer the question (missing).The menarche status was included as a covariable in the model to adjust for weight differences due to puberty.
The Coordinating Ethics Committees of the Hospital Districts of Helsinki and Uusimaa approved the study, and informed consent was obtained from all adolescents and one of their legal guardians.

Procedures
All Fin-HIT cohort members provided saliva samples at baseline using the Oragene DNA Self-Collection Kit (OG-500, DNAgenotek).DNA was extracted from these samples using an automated protocol and chemagic DNA Saliva Kit (PerkinElmer, MA, USA).Three μg of DNA were used as an input in the targeted bisulfite sequencing (TBS) protocol.The SureSelect XT Human Methyl-Seq (Agilent, CA, USA) protocol, including DNA shearing, ligation of methylated adapters, capturing by hybridization, bisulfite conversion using Zymo Research's EZ DNA Methylation-Gold Kit (Zymo, CA, USA), PCR amplification, indexing and sample pooling, was used according to the manufacturer's recommendations.The predesigned target system captured a 84 Mb target and 3.7 million CpG sites.This includes CpG islands (20 Mb), cancer and tissue specific differential methylated regions (10 Mb), Gencode promoters (37 Mb), CpG island    After removal of duplicate sequences, the bisulfite-converted sequence reads were then mapped to the human genome (hg19) using Bismark (version 0.10) [33].This software transformed the sequences into a C-T and G-A version and each of the transformed sequences were aligned to equivalently preconverted reference genome using four parallel instances of the short read aligner Bowtie2 (version 2.0.5) [34].The Bismark methylation extractor and custom formatting scripts were used to calculate β-methylation values for CpG sites.
The first seven bases of each sequence were ignored, as a strong bias toward nonmethylation was caused by the insertion of unmethylated cytosines during end-repair in the sequencing library preparation.CpG sites at which more than 25% of the samples had less than 10× coverage were discarded.Cytosines in a non-CpG context and mitochondrial DNA is only methylated to a small extent, thus hypermethylation can be interpreted as less than optimal efficiency of the of bisulfite conversation reaction.Methylation of non-CpG cytosines and mitochondrial CpG were highly correlated among the samples (r 2 = 0.99) (Supplementary Information 2).We used the average level of mitochondrial DNAm in the sample as a measurement of bisulfite conversation efficiency in the model.Differential DNAm at CpG sites between lean and heavy girls was identified using a logistic regression model adjusting for puberty status and mitochondrial DNAm.The accumulation of the samples at the extreme end of the waist circumference or BMI range makes linear regression unsuitable.Thus, a case-control design with logistic regression was found most appropriate to identify differentially methylated CpG sites.After adjustment for multiple testing with a false discovery rate, no individual CpG site was significant.
To identify differentially methylated regions by combining near-by CpG sites, we used two different approaches: Bumphunter and Fisher's method [35,36].Bumphunter was used to identify differentially methylated regions independent of gene annotation, as implemented in R by Aryee et al. [37].We used the same covariates as in the logistic regression model described above, and set a cut-off for the first screening step at the top 0.1% of signals.The maximum allowed distance between CpG sites within the same regions was set to 1500 bp, and the number of bootstraps to 500.Since this approach is independent of gene annotation, and can give results that are difficult to interpret in a biological setting, we also tested each CpG island for any enrichment of p-values.This was done by aggregating the p-values from all CpG sites within each island, and applying an extension of Fisher's method [36], to summarize the individual CpG-wise p-values in to one p-value for each CpG Island.The correlation between the p-values was assumed to be the same as the correlation between the methylation values, estimated using the Spearman's rank correlation coefficient.The qq-man application in R was used to produce Manhattan plots and QQ-plots [38].
Pathways enriched for differential DNAm were evaluated using the improved gene set enrichment analysis for genome-wide association studies (i-GSEA4GWAS v1.1) [39].Predefined biological pathways and processes, including Biocarta, KEGG, Reactome pathways and gene ontology gene sets were assessed for enrichment.CpG sites up to 10 kb upstream and downstream of a given gene were mapped to the gene and the maximum -log (p-value) of all the CpG sites mapped to that gene was used to represent the gene.i-GSEA4GWAS uses site label permutation.We report the p-value and false discovery rate for the statistically significant gene sets < 0.001 and the associations for each CpG site and island for the outcome variable BMI.

Results
The difference in mean BMI between lean and heavy girls was 9.7 kg/m 2 , and the difference in waist circumference was 21.4 cm.Menarche was reported by 64% of lean girls and 78% of heavy girls (Table 1).
We sequenced 3.1 million CpG sites with the minimum 10× coverage, and 1.2 million of these sites had more than 50× coverage.The mean DNAm on autosomal CpG sites for all samples was 49.2%, and we could not identify a statistically significant difference between the global autosomal DNAm levels between the groups.As expected, most mitochondrial CpGs were hypomethylated.
The genome-wide differential DNAm analyses between lean and heavy girls, adjusted for menarcheal status and bisulfite conversion, resulted in a top list of 100 CpG sites with p-values below 5.28 × 10 -4 (Figure 1), but none that were genome-wide significant.More than 300 CpG had a p-value lower than 10 -3 .The two differential CpG sites (chr12: 78227663; p-value 4.90 × 10 -5 and chr18: 13868825; p-value 9.53 × 10 -05 ) most strongly associated with BMI were located within the NAV3 intronic region and within the CpG island upstream of the obesity-related MC2R gene (Table 2).For both sites a significant hypermethylation in heavy relative to lean girls was observed (Figure 2A & B).The difference in DNAm varied across the groups in at least three closely related sites between 13868800 and 13868800 on chromosome 18 (MC2R region), with position 13868825 (cg27031837) showing the strongest association (Figure 2D).No notable difference in DNAm was observed surrounding chromosome 12 position 78227663 (Figure 2C).The top ten list of strongest associated CpG sites included four sites that were located in or nearby obesity-related genes (MC2R, TMOD1, SLC35D3 and IGFBPL1), and two sites (MICAL2 and NRP2) were located in or nearby genes of interest in obesity research.Four CpG sites over-lapped with targets on the 450K array (cg27031837, cg24371251, cg02203665 and cg01154445).
Significant differences in DNAm were observed in genomic regions and CpG islands associated with BMI.Seven regions, independent of annotation, were identified with the software bumphunter to have a significant aggregation of closely located differential DNAm sites (Table 3).The region with the strongest association with BMI was located in a gene-dense region upstream of the diabetes-related IP6K1 gene.'Bumphunting' also identified a region including 18 CpG sites within the CpG island upstream of the MPL gene to be significantly different methylated in lean and heavy girls.The annotation specific analyses using the extended Fishers method identified five CpG islands to have genome-wide significant differential DNAm associated with BMI (Table 3).Notably, the CpG islands upstream of the IGF2BP1 gene was among the top five hits (Table 3).The TBS allowed a detailed view of correlation between differential DNAm within genomic locations.A consecutive mean difference in DNAm throughout the regions was observed in proximity of NXN, MPL, ABLIM2 and IGF2BD1 (Figure 3C, E Gene set enrichment analysis identified pathways enriched for genes in a 10-kb range of CpG sites with  Epigenome-wide association study of body size in adolescent girls Research Article differential DNAm.Pathways such as metabolism of androgen and estrogen, starch and sucrose, and porphyrin and chlorophyll were enriched with nominally significant CpG sites (Table 4).All these gene sets contained the genes UGT1A1-10.

Discussion
We identified genomic sites, regions and pathways that differed in saliva DNAm between 11-year-old lean and heavy Finnish girls by investigating 3.1 million CpG sites.Many of the sites and regions were colocated with known obesity-related genes, predominantly in the insulin-melanocotin pathway.Considering the limited sample size of 50 girls in each group, we identified a number of noteworthy CpG sites and regions.Colocation of the ten CpG sites most strongly associated with BMI and obesity-related gene regions [40][41][42][43][44][45], as well as the fact that our results replicate previous findings [31], strengthening the probability that the observed associations were true associations.We observed differential DNAm at chr12:78227663 nearby the gene NAV3, which, to our knowledge, has not been previously associated with body size.It is plausible that differential DNAm at chr12:78227663 may affect the expression of SYT1, which is located 500 kb downstream of NAV3.SYT1 is a regulator of exocytosis and endocytosis of insulin-containing vesicles [46] and is associated with eating behavior in pigs [47].
The MC2R CpG island harbored the second strongest signal from a CpG site.The melanocortin system plays an important role in the pathogenesis of obesity and in metabolic syndrome [40].Mutations in the MC4R gene are the most common cause of monogenic obesity in humans, and play a role in regulating energy expenditure and energy intake through the control of satiety.MC3R has a more subtle role in energy homeostasis [48], whereas MC2R is involved in the regulation of steroidogenesis and has recently been indicated to have a direct effect on adipose insulin sensitivity, endocrine function and thermogenesis [40].Our findings support a role for MC2R in key adipose functions and further suggest that all melanocortin receptors play crucial roles in the regulation of energy balance [48].Our data show that DNAm differs in at least three CpG sites within the CpG island upstream of MC2R, thus these entire regions should be investigated further.Most of the genes with the strongest associations (MICAL2, TMOD1, SLC35D3, NRP2, IGFBPL1) have been linked to obesity either directly or as a receptor of an obesity-related gene.IGF has consistently been identified as a differentially methylated region associated with BMI regardless of cell type [21,22], which we have now replicated in saliva.
TBS enables DNAm association analyses of single CpG sites, but also enables assessment of aggregated or correlated methylation within genes and regulatory regions, or regional assessment independent of annotation.With increased statistical power from aggregation of DNAm independently of annotation using 'Bumphunting' the strongest association was found upstream of the insulin-stimulated gene IP6K1 [49].We replicated the findings of Relton et al., which showed associations between core blood MPL differential DNAm and childhood obesity [31], using 'Bumphunting'.Since our results based on saliva DNA replicated associations identified in core-blood DNAm [31], we might speculate that the differential DNAm found in saliva cells may already be present in utero.A detailed look at this region (Figure 3E) showed that lean adolescents had lower DNAm throughout a 200 bp region.A recent study indicated that a variant in the GRK4 gene and high intake of sodium increase the risk of obesity [50].Our CpG island association results, also benefitting from increased statistical power due to aggregation of methylation at CpG islands, indicate that DNAm in the upstream CpG island GRK4 (Figure 3I) might also be of interest.The CpG island nearby IGF2BP1 also demonstrated an association between the insulin pathway and DNAm in saliva.This finding replicates previous associations between IGF2BP1 DNAm and Table 3.The differential DNA methylation regions most strongly associated with BMI identified by Bumphunter or CpG island analyses.Epigenome-wide association study of body size in adolescent girls Research Article body mass [51], as well as a recent study on monozygotic twins, which indicated that the differences in DNAm are due to nonshared environmental factors that influence body weight [52].

Start
Three pathways were enriched for differential DNAm, and previous studies have also shown obesity-related epigenetic changes in some of these pathways [53,54].However, the enrichment in all three pathways was driven mainly by DNAm differences in the genes UGT1A1-10; thus further studies on these genes are warranted.
Several studies have identified associations between BMI and DNAm in the H19/IGF2 region on chromosome 11 [21,22] and more recently in the ABCG1 gene region on chromosome 21 [27,55].None of these gene regions had signals among our top 100 findings, although two signals downstream of H19 (chromosome 11, 2020577 and 2020553) were nominal significant (both with p-value = 0.002).We also identified a signal downstream of the ABCG1 CpG island (p-value = 0.003).Lack of power and difference in tissue type (blood vs saliva) may explain why we did not replicate these findings.
All individuals in this study were girls born in the same year, removing potential confounding by age [8] and sex.11-year-olds are also less likely to have a history of potential DNAm confounders such as dieting, medication use and smoking than older girls and adults [56,57].Menarche was used as an indicator of puberty onset and likely effects DNAm levels [58,59]; hence, menarche status was included in the analytical models.This adjustment might conceal associations with regions such as the CYP19A1 promoter CpG island, which has previously been found to be associated with puberty [58].The study design used the extremes of the BMI curve.This design was chosen to compensate for the relatively small sample size but may mask similarities in DNAm among lean and heavy girls that differ in normal-weight adolescents.Although BMI is only an indicator of body fat in adolescents [60], it is likely that the substantial difference in BMI between the tenth and 90th percentiles indicates a clear distinction in body size between these groups.
Gene expression in saliva and DNAm may be influenced by factors such as diet.However, evaluating the potential role of reverse causation is an issue outside the scope of this study.Varying cell-type composition is a challenge in analyses of the saliva methylome, as it is in analyses of the blood methylome.There is a good reference cell-type composition for the blood methylome.A similar good reference does not exist for the saliva methylome.This, in combination with a lack of knowledge on the CpG sites outside the 450K array target [61], made a reference-free cell-type adjustment approach [62] impossible.A correlation between BMI in adolescents and cell-type composition in saliva is unlikely, thus we believe the results without this adjustment are valid.
Most recent EWAS studies were based on the Infinium human Methylation 450K or the older 27K array.However, our study used a TBS approach, which covered 84 Mb of the genome and enabled the detection of differential DNAm with high genomic resolution.While the 450K covers about approximately 1.7%, the TBS covered about 11.1% of the CpG sites in the human genome.Considering that most of the identified regions of interest presented here are not covered by 450K, this difference in resolution is important.It will be of great clinical interest to confirm the saliva DNAm associations to body size and the directionality of these findings.Further investigations of the epigenetic mechanisms of body mass regulation for each of the associations are warranted.

Conclusion & future perspective
In summary, the saliva methylome EWAS of 50 lean and 50 heavy Finnish girls identified differentially methylated CpG sites.The two strongest signals were located near NAV3 and MC2R.The substantial overlap with obesity-and insulin-related genes, including MC2R, IGFBPL1, IP6K1 and IGF2BP1, and the replication of MPL, indicate true associations between the saliva methylome and BMI.Saliva is a readily available source of DNA that can shed light on the etiology of DNAm and obesity.In the future, saliva DNAm could

Figure 1 .
Figure 1.Manhattan plot of CpG sites with differential DNA methylation in lean and heavy adolescents.

Figure 2 .
Figure 2. Detailed overview of DNA methylation of the two associated CpG sites with the strongest association with BMI and the surrounding regions.A boxplot of DNAm values for the lean and heavy group for: (A) position chr12:78227663 and (B) chr18:13868825.The targeted bisulfite sequencing also captures DNAm of the surrounding regions, here showing difference in mean DNAm in the lean group (dashed line) and heavy group (solid line) for: (C) chr12:78227586-78227757 and (D) chr18:13868730-13868997. chr: Chromosome; DNAm: DNA methylation.

Figure 3 .
Figure 3.The mean CpG DNA methylation in lean (dashed line) and heavy (solid line) throughout the five regions with the strongest associations with BMI identified by Bumphunter and CpGi analyses.The difference in DNA methylation means are due to a consecutive difference in the same direction (C), (E), (F) and (J), differences with intermediate non-different CpGs in (D) and (I), DNA methylation difference in a few CpGs illustrated by (B), or alternating methylation levels illustrated by (A), (G) and (H).chr: Chromosome; DNAm: DNA methylation.

Table 1 . Overview of the characteristics of the 100 11-year-old girls from the Finnish Health in Teens study and corresponding saliva samples. Phenotype data Mean (min/max) Lean mean (n = 50) (min/max) Heavy (n = 50) (min/max) p-value
Max: Maximum; Mill: Million; Min: Minimum.

Table 2 .
The ten DNA methylation sites most strongly associated with BMI.

Table 4 .
Gene set enrichment analyses of CpG sites and islands associated to BMI and waist circumference.Distance between gene and CpG site or island were 10 kb.FDR: False discovery rate.