Evidence for DNA methylation mediating genetic liability to non-syndromic cleft lip/palate

Aim: To determine if nonsyndromic cleft lip with or without cleft palate (nsCL/P) genetic risk variants influence liability to nsCL/P through gene regulation pathways, such as those involving DNA methylation. Materials & methods: nsCL/P genetic summary data and methylation data from four studies were used in conjunction with Mendelian randomization and joint likelihood mapping to investigate potential mediation of nsCL/P genetic variants. Results & conclusion: Evidence was found at VAX1 (10q25.3), LOC146880 (17q23.3) and NTN1 (17p13.1), that liability to nsCL/P and variation in DNA methylation might be driven by the same genetic variant, suggesting that genetic variation at these loci may increase liability to nsCL/P by influencing DNA methylation. Follow-up analyses using different tissues and gene expression data provided further insight into possible biological mechanisms.

the (potentially false) assumption that methylation signatures in these tissues might reflect those of the developing orofacial tissues more accurately.

Data sources nsCL/P genetic risk variants
We identified single nucleotide polymorphisms (SNPs) associated with nsCL/P in European populations by conducting a meta-analysis of summary statistics from two nsCL/P GWAS. Summary statistics for the first GWAS came from a case-control study of 399 cases and 1,318 controls of central European descent [8]. For the second GWAS, we generated summary statistics by conducting a GWAS using individual level data from 638 parent-offspring trios and 178 parent-offspring duos of European descent from the International Consortium to Identify Genes and Interactions Controlling Oral Clefts (ICC). These data were available to download from dbGaP (Study accession phs000094.v1.p1) [26]. Full GWAS methods are described in the Supplementary Material, but briefly, we performed a transmission disequilibrium test [27] on the pedigree data. We then performed a fixedeffects inverse-variance-weighted meta-analysis of the summary statistics from both GWAS using METAL [28] on the total sample of 1215 cases and 2772 controls. The results compared well with those previously published using a very similar dataset but slightly different quality control and analysis methods [7]. We used LiftOver (genome.sph.umich.edu/wiki/LiftOver) to convert the genome positions in the nsCL/P summary statistics to the most recent genome build 37. Finally, we used PLINK [29] and data from the ALSPAC as a reference panel to clump the results according to LD (r 2 < 0.001), within a 250 kb region around each index variant, and generate a set of independent SNPs for the pipeline. ALSPAC To identify mQTL (SNPs associated with DNA methylation), we used data from ALSPAC [30,31]. In addition to collecting detailed questionnaire and clinic data for the whole cohort, the study has generated genome-wide DNA methylation and genotype data for subsets of the cohort [32] (methods described in the Supplementary Material). These data have previously been used to generate a database of mQTL [22]. The database contains summary statistics for all mQTL with a p-value < 1 × 10 -7 for the association between SNP and CpG. For the purposes of this study, we focused on the mQTL identified in cord blood samples collected at birth (the closest available time point to the orofacial developmental period). For part of our study (the reverse two sample MR), we required specific CpG-SNP associations that were unavailable from mQTLdb.org. Therefore, for required CpGs, we replicated the methods in the original study: we excluded individuals with missing genotype or covariate data, leaving 787 children. We then rank-normalized the methylation data to remove outliers and controlled for covariates, potential batch effects and the influence of cell heterogeneity by regressing data points on sex, the first 10 ancestry principal components, bisulfite-converted DNA batch and blood cell proportions estimated using the Houseman method [33,34]. We then calculated residuals, which were used as the outcome variable in a linear regression model in PLINK [29] to calculate the relevant CpG-SNP associations.

Methylation genetic risk variants (mQTL)
Finally, we excluded any mQTL acting in trans (i.e., any SNP associated with a CpG site more than 1 M base pairs away) and excluded any CpGs that have been flagged as potentially problematic (e.g., cross-hybridizing probes) according to a previous publication [35]. GOYA We attempted to replicate mQTL from ALSPAC using genotype and cord blood DNA methylation data from the Genetics of Overweight Young Adults (GOYA) cohort, which is a subset of the Danish National Birth Cohort (DNBC) [36]. Genotype and cord blood DNA methylation data were available for 1000 children. We replicated the methods described above for ALSPAC by first excluding individuals with missing genotype or covariate data, leaving 889 children and also removing SNPs with missingness (>5%) using PLINK. As in ALSPAC, we rank-normalised the methylation data to remove outliers and adjusted for covariates, potential batch effects and the influence of cell heterogeneity by regressing data points on sex, the first ten ancestry principal components, DNA batch and blood cell proportions estimated using the Houseman method [33,34]. Residuals were then used as the outcome variable in a linear regression model in PLINK to calculate the relevant CpG-SNP associations.
future science group www.futuremedicine.com

Expression quantitative trait loci as genetic risk variants
Genotype-Tissue Expression project To identify expression quantitative trait loci (eQTL; SNPs associated with gene expression), we used the Genotype-Tissue Expression project (GTex) data, which is a database of eQTL generated using genotype and RNA sequencing gene expression data for 43 distinct tissue types from 175 individuals [37,38].

NESDA NTR conditional eQTL catalog
To explore the consistency of our findings, we also identified eQTL using a second database: the NESDA NTR Conditional eQTL Catalog. For this database, eQTL were identified using genotype and gene expression microarray data from blood samples from 4896 individuals across two Dutch biobanks. Conditional eQTL analysis was applied to distinguish between dependent and independent eQTL [39].

DNA methylation in children with orofacial clefts
Brazilian cohort If genetic liability to nsCL/P is associated with DNA methylation at certain CpGs, we would expect these CpGs to be differentially methylated between nsCL/P cases and controls. To assess whether this was the case at nsCL/Passociated CpGs (identified through MR), we performed a look-up of results from a recently-published case-control EWAS [16]. This EWAS compared blood DNA methylation profiles in 67 nonfamilial, nsCL/P cases and 59 ageand sex-matched controls from a Brazilian population. The average age at sampling was 5.29 years for cases and 6.45 years for controls. DNA methylation was measured using the Illumina Infinium HumanMethylation450 BeadChip platform.

The Cleft Collective
To explore whether methylation at nsCL/P-associated CpGs differs by cleft subtype, we compared mean methylation values in blood and matched lip/palate tissue samples from 150 children from the UK enrolled in the Cleft Collective birth cohort study. Methylation data were generated for a separate study, as previously described [3]. Briefly, a sample of 150 believed to be nonsyndromic children was randomly selected and stratified by cleft subtype: 50 with CLO, 50 with CPO and 50 with CLP. These children have been classified as nonsyndromic because they have not been diagnosed as having any other anomaly, however, since the children are still very young, we cannot be completely sure of their nonsyndromic status. Blood and either lip or palate tissue samples were available for each of the 150 children in this study. The orofacial tissue type was dependent on the orofacial cleft subtype; therefore, lip samples were available for children with CLO and palate samples for children with CPO. Of the 50 children with CLP, 43 contributed a lip sample and seven contributed a palate sample. Genome-wide DNA methylation was measured using the Illumina Infinium HumanMethylation450 BeadChip platform and functional normalization was performed on the blood and tissue samples together. Of the original 300 samples, three blood and two lip samples failed quality control. Surrogate variables were generated using the sva package in R to capture variation in the methylation data associated with technical batch and cellular heterogeneity [40].

Analysis pipeline
Testing for mediation: MR of the effect of methylation on liability to nsCL/P nsCL/P meta-GWAS summary statistics for 543,150 SNPs were LD-pruned (r 2 < 0.001) to 17,090 independent SNPs. These independent SNPs were then merged with 127,215 mQTL from the ALSPAC mQTL database. After removing potentially problematic CpGs and CpGs acting in trans (which may increase the likelihood of horizontal pleiotropy), there were 7091 independent CpG-SNP pairings for 6425 distinct CpGs. We then used the MR-base R package [41] to perform two-sample MR on all CpGs, using mQTL as the exposure variables and nsCL/P as the outcome. In initial analysis, CpGs with one mQTL were tested using the Wald test and CpGs with more than one mQTL were tested using the Inverse variance weighted (IVW) method. To account for possible residual LD between mQTL, CpGs with more than one mQTL, were retested adjusting for LD between the SNPs using a likelihood-based method [42]. Pair-wise SNP LD was computed using the Caucasian European (CEU) and British (GBR) populations in LDlink [43]. As a sensitivity analysis, we attempted to replicate the SNP-CpG associations with a Bonferroni-corrected MR p-value < 0.05 in GOYA, an independent sample, to further evaluate the evidence that the SNP-CpG associations of interest are genuine.
Testing for reverse causation: MR of the effect of genetic liability to nsCL/P on methylation The MR analysis of methylation on nsCL/P used a single genetic instrument for the majority of CpGs, so it is difficult to discern the direction of effect. Therefore, we used MR-base to conduct the reverse two-sample MR to assess possibility B), reverse causation, that liability to nsCL/P causes downstream effects in methylation. Six genome-wide significant nsCL/P SNPs in Europeans [7] were used as the exposure and mQTL from ALSPAC were used as the outcome. The IVW method was used as the primary analysis.
Testing for linkage: joint-likelihood mapping to assess colocalization We used the Joint likelihood mapping (JLIM) package in R ([jlim.R] [44] to test possibility C), liability to nsCL/P and methylation are driven by the same causal effect in each region of interest, in other words, the mQTL and nsCL/P risk variant are colocalized rather than simply being in LD. To distinguish between separate causal variants, we set the limit of genetic resolution in terms of r 2 to 0.8. 1000 Genomes [45] CEU data were used as the reference dataset for LD. Most CpGs were associated with only one independent mQTL, so we were not able to distinguish mediation/vertical pleiotropy ( Figure 1A) from horizontal pleiotropy ( Figure 1D).

Comparison to gene expression
The previous steps identified CpGs that potentially mediate the effect of genetic variation on susceptibility to nsCL/P. Further evidence for a functional effect would be provided if mQTL also affected gene expression. Therefore, we looked up relevant SNPs in two eQTL databases (GTEx [38] and NESDA NTR Conditional eQTL Catalog [39]) and noted the estimated effect size and p-values for eQTL in various tissues.

Comparison to EWAS results
At identified CpGs, we looked-up the mean methylation values in nsCL/P cases and controls from the Brazilian EWAS study to determine if CpGs of interest from the MR were differentially methylated between nsCL/P cases and controls. Evidence for differential methylation would further support an etiological role in nsCL/P. We compared the direction of estimated effect and p-values obtained using the observational EWAS and MR approaches.
Tissue and cleft-subtype-specific variation At identified CpGs, we used data from the Cleft Collective to explore whether methylation in blood was correlated with methylation at the site of the cleft (lip/palate), and whether mean methylation varied according to cleft subtype (CLO, CPO or CLP). One-way ANOVA was used to compare the mean methylation of subtypes, adjusting for sex and surrogate variables designed to capture technical batch and cell composition effects.

Results
Testing for mediation: MR of the effect of methylation on liability to nsCL/P To identify CpGs where methylation may mediate genetic liability to nsCL/P, we used two sample MR with mQTL from the ALSPAC data as the exposure and liability to nsCL/P as the outcome (from the nsCL/P GWAS metaanalysis summary statistics). We found evidence for an effect of methylation on liability to nsCL/P at 26 CpGs after a multiple testing correction for the number of different CpGs tested (Bonferroni-corrected p-value < 0.05, corresponding to an uncorrected p-value < 7.8 × 10 -6 ). Of these 26 CpGs, 20 were instrumented by single mQTL and six were instrumented by two mQTL each. When the six CpGs with two mQTL each were re-tested, taking into account LD between the SNPs, only one (cg02598441 at LOC146880) survived correction for multiple testing. These 21 mQTL were therefore taken forward to the reverse-causation step.
As a sensitivity analysis, we investigated all 21 of the ALSPAC mQTL in data from the GOYA cohort. 17 of the 21 CpG-SNP pairings passed quality control and were present in the GOYA data, of which 16 replicated in the same direction with p < 0.05 (Supplementary Table 1).
Testing for reverse causation: MR of the effect of genetic liability to nsCL/P on methylation Next, we tested if the association between the mQTL and liability to nsCL/P arose because genetic liability to nsCL/P influences variation in methylation, by using two sample MR with genetic liability to nsCL/P as the exposure and methylation as the outcome. We found no strong evidence that genetic liability to nsCL/P influences variation in methylation at the 21 CpGs (Table 1). However, it should be noted that this step is very likely to be limited by statistical power.
future science group www.futuremedicine.com   Testing for linkage: joint-likelihood mapping to assess colocalization We used JLIM, a colocalization method, to assess if there was evidence that methylation and liability to nsCL/P are driven by the same causal effect at each locus. Of the 20 CpGs instrumented by one mQTL each, we found evidence for colocalization at four CpGs (cg11398452, cg01862363, cg02481697 and cg16107528). With the addition of the CpG site associated with two mQTL (cg02598441), we found strongest evidence that methylation at five CpGs are putative mediators of genetic liability to nsCL/P at four different SNPs (Table 1).
Of these four SNPs, three were available and tested in the imputed GOYA data (rs807647, rs1808191 and rs4752028). Two of the SNPs (intergenic rs8076457 and rs1808191 near PLEKHM1P1) consistently replicated as mQTL in GOYA, with the third SNP rs4752028 replicating as an mQTL for two out of three CpG sites but not the CpG-SNP pairing with evidence of colocalization (Supplementary Table 1).

Comparison between methylation & gene expression
To assess further evidence for a functional effect of our identified mQTL on gene expression, we performed a look-up of the four identified SNPs in the GTex and NESDA NTR Conditional eQTL databases. We found strong evidence that rs4752028 at SHTN1 is an eQTL for SHTN1 in whole blood (Table 2). There was also strong evidence that both rs1808191 at PLEKHM1P1 and rs1991401 at CEP95/DDX5 are eQTL for six nearby genes, including CEP95 (whole blood) and DDX5 (whole blood), which were identified through both databases ( Table 2). There was no available evidence in either catalogue that intergenic SNP rs8076457 is associated with gene expression (Table 2).

Comparison to EWAS results
To provide further evidence for an association between methylation and nsCL/P at our identified CpGs, we compared the MR results (relating to liability to nsCL/P in the general population) to results from an nsCL/P case-control EWAS study. At cg02598441 (LOC146880), the direction of effect estimated in our first (forward) MR analysis was concordant with that in the EWAS study, with an EWAS p-value (2.4 × 10 -3 ) that survived Bonferroni correction for five tests ( Table 3). The direction of estimated effect was also concordant between studies at the three CpGs at NTN1, but the smallest EWAS p-value was 0.12. At cg11398452 (VAX1), the direction of estimated effect was discordant between our MR analysis and the Brazilian EWAS, with a small EWAS p-value (9.4 × 10 -3 ; Table 3).
future science group www.futuremedicine.com

Tissue & cleft-subtype-specific variation
We assessed correlations in methylation at our identified CpGs between different tissues (blood and lip/palate) and cleft subtypes (CPO, CLP, CLO) to explore tissue-and subtype-specific variation. At most of the five identified CpGs, there was evidence of weak correlation (correlation coefficients ranging -0.11 to 0.32) between methylation in blood, lip and palate tissues, particularly between blood and lip tissue. We found weak evidence that mean methylation values in any of the three tissues differed between cleft subtypes. However, the analysis was likely underpowered to give precise correlation estimates (Table 3).

Discussion
In this study, we employed a framework that aims to identify putative mediators of genetic influences on nsCL/P via DNA methylation. We found five CpG sites, in three independent regions (VAX1, LOC146880, NTN1), where bidirectional MR and colocalization analyses suggested that the same variant affects both methylation and nsCL/P or where evidence was found that two independent variants affect both nsCL/P and methylation (Figure 1). We found lower methylation at the CpG at VAX1 (cg11398452) in association with the nsCL/P risk allele C of the SNP rs4752028 at SHTN1. This SNP is strongly associated with lower expression of SHTN1 according to two eQTL databases. VAX1 is a homeobox containing gene that has been shown to be expressed in the developing brain [46,47] and SNPs in VAX1 have been shown to be associated with nsCL/P in multiple independent GWAS across distinct populations [4,8,9,48,49]. VAX1 knock-out mice have been shown to develop cleft palate, suggesting VAX1 has a potentially important role in nsCL/P etiology [4]. SHTN1, sometimes known as KIAA1598, codes for the protein shootin1 that is involved in neuronal polarization [50] and has also been reported to be relevant to the etiology of nsCL/P in several studies [9,51,52]. It is difficult to distinguish the more significant locus between the VAX1 and SHTN1 genes because of their close proximity and similar expression profiles in mice [9,47,52,53]. We did not replicate the association between methylation at cg11398452 and rs4752028 in the GOYA data, but we did find that the SNP was strongly associated with methylation at nearby probes. Similarly, the direction of association between cg11398452 methylation and nsCL/P was the opposite in our MR study, compared with a previously published observational EWAS [16].
We found higher methylation at the CpG at LOC146880 (cg02598441) in association with the G allele of rs1991401 in DDX5 and the C allele of rs1808191 in PLEKHM1P. rs1991401 in DDX5 was associated with reduced expression of DDX5 and increased expression of CEP95 and MILR1 while rs1808191 in PLEKHM1P was associated with increased expression of SMURF2 but decreased expression of PLEKHM1P, RP13-104F24.3 and has-mir-6080. However, there was weak evidence that the SNPs affected expression of the same genes. DDX5 is involved in RNA helicase processes that are highly relevant to important cellular processes while PLEKHM1P and LOC146880 are pseudogenes [47]. There is no robust evidence from previous literature to support an association between genetic variation in these genes and nsCL/P. The SNP in PLEKHM1P replicated as an mQTL in the GOYA dataset but there were insufficient data to test the SNP in DDX5.
We found lower methylation at three CpGs at NTN1 in association with the nsCL/P risk allele T of the SNP rs8076457, an intergenic SNP close to NTN1. rs8076457, the mQTL for cg08162363, cg02481697 and cg16107528, did not robustly associate with gene expression levels in two datasets. The function of NTN1 is still largely unknown, but is thought to be involved in cell migration during development [47]. NTN1 has been previously discussed as a strong candidate gene for nsCL/P [13]; NTN1 may affect liability to nsCL/P via epistatic interactions, there is some evidence that NTN1 knock-out mice show consistency with the cleft palate phenotype and NTN1 expression is localized to the palate [13,54]. rs8076457 replicated as an mQTL across all relevant CpGs in the GOYA dataset.
Previous work has identified many functional possibilities for genetic risk variants for nsCL/P [15,54,55] but this study adds to the current evidence for DNA methylation playing a role in the etiology of nsCL/P. Additional strengths of this study include the integration of multiple data sources, for example ALSPAC, which provided access to detailed phenotype, genotype and epigenetic data. The nsCL/P GWAS summary statistics allowed a comprehensive genome-wide analysis in a large dataset. The Brazilian cohort EWAS results allowed a comparison of the influence of methylation on nsCL/P according to observational and MR studies. The use of the GOYA replication cohort, allowed triangulation of evidence for mQTL across different studies. Finally, the Cleft Collective data allowed us to compare genome-wide DNA methylation in different tissues and subtypes of cleft.
There are, several limitations to this study. First, methylation and expression in the studied tissues (postnatal cord blood, whole blood, lip and palate tissue) are unlikely to accurately reflect that in the developing orofacial tissue where epigenetic processes could feasibly influence susceptibility to nsCL/P. Since it is not possible to study DNA methylation in the developing orofacial region of affected individuals, we explored correlations between blood and lip/palate as a best available alternative. We found weak correlation between DNA methylation at our putative causal regions in blood and lip/palate samples from individuals in the Cleft Collective. A previous study found high blood-lip/palate correlation at CpGs identified as differentially methylated between nsCL/P cases and controls [16]. Correlations in methylation signatures across tissues in the same individuals provide support for effects that are not specific to certain tissues. For example, a recent study showed strong correlation of cis-eQTL and mQTL effects between two tissues (brain and blood) [56]. However, without data on the relevant tissue, it remains possible that the methylation and expression signatures of blood, lip and palate could be poorly representative of those in the developing orofacial tissues. Second, CLO and CLP cases were analyzed together as one group in the GWAS, MR analyses and the previously published EWAS. Increasingly, evidence suggests that these subtypes are molecularly and etiologically distinct and should be analyzed separately [2,3], but we were limited by the data available from previous studies. Although we found no evidence of differential methylation between subtypes at our five identified CpGs, there may be other loci where methylation mediates genetic influences on more specific cleft subtypes. Third, although efforts were made to select only nonsyndromic cases for the Cleft Collective analysis, we cannot guarantee that no syndromic cases were included, and children with syndromes may have very different methylation profiles. Fourth, a major limitation of this study is that some of the steps, particularly the reverse MR (testing the effect of liability to nsCL/P on methylation), are likely to be statistically underpowered. Liability to nsCL/P has been previously shown to have a causal affect on facial morphology [57] suggesting that it could also affect methylation. Fifth, as the majority of mQTL were instrumented by just a single genetic variant, we were unable to distinguish between mediation and horizontal pleiotropy and therefore proposed mediation is putative. Finally, our comparisons with other cohorts may be affected by technical differences, tissue differences (cord blood vs whole blood), ancestral differences between cohorts, age of participants (newborns vs children over 6 years old, a lack of statistical power giving rise to spurious associations or the enrichment of GOYA for overweight and obese mothers, which may introduce selection bias. Indeed, although mQTL were largely concordant between ALSPAC and GOYA, the mQTL and CpG site (in SHTN1) found to colocalize with liability to nsCL/P in ALSPAC did not replicate in GOYA. Similarly, the CpG site cg11398452, close to SHTN1, was directionally discordant between our analyses and the results of a Brazilian EWAS.

Conclusion
We identified three putative loci where DNA methylation may mediate genetic susceptibility to nsCL/P with stronger conclusions limited by the use of blood as a proxy for developing orofacial tissues. Future work determining the function of these genes and the epigenetic modulation of their expression relevant to prenatal orofacial development could provide important etiological insights. One possibility, warranting further investigation, is that identified DNA methylation differences are related to environmental exposures.

Future perspective
Increase in the availability and size of integrated DNA methylation, gene expression and genotype datasets will enable comprehensive evaluation of the role of epigenetic processes in disease etiology. In particular for congenital traits like nsCL/P, the unavailability of the most 'relevant' embryonic tissue is always likely to be a limitation for human studies. However, large-scale efforts to identify genetic variants that influence epigenetic processes in a tissue-independent manner will be invaluable to MR studies that will help mitigate this concern. Relating nsCL/P-associated epigenetic processes to environmental exposures that are risk factors for nsCL/P, may lead to the development of targeted interventions.

Supplementary data
To view the supplementary data that accompany this paper please visit the journal website at: www.futuremedicine.com/doi/full/ Ethical approval relevant to GOYA was obtained from the regional scientific ethics committee and by the Danish Data Protection Board.

Availability of data and material
In this study, we constructed nsCL/P summary statistics using data from the International Consortium to Identify Genes and Interactions Controlling Oral Clefts (ICC) and from the Bonn-II study. The authors do not have the necessary permissions to publicly release the nsCL/P summary statistics used in this study because of the use of individual level data downloaded from dbGAP. The dbGAP nsCL/P data are available to download from dbGaP (Study Accession phs000094.v1.p1) [37]. Data access proposals for individual level data can be submitted for ALSPAC, the Cleft Collective and GOYA (a subset of the Danish National Birth Cohort).
The authors declare that they have no competing interests.  The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.
No writing assistance was utilized in the production of this manuscript.

Open access
This work is licensed under the Creative Commons Attribution 4.0 License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

Summary points
• Nonsyndromic cleft lip with or without cleft palate (nsCL/P) is a complex trait with genetic and environmental risk factors. • Around 40 distinct genetic risk loci have been identified for nsCL/P, but many reside in nonprotein-coding regions with an unclear function. • We hypothesize that epigenetic processes may play an important role in the etiology of nsCL/P. • To explore whether nsCL/P genetic risk variants influence liability to nsCL/P through gene regulation pathways involving DNA methylation, we combined data from multiple sources and used an analysis framework involving Mendelian randomization and joint likelihood mapping. • We find putative evidence that genetic variation in VAX1, LOC146880 and NTN1 may increase liability to nsCL/P via changes in DNA methylation. • Methylation in blood, lip and palate tissue may not be strongly representative of that in the developing orofacial tissues, which limits stronger conclusions. • The study highlights the potential role for DNA methylation in the etiology of nsCL/P. • Future work exploring the influence of prenatal environmental factors on DNA methylation related to nsCL/P is warranted.