Genome-wide analysis of nuclear magnetic resonance metabolites revealed parent-of-origin effect on triglycerides in medium very low-density lipoprotein in PTPRD gene

N Pervjakova*,1,2,3, V Kukushkina1,2, T Haller1, S Kasela1, A Joensuu4,5, K Kristiansson4,5, T Annilo1,2, M Perola1,4,5, V Salomaa4, P Jousilahti4, A Metspalu1,2 & R Mägi1 1Estonian Genome Center, Institute of Genomics, University of Tartu, Tartu 51010, Estonia 2Department of Biotechnology, Institute of Molecular & Cell Biology, University of Tartu, Tartu 51010, Estonia 3Department of Genomics of Common Disease, School of Public Health, Imperial College London, Hammersmith Hospital, UK 4National Institute for Health & Welfare (THL), Department of Public Health Solutions, Helsinki, Finland 5Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Finland *Author for correspondence: Tel.: +372 534 88 567; natalia.pervjakova@ut.ee

Circulating metabolites are intermediate products of metabolic processes that are known to participate in a wide spectrum of biological pathways and consequently associated with many diseases and pathological conditions [1].In this context, metabolites could be used as a system-wide connection from genetic variants to disease-or phenotypic outcome.Therefore, identification of genetic variants associated with these molecules is an important task of genome-wide association studies (GWAS) [2][3][4][5][6].The advances in proton nuclear magnetic resonance ( 1 H-NMR or shortened as NMR) spectroscopy provide a method for quantifying of a wide range of high and low molecular weight compounds without any prior selection [7].To date, the NHGRI-EBI Catalog of published GWAS reports approximately 1529 genetic variants in 58 studies associated with different kind of metabolic phenotypes measured from diverse range of different tissue samples [8].The most comprehensive GWAS of NMR metabolites to date was performed by Kettunen et al.where profiles of metabolic measures were analyzed in up to 25,000 individuals and 62 associations with various metabolites were described [9].
Although GWAS have been more successful discovering genetic variants associated with human metabolites, there is still a large proportion of 'hidden heritability'.As a source of genetic variance, several analyses have been proposed, including those focusing on rare variants, structural variants, gene-gene and gene-environment interactions and parent-of-origin effects (POEs) [10,11].POEs reflect a combination of genetic and epigenetic mechanisms, modulating different complex traits associated mainly with embryonic growth and development [12] and these effects have been described with many complex diseases such as diabetes mellitus, disglycemia and retinoblastoma [13-15].In case of POEs, not only the presence of an allele, but also its parental origin affects the phenotypic outcome [16].As an example, maternal allele-specific association of a common variant (rs1367117) located in the APOB gene was demonstrated in the analysis of adiposity [17] and borderline significant effect for fasting glucose and insulin levels [17].This nonsynonymous missense variant that cause a Thr > Ile amino acid change was also observed in non-POE association with serum ApoB levels [18].
Genomic imprinting is considered as the primary process that leads to POEs [19].To date, approximately 100 genes are verified to be imprinted in humans and many more genes have been proposed for testing.Also, both imprinting and POEs can be tissue specific and depend on stage of the embryo development.Mott et al.  demonstrated that the interaction between nonimprinted genes with imprinted ones can cause POEs on complex traits [20].The role of genomic imprinting in human metabolic processes is relatively poorly understood, although convincing examples come from mice models where the process of imprinting is denoted as a key regulator of metabolism from infancy to adulthood [21].Several imprinted genes in mice such as Igf2r and Grb10 are known to influence energy homeostasis and have been proposed to coordinate glucose-regulated metabolism [22][23][24] while Dlk1, Dio3 and Gnas are associated with thermogenesis in brown adipose tissue in mice progeny [25][26][27][28][29].
POE analysis are usually conducted using family trios and are often underpowered due to the lack of that kind of data.In contrast, huge amount of genotype data of unrelated individuals has become available in the era of biobanks.Hoggart et al. proposed a novel method for finding POEs in genotype data of unrelated individuals harboring SNPs at which such effects occur with increased variability to the phenotypic outcome [30].This method is based on the assumption that POEs can be identified as increased variance of phenotype in heterozygotes group compared with homozygous individuals groups due to genetic effect differences among maternal and paternal mutation carriers.This approach was successfully tested on anthropometric phenotypes and two genes were identified having POE on BMI [30].As POEs are often associated with metabolic diseases such as gestational diabetes [31] and metabolic syndrome [32], we aimed to investigate POEs of NMR metabolites measured in blood serum.

Cohort description
The study sample contained 14,815 adult individuals of European descent from three population-based cohortsone from Estonia (EGCUT) and two from Finland (DILGOM and FINRISK1997) -all being genotyped with a genome-wide genotyping array and imputed using 1000 Genomes Project multi-ethnic reference panel (Phase III, March 2012 release) (Supplementary Table 3).
The EGCUT (Estonian Biobank from Estonian Genome Center, University of Tartu) is a prospective, volunteerbased sample of the Estonian resident adult population (aged ≥ 18 years).The current number of participants is close to 52,000, which represents about 5% of the Estonian adult population.General practitioners and medical personnel in the special recruitment offices recruited adult volunteers throughout the country in the period of 2002-2010.At baseline, the GPs performed the standardized health examination of the participants, who also donated blood samples, where DNA, plasma, serum and white blood cells were immediately isolated [32].
The National FINRISK studies, held every 5 years, are population-based chronic disease risk factor monitoring surveys, conducted to monitor the health of the Finnish population among persons aged 25-74 at recruitment [33].The participants are a random sample drawn from the population register, stratified according to geographical areas, sex and 10-year age groups to form a representative sample.
The FINRISK1997 (The National FINRISK 1997 Study) includes in total, 8444 individuals that were recruited to represent the population of five study areas across Finland.Standard clinical laboratory measurements were collected, and participants filled out questionnaires on physical activity and socioeconomical status [34].
The Dietary, Lifestyle and Genetic determinants of Obesity and the Metabolic syndrome (DILGOM, N = 5024) study was implemented within the framework of the National FINRISK 2007 Study.In 2007, all FINRISK health examinations participants were invited to a more detailed health examination (included in the DILGOM Study) concerning obesity and metabolic syndrome.
All participants from all three cohorts have given a written informed consent.

NMR biomarkers quantification, quality control & imputation
In total, 135 circulating metabolites were set to be quantified using proton NMR spectroscopy of native blood plasma in the Estonian cohort and serum in the Finnish cohorts.All cohorts were analyzed using the same high-throughput NMR metabolomics platform in the same analysis laboratory using a method described by Soininen et al. [35].This methodology provided information on different NMR metabolites including lipoproteins, proteins, low molecular weight metabolites, such as amino acids, glycolysis precursors and other small molecules.All measured metabolites were already described here [9,36].As a part of our NMR data quality control (QC), 23 phenotypes were discarded due to the missignes more than 10% of measurements.NMR measurement outliers were defined as measurement values, which were deviating more than three standard deviations from the mean and these values were removed from the data.As the second step, imputation was performed to fill the missing NMR values (call rate = 0.9) for the remaining metabolites, using Multivariate Imputation by Chained Equations (MICE) method implemented into the R package MICE [37].MICE imputation method required two data modifications performed before imputation: the individual values needed to be scaled to the same mean; in case of highly correlated phenotypes, one of them needed to be removed.The pairwise correlation threshold was set to 0.995, and 30 phenotypes were removed as a result, leaving us 82 phenotypes in total for the further analyses.As the final step, residuals for all 82 metabolites were calculated adjusting for sex, age, age 2 , BMI and 10 principal components, calculated based on the genetic data, to correct for population stratification (Supplementary Table 4).

Genotyping, quality control & imputation
All cohorts were genotyped using Illumina genotyping arrays: Illumina Human370CNV and Illumina OMNI-Express for the EGCUT data and Illumina Infinium HumanHap 610K and Illumina OMNI-Express for the DILGOM and FINRISK1997 cohorts, respectively.Sample quality control covered exclusion on the basis of genome-wide call rate, minor allele frequency, extreme heterozygosity, sex discordance, cryptic relatedness and ethnical outliers (Supplementary Table 3).SNP quality control was done on the basis of call rate across samples and deviation from Hardy-Weinberg equilibrium.Before the imputation, nonautosomal SNPs, SNPs with minor allele frequency <1% and palindromic SNPs were removed.All studies were imputed using 1000 Genomes Project multi-ethnic reference panel (Phase III, March 2012 release) using IMPUTEv2 software [38].

POE analysis
The POE analyses in all three population-based cohorts were performed using QUICKTEST software [30].This software implements a method for POE analysis using genetic data of unrelated individuals, based on comparing the phenotypic variance of the heterozygous genotype group to the variance observed in the homozygous groups [30].The method is based on the assumption that increased variance in the heterozygous group in unrelated individuals arises because that group contains two subgroups (paternal reference allele/maternal alternative allele and maternal reference allele/paternal alternative allele) each with different means.The POE test utilize the Brown-Forsythe test, modified to test the mean absolute deviation from the median in the heterozygous and homozygous groups [30] (Supplementary Figure 2).The POE test was performed on 5861 individuals in the EGCUT, 516 in the DILGOM and 8438 individuals in the FINRISK1997 cohort.

Meta-analysis
Meta-analysis for each analyzed 82 NMR metabolites were performed in up to 14,815 participants from 3 studies.Before meta-analysis, SNPs with minor allele frequency <1% and poorly imputed variants (IMPUTE info <0.8) were excluded from further analysis.Summary statistics for autosomal SNPs that passed quality control for all studies were combined in inverse-variance weighted fixed effect meta-analysis for each trait, as implemented in GWAMA [39].

Principal component analysis
The significance level threshold was calculated using principal component analysis.As NMR data are highly intercorrelated, simple Bonferroni correction would be too conservative, and therefore we used principal component analysis to estimate the number of independent phenotypes [40] using the R package Prcomp [41].In total, we found 26 principal components describing more than 99% of the variability in the NMR data.Therefore, we used genome-wide threshold divided by the number of independent phenotypes (5 × 10 -8 /26 = 1.92 × 10 -9 ) as the significance threshold corrected for the number of independent tests.

Results
We performed a genome-wide analysis to determine plausible POEs for the NMR metabolites.Genotype data of 9,362,347 SNPs was available in up to 14,815 individuals of European descent from three cohorts: one from Estonia (EGCUT) and two from Finland (DILGOM and FINRISK1997).We found 9 loci, with possible POEs, affecting levels of 11 metabolites (Table 1).Nine associations identified in the initial POE analysis reached the genome-wide significant threshold (p ≤ 5 × 10 -8 ).Principal components analysis was used to calculate the genome-wide p-value threshold for the number of independent phenotypes analyzed, resulting in the new threshold p ≤ 1.92 × 10 -9 (see methods for the multiple correction calculation).After multiple testing correction only the variant rs1412727 remained significant (MAF = 0.17; p = 8.29 × 10 -11 ) located in the intron of the PTPRD (protein tyrosine phosphatase, receptor-type, D) gene and associated with metabolites called triglycerides in medium-size very lowdensity lipoprotein (VLDL) (Supplementary Figure 1).Also no heterogeneity was observed for this variant among all cohorts (i 2 = 0; HetP = 0.836304; Cochran's Q statistic = 0.854866).
In addition to the genome-wide analysis, we selected previously established associations from the study by Kettunen et al. [9], where meta-analysis of 123 metabolites in 24,925 individuals revealed 62 associations with NMR metabolite levels, and tested if any of these associations show POEs.In total, five loci had p-values less than 0.05 (Supplementary Table 1), but none of them reached the threshold considered for significance after multiple testing correction (p corrected < 0.001).
We also analyzed a variant rs1367117, which has been reported to have POE on glucose level [6,42].No significant association between the level of glucose from NMR metabolite panel and this variant was detected in our data.

Discussion
We performed a genome-wide analysis to find POEs of common variants for NMR metabolites in humans.We identified 11 signals that reached genome-wide significance threshold, but only 1 variant remained significant after the multiple testing correction considering the number of phenotypes analyzed.We assume that even though NMR metabolites have a very strong genetic component, POEs in NMR metabolites are small and require larger sample sizes to be detected.
We report a novel plausible POE association with the genetic variant rs1412727 that affects the level of the metabolite triglycerides in medium-size VLDL particle.This variant is a nonsynonymous intronic variant located in PTPRD gene, and thus we report for this gene as our main candidate.Measurement of associated molecule represents the number of an endogenously secreted triglycerides into the core of VLDL particles [35].After being produced in liver, the VLDLs then circulate in the bloodstream to go to the peripheral adipose and muscle tissue, where they are further broken down by lipoprotein lipases releasing triglycerides that can be used to supply fatty acids and glycerol to the cell as a source of energy [43,44].Several studies have provided evidence that the level of lipoproteins associated with triglycerides is causally related to the development of coronary heart disease [45][46][47].
PTPRD gene itself is already established gene from GWAS and known to be associated with several traits and diseases, such as Type 2 diabetes [48], age at menarche [49], epilepsy [50] and obsessive-compulsive disorder [51].To date, it is known that the protein encoded by this gene is a member of the protein tyrosine phosphatase (PTP) family that has a phosphorylation of tyrosine residues activity.This is the key regulatory mechanism important for cellular signaling that affects cellular events including cell metabolism, proliferation, adhesion, differentiation, migration and development [52][53][54].Although changing the phosphorylation activity of proteins is among the most well-studied mechanisms and well-used modification methods for studying the regulation of protein structure and function in cells and organisms, the function of PTPRD gene is yet poorly understood.The mechanism of this gene could be similar to another gene that belongs to the same PTP family -PTP1B gene is implicated as a negative regulation factor for fat triglycerides [55].Briefly, expression reduction of PTP1B results in a decrease of fat deposition and triglyceride levels by SREBP1, followed by a further downregulation of SPOT14, FAS, LPL and PPAR -important genes known to be involved in lipogenesis and fatty acid synthesis [55].Moreover, PTPRD gene has been demonstrated in association with weight gain caused by antipsychotic medications, which are used for patients with bipolar disorder and schizophrenia.The antipsychotic-induced weight gain phenotype may be involved in multiple pathways related to metabolic process.Thus, we believe that identification of the POE in the PTPRD gene could help in uncovering etiology of related diseases.
Discovering of POEs can help to reveal some of the hidden heritability.It is worth noting that it is not only important to find novel associations on the genotype-phenotype level, but also to consider already established variants for having more complex effects on phenotypes, which could be missed while testing association with using only additive genetic models.Therefore, we also investigated whether POEs can also be found for the genetic loci that have been previously associated with NMR metabolites by Kettunen et al. [9].Using the genetic additive model and a smaller dataset, we demonstrated the same associations with circulating metabolites while we did not detect any POEs.Also, we were not able to confirm POE of a previously established signal on the level of fasting glucose demonstrated by Hochner et al. located in the APOB gene [17].Although our study has revealed the POE on human circulating metabolites, future studies are required to provide replication and deeper understanding about this regulation in brain tissue of adult organisms.

Conclusion
POEs that include the process of genomic imprinting are important phenomena that are found for genes associated with the range of metabolic traits, such as Type 2 diabetes, menarche, metabolic syndrome and others.We, therefore, aimed to identify the POE on human circulating metabolites that serve as an intermediate trait between genetic variants and pathogenic outcomes.We performed genome-wide scan for POEs and also investigated them in already established genetic loci associated with NMR metabolites.We were not able to detect POEs for variants described in previously published comprehensive GWAS.On the other hand, the genome-wide POE analysis revealed the association of PTPRD gene with metabolite called triglycerides in medium VLDL molecule.In conclusion, we believe that POEs can be found for metabolic measurements, although expected effect is probably small.

Summary points
• Parent-of-origin effects (POEs) are present for several metabolic traits, for example, Type 2 diabetes, metabolic syndrome, BMI, menarche, gestational diabetes and many others.
• We used the modified Brown-Forsythe test to detect POEs in population-based data.
• We conducted genome-wide association POE analysis for 82 metabolites in 14,815 individuals of European ancestry.
• The analysis did not reveal POE in genetic variants previously associated with nuclear magnetic resonance metabolites.
• Genome-wide association POE analysis demonstrated association of PTPRD gene with metabolite called triglycerides in medium very low-density lipoprotein molecule.
• Variants in PTPRD gene region has been previously associated with epilepsy, age at menarche, Type 2 diabetes and obsessive-compulsive disorder.
• The function of PTPRD gene is yet poorly understood, although it was suggested to be implicated as a negative regulation factor for fat triglycerides.

Table 1 .
Parent-of-origin meta-analysis results for circulating metabolites identified from three cohorts.