We use cookies to improve your experience. By continuing to browse this site, you accept our cookie policy.×
Skip main navigation
Aging Health
Bioelectronics in Medicine
Biomarkers in Medicine
Breast Cancer Management
CNS Oncology
Colorectal Cancer
Concussion
Epigenomics
Future Cardiology
Future Medicine AI
Future Microbiology
Future Neurology
Future Oncology
Future Rare Diseases
Future Virology
Hepatic Oncology
HIV Therapy
Immunotherapy
International Journal of Endocrine Oncology
International Journal of Hematologic Oncology
Journal of 3D Printing in Medicine
Lung Cancer Management
Melanoma Management
Nanomedicine
Neurodegenerative Disease Management
Pain Management
Pediatric Health
Personalized Medicine
Pharmacogenomics
Regenerative Medicine
Research ArticleOpen Accesscc iconby iconnc iconnd icon

Comparative analysis of sperm DNA methylation supports evolutionary acquired epigenetic plasticity for organ speciation

    Farideh Moharrek

    The Novo Nordisk Foundation Centre for Basic Metabolic Research, Faculty of Health & Medical Sciences, University of Copenhagen, Copenhagen, 2200, Denmark

    ,
    Lars R Ingerslev

    The Novo Nordisk Foundation Centre for Basic Metabolic Research, Faculty of Health & Medical Sciences, University of Copenhagen, Copenhagen, 2200, Denmark

    ,
    Ali Altıntaş

    The Novo Nordisk Foundation Centre for Basic Metabolic Research, Faculty of Health & Medical Sciences, University of Copenhagen, Copenhagen, 2200, Denmark

    ,
    Leonidas Lundell

    The Novo Nordisk Foundation Centre for Basic Metabolic Research, Faculty of Health & Medical Sciences, University of Copenhagen, Copenhagen, 2200, Denmark

    ,
    Ann N Hansen

    The Novo Nordisk Foundation Centre for Basic Metabolic Research, Faculty of Health & Medical Sciences, University of Copenhagen, Copenhagen, 2200, Denmark

    ,
    Lewin Small

    The Novo Nordisk Foundation Centre for Basic Metabolic Research, Faculty of Health & Medical Sciences, University of Copenhagen, Copenhagen, 2200, Denmark

    ,
    Christopher T Workman

    Department of Biotechnology & Biomedicine, Technical University of Denmark, Lyngby, 2800, Denmark

    &
    Romain Barrès

    *Author for correspondence: Tel.: +45 35 33 72 88;

    E-mail Address: barres@sund.ku.dk

    The Novo Nordisk Foundation Centre for Basic Metabolic Research, Faculty of Health & Medical Sciences, University of Copenhagen, Copenhagen, 2200, Denmark

    Institut de Pharmacologie Moléculaire et Cellulaire, Université Côte d’Azur & Centre National pour la Recherche Scientifique (CNRS), Valbonne, 06560, France

    Published Online:https://doi.org/10.2217/epi-2022-0168

    Abstract

    Aim: To perform a comparative epigenomic analysis of DNA methylation in spermatozoa from humans, mice, rats and mini-pigs. Materials & methods: Genome-wide DNA methylation analysis was used to compare the methylation profiles of orthologous CpG sites. Transcription profiles of early embryo development were analyzed to provide insight into the association between sperm methylation and gene expression programming. Results: We identified DNA methylation variation near genes related to the central nervous system and signal transduction. Gene expression dynamics at different time points of preimplantation stages were modestly associated with spermatozoal DNA methylation at the nearest promoters. Conclusion: Conserved genomic regions subject to epigenetic variation across different species were associated with specific organ functions, suggesting their potential contribution to organ speciation and long-term adaptation to the environment.

    Methylation of DNA is a well-studied epigenetic mark that is evolutionary conserved among most eukaryotic organisms such as fungi, plants and animals [1,2]. It involves adding a methyl group to a cytosine base within the DNA molecule and is associated with a number of key processes, including regulation of gene expression [3], genomic imprinting [4], X-chromosome inactivation [5], preservation of chromosome stability [6], repression of transposable elements [7,8], aging [9,10] and gametogenesis [11]. Aberrant DNA methylation is observed in a plethora of pathological conditions, such as carcinogenesis [12], kidney disease [13] and metabolic disease [14,15], supporting the notion that DNA methylation also participates in the risk to develop disease.

    The role of DNA methylation in gametogenesis is of particular importance: methylation marks in the gametes establish the developmental programming, thereby influencing cell fate, organ formation and the nature of the response to environmental factors later in life [11]. DNA methylation patterns are substantially lost during gametogenesis and subsequently restored at the fully differentiated gamete stage [16]. While such epigenetic reprogramming is conserved in mammalian species [16,17], alterations may influence gene expression and the predisposition to disease [18]. Epigenetic modifications of sperm cells in different phases of spermatogenesis take part in different aspects of embryonic development upon fertilization [19]. Environmental factors like dietary habits, physical activity and pollutants modulate the DNA methylation signature of spermatozoa and the phenotype of the offspring [20–22], indicating that epigenetic variation in sperm may contribute to environmentally driven phenotypic plasticity in the next generation of offspring.

    While several studies have mapped the epigenetic patterns of spermatozoa from humans and rodents separately [23–25], little is known about the comparative differences across these species. In the present study we compared the DNA methylomes of spermatozoa from humans, mice, rats and mini-pigs, with a specific interest in conserved genomic regions. Species-specific epigenetic signatures were analyzed for their potential role during early embryo development by investigating the expression of genes near epigenetically variable regions in human and mouse embryos at the preimplantation stage. Our analysis identified epigenetic variation near genes referenced as participating in brain function and the perception of smell, supporting species-specific means to develop certain physiological functions.

    Materials & methods

    Biological samples & bisulfite sequencing

    We performed our analyses on a panel of publicly available data sets, generated for different studies. Human (Gene Expression Omnibus [GEO] accession: GSE49624, GSE109478), mouse (GSE49624, GSE174093) and rat (GSE79566, GSE173898) sperm [20,25–28] are included in whole-genome bisulfite sequencing (WGBS) and reduced-representation bisulfite sequencing (RRBS) data sets, respectively. Samples were only taken from the control groups of the studies in which an exposure/intervention was conducted.

    For RRBS on mini-pig sperm, semen samples were collected from four 8-month-old genetically inbred 100% miniature pigs from Specipig® (Pioneer breeding and biomedical research centre, Barcelona, Spain). Genomic DNA was extracted from the sperm cells using an AllPrep DNA/RNA/miRNA Universal Kit (Qiagen, Hilden, Germany; cat. no. 802224). The procedure was changed for processing of sperm according to the recommendations of the manufacturer. RRBS libraries were constructed from 100 ng genomic input DNA using the Ovation® RRBS Methyl-Seq library preparation kit (Tecan, Männedorf, Switzerland; cat no. 0553-32). Briefly, genomic DNA was digested with MspI enzyme, followed by adaptor ligation to Ovation RRBS Methyl-Seq adaptors containing unique molecular barcodes, followed by a final end-repair step. The ligated DNA was subjected to bisulfite conversion and left in the thermal cycler overnight at room temperature. The following day, the bisulfite-converted DNA was amplified by PCR followed by a final clean-up step. Sequencing of RRBS libraries was performed on a NovaSeq 6000 platform (Illumina, CA, USA) as 101-bp single-end reads. The results are available in the NCBI GEO database, with the accession number GSE213548.

    For human WGBS data, samples were obtained from men attending the University of Utah Andrology laboratory consented for research (Institutional Review Board approved protocol # 7790). For mouse WGBS data, the study was done under an approved Institutional Animal Care and Use Committee protocol at University of Utah School of Medicine, UT, USA. For rat WGBS data, part of the study was approved by the Zoos scientific review committee and also by the Institutional Animal Care and Use Committee at University of Southern California, CA, USA. For human RRBS data, the study was approved by the Ethics Committee from the Capital Region of Denmark (reference H-1-2013-064) and informed consent was obtained from all participants. For mouse RRBS data, all procedures followed the guidelines of the National Institutes of Health Guide for the Care and Use of Laboratory Animals and the approval for the study was received from the Institutional Animal Care and Use Committee at University of Massachusetts, Amherst, USA. For rat RRBS data, the study was performed following the National Institutes of Health guidelines for the care and use of laboratory animals and approved by the Institutional Animal Care and Use Committee at The Lundquist Institute for Biomedical Innovation at Harbor–UCLA Medical Center, CA, USA. For mini-pig data, all animal procedures were performed following the National Institutes of Health guidelines for the care and use of laboratory animals at Bionea Lab and approved by the French Ministry of Education and Research (reference APAFIS#22853-201911212455784v1).

    Processing bisulfite sequencing data sets

    Trim Galore! v.0.4.3 (www.bioinformatics.babraham.ac.uk/projects/trim_galore/) was used to trim adapter sequences incorporated in the bisulfite sequencing reads with default parameters, except that we applied a stringency of 3 for both WGBS and RRBS data sets with the –rrbs flag only for the RRBS data set. We eliminated reads shorter than 15 bp after trimming. MultiQC v.0.8 [29] was used for quality control. The trimmed reads were aligned to the human (hg38), mouse (mm10), rat (rn6) or pig (susScr11) genomes, using the Bismark aligner v.0.18.1 [30]. We applied the Bismark deduplication script to the WGBS data set to remove reads mapped to identical starting positions [31]. RRBS reads from mini-pigs were deduplicated using the deduplicate_bismark tool with the –barcode flag because they contained unique molecular identifiers. The bismark_methylation_extractor tool was used to estimate the methylation level at each particular cytosine site. SNPs from the dbSNP database [32] (hg38 [build 151]; mm10 [build 142]; rn6 [build 149]; susScr11[build 150]) and CpGs with coverage higher than ten-times the 95th percentile of coverage were filtered out.

    Orthologous CpGs were assigned using the liftOver tool from the University of California Santa Cruz (UCSC) genome browser [33] using the recently published Perl scripts [34] and the following procedure. The mouse, rat and mini-pig CpG sites were mapped to human coordinates (hg38) using chain files available in ftp://hgdownload.cse.ucsc.edu/goldenPath/hg38/liftOver/ and then mapped back to the first genome to ensure that it had a unique correspondent in both genomes. CpGs with a minimum of six counts were considered for further analysis. Multidimensional scaling (MDS) analysis was performed using the ratio of methylated to unmethylated CpGs (M value). The methylation percentages for different genomic regions, including gene bodies, promoters, exons and introns as well as global DNA methylation were obtained by aggregating the individual CpG methylation levels using the R package genomation v.1.20 [35]. Given that the assumptions of parametric testing were violated, the Scheirer–Ray–Hare test (similar to a two-way analysis of variance setup) was performed as previously described [28]. Subsequently, the Dunn post hoc test was performed to identify pairwise comparisons. The R package VennDiagram v.1.7.3 [36] was used to plot Venn diagrams, and Fisher’s exact test was carried out to determine the significance of the overlaps in a pairwise fashion. The density profiles of genes and gene features methylation were performed using the sm.density.compare function of the R package sm v.2.2-5.7.1 [37], applying 10,000 bootstrap for testing significant difference.

    Differentially methylated gene components

    The comparisons between the human, mouse, rat and mini-pig data sets were carried out using the R package GeneDMRs v.1 [38]. RefGene database hg38 and CpG island annotation from the UCSC genome browser (http://genome.ucsc.edu) were used. We obtained weighted methylation levels for each group, Q-values with the false discovery rate method [39] and methylation differences between species for all promoters, exons and introns. The differentially methylated regions (DMRs), including promoters (DMPs), exons (DMEs) and introns (DMIs), were defined as those with Q-values of less than 0.05.

    Functional annotations of cluster networks of all DMRs & biological enrichment of DMP-related genes

    To decipher the mechanism of action of DMPs, DMEs and DMIs identified from WGBS and RRBS data in each comparison, we made use of the ClueGO plug-in of Cytoscape v.3.9 [40], which illustrates functionally grouped networks of the nonredundant biological terms for large gene clusters. Enrichment analyses were performed using 20 genes for WGBS and ten genes for RRBS data sets in the cluster with gene ontology (GO) tree interval range of 6–14, a κ-score of 0.4 for network connectivity, two-sided hypergeometric test with p < 0.05 and Bonferroni correction. GO biological processes (GO:BP) of DMP related genes were determined with the Database for Annotation, Visualization and Integrated Discovery website [41] and visualized by the networkD3 v.0.4 [42] and GOplot v.1.0.2 R packages [43].

    Identification of hypomethylated regions & visualization of methylation profiles

    We additionally used MethPipe v.3.4.3 [44] to identify hypomethylated regions (HMRs) in each species using the WGBS data sets. Briefly, the alignment BAM files from Bismark were converted to the standardized SAM format using the MethPipe ‘format reads’ program. The MethPipe ‘methcounts’ function was then used to count the methylation level at each cytosine site. The ‘merge-methcounts’ program was used to merge the individual methcounts files from multiple replicates to produce a single estimate with higher coverage for each species. The ‘hmr’ program was then used to identify HMRs in each species. This program defines the methylation levels at single sites while considering the number of reads affecting those levels using a hidden Markov model method with a β-binomial distribution. We used MethBase [45] from the UCSC Genome Browser track hub to visualize the methylation profiles and HMRs.

    Expression analysis

    To investigate the association between DNA methylation level in DMP-related genes and early gene expression, we used stage-specific modules and looked at gene expression levels at different time points of human and mouse early embryos. Given that the zygotic genome activation (ZGA) starts during the one-to-two-cell stage in mice [46,47] and the four-to-eight-cell stage in humans [47–49], we obtained single-cell RNA sequencing reads for the oocyte and two-cell stages in mouse and for the oocyte and eight-cell stages in human from the GEO database (accession: GSE44183) [50].

    We aligned the reads to the human (hg38) and mouse (mm10) genomes, using STAR v.2.5.3a [51]. FeatureCounts v.1.5.3 [52] was used to conduct read summation onto genes. Differential expression testing between the oocyte and ZGA stages was performed using edgeR v.3.14.0 [53] with the quasi-likelihood test, then normalized values of transcripts per million (TPM; adjusted with gene length) were calculated for those genes that were overexpressed by the ZGA in at least one species. OrthoFinder [54] was used to identify orthologous gene groups using all available protein sequences (www.ensembl.org/) for human and mouse. Average TPM values of the cells or embryos from the same stage for the orthologous DMP-related genes were used for visualization.

    Results

    DNA methylation patterns at CpG sites orthologous to humans, mice, rats & mini-pigs

    To compare the DNA methylation pattern of spermatozoa from human, mouse and rat, we extracted WGBS and RRBS public data from the GEO database. For mini-pigs, we performed RRBS on spermatozoa isolated from mini-pig semen samples (Figure 1A). A summary describing the basic features of the data is shown in Supplementary Table 1. Sequencing reads were mapped to in silico-generated bisulfite-converted genomes corresponding to each species. For WGBS, we measured a DNA methylation signal from both strands in 28.8–44.4 million CpG sites per sample, while for RRBS we measured 2.1–7.3 million CpG sites per sample (Supplementary Table 2).

    Figure 1. Surveying DNA methylation in sperm cells across humans, mice and rats using whole-genome and reduced-representation bisulfite sequencing data sets and mini-pigs using the reduced-representation bisulfite sequencing data set.

    (A) Study design. (B) Multidimensional scaling plot of DNA methylation profiles in humans, rats, mice and mini-pig sperm cells using orthologous CpG sites. The CpG profiles of different species formed different groups, as expected. (C) Methylation percentage across different genomic regions as all CpGs (global DNA methylation), gene bodies, promoters, exons, introns, first exon and intron and rest of exons and introns of human, mouse and rat orthologous CpG sites using WGBS data set and (D) of human, mouse, rat and mini-pig orthologous CpG sites using RRBS data set. Significant differences in methylation percentages were seen across different genomic regions. (pregion = 1.06E-89 and 1.86E-24 for WGBS and RRBS data sets, respectively; Scheirer–Ray–Hare test). X-axis: genomic regions; Y-axis: methylation percentage. Mean and standard deviation are shown by solid horizontal and vertical lines, respectively. Dunn test results for the genomic regions are shown in Table 1. (E & F) Venn diagrams of all identified genes, promoters, exons and introns using (E) WGBS and (F) RRBS data sets for human–mouse and human–rat comparisons. (G & H) Density plots for DNA methylation in genes, promoters, introns and exons for (G) human–mouse and (H) human–rat using WGBS data sets.

    RRBS: Reduced-representation bisulfite sequencing; WGBS: Whole-genome bisulfite sequencing.

    Next, we mapped nonhuman DNA methylomes to the human reference genome at single-CpG resolution (see Methods section). CpG sites from mice and rats were mapped to the human genome, even when their orthologous positions did not correspond to a CpG site in humans; for example, in the context of CHG, CHH (where ‘H’ represents any base except G). This allowed us to map 12.5–17.6 million CpG sites per sample for WGBS data, and 1.6–4.2 million CpG sites per sample for RRBS data (Supplementary Table 2). We annotated 0.9–1.9 million fully conserved orthologous CpGs in the human and mouse genomes (i.e., CpG in both genomes) and 0.6–2.5 million fully conserved CpGs in the human and rat genomes using the RRBS and WGBS data sets, respectively, as well as 1.4 million fully conserved CpGs in the human and mini-pig genomes using the RRBS data set (Supplementary Table 2). For the subsequent analyses, we opted to only use fully conserved orthologous CpG sites.

    The methylation levels of orthologous CpGs were mapped against densities of genes, as shown in Supplementary Figure 1A. Comparing human and mouse genomes using WGBS data, we identified a respective distribution of orthologous CpG sites of 14.8, 37.3, 34.4 and 13.5% within promoters, exons, introns and intergenic regions, respectively, and 16.1, 14.1 and 69.8% within CpG islands, CpG island shores and other regions, respectively. Using the RRBS data, the distribution was 54.3, 21.8, 20.9 and 3.0% within promoters, exons, introns and intergenic regions, respectively, and 83.8, 8.1 and 8.1% within CpG islands, CpG island shores and other regions, respectively (Supplementary Figure 1B). Comparing the human and rat genomes and using WGBS data, the respective distribution within promoter, exon, intron and intergenic regions was 14.4, 42.7, 32.1 and 10.8%, and 20.4, 14.3 and 65.3% for CpG islands, CpG island shores and other regions. Using the RRBS data, the distribution was respectively 51.8, 23.5, 21.1 and 3.6% for promoter, exon, intron and intergenic regions and 78.8, 10.4 and 10.8% for CpG islands, CpG island shores and other regions (Supplementary Figure 1B). Comparing human and mini-pig genomes using the RRBS data, the distribution was respectively 50.0, 22.2, 23.8 and 4.0% within promoters, exons, introns and intergenic regions, and 84.8, 6.4 and 8.8% within CpG islands, CpG island shores and other regions (Supplementary Figure 1B). Thus our analysis showed a trend for a higher CpG density in promoter and CpG islands when using RRBS compared with WGBS data.

    Multidimensional scaling analysis of methylation levels for orthologous CpGs in humans, mice, rats and mini-pigs showed a very distinct separation of data (Figure 1B). While we found no significant differences between species in CpG methylation across the entire genomes (Scheirer–Ray–Hare test, pgroups = 0.26 and 0.31 based on WGBS and RRBS data sets, respectively), differences were highly significant for specific regions (Scheirer–Ray–Hare test, pregion = 1.06E-89 and 1.86E-24 based on WGBS and RRBS data sets, respectively). Most genomic regions displayed different DNA methylation levels, according to the Dunn test for pairwise comparisons (Table 1). Based on both WGBS and RRBS data sets and across species, we found that the annotation with the highest methylation levels were the exons, particularly the exons beyond the first exons (annotated ‘Rest of exons’ in Figure 1C & D). On the other side of the spectrum, promoter regions exhibited the lowest methylation levels at orthologous CpGs (Figure 1C & D).

    Table 1. Dunn post hoc test comparison of methylation levels in different genomic regions.
      WGBSRRBSWGBS/RRBS
    Region 1Region 2zp-valueAdjusted p-valuezp-valueAdjusted p-valueSignificance
    ExonsAll CpGs-1.1228730.2614910.313789-3.0146780.0034760.007227ns/
    First exonAll CpGs0.3942000.6934330.756472-0.8998760.4478490.447153ns/ns
    First exonExons1.5170730.1292480.1723301.7995380.0614440.078509ns/
    First intronAll CpGs2.1262920.0334780.0669572.911130.0047580.008622/
    First intronExons3.2491650.0011570.0037875.6726181.67E-081.05E-07/
    First intronFirst exon1.7320910.0832570.1303153.8760770.0001670.000424ns/
    GeneAll CpGs3.5597470.0003710.0014840.9482130.3254320.335646/ns
    GeneExons4.6826212.83E-062.54E-053.670770.0001880.000446/
    GeneFirst exon3.1655470.0015470.0046431.9779530.0601720.080592/
    GeneFirst intron1.4334550.1517270.195078-1.788930.0612720.083692ns/
    IntronsAll CpGs2.0307280.0422820.0761081.6970530.0634120.08704/
    IntronsExons3.1536010.0016120.0044654.8846722.55E-069.62E-06/
    IntronsFirst exon1.6365280.1017290.1464892.9651630.0042760.009155ns/
    IntronsFirst intron-0.0955630.9238670.923867-0.9766970.3579130.368311ns/ns
    IntronsGene-1.5290190.1262590.1748210.9587540.3591630.36978ns/ns
    PromoterAll CpGs4.3242571.53E-050.0001103.7876540.0001650.00047/
    PromoterExons5.4471305.11E-089.21E-076.7765474.67E-125.88E-11/
    PromoterFirst exon3.9300578.49E-050.0005094.8965432.11E-061.02E-06/
    PromoterFirst intron2.1979650.0279510.0591910.9399860.3341860.387544/ns
    PromoterGene0.7645090.4445630.5162672.9986430.00478530.009645ns/
    PromoterIntrons2.2935280.0218170.0490891.9975430.0603340.090555§/
    Rest of exonsAll CpGs-1.6962550.0898370.134756-3.9986750.0001130.00022ns/
    Rest of exonsExons-0.5733820.5663850.6371840.98764530.2449730.401239ns/ns
    Rest of exonsFirst exon-2.0904550.0365760.069303-2.9754320.00412450.010127§/
    Rest of exonsFirst intron-3.8225470.000130.000679-6.7891784.77E-118.22E-10/
    Rest of exonsGene-5.2560031.47E-071.76E-06-4.8765432.54E-061.11E-05/
    Rest of exonsIntrons-3.7269840.0001930.000872-5.7659861.55E-081.22E-07/
    Rest of exonsPromoter-6.020511.73E-096.25E-08-7.7654325.56E-142.12E-12/
    Rest of intronsAll CpGs1.7798730.0750960.122885-1.8875430.0601230.091357ns/
    Rest of intronsExons2.9027470.0036990.0095110.9678530.3345750.434678/ns
    Rest of intronsFirst exon1.3856730.1658460.205878-0.9268640.3345910.454326ns/ns
    Rest of intronsFirst intron-0.346410.7290280.771912-4.7894822.45E-061.23E-05ns/
    Rest of intronsGene-1.7798730.0750960.128737-2.8280930.0047890.010887ns/§
    Rest of intronsIntrons-0.2508540.8019260.82483-3.8077780.0001280.000412ns/
    Rest of intronsPromoter-2.5443830.0109470.026273-5.7681661.54E-081.3E-07§/
    Rest of intronsRest of exons3.4761290.0005080.0018311.7978540.0613440.099954/

    Regions 1 and 2 represent the genomic regions compared; z is the Dunn z-test statistic.

    †padj < 0.001.

    ‡padj = 0.001–0.01.

    §padj = 0.01–0.05.

    ¶padj = 0.05–0.1.

    ns: Not significant; RRBS: Reduced-representation bisulfite sequencing; WGBS: Whole-genome bisulfite sequencing.

    Human and mouse orthologous CpG sites identified in WGBS data were found in 13,134 genes, 4839 promoters, 15,147 exons and 10,769 introns. From these CpG sites, we found sites in common with the RRBS data for 6778 genes, 2207 promoters, 3575 exons and 2337 introns (Fisher’s exact test, all p < 0.05) (Figure 1E). When analyzing human and rat orthologous CpG sites based on the WGBS data, we found orthologous CpG sites in 7844 genes with 2224 promoters, 7052 exons and 4679 introns, of which we found sites in common within the RRBS data for 3214 genes, 773 promoters, 1439 exons and 970 introns (Fisher’s exact test, all p < 0.05) (Figure 1F). Human and mini-pig orthologous CpG sites identified in RRBS data were found in 16,028 genes, 14,513 promoters, 12,872 exons and 10,497 introns.

    Using WGBS data for each genomic region of interest, the DNA methylation patterns had a bimodal distribution, with important densities of hypermethylated or hypomethylated regions but a low density of intermediate methylation levels in both human–mouse (Figure 1G) and human–rat (Figure 1H) comparisons. For each annotation, the bimodal distribution was differently balanced between hypermethylation and hypomethylation: while introns and exons showed a comparable bimodal DNA methylation pattern with a higher density of hypermethylation, in promoters, CpG sites were largely unmethylated (Figure 1G & H). For each annotation, the methylation density of orthologous CpG methylation showed significant hypomethylation in humans compared with both mice and rats (all p < 0.001). Methylation patterns of all genes/cytosine sites across the different species comparisons and gene bodies in different CpG island regions confirmed that methylation levels within CpG islands are lower compared with CpG island shores (all p < 0.0001) (Supplementary Figure 2).

    We next aimed to study the distribution of DNA methylation across gene bodies. As the density of exons and introns containing orthologous CpGs decreased within a gene as the ordinal positions of the exons or introns increased, we opted to compare the methylation levels at promoter regions versus the first 20 exons/introns (Supplementary Figure 3). We observed that the methylation levels of promoters were lower than those of the first 20 exons (all p < 0.0001) (Figure 2A). We found that exons/introns at different ordinal positions have different methylation levels, with the first exon’s methylation levels being lower than those of the remaining exons in all species (all p < 0.0001) (Figure 2A). Likewise, the first introns had lower levels of methylation than the other introns (all p < 0.0001) (Figure 2B).

    Figure 2. Methylation patterns of gene components.

    (A) Violin plots of methylation levels (%) of promoters and the first 20 ordinal positions of exons for the human, mouse, rat and mini-pig groups. (B) Scatter plots with marginal histograms of methylation levels (%) of the first 20 ordinal positions of introns for human–mouse and human–rat WGBS data sets and human–mini-pig RRBS data set.

    RRBS: Reduced-representation bisulfite sequencing; WGBS: Whole-genome bisulfite sequencing.

    Differential methylation identifies pathways related to the function of the central nervous system

    We next aimed to identify differential DNA methylation for the genomic regions carrying the orthologous CpG sites characterized above. We focused on regions with methylation differences higher than 20% across all comparisons, because we found that most methylation differences were between 0 and 20% in all species (Supplementary Figure 4). The detailed information regarding the DMPs, DMEs and DMIs is listed in Supplementary Tables 3–7.

    Comparing humans and mice using WGBS and RRBS, we identified, respectively, 558 and 195 DMPs, 1769 and 578 DMEs, and 1602 and 266 DMIs. We found that the numbers of genes in common between DMPs, DMEs and DMIs were 51 and 18 using WGBS and RRBS, respectively (Fisher’s exact test; all p < 0.05) (Figure 3A). Methylation differences within promoters, exons and introns were similar using WGBS and RRBS data sets, which supports that these two techniques are analogous to detect DNA methylation differences (Figure 3B). Accordingly, our results revealed a good relationship in DNA methylation differences from the same DMRs (Pearson’s r = 0.87) between WGBS and RRBS (Figure 3C). To get insight into the functional consequence of species-specific methylation at conserved regions, we analyzed the GO:BP networks of the genes with proximity to the DMRs that we identified. We found that DMRs were concentrated within 20 and ten GO:BP groups based on the WGBS and RRBS data sets, respectively (Figure 3D & E & Supplementary Tables 8 & 9). A substantial number of GO:BP terms relating to the central nervous system were shared between the data sets, including ‘chemical synaptic transmission’ and ‘nervous system development’.

    Figure 3. All differentially methylated regions identified from whole-genome and reduced-representation bisulfite sequencing data sets.

    (A) Venn diagrams of all DMPs, DMEs and DMIs for human–mouse. (B) Box plots of methylation differences (%) for DMPs, DMEs and DMIs for human–mouse. Note: Methylation difference (%) describes the variation in methylation levels between the groups (e.g., human–mouse). The width of the box plots indicates the proportion to the square root of the number of observations. (C) Correlation plot of methylation differences (%) of differently methylated regions (DMRs) among WGBS and RRBS data sets for human–mouse. (D & E) Functionally grouped network of enriched categories using all DMRs for human–mouse using (D) WGBS and (E) RRBS data sets. Gene ontology terms are displayed as nodes, and the node size indicates the term enrichment significance. The most significant terms are shown. (F–J) The same structure for the human–rat data set. (K) Venn diagrams of all DMPs, DMEs and DMIs for the human–mini-pig RRBS data set. (L) Functionally grouped network of enriched categories using all DMRs for the human–mini-pig RRBS data set.

    DME: Differentially methylated exon; DMI: Ddifferentially methylated intron; DMP: Differentially methylated promoter; RRBS: Reduced-representation bisulfite sequencing; WGBS: Whole-genome bisulfite sequencing.

    Comparing humans and rats, we identified 184 and 163 DMPs, 643 and 419 DMEs, and 398 and 175 DMIs using WGBS and RRBS, respectively. We identified 15 and 20 common genes related to DMPs, DMEs and DMIs (Fisher’s exact test; all p < 0.05) (Figure 3F). Similar to the human–mouse comparison, methylation differences within promoters, exons and introns showed a similar distribution using the WGBS and RRBS data sets (Figure 3G). Here also, methylation differences of DMRs identified in the WGBS and RRBS data sets were highly correlated (Pearson’s r = 0.91) (Figure 3H). The biological process network of all identified DMRs mainly consisted of 12 groups based on both WGBS and RRBS data sets (Figure 3I & J & Supplementary Tables 10 & 11). As in the human–rat comparison, many of the biological processes shared were related to the central nervous system, including ‘neurogenesis’ and ‘modulation of chemical synaptic transmission’. Comparing human and mini-pig RRBS data, we identified 609 DMPs, 2223 DMEs and 1460 DMIs, of which 96 were common (Fisher’s exact test; p < 0.05) (Figure 3K). DMRs were enriched in 40 GO:BP groups, several of which were related to signal transduction and central nervous system development (Figure 3L & Supplementary Table 12), similar to human and rodent comparisons. We identified some specific terms including ‘regulation of hair follicle development’, ‘sensory perception of sound’, ‘eye development’ and ‘ear morphogenesis’ (Figure 3L & Supplementary Table 12).

    Biological enrichment of DMP & DME1-related genes in human–mouse comparison

    Our analysis of WGBS data from human and mouse sperm identified the promoters of MIR127 (Q-value: 1.6 × 10-162; methylation difference: - 93%), MBD1 (Q-value: 2.3 × 10-146; methylation difference: - 92%), LRRC30 (Q-value: 8.6 × 10-101; methylation difference: - 97%) and AMIGO3 (Q-value: 1.3 × 10-86; methylation difference: - 89%) as the most differentially methylated between species. The genes AMIGO3 and LRRC30 were common between the most DMPs and first exon (Figure 4A & B). GO enrichment analysis of the promoters harboring lower methylation in humans compared with mice returned terms related to muscle development, RNA processing and the development and function of the nervous system, while the DMPs with lower methylation levels in mice than in humans were largely related to genes controlling muscle function, intracellular signaling pathways and the perception of smell (Figure 4C & Supplementary Table 13). To get further insight into the cell functions possibly affected by distinct epigenetic signatures in different species, we plotted links between the genes nearest to the DMRs common to DMPs and DME1 and their corresponding GO terms. Out of the 171 DMRs common to DMPs and DME1, we noticed an important fraction of DMRs near genes related to signal transduction (35 genes) and linked to significant GO terms like ‘olfactory receptor activity’ (GO: 0004984) and ‘detection of chemical stimulus involved in sensory perception of smell’ (GO: 0050911), while four DMRs were near genes linked to GO terms related to neuronal function (e.g., ‘synapse assembly’ [GO: 0007416]) (Figure 4D). Thus, our analysis supports that regions carrying orthologous CpG sites have distinct epigenetic patterns in human and mouse.

    Figure 4. Differentially methylated promoters and differently methylated first exon identified from the human–mouse whole-genome bisulfite sequencing data set.

    (A & B) Manhattan plot representing the top 20 (A) DMPs and (B) DME_1s for all the chromosomes. The red line represents the significant level of Q-value < 0.05. (C) Sankey plots of significantly enriched terms using DMPs. (D) Circos plot of relationships between DMP/DME_1 common genes and significant gene ontology terms. Note: Methylation difference (%) refers to the average methylation difference in differentially methylated regions common to DMP and DME_1.

    DME_1: Differentially methylated exon 1; DMP: Differentially methylated promoter.

    Mapping of HMRs in sperm

    Taking advantage of the higher resolution provided by WGBS compared with RRBS, we used a hidden Markov model approach to identify HMRs in each species individually. We identified 89,761, 72,917 and 73,185 HMRs with an average size of 2.4, 1.8 and 1.3 kbp for humans, mice and rats, respectively (Supplementary Table 14). Promoters were overlapping more with the HMRs compared with exons and introns, and the density of HMRs decreased toward the 3′ end of the genes (Supplementary Figure 5 & Supplementary Table 15).

    We next aimed to study the cross-species differences in the size of HMRs that overlapped with the identified DMPs in the human–mouse comparison. Our results showed that species have distinct HMR boundaries. HMRs that overlapped with the promoters of genes harboring lower methylation in humans compared with mice covered a larger region in humans than in rodents. Five genes showed methylation gain (loss of HMRs) in rodents (Supplementary Figure 6A). Conversely, the promoters of genes with lower methylation levels in mice than in humans largely corresponded to lost HMRs in humans (i.e., HMR gain in rodents) (Supplementary Figure 6B). Thus, our analysis reveals a general evolutionary extension of the HMR sizes in humans compared with rodents for the genes with lower methylation levels in human promoters than in mouse promoters (Supplementary Figure 6C). HMR gain in rodents occurred more frequently in the genes with lower methylation levels in mice than in humans, and the HMRs were not significantly larger in size for these genes in rodents (Supplementary Figure 6D).

    Four examples of methylation levels and identified HMRs across the AMIGO3, TAS2R41, SHANK2 and OR6S1 genes in humans and their orthologs in rodents are shown in Figure 5. AMIGO3, identified as a common gene between the top 20 DMP and DME1 genes (Figure 4A & B) is a gene well characterized for its role in brain development [55]. AMIGO3 overlapped with an HMR in humans (light blue line) but had a hypermethylated pattern in mice and rats (Figure 5A). We identified TAS2R41, a gene that encodes a bitter taste receptor, as linked to G-protein-coupled receptor (GPCR) activity and as harboring both DMP and DME1 (Figure 4D). In contrast to AMIGO3, TAS2R41 harbors a hypomethylated promoter and exon in mice and rats, and a hypermethylated promoter and exon in humans (Figure 5B). SHANK2, which was found in three GO terms – ‘memory’, ‘synapse assembly’ and ‘long-term synaptic potentiation’ (Figure 4C–D) – contains an HMR at the start of the gene in humans, but hypermethylated regions in mice and rats (Figure 5C). OR6S1, which showed a higher methylation level in humans than in mice and was involved in three significant GO terms – ‘G-protein-coupled receptor activity’, ‘olfactory receptor activity’ and ‘detection of chemical stimulus involved in sensory perception of smell’ (Figure 4C & D) – contains an HMR in mice and rats but is mostly hypermethylated in the same regions in humans (Figure 5D).

    Figure 5. The differential presence of hypomethylated regions in the gene structure between human and rodents in MethBase through the University of California Santa Cruz Genome Browser track hub.

    (A)AMIGO3, a gene well characterized for its role in brain development, overlaps with an HMR in humans but has a hypermethylated pattern in mice and rats. (B)TAS2R41 harbors a hypomethylated promoter and exon in mice and rats, and a hypermethylated promoter and exon in humans. TAS2R41 encodes a bitter taste receptor, as linked to G-protein-coupled receptor activity and as harboring both a differentially methylated promoter and differentially methylated first exon. (C)SHANK2 contains an HMR at the promoter, first exon and part of the first intron in humans, but hypermethylated regions in mice and rats. SHANK2 was found in three gene ontology terms: memory, synapse assembly and long-term synaptic potentiation. (D)OR6S1 has a hypomethylated structure in mice and rats but is mainly hypermethylated in humans. OR6S1 was involved in three significant gene ontology terms: G-protein-coupled receptor activity, olfactory receptor activity and detection of chemical stimulus involved in sensory perception of smell. Blue bars: HMRs. Note: The methylation data of rat were converted to the corresponding locations in the mouse genome using the LiftOver tool.

    HMR: Hypomethylated region.

    Gene expression in the early embryo is moderately associated with DNA methylation in sperm

    Given that methylation of promoters controls gene expression programming [56], we investigated whether the DMPs that we identified in spermatozoa are linked to the expression levels of their nearest genes in early embryos. We studied the expression of genes near human–mouse DMPs in the oocyte and eight-cell stages in humans and in the oocyte and two-cell stages in mice. We used the OrthoFinder program to identify orthologous gene groups in the human and mouse genomes. OrthoFinder returned 16,552 orthologs with one or more genes in mouse and human, of which 14,045 were one-to-one (∼85%). OrthoFinder found only 355 and 270 unique genes in human and mouse, respectively (Supplementary Table 16). Out of 30 genes upregulated by ZGA in humans at the eight-cell stage and with lower methylation levels in humans compared with mice, 19 were similarly upregulated in mice by ZGA at the two-cell stage. Of these genes, eight did not show a significant pattern in mouse by ZGA, and three were significantly downregulated between the oocyte and two-cell stages in mice, showing an expression pattern loaded from the maternal side (Figure 6A & Table 2). Out of 15 genes with lower methylation levels in mice compared with humans, 12 were upregulated in mice by ZGA but did not show a significant pattern in humans, and three were similarly upregulated in humans by ZGA (Figure 6B & Table 2). Promoters of genes with lower methylation levels in mice compared with humans consistently exhibited higher transcription abundance (Figure 6C), thereby validating our approach.

    Figure 6. Comparison of early-stage gene expression profiles in human and mouse.

    (A & B) Expression values for the genes with lower methylation levels (A) in human promoters compared with mouse promoters and (B) in mouse promoters compared with human promoters. (C) Violin plots displaying the expression values distribution of the genes in human and mouse oocyte and zygotic genome activation stages. Note: Gene names are based on human gene symbols, and mouse orthologous counterparts are considered.

    ****p < 0.0001; **p = 0.001–0.01; *p = 0.01–0.05.

    TPM: Transcripts per million; ZGA: Zygotic genome activation.

    Table 2. Integrative analysis between the early upregulated genes in human and mouse and methylation differences in differentially methylated promoters identified for the human–mouse data set based on the whole-genome bisulfite sequencing approach.
    HumanMouseHuman–mouse
    SymbolChromosomeLog2 (TPM) (oocyte)Log2 (TPM) (eight-cell)SymbolChromosomeLog2 (TPM) (oocyte)Log2 (TPM) (two-cell)Methdiff(%)
    APBA3194.766308.26442Apba3107.0920711.5555-21.4
    BEX3X7.0975410.4654Bex3§X6.607056.60705-59.3
    BRD953.995349.37145Brd91310.571513.00183-28.8
    C6orf6269.7427612.8082BC0055371310.771914.57155-87.5
    CALM31911.1580914.8753Calm376.7681913.72426-69.04
    CCAR285.5669211.1335Ccar21410.134411.44240-57.1
    CCDC8033.542735.97639Ccdc80§164.465585.714132-59.6
    CNP176.4289912.2511Cnp§114.471475.721982-89.9
    DDX23125.2490312.3972Ddx23§154.243994.900492-70.1
    DDX4156.3489812.2512Ddx411310.345013.22101-44.6
    DHX916.6273614.8219Dhx917.2652111.33542-33.3
    DUT158.7015614.6849Dut216.505813.75487-90.3
    GEMIN4174.7278910.1160Gemin4§114.666844.666842-35.9
    GPN126.4995014.1979Gpn156.8252312.46800-42.5
    HNF4A204.374528.76420Hnf4a27.638144.296142-91.1
    JMJD8165.856739.61954Jmjd8§175.103425.103422-38.0
    MAP1B53.6022310.1704Map1b1312.61089.759949-27.2
    MBD1184.7695213.2673Mbd1186.9796410.41811-92.8
    MRPL54197.2725011.9715Mrpl54108.5264313.07831-21.4
    NOB1169.7881814.936Nob189.7595315.09489-50.8
    PPP1CA1111.001115.028Ppp1ca1913.479715.13145-60.0
    PXMP4204.734939.82065Pxmp429.9499911.16558-40.9
    RPL10L147.6240215.3050Rpl10l128.2987615.59592-67.5
    STMN3205.651859.06783Stmn326.9177515.51350-21.9
    TMEM92175.6362214.2066Tmem92115.7540413.77150-44.8
    TRNAU1AP17.8937713.84522Trnau1ap§413.358812.23958-38.9
    TSEN54175.470389.94774Tsen54115.2700811.38114-20.8
    TXNL4B1610.850615.37286Txnl4b§88.820598.784712-96.3
    ZMYND82011.231114.14285Zmynd8210.914811.65082-51.5
    ZPR1114.724018.217162Zpr198.3837415.94807-47.1
    CARD6§55.154775.15477Card6159.363911.7449987.8
    CCT21210.549815.57197Cct21012.036415.3669343.0
    CRAT§96.560415.956541Crat25.5020810.7001633.5
    CRYBG2§14.75775.679407Crybg243.60398.61233570.2
    DNPEP26.8500611.47705Dnpep15.0010711.0765128.1
    KLB§44.665255.553977Klb55.805010.4552394.0
    LAT§164.103065.651857Lat75.421309.8738720.6
    MAN2C1§154.061035.344306Man2c194.4512810.7376732.6
    NAALADL1§115.233485.276181Naaladl1195.2508411.7944383.1
    PTPA§97.346898.948670Ptpa27.4088310.8903633.506
    RPL3L§165.787076.598129Rpl3l175.8512312.2072687.5
    S100A1316.1027310.98989S100a1337.0515510.9596194.2
    SLC39A1§15.490266.303590Slc39a136.0397013.3071331.2
    TEF§227.783928.672529Tef156.3480411.6327742.6
    VAMP8§29.3863210.04429Vamp868.8761912.6858444.5

    †This table only lists genes which are overexpressed by the ZGA in at least one species (with false discovery rates <0.05).

    ‡Methylation difference (%) in differentially methylated promoters.

    §No significant pattern was found between the oocyte and ZGA stages.

    ¶Significant downregulation pattern was found between the oocyte and ZGA stages.

    TPM: Transcripts per million; ZGA: Zygotic genome activation.

    Discussion

    Here we performed an epigenomic comparison of DNA methylation using WGBS and RRBS data from spermatozoa of humans, mice and rats and RRBS data from spermatozoa of mini-pigs. We identified that, while a substantial fraction of epigenome methylation marks are conserved across evolution, some orthologous CpG sites are differentially methylated. Strikingly, the pathways of genes differentially methylated between species were related to distinct functions. Our analysis supports an evolutionary selection of species-specific DNA methylation footprint in orthologous genomic sequences that allows speciation of organ function.

    Our study confirmed a bimodal distribution in sperm-cell DNA methylation, consistent with previous reports [23,25]. This bimodal distribution pattern may be responsible for maintaining the basal transcription profile of the preimplantation embryo [57]. Moreover, our study showed that the first exons and first introns had lower levels of methylation than the other exons and introns. This is in accordance with the results of previous findings in pig [58] and fish [59] that reported a lower methylation level on the first exons and introns. Expression levels of the exon gene product are negatively associated with the DNA methylation levels of the first exons [60] and the first introns [59], thus supporting a functional role of sperm DNA methylation on gene expression post-fertilization in the early phase of embryogenesis.

    GO terms of orthologous genomic regions found to be differentially methylated when comparing humans with mice, rats and mini-pigs were largely found near genes associated with the development of the central nervous system and signal transduction. Thus genomic regions under epigenetic evolution may be involved in species-specific phenotypic variation, with humans using this means for adapting cognitive function.

    We found that promoters with lower methylation in humans compared with rodents are near genes related to the development and function of the nervous system. Conversely, promoters with hypomethylation in rodents compared with humans appeared to be enriched in the proximity of genes related to intracellular signaling and the perception of smell. These results are in line with a study showing that genes with human-specific hypomethylated promoters are related to the development of the central nervous system and enriched for genome-wide association study signals of brain-related traits [23]. In that study, genes with cattle-specific hypomethylated promoters were found to be primarily involved in lipid storage and metabolism [23]. Similarly, a study comparing human and chimpanzee sperm methylomes identified that genes linked to human-specific HMRs are linked to neuronal functions [61]. Previously, we discovered that the differences in DNA methylation in spermatozoa from obese individuals after gastric bypass-induced weight loss are located near genes linked to brain function [62]. Genes referenced as participating in brain function and development appear to carry hot spots of epigenetic variation, as endurance exercise training also altered the DNA methylation signature of human spermatozoa at gene hot spots related to brain function and development [20]. Given that DNA methylation at promoter and CpG-rich regions plays a role in gene repression, our results suggest that promoter hypomethylation in sperm is associated with transcriptional activation of genes after fertilization. A function of DNA methylation in spermatozoa on developmental programming and brain function is further supported by the observation that age-related DNA methylation changes in sperm are passed on to the next generation and contribute to the children of older fathers having an elevated incidence of neuropsychiatric illnesses [63,64]. Thus, we speculate that the hypomethylation of genes controlling the central nervous system in humans contributes to the speciation of brain function in humans compared with rodents.

    We identified that genes hypomethylated in rodents are enriched for pathways related to the perception of smell. It was previously posited that genes with promoters that lack HMRs in both human sperm and human embryonic stem cells have a strong enrichment for GPCR and sensory receptors that detect and respond to smell [61]. Odor molecules are detected by olfactory receptors (ORs), which constitute the largest group of GPCR genes. ORs include approximately 350 intact members in humans and approximately 1000 intact members in mice and rats [65]. These genes are largely expressed in sensory neurons found in the main olfactory epithelia of the nasal cavity. The difference in the number of genes between these species appears to be the consequence of both a significant increase in OR genes in the rodent lineage and a large deactivation of OR genes in the human lineage following the primate–rodent divergence [66]. It has been suggested that inactivation of OR genes in higher primates may be the consequence of the development of a full trichromatic vision which aids higher primates in finding food, mates and territory, potentially explaining a decreased need for olfaction [67]. Here we identified TAS2R41, a gene that encodes a bitter taste receptor and which is linked to GPCR activity, which harbors a hypomethylated promoter and exon in mice and rats and a hypermethylated promoter in humans. These findings support that the TAS2R41 gene is selectively silenced in humans, which is in line with previous studies showing a downregulation of taste receptor genes in humans compared with rodents [68,69].

    Our results identified a global trend for size extension in HMRs of humans which overlapped with promoters of the genes with lower methylation levels in humans compared with mice. We found rodent-specific HMR gains for genes with lower methylation levels in mice compared with humans. These observations are consistent with a previous study showing that rodent genomes exhibit a stronger tendency toward the birth of novel HMRs, whereas HMR boundaries of promoters have evolved to be significantly longer in primates than in rodents [25].

    It is fascinating to hypothesize that HMR size extension in human promoters might be associated with the evolution of new regulatory elements to enhance specificity of the transcriptional regulation of the highly specialized genes involved in the development of the nervous system. A particularly intricate regulation of gene expression is required for the highly specialized functions of neurons [70], which may be accomplished, at least partially, by utilizing the adaptability of transcription factor binding sites. A previous theoretical study on evolution of these sites showed that gain of transcription factor binding sites can be facilitated by the availability of longer regulatory sequences in which the sites can evolve simultaneously and by the biophysical cooperation between transcription factors, which can also match the theoretical time-scale computations derived from comparative genomics [71]. Lineage-specific promoter HMR extensions revealed enhanced RNA polymerase II and EP300 binding sites, and enrichment of numerous transcriptional activators [25].

    Our findings that humans and rodents have different types of genes that are hypomethylated supports a mechanism by which orthologous genes are selectively used between species to allow organ specialization. Whether species-specific epigenetic reprogramming of orthologous genes has occurred specifically in response to environmental factors, or may have been selected from stochastic epigenetic variation, remains to be established.

    Previous studies suggest that the epigenomic status of embryonic cells and the early activation of developmental genes are influenced by the methylation of germ-cell DNA [72]. Based on our comparison of the gene expression profiles during early embryo development for genes near DMPs, we identified that there are slightly more gene expression differences between the oocyte and eight-cell stages in human than between the oocyte and two-cell stages in mouse when only considering genes that display lower promoter methylation in human compared with mouse. Additionally, as expected, we confirmed that promoters of genes with a lower methylation level in mouse than in human consistently exhibited more gene expression differences between the oocyte and two-cell stages in mouse than between the oocyte and eight-cell stages in human. Together, interspecies differences in gene expression levels during early embryogenesis are modestly associated with promoter DNA methylation profiles in spermatozoa. It is worth mentioning that because the RNA expression and DNA methylation data sets were from independent studies, and therefore distinct tissues or cells, we were unable to make a paired analysis where we could more directly link changes in methylation to changes in gene expression. On the other hand, by using independent samples, our analysis was less affected by internal experimental biases.

    Here we found that, when overlapping CpGs annotated by both WGBS and RRBS within gene features, the proportion of cytosines annotated to non-promoter regions decreased, whereas the percentage of cytosines annotated to the promoter section of the genome increased. Additionally, the proportion of cytosines annotated among CpG island shores decreased, whereas the percentage of cytosines annotated within CpG islands increased. This discrepancy in mapping using the two sequencing techniques may be attributed to low coverage of CpGs within CpG-poor regions by the RRBS technique due to the restriction enzyme cut sites used in RRBS. Indeed, RRBS mainly targets promoters and CpG islands because the regions sequenced are flanked by the CCGG sequence, a CG-rich site [73]. It should be noted that the RRBS method superficially covers CpG shores and other intergenic regions and that RRBS data sets have a lower average methylation level compared with that of WGBS data sets, as large stretches of repeated elements in noncoding DNA regions are mainly methylated [74]. Given that the WGBS approach can generate a lot of reads in poorly assembled noncoding DNA regions, this could explain why WGBS data have a lower mapping yield than RRBS data [75]. For instance, even though all CpG sites should, in theory, be covered, some of them have low coverage (one to ten counts) or are not sequenced using the WGBS method [73]. Therefore the average read depth of the RRBS method was greater compared with the WGBS method in our study. Our results highlight the relevance of RRBS as a sequencing method for measuring CpG methylation in comparative studies among species, when sequencing costs, read coverage and the availability of enough methylation information are taken into account.

    Conclusion

    In conclusion, our detailed comparative analysis identified a divergence in DNA methylation patterns in spermatozoa from rodents, mini-pigs and humans, although the underlying DNA sequence is highly conserved. Our results revealed that the differences in the methylation footprint between human, rodents and mini-pigs are associated with species-specific phenotypic traits. Divergent epigenetic marks on orthologous sequences suggest a mechanism driving phenotypic plasticity and organ speciation, allowing long-term and stable adaptation to the environment.

    Future perspective

    The growing availability of epigenetic editing techniques may provide a way to test the function of hypomethylated orthologous sequences in sperm on the phenotype of the offspring. Future studies comparing the sperm DNA methylome of additional species will be useful to identify the universal factors driving species adaptation by epigenetic means.

    Summary points
    • Spermatozoa from rodents and humans have a bimodal distribution of DNA methylation levels.

    • DNA methylation is generally lower around promoters and first exons and gradually increases toward the 3′ end of the gene.

    • Human hypomethylated promoters are associated with neurological and brain-related functions, whereas mouse hypomethylated promoters are near genes involved in signal transduction and perception of smell.

    • Gene expression dynamics at different time points of preimplantation stages are only modestly associated with spermatozoal DNA methylation at the nearest promoters.

    • Reduced-representation bisulfite sequencing constitutes an effective method for comparative epigenomic analysis of spermatozoa.

    Supplementary data

    To view the supplementary data that accompany this paper please visit the journal website at: www.futuremedicine.com/doi/suppl/10.2217/epi-2022-0168

    Author contributions

    R Barrès conceived and designed the study. F Moharrek performed all the bioinformatics and statistical analyses with contributions from C Workman, L Ingerslev, A Altıntaş and L Lundell. A Hansen and L Small contributed to the material preparation. F Moharrek and R Barrès wrote the manuscript. All authors have reviewed and approved the final version of the manuscript.

    Acknowledgments

    The authors acknowledge the Single-Cell Omics platform at the Centre for Basic Metabolic Research (CBMR) for the technical expertise and support, and also thank X Wang for his constructive feedback on statistical matters.

    Financial & competing interests disclosure

    This research was funded by a Challenge Programme Grant from the Novo Nordisk Foundation under grant agreement NNF18OC0033754. The Novo Nordisk Foundation Center for Basic Metabolic Research is an independent research center at the University of Copenhagen, partially funded by an unrestricted donation from the Novo Nordisk Foundation (NNF18CC0034900). The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.

    No writing assistance was utilized in the production of this manuscript.

    Ethical conduct of research

    For human WGBS data, samples were obtained from men attending the University of Utah Andrology laboratory consented for research (IRB approved protocol # 7790). For mouse WGBS data, the study was done under an approved Institutional Animal Care and Use Committee protocol at University of Utah School of Medicine, USA. For rat WGBS data, part of the study was approved by the Zoos scientific review committee and also by the Institutional Animal Care and Use Committee at University of Southern California, USA. For human RRBS data, the study was approved by the Ethics Committee from the Capital Region of Denmark (reference H-1-2013-064) and informed consent was obtained from all participants. For mouse RRBS data, all procedures followed the guidelines of the National Institutes of Health Guide for the Care and Use of Laboratory Animals and the approval for the study was received from the Institutional Animal Care and Use Committee at University of Massachusetts, Amherst, USA. For rat RRBS data, the study was performed following the National Institutes of Health guidelines for the care and use of laboratory animals and approved by the Institutional Animal Care and Use Committee at The Lundquist Institute for Biomedical Innovation at Harbor–UCLA Medical Center, CA, USA. For mini-pig data, all animal procedures were performed following the National Institutes of Health guidelines for the care and use of laboratory animals at Bionea Lab and approved by the French Ministry of Education and Research (reference APAFIS#22853-201911212455784v1).

    Open access

    This work is licensed under theAttribution-NonCommercial-NoDerivatives 4.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/

    Papers of special note have been highlighted as: • of interest; •• of considerable interest

    References

    • 1. Chan SW-L, Henderson IR, Jacobsen SE. Gardening the genome: DNA methylation in Arabidopsis thaliana. Nat. Rev. Genet. 6(5), 351–360 (2005).
    • 2. Law JA, Jacobsen SE. Establishing, maintaining and modifying DNA methylation patterns in plants and animals. Nat. Rev. Genet. 11(3), 204–220 (2010).
    • 3. Deaton AM, Bird A. CpG islands and the regulation of transcription. Genes Dev. 25(10), 1010–1022 (2011).
    • 4. Li E, Beard C, Jaenisch R. Role for DNA methylation in genomic imprinting. Nature 366(6453), 362–365 (1993).
    • 5. Boumil RM, Lee JT. Forty years of decoding the silence in X-chromosome inactivation. Hum. Mol. Genet. 10(20), 2225–2232 (2001).
    • 6. Smith SS, Crocitto L. DNA methylation in eukaryotic chromosome stability revisited: DNA methyltransferase in the management of DNA conformation space. Mol. Carcinog. 26(1), 1–9 (1999).
    • 7. Slotkin RK, Martienssen R. Transposable elements and the epigenetic regulation of the genome. Nat. Rev. Genet. 8(4), 272–285 (2007).
    • 8. Zemach A, McDaniel IE, Silva P, Zilberman D. Genome-wide evolutionary analysis of eukaryotic DNA methylation. Science 328(5980), 916–919 (2010).
    • 9. Voisin S, Jacques M, Landen S et al. Meta-analysis of genome-wide DNA methylation and integrative omics of age in human skeletal muscle. J. Cachexia Sarcopenia Muscle 12(4), 1064–1078 (2021).
    • 10. Gonzalo S. Epigenetic alterations in aging. J. Appl. Physiol. 109(2), 586–597 (2010).
    • 11. Stewart KR, Veselovska L, Kelsey G. Establishment and functions of DNA methylation in the germline. Epigenomics 8(10), 1399–1413 (2016).
    • 12. Wang Y-P, Lei QY. Metabolic recoding of epigenetics in cancer. Cancer Commun. 38(1), 25 (2018).
    • 13. Gluck C, Qiu C, Han SY et al. Kidney cytosine methylation changes improve renal function decline estimation in patients with diabetic kidney disease. Nat. Commun. 10(1), 2461 (2019).
    • 14. Simar D, Versteyhe S, Donkin I et al. DNA methylation is altered in B and NK lymphocytes in obese and type 2 diabetic human. Metabolism 63(9), 1188–1197 (2014).
    • 15. de Castro Barbosa T, Ingerslev LR, Alm PS et al. High-fat diet reprograms the epigenome of rat spermatozoa and transgenerationally affects metabolism of the offspring. Mol. Metab. 5(3), 184–197 (2016).
    • 16. Dean W, Santos F, Stojkovic M et al. Conservation of methylation reprogramming in mammalian development: aberrant reprogramming in cloned embryos. Proc. Natl Acad. Sci. USA 98(24), 13734–13738 (2001).
    • 17. Fulka H, Mrazek M, Tepla O, Fulka J. DNA methylation pattern in human zygotes and developing embryos. Reproduction 128(6), 703–708 (2004).
    • 18. Steele W, Allegrucci C, Singh R et al. Human embryonic stem cell methyl cycle enzyme expression: modelling epigenetic programming in assisted reproduction? Reprod. Biomed. Online 10(6), 755–766 (2005).
    • 19. Marques CJ, Francisco T, Sousa S, Carvalho F, Barros A, Sousa M. Methylation defects of imprinted genes in human testicular spermatozoa. Fertil. Steril. 94(2), 585–594 (2010).
    • 20. Ingerslev LR, Donkin I, Fabre O et al. Endurance training remodels sperm-borne small RNA expression and methylation at neurological gene hotspots. Clin. Epigenetics 10(1), 12 (2018). • Shows that genes controlling brain development carry hot spots of epigenetic variation in sperm after lifestyle intervention.
    • 21. Carone BR, Fauquier L, Habib N et al. Paternally induced transgenerational environmental reprogramming of metabolic gene expression in mammals. Cell 143(7), 1084–1096 (2010).
    • 22. Liu Y, Chen S, Pang D et al. Effects of paternal exposure to cigarette smoke on sperm DNA methylation and long-term metabolic syndrome in offspring. Epigenetics Chromatin 15(1), 3 (2022).
    • 23. Fang L, Zhou Y, Liu S et al. Comparative analyses of sperm DNA methylomes among human, mouse and cattle provide insights into epigenomic evolution and complex traits. Epigenetics 14(3), 260–276 (2019). •• Compared methylation profiles of sperm and large-scale genome-wide association study signals between human and cattle.
    • 24. Zhou J, Sears RL, Xing X et al. Tissue-specific DNA methylation is conserved across human, mouse, and rat, and driven by primary sequence conservation. BMC Genet. 18(1), 724 (2017).
    • 25. Qu J, Hodges E, Molaro A et al. Evolutionary expansion of DNA hypomethylation in the mammalian germline genome. Genome Res. 28(2), 145–158 (2018). •• Describes new insights into the evolutionary rate of the DNA methylation in sperm of seven vertebrate species.
    • 26. Hammoud SS, Low DH, Yi C, Carrell DT, Guccione E, Cairns BR. Chromatin and transcription transitions of mammalian adult germline stem cells and spermatogenesis. Cell Stem Cell 15(2), 239–253 (2014).
    • 27. Oluwayiose OA, Marcho C, Wu H et al. Paternal preconception phthalate exposure alters sperm methylome and embryonic programming. Environ. Int. 155, 106693 (2021).
    • 28. Altıntaş A, Liu J, Fabre O et al. Perinatal exposure to nicotine alters spermatozoal DNA methylation near genes controlling nicotine action. FASEB J. 35(7), e21702 (2021).
    • 29. Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32(19), 3047–3048 (2016).
    • 30. Krueger F, Andrews SR. Bismark: a flexible aligner and methylation caller for bisulfite-seq applications. Bioinformatics 27(11), 1571–1572 (2011).
    • 31. Bock C. Analysing and interpreting DNA methylation data. Nat. Rev. Genet. 13(10), 705–719 (2012).
    • 32. Sherry ST. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29(1), 308–311 (2001).
    • 33. Navarro Gonzalez J, Zweig AS, Speir ML et al. The UCSC genome browser database: 2021 update. Nucleic Acids Res. 49(D1), D1046–D1057 (2021).
    • 34. Blake LE, Roux J, Hernando-Herraez I et al. A comparison of gene expression and DNA methylation patterns across tissues and species. Genome Res. 30(2), 250–262 (2020). •• Similarities and differences in gene expression and DNA methylation levels between multiple tissues in primates, with a comparative analysis pipeline that reduces biases due to sequence divergence.
    • 35. Akalin A, Franke V, Vlahoviček K, Mason SE, Schübeler D. Genomation: a toolkit to summarize, annotate and visualize genomic intervals. Bioinformatics 31(7), 1127–1129 (2015).
    • 36. Chen H, Boutros P. VennDiagram: a package for the generation of highly customizable Venn and Euler diagrams in R. BMC Bioinform. 12(1), 35 (2011).
    • 37. Bowman AW, Azzalini A. ‘sm’: nonparametric smoothing methods. R package version 2.2-5.7 (2018). https://cran.r-project.org/web/packages/sm/sm.pdf
    • 38. Wang X, Hao D, Kadarmideen HN. GeneDMRs: an R package for gene-based differentially methylated regions analysis. J. Comput. Biol. 28(3), 304–316 (2021).
    • 39. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Methodol. 57(1), 289–300 (1995).
    • 40. Bindea G, Mlecnik B, Hackl H et al. ClueGO: a Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks. Bioinformatics 25(8), 1091–1093 (2009).
    • 41. Sherman BT, Hao M, Qiu J et al. DAVID: a web server for functional enrichment analysis and functional annotation of gene lists (2021 update). Nucleic Acids Res. 50(W1), W216–W221 (2022).
    • 42. Allaire JJ, Gandrud C, Russell K, Yetman CJ. Package networkD3. D3 JavaScript network graphs from R. R package version 0.4 (2017). https://cran.r-project.org/web/packages/networkD3/networkD3.pdf
    • 43. Walter W, Sánchez-Cabo F, Ricote M. GOplot: an R package for visually combining expression data with functional analysis. Bioinformatics 31(17), 2912–2914 (2015).
    • 44. Fang F, Hodges E, Molaro A, Dean M, Hannon GJ, Smith AD. Genomic landscape of human allele-specific DNA methylation. Proc. Natl Acad. Sci. USA 109(19), 7332–7337 (2012).
    • 45. Song Q, Decato B, Hong EE et al. A reference methylome database and analysis pipeline to facilitate integrative and comparative epigenomics. PLOS ONE 8(12), e81148 (2013). • Presents MethPipe, a pipeline for methylome analysis, and MethBase, a companion database of publicly available annotated methylomes.
    • 46. Bouniol C, Nguyen E, Debey P. Endogenous transcription occurs at the 1-cell stage in the mouse embryo. Exp. Cell Res. 218(1), 57–62 (1995).
    • 47. Madissoon E, Tohonen V, Vesterlund L et al. Differences in gene expression between mouse and human for dynamically regulated genes in early embryo. PLOS ONE 9(8), e102949 (2014). • Highlights substantial differences in gene expression patterns throughout the preimplantation development of mice and humans.
    • 48. Braude P, Bolton V, Moore S. Human gene expression first occurs between the four- and eight-cell stages of preimplantation development. Nature 332(6163), 459–461 (1988).
    • 49. Zhang P, Zucchelli M, Bruce S et al. Transcriptome profiling of human pre-implantation development. PLOS ONE 4(11), e7844 (2009).
    • 50. Xue Z, Huang K, Cai C et al. Genetic programs in human and mouse early embryos revealed by single-cell RNA sequencing. Nature 500(7464), 593–597 (2013).
    • 51. Dobin A, Davis CA, Schlesinger F et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29(1), 15–21 (2013).
    • 52. Liao Y, Smyth GK, Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30(7), 923–930 (2014).
    • 53. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26(1), 139–140 (2010).
    • 54. Emms DM, Kelly S. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol. 16(1), 157 (2015).
    • 55. Kuja-Panula J, KiiltomäKi M, Yamashiro T, Rouhiainen A, Rauvala H. AMIGO, a transmembrane protein implicated in axon tract development, defines a novel protein family with leucine-rich repeats. J. Cell Biol. 160(6), 963–973 (2003).
    • 56. Levine M, Tjian R. Transcription regulation and animal diversity. Nature 424(6945), 147–151 (2003).
    • 57. Cedar H, Bergman Y. Programming of DNA methylation patterns. Annu. Rev. Biochem. 81(1), 97–117 (2012).
    • 58. Wang X, Kadarmideen HN. Characterization of global DNA methylation in different gene regions reveals candidate biomarkers in pigs with high and low levels of boar taint. Vet. Sci. 7(2), 77 (2020).
    • 59. Anastasiadi D, Esteve-Codina A, Piferrer F. Consistent inverse correlation between DNA methylation of the first intron and gene expression across tissues and species. Epigenetics Chromatin 11(1), 37 (2018).
    • 60. Chuang T-J, Chen F-C, Chen Y-Z. Position-dependent correlations between DNA methylation and the evolutionary rates of mammalian coding exons. Proc. Natl Acad. Sci. USA 109(39), 15841–15846 (2012).
    • 61. Molaro A, Hodges E, Fang F et al. Sperm methylation profiles reveal features of epigenetic inheritance and evolution in primates. Cell 146(6), 1029–1041 (2011).
    • 62. Donkin I, Versteyhe S, Ingerslev LR et al. Obesity and bariatric surgery drive epigenetic variation of spermatozoa in humans. Cell Metab. 23(2), 369–378 (2016).
    • 63. Atsem S, Reichenbach J, Potabattula R et al. Paternal age effects on sperm FOXK1 and KCNA7 methylation and transmission into the next generation. Hum. Mol. Genet. 25(22), 4996–5005 (2016).
    • 64. Jenkins TG, Aston KI, Pflueger C, Cairns BR, Carrell DT. Age-associated sperm DNA methylation alterations: possible implications in offspring disease susceptibility. PLoS Genet. 10(7), e1004458 (2014).
    • 65. Malnic B, Godfrey PA, Buck LB. The human olfactory receptor gene family. Proc. Natl Acad. Sci. USA 101(8), 2584–2589 (2004).
    • 66. Young JM, Trask BJ. The sense of smell: Genomics of vertebrate odorant receptors. Hum. Mol. Genet. 11(10), 1153–1160 (2002).
    • 67. Gilad Y, Wiebe V, Przeworski M, Lancet D, Pääbo S. Loss of olfactory receptor genes coincides with the acquisition of full trichromatic vision in primates. PLOS Biol. 2(1), e5 (2004).
    • 68. Wang X, Thomas SD, Zhang J. Relaxation of selective constraint and loss of function in the evolution of human bitter taste receptor genes. Hum. Mol. Genet. 13(21), 2671–2678 (2004).
    • 69. Go Y, Satta Y, Takenaka O, Takahata N. Lineage-specific loss of function of bitter taste receptor genes in humans and nonhuman primates. Genetics 170(1), 313–326 (2005).
    • 70. Naro C, Cesari E, Sette C. Splicing regulation in brain and testis: common themes for highly specialized organs. Cell Cycle 20(5), 480–489 (2021).
    • 71. Tuğrul M, Paixão T, Barton NH, Tkačik G. Dynamics of transcription factor binding site evolution. PLOS Genet. 11(11), e1005639 (2015).
    • 72. Smith ZD, Chan MM, Mikkelsen TS et al. A unique regulatory phase of DNA methylation in the early mammalian embryo. Nature 484(7394), 339–344 (2012).
    • 73. Sun Z, Cunningham J, Slager S, Kocher J-P. Base resolution methylome profiling: considerations in platform selection, data preprocessing and analysis. Epigenomics 7(5), 813–828 (2015).
    • 74. Bird A. DNA methylation patterns and epigenetic memory. Genes Dev. 16(1), 6–21 (2002).
    • 75. Doherty R, Couldrey C. Exploring genome wide bisulfite sequencing for DNA methylation analysis in livestock: a technical assessment. Front. Genet. 5(8), 126 (2014).