Leveraging the urban–rural divide for epigenetic research

Héléne T Cronjé*,1, Hannah R Elliott2,3, Cornelie Nienaber-Rousseau1 & Marlien Pieters1
1Centre of Excellence for Nutrition, North-West University, Potchefstroom Campus, Potchefstroom, 2520, North-West Province, South Africa
2MRC Integrative Epidemiology Unit, University of Bristol, Bristol BS8 2BN, UK
3Population Health Sciences, Bristol Medical School, University of Bristol, Bristol BS8 2BN, UK
*Author for correspondence: toinet.cronje@sund.ku.dk


most studied epigenetic modification and has been implicated in the etiology and progression of several NCDs over and above genetic predisposition [9]. Methylation differences can be: biomarkers of exposure that do not affect disease, part of the causal pathway between exposures and disease or a biomarker of current disease [9]. Apart from its potential role as a disease mediator, the plasticity of DNAm also makes it a valuable topic of investigation owing to its intervention potential in preventative care [10]. To date, most methylation studies have investigated only single exposures or disease outcomes. This does not take into account that an individual, in any given environment, experiences a combination of exposures simultaneously such as those clustered together in rural and urban landscapes. In this review, we evaluate the ability of three epidemiological study designs to investigate the role of DNAm in the association between urbanization and the NCDs. Investigating urbanization, as a well-defined cluster of exposures, could allow for a better understanding of methylation's role in the global urbanization-NCD trend. First, we review the current evidence for the urbanization-NCD, urbanization-DNAm and DNAm-NCD relationships as the theoretical backdrop to the research models discussed. We then discuss and compare the migration, income-comparative and urban-rural study designs in terms of the questions they are best suited to answer, suitable cohorts and their respective strengths and weaknesses. Figure 1 provides an illustrated summary of this review.

Contextualizing methylation
Genetics and environmental exposures both influence phenotype, partly through epigenetic modifications. Genetic architecture affects DNAm in terms of the availability of cytosine-phosphate-guanine (CpG) sites, the efficacy of methylation-related enzyme expression [8,11], how DNAm responds to exposure [12] and inherent NCD risk [2]. Environmental exposures can be either external (behavioral or lifestyle-related factors such as diet, exercise and air purity) or internal (metabolic and biochemical factors such as inflammation and adiposity). The external environment's role in methylomic variance is best observed in monozygotic twins that start their lives methylomically indistinguishable (genetically determined), but grow ever more discordant throughout the life course [13]. Additionally, it has been observed that unrelated spouse-pairs that share an environment have more similar methylation profiles than those who live apart [14]. Methylomic variance attributed to the internal environment, on the other hand, is most prominent in the presence of disease, as a result of disease-related metabolic and biochemical changes [15,16]. Three key aspects of the exposure-methylation-disease framework will be discussed in the sections to follow. These are also depicted in Figure 1.

Urbanization is associated with NCD risk
The association between urbanization and NCDs is predominantly driven by characteristics of the urban environment and behavioral factors. Urbanization-associated environmental factors known to increase NCD risk include increased exposure to pollution and occupational toxins [3,4]. Urbanization-associated behavioral factors contributing to increased NCD risk include increased food availability [17], decreased nonrecreational physical activity [18], a higher proportion of energy intake from fat [19], protein [20] and processed foods [21], a reduction in relative energy from carbohydrates [19] and increased adiposity [21]. While smoking and alcohol consumption are known contributors to NCD risk, their relationship with urbanization is more complex [3]. Urbanization coincides with increased purchase of commercial tobacco products and exposure to tobacco-encouraging advertising, while second-hand smoke inhalation tends to be reduced [20,22,23]. Although urban individuals are less likely to be subject to alcohol abuse, they are more likely to consume alcohol during their lifetime [20,24]. Urbanization is also associated with increased psychosocial stress as a result of social inequity and exclusion, job insecurity and growing concerns of violence and crime [25]. These are particularly prevalent when urbanization coincides with the growth of informal settlements within the urban landscapes [5]. Access to education, on the other hand, decreases NCD risk [26].

Urbanization is associated with DNAm
The same types of exposure related to NCD risk, described above, have also been independently associated with altered DNAm. In this context, mostly noncausal associations between exposures and DNAm have been investigated, although evidence for causal associations is accumulating (Figure 1). Evidence that DNAm is causally affected by urbanization-related exposures (i.e., the exposure alters DNAm, not vice versa) has been published for BMI [27] and smoking [28]. Genome-wide [29] and gene-specific [27] investigations have reported adiposity-related methylation changes. These methylation signatures have been used successfully to predict the efficacy of weight-loss interventions [30]. Smoking status, cumulative smoking exposure and smoking intensity can also be determined using DNAm as a biomarker [31,32]. Smoking-associated differential DNAm is only partially reversible upon cessation [31,32]. Similarly, heavy drinking can be identified using a methylation-based biomarker [33]. Methylation differences associated with alcohol consumption seem to be completely reversible, indicating possible causality [33].
Regarding noncausal associations, dietary patterns, such as high-fat [34] and western diets [35], have been associated with methylation differences in genes involved in lipid metabolism, adipogenesis, inflammation and glucose regulation. Physical activity-based intervention studies reported beneficial methylation and transcription changes in genes related to longevity, inflammation and metabolism in blood, skeletal muscle and adipose tissue [36-38]. Methylation levels at specific CpGs have also been incorporated into methylation-derived biological age predictors (DNAmAge) [39,40]. The discrepancy between DNAmAge and chronological age is referred to as biological/methylation age acceleration (DNAmAgeAccel) and, when positive, is used as a biomarker of accelerated cellular aging [39,40]. Urbanicity-related factors such as adiposity [41], meat consumption [41] and cigarette smoking [42] are associated with DNAmAgeAccel. Alcohol associates with DNAmAge in a nonlinear manner, where light and heavy drinkers experience DNAmAgeAccel and moderate alcohol consumers have a relative deceleration of DNAmAge [43,44]. Education, aligned with its negative association with NCD risk [26], also protects against DNAmAgeAccel [42,43].
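The age-acceleration measure described above is simple arithmetic, and a minimal sketch may make the sign convention concrete. The example assumes the plain difference definition given in the text (some studies instead use the residual from regressing DNAmAge on chronological age, which is not shown here). The `Participant` class, its field names and all values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical record; field names and values are illustrative only."""
    participant_id: str
    chronological_age: float  # years
    dnam_age: float           # years, as estimated by a methylation-based predictor


def age_acceleration(p: Participant) -> float:
    """DNAmAgeAccel as the simple difference DNAmAge - chronological age."""
    return p.dnam_age - p.chronological_age


def is_accelerated(p: Participant) -> bool:
    """A positive difference is read as accelerated cellular aging."""
    return age_acceleration(p) > 0


# Invented example values, not taken from any cohort.
cohort = [
    Participant("A", chronological_age=50.0, dnam_age=54.5),
    Participant("B", chronological_age=62.0, dnam_age=59.0),
]
accel = {p.participant_id: age_acceleration(p) for p in cohort}
flags = {p.participant_id: is_accelerated(p) for p in cohort}
```

Under this definition, participant A is epigenetically older than their chronological age and participant B is younger.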
In terms of the urban environment, a large body of literature has reported on the genome-wide, global and gene-specific DNAm associations with exposure to general pollution and distinct pollutants [45]. Increased exposure to traffic-related air pollution, for example, has been associated with altered methylation at the TET1 gene, which encodes a key enzyme involved in DNA demethylation [46]. A dose-response association between traffic-related pollution and DNAm changes has also been observed [47]. Accelerated DNAmAge has also been observed in groups exposed to pollution and pesticides [48]. From a social environment point of view, neighborhood unity, aesthetics and safety have been associated with favorable DNAm and downstream expression changes, particularly in stress- and inflammation-related genes [49,50]. Such neighborhood characteristics also enable outdoor recreational physical activity that in itself has proven beneficial [5], although these characteristics are likely to be largely present in urban-dwellers of high socioeconomic status [5,51].

DNAm is associated with NCDs
Associations between DNAm and NCDs have been reported in both directions (Figure 1), as exposure-related differential DNAm may precede disease (DNAm being on the causal pathway between exposure and disease), and, conversely, disease-related metabolic changes can affect DNAm (DNAm as a noncausal biomarker of disease). Investigations into the potential causal influence of DNAm on Type II diabetes [52] and CVD development [53] are increasing, although conclusive evidence is yet to be published. As noncausal biomarkers, both DNAm and DNAmAge have been used to identify several subtypes of cancer [54,55] and CVD [56,57]. Tumorous tissues are epigenetically older than their noncancerous counterparts [58]. As a prognostic marker, DNAm, particularly DNAmAge, has been useful in predicting cancer incidence [59,60] and survival [61], cardiac events [62], premature CVD [40,63] and all-cause mortality [60] independent of traditional risk factors. Last, as an intervention strategy, methylation-altering drugs are proving to be increasingly successful in the treatment of CVD [64], cancer [65,66] and Type II diabetes [67].

The missing link
Collectively, the evidence summarized in the previous sections highlights the potential role of DNAm as a mediator between urbanization and NCDs. The only robust evidence that has been able to link the environment to DNAm, and then DNAm to disease concerns the relationship between smoking and bladder cancer in postmenopausal women [68]. Preliminary findings have, however, associated BMI-related changes in DNAm with cardio-metabolic disease development [29]. In addition, a randomized controlled trial has provided some evidence of DNAm mediating the association between exposure to air pollution and adverse cardiovascular profiles [69]. The main gaps remaining in the literature, and the best way to address them, are the topic of interest of this review. Key questions include: Have we identified all the key risk factors in the urbanicity-NCD relationship? How does the research currently address the amalgamated risk posed by the entirety of the urban versus rural context? How can we investigate and understand the role of DNAm in this lifestyle-disease model better? Thus far, the role of DNAm in NCDs has been investigated typically by focusing on one form of exposure at a time. Investigating DNAm in the context of urbanization provides the opportunity to aggregate NCD-related exposures to provide not only a more accurate reflection of the amalgamated disease risk associated with urbanicity but also to start identifying currently unknown contributing factors that explain the variance in risk after accounting for all the known single forms of exposure. Such an investigation could also provide insight regarding the extent of potential additive risk compared with a washout effect when numerous methylation-altering exposures are clustered together. By identifying DNAm mediators involved in the relationship between urbanization and NCD, we might find modifiable targets for improving population health. 
The sections to follow evaluate the ability, strengths and weaknesses of different approaches to best answer key questions and elaborate on our current understanding of the role of DNAm in urbanization-related population health (Figure 1).
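The mediator hypothesis outlined in this section (urbanization → DNAm → NCD) is often formalized with a product-of-coefficients decomposition. The sketch below is one illustrative way to do this and is not a method prescribed by this review. It assumes a binary exposure (urban = 1, rural = 0), a single CpG methylation value as mediator and a continuous risk marker as outcome, all fitted by ordinary least squares. The function names and toy values are hypothetical, and a cross-sectional decomposition of this kind cannot by itself establish causality.

```python
from typing import Dict, Sequence


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def _cov(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Sample covariance (sample variance when xs is ys)."""
    mx, my = _mean(xs), _mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def mediation(x: Sequence[float], m: Sequence[float], y: Sequence[float]) -> Dict[str, float]:
    """Product-of-coefficients mediation with ordinary least squares.

    a        : slope of mediator ~ exposure
    b        : slope of the mediator in outcome ~ exposure + mediator
    direct   : slope of the exposure in outcome ~ exposure + mediator
    indirect : a * b
    total    : slope of outcome ~ exposure (equals direct + indirect for OLS)
    """
    sxx, smm = _cov(x, x), _cov(m, m)
    sxm, sxy, smy = _cov(x, m), _cov(x, y), _cov(m, y)
    det = sxx * smm - sxm ** 2  # zero if mediator and exposure are collinear
    a = sxm / sxx
    b = (smy * sxx - sxy * sxm) / det
    direct = (sxy * smm - smy * sxm) / det
    return {"a": a, "b": b, "direct": direct, "indirect": a * b, "total": sxy / sxx}


# Invented toy data: four rural (0) and four urban (1) participants.
urban = [0, 0, 0, 0, 1, 1, 1, 1]
cpg_methylation = [0.40, 0.42, 0.38, 0.44, 0.50, 0.55, 0.52, 0.47]
systolic_bp = [120, 123, 118, 125, 131, 136, 133, 128]

result = mediation(urban, cpg_methylation, systolic_bp)
```

In practice, confounder adjustment, confidence intervals (for example by bootstrapping) and sensitivity analyses would be needed before any interpretation of the indirect effect.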

Contextualizing urbanization
Urbanization can be driven by the net movement of individuals from rural to urban residency in search of, among others, better education or healthcare, economic success, safety or food security. In this context, urbanization can be the result of individuals moving from a rural to an urban community within their own countries or to another country/culture entirely (migration). Alternatively, urbanization can occur as a specific region progressively urbanizes. There are three main epidemiological approaches that can be used to investigate the health-related consequences of urbanization: the migration, income-comparative and urban-rural divide approaches.

Migration models
Migration studies are able to investigate the effects of environmental shifts in two ways. The first studies groups of similar ancestral and geographical origin, living in different countries, such as those who remained in the home country compared with those who moved to different locations [70]. These studies are useful in that they allow investigation of an altered environment while controlling for early-life exposure and ethnicity. A study of Japanese migrants, for example, reported a dramatic increase in CVD risk in the migrant compared with the nonmigrant group, providing evidence that the environmental shift increased CVD risk in this population [71]. A second approach is the comparison of the migrant population with their host. In these cases, ethnic inequalities in NCD risk and outcomes can be assessed, for example, the major differences in the rate of specific CVD incidence and mortality between migrants from different countries and the same HIC host population [70].
The complexity of using migration models to investigate the association of urbanization with health stems from the numerous migration-associated confounders that are not part of the rural-urban shift. First, the migrant group themselves are not necessarily reflective of the group they originated from in that migrants are often better resourced to enable their migration than those staying behind [72]. Second, the circumstances of migration complicate the separation of the health effects related to the rural-urban shift, through the stress and impact of the migration itself [72,73]. Third, migrants often experience a vast shift in culture and climate. In addition, cultural differences in health-seeking behavior may lead to a lack of timely disease diagnosis or noncompliance in treatment, particularly in new migrants [74]. Last, migration timing may profoundly influence the relative severity of the effect of urbanization on health outcomes. According to the Developmental Origins of Health and Disease hypothesis, individuals born in an environment with limited nutritional resources (often rural settlements) are prenatally programmed to survive in these conditions. If they are then subsequently exposed to nutritional abundance, they are not metabolically equipped to manage this affluence and are predisposed to develop NCDs [75,76].

Income-comparative models
An alternative model that can be used to investigate urbanization is a global income-comparative research model where groups of differing demographic backgrounds from across the world are compared with respect to disease prevalence. The largest investigation currently implementing such a study design is the international Prospective Urban and Rural Epidemiology (PURE) study. The study includes 225,000 individuals residing in 27 low-, middle- and high-income countries [77]. The PURE study has provided many insights on global NCD risk progression and contributors. Meta-analyses by the PURE cohort include topics such as carbohydrate and fat intake [78], fruit, vegetable and legume intake [79], dietary nutrients [78], education [26], physical activity [80] and alcohol consumption [81] in relation to CVD and its related health outcomes such as blood lipid concentrations and blood pressure. The primary focus of these meta-analyses is a better understanding of the relationship between country-income classification, subsequent exposure and the influence of this on CVD incidence and mortality.
Although these investigations provide vital information on the extent of the NCD crisis, particularly in LMICs, this approach is limited in a few ways. First, it is often unable to account for group-specific genetic risk when comparing and combining multiple ethnicities. It is widely known that the risk models, ranges and cutoffs created for one population are not always indicative of the same risk variance in other populations [82]. For this reason, the WHO is continually attempting to recalibrate NCD risk models currently used in HICs for use in LMIC population groups [83]. Numerous genome-wide association analyses have indicated ethnic differences in genotype frequencies and in their associations with intermediate phenotypes and ultimate risk. These findings suggest that, although phenotypic risk assessment might be the most feasible, the gap in risk variability might be fully addressed only when genetic contributors are also considered [2]. In the epigenetic context, methylation differences have also been reported among ethnic groups [11].
Second, as with migrant studies, differing geographical locations also introduce confounding by climate and diet [21,84]. Cross-cultural adaptation of data collection methods is critical in these cases, as reference material developed for one population might leave many factors unstudied in the population to which it is applied, purely because of their absence in the reference group [85]. Many developing nations remain severely under-represented in genetic and epigenetic research, suggesting that the drivers of observed DNAm associations might not have been identified or studied previously, resulting in potentially unquantifiable confounding when comparing these population groups [86].
Last, although socioeconomic status is associated with urbanicity, national economic status (such as the World Bank status used in most income-comparative models) does not reflect urbanicity. Developing nations, for example, are often LMICs, but generally have urban capitals, informal urban and rural settlements and rural agricultural landscapes [87].

Within-country rural-urban models
A third approach that can be used to investigate the process of progressive urbanization is to consider those who do not undergo urbanization during their lifetime, but are, instead, subject to the urban-rural divide still common in developing countries [51]. This research design can be used under the condition of having a cohort that represents communities of a single genetic origin, part of which resides in an urban, and the other in a rural area, and does so for its entire lifespan. These individuals should have been born, and remained, on either the rural or urban side of the sociodemographic divide throughout their lifetime. A cohort of this nature will allow for the investigation of discrepant environmental exposure and health outcomes while limiting many of the confounding factors discussed in the previous sections. Two examples of large-scale studies that can leverage this approach are the PURE [77] and the Research on Obesity and Diabetes among African Migrants (RODAM [88]) cohorts. Although spanning continents, all the countries participating in the PURE study contribute a variety of both rural and urban subcohorts [87]. The RODAM study, on the other hand, includes a rural and an urban site in Ghana (Africa), in addition to the Ghanaian migrants residing in Europe [88].
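The eligibility condition described above (born on, and never having left, one side of the urban-rural divide) can be expressed as a simple filter over residence histories. The following is a minimal sketch under assumed data: the `ResidenceRecord` class, its fields and the example participants are hypothetical and do not reflect how PURE or RODAM actually record residence.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResidenceRecord:
    """Hypothetical residence history; the first entry is the place of birth."""
    participant_id: str
    settings_by_period: List[str]  # each entry is "urban" or "rural"


def lifelong_setting(record: ResidenceRecord) -> Optional[str]:
    """Return "urban" or "rural" for lifelong residents, otherwise None.

    Participants who ever crossed the divide, or whose history is empty or
    contains an unrecognized label, are treated as ineligible.
    """
    unique = set(record.settings_by_period)
    if len(unique) == 1 and unique <= {"urban", "rural"}:
        return next(iter(unique))
    return None


# Invented example participants.
records = [
    ResidenceRecord("P1", ["rural", "rural", "rural"]),
    ResidenceRecord("P2", ["rural", "urban"]),  # moved to a city: excluded
    ResidenceRecord("P3", ["urban", "urban"]),
]
eligible = {
    r.participant_id: lifelong_setting(r)
    for r in records
    if lifelong_setting(r) is not None
}
```

Excluding participants who moved keeps the comparison within the urban-rural model rather than mixing it with the migration model discussed earlier.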
Only one of the PURE subcohorts has published epigenetic data [89], but many of the other subcohorts have access to previously collected peripheral blood samples in cryo-storage facilities [77]. No epigenetic data from the PURE cohort have been used to investigate urban-rural disparities. The advantage of a cohort such as PURE is the availability of longitudinal data on the disparity between large well-defined urban and rural communities in at least 27 countries [87]. The RODAM cohort has published genome-wide DNAm data in relation to obesity [90] and Type II diabetes [91], although no urban-rural epigenetic comparisons have been made to date. Although the RODAM study is currently of cross-sectional design, there are plans to transform it into a longitudinal cohort [92]. The PURE cohort was established in 2003, and the RODAM cohort in 2012. These cohorts are, therefore, able to capture urbanization at the pace it is currently experienced [77].
The country-specific urban-rural research platform has the benefit of being able to investigate the clusters of types of exposure that represent rural or urban living, while factors such as genetics, climate and geographical influencers remain constant and similar between groups. Developing nations, such as those included in the PURE and RODAM studies, are particularly likely to benefit from this approach, as the urban-rural divide is most severe in these countries. Furthermore, particularly in the context of epigenetic epidemiology, these countries often contain many under- or unstudied ethnic groups. Currently, most of the available evidence on NCDs, NCD risk factors and the role of epigenetics originates from study populations in developed countries [83,93]. As there are vast genomic and socioeconomic differences between these countries and the ethnic groups they contain [2,11], the feasibility of simply extrapolating findings from what is largely HIC European literature is unknown. Inclusion of more LMICs in large-scale research efforts will, therefore, not only provide an opportunity to generate population-relevant information to inform prevention, detection and treatment of NCDs in these countries but will also contribute to closing the knowledge gaps in the global literature. Findings from such investigations will provide external validation of generalizable findings, while highlighting the circumstances where population-specific research is needed.

Current challenges
One of the limitations of most epigenetic investigations is the unavailability of disease-relevant tissues. It has been well established that DNAm signatures differ among tissues, yet the available evidence on environment-methylation-disease patterns is almost exclusively derived from blood-based methylation investigations [94]. Urbanicity-associated DNAm changes are, therefore, more likely to be leveraged as biomarkers of exposure or disease indicators, as the unavailability of target tissues for specific diseases or outcomes limits causal inference. Causal inference methods such as Mendelian randomization could be employed to help with this, although multiple such methods might be needed for triangulation of evidence [52,95]. Because, to our knowledge, population-specific genomic data are not available for many of the LMICs currently investigated, the addition of genetic data will be a valuable contribution.
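To illustrate the kind of calculation behind Mendelian randomization, mentioned above, the sketch below computes the standard Wald ratio for a single genetic instrument and a fixed-effect inverse-variance weighted (IVW) estimate across several instruments. It assumes summary statistics are already available (variant-DNAm and variant-outcome effect estimates) and uses a first-order standard error that ignores uncertainty in the variant-DNAm association. The function names and input values are hypothetical, and the usual instrument assumptions (relevance, independence, exclusion restriction) are not checked.

```python
import math
from typing import List, Tuple


def wald_ratio(beta_exposure: float, beta_outcome: float, se_outcome: float) -> Tuple[float, float]:
    """Causal estimate from one instrument: outcome effect / exposure effect."""
    estimate = beta_outcome / beta_exposure
    # First-order approximation: treats the exposure effect as known exactly.
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def ivw(instruments: List[Tuple[float, float, float]]) -> Tuple[float, float]:
    """Fixed-effect inverse-variance weighted estimate.

    Each instrument is (beta_exposure, beta_outcome, se_outcome).
    """
    numerator = sum(bx * by / se ** 2 for bx, by, se in instruments)
    denominator = sum(bx ** 2 / se ** 2 for bx, _, se in instruments)
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Invented summary statistics for two variants associated with DNAm at one CpG.
single_estimate, single_se = wald_ratio(beta_exposure=0.2, beta_outcome=0.05, se_outcome=0.02)
pooled_estimate, pooled_se = ivw([(0.1, 0.05, 0.01), (0.2, 0.10, 0.02)])
```

Because both toy instruments imply the same ratio, the pooled estimate matches each individual ratio; with real data, heterogeneity between instruments is one signal that the assumptions may not hold.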
As progressive urbanization is likely to affect both the rural and urban groups in LMICs, longitudinal measurements of DNAm will be a beneficial and informative resource. Research has shown that altered environments affect health at different rates. Adiposity, for example, seems to increase rapidly once individuals relocate to urban areas, whereas fasting insulin increases at a much more gradual pace [96]. Cross-sectional representations of urbanicity and health are therefore limited, as they capture only the factors that have an impact at the specific point in time. Should longitudinal data collection be performed, not only will there be better control of the epidemiological transition over time but this will also allow researchers to address the gap in longitudinal epigenetic research in terms of causality, reversibility and/or stability of DNAm. Standardized protocols for blood collection, handling and storage will be critical in avoiding the limitation of time-point-related batch variance.
Leveraging richly phenotyped, genetically similar rural and urban communities with genome-wide epigenetic data, together with the ability to track NCD risk progression and mortality prospectively, provides a unique opportunity to investigate the full environment → DNAm → NCD framework where such pathways exist. Where they do not, the value of DNAm as a biomarker for either environmental exposure or existing disease risk can be evaluated.

Conclusion
When exploring the role of DNAm in the association between urbanization and the rise in NCD prevalence, the migration, income-comparative and urban-rural study designs can be particularly useful. While each of these approaches is able to contribute to the understanding of the methylation-mediated risk attributable to urbanization from a unique perspective, they are all particularly beneficial in that they typically investigate under-represented cohorts (developing nations, migrant populations and LMICs). The integration of knowledge gained from these approaches, therefore, ultimately allows for a more rounded understanding of the role of DNAm in complex disease etiology through tackling the question from different angles, while simultaneously contributing to the currently lacking ethnic and environmental diversity of epigenetic epidemiology literature.

Future perspective
As the 21st century continues to be marked by urbanization, it is essential to improve our understanding of the molecular mechanisms driving the effect of the environmental shift on NCD prevalence and incidence. The current global landscape allows numerous approaches to be taken to investigate these mechanisms, each with its own strengths, limitations and answerable questions. In the era of big data and the continual pressure of the scientific community to promote open access and increase data availability, we expect the use of these models to significantly add to the genetic and environmental diversity captured in global epigenetic epidemiology data. Integrating the knowledge gained from the different perspectives of each of these three models will allow for a more holistic view of the different genetic and environmental origins of disease and the epigenetic mechanisms that bridge them. Ultimately, it is within the rounded understanding of methylation's role in the urbanization-NCD relationship that modifiable targets can be identified to translate research to population-based NCD prevention strategies.

Executive summary
The noncommunicable disease death toll is rising globally
• In low- and middle-income countries, this is thought to be the result of urbanization.
DNA methylation could mediate the urbanization-disease relationship
• Urbanization-related exposures associate with DNA methylation (DNAm).
• DNAm associates with noncommunicable disease (NCD) risk factors and outcomes.
• Urbanization associates with NCD risk factors and outcomes.
The following models can be used to explore this hypothesis
Migration model
• This model is particularly useful when investigating genetic predisposition and the developmental origins of health and disease. The migration process itself may, however, confound associations. Keep in mind that migrants are not always representative of their native population.
Global income comparative model
• This model can be used to report the extent of global urbanization and health disparities over time, but is susceptible to genetic, cultural and climate confounding. Comparable data collection methods and reference material are a necessity when this model is used.
Within-country urban-rural models
• This model significantly reduces above-mentioned confounding and is, therefore, the most controlled setting to explore the hypothesis of DNAm mediating the effect of urbanization on NCD risk. This model is likely to be suitable only for developing nations, but can, therefore, be used when investigating un- or understudied populations.
Integrating knowledge gained from these three models is the key
• Integrating the knowledge gained from the different perspectives of each of these three models will allow for a more holistic view of the different genetic and environmental origins of disease and the epigenetic mechanisms that bridge them. Ultimately, it is within the rounded understanding of methylation's role in the urbanization-NCD relationship that modifiable targets can be identified to translate research to population-based NCD prevention strategies.
future science group www.futuremedicine.com