Short Communication, Open Access (CC BY-NC-ND)

Potential of machine learning methods to identify patients with nonvalvular atrial fibrillation

    Ryoko Suzuki, Cardiovascular Medical, Bristol-Myers Squibb K.K., Tokyo, Japan

    Jun Katada*, Cardiovascular/Metabolism Medical Affairs, Internal Medicine, Pfizer Japan Inc., Tokyo, Japan

    Sreeram Ramagopalan, Centre for Observational Research & Data Science, Bristol-Myers Squibb UK, Uxbridge, Middlesex, UK

    Laura McDonald, Centre for Observational Research & Data Science, Bristol-Myers Squibb UK, Uxbridge, Middlesex, UK

    *Author for correspondence: jun.katada@pfizer.com

    Published Online: https://doi.org/10.2217/fca-2019-0056

    Abstract

    Aim: Nonvalvular atrial fibrillation (NVAF) is associated with an increased risk of stroke; however, many patients are diagnosed only after stroke onset. This study assessed the potential of machine-learning algorithms to detect NVAF. Materials & methods: This was a retrospective database study using a Japanese claims database. Patients with and without NVAF were selected, and 41 variables were included in different classification algorithms. Results: Machine-learning algorithms identified NVAF with an area under the curve of ≥0.86; corresponding sensitivity/specificity was also high for most models. The stacking model, which combined multiple algorithms, outperformed single-model approaches (area under the curve ≥0.90, sensitivity/specificity ≥0.80/0.82), although differences were small. Conclusion: Machine-learning based algorithms can detect atrial fibrillation with a high level of accuracy. Although additional validation is needed, this methodology could encourage a new approach to detect NVAF.

    Atrial fibrillation (AF) is the most common form of arrhythmia, particularly in the elderly [1,2]. AF increases the risk of ischemic stroke, in particular embolic stroke, which is in general the most severe form of stroke with the poorest outcomes [3,4]. For prevention of cardioembolic stroke, anticoagulation therapy as well as appropriate AF management is essential. However, there are still many patients who are not diagnosed with AF before having a cardioembolic stroke [5,6]. Unfortunately, paroxysmal AF, reportedly constituting 50% or more of any-cause AF [7], is often asymptomatic and difficult to detect in general clinical practice [8], with electrocardiography being the only definitive method for diagnosis. A short-duration 12-lead electrocardiogram (generally recorded for several minutes) is not sufficient to detect paroxysmal AF, and longer monitoring such as event recorder electrocardiography is generally required. However, wearing a device for a long period of time can be burdensome and is not always practical. Taking these limitations into consideration, a two-step approach might be useful, whereby patients at high risk for AF are first identified and screening by electrocardiography is then undertaken among these high-risk patients to diagnose AF.

    Although classical cardiovascular risk factors for AF have been investigated in previous studies, the overall identification ability of these factors may not be sufficient to justify screening large populations who present with such risk factors [9–12]. It is possible that a machine-learning approach could be used to develop a better identification model for AF by allowing the consideration of a larger number of variables, although to the best of our knowledge there are currently no reports of machine learning-based AF detection in the literature. The objective of this study was therefore to assess the identification ability of a machine learning-based algorithm for the detection of nonvalvular AF (NVAF) using data obtained from a Japanese health claims database.

    Materials & methods

    Data source

    This study used de-identified health claims and Diagnosis Procedure Combination (the flat-fee payment system) data from 314 acute care hospitals across Japan, obtained from Medical Data Vision (MDV) Co., Ltd. (Tokyo, Japan). The details of this database have been described elsewhere [13,14]. In brief, the database includes inpatient and outpatient administrative data from 15 million patients treated at participating hospitals in Japan. The age and gender distributions of patients included in the database are similar to national statistics in Japan. Although the data contain clinical and demographic characteristics, medical procedures and medications prescribed for all patients, laboratory data (clinical laboratory test results) and other clinical tests (such as blood pressure values) are not always available, with availability varying by hospital. Diseases were identified using International Classification of Disease, 10th Revision (ICD-10) codes and/or the local standard disease codes to supplement ICD-10 codes. The codes used in the current study are shown in Supplementary Table 1. Medical procedures were defined with the procedure codes shown in Supplementary Table 2.

    Patient inclusion

    Criteria for selection of NVAF and non-AF cohorts are shown in Figure 1. NVAF patients were defined as patients with at least two diagnosis codes for AF (I48), but not with valvular AF (standard disease code 8846941), postoperative AF (8847772), mechanical-valvular AF (T820) or chronic rheumatic heart diseases (I05-I09) between April 2008 and August 2017. Patients with a history of procedures involving prosthetic heart valves were also excluded. According to these criteria, 33,885 patients were selected into the NVAF cohort. The date of AF diagnosis (that is, the first appearance of the I48 code during the whole available period in the database) was defined as the index date. For the final NVAF cohort, patients who had data for all essential variables and for at least 60% of ‘optional variables’ within a period of ‘index date–6 months’ were selected (Table 1). Patients who had never been diagnosed with AF (no ICD code of I48 but having at least two visits and at least 1 year of available data) within the study period were defined as non-AF patients. Patients diagnosed with valvular AF, postoperative AF, mechanical-valvular AF or chronic rheumatic heart diseases and those with a history of valve replacement procedures were also excluded from this cohort. The index date for each non-AF patient was defined as the date of the last visit where laboratory data were available. As for the NVAF cohort, patients who had data for all essential variables and for at least 60% of ‘optional variables’ within a period of ‘index date–6 months’ were selected (Table 1). In total, 27,592 patients were included in the NVAF cohort and 463,750 patients in the non-AF cohort.
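
The selection rules above can be sketched in a few lines of Python. This is only an illustration and not the authors' code: the per-patient data layout (a list of diagnosis codes plus dictionaries of variable values) and the names `classify_patient` and `has_enough_data` are hypothetical. Only the code values (I48, T820, I05-I09, 8846941, 8847772) and the 60% threshold come from the text; the additional non-AF requirements (at least two visits and 1 year of data) are left out for brevity.

```python
# Hypothetical sketch of the cohort rules; the data layout is an assumption.
EXCLUDED_CODES = {"8846941", "8847772", "T820"}  # valvular, postoperative, mechanical-valvular AF
RHEUMATIC_PREFIXES = ("I05", "I06", "I07", "I08", "I09")  # chronic rheumatic heart diseases


def classify_patient(diagnosis_codes, had_valve_procedure=False):
    """Return 'NVAF', 'non-AF' or None (excluded) for one patient's code list."""
    excluded = had_valve_procedure or any(
        code in EXCLUDED_CODES or code.startswith(RHEUMATIC_PREFIXES)
        for code in diagnosis_codes
    )
    if excluded:
        return None
    af_count = sum(code.startswith("I48") for code in diagnosis_codes)
    if af_count >= 2:
        return "NVAF"
    if af_count == 0:
        return "non-AF"
    return None  # a single I48 code meets neither definition


def has_enough_data(required, optional, min_optional_fraction=0.6):
    """True if every required variable is present and enough optional ones are."""
    if any(value is None for value in required.values()):
        return False
    present = sum(value is not None for value in optional.values())
    return present >= min_optional_fraction * len(optional)
```

A patient would enter a cohort only if `classify_patient` returns a label and `has_enough_data` is true for the variables recorded in the window before the index date.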

    Figure 1. Patient selection flow for creation of nonvalvular atrial fibrillation and non-atrial fibrillation cohorts.
    Table 1. Patient characteristics.
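    Values are mean and SD for continuous variables, and n and % for categorical variables.

    Characteristics | NVAF (n = 27,592): mean or n | NVAF: SD or % | Non-AF (n = 463,750): mean or n | Non-AF: SD or % | All (n = 491,342): mean or n | All: SD or % | Required or Optional | p-value (NVAF vs non-AF)
    Demographic | | | | | | | |
    Age (years) | 76.5 | 11.0 | 66.0 | 17.0 | 66.6 | 16.8 | Required | <0.001
    Male | 15,845 | 57.4% | 231,192 | 47.1% | 247,037 | 50.3% | Required | <0.001
    Height (cm) | 157.1 | 10.5 | 158.1 | 10.4 | 158.1 | 10.4 | Optional | <0.001
    Weight (kg) | 55.9 | 13.8 | 56.0 | 13.8 | 56.0 | 13.8 | Optional | 0.242
    Diseases | | | | | | | |
    Hypertension | 17,062 | 61.8% | 173,103 | 37.3% | 190,165 | 38.7% | Required | <0.001
    Diabetes | 13,235 | 48.0% | 120,927 | 26.1% | 134,162 | 27.3% | Required | <0.001
    Ischemic Stroke | 7294 | 26.4% | 38,330 | 8.3% | 45,624 | 9.3% | Required | <0.001
    Heart Failure | 15,339 | 55.6% | 47,644 | 10.3% | 62,983 | 12.8% | Required | <0.001
    Anemia | 8344 | 30.2% | 81,953 | 17.7% | 90,297 | 18.4% | Required | <0.001
    Hyperthyroidism | 4203 | 15.2% | 33,793 | 7.3% | 37,996 | 7.7% | Required | <0.001
    CAD | 8472 | 30.7% | 47,605 | 10.3% | 56,077 | 11.4% | Required | <0.001
    PAD | 2899 | 10.5% | 17,204 | 3.7% | 20,103 | 4.1% | Required | <0.001
    COPD | 980 | 3.6% | 9396 | 2.0% | 10,376 | 2.1% | Required | <0.001
    VPC | 671 | 2.4% | 4919 | 1.1% | 5590 | 1.1% | Required | <0.001
    Laboratory Value | | | | | | | |
    RBC counts (×10^4/L) | 411.2 | 73.9 | 411.8 | 73.5 | 411.7 | 73.6 | Optional | 0.188
    Hemoglobin (g/dL) | 12.7 | 2.4 | 12.5 | 2.2 | 12.5 | 2.2 | Optional | <0.001
    Hematocrit (%) | 38.0 | 6.6 | 37.7 | 6.2 | 37.7 | 6.2 | Optional | <0.001
    MCV (fL) | 92.7 | 6.8 | 92.3 | 107.9 | 92.3 | 104.8 | Optional | 0.538
    MCH (pg) | 33.3 | 3.8 | 33.3 | 17.0 | 33.3 | 16.5 | Optional | 1.000
    MCHC (%) | 31.6 | 93.8 | 30.6 | 27.3 | 30.7 | 34.6 | Optional | <0.001
    WBC counts (/L) | 7639 | 4643 | 6915 | 6453 | 6956 | 6,367 | Optional | <0.001
    Platelet count (×10^4/L) | 20.1 | 8.9 | 22.7 | 8.9 | 22.5 | 8.9 | Optional | <0.001
    TP (g/dL) | 6.7 | 0.8 | 6.7 | 0.8 | 6.7 | 0.8 | Optional | 1.000
    GOT (IU/L) | 41.9 | 146.4 | 37.0 | 204.4 | 37.2 | 201.6 | Optional | <0.001
    GPT (IU/L) | 32.3 | 98.2 | 29.5 | 97.6 | 29.7 | 97.6 | Optional | <0.001
    ALP (IU/mL) | 283.7 | 230.6 | 295.0 | 373.1 | 294.4 | 366.6 | Optional | <0.001
    γ-GTP (IU/L) | 57.5 | 108.4 | 54.2 | 115.5 | 54.4 | 115.2 | Optional | <0.001
    LDH (U/L) | 253.1 | 263.8 | 237.9 | 493.8 | 238.7 | 431.7 | Optional | <0.001
    UN (mg/dL) | 22.2 | 15.1 | 18.3 | 15.3 | 18.5 | 15.3 | Optional | <0.001
    Serum creatinine (mg/dL) | 1.1 | 1.3 | 1.0 | 1.2 | 1.0 | 1.2 | Optional | <0.001
    UA (mg/dL) | 5.9 | 2.1 | 5.3 | 1.9 | 5.3 | 1.9 | Optional | <0.001
    Total cholesterol (mg/dL) | 171.2 | 42.5 | 185.4 | 43.5 | 184.6 | 43.6 | Optional | <0.001
    TG (mg/dL) | 110.8 | 86.0 | 126.1 | 93.7 | 125.1 | 93.3 | Optional | <0.001
    HbA1c (%) | 6.2 | 1.1 | 6.2 | 1.1 | 6.2 | 1.1 | Optional | 1.000
    CRP (mg/L) | 3.1 | 5.8 | 2.0 | 4.4 | 2.1 | 4.5 | Optional | <0.001
    K+ (mEq/L) | 4.2 | 0.6 | 4.2 | 0.6 | 4.2 | 0.6 | Optional | 1.000
    Cl (mEq/L) | 103.7 | 4.9 | 104.3 | 4.5 | 104.3 | 4.5 | Optional | <0.001
    Na+ (mEq/L) | 139.6 | 4.3 | 140.0 | 4.1 | 140.0 | 4.1 | Optional | <0.001
    Fe2+ (g/dL) | 59.4 | 44.8 | 70.4 | 47.5 | 69.7 | 47.4 | Optional | <0.001
    BNP (pg/mL) | 351.3 | 560.9 | 197.7 | 587.4 | 223.7 | 585.8 | Optional | <0.001
    GFR (mL/min/1.73 m^2) | 60.1 | 28.0 | 73.6 | 32.7 | 72.8 | 32.6 | Optional | <0.001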

    p-values were obtained from an unpaired t test or χ2 analysis.

    AF: Atrial fibrillation; ALP: Alkaline phosphatase; CAD: Coronary arterial disease; COPD: Chronic obstructive pulmonary disease; GFR: Glomerular filtration rate; HbA1c: Hemoglobin A1c; MCHC: Mean corpuscular hemoglobin concentration; MCH: Mean corpuscular hemoglobin; MCV: Mean corpuscular volume; NVAF: Nonvalvular atrial fibrillation; PAD: Peripheral arterial disease; RBC: Red blood cells; TG: Triglyceride; TP: Total protein; UA: Uric acid; UN: Urinary nitrogen; VPC: Ventricular premature contraction; WBC: White blood cells.

    Variable selection

    Variables were selected based on known associations with NVAF, and included demographic characteristics (age, sex, height and weight) [12], comorbidities (hypertension, diabetes [15], ischemic stroke [16], heart failure [16], anemia [17], hyperthyroidism [18], chronic obstructive pulmonary disease [19], ventricular premature contraction [20], coronary artery disease [21] and peripheral artery disease [16]) and laboratory data related to renal function (creatinine and glomerular filtration rate), metabolic disease (cholesterol and triglyceride) [12] and others (hemoglobin A1c [15], brain natriuretic peptide [22] and uric acid [23]). Additional laboratory test values (hematocrit, hemoglobin, mean corpuscular volume, mean corpuscular hemoglobin, mean corpuscular hemoglobin concentration, white blood cell count, red blood cell count, platelet count, serum total protein, GOT, GPT, gamma-GTP, ALP, LDH, CRP, urinary nitrogen, K+, Cl, Na+ and Fe2+) were selected because these tests are commonly used in general clinical practice.

    Statistical analysis

    Datasets

    The raw data were processed to create datasets for analysis. First, missing data were imputed by multivariate imputation by chained equations. The proportion of missing values for each variable is shown in Supplementary Table 3. The dataset was then randomly divided into three parts at a 40:40:20 ratio: training dataset 1, training dataset 2 and a validation dataset, respectively. The ratio of NVAF to non-AF patients was kept consistent among the three datasets. These datasets were used for development of both model 1 and model 2. The overall steps are shown in Figure 2. For model 2, to alleviate imbalance in the dataset (the number of non-AF patients was much larger than that of NVAF patients), random undersampling of the majority class was applied to the dataset so that the ratio of NVAF to non-AF was 1:1. After undersampling, the final training dataset contained 11,036 patients with NVAF and 11,036 non-AF patients.
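
The stratified 40:40:20 split and the 1:1 random undersampling described above can be illustrated with the Python standard library. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `stratified_split` and `undersample_majority`, the toy labels and the random seed are all hypothetical, and the imputation step is not shown.

```python
import random


def stratified_split(labels, ratios=(0.4, 0.4, 0.2), seed=0):
    """Split indices into len(ratios) parts, keeping the class ratio in each part."""
    rng = random.Random(seed)
    parts = [[] for _ in ratios]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        start = 0
        for k, ratio in enumerate(ratios):
            # The last part takes the remainder so that no index is dropped.
            stop = len(idx) if k == len(ratios) - 1 else start + int(ratio * len(idx))
            parts[k].extend(idx[start:stop])
            start = stop
    return parts


def undersample_majority(indices, labels, seed=0):
    """Randomly drop majority-class indices until both classes have equal size."""
    rng = random.Random(seed)
    pos = [i for i in indices if labels[i] == 1]
    neg = [i for i in indices if labels[i] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    return minority + rng.sample(majority, len(minority))


# Toy labels (1 = NVAF, 0 = non-AF) with an imbalance similar in spirit to the study.
labels = [1] * 50 + [0] * 950
train1, train2, validation = stratified_split(labels)
balanced_train1 = undersample_majority(train1, labels)
```

Undersampling is applied only to training data in this sketch, so the validation part keeps the original class imbalance.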

    Figure 2. Development flow of atrial fibrillation identification models (model 1: deep learning-based model, model 2: stacking-based model).

    AUC: Area under the curve; MICE: Multivariate imputation by chained equation.

    Model-1

    For Model 1, a ‘deep learning’ (DL) method was used in Step 1 to create multiple features to predict NVAF using training dataset 1. The DL-based features created in Step 1 were then used in Step 2, where Lasso regression (LR) was used to develop a classifier using training dataset 2. In Step 3, the developed algorithm was validated using the validation dataset.
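
The three steps of Model 1 can be sketched with scikit-learn. Everything specific here is an assumption made for illustration, since the paper does not report the network architecture or software library: the synthetic data, the single 16-unit hidden layer, the use of hidden-layer activations as the 'DL-based features', and the reading of Lasso regression as an L1-penalised logistic regression (because the outcome is binary). The helper name `dl_features` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 41 study variables (not real patient data).
X, y = make_classification(n_samples=600, n_features=41, n_informative=5,
                           n_clusters_per_class=1, class_sep=2.0, random_state=0)
X1, y1 = X[:240], y[:240]        # training dataset 1
X2, y2 = X[240:480], y[240:480]  # training dataset 2
Xv, yv = X[480:], y[480:]        # validation dataset

scaler = StandardScaler().fit(X1)

# Step 1: train a small network on training dataset 1 (architecture is assumed).
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
net.fit(scaler.transform(X1), y1)


def dl_features(X_raw):
    """Hidden-layer (ReLU) activations of the trained network, used as features."""
    hidden = scaler.transform(X_raw) @ net.coefs_[0] + net.intercepts_[0]
    return np.maximum(hidden, 0.0)


# Step 2: fit an L1-penalised classifier on the learned features (training dataset 2).
lasso_clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
lasso_clf.fit(dl_features(X2), y2)

# Step 3: score the held-out validation dataset.
validation_prob = lasso_clf.predict_proba(dl_features(Xv))[:, 1]
```

Keeping feature learning (training dataset 1) and classifier fitting (training dataset 2) on separate data means the second-stage model sees features computed for patients the network was not trained on.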

    Model-2

    For Model 2, stacking of multiple single classifiers was used; these included LR, ridge regression, support vector machine, random forest, DL, AdaBoost and gradient boosting. These models were trained using training dataset 1. Each model output a probability of AF, and these probabilities were then passed to a second-level model (LR model), where the outputs obtained in the first step became the features in the second step. The developed algorithm was validated using the validation dataset in the model validation step.
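
A two-level stacking scheme of this kind can be sketched with scikit-learn as follows. The sketch makes several assumptions that are not stated in the paper: the data are synthetic, only five of the seven base learners are included for brevity (support vector machine and DL are omitted), hyperparameters are library defaults, the second-level model is taken to be an L1-penalised logistic regression, and it is fitted on training dataset 2. The helper name `first_level_probs` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 41 study variables (not real patient data).
X, y = make_classification(n_samples=600, n_features=41, n_informative=5,
                           n_clusters_per_class=1, class_sep=2.0, random_state=0)
X1, y1 = X[:240], y[:240]        # training dataset 1: base models
X2, y2 = X[240:480], y[240:480]  # training dataset 2: second-level model (assumed)
Xv, yv = X[480:], y[480:]        # validation dataset

base_models = {
    "lasso": make_pipeline(StandardScaler(),
                           LogisticRegression(penalty="l1", solver="liblinear")),
    "ridge": make_pipeline(StandardScaler(), LogisticRegression()),  # L2 by default
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "adaboost": AdaBoostClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# First level: every base model is trained on training dataset 1.
for model in base_models.values():
    model.fit(X1, y1)


def first_level_probs(X_raw):
    """One column per base model holding its predicted probability of the positive class."""
    return np.column_stack([m.predict_proba(X_raw)[:, 1] for m in base_models.values()])


# Second level: the base-model probabilities become the input features.
meta_model = LogisticRegression(penalty="l1", solver="liblinear")
meta_model.fit(first_level_probs(X2), y2)

stacked_prob = meta_model.predict_proba(first_level_probs(Xv))[:, 1]
```

scikit-learn also provides `StackingClassifier`, which builds the second-level inputs by internal cross-validation instead of a separate hold-out set; the manual version is used here because it mirrors the separate training datasets described in the text.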

    Variable-feature relation analysis

    We investigated the relationship between features developed by stacking model 1 or 2 and each of the variables included in the analytical dataset. The relationships were described using a Pearson correlation coefficient.
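
For reference, the Pearson correlation coefficient between a feature $f$ and a variable $x$ is $r = \sum_i (f_i - \bar{f})(x_i - \bar{x}) / \sqrt{\sum_i (f_i - \bar{f})^2 \sum_i (x_i - \bar{x})^2}$. A minimal standard-library sketch is given below; the function name `pearson_r` and the toy values are hypothetical and are not taken from the study data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values: one engineered feature and two input variables.
feature = [0.1, 0.4, 0.35, 0.8, 0.9]
age = [52, 61, 60, 78, 83]
potassium = [4.2, 4.0, 4.3, 4.1, 4.2]
correlations = {
    "age": pearson_r(feature, age),
    "potassium": pearson_r(feature, potassium),
}
```

Repeating this for every feature and variable pair gives a correlation matrix of the kind that can be drawn as a heat map.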

    Receiver operating characteristic (ROC) analysis was conducted to estimate area under the ROC curve (AUC), sensitivity and specificity, for each of the developed algorithms. All of the above analyses were performed using Python version 3.6.
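
The three reported metrics can be computed as sketched below. This is a standard-library illustration with hypothetical function names and toy scores; the classification threshold of 0.5 is an assumption, as the cut-off used in the study is not reported. The AUC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, which is equivalent to the area under the ROC curve.

```python
def roc_auc(y_true, scores):
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(y_true, scores, threshold=0.5):
    """Sensitivity (true-positive rate) and specificity (true-negative rate)."""
    predicted = [int(s >= threshold) for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(predicted, y_true))
    fn = sum(p == 0 and y == 1 for p, y in zip(predicted, y_true))
    tn = sum(p == 0 and y == 0 for p, y in zip(predicted, y_true))
    fp = sum(p == 1 and y == 0 for p, y in zip(predicted, y_true))
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: 1 = NVAF, 0 = non-AF; scores are predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05]
auc = roc_auc(y_true, scores)
sens, spec = sensitivity_specificity(y_true, scores)
sens_low, spec_low = sensitivity_specificity(y_true, scores, threshold=0.25)
```

The pairwise AUC loop is quadratic in the number of patients and is meant only for illustration. The last line shows that sensitivity and specificity depend on the chosen threshold, whereas the AUC does not.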

    Results

    Patient characteristics are shown in Table 1. As expected, there were marked statistically significant differences between patients in the NVAF and non-AF cohorts in terms of key clinical and demographic characteristics. In general, patients in the NVAF cohort tended to be older with a higher comorbidity burden.

    The ROC curve suggested a small difference in performance between the stacking model and other methods (Figure 3). AUC, sensitivity and specificity are shown in Table 2. Stacking models 1 and 2 each had an AUC of 0.90, and their sensitivity and specificity were well balanced (Table 2). The AUC for each single model was >0.8; the sensitivity of single models varied between 0.09 and 0.84, and specificity between 0.79 and 0.99.

    Figure 3. The receiver operating characteristic curve of the single models (lasso regression, ridge regression, support vector machine, random forest and deep learning) and stacking models (model 1 and model 2).
    Table 2. Calculated parameters for each prediction model.
    Model | AUC | Sensitivity | Specificity | Sampling
    Lasso regression | 0.86 | 0.09 | 0.99 |
    Ridge regression | 0.86 | 0.78 | 0.81 | undersampling
    Support vector machine | 0.86 | 0.78 | 0.79 | undersampling
    Random forest | 0.87 | 0.80 | 0.79 | undersampling
    Deep learning | 0.90 | 0.23 | 0.99 |
    AdaBoost | 0.90 | 0.84 | 0.81 | undersampling
    Gradient boosting | 0.90 | 0.82 | 0.82 |
    Stacking Model-1 | 0.90 | 0.82 | 0.82 |
    Stacking Model-2 | 0.90 | 0.80 | 0.84 |

    AUC: Area under the receiver operating characteristic curve.

    We visualized the correlation between study variables and each of the features engineered by the DL algorithm for stacking model 1 (Supplementary Figure 1) and each of the single model algorithms developed for stacking model 2 (Supplementary Figure 2). For the DL-developed features used in stacking model 1, almost all of the developed features were associated with the input variables, with the strongest associations seen for arterial disease and COPD (Supplementary Figure 1). Importantly, though, some of the developed features had no correlation with the input variables (for example, features 10 and 15 in Supplementary Figure 1). Age and select comorbidities including heart failure, hypertension, diabetes and coronary arterial disease were most strongly correlated with each of the single model algorithms (Supplementary Figure 2). Lab tests tended to be weakly correlated, although among the lab tests considered, brain natriuretic peptide, urinary nitrogen and uric acid showed the strongest associations.

    Discussion

    In this Japanese study, we found that machine learning methods can be used to develop an algorithm which distinguishes patients with NVAF from patients without NVAF with a high level of accuracy. All machine learning algorithms developed in this study had an AUC of at least 0.86 (Table 2). These results should be considered exploratory and require further validation, but encouragingly demonstrate the potential utility of machine-learning approaches to help detect patients with NVAF in the general population.

    AF risk scores have been reported from both Western countries and Japan. The CHARGE-AF study proposed a validated score for AF using five US and European cohorts, and its c-statistic was about 0.7 [10]. A Japanese risk score to predict the 10-year risk of AF, proposed by the Suita study, also had a c-statistic of 0.7 [24]. Both scores were developed to screen effectively for AF in general outpatient clinics without 12-lead ECG measurement, and the c-statistics could be large enough for this purpose.

    In the CHARGE-AF and the Suita study risk scores, variables that are established cardiovascular risk factors, or are considered to be associated with AF, were selected and used to prepare the scores. In the present study, however, factors available from the database were used for development of the algorithm irrespective of their potential association with AF. As a result, our algorithm contained more information on comorbid disease and laboratory tests; however, it is uncertain how these differences affected our model performance. Unfortunately, due to the limited availability of variables, we did not perform cross-validation between our algorithm and these previously reported risk scores. Performance of the algorithm should be examined by such cross-validation in the future.

    In this study, we used an administrative claims database, which includes real-world data related to patient health status and provision of clinical care collected in general clinical practice. AF is difficult to diagnose in clinical practice and there might be a considerable number of patients with AF who have not yet been diagnosed. These patients with undiagnosed AF were included in the non-AF cohort in the present study and might have had an impact on the result. This is a critical limitation of this study. A prospective, probably ‘interventional’, study that actively diagnoses NVAF may be necessary to address this issue.

    In the present study, we did not show which factors/features contributed the most to identification of NVAF. Some variables that are not directly significantly associated with the outcome (NVAF in this study) sometimes contribute considerably to the outcome when a machine learning-based approach, particularly DL or stacking, is applied to the development of algorithms, and vice versa. This is a critical difference between a conventional statistics-based approach (such as risk scores based on risk factors) and our DL-based approach.

    There are limitations to our study that should be considered. First, as already mentioned above, our model was prepared and validated using only claims data from the MDV database. This dataset might contain patients with misdiagnosed AF and this could affect model validity. Second, the MDV dataset is limited to patients treated within an acute care hospital setting who may be sicker and have more comorbid disease than would otherwise be observed in the broader Japanese population. Therefore, inferences gleaned from this study may not be applicable to patients solely seen in primary care or people who have had no recent healthcare interaction, and any comparisons of the algorithm performance seen in this study with those of CHARGE-AF and the Suita study should be viewed in this context. The algorithm developed in this study distinguished patients with diagnosed NVAF from those without a diagnosis; however, it remains possible that patients with NVAF who are currently undiagnosed are distinct from those who are known. This possibility, as well as algorithms to identify such patients, should be explored in future studies.

    We selected variables based on known risk factors and data clinically available in general hospital practice. However, requiring patients to have at least 60% of test data available may have introduced bias if there are systematic differences between patients with a lab test recorded and those without, and therefore patients with ≥60% of data available may not be representative of a general hospital-admitted population. Additionally, there is very limited information on potential confounding factors such as socioeconomic status, smoking [25] or alcohol intake [26], which might improve the performance of the machine learning-based algorithm and help avoid missing latent AF patients. As we comment above, our results should be regarded as speculative rather than definitive: they represent results from what can be done using very large-scale, routinely collected administrative data and require further work, in different study designs, to confirm or refute the findings.

    Conclusion

    Machine-learning based algorithms can detect NVAF with a high level of accuracy. Although additional validation is needed, this methodology could encourage a new approach to improve the detection of patients with NVAF in the general population in an aging society like Japan.

    Future perspective

    Earlier detection of NVAF is key to preventing stroke and improving patient outcomes, particularly in increasingly aging societies like Japan. Efforts to improve screening and earlier detection of at-risk patients will continue to evolve in coming years. As health-related data proliferate and advanced analytical techniques to handle such data continue to evolve alongside such efforts, these innovative approaches may come to be applied in everyday clinical settings to support decision-making for the detection of NVAF. Indeed, machine learning is playing an increasing role within many areas of healthcare, including but not limited to earlier diagnosis, and has the potential to transform clinical care.

    Summary points
    • Nonvalvular atrial fibrillation (NVAF) is associated with an increased risk of stroke; however, many patients are only diagnosed after stroke onset.

    • Efforts to support earlier diagnosis of NVAF are needed and machine-learning algorithms which are able to handle large amounts of complex data offer a potentially exciting approach to improve identification of and potentially earlier diagnosis of NVAF.

    • The objective of this study was to assess the potential of machine-learning based algorithms to detect NVAF in a Japanese patient population using data obtained from a national administrative claims database.

    • Compared with the non-AF cohort, patients in the NVAF cohort were older, more were male, and tended to have a higher comorbidity burden.

    • All machine learning algorithms identified NVAF with a high level of accuracy; the stacking model which combined multiple algorithms outperformed single-model approaches, although differences between model performance were small.

    • This study looked at factors that distinguish patients with known NVAF from those without a diagnosis, but it remains possible that patients with undiagnosed NVAF are distinct from this patient group, and this possibility should be explored in future studies.

    • Misdiagnosed NVAF patients could have been included in the non-AF cohort, which could impact model validity.

    • As the algorithms were created using NVAF diagnoses recorded in an administrative database in Japan, validation using data from other sources, such as electronic medical record-based data or physical examination data, is needed.

    Supplementary data

    To view the supplementary data that accompany this paper please visit the journal website at: http://www.futuremedicine.com/doi/suppl/10.2217/fca-2019-0056

    Author contributions

    R Suzuki contributed to the whole study management, development of the final protocol and the manuscript, and interpretation of the results. J Katada contributed to the final study design, development of the protocol and the analytical plan and preparation of the manuscript. S Ramagopalan and L McDonald contributed to study design, interpretation of the results and development of the algorithm. All authors critically revised successive drafts of the manuscript and approved the final one.

    Acknowledgments

    Cancer Scan Co. Ltd. contributed to data cleaning and data analysis for this study.

    Financial & competing interests disclosure

    This study was supported by Bristol-Myers Squibb K.K. and Pfizer Japan Inc. R Suzuki was a full-time employee of BMS K.K. J Katada was a full-time employee of Pfizer Japan Inc. S Ramagopalan and L McDonald were full-time employees of BMS UK. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.

    No writing assistance was utilized in the production of this manuscript.

    Ethical conduct of research

    The authors state that they have followed the principles outlined in the Declaration of Helsinki for all human or animal experimental investigations. This study was conducted by using a structured, de-identified database which does not require patient consent according to the local regulations.

    Open access

    This work is licensed under the Attribution-NonCommercial-NoDerivatives 4.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/

    Data sharing statement

    Data may be obtained from a third party and are not publicly available. Raw data can be purchased from Medical Data Vision Co. Ltd. (Tokyo, Japan).

    Papers of special note have been highlighted as: • of interest; •• of considerable interest

    References

    • 1. Go AS, Hylek EM, Phillips KA et al. Prevalence of diagnosed atrial fibrillation in adults: national implications for rhythm management and stroke prevention: the AnTicoagulation and risk factors in atrial fibrillation (ATRIA) study. JAMA 285(18), 2370–2375 (2001). • Epidemiology of atrial fibrillation.
    • 2. Chugh SS, Havmoeller R, Narayanan K et al. Worldwide epidemiology of atrial fibrillation: a Global Burden of Disease 2010 Study. Circulation 129(8), 837–847 (2014).
    • 3. Bruggenjurgen B, Rossnagel K, Roll S et al. The impact of atrial fibrillation on the cost of stroke: the berlin acute stroke study. Value Health 10(2), 137–143 (2007).
    • 4. Winter Y, Wolfram C, Schaeg M et al. Evaluation of costs and outcome in cardioembolic stroke or TIA. J. Neurol. 256(6), 954–963 (2009).
    • 5. Suissa L, Lachaud S, Mahagne MH. Optimal timing and duration of continuous electrocardiographic monitoring for detecting atrial fibrillation in stroke patients. J. Stroke Cerebrovasc. Dis. 22(7), 991–995 (2013).
    • 6. Turakhia MP, Shafrin J, Bognar K et al. Estimated prevalence of undiagnosed atrial fibrillation in the United States. PLoS ONE 13(4), e0195088 (2018).
    • 7. Seet RC, Friedman PA, Rabinstein AA. Prolonged rhythm monitoring for the detection of occult paroxysmal atrial fibrillation in ischemic stroke of unknown cause. Circulation 124(4), 477–486 (2011).
    • 8. Samol A, Masin M, Gellner R et al. Prevalence of unknown atrial fibrillation in patients with risk factors. Europace 15(5), 657–662 (2013).
    • 9. Schnabel RB, Sullivan LM, Levy D et al. Development of a risk score for atrial fibrillation (Framingham Heart Study): a community-based cohort study. Lancet 373(9665), 739–745 (2009). •• A risk score for de novo development of atrial fibrillation in the Framingham Heart Study.
    • 10. Alonso A, Krijthe BP, Aspelund T et al. Simple risk model predicts incidence of atrial fibrillation in a racially and geographically diverse population: the CHARGE-AF consortium. J. Am. Heart Assoc. 2(2), e000102 (2013). •• A risk model to predict incidence of atrial fibrillation in general US and EU cohorts.
    • 11. Guo Y, Tian Y, Wang H, Si Q, Wang Y, Lip GYH. Prevalence, incidence, and lifetime risk of atrial fibrillation in China: new insights into the global burden of atrial fibrillation. Chest 147(1), 109–119 (2015).
    • 12. Kokubo Y, Watanabe M, Higashiyama A, Nakao YM, Kusano K, Miyamoto Y. Development of a basic risk score for incident atrial fibrillation in a Japanese general population – The Suita study. Circ. J. 81(11), 1580–1588 (2017). •• Risk score for de novo development of atrial fibrillation in Japanese general population.
    • 13. Kohsaka S, Katada J, Saito K, Terayama Y. Safety and effectiveness of apixaban in comparison to warfarin in patients with nonvalvular atrial fibrillation: a propensity-matched analysis from Japanese administrative claims data. Curr. Med. Res. Opin. 34(9), 1627–1634 (2018).
    • 14. Kohsaka S, Murata T, Izumi N, Katada J, Wang F, Terayama Y. Bleeding risk of apixaban, dabigatran, and low-dose rivaroxaban compared with warfarin in Japanese patients with non-valvular atrial fibrillation: a propensity matched analysis of administrative claims data. Curr. Med. Res. Opin. 33(11), 1955–1963 (2017).
    • 15. Iguchi Y, Kimura K, Shibazaki K et al. HbA1c and atrial fibrillation: a cross-sectional study in Japan. Int. J. Cardiol. 156(2), 156–159 (2012).
    • 16. Akao M, Chun YH, Wada H et al. Current status of clinical background of patients with atrial fibrillation in a community-based survey: the Fushimi AF Registry. J. Cardiol. 61(4), 260–266 (2013).
    • 17. Xu D, Murakoshi N, Sairenchi T et al. Anemia and reduced kidney function as risk factors for new onset of atrial fibrillation (from the Ibaraki prefectural health study). Am. J. Cardiol. 115(3), 328–333 (2015).
    • 18. Marrakchi S, Kanoun F, Idriss S, Kammoun I, Kachboura S. Arrhythmia and thyroid dysfunction. Herz 40(Suppl. 2), 101–109 (2015).
    • 19. Li J, Agarwal SK, Alonso A et al. Airflow obstruction, lung function, and incidence of atrial fibrillation: the Atherosclerosis Risk in Communities (ARIC) study. Circulation 129(9), 971–980 (2014).
    • 20. Chong BH, Pong V, Lam KF et al. Frequent premature atrial complexes predict new occurrence of atrial fibrillation and adverse cardiovascular events. Europace 14(7), 942–947 (2012).
    • 21. Dewland TA, Vittinghoff E, Mandyam MC et al. Atrial ectopy as a predictor of incident atrial fibrillation: a cohort study. Ann. Intern. Med. 159(11), 721–728 (2013).
    • 22. Wachter R, Lahno R, Haase B et al. Natriuretic peptides for the detection of paroxysmal atrial fibrillation in patients with cerebral ischemia--the Find-AF study. PLoS ONE 7(4), e34351 (2012).
    • 23. Kuwabara M, Niwa K, Nishihara S et al. Hyperuricemia is an independent competing risk factor for atrial fibrillation. Int. J. Cardiol. 231, 137–142 (2017).
    • 24. Kokubo Y, Watanabe M, Higashiyama A et al. Interaction of blood pressure and body mass index with risk of incident atrial fibrillation in a Japanese urban cohort: the Suita study. Am. J. Hypertens. 28(11), 1355–1361 (2015).
    • 25. Aune D, Schlesinger S, Norat T, Riboli E. Tobacco smoking and the risk of atrial fibrillation: a systematic review and meta-analysis of prospective studies. Eur. J. Prev. Cardiol. 25(13), 1437–1451 (2018).
    • 26. Sano F, Ohira T, Kitamura A et al. Heavy alcohol consumption and risk of atrial fibrillation. The Circulatory Risk in Communities Study (CIRCS). Circ. J. 78(4), 955–961 (2014).