Unveiling breast cancer risk profiles: a survival clustering analysis empowered by an online web application

    Yuan Gu (*Author for correspondence; E-mail: uwin@gwu.edu)
    Department of Statistics, The George Washington University, Washington, DC 20052, USA

    Mingyue Wang
    Department of Mathematics, Syracuse University, Syracuse, NY 13244, USA

    Yishu Gong
    Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA 02115, USA

    Xin Li
    Department of Statistics, The George Washington University, Washington, DC 20052, USA

    Ziyang Wang
    Department of Computer Science, University of Oxford, Oxford, OX1 3QD, UK

    Yuli Wang
    Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA

    Song Jiang
    Department of Biochemistry, Huzhou Institute of Biological Products Co., Ltd., 313017, China

    Dan Zhang
    Department of Information Science and Engineering, Shandong University, Shandong, China

    Chen Li
    Department of Biology, Chemistry and Pharmacy, Free University of Berlin, Berlin, 14195, Germany

    Published Online: https://doi.org/10.2217/fon-2023-0736

    Aim: To develop a Shiny app that lets doctors investigate breast cancer treatments through a new approach combining unsupervised clustering with survival information. Materials & methods: The analysis was based on the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) dataset, which contains 1726 subjects and 22 variables. Cox regression was used to identify the survival risk factors that served as inputs to K-means clustering. Log-rank tests and C-statistics were compared across different numbers of clusters, and Kaplan–Meier plots were presented. Results & conclusion: This study fills a gap by making a combination of unsupervised learning and survival information available to clinicians, and it demonstrates the potential of survival clustering as a tool for uncovering hidden structures based on distinct risk profiles.
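
    The pipeline described in the abstract (cluster patients on selected risk factors, then compare survival between clusters) can be illustrated with a minimal Python sketch. This is not the authors' Shiny/R implementation. The eight-patient dataset is synthetic, the two features stand in for risk factors that a Cox model is assumed to have already selected, and all function names are hypothetical. Only the two-cluster case is shown, so the log-rank test is the two-sample version, and the Cox regression and C-statistic steps are omitted.

```python
import math

def kmeans(points, k, iters=20):
    """Lloyd's algorithm; evenly spaced deterministic start (a simplification)."""
    centers = [points[i * len(points) // k] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centre.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels

def kaplan_meier(times, events):
    """Return [(event_time, survival_probability)] for one group."""
    surv, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve

def logrank_two_groups(times, events, labels):
    """Two-sample log-rank chi-square (1 df) and p-value, label 0 vs label 1."""
    obs = exp = var = 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)
        n0 = sum(1 for u, lab in zip(times, labels) if u >= t and lab == 0)
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d0 = sum(1 for u, e, lab in zip(times, events, labels)
                 if u == t and e and lab == 0)
        obs += d0
        exp += d * n0 / n
        if n > 1:
            var += d * (n0 / n) * (1 - n0 / n) * (n - d) / (n - 1)
    # Assumes var > 0, i.e. both groups are at risk at some event time.
    stat = (obs - exp) ** 2 / var
    return stat, math.erfc(math.sqrt(stat / 2))

# Synthetic toy data: two standardised risk factors per patient (assumed to
# be pre-selected), follow-up time, and event flag (1 = death, 0 = censored).
features = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (0.1, 0.1),
            (2.0, 2.1), (2.2, 1.9), (1.9, 2.0), (2.1, 2.2)]
times = [10, 12, 14, 15, 2, 3, 4, 5]
events = [1, 0, 1, 1, 1, 1, 1, 0]

labels = kmeans(features, k=2)
stat, p_value = logrank_two_groups(times, events, labels)
curves = {c: kaplan_meier([t for t, lab in zip(times, labels) if lab == c],
                          [e for e, lab in zip(events, labels) if lab == c])
          for c in sorted(set(labels))}

print("cluster labels:", labels)
print("log-rank chi-square:", round(stat, 3), "p-value:", round(p_value, 4))
for c, curve in curves.items():
    print("cluster", c, "Kaplan-Meier steps:", curve)
```

    Comparing more than two clusters, as the abstract describes, would require the k-sample log-rank test, which is not shown here. A production analysis would more likely rely on an established survival library than on hand-written estimators.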
