Published Online: https://doi.org/10.2217/14622416.7.3.455

Objectives: To provide a mathematical introduction to the Wichita (KS, USA) clinical dataset, which comprises all of the nongenetic data (no microarray or single nucleotide polymorphism data) from the 2-day clinical evaluation, and to show the preliminary findings and limitations of popular, matrix algebra-based data mining techniques.

Methods: An initial matrix of 440 variables by 227 human subjects was reduced to 183 variables by 164 subjects. Variables were excluded if they correlated strongly with chronic fatigue syndrome (CFS) case classification by design (for example, the multidimensional fatigue inventory [MFI] data), were otherwise self-reported in nature and also tended to correlate strongly with CFS classification, or were sparse or nonvarying between case and control. Subjects were excluded if they did not clearly fall into well-defined CFS classifications, had comorbid depression with melancholic features, or met other medical or psychiatric exclusions. Two popular data mining techniques, principal components analysis (PCA) and linear discriminant analysis (LDA), were used to determine how well the data separated into groups. Two different feature selection methods helped identify the most discriminating parameters.

Results: Although purely biological features (variables) were found to separate CFS cases from controls, including many allostatic load and sleep-related variables, most parameters were not statistically significant individually. However, biological correlates of CFS, such as heart rate and heart rate variability, require further investigation.

Conclusions: Feature selection of a limited number of variables from the purely biological dataset produced better separation between groups than a PCA of the entire dataset. Feature selection highlighted the importance of many of the allostatic load variables studied in more detail by Maloney and colleagues in this issue [1], as well as some sleep-related variables. Nonetheless, matrix linear algebra-based data mining approaches appeared to be of limited utility when compared with more sophisticated nonlinear analyses on richer data types, such as those found in Maloney and colleagues [1] and Goertzel and colleagues [2] in this issue.
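The variable-reduction and feature-ranking steps described in the Methods can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not name the two feature selection methods, so the Fisher score used here is an assumed, commonly used univariate criterion. The variable names, the toy values, and the `max_missing` threshold are all hypothetical.

```python
from statistics import mean, pvariance


def drop_uninformative(columns, max_missing=0.2):
    """Keep variables that are neither sparse nor nonvarying.

    columns maps a variable name to one value per subject (None = missing).
    max_missing is an assumed threshold, not a value taken from the study.
    """
    kept = {}
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        if 1 - len(present) / len(values) > max_missing:
            continue  # too sparse
        if len(set(present)) < 2:
            continue  # nonvarying across subjects
        kept[name] = values
    return kept


def fisher_score(values, labels):
    """Squared difference of class means over the sum of class variances."""
    cases = [v for v, y in zip(values, labels) if y == 1 and v is not None]
    controls = [v for v, y in zip(values, labels) if y == 0 and v is not None]
    spread = pvariance(cases) + pvariance(controls)
    gap = (mean(cases) - mean(controls)) ** 2
    if spread == 0:
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def rank_features(columns, labels):
    """Return variable names ordered from most to least discriminating."""
    scores = {name: fisher_score(v, labels) for name, v in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data (invented): 1 = CFS case, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "heart_rate": [78, 82, 80, 66, 70, 68],
    "sleep_latency": [20, 35, 25, 15, 30, 18],
    "constant_flag": [1, 1, 1, 1, 1, 1],
    "sparse_lab": [None, None, 3.1, None, 2.9, None],
}

kept = drop_uninformative(columns)
ranking = rank_features(kept, labels)
print(ranking)
```

A univariate score such as this one ranks each variable in isolation, so it cannot detect variables that discriminate only in combination. With few subjects relative to variables, any such ranking should be checked on held-out data, in line with the small-sample overfitting concern raised in reference 3.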

Bibliography

  • 1  Maloney EM, Gurbaxani BM, Jones JF, Coelho LdS, Pennachin C, Goertzel BN: Chronic fatigue syndrome and high allostatic load. Pharmacogenomics 7(3), 467–473 (2006).
  • 2  Goertzel BN, Pennachin C, Coelho LdS, Gurbaxani BM, Maloney EM, Jones JF: Combinations of single nucleotide polymorphisms in neuroendocrine effector and receptor genes predict chronic fatigue syndrome. Pharmacogenomics 7(3), 475–483 (2006).
  • 3  Dougherty ER: Feature-selection overfitting with small-sample classifier design. IEEE Intelligent Systems 20(6), 64–66 (2005).
  • 4  Hastie T, Tibshirani R, Friedman J: The Elements of Statistical Learning. Springer, New York, NY, USA, 371–406 (2001).
  • 5  Hastie T, Tibshirani R, Friedman J: The Elements of Statistical Learning. Springer, New York, NY, USA, 485–491 (2001).
  • 6  Liu H: Evolving feature selection. IEEE Intelligent Systems 20(6), 64 (2005).
  • 7  Reeves WC, Wagner D, Nisenbaum R et al.: Chronic fatigue syndrome – a clinically empirical approach to its definition and study. BMC Med. 3, 19 (2005).
  • 8  Hastie T, Tibshirani R, Friedman J: The Elements of Statistical Learning. Springer, New York, NY, USA, 91 (2001).
  • 9  Capuron L, Welberg L, Heim C et al.: Cognitive dysfunction relates to subjective report of mental fatigue in patients with chronic fatigue syndrome. Neuropsychopharmacology [Epub ahead of print] (2006).
  • 10  Goertzel BN, Pennachin C, Coelho LdS, Maloney EM, Jones JF, Gurbaxani BM: Allostatic load is associated with symptoms in chronic fatigue syndrome patients. Pharmacogenomics 7(3), 485–494 (2006).
  • 11  Berens M, Liu H, Yu L: Fostering biological relevance in feature selection for microarray data. IEEE Intelligent Systems 20(6), 71–73 (2005).
  • 12  Forman G: Feature selection: we've barely scratched the surface. IEEE Intelligent Systems 20(6), 74–76 (2005).
  • 13  Smigrodzki R, Goertzel B, Pennachin C, Coelho L, Prosdocimi F, Parker WD: Genetic algorithm for analysis of mutations in Parkinson's disease. Artif. Intell. Med. 35(3), 227–241 (2005).
  • 14  Diaz-Avalos R, Long C, Fontano E et al.: Cross-β order and diversity in nanocrystals of an amyloid-forming peptide. J. Mol. Biol. 330(5), 1165–1175 (2003).
  • 15  Nelson R, Sawaya MR, Balbirnie M et al.: Structure of the cross-β spine of amyloid-like fibrils. Nature 435(7043), 773–778 (2005).