Sensitivity and specificity of an eye movement tracking-based biomarker for concussion

Object: The purpose of the current study is to determine the sensitivity and specificity of an eye tracking method as a classifier for identifying concussion. Methods: Brain injured and control subjects prospectively underwent both eye tracking and Sport Concussion Assessment Tool 3. The results of eye tracking biomarker based classifier models were then validated against a dataset of individuals not used in building a model. The area under the curve (AUC) of receiver operating characteristics was examined. Results: An optimal classifier based on best subset had an AUC of 0.878, and a cross-validated AUC of 0.852 in CT- subjects and an AUC of 0.831 in a validation dataset. The optimal misclassification rate in an external dataset (n = 254) was 13%. Conclusion: If one defines concussion based on history, examination, radiographic and Sport Concussion Assessment Tool 3 criteria, it is possible to generate an eye tracking based biomarker that enables detection of concussion with reasonably high sensitivity and specificity.

The complexity of diagnosis for concussion reflects many factors. First among these is the functional and anatomic variability of the normal brain. Even identical twins have differences in personality, behavior and abilities resulting in differential brain morphometry [1]. Children with poorer academic achievement scores may do worse on concussion detection tests that require reading [2]. For this reason, many concussion detection tests have required baseline assessments. A second factor is that the human brain is constantly changing over time. A classic example of this is the difference in functional MRI activated by speech as a child learns to read [3]. Tests requiring baselines are particularly vulnerable to developmental influence, learning curves, practice effect [4] and volitional exaggeration [5,6]. Third is the variability of brain injury itself. No two blows to the head can result in the exact same pattern of injury. A fourth factor is obfuscation by non-brain injury and other factors which can result in headache, nausea, vomiting, dizziness and other symptoms mimicking brain injury.
Emergency department (ED) assessment of concussion patients generally includes history and physical examination, but can also include CT imaging, which does not quantitate concussion. Concussion may be a diagnosis of exclusion in the ED, or it may be overlooked, as the principal purpose of the ED visit is to ensure the absence of preventable morbidity and mortality.
The Sport Concussion Assessment Tool 3 (SCAT3) was designed to assess concussion signs and symptoms in athletes and has been validated in several studies, with subsets validated as SCAT2 [7][8][9][10][11][12][13]. Several other studies assessing diagnostics for concussion have relied on SCAT3 testing to assess the extent of the condition, and thus we have proceeded similarly [7][8][9][10][11][12][13]. We selected a symptom severity score (SSS) of >40 and standardized assessment of concussion score (SAC) ≤24 based on prior concussion studies with athletes and civilians [14,15]. This definition of concussion is consistent with Centers for Disease Control and Prevention descriptions of characteristics of concussion.
We have developed an eye tracking algorithm that detects cranial nerve palsies and is sensitive for detection of acute mass effect in the brain [16]. The algorithm also detects disruption of pathways controlling eye movements associated with structural traumatic brain injury (TBI) and concussion [17]. Eye tracking is performed while a subject watches television or a video moving inside an aperture with a set trajectory for 220 s at a fixed distance from a viewing monitor. The position of each pupil is recorded over time elapsed as the video travels on its time course, enabling detection of impaired ability to rotate the eyes relative to time and therefore relative to each other. In our previous work, we demonstrated that the severity of disconjugate gaze in ED structural TBI and concussion patients detectable with this algorithm was proportionate to the severity of concussion symptoms. Eye tracking also improved over time after both structural brain injury and concussion, with the former patients improving more slowly [17].
The purpose of the current study is to determine the sensitivity and specificity of our eye tracking metrics as a biomarker for concussion based on a classifier function.

Subject selection
Control subjects were employees, volunteers, visitors and patients at the Bellevue Hospital Center recruited in accordance with Institutional Review Board policy. Inclusion criteria for normal control subjects were: age 18-60 years, vision correctable to within 20/500 bilaterally, intact ocular motility and ability to provide a complete ophthalmologic, medical and neurologic history as well as medications/drugs/alcohol consumed within the 24 h prior to tracking. Exclusion criteria were history of: strabismus, diplopia, palsy of cranial nerves III, IV or VI, papilledema, optic neuritis or other known disorder affecting cranial nerve II, macular edema, retinal degeneration, dementia or cognitive impairment, hydrocephalus, sarcoidosis, myasthenia gravis, multiple sclerosis or other demyelinating disease, and active or acute epilepsy, stroke/hemorrhage or brain injury sufficiently significant to result in hospitalization. Subjects reporting any minor brain injury regardless of loss of consciousness were also excluded.
All trauma patients were recruited from the Bellevue Hospital Emergency Services (Emergency Room and Trauma Bay), trauma service and neurosurgery service. They were between the ages of 18 and 60, subject to the same exclusion requirements as controls except for head injury, consentable and able/willing to participate in the study. Both structural and nonstructural brain injury patients needed to have obtained a CT scan of the head prior to consideration for study enrollment. Trauma exclusion criteria included patients suffering burns, anoxic injury or multiple/extensive injuries resulting in any medical, surgical or hemodynamic instability. Structural brain injury was defined as final CT scan reading (by an attending physician radiologist) demonstrating the presence of hemorrhage (subdural, epidural, subarachnoid or intraparenchymal), brain contusion or full-thickness skull fracture consistent with acute brain injury. Structural brain injury patients were considered eligible for recruitment for up to 2 weeks after injury or surgery as long as they exhibited evidence of not yet being fully recovered from the brain injury (e.g., were still hospitalized). No structural TBI patients were recruited preoperatively; they either had nonsurgical injuries or were recruited postsurgically. SCAT3 assessments were administered at the time of eye tracking by research personnel blinded to the eye tracking findings to patients blinded to their eye tracking results.
For the purposes of assessing eye movement as a biomarker for concussion, we defined concussion by four criteria: traumatic injury resulting in ED evaluation; sufficient indication for a CT scan of the brain, which was negative for structural brain or skull injury; SCAT3 SSS of >40; and SCAT3 SAC score ≤24. Criteria for obtaining a head CT in the emergency room and trauma bay were based on Level One trauma center/ATLS/ACEP guidelines in accordance with the discretion of the individual examining physician responsible for the care of the patient. Subjects meeting these four criteria were considered 'true positives' for concussion.
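The four-part definition of a 'true positive' can be written as a simple predicate. The following is a minimal sketch for illustration only; the function and parameter names are hypothetical and are not taken from the study's software.

```python
def is_true_positive_concussion(ed_evaluated: bool,
                                ct_indicated_and_negative: bool,
                                sss: int,
                                sac: int) -> bool:
    """Return True only when all four study criteria for concussion are met.

    Parameter names are illustrative assumptions, not the authors' variables.
    """
    return (ed_evaluated                    # traumatic injury leading to ED evaluation
            and ct_indicated_and_negative   # head CT indicated and negative
            and sss > 40                    # SCAT3 symptom severity score above 40
            and sac <= 24)                  # SCAT3 SAC score of 24 or lower


# Example: a CT-negative ED patient with SSS 41 and SAC 24 meets the definition,
# while the same patient with SSS 40 does not.
meets = is_true_positive_concussion(True, True, 41, 24)
borderline = is_true_positive_concussion(True, True, 40, 24)
```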

Visual stimulus
We recorded subjects' eye movements with an Eyelink 1000 eye tracker at a fixed distance of 55 cm from a computer monitor over a time period of 220 s. The distance was fixed by means of a chinrest attached to the base of the viewing monitor and camera. Subjects were seated in either a height-adjustable or height-fixed chair or bed, with the monitor height adjusted to the subject as described previously [17]. The visual stimuli were the music videos Shakira Waka-Waka, K'naan Wavin' Flag or Disney videos from Puss in […]. The afferent stimulus was presented binocularly and eye tracking was performed binocularly. Subjects were not spatially calibrated to the tracker to enable independent analysis of each pupil position over time.

Data analysis
The eye tracker sampled pupil position at 500 Hz, yielding 100,000 samples over 200 s. We created scatterplots of the entire time series by plotting the 100,000 (x,y) pairs representing the two orthogonal components of the pupil position estimated by pupil-cornea reflection measurement over time to create 'box trajectories' that reflected the temporal nature of the pupillary movement. These figures look like boxes, reflecting the timing of the aperture as it moved around the screen (Figure 1) with each 10 s of data collection representing one unit of ocular traverse. Horizontally, the pupil traveled approximately 34° over 10 s and vertically it traveled approximately 23° in 10 s. Two-hundred data points prior to and following each blink were removed prior to creating the measures of disconjugacy and aspect ratio to limit noise in the data from the blink event.
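The blink-removal step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: blink samples are assumed to be encoded as None, and the function name mask_blinks is hypothetical. The padding of 200 samples on each side of a blink comes from the text; at 500 Hz this corresponds to 0.4 s.

```python
def mask_blinks(samples, pad=200):
    """Drop blink samples plus `pad` samples before and after each blink.

    `samples` is a list of (x, y) tuples; a blink sample is represented as
    None (an assumed encoding, not described in the source).
    """
    n = len(samples)
    drop = [False] * n
    for i, sample in enumerate(samples):
        if sample is None:
            # Mark the blink itself and its neighbours on both sides.
            for j in range(max(0, i - pad), min(n, i + pad + 1)):
                drop[j] = True
    return [s for s, d in zip(samples, drop) if not d]


# Sample count stated in the text: 500 Hz x 200 s.
n_samples = 500 * 200

# Toy example with a padding of 2 instead of 200 for readability.
toy = [(i, 0) for i in range(10)]
toy[5] = None
cleaned = mask_blinks(toy, pad=2)
```

In the toy example the blink at index 5 removes indices 3 through 7, leaving five samples.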
Typical eye tracking experiments feature a gaze-point-fixation-based calibration system to train the eye tracker's internal model to accurately predict the subject's gaze position on the screen. Our algorithm does not train an eye gaze model, nor is it concerned with the accurate localization of gaze on a screen. We do not need to account for error(s) of spatial gaze position as a subject views a particular point on the screen, but are rather interested in physiologic capability. Thus we take raw pupil coordinates from the EyeLink device and transform the data based on values from each eye separately, without mixing values across eyes, consistent with our assumption that brain injured patients have eyes that may not move together.
Without spatial calibration, exact measurements of error in the spatial domain are impossible. Our analysis avoided this problem by deeming it irrelevant what exactly the subject is viewing and assessing the eye movement trajectories in the time domain, rather than the spatial domain. By using a constantly changing stimulus (a continuously playing movie) with a periodic envelope (the aperture trajectory), we were able to look at relative eye movements over time. Effectively, each subject's mean trajectory over the path of the aperture served as its own calibration. To clarify regarding the temporal nature of the boxes, consider an aperture that circles the perimeter of the monitor twice. At 38 s after the start of the time series, the eye tracker will report a pair of values, (x₁,y₁). Seventy-eight seconds after the start of the time series, the stimulus aperture will appear in the same spatial location as it did at second 38, and the eye tracker will report a second pair of values, (x₂,y₂). To be concrete, assume that the two pairs were (-2.0,0.1) and (-2.0,-0.1). The trial triggered average of the data (i.e., an average across the repeated cycles, synchronized by time of cycle start) would result in the pair (-2,0), and that is the point indicated in the box plot, even though no actual pupil angle measures corresponded to that pair of numbers. Further, with each cycle it is not necessarily true that the pupils should be at the same exact Cartesian coordinate; on the contrary, they are unlikely to overlap exactly given the size of the viewing aperture.
Eighty-nine separate metrics were obtained from these 100,000 data points obtained over time that reflected functions of individual and conjugate eye movements. Thirty-two of these metrics were assessing function of only the left eye, 32 of the right eye and 25 of both eyes. Metrics were named first according to whether they reflected function of a single eye (right or left) or both eyes (conjugacy) [16,17].
They were further subdivided as reflecting horizontal (x) or vertical (y) eye movement. Additionally they were then further subdivided by the location of the stimulus trajectory (right, bottom, left or top) or in all directions (total). For example, some metrics were based on transformed pupil coordinates such as height or width of the box trajectory (Figure 1) which represented mean or median values of a pupil position over time, whereas others were calculated from these single metrics. Aspect ratio = height/width of the trajectory, which relates function of CN III relative to CN VI [16]. Area = height × width and thus represents total function of CN III and VI. Distance was calculated using transformed (x,y) Cartesian coordinates and the Pythagorean theorem. Velocity was distance over time. The BOX metrics were combinatorial scores calculated from raw metrics.
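The trial triggered averaging and the derived box metrics described in this section can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, height and width are taken here as the vertical and horizontal extent of the averaged trajectory (the source describes them only as based on mean or median pupil positions), and distance is summed over consecutive points.

```python
import math


def cycle_triggered_average(samples, cycle_length):
    """Average (x, y) samples across repeated cycles, aligned by offset in the cycle."""
    n_cycles = len(samples) // cycle_length
    if n_cycles == 0:
        raise ValueError("need at least one full cycle of samples")
    averaged = []
    for offset in range(cycle_length):
        xs = [samples[c * cycle_length + offset][0] for c in range(n_cycles)]
        ys = [samples[c * cycle_length + offset][1] for c in range(n_cycles)]
        averaged.append((sum(xs) / n_cycles, sum(ys) / n_cycles))
    return averaged


def box_metrics(trajectory, sample_rate_hz=500.0):
    """Height, width, aspect ratio, area, path distance and mean velocity."""
    xs = [p[0] for p in trajectory]
    ys = [p[1] for p in trajectory]
    width = max(xs) - min(xs)     # assumed definition: horizontal extent
    height = max(ys) - min(ys)    # assumed definition: vertical extent
    # Pythagorean distance between consecutive points, summed along the path.
    distance = sum(math.hypot(x2 - x1, y2 - y1)
                   for (x1, y1), (x2, y2) in zip(trajectory, trajectory[1:]))
    elapsed_s = (len(trajectory) - 1) / sample_rate_hz
    return {
        "height": height,
        "width": width,
        "aspect_ratio": height / width,
        "area": height * width,
        "distance": distance,
        "velocity": distance / elapsed_s,
    }


# The two-pair example from the text, treated as two cycles of length one.
point = cycle_triggered_average([(-2.0, 0.1), (-2.0, -0.1)], cycle_length=1)

# Two noisy passes around a toy four-corner box, averaged and then measured.
cycle_a = [(-2.0, -1.5), (2.0, -1.5), (2.0, 0.5), (-2.0, 0.5)]
cycle_b = [(-2.0, -0.5), (2.0, -0.5), (2.0, 1.5), (-2.0, 1.5)]
box = cycle_triggered_average(cycle_a + cycle_b, cycle_length=4)
metrics = box_metrics(box)
```

The averaged toy box has corners at (±2, ±1), so its aspect ratio is 0.5 and its area is 8, even though neither pass visited those exact coordinates.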

Statistical analysis
True positives and negatives were age and gender balanced and their eye tracking metrics were compared using Bonferroni corrected Wilcoxon rank sum tests. To achieve a family-wise error rate of 0.05, a candidate biomarker must have a p-value below 0.05/89 = 0.00056. Ideally, a valid biomarker should be independent of gender or age. Accordingly, we evaluated the association between each eye tracking measure and age and gender in the full control sample and excluded those that were either significantly associated with gender (p ≤ 0.01) or age (p ≤ 0.05) from biomarker consideration. We built classifier functions using two model selection methods: the 'best subset' model and the least absolute shrinkage and selection operator (LASSO) method. To appraise the classifier, fourfold cross validation was repeated 1000 times to obtain an average AUC of the receiver operating characteristic (ROC) curve. We also utilized a random forests algorithm for obtaining a classifier. The random forests method builds a large bootstrap collection of logical trees, and then averages the individual predictions. An out-of-bag (OOB) error estimate is almost identical to that obtained by N-fold cross validation. The results of these eye tracking biomarker based classifier models were then validated against a dataset of individuals not used in building the model.
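Two quantities in this section lend themselves to a short worked example: the Bonferroni threshold and the AUC. The sketch below is illustrative only and uses hypothetical function names and made-up scores. It computes the AUC as the proportion of case/control pairs in which the case receives the higher classifier score, with ties counted as one half; this pairwise definition is equivalent to the area under the empirical ROC curve.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold for a family-wise error rate of `alpha`."""
    return alpha / n_tests


def auc_from_scores(case_scores, control_scores):
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Threshold used in the text: 0.05 across 89 metrics.
threshold = bonferroni_threshold(0.05, 89)

# Hypothetical classifier scores (not study data): three of four pairs are
# ranked correctly, giving an AUC of 0.75.
example_auc = auc_from_scores([0.9, 0.3], [0.4, 0.2])
```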

Results
In order to generate the classifier models, we first considered both CT+ and CT- patients as a total group of brain injured subjects. The brain injured group had 42 subjects, 34 of whom were male. Since there were only eight female cases, we decided to focus on male subjects. The current data had 281 control subjects, 129 of whom were male. To balance age, we obtained a sample of 34 male controls and 34 male cases. In the selected sample, the age distributions of cases and controls were not significantly different (p = 0.801) (Table 1).

Group comparisons
The 89 eye tracking measures of 34 controls and 34 brain injured cases were individually compared using the Wilcoxon rank sum test. The unadjusted p-values of selected measures are shown in Table 2. The eye tracking measures that remained significant after adjusting for multiple comparisons are shown in red below. It is likely that the number of significant variables would increase if we used more powerful multiple comparison adjustment methods, such as the bootstrap or Holm's method.
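To illustrate why Holm's method can retain more significant variables than the Bonferroni correction, the sketch below applies both to a small set of hypothetical p-values (not study data). The function names are illustrative. Holm's step-down procedure compares the smallest p-value to alpha/m, the next to alpha/(m-1), and so on, stopping at the first failure.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure; returns rejections in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared to alpha / (m - rank).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject


# Hypothetical p-values for four metrics.
p_values = [0.01, 0.015, 0.02, 0.5]
by_bonferroni = bonferroni_reject(p_values)
by_holm = holm_reject(p_values)
```

With these values Bonferroni rejects only the first hypothesis, whereas Holm rejects the first three, while both control the family-wise error rate at 0.05.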

Biomarker generation
Among the 66 eye tracking measures that were not strongly associated with age or gender, 28 measures were found to be significantly different between controls and brain injured cases (p < 0.05). Four variables including conj_boxscore_value, conj_boxscore2_value, conj_boxscore3_value and conj_boxscore5_value were highly correlated, so only one, conj_boxscore_value, was used for further model building. The p-values for comparing concussion cases versus controls, male versus female and correlating metrics to age and the area under the ROC curve (AUC) for each predictor are shown in Table 3.

Model building in the balanced sample (training data)
The age and gender balanced sample with 34 concussions and 34 controls was used to build the models. The selected variables, the AUC, cross-validated AUC, the misclassification rate and the cross-validated misclassification rate are shown in Table 4. We generated two models using the best subset approach due to a missing value of the conj_varAspect_value variable in a case subject. When we deleted that case with the missing value and performed the best subset approach, we obtained the model including four predictors. When we deleted the variable conj_varAspect_value and then performed the best subset approach, we generated the model with two predictors: right_distTop_value and conj_varX_value. The OOB misclassification rate for the random forest classifier is 27.9% using the 25 eye tracking measures shown in Table 3.

Model validation in an external dataset
We tested the classifier performance in a validation dataset consisting of 255 subjects (247 mixed gender controls and eight female cases), none of whom were included in the all-male 34/34 training data. The misclassification rates, numbers of true positives, false positives, false negatives, true negatives, sensitivity, specificity and AUC of the three models are shown in Table 5. Note that random forest methodology does not enable a calculation of AUC.
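The summary measures reported for each model follow directly from the four confusion-matrix counts. The sketch below shows the arithmetic; the function name and the example counts are hypothetical and are not values from Table 5.

```python
def classification_summary(tp, fp, fn, tn):
    """Sensitivity, specificity and misclassification rate from confusion counts."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),              # cases correctly flagged
        "specificity": tn / (tn + fp),              # controls correctly cleared
        "misclassification_rate": (fp + fn) / total,
    }


# Hypothetical counts for illustration only.
summary = classification_summary(tp=8, fp=2, fn=2, tn=8)
```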

Analysis on the balanced sample excluding CT+ subjects
In order to focus the biomarker on concussion, as opposed to including both concussion and structural brain injury, we removed the CT+ subjects from the balanced sample and redid the analysis. Age and gender are well balanced between the CT- cases and the controls in the balanced sample, while the validation dataset consisted of mixed-gender controls and female-only cases. The age distribution is not significantly different between CT- cases and controls (p-value = 0.665). Summary statistics of age in the CT- cases and the controls are shown in Table 6.
We built three classifier models using the approaches described above for the balanced sample excluding CT+ subjects. The selected variables, the AUC, cross-validated AUC, the misclassification rate and the cross-validated misclassification rate are shown in Table 7. The OOB misclassification rate for the random forest classifier is 23.6% using the 25 eye tracking measures shown in Table 3.
We tested the model performance in the validation dataset excluding CT+ subjects (247 controls and seven concussions). The misclassification rates, numbers of true positives, false positives, false negatives, true negatives, sensitivity, specificity and AUC of the models are shown in Table 8. Again note that random forest methodology does not enable a calculation of AUC.
We then created an ROC curve of the balanced sample excluding CT+ subjects (21 cases and 34 controls) using the best subset approach (Figure 2).
Then we created an ROC curve of the balanced sample excluding CT+ subjects (21 cases and 34 controls) using the LASSO approach (Figure 3). We also created the ROC curves of the validation sample (247 controls and 7 concussions) using the best subset approach and LASSO approach (Figures 4 & 5).

Discussion
Brain injury is known to have an impact on smooth pursuit, saccades, fixation, pupil size, vergence and other aspects of gaze [18][19][20][21][22][23]. Eye movement tracking for the assessment of brain injury has previously been performed in patients with postconcussive symptoms to assess both intrinsic ocular capability [24][25][26] and attention [27]. We have developed an algorithm that interprets eye tracking data obtained while a subject watches a music video, cartoon or other short film clip of their choosing as it moves in an aperture on a viewing monitor. The positions of the pupils are mapped over time and metrics are obtained assessing alterations in movement. The technology is rapid, noninvasive, automatable, portable and does not require literacy in any particular language. Its objectivity arises from the fact that it assesses relatively passive eye movements rather than requiring a subject to follow instructions and move their eyes deliberately.
Previously we have demonstrated that this algorithm detects both clinical and subclinical cranial nerve palsies resulting from both direct nerve damage, and from intracranial mass effect in the supra- and infra-tentorial spaces. The ocular motility deficits were found to be reversible with correction of the neurosurgical problem [16]. We have also shown that brain injured subjects have greater ocular dysmotility than nonbrain injured subjects, while nonhead-injured trauma subjects are not different from nonbrain-injured subjects. The severity of ocular motility dysfunction correlates with the severity of concussion symptoms in trauma subjects regardless of whether that injury can be seen on CT imaging. Deficits are worse in the 1-2 weeks after the injury and then recover in most patients at about 1 month postinjury [17].
Criticism [28] of our prior work with eye tracking of brain injured subjects focused on four points, which we address individually:
• "Any asymmetry in the spatial relationship that the camera or the infrared light source has with the two fellow eyes would result in different extents of relocation of the images of the pupils or corneal reflections. Asymmetries exist because there is a physical separation between the two eyes as well as between the camera and the infrared light source." Asymmetry in the spatial relationship between camera, light source and eyes was controlled by using a chin and forehead rest fixed to the base of the viewing monitor and camera. By fully constraining this system, asymmetries were reduced. While there is physical separation between the eyes, this distance is a constant value in any given individual. As C Tyler explained [28] in his refutation of the Maruta comments, if these asymmetries were an issue, "all patients would be equally subject to the same degree by the effects of asymmetry and lack of calibration. As stated, none of the criticisms suggest a systematic bias between the different patient categories. The significant differences among categories cannot therefore be attributed to any of the factors raised by the author, and controlling these factors should only improve the significance of [the] result."
• "[…although] eyes are highly symmetrical within individuals, they are not perfectly symmetrical and a 1-2% nonconformity in corneal curvature or axial length is not uncommon, which further confounds the relationship between eye rotation and changes in pixel coordinates… mapping is not linear." Again the above refutation applies: such a problem should affect controls as much as trauma subjects. In addition, our current paper describes numerous significant metrics not relying on measurements from both eyes, but rather from a single eye.
• "Implementing a calibration procedure under monocular viewing" would achieve the same purpose as our algorithm. While this criticism is technically correct, it has been our experience that trauma patients are willing to watch a film clip for 220 s, but somewhat less willing to sit through an additional 5 min of monocular calibration despite the fact that this can easily be performed with a monocular occluder.
• "Having a larger male-to-female ratio in one group could increase the extent of binocular asymmetry in uncalibrated data since men tend to have a larger interpupillary distance." In this current work, we present data demonstrating that there is no difference in horizontal conjugacy between male and female subjects.

Conclusion
In the current study, we establish that numerous parameters vary between brain injured subjects and controls (Table 2), and that some of these parameters are relatively independent of age and gender (Table 3).
Ultimately we establish a relatively high sensitivity and specificity of this eye tracking algorithm for classifying concussion (Figures 1 & 2; Tables 7 & 8). Interestingly concussion had higher misclassification in the balanced sample (Table 7) than in the larger external validation dataset (Table 8). We suspect this may be because the balanced sample had patients who obligatorily had particular SCAT3 subset scores, which may imperfectly correlate with actual brain functionality. This misclassification rate also reflects a limitation of our methodology: specifically that there is currently no 'gold standard' diagnostic for concussion. Thus, generation of an AUC relies on our defining 'true positives' for concussion using best available standards. The SCAT3 SSS and SAC are at present the most widely validated measures for concussion. Data suggesting that some patients may maintain cognitive functioning even in the presence of structural brain injury underscores the complexity of brain function and injury [29]. A different patient with the same injury may have higher or lower functional cognitive assessment dependent on baseline capabilities. Thus, some of the 'misclassification' associated with our classifier may be due to the inadequacy of the SCAT3 subcomponents rather than of eye tracking. Additional limitations of our study are that the validation data set of 7 concussions in 254 subjects is relatively small and that the control group excluded individuals with prior recent brain injury. If the eye tracking metrics are highly interdependent, the chance of type II errors becomes higher with corrections for multiple comparison. Also trauma patients had to have obtained a head CT to participate in the study, which may potentially imply that they were more severely injured than many concussion patients who do not receive a head CT. 
Finally, our classifier has a relatively large misclassification rate, but this limitation is rendered less clinically dangerous by the fact that most misclassifications are false positives. With our algorithm, to identify one true positive, six to eight negative people are classified falsely as positive. Since the medical risk of missing a case is greater than the risk of falsely classifying a negative patient as concussed, this may be an acceptable risk. Consider, for example, the imbalanced number of concussions and controls in our external validation dataset. A hypothetical model that classifies everyone as normal would have only a 3% misclassification rate in an imbalanced sample such as ours (7 of 254 subjects). However, such a model would miss all seven concussions and thus hardly be optimal for patients.
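The arithmetic behind this argument can be checked directly. The sketch below uses the validation counts stated in the text (7 concussions, 247 controls); the function names are hypothetical. It also converts the stated ratio of six to eight false positives per true positive into the corresponding positive predictive value, which is a derived figure and not one reported in the study.

```python
def all_negative_baseline(n_cases, n_controls):
    """Performance of a model that labels every subject as normal."""
    total = n_cases + n_controls
    return {
        "misclassification_rate": n_cases / total,  # every case is an error
        "sensitivity": 0.0,                         # no case is detected
        "missed_cases": n_cases,
    }


def ppv_from_false_positive_ratio(false_positives_per_true_positive):
    """Positive predictive value implied by a false-positive to true-positive ratio."""
    return 1.0 / (1.0 + false_positives_per_true_positive)


baseline = all_negative_baseline(n_cases=7, n_controls=247)
baseline_percent = round(baseline["misclassification_rate"] * 100)

# Six to eight false positives per true positive.
ppv_high = ppv_from_false_positive_ratio(6)
ppv_low = ppv_from_false_positive_ratio(8)
```

Under these figures the all-negative model errs on about 3% of subjects yet detects no concussions, while the stated false-positive ratio implies that roughly one in seven to one in nine positive classifications is a true concussion.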
The complexity of concussion does not lend itself well to a single diagnostic. Our eye tracking algorithm appears to be detecting at least two parameters: intracranial mass effect and disruption of neural pathways controlling ocular motility. It is logical to assume that 'concussions' not affecting these parameters will not be detectable with our algorithm.
While our current results are promising, additional data on potential confounders of eye tracking still need to be investigated. These include alcohol and other intoxicants, fatigue and prior history of trauma and neurologic or ophthalmic disorders among others. Future studies currently in progress will elaborate the role of these factors on eye tracking as a biomarker for concussion.

Executive summary
• Concussion is a condition that is not well defined; therefore it is difficult to diagnose.
• The purpose of this work is to determine the sensitivity and specificity of an eye movement tracking based biomarker for concussion.

Methods
• Brain injured subjects recruited through the Bellevue Hospital emergency department and normal uninjured controls were prospectively enrolled in a study in which both eye tracking while watching a short film clip for 220 s and Sport Concussion Assessment Tool (SCAT3) data were collected.
• For the purposes of assessing eye movement as a biomarker for concussion, we defined concussion as traumatic injury resulting in emergency department evaluation, sufficient indication for a CT scan of the head, which was negative for structural brain or skull injury, SCAT3 symptom severity score of >40 and standardized assessment of concussion subset of SCAT3 ≤24.
• True positives and negatives were age and gender balanced and their eye tracking metrics were compared using Bonferroni corrected Wilcoxon rank sum tests.
• We built classifier functions using two model selection methods: the 'best subset' model and the least absolute shrinkage and selection operator (LASSO) method. We also utilized a random forests method of obtaining a classifier.
• The results of these eye tracking biomarker based classifier models were then validated against a dataset of individuals not used in building the model.

Results
• Significant group differences between brain injured and concussed subjects versus negative controls were found for 28 eye tracking metrics that were not influenced by age or gender. These were used to develop the three classifier functions.
• In a sample of 21 concussion cases versus age and gender balanced uninjured controls, the 'best subset' model selected four metrics and the resulting receiver operating characteristic of the classifier had an area under the curve (AUC) of 0.878, and a cross-validated AUC of 0.852. The LASSO model selected two metrics and resulted in an AUC of 0.880 and a cross-validated AUC of 0.826.
• In an external dataset of 254 subjects (247 controls and 7 concussions), 'best subset' had a misclassification rate of 14.2%, LASSO had a misclassification rate of 13.8% and random forest had a misclassification rate of 13.0%.

Discussion
• If one defines concussion based on history, physical examination, radiographic and SCAT3 criteria, it is possible to generate an eye tracking based biomarker that enables detection of concussion with reasonably high sensitivity and specificity.
[…]ment with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.
No writing assistance was utilized in the production of this manuscript.

Ethical conduct
The authors state that they have obtained appropriate institutional review board approval or have followed the principles outlined in the Declaration of Helsinki for all human or animal experimental investigations. In addition, for investigations involving human subjects, informed consent has been obtained from the participants involved.

Open access
This work is licensed under the Creative Commons Attribution 4.0 License. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/