The LM had the advantage of identifying some prognostic factors, associating each of them with an odds ratio, while the use of ML is limited by the difficulty of interpreting the model, often referred to as ‘black box’. This finding is perfectly in agreement with results recently obtained by Iosa et al.
[9], who compared the performance of classical regression, neural network, and cluster analysis in predicting the outcome of patients with stroke. Similarly, Gravesteijn et al.
[10] reached the same conclusions on TBI patients evaluating the different performance of logistic regression with respect to SVM, random forests, gradient boosting machines, and artificial neural networks. In terms of model performance, the
researchers' SVM and DT values are similar to those reported by Abujaber et al.
[6], whereas k-NN has never been employed in this clinical domain and this outperformed other ML approaches in all the evaluation metrics. The only algorithm that relatively underperformed was the NB. The accuracy (and sensitivity) was somewhat lower, passing from the analyzed to the test sample. The researchers' data conflict with those reported by Amorim et al.
[7] who described the excellent performance of this algorithm as a screening tool in predicting the functional outcome of TBI patients. This discrepancy could be mainly due to the use of different clinical predictors. Indeed, there is large heterogeneity in factors (i.e., age, gender, clinical severity, clinical comorbidities, systolic blood pressure, respiratory rate, lab values, and presence of mass lesion) identified as having a prognostic value in TBI patients, thus making a direct comparison between ML approaches difficult to perform
[6]. Another limit is due to the fact the dataset is unbalanced (62.8% negative vs. 37.2% positive outcome) and could negatively affect performances of machine learning. To overcome this issue and increase classification robustness,
Tthe resear
chhcers also applied the technique of LOOCV that is less affected by this problem and allows the
m researchers to compare four machine learning techniques since each type of algorithm performs predictions differently. For instance, the DT algorithm performs well with unbalanced datasets thanks to the splitting rules that look at class variables.
Moreover, the type of predictors, such as continuous and categorized (operator-dependent) variables and the lack of objective biological high-dimensional data (i.e., neuroimaging, genetics), might also limit the performance of ML techniques applied in this domain
[11]. The researchers' data would seem to confirm this hypothesis because of the change in identified predictors for classification. Indeed, as shown in
Figure 1 and
Figure 2, moving from 2 to 4 classes of outcome approaches impacts the most significant features extracted by predictive models. Apart from age, for reaching the excellent performance with the 2 classes approach, LM and ML algorithms need CRS-r values and ERBI values at T1, whereas, for the 4 classes approach, diagnosis at admission and sex are the most important features.
Figure 1. Comparison of accuracies between LM and ML models (2 classes of outcome). Legend: Linear Regression Model (LM), Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), Naïve Bayes (NB), Decision Tree (DT) and Ensemble of Machine Learning models (Ensemble ML). Symbols ***, p-values <0.001.
Figure 2. Comparison of accuracies between LM and ML models (4 classes of outcome). Legend: Linear Regression Model (LM), Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), Naïve Bayes (NB), Decision Tree (DT) and Ensemble of Machine Learning models (Ensemble ML).
Finally, this is the first research employing an ensemble ML approach to improve the outcome prediction in TBI patients. This approach has been demonstrated to be useful for integrating multiple ML models in a single predictive model characterized by higher robustness, reducing the dispersion of predictions
[12]. However, this method would not seem to boost performance except when the four classes approach was employed (
Figure 2). Indeed, in the KW analysis, the researchers observe that the ensemble ML for two classes reach a high accuracy similar to other ML techniques of about 84% for LOOCV and 82% for 10-fold CV as shown in
Table 1 while using four classes approach (
Table 2) a (not significant) trend of performance metrics was observed (71.5% for LOOCV and 70.5% for 10-fold CV), which is five to ten percentage points higher than the other models.
Table 1. Classification Results with MRMR feature selection using, respectively, LOOCV and 10-fold Cross Validation (2 classes).
Classification Model |
Cross Validation |
Performance Metric |
Linear Model |
Support Vector Machine (SVM) |
k-Nearest Neighbors (k-NN) |
Naïve Bayes (NB) |
Decision Tree (DT) |
Ensemble |
Classification Model |
Cross Validation |
Performance Metric |
Linear Model |
Support Vector Machine (SVM) |
k-Nearest Neighbors (k-NN) |
Naïve Bayes (NB) |
Decision Tree (DT) |
Ensemble |
LOOCV |
Accuracy |
84.69% |
80.61% |
82.65% |
64.29% |
79.59% |
83.67% |
Precision µ |
84.69% |
80.61% |
82.65% |
64.29% |
79.59% |
83.67% |
Precision M |
83.46% |
79.51% |
81.20% |
64.29% |
78.14% |
82.47% |
Recall µ |
64.84% |
58.09% |
61.36% |
37.50% |
56.52% |
63.08% |
Recall µ |
38.55% |
20.67% |
39.63% |
41.88% |
32.58% |
45.45% |
Recall M |
41.51% |
38.65% |
40.40% |
25.00% |
38.25% |
40.79% |
Recall M |
36.51% |
23.14% |
37.28% |
38.74% |
F1 Score µ |
73.45% |
67.52% |
70.43% |
47.37% |
66.10% |
71.93% |
30.86% |
81.20% |
64.29% |
74.53% |
80.00% |
Recall µ |
66.67% |
59.70% |
61.36% |
37.50% |
52.08% |
59.70% |
Recall M |
42.22% |
39.37% |
40.40% |
25.00% |
36.75% |
40.00% |
F1 Score µ |
75.00% |
68.97% |
70.43% |
47.37% |
LOOCV |
Accuracy |
65.31% |
43.88% |
66.33% |
68.37% |
59.18% |
71.43% |
Precision µ |
65.31% |
43.88% |
66.33% |
68.37% |
59.18% |
71.43% |
Precision M |
66.49% |
36.49% |
66.50% |
52.94% |
46.16% |
71.73% |
41.58% |
F1 Score µ |
48.48% |
28.10% |
49.62% |
51.94% |
42.03% |
55.56% |
F1 Score M |
55.44% |
52.02% |
53.95% |
F1 Score M |
47.13% |
28.32% |
47.77% | 36.00% |
51.36% |
54.59% |
44.74% |
36.99% |
52.64% |
10-fold CV |
Accuracy |
85.89% |
81.78% |
82.78% |
64.33% |
76.78% |
81.78% |
10-fold CV |
Accuracy |
64.22% |
45.00% |
65.33% |
67.44% |
57.44% |
70.44% |
Precision µ |
85.71% |
Precision µ |
64.28% | 81.63% |
82.65% |
64.29% |
76.53% |
44.89%81.63% |
65.31% |
67.34% |
57.14% |
70.41% |
Precision M |
84.44% |
Precision M | 80.50% |
65.72% |
36.83% |
65.44% |
52.16% |
45.83% |
70.78% |
Recall µ |
37.50% |
21.36% |
38.55% |
40.74% |
30.77% |
44.23% |
Recall M |
35.73% |
23.42% |
36.38% |
37.78% |
28.68% |
40.66% |
61.98% |
68.97% |
F1 Score µ |
47.37% |
28.95% |
48.48% |
50.77% |
40.00% |
54.33% |
F1 Score M |
56.30% |
52.87% |
53.95% |
36.00% |
49.22% |
53.33% |
Table 2. Classification Results with MRMR feature selection using, respectively, LOOCV and 10-fold Cross Validation (4 classes).
F1 Score M |
46.29% |
28.63% |
46.77% |
43.82% |
35.28% |
51.65% |
3. Conclusions
In summary, the researchers found that ML algorithms do not perform better than more traditional regression models in predicting the outcome after TBI. As future work, the researchers plan to perform further external validations of all these models on other datasets that could capture dynamic changes in prognosis during intensive care courses extending the current models with new objective predictors, such as neuroimaging data (EEG, PET, fMRI)
[13].