The LM had the advantage of identifying some prognostic factors, associating each of them with an odds ratio, while the use of ML is limited by the difficulty of interpreting the model, often referred to as ‘black box’. This finding is perfectly in agreement with results recently obtained by Iosa et al.
[9][29], who compared the performance of classical regression, neural network, and cluster analysis in predicting the outcome of patients with stroke. Similarly, Gravesteijn et al.
[10][30] reached the same conclusions on TBI patients evaluating the different performance of logistic regression with respect to SVM, random forests, gradient boosting machines, and artificial neural networks. In terms of model performance,
theour SVM and DT values are similar to those reported by Abujaber et al.
[6], whereas k-NN has never been employed in this clinical domain and this outperformed other ML approaches in all the evaluation metrics. The only algorithm that relatively underperformed was the NB. The accuracy (and sensitivity) was somewhat lower, passing from the analyzed to the test sample.
The Our
esearchers' data conflict with those reported by Amorim et al.
[7] who described the excellent performance of this algorithm as a screening tool in predicting the functional outcome of TBI patients. This discrepancy could be mainly due to the use of different clinical predictors. Indeed, there is large heterogeneity in factors (i.e., age, gender, clinical severity, clinical comorbidities, systolic blood pressure, respiratory rate, lab values, and presence of mass lesion) identified as having a prognostic value in TBI patients, thus making a direct comparison between ML approaches difficult to perform
[6]. Another limit is due to the fact the dataset is unbalanced (62.8% negative vs. 37.2% positive outcome) and could negatively affect performances of machine learning. To overcome this issue and increase classification robustness,
Thwe
researchers also applied the technique of LOOCV that is less affected by this problem and allows
themus to compare four machine learning techniques since each type of algorithm performs predictions differently. For instance, the DT algorithm performs well with unbalanced datasets thanks to the splitting rules that look at class variables.
Moreover, the type of predictors, such as continuous and categorized (operator-dependent) variables and the lack of objective biological high-dimensional data (i.e., neuroimaging, genetics), might also limit the performance of ML techniques applied in this domain
[11][31].
The Our
esearchers' data would seem to confirm this hypothesis because of the change in identified predictors for classification. Indeed, as shown in
Figure 14 and
Figure 25, moving from 2 to 4 classes of outcome approaches impacts the most significant features extracted by predictive models. Apart from age, for reaching the excellent performance with the 2 classes approach, LM and ML algorithms need CRS-r values and ERBI values at T1, whereas, for the 4 classes approach, diagnosis at admission and sex are the most important features.
Figure 14. Comparison of accuracies between LM and ML models (2 classes of outcome). Legend: Linear Regression Model (LM), Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), Naïve Bayes (NB), Decision Tree (DT) and Ensemble of Machine Learning models (Ensemble ML). Symbols ***, p-values <0.001.
Figure 25. Comparison of accuracies between LM and ML models (4 classes of outcome). Legend: Linear Regression Model (LM), Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), Naïve Bayes (NB), Decision Tree (DT) and Ensemble of Machine Learning models (Ensemble ML).
Finally, this is the first
res
earchtudy employing an ensemble ML approach to improve the outcome prediction in TBI patients. This approach has been demonstrated to be useful for integrating multiple ML models in a single predictive model characterized by higher robustness, reducing the dispersion of predictions
[12][32]. However, this method would not seem to boost performance except when the four classes approach was employed (
Figure 25). Indeed, in
theour KW analysis,
the researchers we observe that the ensemble ML for two classes reach a high accuracy similar to other ML techniques of about 84% for LOOCV and 82% for 10-fold CV as shown in
Table 17 while using four classes approach (
Table 28) a (not significant) trend of performance metrics was observed (71.5% for LOOCV and 70.5% for 10-fold CV), which is five to ten percentage points higher than the other models.
Table 17. Classification Results with MRMR feature selection using, respectively, LOOCV and 10-fold Cross Validation (2 classes).
8. Classification Results with MRMR feature selection using, respectively, LOOCV and 10-fold Cross Validation (4 classes).
Classification Model |
Cross Validation |
Cross Validation | Performance Metric |
Linear Model |
Support Vector Machine (SVM) |
k-Nearest Neighbors (k-NN) |
Naïve Bayes (NB) |
Decision Tree (DT) |
Ensemble |
Performance Metric |
Linear Model |
Support Vector Machine (SVM) |
k-Nearest Neighbors | (k-NN) |
Naïve Bayes (NB) |
Decision Tree (DT) |
Ensemble |
LOOCV |
Accuracy |
84.69% |
LOOCV |
Accuracy | 80.61% |
82.65% |
65.31% |
43.88% |
66.33%64.29% |
79.59% |
83.67% |
68.37% |
59.18% |
71.43% |
Precision µ |
Precision µ | 84.69% |
80.61% |
82.65% |
64.29% |
65.31%79.59% |
83.67% |
Precision M |
83.46% |
79.51% |
81.20% |
64.29% |
78.14% |
82.47% |
Recall µ |
64.84% |
58.09% |
61.36% |
37.50% |
56.52% |
71.93% |
43.88% |
63.08% |
F1 Score M |
55.44% |
52.02% |
53.95% |
36.00% |
51.36% |
54.59% |
10-fold CV |
Accuracy |
85.89% |
81.78% |
82.78% |
64.33% |
76.78% |
81.78% |
Precision µ |
85.71% |
81.63% |
82.65% |
64.29% |
76.53% |
81.63% |
Precision M |
84.44% |
80.50% |
81.20% |
64.29% |
74.53% |
80.00% |
Recall µ |
66.67% |
59.70% |
61.36% |
37.50% |
52.08% |
59.70% |
Recall M |
42.22% |
39.37% |
40.40% |
25.00% |
36.75% |
40.00% |
66.33% |
68.37% |
59.18% |
71.43% |
Precision M |
66.49% |
36.49% |
66.50% |
52.94% |
46.16% |
71.73% |
Recall µ |
38.55% |
20.67% |
39.63% |
41.88% |
32.58% |
45.45% |
Recall M |
41.51% |
38.65% |
40.40% |
25.00% |
38.25% |
40.79% |
Recall M |
36.51% |
23.14% |
37.28% |
38.74% |
30.86% |
41.58% |
F1 Score µ |
73.45% |
67.52% |
70.43% |
47.37% |
66.10% |
F1 Score µ |
48.48% |
28.10% |
49.62% |
51.94% |
42.03% |
55.56% |
F1 Score M |
47.13% |
28.32% |
47.77% |
44.74% |
36.99% |
52.64% |
10-fold CV |
Accuracy |
64.22% |
45.00% |
65.33% |
67.44% |
57.44% |
70.44% |
Precision µ |
64.28% |
44.89% |
65.31% |
67.34% |
57.14% |
70.41% |
Precision M |
65.72% |
36.83% |
65.44% |
52.16% |
45.83% |
70.78% |
Recall µ |
37.50% |
21.36% |
38.55% |
40.74% |
30.77% |
44.23% |
Recall M |
35.73% |
23.42% |
36.38% |
37.78% |
28.68% |
40.66% |
F1 Score µ |
75.00% |
68.97% |
70.43% |
47.37% |
61.98% |
68.97% |
F1 Score µ |
47.37% |
28.95% |
48.48% |
F1 Score M |
56.30% |
52.87% |
53.95% |
36.00% |
49.22% |
53.33% |
Table 2
50.77% |
40.00% |
54.33% |
F1 Score M |
46.29% |
28.63% |
46.77% |
43.82% |
35.28% |
51.65% |
3. Conclusions
In summary,
thwe
researchers found that ML algorithms do not perform better than more traditional regression models in predicting the outcome after TBI. As future work,
the researchers we plan to perform further external validations of all these models on other datasets that could capture dynamic changes in prognosis during intensive care courses extending the current models with new objective predictors, such as neuroimaging data (EEG, PET, fMRI)
[13][33].