More than a year has passed since the first case of coronavirus disease 2019 (COVID) was reported, and many deaths continue to occur. Although several pharmaceutical companies have developed vaccines, problems with mass production and worldwide distribution persist, and political and economic constraints may further limit vaccine access. For these reasons, containing the pandemic is a hard task, and deaths continue to increase. At the time of writing, the World Health Organization (Geneva, Switzerland) (https://covid19.who.int/, 31 May 2021) reported the following worldwide figures: 173,005,553 people infected with SARS-CoV-2, 3,727,605 deaths and 1,900,955,505 vaccine doses administered. The many hospitalizations caused by the rapid spread of the virus have required improved patient management throughout the healthcare system. In this context, it is important to minimize the time required for resource allocation and clinical decision making, such as triage, choice of ventilation modality and admission to the intensive care unit. Currently, baseline machine learning (ML) and deep learning (DL) techniques are widely accepted thanks to their ability to obtain information from the input data without “a priori” definitions. These approaches can be efficiently tested in healthcare applications such as diagnosis of diseases, analysis of medical images, collection of big data, research and clinical trials, management of smart health records and prediction of outbreaks. Consequently, DL models are capable of solving complex tasks in the intricate clinical field. ML is playing an increasingly prominent role in predicting the outcome of COVID patients. For instance, a mortality prediction model could rapidly and effectively support clinical decision making for COVID patients at imminent risk of death.
Recent studies reviewed predictive models for SARS-CoV-2 diagnosis and severity, length of hospital stay, intensive care unit (ICU) admission and mechanical ventilation modality outcomes, highlighting pitfalls of ML and DL methods based on imaging data; however, systematic reviews focused on the prediction of COVID mortality with ML methods, including DL techniques, are lacking in the literature.
The aim of this review is to discuss the current state of the art of ML methods to predict COVID mortality by: (1) summarizing the existing published literature on baseline ML- and DL-based COVID mortality prognosis systems based on medical evaluations, laboratory exams and Computed Tomography (CT); (2) presenting relevant information, including the type of data employed, the data splitting technique, the proposed ML methodology and the evaluation metrics; (3) providing possible explanations of the best results obtained; and (4) discussing challenging aspects of current studies and providing suggestions for future developments.
This systematic review considers the state of the art in ML and DL as applied to COVID mortality prediction. We performed a MEDLINE search on PubMed on 26 May 2021 using the terms “machine learning covid survival” (146 results), “machine learning covid mortality” (131 results), “deep learning covid survival” (49 results), “deep learning covid mortality” (45 results) and additional similar terms. The search results were filtered to remove duplicates, ML approaches for SARS-CoV-2 diagnosis or for prognosis other than mortality, preprints, abstract-only works and papers outside the scope of this review. We aimed to shed light on the characteristics of these studies in terms of: (i) data source, (ii) data partitioning, (iii) class of features, (iv) implemented feature ranking method, (v) implemented ML technique and (vi) metrics evaluated for performance assessment.
We focused on the type of model validation that each study used to split data into train and test groups. In particular, we report the number of subjects in the train and test sets and the corresponding numbers of survivors and non-survivors. We categorized validation types as internal, external, merged and prospective (either internal prospective or merged prospective). Internal validation refers to studies that subdivided a single-site database into train and test groups; external validation refers to studies that trained and tested the model on independent cohorts obtained from different sites. Merged validation refers to studies that combined data from different sites into a single database, which was then split into train and test groups, or that used multisite publicly available epidemiological datasets. Finally, prospective validation refers to studies that implemented a temporal validation to assess temporal generalizability. In internal prospective validation, data from patients hospitalized in a first timeframe were used for training, and data from patients admitted to the same hospital at a different time were used for testing. In contrast, merged prospective validation relied on multisite data to train the model and on multisite data collected in a subsequent timeframe for testing.
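The validation categories described above can be sketched as simple data-partitioning functions. This is only a minimal illustration under stated assumptions: the `Patient` record, the site labels "A" and "B", the 30% test fraction and the synthetic cohort are all hypothetical and are not taken from any reviewed study.

```python
import random
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Patient:
    site: str        # hypothetical hospital identifier
    admitted: date   # admission date
    died: bool       # outcome label: True = non-survivor


def random_split(cohort, test_fraction=0.3, seed=0):
    """Shuffle a cohort reproducibly and cut it into (train, test)."""
    shuffled = list(cohort)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def internal_split(patients, site, **kw):
    """Internal validation: train and test come from one single-site database."""
    return random_split([p for p in patients if p.site == site], **kw)


def merged_split(patients, sites, **kw):
    """Merged validation: pool several sites, then split the pooled database."""
    return random_split([p for p in patients if p.site in sites], **kw)


def external_split(patients, train_sites, test_sites):
    """External validation: train and test cohorts come from different sites."""
    train = [p for p in patients if p.site in train_sites]
    test = [p for p in patients if p.site in test_sites]
    return train, test


def prospective_split(patients, sites, cutoff):
    """Prospective (temporal) validation: train before `cutoff`, test from it on.

    One site gives internal prospective validation; several sites give
    merged prospective validation.
    """
    cohort = [p for p in patients if p.site in sites]
    train = [p for p in cohort if p.admitted < cutoff]
    test = [p for p in cohort if p.admitted >= cutoff]
    return train, test


def outcome_counts(cohort):
    """Return (survived, non_survived) for a train or test set."""
    deaths = sum(p.died for p in cohort)
    return len(cohort) - deaths, deaths


# Synthetic cohort: 2 sites x 6 admission months x 5 patients, 1 death per group.
patients = [
    Patient(site=site, admitted=date(2020, month, 1), died=(k == 0))
    for site in ("A", "B")
    for month in range(3, 9)
    for k in range(5)
]

train, test = prospective_split(patients, {"A"}, cutoff=date(2020, 6, 1))
print(outcome_counts(train), outcome_counts(test))  # (12, 3) (12, 3)
```

Reporting `outcome_counts` for both sets makes any class imbalance between survivors and non-survivors visible before a model is trained.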
We expected to collect papers with both clinical and imaging features. Among the imaging features, we included hand-crafted features extracted with radiomic analysis and features learned by convolutional neural networks (CNN). Clinical features comprise demographics (e.g., age, sex, race), comorbidities (e.g., diabetes, heart disease), symptoms (e.g., cough, fever), vital signs (e.g., heart rate, oxygen saturation), laboratory values (e.g., glucose, creatinine, haemoglobin), and disease treatment and clinical course (e.g., artificial ventilation, length of hospital stay, drugs). Clinical features can be classified as binary (yes/no: 0/1) or continuous (numerical values). We considered features binary when studies associated them with 0/1 values or dichotomized a continuous feature's value by defining a numerical range and setting the feature to 1 if the value was within that range and to 0 otherwise. We considered features continuous when studies used predictors (features used for prediction tasks) as continuous variables or converted binary features into continuous ones.
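The two conversions described above can be illustrated with a short sketch. The oxygen saturation range and the comorbidity weights below are hypothetical placeholders chosen only for illustration; they are not clinical cut-offs and not the published weights of any comorbidity index.

```python
def dichotomize(value, low, high):
    """Binary form of a continuous feature: 1 if low <= value <= high, else 0."""
    return 1 if low <= value <= high else 0


# Hypothetical placeholder weights, NOT the weights of a published index.
EXAMPLE_WEIGHTS = {"diabetes": 1, "heart_disease": 1, "kidney_disease": 2}


def comorbidity_score(flags, weights=EXAMPLE_WEIGHTS):
    """Numerical form of binary features: sum the weights of the flags set to 1."""
    return sum(weights[name] for name, present in flags.items() if present)


patient = {
    "oxygen_saturation": 91.0,  # continuous vital sign (%)
    "diabetes": 1,
    "heart_disease": 0,
    "kidney_disease": 1,
}

# Continuous -> binary: flag saturation inside an assumed range of 0-93%.
low_saturation = dichotomize(patient["oxygen_saturation"], 0.0, 93.0)

# Binary -> numerical: weighted sum over the comorbidity flags.
score = comorbidity_score({name: patient[name] for name in EXAMPLE_WEIGHTS})

print(low_saturation, score)  # 1 3
```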
To build a reliable classification model, the feature set should contain as much useful information as possible in as few features as possible. It is necessary to filter out irrelevant and redundant features by choosing a subset of relevant features, in order to avoid over-fitting and to tackle the curse of dimensionality. Feature ranking (or selection or reduction) techniques are a good approach for reducing the dimensionality of the feature space. Feature ranking improves the understanding of the features and reduces the computational cost, increasing the efficiency of the classification. Since Shapley Additive Explanation (SHAP) and least absolute shrinkage and selection operator (LASSO) logistic regression are widely used methods for model interpretation and feature selection in survival studies, we highlighted whether the studies used these methods or others. In particular, SHAP is a method that explains individual predictions by computing the contribution of each feature to the prediction. LASSO is a regression method that penalizes the sum of the absolute values of the coefficients; the penalty shrinks some coefficients exactly to zero, so that the corresponding features are removed from the model.
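As a minimal sketch of how an L1 (LASSO) penalty performs feature selection in logistic regression, the example below fits the model by proximal gradient descent in plain Python. The solver, the penalty strength `lam=0.15`, the step size and the toy data are assumptions made for illustration; the reviewed studies would typically rely on established statistical packages instead.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_logistic(X, y, lam, step=0.5, n_iter=500):
    """L1-penalized logistic regression fitted by proximal gradient descent.

    Minimizes mean log-loss + lam * sum(|w_j|); the intercept b is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        # Residuals (predicted probability minus label) at the current weights.
        residuals = [
            sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
            for row, yi in zip(X, y)
        ]
        grad_b = sum(residuals) / n
        grad_w = [sum(r * row[j] for r, row in zip(residuals, X)) / n for j in range(d)]
        b -= step * grad_b
        # Gradient step followed by soft-thresholding, which produces exact zeros.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return w, b


# Toy data: column 0 separates the classes, column 1 is only weakly associated.
weak_pos = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, 0.5]
weak_neg = [-v for v in weak_pos]
X = [[1.0, v] for v in weak_pos] + [[-1.0, v] for v in weak_neg]
y = [1] * 10 + [0] * 10  # 1 = non-survivor, 0 = survivor

w, b = lasso_logistic(X, y, lam=0.15)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print(selected)  # [0]
```

In this toy setting the gradient of the weak feature never exceeds the penalty, so its coefficient stays exactly at zero and only the informative feature is retained.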
A total of 19/24 studies adopted binary features. One study (1/24) dichotomized the values of continuous features into binary form.
A total of 16/24 studies adopted continuous features. A total of 2/24 studies converted binary features into continuous features by associating a Charlson comorbidity score with the feature's value.
We found 8/24 articles in which the SHAP method was used to optimize survival prediction in COVID. Vaid et al. demonstrated that interactions between features contributed only weakly to outcome prediction compared with the importance of each feature individually. In contrast, Abdulaal et al. used SHAP analysis to assess the contribution of patient variables to the mortality prediction, without reducing the number of features. A similar approach was employed by other studies. Subudhi et al. tested 18 models and applied the SHAP technique to temporally distinct patients to compare the important features selected on the different validation cohorts. In the other works, the most relevant features were selected with LASSO. Ko et al. employed the analysis of variance (ANOVA) to identify significant differences between the two classes (survivors and non-survivors), selecting the features with p-values less than 10⁻⁵. In the study by Di et al., the moDel Agnostic Language for Exploration and eXplanation (DALEX) package, usually adopted for explaining predictive models, was used as a feature selection method. Booth et al. implemented a different ranking method based on a logistic regression (LR) classifier, using the regression coefficients as a measure of feature importance. An et al. compared different feature ranking models to determine whether different ranking procedures gave consistent results. Hu et al. also used regression algorithms for feature reduction. Li et al. used univariate analysis to compare distribution differences between COVID survivors and non-survivors. Moreover, they compared a model using all 83 features with a model using only the first five selected features. Yan et al. performed feature ranking with a Multi-tree XGBoost.
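To make concrete what a SHAP analysis attributes to each patient variable, the following sketch computes exact Shapley values for a tiny, hypothetical risk score by enumerating all feature coalitions. It follows the definition of the Shapley value and is not the API of the SHAP package, which relies on faster approximations; the model, its coefficients and the standardized inputs are invented for illustration only.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to a baseline input.

    A feature absent from a coalition is replaced by its baseline value.
    The cost grows as 2**n_features, so this suits tiny examples only.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of coalitions of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk(features):
    """Hypothetical risk score with one age x CRP interaction term."""
    age, crp, ldh = features
    return 0.5 * age + 0.3 * crp + 0.1 * ldh + 0.2 * age * crp


x = [2.0, 1.0, 1.0]         # standardized age, CRP, LDH of one patient
baseline = [0.0, 0.0, 0.0]  # reference patient (all features at the mean)

phi = shapley_values(toy_risk, x, baseline)
print([round(v, 6) for v in phi])  # [1.2, 0.5, 0.1]
```

The contributions sum to the difference between the patient's score and the baseline score (1.8 here), and the interaction term is shared equally between age and CRP.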
With DL models, feature selection can be implemented by combining the available features, as shown by Zhu et al., to obtain the optimal number of features necessary for classification. Three articles did not apply any feature selection before the prediction algorithm.
This systematic review specifically considers the state of the art in ML and DL as applied to COVID mortality prediction. Both binary and multi-class features are considered throughout the review. We summarized the developed models in terms of data source, data partitioning, class of features, ML technique and evaluation metrics for performance assessment. All studies used clinical features, while only one paper used CT image features. Most of the studies presented an imbalanced number of survivors and non-survivors. We identified some best practices that studies could follow to develop optimal ML models: (1) use a high-quality dataset with a large, balanced number of samples; (2) implement an ensemble of different ML methodologies; (3) include clinical features from different feature classes, including age, CRP and LDH values; and (4) report as many metrics as possible to give a complete view of model performance, including both the most common metrics, such as AUCROC and ACC, and other important metrics for assessing prediction performance, such as SENS, SPEC, PPV and NPV.
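The metrics recommended above can all be derived from the confusion matrix and the predicted scores. The sketch below is a minimal illustration on invented data (two non-survivors among ten patients, with an assumed decision threshold of 0.5); it shows why accuracy alone can be misleading on an imbalanced cohort.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the ratio is undefined."""
    return numerator / denominator if denominator else None


def classification_metrics(y_true, y_pred):
    """ACC, SENS, SPEC, PPV and NPV for binary labels (1 = non-survivor)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "ACC": safe_ratio(tp + tn, len(pairs)),
        "SENS": safe_ratio(tp, tp + fn),  # true positive rate
        "SPEC": safe_ratio(tn, tn + fp),  # true negative rate
        "PPV": safe_ratio(tp, tp + fp),   # positive predictive value
        "NPV": safe_ratio(tn, tn + fn),   # negative predictive value
    }


def auc_roc(y_true, scores):
    """AUC-ROC as the probability that a randomly chosen non-survivor gets a
    higher score than a randomly chosen survivor (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented, imbalanced example: 2 non-survivors among 10 patients.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.3, 0.1, 0.2, 0.1, 0.4, 0.2, 0.1, 0.3, 0.8]
y_pred = [int(s >= 0.5) for s in scores]  # assumed decision threshold

metrics = classification_metrics(y_true, y_pred)
auc = auc_roc(y_true, scores)
print(metrics["ACC"], metrics["SENS"], auc)  # 0.8 0.5 0.84375
```

Here the accuracy is 0.8 even though only one of the two non-survivors is detected, which is why sensitivity, specificity and the predictive values should be reported alongside it.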
The considerations in this review may help to develop further studies to predict mortality in COVID patients, both adults and children, although children and young people remain at low risk of COVID mortality. Moreover, the suggestions collected in this study could also be useful to predict prognoses other than mortality (e.g., intubation and length of hospital stay).