Modern Intensive Care Units (ICUs) provide continuous monitoring of critically ill patients susceptible to many complications affecting morbidity and mortality. ICU settings require a high staff-to-patient ratio and generate a large volume of data. For clinicians, the real-time interpretation of data and decision-making is a challenging task. Machine Learning (ML) techniques in ICUs are making headway in the early detection of high-risk events due to increased processing power and freely available datasets such as the Medical Information Mart for Intensive Care (MIMIC). We conducted a systematic literature review to evaluate the effectiveness of applying ML in ICU settings using the MIMIC dataset. A total of 322 articles were reviewed and a quantitative descriptive analysis was performed on 61 qualified articles that applied ML techniques in ICU settings using MIMIC data. We assembled the qualified articles to provide insights into the areas of application, clinical variables used, and treatment outcomes that can pave the way for further adoption of this promising technology and possible use in routine clinical decision-making.
1. Introduction
Artificial intelligence (AI) encompasses a broad spectrum of technologies that aim to imitate cognitive functions and intelligent behavior of humans [1]. Machine Learning (ML) is a subfield of AI that focuses on algorithms that allow computers to define a model for complex relationships or patterns from empirical data without being explicitly programmed [2]. ML, powered by increasing availability of healthcare data, is being used in a variety of clinical applications ranging from diagnosis to outcome prediction [1,3]. The predictive power of ML improves as the number of samples available for learning increases [4,5].
ML algorithms can be supervised or unsupervised based on the type of learning rule employed. In supervised learning, an algorithm is trained using well-labeled data. Thereafter, the machine predicts on unseen data by applying knowledge gained from the training data [6]. The most widely adopted supervised ML models are Random Forest (RF), Support Vector Machines (SVM), and Decision Tree algorithms [6]. In unsupervised learning, no ground truth labeling is required. Instead, the machine learns from the inherent structure of the unlabeled data [7]. Either type of ML is an iterative process in which the algorithm tries to find the optimal combination of both model variables and variable weights with the goal of minimizing error in the predicted outcome [5,6]. If the algorithm performs with a reasonably low error rate, it can be employed for making predictions where outputs are not known. However, while developing an ML model, an optimal bias-variance tradeoff should be selected to minimize the prediction error rate [8]. A poor balance between bias and variance results in one of two problems: (1) underfitting, when the model is too simple to capture the underlying pattern (high bias), or (2) overfitting, when the model fits noise in the training data and generalizes poorly (high variance) [9]. Finding the “sweet spot” between bias and variance is crucial to avoid both underfitting and overfitting [8,10].
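The bias-variance tradeoff described above can be illustrated with a small, self-contained sketch. It is not drawn from any of the reviewed studies: the data are synthetic, and k-nearest-neighbour regression is used here only as a convenient stand-in model whose single parameter k moves it from overfitting (k = 1) to underfitting (k equal to the training set size). All function names and parameter values are assumptions made for illustration.

```python
import math
import random


def make_data(n, rng, noise_sd=0.5):
    """Draw n noisy samples of a smooth synthetic signal (not clinical data)."""
    xs = [rng.random() for _ in range(n)]
    return [(x, math.sin(2 * math.pi * x) + rng.gauss(0.0, noise_sd)) for x in xs]


def knn_predict(train, x, k):
    """Predict y at x as the mean target of the k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mean_squared_error(train, data, k):
    """Average squared error of k-NN predictions over the (x, y) pairs in data."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


def error_table(ks=(1, 9, 100), n_train=100, n_test=500, seed=0):
    """Return {k: (training MSE, held-out MSE)} for each k in ks."""
    rng = random.Random(seed)
    train = make_data(n_train, rng)
    test = make_data(n_test, rng)
    return {
        k: (mean_squared_error(train, train, k), mean_squared_error(train, test, k))
        for k in ks
    }


if __name__ == "__main__":
    for k, (train_err, test_err) in error_table().items():
        print(f"k={k:3d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

With k = 1 the model reproduces every training point exactly, so its training error is zero while its held-out error stays high (overfitting). With k equal to the training set size it predicts the overall mean everywhere (underfitting). An intermediate k would be expected to give the lowest held-out error, which is the "sweet spot" referred to in the text.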
Deep learning (DL), a subcategory of machine learning, achieves great power and flexibility compared to conventional ML models by drawing inspiration from biological neural networks to solve a wide variety of complex tasks, including the classification of medical imaging and Natural Language Processing (NLP) [10,11,12,13,14]. The most widely used DL models are variants of the Artificial Neural Network (ANN) and Multi-Layer Perceptron (MLP). In general, ML models are data driven and they rely on a deep understanding of the system for prediction, thereby empowering users to make informed decisions.
To provide better patient care and facilitate translational research, healthcare institutions are increasingly leveraging clinical data captured from Electronic Health Records (EHR) systems [15]. Within these institutions, the Intensive Care Unit (ICU) generates an immense volume of data and requires a high staff-to-patient ratio [16,17]. To avoid adverse events and prolonged ICU stays, early detection of, and intervention for, patients vulnerable to complications is crucial; for these reasons, the ML literature is increasingly using ICU patient data for secondary usage and for the prediction of clinical events such as sepsis and septic shock [18]. ML techniques in ICUs are making headway in the early detection of high-risk events due to increased processing power and freely available datasets such as the Medical Information Mart for Intensive Care (MIMIC) [19]. The data available in the MIMIC database include highly structured data from time-stamped, nurse-verified physiological measurements made at the bedside, as well as unstructured data, including free-text interpretations of imaging studies provided by the radiology department [13].
2. Discussion
The aim of this systematic review was to provide an up-to-date and holistic view of current ML applications in ICU settings that use MIMIC data to predict clinical outcomes. Our review revealed that ML was widely applied to retrospective data in areas such as mortality, risk stratification, readmission, and infectious disease in critically ill patients. This review may be used to provide insights for choosing key variables and best-performing models for further research.
The application of ML techniques within the ICU domain is rapidly expanding with the improvement of modern computing, which has enabled the analysis of huge volumes of complex and diverse data [1]. ML expands on existing statistical techniques, utilizing methods that are not based on a priori assumptions about the distribution of the data but instead derive insights directly from the data [80,81].
With ICUs being complex settings that generate a variety of time-sensitive data, more and more ML-based studies have begun tapping the openly available data from a large tertiary care hospital (MIMIC). Our screening resulted in 61 publications that utilized MIMIC data to train and test ML models, which enables reproducibility. The majority of these publications focused on predicting mortality, sepsis, acute kidney injury (AKI), and readmissions.
2.1. Mortality Prediction
Mortality prediction for ICU patients is crucial for assessing severity of illness and adjudicating the value of treatments and timely interventions. ML algorithms developed for predicting mortality in ICUs focused mainly on in-hospital mortality and 30-day mortality at discharge. Studies by Marafino et al. [22], Pirracchio et al. [23], Hoogendoorn et al. [24], Awad et al. [26], Davoodi et al. [29], and Weissman et al. [31] predicted in-hospital mortality, whereas Du et al. [25] predicted 28-day mortality at discharge, and Zahid et al. [30] predicted both 30-day and in-hospital mortality. Most studies focusing on predicting in-hospital mortality looked at mortality after 24 h of ICU admission. However, one in particular, Awad et al. [26], predicted mortality within 6 h of admission. Marafino et al. [22] predicted mortality using only nursing notes from the first 24 h of ICU admission, whereas Weissman et al. [31] improved mortality prediction by combining structured and unstructured data generated within the first 48 h of the ICU stay. Davoodi et al. [29] and Hoogendoorn et al. [24] predicted after 24 h and within a median of 72 h, respectively. Studies by Tang et al. [33], Caicedo-Torres et al. [36], Sha et al. [38], and Zhang et al. [41] predicted in-hospital mortality irrespective of the admission or discharge time.
For mortality prediction, all of the studies used three main categories of clinical variables: (1) demographics, (2) vital signs, and (3) laboratory test variables. In addition to these most commonly used data elements, other clinical information such as medications, intake/output variables, risk scores, and comorbidities was also utilized. Weissman et al. [31] and Zhang et al. [41] used clinical variables from both structured and unstructured data types for mortality prediction.
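As a rough illustration of how the variable categories above might be turned into model inputs for a fixed prediction window (for example, the first 24 h of the ICU stay), the sketch below summarises time-stamped measurements into minimum, maximum, and mean features and merges them with demographics. This is an assumption-laden example, not the pipeline of any reviewed study: the `Measurement` record, the `window_features` helper, and variable names such as `heart_rate` are hypothetical and do not correspond to MIMIC table or item identifiers.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Measurement:
    """One time-stamped observation (hypothetical structure, not a MIMIC schema)."""
    hours_since_admission: float
    name: str    # illustrative label such as "heart_rate" or "lactate"
    value: float


def window_features(measurements, demographics, window_hours=24.0):
    """Summarise measurements taken in the first `window_hours` of the stay.

    Returns a flat dict: the demographic fields plus min/max/mean per variable.
    Measurements outside the window are ignored so that no information from
    after the prediction time leaks into the features.
    """
    values_by_name = {}
    for m in measurements:
        if 0.0 <= m.hours_since_admission < window_hours:
            values_by_name.setdefault(m.name, []).append(m.value)

    features = dict(demographics)
    for name, values in sorted(values_by_name.items()):
        features[f"{name}_min"] = min(values)
        features[f"{name}_max"] = max(values)
        features[f"{name}_mean"] = mean(values)
    return features


if __name__ == "__main__":
    stay = [
        Measurement(1.0, "heart_rate", 80.0),
        Measurement(5.0, "heart_rate", 100.0),
        Measurement(30.0, "heart_rate", 140.0),  # outside the 24 h window
        Measurement(2.0, "lactate", 2.0),
    ]
    print(window_features(stay, {"age": 70}))
```

Changing `window_hours` to 6 or 48 would correspond to the shorter and longer observation windows mentioned in the studies above. Free-text notes would need a separate text-processing step that is not sketched here.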
Multiple studies predicted mortality in disease-specific patient cohorts. Celi et al. [21] and Lin et al. [35] predicted in-hospital mortality in AKI patients. Lin et al. [35] predicted mortality based on five important variables (urine output, systolic blood pressure, age, serum bicarbonate level, and heart rate). In addition, the study by Lin et al. [35] also noted that the effect of kidney injury markers, such as cystatin C and neutrophil gelatinase-associated lipocalin, on subclinical injury had not yet been analyzed, although these markers can provide AKI prognostic information; this is due to the lack of such data in MIMIC. Garcia-Gallo et al. [37] and Kong et al. [40] predicted mortality in sepsis patients, and specifically, Garcia-Gallo et al. [37] identified patients who are on a 1-year mortality trajectory. Anand et al. [34] claimed that the risk of mortality in diabetic patients could be better predicted using a combination of limited variables: HbA1c, mean glucose during stay, diagnoses upon admission, age, and type of admission. To compute the “diagnosis upon admission” variable, the study utilized the Charlson Comorbidity Index, the Elixhauser Comorbidity Index, and the Diabetic Severity Index. The authors further claimed that combining diabetic-specific metrics and using the fewest possible variables would result in better mortality risk prediction in diabetic patients.
In our review, studies used both traditional ML (10 studies) and DL methods (11 studies) to predict mortality. Among traditional ML techniques, Random Forest, Decision Tree, and Logistic Regression were the most commonly used algorithms. However, recent studies by Caicedo-Torres et al. [36], Du et al. [25], and Zahid et al. [30] have used DL methods for mortality prediction with a promising accuracy ranging from 0.86 to 0.87, as reported in Supplementary Table S1. Traditional ML models are more easily interpretable than DL models, which use many levels of features and hidden layers to predict outcomes. Understanding the features that contribute towards the prediction plays an important role in clinical decision-making [82,83]. For example, one of the most cited studies, by Pirracchio et al. [23], developed a mortality prediction algorithm (Super Learner) using a combination of traditional ML models, the results of which were easily interpretable by clinical researchers. In general, DL techniques are employed to improve prediction accuracy by training on large volumes of data [12]. Zahid et al. [30] developed a DL model (Self-Normalizing Neural Network (SNN)) that performed marginally better than the Super Learner of Pirracchio et al. [23] (Area Under the Receiver Operating Characteristic curve (AUROC) of 0.86 for SNN versus 0.85 for Super Learner). However, interpreting the results of DL models is challenging because of their multiple hidden layers, and they are often treated as black-box models. To address this limitation, Caicedo-Torres et al. [36] and Sha et al. [38] demonstrated the interpretability of their models through visualizations that allow clinicians to make informed decisions.
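The AUROC values compared above can be read as the probability that a model assigns a higher risk score to a randomly chosen patient who died than to a randomly chosen patient who survived, with ties counted as one half. The following minimal sketch computes AUROC directly from that pairwise definition. It is meant only to make the metric concrete: the function name `auroc` and the toy labels and scores are assumptions, and the reviewed studies may have used other implementations.

```python
def auroc(labels, scores):
    """Area under the ROC curve from the pairwise (Mann-Whitney) definition.

    labels: iterable of 0/1 outcomes (1 = event, e.g. in-hospital death)
    scores: iterable of predicted risks, higher meaning more likely to be 1
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative label")

    concordant = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                concordant += 1.0   # event case ranked above non-event case
            elif p == n:
                concordant += 0.5   # ties count as half
    return concordant / (len(positives) * len(negatives))


if __name__ == "__main__":
    toy_labels = [0, 0, 1, 1]
    toy_scores = [0.1, 0.4, 0.35, 0.8]
    print(auroc(toy_labels, toy_scores))  # 3 of 4 pairs ranked correctly: 0.75
```

The pairwise loop is quadratic in the number of patients, which is acceptable for a small illustration; rank-based formulations give the same value more efficiently on large cohorts. A value of 0.5 corresponds to random ranking and 1.0 to perfect discrimination, so a difference of 0.01 between two models is small.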
This entry is adapted from the peer-reviewed paper 10.3390/informatics8010016