Machine Learning (ML) Applications in Antimicrobial Resistance


Please note this is a comparison between Version 1 by GEORGIOS FERETZAKIS and Version 3 by Catherine Yang.

Machine learning (ML) algorithms are increasingly applied in medical research and healthcare, and they are gradually improving clinical practice. Among the many applications of these methods, their use against antimicrobial resistance (AMR) is one of the most important, because rising resistance to antibiotics and the management of difficult-to-treat multidrug-resistant infections are major challenges with life-threatening consequences for most countries worldwide. ML methods have been developed to analyze bacterial genomes, predict drug susceptibility, recognize epidemic patterns for surveillance purposes, and propose new antibacterial treatments or vaccines. Besides developing new antibiotics, optimizing the use of existing drugs has been a key priority in stopping the spread of AMR, since inappropriate antibiotic use is one of its main drivers.

machine learning
artificial intelligence
antimicrobial resistance
AMR
antibiotic resistance

1. Diagnosis of AMR

Currently, AMR is diagnosed in clinical microbiology principally by two techniques ^[1][11]: classical culture-based antimicrobial susceptibility testing (AST) and whole-genome sequencing-based antimicrobial susceptibility testing (WGS-AST) ^[2][1]. Although culture-based AST is simpler and easier to perform, it typically takes a day or more to produce results. This prolongs empirical antibiotic therapy and raises the risk both of treatment failure, if the empirical regimen proves ineffective, and of selecting for resistance through the use of broad-spectrum antibiotics.

ML methods have substantially shortened bacterial susceptibility profiling: to less than three hours with flow-cytometry AST (FAST) ^[3][21] and to only 30 min with infrared (IR) spectroscopy ^[4][22]. Although these ML-assisted diagnostics accelerate antimicrobial susceptibility testing, they require costly infrastructure and expert personnel.

Although matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) is widely recognized as a reference method for the rapid and inexpensive identification of microorganisms in routine laboratories, little attention has been paid to its ability to detect AMR. Some recent studies have evaluated its use in combination with machine learning to detect AMR in clinical pathogens ^[5][6][23,24]. Some ML-based MALDI algorithms for microorganism identification are also FDA approved, for example, the MALDI Biotyper CA (MBT-CA) System (Bruker Daltonics Inc, Billerica, MA), which the FDA approved in 2013 ^[6][24].

Kirby–Bauer disk diffusion and microdilution antibiograms are recommended as reference methods for determining antimicrobial resistance by the European Committee on Antimicrobial Susceptibility Testing (EUCAST) and the Clinical and Laboratory Standards Institute (CLSI) ^[7][25]. Results are either reported qualitatively as interpretive categories (e.g., susceptible or resistant) according to the clinical breakpoints set by EUCAST, or expressed as minimum inhibitory concentrations (MICs) ^[7][25].
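
The categorization step can be sketched in a few lines of Python. The sketch assumes the EUCAST convention that an isolate is susceptible (S) when its MIC is at or below the S breakpoint and resistant (R) when it is above the R breakpoint, with values in between falling in the "susceptible, increased exposure" (I) category. The drug names and breakpoint values are hypothetical placeholders, not EUCAST or CLSI values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Breakpoint:
    """MIC breakpoints in mg/L, EUCAST-style: S if MIC <= s_max, R if MIC > r_min."""
    s_max: float
    r_min: float


# Hypothetical values for illustration only; real breakpoints are drug- and
# organism-specific and must be taken from current EUCAST or CLSI tables.
HYPOTHETICAL_BREAKPOINTS = {
    "drug_A": Breakpoint(s_max=1.0, r_min=2.0),
    "drug_B": Breakpoint(s_max=0.5, r_min=0.5),  # no intermediate zone
}


def interpret_mic(drug: str, mic_mg_per_l: float) -> str:
    """Map a measured MIC to an interpretive category (S, I or R)."""
    bp = HYPOTHETICAL_BREAKPOINTS[drug]
    if mic_mg_per_l <= bp.s_max:
        return "S"
    if mic_mg_per_l > bp.r_min:
        return "R"
    return "I"  # between the breakpoints: "susceptible, increased exposure"


for drug, mic in [("drug_A", 0.5), ("drug_A", 2.0), ("drug_A", 4.0), ("drug_B", 1.0)]:
    print(drug, mic, interpret_mic(drug, mic))
```

Keeping breakpoints in a lookup table rather than in the logic mirrors how they are distributed in practice: the tables are revised periodically, while the comparison rule stays the same.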

Although these conventional methods are effective, they are cumbersome and time-consuming, and they do not allow an effective targeted anti-infective treatment to be chosen rapidly ^[5][23]. Because results cannot be obtained sooner than 48 h after a sample is received, broad-spectrum antibiotics may be used for too long or overused. Some pathogens require an even longer incubation time (72 h or more) ^[7][25]. Hence, rapid, accurate, low-cost diagnostic tests are needed to optimize antimicrobial use and minimize selective pressure for resistance.

Whole-genome sequencing (WGS)-based diagnostic approaches are being used to overcome these limitations, especially for viral infections and tuberculosis, for which culture-based microbiological diagnostics are either not applicable or time-consuming ^[2][1]. WGS could alleviate many of these concerns by predicting AST results from the presence or absence of resistance genes and from mutations in relevant genes, from which clinicians can infer the activity of antibiotic agents ^[8][26]. However, integrating WGS diagnostics into routine antibiotic surveillance and daily clinical practice poses several challenges, especially in resource-limited settings, because these methods are more expensive and more complex to implement than standard antibiotic susceptibility testing ^[2][1].
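
The rule-based core of this genotype-to-phenotype inference can be sketched as a lookup from detected determinants to the drug classes they are associated with. The tiny catalogue below is an illustrative assumption containing a few well-known associations (for example, mecA with resistance to anti-staphylococcal β-lactams and blaKPC with carbapenem resistance); the feature label "gyrA_S83L" and the isolate's hits are hypothetical. Real pipelines query curated resistance databases and must also handle partial matches and novel mechanisms.

```python
# Illustrative catalogue only: a few well-characterized determinants and the
# drug classes they are associated with. Not a curated database.
RESISTANCE_CATALOGUE = {
    "mecA": {"anti-staphylococcal beta-lactams"},
    "blaCTX-M-15": {"third-generation cephalosporins"},
    "blaKPC": {"carbapenems"},
    "gyrA_S83L": {"fluoroquinolones"},  # hypothetical label for a gyrA point mutation
}


def predict_resistance(detected_features):
    """Map detected genes/mutations to the drug classes predicted to be inactive.

    Drug classes missing from the result are "predicted susceptible" only in the
    weak sense that no catalogued determinant was found.
    """
    predictions = {}
    for feature in detected_features:
        for drug_class in sorted(RESISTANCE_CATALOGUE.get(feature, set())):
            predictions.setdefault(drug_class, []).append(feature)
    return predictions


isolate = ["blaCTX-M-15", "gyrA_S83L", "unrecognized_orf"]  # hypothetical WGS hits
print(predict_resistance(isolate))
```

Unrecognized features are silently ignored here, which illustrates the main weakness of purely catalogue-based prediction: resistance caused by an uncatalogued mechanism is reported as no evidence of resistance.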

The development of molecular tests has significantly contributed to rapid diagnostic testing and timely identification of pathogens and antibiotic resistance patterns, but their high costs and limited availability prevent them from being widely used. In this context, ML-driven predictive models of antimicrobial resistance may serve as a bridge between specimen collection and results from molecular and genotypic susceptibility analysis, facilitating time-sensitive empirical antibiotic choices.

2. Prediction of AMR

Accurate prediction of resistance to different antibiotics directly benefits the patient, because it helps avoid treatment failure. Such predictions could also have long-term benefits, for example, enabling the use of more targeted antibiotics, reducing the need for multiple antibiotics to cure the same infection, and lowering the risk of onward transmission. Machine learning algorithms have the potential to help clinicians predict antimicrobial resistance.

Besides detecting antimicrobial resistance phenotypes, researchers have applied various ML modeling tools to predict the antibiotic susceptibility patterns of pathogens, supporting the selection of the most appropriate treatment. Goodman et al. used recursive partitioning to build a decision tree (DT) that predicts extended-spectrum β-lactamase (ESBL) production in Escherichia coli and Klebsiella spp. bacteremia from patients' epidemiological and microbiological data ^[9][27]. Sousa et al. performed a prospective study to validate this DT in a cohort of bacteremic patients in a region with a high prevalence of ESBL producers ^[10][28]. Unlike the earlier study, they included all types and species of β-lactamase-producing Gram-negative bacilli. After the cut-off values of certain variables associated with resistant infections were increased, the modified DT performed significantly better than the original one. Guillamet et al. used an analogous method to predict resistance to piperacillin–tazobactam, cefepime, and meropenem in patients with Gram-negative bloodstream infections. They observed good overall agreement in accuracy between multivariable logistic regression models and clinical decision trees built with a recursive partitioning algorithm (chi-squared automatic interaction detection, CHAID) ^[11][29].
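
Recursive partitioning of this kind can be sketched as a simplified, CHAID-inspired tree over binary predictors. At each node, the sketch selects the predictor whose 2×2 table with the outcome has the largest Pearson chi-squared statistic, splits on it, and recurses until a depth limit, a minimum node size, or the 5% significance threshold stops it. Full CHAID also merges categories of multi-level predictors and uses Bonferroni-adjusted p-values, which are omitted here. The predictor names and the 16-patient toy cohort are hypothetical and do not come from the cited studies.

```python
from collections import Counter

CHI2_CRIT_1DF = 3.841  # chi-squared critical value, p = 0.05, 1 degree of freedom


def chi2_2x2(rows, feature, outcome):
    """Pearson chi-squared statistic for a binary feature vs a binary outcome."""
    table = Counter((r[feature], r[outcome]) for r in rows)
    n = len(rows)
    stat = 0.0
    for f in (0, 1):
        for o in (0, 1):
            row_total = table[(f, 0)] + table[(f, 1)]
            col_total = table[(0, o)] + table[(1, o)]
            expected = row_total * col_total / n
            if expected > 0:
                stat += (table[(f, o)] - expected) ** 2 / expected
    return stat


def build_tree(rows, features, outcome, depth=0, max_depth=2, min_size=4):
    """Grow the tree recursively; each leaf stores the observed outcome rate."""
    rate = sum(r[outcome] for r in rows) / len(rows)
    if depth == max_depth or len(rows) < min_size or not features:
        return {"leaf": True, "rate": rate, "n": len(rows)}
    best = max(features, key=lambda f: chi2_2x2(rows, f, outcome))
    if chi2_2x2(rows, best, outcome) < CHI2_CRIT_1DF:  # no significant split
        return {"leaf": True, "rate": rate, "n": len(rows)}
    remaining = [f for f in features if f != best]
    children = {}
    for value in (0, 1):
        subset = [r for r in rows if r[best] == value]
        children[value] = (build_tree(subset, remaining, outcome, depth + 1, max_depth, min_size)
                           if subset else {"leaf": True, "rate": rate, "n": 0})
    return {"leaf": False, "feature": best, "children": children}


def predict(tree, record):
    """Follow the splits down to a leaf and return its outcome rate."""
    while not tree["leaf"]:
        tree = tree["children"][record[tree["feature"]]]
    return tree["rate"]


# Hypothetical toy cohort: (prior antibiotic exposure, nursing home resident, ESBL)
TOY = [(1, 1, 1), (1, 1, 1), (1, 1, 1), (1, 0, 1), (1, 0, 1), (1, 0, 1),
       (1, 1, 0), (1, 0, 0), (0, 1, 1),
       (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 0)]
rows = [dict(zip(("prior_abx", "nursing_home", "esbl"), t)) for t in TOY]

tree = build_tree(rows, ["prior_abx", "nursing_home"], "esbl")
print(tree["feature"], predict(tree, {"prior_abx": 1, "nursing_home": 0}))
```

Raising or lowering a cut-off, as in the modified tree of Sousa et al., corresponds in this sketch to changing how a continuous variable is dichotomized before it enters the tree.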

3. Machine-Learning-Assisted Antibiotic Prescription

In hospitals, infections are commonly treated by physicians who are not infection specialists. Physicians are encouraged to follow local antimicrobial guidelines and evidence-based policies, but adherence to these prescribing policies tends to be poor, because human and behavioral factors influence prescribing decisions. Clinicians also frequently feel pressure to overuse antibiotics because of concerns about the high mortality associated with delayed treatment in diseases such as sepsis, the increase in drug-resistant infections, and the lack of accurate diagnostics to support dynamic decision-making ^[12][13][5,8]. In a cohort of over a thousand critically ill patients with Gram-negative bacteremia, a fourfold increase in mortality was attributed to the failure to administer an antibiotic with in vitro activity within six hours of the onset of septic shock, emphasizing the need for timely and appropriate antibiotic treatment ^[14][7].

Yelin et al. describe a strategy for reducing inappropriate antibiotic prescribing for community- and nursing home-acquired urinary tract infections (UTIs) ^[15][37]. Resistance of bacteria cultured from UTIs to six commonly prescribed antibiotics was associated with several demographic factors, including residence in a nursing home, as well as with a history of UTIs and prior antibiotic prescriptions. Despite acknowledged biases in the study design, computer-driven drug recommendations appeared to reduce the proportion of inappropriate prescriptions to 5%, compared with 9% for physicians' prescriptions. A major drawback of the method is that it does not select the narrowest-spectrum antibiotic among those with the lowest predicted resistance. Similar approaches could be used for other bacterial infections when detailed patient data are available, as has recently been suggested for bloodstream infections in a hospital setting ^[16][38].
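
The core of such a recommendation scheme can be sketched as choosing, for each patient, the candidate antibiotic with the lowest predicted probability of resistance, and then scoring the recommendations by their mismatch rate against the later culture results. The drug names and probabilities are hypothetical model outputs. The optional spectrum-aware variant (the `spectrum_rank` and `max_risk` parameters) is an illustrative assumption showing one way to address the drawback noted above; it is not the published method.

```python
def recommend(pred_resistance, spectrum_rank=None, max_risk=None):
    """Choose an antibiotic from per-drug predicted probabilities of resistance.

    Default: the drug with the lowest predicted resistance.
    Hypothetical variant: if spectrum_rank (lower = narrower) and max_risk are
    given, pick the narrowest drug whose predicted risk is <= max_risk, falling
    back to the default when no drug qualifies.
    """
    if spectrum_rank is not None and max_risk is not None:
        acceptable = [d for d, p in pred_resistance.items() if p <= max_risk]
        if acceptable:
            return min(acceptable, key=lambda d: spectrum_rank[d])
    return min(pred_resistance, key=pred_resistance.get)


def mismatch_rate(recommended, resistant_to):
    """Fraction of patients whose recommended drug proved resistant in culture."""
    misses = sum(drug in res for drug, res in zip(recommended, resistant_to))
    return misses / len(recommended)


# Hypothetical model outputs for two patients and their later culture results.
patients = [
    {"drug_A": 0.10, "drug_B": 0.25, "drug_C": 0.08},
    {"drug_A": 0.30, "drug_B": 0.05, "drug_C": 0.20},
]
culture_resistance = [{"drug_B"}, {"drug_B", "drug_C"}]

recs = [recommend(p) for p in patients]
print(recs, mismatch_rate(recs, culture_resistance))

# Spectrum-aware variant with hypothetical ranks (drug_A narrowest).
ranks = {"drug_A": 0, "drug_C": 1, "drug_B": 2}
print(recommend(patients[0], spectrum_rank=ranks, max_risk=0.15))
```

For the first patient the default rule picks drug_C (lowest risk), whereas the spectrum-aware variant accepts the slightly higher but still acceptable risk of the narrower drug_A.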

In the studies above, algorithms and models were developed to predict antibiotic resistance from epidemiological factors. However, whether they change antimicrobial prescribing once implemented in clinical practice remains largely unstudied ^[17][46]. Recently, a case-based reasoning algorithm was incorporated into a hospital information system and evaluated on real-world patient data to investigate its potential impact on antibiotic prescribing practices. The algorithm provided appropriate antibiotic recommendations that were significantly narrower in spectrum than the choices made by physicians in current clinical practice ^[18][47]. Avoiding unnecessary antibiotic prescriptions is also of major importance for antimicrobial stewardship. Wong et al. developed an ML-assisted mobile application to help inexperienced or busy emergency department doctors in Singapore decide whether to prescribe antibiotics for uncomplicated upper respiratory tract infections (URTIs) ^[19][48]. Likewise, in hospitalized patients with COVID-19, a supervised ML algorithm was successfully used to detect bacterial co-infections or secondary infections, supporting decisions to prescribe or discontinue antibiotics ^[20][49].
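
Case-based reasoning retrieves past cases that resemble the current patient and reuses what worked for them. A minimal sketch, assuming each stored case is a vector of numeric features plus the antibiotic later confirmed to be active; the feature encoding, the Euclidean distance and the majority vote over the k nearest cases are illustrative choices, not details of the cited system. The revise and retain steps of the classical case-based reasoning cycle (checking the outcome and adding the new case to the case base) are omitted.

```python
import math
from collections import Counter


def retrieve(case_base, query, k=3):
    """Retrieve step: the k stored cases closest to the query (Euclidean distance)."""
    return sorted(case_base, key=lambda c: math.dist(c["features"], query))[:k]


def recommend_from_cases(case_base, query, k=3):
    """Reuse step: majority vote over the effective antibiotics of retrieved cases."""
    votes = Counter(c["effective_drug"] for c in retrieve(case_base, query, k))
    return votes.most_common(1)[0][0]


# Hypothetical case base: features = (age / 100, prior hospitalization,
# prior resistant isolate); drug names are placeholders.
case_base = [
    {"features": (0.80, 1, 1), "effective_drug": "drug_B"},
    {"features": (0.75, 1, 1), "effective_drug": "drug_B"},
    {"features": (0.70, 1, 0), "effective_drug": "drug_A"},
    {"features": (0.30, 0, 0), "effective_drug": "drug_A"},
    {"features": (0.25, 0, 0), "effective_drug": "drug_A"},
]

print(recommend_from_cases(case_base, (0.78, 1, 1)))
print(recommend_from_cases(case_base, (0.28, 0, 0)))
```

In practice features would need scaling so that no single variable dominates the distance; dividing age by 100 here is a crude stand-in for that step.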

4. Machine Learning-Assisted Clinical Decision Support Systems (ML-CDSS)

Because antibiotic resistance is a major cause of mortality, researchers must develop rapid and efficient methods to guide the rational administration of antibiotics, the aim of antimicrobial stewardship programs (ASPs) ^[14][21][3,7]. Antibiotic prescriptions can select for drug-resistant organisms, affecting not only the individual patient but also the patient's microbiome and society as a whole ^[22][4]. Because infection management is dynamic, consistent decisions are often difficult to make, and responsible prescribing requires integrating broad and complex information ^[23][24][15,16]. Besides evidence-based guidelines, several clinical decision support systems (CDSS) and biomarkers are commonly used to guide treatment. A recent review identified various uses of machine learning for clinical decision support in infectious diseases, including supporting diagnosis, predicting disease severity, and selecting appropriate antimicrobial treatment ^[25][14]. CDSS in current use are computer-assisted expert systems: they are knowledge-based, encoding human expertise as rules that are manually programmed into the system, and they try to reproduce an expert's decision-making on a specific task ^[25][26][27][14,18,19]. In contrast, ML-assisted CDSS are data-based: they learn automatically from data, define their own rules, improve as more data become available, and can interpret previously unseen situations ^[23][25][14,15].
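
The knowledge-based versus data-based distinction can be illustrated with a deliberately tiny, hypothetical example. The first function hard-codes an expert's threshold; the second learns a threshold from labeled past cases by choosing the cut-off with the fewest misclassifications, i.e., a one-feature decision stump. The feature (number of prior resistant isolates), the expert threshold of 2 and the data are invented for illustration.

```python
def expert_rule(prior_resistant_isolates):
    """Knowledge-based: a hand-written rule ('flag if >= 2 prior resistant isolates')."""
    return prior_resistant_isolates >= 2


def learn_threshold(values, labels):
    """Data-based: choose the cut-off t (flag if value >= t) with the fewest errors."""
    best_t, best_errors = None, float("inf")
    for t in sorted(set(values)):
        errors = sum((v >= t) != bool(y) for v, y in zip(values, labels))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t, best_errors


# Hypothetical past cases: number of prior resistant isolates, and whether the
# current infection turned out to be resistant (1) or not (0).
values = [0, 0, 1, 1, 1, 2, 3, 4]
labels = [0, 0, 0, 1, 1, 1, 1, 1]

rule_errors = sum(expert_rule(v) != bool(y) for v, y in zip(values, labels))
learned_t, learned_errors = learn_threshold(values, labels)
print(f"expert rule errors: {rule_errors}; learned threshold {learned_t} errors: {learned_errors}")
```

The learned rule fits this particular data set better, but only because it was tuned on it; a real ML-CDSS would be evaluated on held-out patients before being trusted over an expert rule.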

Developing ML-CDSS with a minimal set of variables may be beneficial when data are not readily available in certain areas or when resources are limited ^[28][34]. Particular attention should be paid to which variables an ML-CDSS uses to make its predictions. Moreover, ML-CDSS are difficult to develop and validate without high-quality clinical data, so building comprehensive clinical databases is essential if clinicians are to use future machine learning tools with confidence.

Decision support models for the empirical treatment of sepsis can integrate predictors of antibiotic resistance and allow rapid antibiotic de-escalation without compromising timely and adequate treatment ^[29][50]. Moreover, previous antibiotic susceptibility results are strong predictors of resistance in current infections ^[30][51]. Sick-Samuels et al. used recursive partitioning to construct a decision tree that predicts the risk of resistance to broad-spectrum antibiotics (BSA) in a cohort of septic pediatric patients, based on five distinct risk factors ^[31][52]. Nearly half of the truly high-risk (BSA-resistant) episodes were incorrectly classified as low risk, and 9% were incorrectly classified as high risk. These errors could lead to undertreatment and overtreatment, respectively. The sensitivity of the prediction algorithm might be improved by capturing additional patient characteristics or variables.
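
The two error types map onto standard confusion-matrix quantities: resistant episodes labeled low risk are false negatives (undertreatment risk), and non-resistant episodes labeled high risk are false positives (overtreatment risk). The sketch below computes these rates from hypothetical counts, not the study's data, chosen so that about half of the resistant episodes are missed.

```python
def classification_metrics(tp, fn, fp, tn):
    """Error rates for a high-risk/low-risk classifier.

    Positive class = episode truly resistant to broad-spectrum antibiotics.
    """
    return {
        "sensitivity": tp / (tp + fn),          # resistant episodes flagged high risk
        "false_negative_rate": fn / (tp + fn),  # resistant episodes called low risk
        "specificity": tn / (tn + fp),          # non-resistant episodes called low risk
        "false_positive_rate": fp / (tn + fp),  # non-resistant episodes called high risk
    }


# Hypothetical counts for illustration only.
print(classification_metrics(tp=26, fn=24, fp=40, tn=460))
```

A false-negative rate near 0.5 means the tree's sensitivity is close to that of a coin flip for the episodes that matter most, which is why adding further predictors to raise sensitivity is suggested.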

In a retrospective study carried out in a children's hospital in Cambodia, Oonsivalai and colleagues proposed a patient-level, data-driven decision support system built with a variety of machine learning techniques ^[16][38]. They focused mainly on ceftriaxone, a third-generation cephalosporin and the empirical antibiotic most frequently prescribed at their study site, and specifically on the value of predictive models for identifying patients at high risk of infection with organisms resistant to it. The patient's age, an age-adjusted weight score, and whether the infection was acquired in the hospital or in the community proved to be the most important factors for predicting antibiotic susceptibility. These are objective variables that are routinely collected in most clinical settings, and the models' other variables can be gathered quickly and inexpensively with brief questionnaires. The calculations underlying the predictions take only seconds on a low-cost computer and can be carried out remotely from any Internet-connected device. This makes the approach well suited to low- and middle-income countries (LMIC), which often bear the largest burden of disease and face the most urgent problems with antibiotic resistance ^[32][45].
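
Once a model such as a logistic regression (one of the algorithms listed for this study in the table below) has been fitted, producing a prediction takes only a handful of arithmetic operations, which is why it can run on almost any device. The sketch assumes a logistic model over the three predictors named above; the intercept, the coefficient values and the variable encodings are invented placeholders, not the estimates reported by Oonsivalai et al.

```python
import math

# Hypothetical coefficients on the log-odds scale, for illustration only.
INTERCEPT = -1.0
COEFFS = {
    "age_years": -0.05,                 # per year of age
    "age_adjusted_weight_score": -0.30,  # per unit of the score
    "hospital_acquired": 1.50,           # 1 = hospital-acquired, 0 = community-acquired
}


def prob_ceftriaxone_resistant(patient: dict) -> float:
    """Predicted probability of resistance via the logistic function 1 / (1 + exp(-z))."""
    z = INTERCEPT + sum(COEFFS[name] * patient[name] for name in COEFFS)
    return 1.0 / (1.0 + math.exp(-z))


patient = {"age_years": 2, "age_adjusted_weight_score": -1.0, "hospital_acquired": 1}
print(round(prob_ceftriaxone_resistant(patient), 3))
```

Here the linear predictor is z = -1.0 - 0.1 + 0.3 + 1.5 = 0.7, giving a probability of about 0.67; a clinician-facing tool would compare such a probability against a locally agreed decision threshold.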

Multivariate association rule mining methods, a subset of unsupervised ML techniques originally used in market-basket analysis, can efficiently identify and quantify correlations between resistance patterns, enabling clinically relevant multidrug resistance (MDR) to be identified and tracked through comparisons between relevant subsets of isolates. In the clinical context, applying association rule mining to antimicrobial susceptibility datasets could also inform better antibiotic treatment policies ^[33][34][56,57]. However, unsupervised techniques were not reviewed in depth here, mainly because they tend to be associated with the earlier phases of ML projects, when the application goal is still elusive and data analysts are still searching for the exact nature of the problem to be solved. For the time being, this is less of an issue in the AMR domain, where assessing resistance is widely accepted as a clear goal.
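
For resistance data, each isolate can be treated as a "basket" containing the antibiotics it tested resistant to, and a rule such as {drug_B} → {drug_C} can be scored by its support (the fraction of isolates resistant to both), confidence (the fraction of drug_B-resistant isolates that are also drug_C-resistant) and lift (confidence divided by the overall frequency of drug_C resistance). A lift above 1 means the co-resistance occurs more often than expected under independence. The sketch below computes these three standard measures for all single-drug rules over hypothetical isolates; the variants used in the studies in the table below (such as conditional lift) are not reproduced.

```python
from itertools import permutations


def rule_metrics(isolates, min_support=0.2):
    """Support, confidence and lift for rules {a} -> {b} over resistance profiles."""
    n = len(isolates)
    drugs = sorted(set().union(*isolates))
    support = {d: sum(d in iso for iso in isolates) / n for d in drugs}
    rules = []
    for a, b in permutations(drugs, 2):
        both = sum(a in iso and b in iso for iso in isolates) / n
        if both < min_support:
            continue  # prune rare co-resistance patterns
        confidence = both / support[a]
        lift = confidence / support[b]
        rules.append((a, b, round(both, 3), round(confidence, 3), round(lift, 3)))
    return sorted(rules, key=lambda r: r[4], reverse=True)


# Hypothetical isolates, each given as the set of drugs it tested resistant to.
isolates = [
    {"drug_A", "drug_B", "drug_C"},
    {"drug_A", "drug_B"},
    {"drug_A"},
    {"drug_B", "drug_C"},
    set(),
]

for a, b, sup, conf, lift in rule_metrics(isolates):
    print(f"{{{a}}} -> {{{b}}}: support={sup}, confidence={conf}, lift={lift}")
```

In this toy data, every drug_C-resistant isolate is also drug_B-resistant (confidence 1.0), and both directions of the drug_B/drug_C rule have the highest lift (about 1.67), flagging that pair as a candidate co-resistance pattern.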

Table 13 summarizes the performance of machine learning-assisted clinical decision support systems (ML-CDSS) across different studies.

Table 13.

Performance of machine learning-assisted clinical decision support systems (ML-CDSS) across different studies.

Author	Year of Publication	Medical Setting	Geographical Setting	Input Data	ML Algorithm	Performance Evaluation	Bacterial Species
Oonsivalai et al. ^[16][38]	2018	Hospital admissions	Cambodia	Clinical, demographic and living condition information	LR, DT, RF, Boost, SVM, k-NN	AUC: 0.74–0.85	Bacterial isolates in blood cultures
Elligsen et al. ^[29][50]	2021	Hospital admissions	Canada	Demographics, acquisition of bacteremia, previous hospital/ICU admission, AST, antibiotic prescriptions	LR models	Antibiotic de-escalation (29 vs. 21%; OR = 1.77; 95% CI, 1.09–2.87; p = 0.02)	GNB bloodstream infections
Sick-Samuels et al. ^[31][52]	2019	Pediatric hospital	USA	Demographic, clinical, and microbiological data	Recursive partitioning, DT	AUC: 0.70	GNB BSIs
Cazer et al. ^[33][56]	2021	Hospital admissions	USA	Bacterial isolates, infection site, AST, resistance phenotypes	Association Mining	Average cLift: 5	Staphylococcus aureus isolates
Sakagianni et al. ^[34][57]	2022	Intensive Care Unit	Greece	Demographics/bacterial species/sample type/AST	Association Mining	Max Lift: 3.44	Pseudomonas aeruginosa, Acinetobacter baumannii, Klebsiella pneumoniae
Feretzakis et al. ^[35][58]	2021	Medical wards	Greece	Demographics/Gram stain/bacterial species/sample type/AST	Microsoft Azure AutoML (StackEnsemble, VotingEnsemble, MaxAbsScaler, LightGBM, SparseNormalizer, XGBoost)	AUC: 0.822	Bacterial isolates
Lee et al. ^[36][55]	2021	Hospital admissions	Hong Kong	Patient reference number/Date of culture/Bacterial species/Sample type/AST	Adaptive boosting, gradient boosting, RF, SVM, k-NN and NN *	AUC: 0.761	Escherichia coli, Klebsiella spp., Proteus mirabilis
Liang et al. ^[37][53]	2022	Intensive Care Unit	China	Demographic data, vital signs, basic and primary diseases, important test indicators, operation histories and antibiotic use	RF, XGBoost, DT, multiple LR	AUC: 0.78–0.91	CR-GNB carriage
Goodman et al. ^[38][54]	2019	Hospital admissions	USA	Blood cultures/AST	LR, DT	C-statistic: LR 0.87, DT 0.77	ESBL bacteria

* SVM: support vector machine, NN: neural network, RF: random forest, LR: logistic regression, DT: decision tree, XGBoost: eXtreme gradient boosting, k-NN: k-nearest neighbours, AUC: area under the receiver operating characteristic curve, OR: odds ratio, CI: confidence interval, ICU: intensive care unit, eCSR: expected cross-support ratio, cLift: conditional lift, GNB: Gram-negative bacteria, BSI: bloodstream infection, CR-GNB: carbapenem-resistant Gram-negative bacteria, ESBL: extended-spectrum beta-lactamase, AST: antimicrobial susceptibility test.
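
Most rows report discrimination as the AUC, which for a binary outcome equals the C-statistic. As a reminder of what the number means, the sketch below computes it with the rank-based (Mann–Whitney) formulation: the probability that a randomly chosen resistant case receives a higher predicted score than a randomly chosen susceptible one, with ties counted as one half. The scores and labels are hypothetical; the quadratic pairwise loop is for clarity, not efficiency.

```python
def auc(scores, labels):
    """AUC as P(score of a positive > score of a negative), ties counted as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted risks (1 = resistant, 0 = susceptible).
scores = [0.90, 0.80, 0.60, 0.55, 0.40, 0.30]
labels = [1, 1, 0, 1, 0, 0]
print(auc(scores, labels))
```

Eight of the nine resistant/susceptible pairs are ranked correctly here (only 0.55 versus 0.60 is not), so the AUC is 8/9, about 0.89. An AUC of 0.5 corresponds to random ranking, which puts values such as 0.70 to 0.91 in the table into perspective.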

5. Prediction of AMR in the Environment Employing AI/ML

AMR is a multifactorial problem arising from the interaction of bacterial evolution, human behavior, and environmental factors, which play a significant role in the transmission of resistant bacteria and the emergence of pathogens ^[23][15]. AMR clearly extends well beyond strictly medical settings to relevant aspects of the environment, and there is now general consensus that intervention strategies should address not only human and veterinary medicine but also the environment. An important challenge in AMR control is therefore estimating the prevalence of antibiotic resistance genes (ARGs) in source environments. Furthermore, investigating the conditions under which, and the extent to which, the environment selects for resistance is critical for designing preventive measures ^[39][59].

Machine learning and deep learning models have been validated for predicting ARGs in various environmental sources, such as recreational beaches, soil, and wastewater, and in several geographical regions ^[40][41][60,61]. Jang et al. studied neural network techniques to predict ARG occurrence at beaches quickly and accurately and to identify the environmental variables that influence these predictions ^[42][62].
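
To make the idea concrete, the sketch below trains a one-hidden-layer neural network with plain gradient descent to map a few environmental covariates to ARG occurrence (detected or not). It is written in pure Python so that the forward and backward passes are visible. The covariates, the synthetic labelling rule, the network size and the learning rate are assumptions for illustration only, not the architecture or data of Jang et al.; a real study would use a deep-learning library, measured data and proper train/test splits.

```python
import math
import random


def train_mlp(X, y, hidden=4, lr=0.5, epochs=500, seed=0):
    """Train a one-hidden-layer network (tanh hidden units, sigmoid output) with
    full-batch gradient descent on mean binary cross-entropy."""
    rng = random.Random(seed)
    d, n = len(X[0]), len(X)
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        gW1 = [[0.0] * d for _ in range(hidden)]
        gb1, gW2, gb2, loss = [0.0] * hidden, [0.0] * hidden, 0.0, 0.0
        for x, t in zip(X, y):
            # Forward pass
            h = [math.tanh(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j]) for j in range(hidden)]
            p = 1.0 / (1.0 + math.exp(-(sum(w * hj for w, hj in zip(W2, h)) + b2)))
            pc = min(max(p, 1e-12), 1.0 - 1e-12)  # clip only inside the log
            loss -= t * math.log(pc) + (1 - t) * math.log(1.0 - pc)
            # Backward pass: for sigmoid + cross-entropy, dLoss/dlogit = p - t
            delta_out = p - t
            gb2 += delta_out
            for j in range(hidden):
                gW2[j] += delta_out * h[j]
                delta_h = delta_out * W2[j] * (1.0 - h[j] ** 2)  # through tanh
                gb1[j] += delta_h
                for i in range(d):
                    gW1[j][i] += delta_h * x[i]
        # Gradient step on the mean loss
        for j in range(hidden):
            W2[j] -= lr * gW2[j] / n
            b1[j] -= lr * gb1[j] / n
            for i in range(d):
                W1[j][i] -= lr * gW1[j][i] / n
        b2 -= lr * gb2 / n
        losses.append(loss / n)
    return (W1, b1, W2, b2), losses


# Synthetic data: (scaled rainfall, scaled turbidity, scaled water temperature)
# -> ARG detected (1) or not (0). The labelling rule is an arbitrary assumption.
data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(3)] for _ in range(40)]
y = [1 if x[0] + x[1] > 1.0 else 0 for x in X]

params, losses = train_mlp(X, y)
print(f"mean cross-entropy: {losses[0]:.3f} (epoch 1) -> {losses[-1]:.3f} (epoch {len(losses)})")
```

Identifying which environmental variables drive the predictions, the second aim mentioned above, would require an additional step such as permutation importance on held-out data; in this synthetic example, the temperature input is irrelevant by construction.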