Constrained-Disorder Principle-Based Systems Improve Digital Twins Performance

Digital twins are computer programs that use real-world data to create simulations that predict the performance of processes, products, and systems. Digital twins may integrate artificial intelligence to improve their outputs. Models for dealing with uncertainties and noise are used to improve the accuracy of digital twins, and most systems in current use aim to reduce noise to improve their outputs. Nevertheless, biological systems are characterized by inherent variability, which is necessary for their proper function. The constrained-disorder principle defines living systems as carrying disorder, kept within dynamic boundaries, as part of their existence and proper operation.

  • digital twins
  • digital health
  • variability
  • noise
  • complex systems
  • systems biology

1. Introduction

A digital twin is a computer program that uses real-world data to create simulations that predict how a system, a product, or a process performs [1][2]. These programs integrate artificial intelligence (AI) and software analytics to improve their output [3]. In most currently used digital twin platforms, noise in the input datasets detracts from the accuracy of the results [4]. Different methods are used to reduce noise and uncertainties and thereby improve the accuracy of program outputs [5].
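As a minimal sketch of the kind of noise reduction such platforms apply to input data, the example below smooths a noisy sensor stream with an exponential moving average before it would be fed to a model. The signal, smoothing parameter, and noise level are hypothetical and chosen only for illustration.

```python
import numpy as np

def exponential_smoothing(signal: np.ndarray, alpha: float = 0.2) -> np.ndarray:
    """Classical noise-reduction filter: each output sample is a weighted
    average of the current measurement and the previous smoothed value."""
    smoothed = np.empty_like(signal, dtype=float)
    smoothed[0] = signal[0]
    for t in range(1, len(signal)):
        smoothed[t] = alpha * signal[t] + (1 - alpha) * smoothed[t - 1]
    return smoothed

# Hypothetical noisy sensor stream: a slow trend plus measurement noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 10, 500)
raw = np.sin(t) + rng.normal(scale=0.3, size=t.size)
clean = exponential_smoothing(raw, alpha=0.1)
print(f"raw std: {raw.std():.3f}, smoothed std: {clean.std():.3f}")
```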

2. The Constrained-Disorder Principle Defines Noise as Inherent to Biological Systems

Biological systems are complex, and part of their complexity results from the inherent noise and variability that characterize their function. The constrained-disorder principle (CDP) defines biological systems as comprising disorder within constrained, random boundaries [6]. It defines living organisms as machines with a regulated degree of variability. Per the CDP, disorder is necessary for a system's existence and proper operation [6][7]. Variability is inherent to all levels of biological systems [7][8][9][10]. At the genome level, variability characterizes normal DNA function, and a similar stochastic behavior is required for the proper function of RNA and proteins [8][9][10]. Fluctuations in gene expression, cell-to-cell signaling, and the cell environment are tightly regulated [11]. At the cellular level, multiple examples illustrate the need for inherent variability: dynamic instability characterizes microtubule function and implies variability in their elongation and shortening [6][12][13][14][15]. At the whole-organ level, heart rate variability (HRV), blood pressure variability, and gait variability are examples of functions that require noise to operate properly [16][17][18][19].
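To make the notion of heart rate variability concrete, the sketch below computes two standard time-domain HRV indices (SDNN and RMSSD) from a series of RR intervals. The interval values are synthetic and purely illustrative; real HRV analysis involves artifact handling and standardized recording conditions not shown here.

```python
import numpy as np

def hrv_indices(rr_ms: np.ndarray) -> dict:
    """Two common time-domain HRV measures computed from RR intervals (ms):
    SDNN  - standard deviation of all RR intervals,
    RMSSD - root mean square of successive RR differences."""
    diffs = np.diff(rr_ms)
    return {
        "SDNN": float(np.std(rr_ms, ddof=1)),
        "RMSSD": float(np.sqrt(np.mean(diffs ** 2))),
    }

# Synthetic RR intervals (ms) around a 75 bpm mean with physiological-style jitter.
rng = np.random.default_rng(1)
rr = 800 + rng.normal(scale=40, size=300)
print(hrv_indices(rr))
```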

3. Bioengineering Needs to Account for Variability

Systems engineering and computerized architectures of biological systems must account for the variability that characterizes them [20][21]. Engineering single-cell and multi-cellular biological systems using a combination of synthetic and systems biology, nanobiotechnology, pharmaceutical science, and computational approaches is challenged by noise and by the intra- and inter-cellular fluctuations that characterize these systems [22]. Bioengineering must incorporate the noisy variables inherent to biological systems and requires that biological noise be recognized as a design element with fundamentals that can be actively controlled [23]. As part of a stochastic design, engineered noise can improve modeling accuracy [16][20].
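One way to treat noise as a design element rather than a nuisance is to model a process stochastically instead of deterministically. The sketch below simulates a minimal birth-death model of noisy gene expression with the exact Gillespie stochastic simulation algorithm; the rate constants are hypothetical and are not taken from the cited works.

```python
import numpy as np

def gillespie_birth_death(k_prod=10.0, k_deg=0.1, x0=0, t_end=100.0, seed=2):
    """Exact stochastic simulation (Gillespie SSA) of a birth-death process,
    a minimal model of noisy gene expression: molecules are produced at rate
    k_prod and degraded at rate k_deg * x."""
    rng = np.random.default_rng(seed)
    t, x = 0.0, x0
    times, counts = [t], [x]
    while t < t_end:
        rates = np.array([k_prod, k_deg * x])
        total = rates.sum()
        if total == 0:
            break
        t += rng.exponential(1.0 / total)       # waiting time to the next event
        if rng.random() < rates[0] / total:
            x += 1                               # production event
        else:
            x -= 1                               # degradation event
        times.append(t)
        counts.append(x)
    return np.array(times), np.array(counts)

times, counts = gillespie_birth_death()
print(f"steady-state mean ≈ {counts[len(counts)//2:].mean():.1f} molecules "
      f"(deterministic prediction: {10.0/0.1:.0f})")
```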

4. Digital Twins Use Real-World Data to Create Simulations

Digital twins were introduced and defined by Grieves as a model comprising virtual products, physical products, and the connection between them [17][18]. They use real-world data to create simulations that predict how a system performs. Digital twins reflect the real-time operation state, future evolution trends, and essential functions of systems by integrating historical data, real-time data, and physical models [18][19]. A digital twin is a virtual clone of a tangible entity, such as a vehicle engine or a person, or of an intangible system, and is studied independently of its real-world counterpart to support informed judgments [19][24]. This definition differs from the conventional one used in the manufacturing industry, where the digital twin is regarded as a key tool for digital transformation: a virtual representation of a physical asset, process, or product that enables real-time monitoring, analysis, and optimization [25].

Digital twins collect data across multiple dimensions, such as personnel, equipment, materials, processes, and the environment, to reconstruct the actual operation state of the object [26]. They conduct virtual simulations driven by real-time data to generate an optimal linkage operation strategy and process regulation [27][28][29]. Digital twins accurately describe and optimize the physical entity using an optimization model [30] and compensate for the deficiencies of traditional modeling and simulation methods by reflecting the physical object's essential characteristics [24][31][32]. The digital twin platform is divided into three linkage stages [24][33][34]. In the initial planning stage, digital twins collect real-time operation data on factors such as personnel, equipment, materials, methods, and the environment, creating a virtual object layer. In the dynamic revision planning stage, the virtual object layer in the digital twin-enabled architecture reflects the target; it dynamically evaluates and optimizes the process based on relevant models while comparing the actual operation state of the system with the dynamically optimized state. The virtual twin can adapt to changes in its physical counterpart, just as the physical object responds to interventions applied in the virtual twin [35][36][37].

Digital twins follow the coevolution of digital objects and physical entities by continuously collecting relevant data and improving themselves [31][38]. The model adapts through monitoring, collection, and processing of the associated sensors' data on the system, enabling digital twins to make predictions about their physical counterparts [24][39]. Digital twins allow forecasting and interventions to prevent problems under ever-changing real-world conditions [27][28]. In manufacturing, digital twins focus on operations by gathering data from physical sources and information technology [40][44]. The engineering of digital twin services is challenged by the complexity of their interactions and the heterogeneous nature of these services. The concurrent use of models and data (e.g., model-based systems engineering (MBSE)) is considered for complex systems in service-oriented engineering projects, and it was recently proposed that such information systems can improve workflow among enterprises and support servitization [40][44].
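The staged loop described above (collect real-time data, revise the virtual object, compare the actual state with the planned state, and regulate the process) can be summarized in a minimal, hypothetical sketch. All class names, parameters, and thresholds below are illustrative and are not taken from any cited platform.

```python
from dataclasses import dataclass, field

@dataclass
class DigitalTwin:
    """Toy digital twin mirroring a single scalar process variable of a physical asset."""
    target: float                      # planned/optimal operating point
    estimate: float = 0.0              # current virtual-object state
    gain: float = 0.3                  # how strongly new data revise the estimate
    history: list = field(default_factory=list)

    def ingest(self, measurement: float) -> None:
        """Dynamic revision stage: blend a new real-time measurement
        into the virtual object's state estimate."""
        self.estimate += self.gain * (measurement - self.estimate)
        self.history.append(self.estimate)

    def recommend(self) -> str:
        """Compare the estimated actual state with the planned state and
        return a simple regulation recommendation."""
        deviation = self.estimate - self.target
        if abs(deviation) < 0.5:
            return "within tolerance: no action"
        return f"adjust process by {-deviation:.2f} units"

twin = DigitalTwin(target=50.0, estimate=50.0)
for sensor_value in [50.2, 51.0, 52.4, 53.1]:   # hypothetical sensor stream
    twin.ingest(sensor_value)
print(twin.recommend())
```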

5. Using Digital Twin Systems in Biology

The design of a digital twin model in biology is based on selecting a specific purpose and identifying the components of the targeted biological system and the interactions between them [41][45]. It implies capturing the mechanisms and features relevant to the selected purpose and to the possible interventions, generating a conceptual map of the model that integrates all pre-defined components [42][46]. The model is then validated using human or other preclinical data. These steps are followed by uncertainty quantification of the model's behavior [29]. Personalization of a model requires the appropriate patient-specific data to generate a subject-specific digital twin [43][47]. The model inputs consist of single or repeated clinical and laboratory biomarker measurements obtained during diagnosis and therapeutic intervention. The model output is either binary, i.e., whether or not to intervene, or dynamic, such as changes over time in a predetermined set of health parameters [24][29].
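As a schematic of the input/output structure described above, the sketch below personalizes a generic model with a patient-specific trend fitted from repeated biomarker measurements and returns a binary intervene/do-not-intervene output. The biomarker values, thresholds, and decision rule are invented for illustration only.

```python
import numpy as np

def fit_patient_trend(times_days: np.ndarray, biomarker: np.ndarray) -> float:
    """Personalization step: fit a patient-specific slope (units/day)
    to repeated biomarker measurements by least squares."""
    slope, _ = np.polyfit(times_days, biomarker, deg=1)
    return slope

def twin_decision(biomarker: np.ndarray, times_days: np.ndarray,
                  upper_limit: float = 8.0, max_slope: float = 0.15) -> bool:
    """Binary output: True = recommend intervention, False = continue monitoring.
    Triggers if the latest value exceeds a limit or the fitted trend rises too fast."""
    slope = fit_patient_trend(times_days, biomarker)
    return bool(biomarker[-1] > upper_limit or slope > max_slope)

# Hypothetical longitudinal measurements for one subject.
days = np.array([0, 7, 14, 21, 28], dtype=float)
values = np.array([6.1, 6.4, 6.9, 7.3, 7.8])
print("intervene" if twin_decision(values, days) else "continue monitoring")
```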

6. Applications of Digital Twins in Healthcare

The use of digital twins in healthcare is enhanced by improved computing capacity and the development of wearable and smart devices, which provide abundant data that require correct interpretation [36][37]. Nevertheless, implementing digital twins in medicine presents challenges due to the complexity and variability of biological processes, which translate into noisy, dynamic data [44][50]. Digital twins in healthcare provide advantages such as remote visibility into patients, their internal organ systems and processes, and the behavior of their medical devices [45][51]. Digital twin models assist in drug development, early diagnosis, treatment optimization, and precision medicine [46][52]. Digital twins support personalized medicine by accounting for the inter-individual variability in inputs, responses to treatment, and disease trajectories [47][53]. They use individual cell, genetic, longitudinal clinical, and wellness data to produce distinct personalized models and collect continuous data on parameters from subjects and the environment. Virtual twins can identify a pre-illness condition, enabling preventive measures to be taken [48][49]. The historical and real-time data of individuals and the population assist machine learning (ML) algorithms in predicting future outcomes [50][51][52]. An example is a virtual representation of a single person in which every known medicine for that subject's illness is tested, enabling the improvement of therapeutic regimens [53][58]. The systems monitor the virtual “person” and provide notifications about side effects, enabling preventive action [49][54]. Historical and real-time data likewise assist ML systems in predicting future conditions [50][55][56]. Models are generated to predict the efficacy of a particular treatment based on frequent measurements of a patient's clinical or laboratory biomarkers, or “offline”, using simulated patient populations for developing new drugs [29].

In cardiology, digital twins can improve planning and decision-making in cardiac interventions by creating individual structural and functional heart models [36][48][57]. The models simulate drug impact and responses to the implantation of devices, and can refine their output based on real-time intraoperative data [36][58]. This approach applies to cardiac resynchronization therapy, valve replacement surgeries, catheter ablation procedures, and the correction of congenital heart diseases [36][58][59]. An artificial pancreas for treating type 1 diabetes mellitus comprises a closed-loop system that incorporates real-time glucose levels into an algorithm that directs insulin delivery [60][69]. It contains several features of digital twins, including collecting and analyzing patient-specific online data and generating clinically meaningful outputs [61][62][63].

In oncology, digital twin systems are being developed to predict outcomes and optimize therapies [57][64]. Digital twins of the immune system have been developed despite the challenge of its inherent complexity and the difficulty of measuring the multiple variables of a patient's immune state [29]. These models represent numerous autoimmune, inflammatory, infectious, and malignant diseases [57][62]. Digital twins were introduced as a tool for patients with multiple sclerosis to improve diagnosis, monitor disease progression, and adjust therapy [65][78]. Systems have also been developed for modeling inflammatory bowel disease [66][79].
In orthopedics, a digital twin of the human vertebra, simulating its structure and response to physical stress, predicts the risk of fractures in predisposed subjects [59][64]. A limb model simulating anatomy and range of motion facilitates planning and improves the outcomes of arthroplasty procedures [52][67]. A digital twin of long bone fractures simulates stabilization modalities to guide intervention and postoperative management [66][79].
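Of the clinical examples above, the artificial pancreas is the most explicitly algorithmic. The sketch below illustrates the closed-loop idea with a toy proportional controller: glucose is measured, an insulin dose is computed from the deviation from a target, and the dose in turn affects the next glucose reading. The gains, units, and glucose dynamics are invented for illustration; this is not a clinical algorithm and not the cited systems' logic.

```python
def insulin_dose(glucose_mg_dl: float, target: float = 110.0,
                 gain: float = 0.02, basal: float = 0.5) -> float:
    """Toy proportional controller: basal insulin plus a correction term
    proportional to the deviation of measured glucose from target.
    Doses are arbitrary units; this is NOT a clinical algorithm."""
    correction = max(0.0, gain * (glucose_mg_dl - target))
    return basal + correction

# Simulated closed loop: glucose drifts upward, insulin pulls it back down.
glucose = 180.0
for step in range(5):
    dose = insulin_dose(glucose)
    glucose += 8.0 - 25.0 * dose        # hypothetical meal drift and insulin effect
    print(f"step {step}: dose={dose:.2f} U, glucose={glucose:.0f} mg/dL")
```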

7. The Need to Model Uncertainties and Noise in Complex Systems

Despite the achievements of digital twin systems, uncertainties are an integral part of the inference process [37][68]. Uncertainties can result from multiple structural, parametric, algorithmic, and observational variables. If these uncertainties are not adequately addressed, the allegedly optimal solutions or predictions generated by the model may fail in real life [69][41]. Inaccuracy or uncertainty in biology may cause misleading inferences and inadequate decision-making, potentially jeopardizing a patient's health [36][37]. Confidence in predictions is also valuable for establishing clinicians' trust in new technologies [37][68]. Digital twins can be designed to deal with the uncertainty and unpredictability that are part of the life cycle of complex systems [70][83]. Uncertainty quantification of digital twin models is necessary to improve their accuracy under dynamic internal and external environmental conditions. Current models aim to estimate and reduce the effect of uncertainties on model predictions [29][70][71]. Uncertainties in medical digital twin systems arise from the inherent complexity and variability of biological processes, which are reflected in the inaccuracy of the computational models [69][41].

The two primary sources of uncertainty that have been described are ‘aleatoric uncertainty’ and ‘epistemic uncertainty’ [69][72]. The former relates to statistical or data uncertainty and stems from unpredictable randomness, stochasticity, and the intrinsic noise of the measured variables [37][69][72]. This type of uncertainty cannot be reduced even when more data are collected [37][68]. Epistemic uncertainty refers to model or systematic uncertainty. It originates from the structure and parameters of the mathematical algorithms used for data analysis, including their assumptions and approximations, and from missing values and errors in the measurements [61][68][69][72].
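One common, simplified way to separate the two sources is an ensemble: the spread of the ensemble members' predictions approximates epistemic (model) uncertainty, while the residual noise each member estimates from the data approximates aleatoric (data) uncertainty. The sketch below illustrates this decomposition on synthetic data with a bootstrap ensemble of polynomial fits; it is a schematic of the general idea, not a method taken from the cited works.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic, noisy observations of an unknown biological response curve.
x = rng.uniform(0, 10, 80)
y = np.sin(x) + rng.normal(scale=0.25, size=x.size)   # 0.25 = true aleatoric noise sd

x_query = 5.0
member_means, member_noise_vars = [], []
for _ in range(50):                                    # bootstrap ensemble
    idx = rng.integers(0, x.size, x.size)
    coeffs = np.polyfit(x[idx], y[idx], deg=5)         # one ensemble member
    residuals = y[idx] - np.polyval(coeffs, x[idx])
    member_means.append(np.polyval(coeffs, x_query))
    member_noise_vars.append(residuals.var())

aleatoric = float(np.mean(member_noise_vars))          # irreducible data noise
epistemic = float(np.var(member_means))                # disagreement between members
print(f"aleatoric variance ≈ {aleatoric:.3f} (true {0.25**2:.3f}), "
      f"epistemic variance ≈ {epistemic:.4f}")
```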

8. Digital Twins’ Methods for Dealing with Uncertainties

Neural network (NN) decisions can be unreliable because NNs lack expressiveness and transparency [73]. An NN cannot understand or reason about the content of the data it is trained on and cannot explain its decisions [74][75]. NNs are sensitive to small changes in the data distribution, making it difficult to rely on their predictions, and they show overconfidence and are vulnerable to adversarial attacks [76][77]. Several methods have been applied to medical deep learning systems for identifying and quantifying uncertainties, including Bayesian inference, fuzzy systems, and ensemble methods [69][41]. Considering uncertainties during data processing provides better verification and validation of the output and improves the system's reliability [37][69][72].

1. Complete Bayesian analysis is a component of probability statistics derived from Bayes' theorem and used for uncertainty quantification [69][78]. Bayesian inference estimates the probability of a hypothesis under updated knowledge (i.e., the posterior probability). It uses the prior probability (the probability of the hypothesis irrespective of the updated knowledge), the model evidence (the observed experimental or simulated data), and the likelihood (the probability of observing the data if the hypothesis is correct) [72][78]. Under the Bayesian principles, a prior distribution for the uncertain parameters is assumed based on expert knowledge. Using the model evidence, the posterior distribution of these uncertain parameters is estimated via Bayes' formula, and a confidence interval reflecting the reliability of the result is extracted [37][68][72][78].

2. The Markov chain Monte Carlo (MCMC) method is used to estimate the posterior distribution, which is computationally intensive to obtain and sometimes cannot be calculated analytically [68][69]. The sampling problem can also be addressed with approximation methods (e.g., variational inference and Monte Carlo dropout) [68]. Monte Carlo (MC) simulations attempt to predict all the possible results of a system with random variables [69][41]. The algorithm runs multiple possible values within the known range of each input parameter, producing as output a probability distribution that reflects every possible result and its likelihood [61][70]. The MCMC method expresses the posterior probability of complex real-world processes by using computer simulations of random samplings from the probability distribution [78][87].

3. Variational inference (VI) for approximate Bayesian inference provides a computational approximation of the intractable posterior probability distribution by solving an optimization problem and finding a tractable distribution similar to the unknown one [61][68]. VI is faster than MCMC, and its convergence to a result is unequivocal [68]. However, it involves complex calculations and approximates the desired distribution rather than reaching the theoretically optimal solution; it requires considerably fewer samplings and is applicable to large-scale datasets and complex models [61][68].

4. The Monte Carlo dropout method for approximate Bayesian inference prevents overfitting during the training of deep learning systems, improving generalization and prediction on unseen data during the testing phase [68]. Some neurons within the hidden layers of a deep NN are randomly omitted, together with their incoming and outgoing connections, resulting in diminished network complexity. As the neuron elimination is random, each training iteration is performed on a different thinned network, resulting in multiple predictions generated from the same data. The output is a distribution of predictions produced by an ensemble of smaller networks, reflecting the model's uncertainty [37][61].
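The sketch below shows the mechanics of Monte Carlo dropout on a tiny network: repeated stochastic forward passes, each through a different randomly thinned sub-network, yield a distribution of predictions whose spread reflects model uncertainty. The weights are random stand-ins for a trained model, and the input features are hypothetical; the point is the sampling procedure, not the predictions themselves.

```python
import numpy as np

rng = np.random.default_rng(4)

# Tiny fixed-weight network standing in for a trained model (weights are random
# here purely to demonstrate the mechanics of Monte Carlo dropout).
W1 = rng.normal(size=(8, 16)); b1 = np.zeros(16)
W2 = rng.normal(size=(16, 1)); b2 = np.zeros(1)

def forward_with_dropout(x: np.ndarray, p_drop: float = 0.5) -> float:
    """One stochastic forward pass: hidden units are randomly zeroed, so each
    pass runs through a different thinned sub-network."""
    h = np.maximum(0.0, x @ W1 + b1)                     # ReLU hidden layer
    mask = rng.random(h.shape) > p_drop                  # random dropout mask
    h = h * mask / (1.0 - p_drop)                        # inverted-dropout scaling
    return float((h @ W2 + b2)[0])

x_patient = rng.normal(size=8)                           # hypothetical input features
samples = np.array([forward_with_dropout(x_patient) for _ in range(200)])
print(f"prediction = {samples.mean():.3f} ± {samples.std():.3f} "
      f"(spread reflects model uncertainty)")
```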

9. Improving Digital Twins for Biological Systems by Differentiating between Inherent Noise and Measurement-Related Unwanted Noise

The computerized architectures of biological systems must account for the systems' inherent noise [6]. This requires differentiating between the inherent noise of these systems and the noise that results from unclean datasets and noisy measurements; this differentiation is necessary for improving output accuracy. Because the output characteristics of every system need to reflect its noise, the exact type of noise needs to be represented in the output. The CDP implies that every system is characterized by constrained disorder bounded by dynamic boundaries [6][7][79]. Thus, differentiation between the two types of noise and uncertainty is necessary for generating accurate outputs using digital twins and is a critical element of their performance in complex biological systems in a personalized way [80][129]. The methods described above use approximations and distributions, which are beneficial for learning about systems and determining their trajectories. However, these methods are insufficient to reach the maximal accuracy required for analyzing dynamically disordered internal and external environments in complex biological systems [9][81][82].
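One simple way this differentiation could be operationalized, under the strong assumptions that measurement noise is independent of the biological signal and can be characterized from a static calibration recording, is a variance decomposition: subtract the measurement-noise variance estimated on a reference from the total observed variance to estimate the inherent biological variability. The sketch below is a toy illustration of that idea, not a method described in the cited works.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical data: a physiological signal with genuine biological variability,
# recorded through a sensor that adds its own measurement noise.
bio_signal = 70 + rng.normal(scale=5.0, size=2000)          # inherent variability (sd=5)
measured = bio_signal + rng.normal(scale=2.0, size=2000)    # plus sensor noise (sd=2)

# Calibration recording of a static reference characterizes the sensor alone.
calibration = 100 + rng.normal(scale=2.0, size=2000)

noise_var = calibration.var(ddof=1)                         # measurement-noise variance
total_var = measured.var(ddof=1)
inherent_var = max(0.0, total_var - noise_var)              # assumes independence

print(f"estimated inherent (biological) sd ≈ {np.sqrt(inherent_var):.2f} (true 5.00); "
      f"measurement sd ≈ {np.sqrt(noise_var):.2f} (true 2.00)")
```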

10. Augmented Digital Twins Make Use of Noise to Improve the Performance of Biological Systems

Second-generation AI systems are developed to use the inherent noise of biological systems to improve model accuracy and, thereby, diagnosis, response to therapy, and outcome prediction [83][84][85][86]. Based on the n = 1 concept, in which the model generates subject-tailored outputs, these systems are dynamic, comprising methods that account for continuous alterations in the inherent noise of biological processes in a personalized way [81][87][88]. Second-generation AI systems, which quantify signatures of biological variability and implement them dynamically in treatment algorithms, have been proposed for overcoming the loss of response to medications [20][89][90][91][92][93][94][95][96][97][98][99][100][101][102][103][104][105][106][107][108][109][110]. Second-generation algorithms were found to account for the dynamicity of the response to therapy that characterizes each subject [111][135]. This is based on evaluating the clinical outcome as the endpoint for the algorithm, which is the most relevant parameter for patients and healthcare providers. Digital twins that incorporate the relevant noise-based signatures, such as HRV or the variability of cytokine secretion by immune cells in inflammatory disorders, provide higher accuracy for establishing diagnoses, generating treatment plans, and predicting outcomes dynamically in a personalized way [83][84][85][86].
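To illustrate, in the most schematic terms, how a noise-based signature might feed a treatment algorithm, the sketch below reduces a window of RR intervals to a single variability signature and uses it to decide whether to perturb a dosing regimen. The signature choice, thresholds, and dose rule are entirely hypothetical and are not the cited second-generation algorithms.

```python
import numpy as np

def variability_signature(rr_ms: np.ndarray) -> float:
    """Reduce a window of RR intervals to a single noise-based signature:
    here, the coefficient of variation of the intervals."""
    return float(np.std(rr_ms) / np.mean(rr_ms))

def adjust_regimen(base_dose: float, signature: float,
                   low: float = 0.03, high: float = 0.08) -> float:
    """Toy personalization rule: when the variability signature falls below a
    lower bound (taken here as a possible marker of declining regulation),
    perturb the regimen; within the normal band, keep the schedule unchanged."""
    if signature < low:
        return base_dose * 1.1       # introduce a deliberate change
    if signature > high:
        return base_dose * 0.9
    return base_dose

rng = np.random.default_rng(6)
rr_window = 800 + rng.normal(scale=20, size=300)   # hypothetical RR intervals (ms)
sig = variability_signature(rr_window)
print(f"signature={sig:.3f}, next dose={adjust_regimen(10.0, sig):.1f} mg")
```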