The development and application of emerging Industry 4.0 technologies enable the realization of digital twins (DTs), which facilitate the transformation of the manufacturing sector into a more agile and intelligent one. DTs are virtual constructs of physical systems that mirror the behavior and dynamics of those systems. A fully developed DT consists of physical components, virtual components, and information communications between the two. Integrated DTs are being applied in various process and product industries. Although the pharmaceutical industry has evolved recently to adopt Quality-by-Design (QbD) initiatives and is undergoing a paradigm shift of digitalization to embrace Industry 4.0, there has not yet been a full DT application in pharmaceutical manufacturing. Therefore, there is a critical need to examine the progress of the pharmaceutical industry towards implementing DT solutions. The aim of this entry is to give an overview of the current status of DT development and its application in pharmaceutical and biopharmaceutical manufacturing. State-of-the-art Process Analytical Technology (PAT) developments, process modeling approaches, and data integration studies are reviewed. Challenges and opportunities for future research in this field are also discussed.
Competitive markets today demand the use of new digital technologies to promote innovation, improve productivity, and increase profitability[1]. The growing interest in digital technologies and their promotion in various aspects of economic activity[2] have led to a wave of applications of these technologies in manufacturing sectors. Over the years, the advancement of digital technologies has initiated different levels of change in manufacturing, including but not limited to the replacement of paper processing with computers, the nurturing and promotion of the Internet and digital communication[1], the use of programmable logic controllers (PLCs) and information technology (IT) for automated production[3], as well as the current movement towards a fully digitalized manufacturing cycle[4]. These digitalization waves have enabled a broad range of applications, from upstream supply chain management and shop-floor control and management to post-manufacturing product tracing and tracking.
Among the new digital advancements, the development of artificial intelligence (AI)[5], Internet of Things (IoT) devices[3][5], and digital twins (DTs) has received attention from governments, agencies, academic institutions, and industries[6]. The idea of Industry 4.0 has been put forward by the community of practice to achieve a higher level of automation for increased operational efficiency and productivity. Smart technologies under the umbrella of Industry 4.0, such as the IoT, big data analytics (BDA), cyber-physical systems (CPS), and cloud computing (CC), are playing critical roles in stimulating the transformation of current manufacturing into smart manufacturing[7][8][9][10]. With the development of these Industry 4.0 technologies to assist data flow, a number of manufacturing activities, such as remote sensing[11][12], real-time data acquisition and monitoring[13][14][15], process visualization (data, augmented reality, and virtual reality)[16][17], and control of all devices across a manufacturing network[18][19], are becoming more feasible. The adoption of Industry 4.0 standards encourages institutions and companies to implement a more robust, integrated data framework that connects physical components to the virtual environment[1], enabling a more accurate representation of the physical parts in digitized space and leading to the realization and application of DTs.
The concept of creating a “twin” of a process or a product can be traced back to the late 1960s when NASA assembled two identical space vehicles for its Apollo project[20][21][22]. One of the two was used as a “twin” to mirror all the parts and conditions of the one that was sent to space. In this case, the “twin” was used to simulate the real-time behavior of its counterpart.
The first definition of a “digital twin” was given in 2002 by Michael Grieves in an industry presentation concerning product lifecycle management (PLM) at the University of Michigan[23][24][25]. As described by Grieves, the DT is a digital informational construct of a physical system, created as an entity on its own and linked with the physical system[24].
Since the first definition of DT, interpretations from different perspectives have been proposed, with the most popular one given by Glaessgen and Stargel, noting that a DT is an integrated multiphysics, multiscale, probabilistic simulation of a complex product that uses the best available data, sensors, and models to mirror the life of its corresponding twin[26]. It is generally accepted that a complete DT consists of a physical component, a virtual component, and automated data communications between the two[2]. Ideally, the digital component should include all information about the system that could potentially be obtained from its physical counterpart. This ideal representation of the real physical system should be the ultimate goal of a DT, but for practical usage, simplified or partial DTs currently dominate in industry. These include the digital model, where a digital representation of a physical system exists without automated data communication in either direction, and the digital shadow, where the model exists with one-way data transfer from the physical to the virtual component[2].
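The distinction between these three constructs reduces to the direction and automation of the data flow between the physical and virtual components. As an illustration only, the following minimal Python sketch (all class and method names are hypothetical) encodes this taxonomy:

```python
from dataclasses import dataclass

@dataclass
class PhysicalAsset:
    """Stand-in for a real process unit exposing sensor readings."""
    temperature: float = 25.0

    def read_sensors(self) -> dict:
        return {"temperature": self.temperature}

    def apply_setpoint(self, setpoint: dict) -> None:
        # In a real plant this would go through the control system.
        self.temperature = setpoint.get("temperature", self.temperature)

class DigitalModel:
    """Digital model: a representation with no automated data exchange."""
    def __init__(self, temperature: float):
        self.temperature = temperature  # updated manually, if at all

class DigitalShadow(DigitalModel):
    """Digital shadow: automated one-way flow, physical -> virtual."""
    def sync_from(self, asset: PhysicalAsset) -> None:
        self.temperature = asset.read_sensors()["temperature"]

class DigitalTwin(DigitalShadow):
    """Digital twin: automated two-way flow, virtual insight fed back."""
    def optimize_and_push(self, asset: PhysicalAsset) -> None:
        # Placeholder optimization: nudge toward a hypothetical target.
        asset.apply_setpoint({"temperature": self.temperature + 1.0})
```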
Together with the US Food and Drug Administration (FDA)’s vision to develop a maximally efficient, agile, flexible pharmaceutical manufacturing sector that reliably produces high-quality drugs without extensive regulatory oversight[27], the pharmaceutical industry is embracing the general digitalization trend. Companies, with the help of academic institutions and regulatory agencies, are starting to adopt Industry 4.0 and DT concepts and apply them to research and development, supply chain management, and manufacturing practice[9][28][29][30][31]. The digitalization move that combines Industry 4.0 with International Council for Harmonisation (ICH) guidelines to develop an integrated manufacturing control strategy and operating model is referred to as Pharma 4.0[32].
However, according to a recent survey conducted by Reinhardt et al.[33], the preparedness of the industry for this digitalization move is still unsatisfactory. Most pharmaceutical and biopharmaceutical processes currently rely on quality control checks, laboratory testing, in-process control checks, and standard batch records to assure product quality, whereas process data and models have a lower impact. Within pharmaceutical companies, there are gaps in knowledge of and familiarity with this digitalization move, creating a roadblock to strategic and shop-floor implementation of such technologies.
As mentioned in Section 1, a DT has a physical component, a virtual component, and automated data communication in between, realized through an integrated data management system. This synergy between the physical space, the virtual space, and the integrated data management platform is illustrated in Figure 1. The physical component consists of all manufacturing data sources, including different sensors and network equipment (e.g., routers, workstations)[34]. The virtual component needs to be a comprehensive digital representation of the physical component in all aspects[8]. The models are built on prior knowledge, historical data, and data collected in real time from the physical component, improving their predictions continuously and thus maintaining fidelity to the physical space. The data management platform includes databases, data transmission protocols, operational data, and model data. The platform should also support data visualization tools in addition to process prediction, dynamic data analysis, and optimization[34].
Figure 1. Physical component, virtual component, and data management platform of a general digital twin (DT) framework.
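To make the synchronization loop of Figure 1 concrete, the following is a minimal Python sketch of one acquire-update-predict-store cycle, with a trivial stand-in model and hypothetical sensor names; a real implementation would use the PAT interfaces, mechanistic models, and historian software discussed later in this entry:

```python
import time

def read_sensors() -> dict:
    """Hypothetical interface to the physical component (e.g., PLC tags)."""
    return {"timestamp": time.time(), "temperature": 25.3, "flow": 1.2}

class VirtualComponent:
    """Toy first-order filter standing in for the process models."""
    def __init__(self, gain: float = 0.8):
        self.gain = gain
        self.state = 25.0

    def update(self, measurement: dict) -> None:
        # Blend the model state with the latest measurement.
        self.state += self.gain * (measurement["temperature"] - self.state)

    def predict(self) -> float:
        return self.state

historian = []  # stand-in for the data management platform's database

virtual = VirtualComponent()
for _ in range(3):  # one cycle: acquire -> update -> predict -> store
    sample = read_sensors()
    virtual.update(sample)
    historian.append({**sample, "prediction": virtual.predict()})
```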
In pharmaceutical manufacturing, the potential of using DTs to facilitate smart manufacturing can be seen in different phases of process development and production. In the process design stage, the use of a DT can significantly accelerate the selection of a manufacturing route and its unit operations, since the DT is able to represent physical parts with various models. An understanding of process variation can be obtained from DT simulations, which allows for the prediction of product quality, productivity, and process attributes, reducing the time and cost of physical experiments[35]. In the operation phase, real-time process performance can be monitored and visualized at any time, and the DT can analyze the system continuously to provide control and optimization insights[35]. The DT can also be used as a training platform for operators and engineers, since real-time scenario simulation and on-the-job feedback can be realized through the DT. With regard to pre- and post-manufacturing tasks, the DT platform can assist with tasks including but not limited to material tracking, serialization, and quality assurance.
Some key requirements for achieving smart manufacturing with DTs include real-time system monitoring and control using Process Analytical Technology (PAT), continuous data acquisition from equipment and from intermediate and final products, and a continuous global modeling and data analysis platform[29]. The pharmaceutical industry has taken several steps in this direction using techniques such as Quality-by-Design (QbD)[36], Continuous Manufacturing (CM)[36], flowsheet modeling[37], and PAT implementations[38]. Some of these tools have been investigated extensively, but the overall integration and development of DTs are still in their infancy.
A key component in the development of a DT is data collection. In addition to readings from equipment, critical quality attributes (CQAs) also need to be collected from the physical plant in a timely manner for use in the virtual component, since the models and analyses rely on good data. Several traditional technologies exist to determine CQAs, such as sieve analysis and High-Performance Liquid Chromatography (HPLC), but these cannot provide real-time data and are performed off-line, away from the production line, rather than in-line or at-line. Thus, PAT tools have been explored and developed to address these issues[39].
PAT tools in the pharmaceutical industry have a wide range of applications, including measuring the particle size of crystals[40], monitoring blend uniformity[41], and testing tablet content uniformity[42]. Spectroscopy tools (Nuclear Magnetic Resonance (NMR), Ultraviolet (UV), Raman, near-infrared, mid-infrared, and online mass spectrometry) constitute one of the major techniques used to measure the CQAs of pharmaceutical processes. Raman and Near-Infrared Spectroscopy (NIRS) are commonly used in the industry. Raman spectroscopy has been employed for the on-line monitoring of powder blending processes[43]. Since acquisition times for Raman can be longer, NIRS is often preferred for real-time measurements. NIRS has been used for real-time monitoring of powder density[15] and blend uniformity[41], and has also been integrated with control platforms for process monitoring and control[44]. Baranwal et al.[45] employed NIRS to replace HPLC methods for predicting API concentration in bi-layer tablets. PAT tools have also been used by the pharmaceutical industry to determine the particle size distribution of the product[46]. Several optical tools, such as Focused Beam Reflectance Measurement (FBRM)[47] and high-resolution camera systems[48], have also been employed in the industry for particle size analysis. Some studies have utilized a network of PAT tools to monitor and control a unit operation[39][49].
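Quantitative NIRS methods such as these are typically built on a multivariate calibration that maps spectra to reference values. As an illustration, the sketch below calibrates a partial least squares (PLS) regression model, a common chemometric choice, on synthetic spectra standing in for real NIR data; the cited studies may employ different calibration methods:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for NIR spectra: 200 samples x 500 wavelengths,
# with absorbance loosely proportional to API concentration.
n_samples, n_wavelengths = 200, 500
concentration = rng.uniform(5.0, 15.0, n_samples)          # % w/w API
pure_spectrum = np.exp(-((np.arange(n_wavelengths) - 250) / 60.0) ** 2)
spectra = (concentration[:, None] * pure_spectrum[None, :]
           + 0.05 * rng.standard_normal((n_samples, n_wavelengths)))

X_train, X_test, y_train, y_test = train_test_split(
    spectra, concentration, test_size=0.25, random_state=0)

# Calibrate a PLS model: latent variables capture covarying wavelengths.
pls = PLSRegression(n_components=3)
pls.fit(X_train, y_train)
rmsep = np.sqrt(np.mean((pls.predict(X_test).ravel() - y_test) ** 2))
print(f"RMSEP: {rmsep:.3f} % w/w")
```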
The US FDA has also taken steps to promote the use of PAT tools in pharmaceutical manufacturing with the goal of ensuring final product quality[50]. The pharmaceutical industry has adopted PAT in various applications throughout the drug-substance manufacturing process[51]. Although this has certainly led to an increase in the usage of PAT tools, their applications remain focused on research and development rather than full-scale manufacturing[38]. In the limited number of cases where PAT tools were employed in manufacturing, they have been successful in reducing manufacturing costs and improving the monitoring of product quality[52]. The development of different PAT methods, with their compelling application as an integral part of a monitoring and control strategy[53], has established a building block for gathering essential data from the physical component, enabling the further development of process models and DTs.
DTs depend heavily on the use of data and models, and in the pharmaceutical industry there is a growing interest in the development and application of methods and tools that facilitate this[54]. Different types of models have been developed for batch and continuous process simulation, material property identification and prediction, system analysis, and advanced control. Papadakis et al. recently proposed a framework for selecting efficient reaction pathways for pharmaceutical manufacturing[55], which includes a series of modeling workflows for reaction pathway identification, reaction and separation analysis, process simulation, evaluation, optimization, and operation[54]. The overall framework yields an optimized reaction process with an identified design space and process analytical technology information. The models developed under this framework can all be used as the virtual component within a DT framework to provide further process understanding and control of the manufacturing plant.
As mentioned in Section 2.2, modeling approaches can be classified as mechanistic, data-driven, and hybrid. Among mechanistic modeling approaches in pharmaceutical manufacturing, the discrete-element method (DEM), finite-element method (FEM), and computational fluid dynamics (CFD) are often used[56]. To simulate the particle-level or bulk behavior of material flow in different pharmaceutical unit operations, DEM is a powerful tool and has been applied widely[57][58][59], though its high computational cost limits its practical use when run locally. With high-performance computing (HPC) and cloud computing, it is possible to integrate DEM simulations with the overall process, resulting in a near-real-time model. To model fluid flow in pharmaceutical processes, including API drying and fluidized beds, CFD and FEM are commonly implemented[56]. These two methods are also heavily utilized in biopharmaceutical manufacturing (see Section 4.2).
Data-driven modeling methods involve the collection and use of a large amount of experimental data to generate models, and the resulting models are based on the provided datasets only. Commonly implemented approaches in pharmaceutical manufacturing include artificial neural networks (ANNs)[60][61], multivariate statistical analysis, and Monte Carlo methods[62]. These methods are less computationally intensive, but because the trained models lack underlying physical understanding, predictions outside the space of the dataset are often unsatisfactory.
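The following sketch illustrates this limitation with a small ANN surrogate trained on hypothetical blending data: interpolation within the training region is reasonable, while extrapolation beyond it should not be trusted:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)

# Hypothetical process data: blender speed and feed rate -> blend homogeneity.
X = rng.uniform([100, 5], [300, 20], size=(500, 2))   # rpm, kg/h
y = 1.0 - 0.002 * np.abs(X[:, 0] - 200) - 0.01 * np.abs(X[:, 1] - 12)
y += 0.01 * rng.standard_normal(len(y))               # measurement noise

ann = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=1)
ann.fit(X, y)

# Interpolation (inside the training region) is typically reliable ...
print(ann.predict([[210, 11]]))
# ... extrapolation (outside the training region) is often not.
print(ann.predict([[450, 40]]))  # untrustworthy: far outside the data
```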
There is also a recent trend towards developing various types of hybrid modeling techniques to model complex pharmaceutical manufacturing processes while lowering computational cost and data requirements. Population balance modeling (PBM), with a comparatively lower computational cost, has been used extensively to model blending and granulation processes[63][64], and a PBM–DEM hybrid model has also been used to improve model accuracy while maintaining reasonable computational cost[65]. Other semi-empirical hybrid models, such as those that incorporate material properties into process models[66] or investigate the effect of material properties on residence time distribution (RTD) and process parameters[58][67][68][69][70], have also been developed for different powder processing unit operations[71][72]. These models, when incorporated into a full DT framework, will facilitate overall product and process design and development, accelerating the drug-to-market timeline.
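As a minimal illustration of the PBM approach, the sketch below solves a one-dimensional population balance with a constant growth rate, discretized with an upwind scheme; the grid, rate, and initial distribution are purely illustrative:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Size grid for the particle population (e.g., granule diameter in microns).
n_bins = 100
x = np.linspace(1.0, 500.0, n_bins)
dx = x[1] - x[0]

G = 2.0  # constant growth rate, microns per second (illustrative value)

def pbm_rhs(t, n):
    """dn/dt = -G * dn/dx, discretized with a first-order upwind scheme."""
    dndx = np.empty_like(n)
    dndx[0] = n[0] / dx          # no flux entering below the smallest bin
    dndx[1:] = (n[1:] - n[:-1]) / dx
    return -G * dndx

# Initial population: a narrow Gaussian around 50 microns.
n0 = np.exp(-((x - 50.0) / 10.0) ** 2)

sol = solve_ivp(pbm_rhs, (0.0, 60.0), n0, t_eval=[0.0, 30.0, 60.0])
# Each column of sol.y is the size distribution at one time point;
# the peak moves to larger sizes as the particles grow.
```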
In addition to models of single pharmaceutical unit operations, a flowsheet model integrating the entire manufacturing process can be used to predict process dynamics as affected by material properties and the operating conditions of different unit operations. More importantly, systematic process analyses, such as sensitivity analysis, design space identification, and optimization, can all be performed with the flowsheet model. This provides insight into the characteristics and bottlenecks of the process and thus facilitates the development of control strategies[37]. Over the years, many researchers and pharmaceutical companies have developed mature approaches for conducting these analyses offline during the process design phase[71][73][37][74][75]. Flowsheet models are needed for the development of DTs. However, current flowsheet models are stand-alone and cannot automatically update to adapt to the physical plant. In current research, there is limited communication between the flowsheet model and the plant, which is a challenge in the development of a DT.
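As a simplified illustration, the sketch below chains hypothetical unit-operation models (feeders, blender, tablet press) into a flowsheet and performs a one-factor-at-a-time sensitivity scan; actual flowsheet models would use validated unit models and dedicated simulation platforms:

```python
import numpy as np

def feeder(setpoint_kg_h: float, noise: float = 0.02) -> float:
    """Hypothetical feeder model: delivers mass flow with small variability."""
    return setpoint_kg_h * (1.0 + noise * np.random.randn())

def blender(api_flow: float, excipient_flow: float) -> dict:
    """Hypothetical blender model: computes outlet flow and API fraction."""
    total = api_flow + excipient_flow
    return {"flow_kg_h": total, "api_fraction": api_flow / total}

def tablet_press(blend: dict, tablet_mass_mg: float = 200.0) -> dict:
    """Hypothetical press model: converts blend stream into tablet potency."""
    return {"api_mg_per_tablet": blend["api_fraction"] * tablet_mass_mg}

# Flowsheet: feeders -> blender -> tablet press, outputs feeding inputs.
def run_flowsheet(api_setpoint: float, excipient_setpoint: float) -> float:
    blend = blender(feeder(api_setpoint), feeder(excipient_setpoint))
    return tablet_press(blend)["api_mg_per_tablet"]

# One-factor-at-a-time sensitivity scan over the API feeder setpoint.
np.random.seed(0)
for sp in (0.9, 1.0, 1.1):
    potency = np.mean([run_flowsheet(sp, 9.0) for _ in range(100)])
    print(f"API setpoint {sp:.1f} kg/h -> mean potency {potency:.1f} mg")
```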
The implementation of IoT devices in pharmaceutical manufacturing lines leads to the acquisition of vast amounts of data. The collected process data and CQAs need to be transmitted to the virtual component in real time and in an efficient manner. In addition, several pharmaceutical process models also require material properties for accurate prediction. Thus, a central database is required to give the virtual component access to all datasets[76]. The applications and databases should also comply with 21 CFR Part 11 data integrity requirements in accordance with US FDA guidance[77]. The database not only serves as a warehouse for real product data but can also store the results of simulations performed in the virtual component and the optimized process parameters. It can then also serve to relay these optimized process parameters back to the physical process.
Several studies have attempted to achieve an integrated data framework in downstream pharmaceutical manufacturing[76][78][44][79][80][81][82]. Some of these studies focused on implementing a control system for the direct compression line[44][70][82]. Cao et al.[76] presented an ISA-88-compliant manufacturing execution system (MES) where the batch data were stored on a cloud database as well as in a local data historian. The communication between the equipment and the control platform was performed in a similar manner across these studies. The process control system (PCS) created a database based on the input recipe, and the database was replicated directly into the local data historian. Communication between the historian and the PCS can be achieved using TCP/IP and OPC, since the software components are hosted on different computer systems on the same network. The historian database can in turn be duplicated to the cloud using network protocols such as MQTT and HTTPS. Some authors have also presented ontologies for efficient data flow for laboratory experiments performed during pharmaceutical manufacturing[83][84][85]. Cao et al.[76] also addressed the collection of laboratory data in an ISA-88-applicable, recipe-based electronic laboratory notebook. Many of the presented studies focused primarily on integrating one component of a completely integrated data management system. Figure 2 illustrates a sample data integration framework, where data collected from the manufacturing plant as well as laboratory experiments are uploaded to a cloud database using the protocols mentioned above. The data can then be used in the virtual component for simulations, and corrective actions can be sent back to the control platform.
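As an illustration of the cloud-transfer step, the sketch below publishes a historian record to a broker over MQTT using the paho-mqtt package (1.x API); the broker address, topic, and payload are hypothetical, and a production deployment would add TLS, authentication, and 21 CFR Part 11 controls:

```python
import json
import time

import paho.mqtt.client as mqtt

client = mqtt.Client()
client.connect("broker.example.com", 1883)  # hypothetical cloud broker
client.loop_start()

record = {
    "timestamp": time.time(),
    "unit": "blender-01",            # hypothetical equipment identifier
    "api_fraction": 0.101,           # example CQA value from the historian
}
client.publish("plant/line1/blender-01/cqa", json.dumps(record), qos=1)

client.loop_stop()
client.disconnect()
```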
Biopharmaceutical manufacturing focuses on the production of large-molecule-based products in heterogeneous mixtures, which can be used to treat cancer, inflammatory diseases, and microbial diseases[86][87]. To fulfill FDA regulations and obtain safe products, biopharmaceutical operations should be strictly controlled and operated in a sterile process environment.
In recent years, there has been an increasing demand for biologic-based drugs, which drives the need for manufacturing efficiency and effectiveness[88]. Thus, many companies are transitioning from batch to continuous operation mode and employing smart manufacturing systems[87]. A DT integrates the physical plant, data collection, data analysis, and system control[4], which can assist biopharmaceutical manufacturing in product development, process prediction, decision making, and risk analysis, as shown in Figure 4. Monoclonal antibody (mAb) production is selected as an example to represent the physical plant, which includes cell inoculation, seed cultivation, the production bioreactor, recovery, primary capture, virus inactivation, polishing, and final formulation. These operations produce and purify protein products. Quality attributes (mainly protein structure and composition) and impurities need to be monitored and transmitted to the virtual plant for analysis and virtual plant updates. The virtual plant includes plant simulation, analysis, and optimization, which guide the diagnosis and updating of the physical plant with the help of the process control system. Integrated mAb production flowsheet modeling, bioreactor analysis, and design space and biomass optimization are shown as examples in the three sections of the figure. However, the capabilities of the virtual plant are not limited to the examples listed above. To understand the progress of DT development in biopharmaceutical manufacturing, this section reviews process monitoring, modeling, and data integration (virtual-physical plant communication) in the existing industry and analyzes the possibilities and gaps in achieving integrated biopharma DT manufacturing.
Figure 4. Biopharma process, benefits, and DT connections.
Biological products are highly sensitive to the cell line and operating conditions, while the fractions and structures of the product molecules are closely related to drug efficacy[89]. Thus, having a real-time process diagnostic and control system is essential to maintain consistent product quality. However, process contamination needs to be strictly controlled in biopharmaceutical manufacturing; thus, the monitoring system should neither be affected by fouling nor interfere with the media, so as to maintain monitoring accuracy, sensitivity, stability, and reproducibility[90]. In general, process parameters and quality attributes need to be captured across the different unit operations.
Biechele et al.[90] presented a review of sensing applied in bioprocess monitoring. In general, process monitoring includes physical, chemical, and biological variables. In the gas phase, the commonly used sensing systems consist of semiconducting, electrochemical, and paramagnetic sensors, which can be applied to oxygen and carbon dioxide measurements[90][91]. In the liquid phase, dissolved oxygen, carbon dioxide, and pH values have been monitored by in-line electrochemical sensors. However, media composition, protein production, and quality attributes such as glycan fractions are mostly measured by online or at-line HPLC or GC/MS[91][92]. Specific product quality monitoring methods are reviewed by Guerra et al.[93] and Pais et al.[94].
Recently, spectroscopy methods have been developed for accurate, real-time monitoring of both upstream and downstream operations. Industrial spectroscopy applications mainly focus on cell growth monitoring and the quantification of culture fluid components[95]. UV/Vis and multiwavelength UV spectroscopy have been used for in-line, real-time protein quantification[95]. NIR has been used for off-line raw material and final product testing[95]. Raman spectroscopy has been used for viable cell density, metabolite, and antibody concentration measurements[96][97]. In addition, spectroscopy methods can also be used for process CQA monitoring, such as host cell proteins and protein post-translational modifications[92][98]. Research shows that in-line Raman spectroscopy and mid-IR have the capability to monitor protein concentration, aggregation, host cell proteins (HCPs), and charge variants[99][100]. Spectroscopy methods are usually supported by chemometrics, which requires data pretreatments such as background correction, spectral smoothing, and multivariate analysis for quantitative and qualitative analysis of the attributes. Many different applications of spectroscopic sensing are reviewed in the literature[95][92][93][98].
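A minimal sketch of such a pretreatment pipeline is shown below, combining Savitzky-Golay smoothing with standard normal variate (SNV) scatter correction on synthetic spectra; the exact pretreatments used in the cited studies vary:

```python
import numpy as np
from scipy.signal import savgol_filter

def preprocess(spectra: np.ndarray) -> np.ndarray:
    """Smooth each spectrum, then apply standard normal variate (SNV)."""
    # Savitzky-Golay: local polynomial smoothing to suppress detector noise.
    smoothed = savgol_filter(spectra, window_length=11, polyorder=2, axis=1)
    # SNV: per-spectrum centering and scaling to correct scatter effects.
    mean = smoothed.mean(axis=1, keepdims=True)
    std = smoothed.std(axis=1, keepdims=True)
    return (smoothed - mean) / std

# Example: 10 synthetic Raman-like spectra with baseline offset and noise.
rng = np.random.default_rng(2)
wavenumbers = np.linspace(400, 1800, 700)
peaks = np.exp(-((wavenumbers - 1000) / 15.0) ** 2)
raw = peaks[None, :] + 0.1 * rng.standard_normal((10, 700)) + 0.5
clean = preprocess(raw)  # ready for PLS or other multivariate analysis
```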
The application of DTs in biopharmaceutical manufacturing requires a complete virtual description of the physical plant within a simulation platform[4]. This means that the simulation should capture the important process dynamics of each unit operation within an integrated model. Previous reviews have focused on process modeling methods for both upstream and downstream operations[88][101][102][103][104][105].
For the upstream bioreactor, extracellular fluid dynamics[106][107][108], system heterogeneities, and intracellular biochemical pathways[109][110][111][112][113][114][115][116][117][118][119][120] can be captured. Process modeling supports early-stage cell-line development, helps identify optimal media formulations, and enables prediction of the overall bioreactor performance, including cell activities, metabolite concentrations, productivity, and product quality under different process parameters[121][122]. The influence of various parameters, such as temperature, pH, dissolved oxygen, feeding strategies, and amino acid concentrations, can be captured and further used to optimize process operations[123][124][125][126][127].
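As a minimal illustration of upstream bioreactor modeling, the sketch below integrates a Monod-type fed-batch model for biomass, substrate, product, and volume; all parameter values are hypothetical and would in practice be fitted to cell-line-specific data:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative Monod-type fed-batch bioreactor model (hypothetical values).
mu_max, K_s = 0.04, 0.5       # max growth rate (1/h), half-saturation (g/L)
Y_xs, q_p = 0.5, 0.002        # biomass yield, specific production rate
F_in, S_in = 0.01, 200.0      # feed rate (L/h), feed substrate (g/L)

def fed_batch(t, y):
    X, S, P, V = y            # biomass, substrate, product, volume
    mu = mu_max * S / (K_s + S)
    dX = mu * X - (F_in / V) * X
    dS = -(mu / Y_xs) * X + (F_in / V) * (S_in - S)
    dP = q_p * X - (F_in / V) * P
    dV = F_in
    return [dX, dS, dP, dV]

sol = solve_ivp(fed_batch, (0.0, 240.0), [0.3, 20.0, 0.0, 1.0],
                t_eval=np.linspace(0.0, 240.0, 25), method="LSODA")
X_final, P_final = sol.y[0, -1], sol.y[2, -1]
print(f"Final biomass {X_final:.1f} g/L, product titer {P_final:.2f} g/L")
```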
For downstream operations, modeling strategies have focused on selecting design parameters, adjusting operating conditions, and managing buffer usage to achieve high protein productivity and purity efficiently. The operating conditions include (1) flowrate, buffer pH, and salt concentration effects for chromatography operations[127][128][129][130][131]; (2) residence time, buffer concentration, and pH for virus inactivation; and (3) feed protein concentration, flux, and retentate pressure for filtration[132]. Thus, the product concentration and various types of impurities can be predicted for each unit operation. Detailed modeling methods have been reviewed in the literature[133].
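For example, low-pH virus inactivation is often described by first-order kinetics, so the achievable log reduction can be screened against candidate hold times, as in the sketch below (the rate constant is hypothetical and is in practice determined from validation studies at the chosen pH and temperature):

```python
import numpy as np

k = 0.15  # first-order inactivation rate constant, 1/min (illustrative)

def log_reduction(residence_time_min: float) -> float:
    """Log10 reduction value (LRV) for first-order inactivation kinetics,
    from N/N0 = exp(-k*t), so LRV = k*t / ln(10)."""
    return k * residence_time_min / np.log(10.0)

for t in (15, 30, 60):  # candidate hold times in minutes
    print(f"{t} min hold -> LRV {log_reduction(t):.1f}")
```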
In recent years, biopharmaceutical companies have been shifting from batch to continuous operations. It remains an open question whether it is more feasible to start up a new, fully continuous process plant or to replace specific unit operations with continuous units. Integrated process modeling provides a virtual platform to test various operating strategies, such as batch, continuous, and hybrid operating modes[134]. These different operating modes can be compared based on life cycle analysis and economic analysis for different target products at various operating scales[134][135][136][137][138].
For flowsheet modeling, two approaches are available in the literature: mechanistic and data-driven models. Due to the high computational cost, mechanistic modeling mostly focuses on the integration of a limited number of units, such as the combination of multiple chromatography operations[139]. Data-driven/empirical models are generally used to integrate all the unit operations in a computationally efficient way. A mechanistic model of a single unit can also be integrated with other units built as data-driven models to optimize a specific unit within the integrated process[140]. Mass flow and RTD models[141] can be included to examine different scenarios of adding or replacing unit operations and adjusting process parameters. Coupled with the control system, flowsheet modeling can achieve real-time decision making and automatically optimize the overall process operation[142].
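As an illustration of the RTD approach, the sketch below represents a unit operation with a tanks-in-series RTD and propagates an inlet step disturbance to the outlet by convolution; the tank number and time constant are illustrative:

```python
from math import factorial

import numpy as np

n_tanks, tau = 4, 2.0                       # hypothetical unit parameters
dt = 0.05
t = np.arange(0.0, 30.0, dt)

# Tanks-in-series RTD: E(t) = t^(n-1) * exp(-t/tau) / ((n-1)! * tau^n)
E = (t ** (n_tanks - 1) * np.exp(-t / tau)
     / (factorial(n_tanks - 1) * tau ** n_tanks))

# Inlet disturbance: a step change in concentration at t = 5.
c_in = np.where(t >= 5.0, 1.0, 0.0)

# Outlet response = convolution of the inlet signal with the RTD.
c_out = np.convolve(c_in, E)[: len(t)] * dt
# c_out shows the smoothed, delayed propagation of the disturbance,
# which is how RTD models track material through a flowsheet.
```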
Data-driven models can be further integrated with Monte Carlo analysis or linear/nonlinear programming for risk assessment and process scheduling. Zahel et al.[143] applied Monte Carlo simulation to an end-to-end data-driven model, which can be used to estimate process capabilities and provide risk-based decision making following a change in manufacturing operations.
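A minimal sketch of this Monte Carlo pattern is shown below: input uncertainty is propagated through a hypothetical data-driven surrogate to estimate the probability of an out-of-spec outcome (all distributions, coefficients, and limits are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data-driven model: final titer as a linear surrogate of
# two inputs (coefficients illustrative, e.g., fitted by regression).
def titer_surrogate(seed_density, feed_glucose):
    return 1.5 + 0.8 * seed_density + 0.02 * feed_glucose

# Propagate input uncertainty (assumed normal) through the surrogate.
n = 100_000
seed = rng.normal(0.5, 0.05, n)        # 1e6 cells/mL
glucose = rng.normal(40.0, 4.0, n)     # g/L

titer = titer_surrogate(seed, glucose)
spec_limit = 2.5                       # hypothetical lower spec, g/L
risk = np.mean(titer < spec_limit)
print(f"Estimated probability of out-of-spec titer: {risk:.3%}")
```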
Data obtained in biopharmaceutical monitoring systems are usually heterogeneous in data type and time scale. They can be collected from different sensors, from different production lines (laboratory or manufacturing), and at different time intervals. With the development of real-time PAT sensors, a large amount of data is obtained during biopharmaceutical manufacturing. Thus, data preprocessing is essential to handle missing data, perform data visualization, and reduce dimensionality[144]. Casola et al.[145] presented data mining-based algorithms to stem, classify, filter, and cluster historical real-time data in batch biopharmaceutical manufacturing. Lee et al.[146] applied data fusion to combine multiple spectroscopic techniques and predict the composition of raw materials. These preprocessing algorithms remove noise from the dataset and allow the data to be used directly in the virtual component.
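As an illustration of such preprocessing, the sketch below interpolates missing values in a hypothetical time-stamped sensor log and reduces its dimensionality with principal component analysis (PCA); the cited studies use considerably more elaborate pipelines:

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)

# Hypothetical heterogeneous sensor log: 1-min samples with dropouts.
idx = pd.date_range("2024-01-01", periods=120, freq="min")
df = pd.DataFrame({
    "pH": 7.0 + 0.02 * rng.standard_normal(120),
    "dO2": 40.0 + 2.0 * rng.standard_normal(120),
    "temp": 36.8 + 0.1 * rng.standard_normal(120),
}, index=idx)
df.iloc[rng.choice(120, 10, replace=False), 0] = np.nan  # missing pH values

# Handle missing data by time interpolation, then reduce dimensionality.
clean = df.interpolate(method="time")
scores = PCA(n_components=2).fit_transform(
    (clean - clean.mean()) / clean.std())
# 'scores' gives a compact 2-D trajectory of the run for the virtual plant.
```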
In a DT, the virtual and physical components should communicate frequently. Thus, the virtual platform needs the flexibility to adjust its model structure for different products and operating conditions. Herold and King[147] presented an algorithm that used biological phenomena to automatically identify the model structure of a fed-batch bioreactor process. Luna and Martinez[148] used experimental data to train an imperfect mathematical model and correct its prediction errors. Although there are no such applications yet for the integrated process, these works show the possibility of achieving physical-virtual component communication.
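A minimal sketch of one such synchronization step is shown below: a model parameter of the virtual plant is re-estimated from fresh measurements arriving from the physical plant (the model form, prior value, and data are all hypothetical):

```python
import numpy as np
from scipy.optimize import curve_fit

def growth_model(t, mu):
    """Exponential biomass growth with rate mu, assuming X(0) = 0.3 g/L."""
    return 0.3 * np.exp(mu * t)

mu_prior = 0.030  # current virtual-plant estimate, 1/h

# New at-line measurements arriving from the physical plant (time in h).
t_new = np.array([0.0, 6.0, 12.0, 18.0, 24.0])
x_new = np.array([0.31, 0.38, 0.47, 0.59, 0.72])

# Re-fit the growth rate so the virtual plant tracks the physical plant.
mu_fit, _ = curve_fit(growth_model, t_new, x_new, p0=[mu_prior])
print(f"Updated growth rate: {mu_fit[0]:.4f} 1/h (prior {mu_prior:.4f})")
```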
In biopharmaceutical manufacturing, an integrated database can guide process-wide automatic monitoring and control[149]. Fahey et al. applied Six Sigma and CRISP-DM methods and integrated data collection, data mining, and model predictions for upstream bioreactor operations. Although process optimization and control were not considered in that work, it demonstrates the capability to handle large amounts of data for predictive process modeling[150]. Feidl et al.[149] used a supervisory control and data acquisition (SCADA) system to collect and store data from different unit operations at each sample time and developed a monitoring and control system in MATLAB. This work shows the integration of supervisory control with a data acquisition system in a fully end-to-end biopharmaceutical plant. However, process modeling was not considered during process operation, so process prediction and analysis are not supported.
DTs are a crucial development for the close integration of manufacturing information and physical resources and have attracted much attention across industries. The critical parts of a fully developed DT are the physical component, the virtual component, and the interlinked data communication channels. Following the development of IoT technologies, there are many applications of DTs in various industries, but progress is lagging in pharmaceutical and biopharmaceutical manufacturing. This entry summarizes the current state of DTs in these two application scenarios, providing insights to stakeholders and highlighting possible challenges of, and solutions for, implementing a fully integrated DT.
This entry is adapted from the peer-reviewed paper 10.3390/pr8091088