Applications of Hybrid Models in Bioprocess Development

The Fourth Industrial Revolution (Industry 4.0) has spurred the development of intelligent biomanufacturing, reshaping industrial structures in line with the worldwide trend. Intelligent biomanufacturing can be structured into three main parts: digitalization, modeling and intellectualization, with modeling forming the crucial link between the other two. Hybrid models combine mechanistic models and machine learning algorithms (data-driven models) with bioprocess information at multiple spatial and temporal scales.

  • hybrid modeling
  • intelligent biomanufacturing
  • industrial biotechnology

1. Metabolic Engineering

Despite advances in systems and synthetic biology, developing new cell factories by traditional metabolic engineering remains challenging. Meeting the economic requirements for industrial-scale production typically takes several months or even years [1]. Recently, researchers have used advanced machine learning methods and omics technology to construct models that simulate complex cellular metabolism. These methods and technologies help improve the accuracy of product synthesis pathway design and the optimization of metabolic flux, while significantly reducing the cost of research and development.

1.1. Metabolic Model Reconstruction for Better Performance

Genome-scale metabolic models (GEMs) are an important tool for investigating cell growth and production, so it is essential to keep upgrading them by supplying missing information on the metabolic network to improve their accuracy. With the development of omics technology, rich genomic, transcriptomic, proteomic and metabolomic information has become available to support the reconstruction of metabolic pathways. For example, Sánchez et al. applied GECKO to a Saccharomyces cerevisiae GEM, integrating kinetic and omics data to constrain proteome resource allocation and yielding the enzyme-constrained model ecYeast7 with improved phenotype prediction [2]. Li et al. developed a deep learning method (DLKcat) based on a graph neural network (GNN) and a convolutional neural network (CNN) [3]. DLKcat integrates substrate structure information and protein sequence information to achieve high-throughput prediction of the kcat values of metabolic enzymes [3]. This method was applied to reconstruct 343 enzyme-constrained GEMs (ecGEMs) of yeasts [3]. Culley et al. proposed a multimodal learning framework based on transcriptomics and fluxomics to predict the growth phenotype of S. cerevisiae cells, integrating large-scale gene expression profiles with a mechanistic metabolic model constrained by transcriptome data [4]. A multi-view neural network method was used to compare the performance of the multi-omics constrained GEMs [4]. This method increased the prediction accuracy and provided tools for understanding how the biological mechanisms of metabolic changes relate to the phenotypes [4].
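The enzyme-constraint idea behind ecGEMs can be illustrated with a minimal sketch: the flux through a reaction cannot exceed the turnover number times the available enzyme, v <= kcat * [E]. The code below only tightens upper flux bounds accordingly; it is a simplified reading of the concept, not GECKO's actual implementation, and the reaction names, kcat values and enzyme levels are hypothetical.

```python
def max_flux(kcat_per_s: float, enzyme_mmol_per_gdw: float) -> float:
    """Upper flux bound in mmol/gDW/h implied by v <= kcat * [E]."""
    seconds_per_hour = 3600.0
    return kcat_per_s * seconds_per_hour * enzyme_mmol_per_gdw


def apply_enzyme_constraints(flux_bounds, kcats, enzyme_levels):
    """Tighten upper bounds for reactions with known kcat and enzyme level.

    flux_bounds: {reaction: (lower, upper)} in mmol/gDW/h
    kcats: {reaction: kcat in 1/s}
    enzyme_levels: {reaction: enzyme abundance in mmol/gDW}
    """
    constrained = {}
    for rxn, (lower, upper) in flux_bounds.items():
        if rxn in kcats and rxn in enzyme_levels:
            upper = min(upper, max_flux(kcats[rxn], enzyme_levels[rxn]))
        constrained[rxn] = (lower, upper)
    return constrained


# Hypothetical example values, chosen only for illustration.
bounds = {"HEX1": (0.0, 1000.0), "PGI": (0.0, 1000.0)}
kcats = {"HEX1": 200.0}
enzymes = {"HEX1": 1e-5}
constrained_bounds = apply_enzyme_constraints(bounds, kcats, enzymes)
```

Reactions without kcat or enzyme data keep their original bounds, which is why high-throughput kcat prediction matters for the coverage of such constraints.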
Identifying Enzyme Commission (EC) numbers to determine enzyme function is essential for finding the key enzymes needed to design and optimize target metabolic pathways. Ryu et al. developed DeepEC, a CNN-based tool that takes protein sequences as input and predicts EC numbers with high precision and throughput [5]. When key enzymes are missing from the target metabolic pathway, protein engineering methods are used to design new enzymes that meet the metabolic requirements. Directed evolution is a common approach to protein engineering, involving high-throughput screening of enzymes generated by iterative point mutation. However, this approach entails an enormous workload: for a protein of 300 amino acids, there are 5700 single-point mutations and 32,381,700 double-point mutations [6]. Deep learning methods such as variational autoencoders (VAEs) and generative adversarial networks (GANs) can make it more efficient to predict protein function and to generate protein sequences with new functions, enabling rational protein design [6].
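The mutation counts quoted for a 300-residue protein follow from simple counting with 20 canonical amino acids, that is, 19 alternatives per position. The sketch below reproduces them; the function name is hypothetical. Note that the quoted double-mutation figure equals 300 × 299 × 19², which counts ordered pairs of positions; counting each distinct double mutant once gives half that number.

```python
from math import comb


def count_variants(length: int, alphabet: int = 20) -> dict:
    """Count single and double substitution variants of a protein."""
    alternatives = alphabet - 1  # residues that differ from the wild type
    single = length * alternatives
    # Unordered position pairs: each distinct double mutant counted once.
    double_distinct = comb(length, 2) * alternatives ** 2
    # Ordered position pairs: each distinct double mutant counted twice.
    double_ordered = length * (length - 1) * alternatives ** 2
    return {
        "single": single,
        "double_distinct": double_distinct,
        "double_ordered": double_ordered,
    }


counts = count_variants(300)
print(counts["single"])           # 5700
print(counts["double_ordered"])   # 32381700
print(counts["double_distinct"])  # 16190850
```

Either way, the search space grows roughly quadratically with sequence length for double mutants, which is what makes exhaustive screening impractical.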

1.2. Metabolic Model-Based Guidance for Strain Design

After designing a reasonable metabolic pathway, it is necessary to optimize metabolic flux allocation, identify key metabolic fluxes and maximize product titer, rate and yield (TRY) [7]. Metabolic flux analysis (MFA) can be coupled with kinetic models to simulate large-scale dynamic metabolic pathway fluxes, greatly reducing research cost. This approach has been validated by simulating glycolytic reaction fluxes in Escherichia coli and human red blood cells [8]. Gerken-Starepravo et al. proposed a hybrid model that integrated a kinetic model and dynamic MFA, using a single-level mixed-integer quadratic program (MIQP), to simulate flux changes in the batch fermentation process [9]. This model can identify the shortest metabolic pathway from substrate to product; applied to the biosynthetic pathway for astaxanthin production in Saccharomyces cerevisiae, it reduced the original metabolic network by 70% [9]. Carinhas et al. updated a stoichiometric model to identify the key metabolic pathways involved in baculovirus production in insect cells using partial least squares (PLS) and MFA [10]. They identified the TCA cycle and mitochondrial respiratory pathways as the key pathways for virus replication, which guided the optimization of the feeding operation [10]. Precisely optimizing multi-gene metabolic pathways is a major challenge in metabolic engineering. HamediRad et al. constructed a robotic platform, named BioAutomata, that couples an integrated robotic system with machine learning algorithms to fully automate the design-build-test-learn (DBTL) process for biosystems design [11].
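Flux analysis of this kind rests on the steady-state mass balance S·v = 0, where S is the stoichiometric matrix and v the flux vector. As a minimal sketch, the code below recovers unmeasured fluxes from measured ones in a hypothetical four-reaction network (uptake of A, A to B, A to a by-product, B to product). It illustrates the balance only, not the hybrid or MIQP formulations of the cited works, and all numbers are assumptions.

```python
from fractions import Fraction


def solve_square(a, b):
    """Solve a @ x = b for a small square system by Gauss-Jordan elimination.

    Raises StopIteration if the system is singular (no pivot found).
    """
    n = len(a)
    m = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [x / pivot_value for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[-1] for row in m]


# Rows: metabolites A and B. Columns: v1 (uptake), v2 (A->B),
# v3 (A->by-product), v4 (B->product). Hypothetical toy network.
S = [
    [1, -1, -1, 0],
    [0, 1, 0, -1],
]
measured = {0: 10, 2: 3}  # v1 and v3, in arbitrary flux units
unknown = [1, 3]          # v2 and v4

# Split S v = 0 into S_u v_u = -S_m v_m and solve for the unknown fluxes.
a = [[row[j] for j in unknown] for row in S]
b = [-sum(row[j] * value for j, value in measured.items()) for row in S]
v2, v4 = solve_square(a, b)
```

In this toy case the system is exactly determined; real networks are usually underdetermined, which is why additional constraints or objective functions are needed.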

2. Bioprocess Engineering

Hybrid modeling is an effective tool for predicting the key state variables of a bioprocess and for exploring the relationship between the operating parameters of the bioreactor and cell metabolism. Additionally, with advanced biosensors, the bioprocess can be monitored in real time, which benefits both the optimization of process operation and fault diagnosis. Furthermore, scaling up the bioreactor to the industrial scale is also essential, as it enables the translation of laboratory-scale production to commercial manufacturing.

2.1. Monitoring and Control of Bioprocess

The operations in a biological process affect cell growth and product formation. Hence, monitoring important state variables in real time, such as cell concentration and product concentration, is of great significance for optimizing the operations and the production culture and for controlling product quality. Owing to improvements in spectroscopic techniques and sensors, many advanced sensors have been applied to the real-time monitoring of key process parameters in fermentation [12][13]. Most spectroscopic techniques, such as Raman spectroscopy [13] and near-infrared (NIR) spectroscopy [14], require data processing and model setup.
Raman spectroscopy with partial least squares regression (PLSR) is currently used for bioprocess monitoring, and has been applied in mammalian cell (e.g., CHO cell line) cultivations at both the lab scale and the industrial scale [15][16]. Because the fermentation process is time-varying, nonlinear and complex, some key state parameters are difficult to measure in real time with existing sensors. Therefore, hybrid models combining kinetic models and machine learning methods are important tools for predicting key parameters and constructing soft-sensor models that further guide optimization in industrial production processes. Zhang et al. constructed a hybrid model combining an artificial neural network with kinetic information, together with an automatic model structure identification framework [17]. They identified the optimal kinetic model structure to predict the key state variables and optimize the production of lutein from microalgae [17]. For the quality control of biotherapeutics such as monoclonal antibodies, Antonakoudis et al. integrated a stoichiometric model with an artificial neural network to predict the glycosylation profile in CHO cell cultivations [18]. This hybrid model computes glycan distribution profiles accurately and thus provides a platform for process control in biotherapeutics production [18].
Many methods have been developed for soft-sensor modeling; more details about the advanced methods can be found in the review by Zhu et al. [19].
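The general structure of such a hybrid model can be sketched as mechanistic mass balances whose rate term comes from a data-driven component. In the minimal example below, batch balances for biomass X and substrate S are integrated with explicit Euler steps, while the specific growth rate is interpolated from a small table of observations. The table stands in for a trained neural network, and every number, name and parameter is a hypothetical assumption rather than a value from the cited studies.

```python
from bisect import bisect_left

# Hypothetical observations: (substrate in g/L, specific growth rate in 1/h).
DATA = [(0.0, 0.0), (0.5, 0.2), (1.0, 0.3), (2.0, 0.4), (5.0, 0.45), (10.0, 0.47)]


def mu_data_driven(s: float) -> float:
    """Data-driven part: interpolate the growth rate from observations."""
    xs = [point[0] for point in DATA]
    if s <= xs[0]:
        return DATA[0][1]
    if s >= xs[-1]:
        return DATA[-1][1]
    i = bisect_left(xs, s)
    (x0, y0), (x1, y1) = DATA[i - 1], DATA[i]
    return y0 + (y1 - y0) * (s - x0) / (x1 - x0)


def simulate(x0=0.1, s0=10.0, yield_xs=0.5, dt=0.01, t_end=24.0):
    """Mechanistic part: batch mass balances dX/dt = mu*X, dS/dt = -mu*X/Y."""
    x, s, t = x0, s0, 0.0
    trajectory = [(t, x, s)]
    for _ in range(int(round(t_end / dt))):
        dx = mu_data_driven(s) * x * dt
        ds = -dx / yield_xs
        if s + ds < 0.0:  # cannot consume more substrate than is available
            dx, ds = s * yield_xs, -s
        x, s, t = x + dx, s + ds, t + dt
        trajectory.append((t, x, s))
    return trajectory


trajectory = simulate()
t_end, x_end, s_end = trajectory[-1]
```

Keeping the mass balances mechanistic means the simulated trajectory conserves X + Y·S by construction, whatever the data-driven rate predicts; this is one commonly cited motivation for hybrid over purely data-driven soft sensors.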

2.2. Diagnosis and Analysis of Bioprocesses

Fault diagnosis detects abnormal states occurring in production processes and plays an important role in various biological fermentation processes. For example, Ding et al. constructed a fault diagnosis and rescue system based on a hybrid of support vector machine and fuzzy reasoning to identify faults and their types at the earliest fermentation stage, and successfully applied it to glutamate fermentation [20]. By taking the relevant rescue measures based on the diagnosis results, the fermentations were successfully restored, with production reaching 75–80 g/L at 34 h [20]. Yang et al. proposed a hybrid model based on fast independent component analysis and a probabilistic neural network (FICA-PNN), which could diagnose faulty fermentation processes in the fed-batch production of penicillin more efficiently and accurately [21]. Abbasi et al. proposed a subspace-aided parity-based residual generation technique for fault detection and isolation in penicillin fermentation [22]. The technique builds on the Just-In-Time (JIT) method to detect sensor faults and to isolate and locate them [22]. This approach significantly improved the fault detection rate (FDR) and reduced the model complexity compared with existing diagnostic methods [22]. Yang et al. constructed a hybrid model for fault diagnosis and detection in penicillin fermentation that uses principal component analysis (PCA) for data dimensionality reduction, recursive feature elimination (RFE) for feature ranking and a support vector machine (SVM) for fault identification [23].
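Residual-based fault detection, in its simplest form, compares the difference between measurements and model predictions against limits learned from normal operation. The sketch below uses a plain mean ± 3 standard deviations threshold; this is a deliberately simplified stand-in, not the parity-space, FICA-PNN or SVM methods of the cited works, and the residual values are hypothetical.

```python
from statistics import mean, stdev


def fit_limits(normal_residuals, k=3.0):
    """Learn detection limits (mean +/- k*std) from fault-free residuals."""
    center = mean(normal_residuals)
    spread = stdev(normal_residuals)
    return center - k * spread, center + k * spread


def detect_faults(residuals, limits):
    """Return the sample indices whose residual falls outside the limits."""
    lower, upper = limits
    return [i for i, r in enumerate(residuals) if not lower <= r <= upper]


# Hypothetical residuals (measurement minus model prediction).
normal_residuals = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15, -0.05, 0.1]
new_residuals = [0.05, -0.1, 0.2, 0.9, 1.1, 0.0]

limits = fit_limits(normal_residuals)
flagged = detect_faults(new_residuals, limits)
```

The published methods differ mainly in how the residuals or features are generated and classified; the thresholding step shown here is only the final, simplest piece of such a pipeline.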

2.3. Optimization and Scale-Up of Bioprocesses

Based on the real-time changes of key process parameters, the cultivation medium, the feeding operation and other conditions can be optimized to keep the cells in an optimal state and ultimately improve production efficiency and product quality. Oyetunde et al. integrated genome-scale metabolic models (GEMs) with machine learning methods to assess microbial bio-production by E. coli [24]. The key design features (such as reactor volume, temperature and media) of 1200 cell factories from over 100 literature studies were extracted and then ranked by PCA to determine the most important factors [24]. Together with the constrained GEM iML1515, the selected features were linked to microbial production performance [24]. This framework can predict metabolic changes under different conditions and effectively identify the indicators of E. coli production performance [24]. Pinto et al. constructed a hybrid semi-parametric model by integrating kinetic models with machine learning methods to optimize the biomass growth setpoint, the temperature and the biomass concentration at induction in fed-batch fermentation of E. coli [25]. They successfully optimized the conditions for cell growth and recombinant protein expression [25]. Bayer et al. proposed a bioprocess digital twin for hybrid-model-based design of experiments (DoE) to identify, with a minimum number of variables, the optimal critical process parameters (CPPs) giving the highest space-time yield in E. coli [26]. Additionally, to control the physical and chemical parameters in the bioreactor (such as pressure, pH and dissolved oxygen (DO)), Kiran et al. proposed a neural network-based model predictive controller (NNMPC) that regulates the substrate feed rate to control the carbon dioxide evolution rate and oxygen consumption rate in continuous fed-batch fermentation of Saccharomyces cerevisiae [27]. Kim et al. proposed a two-stage control framework for fed-batch fermentation that uses a kinetic model with differential dynamic programming (DDP) to determine the optimal substrate feeding strategy [28].
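As a point of reference for these feeding strategies, the classical open-loop baseline is an exponential feed that holds the specific growth rate at a setpoint: F(t) = μ_set·X0·V0·exp(μ_set·t) / (Y_xs·S_f). The sketch below evaluates this textbook expression, assuming negligible residual substrate and maintenance; it is not the NNMPC or DDP controller of the cited works, and all parameter values are hypothetical.

```python
from math import exp, log


def exponential_feed_rate(t_h, mu_set, x0, v0, yield_xs, s_feed):
    """Feed rate in L/h that sustains growth at mu_set in a fed-batch.

    t_h: time since feed start (h); mu_set: growth rate setpoint (1/h)
    x0: biomass at feed start (g/L); v0: volume at feed start (L)
    yield_xs: biomass yield on substrate (g/g); s_feed: feed substrate (g/L)
    Assumes negligible residual substrate and maintenance demand.
    """
    return mu_set * x0 * v0 * exp(mu_set * t_h) / (yield_xs * s_feed)


# Hypothetical operating point.
params = dict(mu_set=0.1, x0=10.0, v0=2.0, yield_xs=0.5, s_feed=400.0)
feed_start = exponential_feed_rate(0.0, **params)
feed_after_doubling = exponential_feed_rate(log(2) / params["mu_set"], **params)
```

Model-based controllers improve on this baseline by correcting the feed online when the real culture deviates from the assumed yield and growth rate.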
Bioreactor scale-up is a critical step in bioprocess development. Computational fluid dynamics (CFD) can be employed to simulate flow field changes in industrial-scale bioreactors, while metabolic models can be used to predict cell growth and production performance in bioprocesses. Furthermore, machine learning algorithms can be leveraged to reduce computational costs. Integrating these approaches is crucial for developing multi-scale hybrid models that capture the spatial and temporal dynamics of bioprocesses. With such models, bioprocess scale-up can be realized at minimal cost, thereby advancing bioprocess development [29]. For example, Kuschel et al. combined a CFD model with a cell cycle model of Pseudomonas putida KT2440 to predict the flow field and glucose gradients in a 54,000 L stirred tank reactor [30]. They explored how culture process conditions affect the formation of population heterogeneity in large-scale production from the perspective of cell growth and energy requirements [30]. Bayer et al. established a hybrid model integrating artificial neural networks (ANNs) with a kinetic model of CHO cells to predict viable cell concentrations and product titers at the shake flask (300 mL) scale and the 15 L bioreactor scale [31]. This model can rapidly identify critical process parameters (CPPs) and, with intensified Design of Experiments (iDoE), determine the transferability of DoE across process scales [31]. Liu et al. combined a CFD model with cell death dynamics to investigate the effect of shear force on Carthamus tinctorius L. cells in a 5 L bioreactor, and successfully improved the design and optimization of the cultivation during scale-up [32]. Yeoh et al. investigated the spatial and temporal effects of mass and gas transfer in the reactor on cell growth and production by integrating a kinetic model of E. coli with a CFD model, increasing the bioconversion of ferulic acid to vanillin to 94% [33].

This entry is adapted from the peer-reviewed paper 10.3390/bioengineering10060744


  1. Nielsen, J.; Keasling, J.D. Engineering Cellular Metabolism. Cell 2016, 164, 1185–1197.
  2. Sanchez, B.J.; Zhang, C.; Nilsson, A.; Lahtvee, P.J.; Kerkhoven, E.J.; Nielsen, J. Improving the phenotype predictions of a yeast genome-scale metabolic model by incorporating enzymatic constraints. Mol. Syst. Biol. 2017, 13, 935.
  3. Li, F.; Yuan, L.; Lu, H.; Li, G.; Chen, Y.; Engqvist, M.K.M.; Kerkhoven, E.J.; Nielsen, J. Deep learning-based kcat prediction enables improved enzyme-constrained model reconstruction. Nat. Catal. 2022, 5, 662–672.
  4. Culley, C.; Vijayakumar, S.; Zampieri, G.; Angione, C. A mechanism-aware and multiomic machine-learning pipeline characterizes yeast cell growth. Proc. Natl. Acad. Sci. USA 2020, 117, 18869–18879.
  5. Ryu, J.Y.; Kim, H.U.; Lee, S.Y. Deep learning enables high-quality and high-throughput prediction of enzyme commission numbers. Proc. Natl. Acad. Sci. USA 2019, 116, 13996–14001.
  6. Yang, K.K.; Wu, Z.; Arnold, F.H. Machine-learning-guided directed evolution for protein engineering. Nat. Methods 2019, 16, 687–694.
  7. Lawson, C.E.; Marti, J.M.; Radivojevic, T.; Jonnalagadda, S.V.R.; Gentz, R.; Hillson, N.J.; Peisert, S.; Kim, J.; Simmons, B.A.; Petzold, C.J.; et al. Machine learning for metabolic engineering: A review. Metab. Eng. 2021, 63, 34–60.
  8. Yugi, K.; Nakayama, Y.; Kinoshita, A.; Tomita, M. Hybrid dynamic/static method for large-scale simulation of metabolism. Theor. Biol. Med. Model. 2005, 2, 42.
  9. Gerken-Starepravo, L.; Zhu, X.; Cho, B.A.; Vega-Ramon, F.; Pennington, O.; Antonio del Río-Chanona, E.; Jing, K.; Zhang, D. An MIQP framework for metabolic pathways optimisation and dynamic flux analysis. Digit. Chem. Eng. 2022, 2, 100011.
  10. Carinhas, N.; Bernal, V.; Teixeira, A.P.; Carrondo, M.J.T.; Alves, P.M.; Oliveira, R. Hybrid metabolic flux analysis: Combining stoichiometric and statistical constraints to model the formation of complex recombinant products. BMC Syst. Biol. 2011, 5, 34.
  11. HamediRad, M.; Chao, R.; Weisberg, S.; Lian, J.Z.; Sinha, S.; Zhao, H.M. Towards a fully automated algorithm driven platform for biosystems design. Nat. Commun. 2019, 10, 5150.
  12. Vigneshvar, S.; Sudhakumari, C.C.; Senthilkumaran, B.; Prakash, H. Recent Advances in Biosensor Technology for Potential Applications—An Overview. Front. Bioeng. Biotechnol. 2016, 4, 11.
  13. Classen, J.; Aupert, F.; Reardon, K.F.; Solle, D.; Scheper, T. Spectroscopic sensors for in-line bioprocess monitoring in research and pharmaceutical industrial application. Anal. Bioanal. Chem. 2017, 409, 651–666.
  14. Liu, G.H.; Jiang, H.; Xiao, X.H.; Zhang, D.J.; Mei, C.L.; Ding, Y.H. Determination of Process Variable pH in Solid-State Fermentation by FT-NIR Spectroscopy and Extreme Learning Machine (ELM). Spectrosc. Spect. Anal. 2012, 32, 970–973.
  15. Kozma, B.; Hirsch, E.; Gergely, S.; Parta, L.; Pataki, H.; Salgo, A. On-line prediction of the glucose concentration of CHO cell cultivations by NIR and Raman spectroscopy: Comparative scalability test with a shake flask model system. J. Pharm. Biomed. 2017, 145, 346–355.
  16. Mehdizadeh, H.; Lauri, D.; Karry, K.M.; Moshgbar, M.; Procopio-Melino, R.; Drapeau, D. Generic Raman-based calibration models enabling real-time monitoring of cell culture bioreactors. Biotechnol. Progr. 2015, 31, 1004–1013.
  17. Zhang, D.D.; Savage, T.R.; Cho, B.A. Combining model structure identification and hybrid modelling for photo-production process predictive simulation and optimisation. Biotechnol. Bioeng. 2020, 117, 3356–3367.
  18. Antonakoudis, A.; Strain, B.; Barbosa, R.; del Val, I.J.; Kontoravdi, C. Synergising stoichiometric modelling with artificial neural networks to predict antibody glycosylation patterns in Chinese hamster ovary cells. Comput. Chem. Eng. 2021, 154, 107471.
  19. Zhu, X.L.; Rehman, K.U.; Wang, B.; Shahzad, M. Modern Soft-Sensing Modeling Methods for Fermentation Processes. Sensors 2020, 20, 1771.
  20. Ding, J.; Cao, Y.; Mpofu, E.; Shi, Z.P. A hybrid support vector machine and fuzzy reasoning based fault diagnosis and rescue system for stable glutamate fermentation. Chem. Eng. Res. Des. 2012, 90, 1197–1207.
  21. Yang, Q.; Yao, J.T.; Zhang, X.; Chao, X.J. FICA-PNN Fault Diagnosis for Penicillin Fermentation Process. In Proceedings of the 2011 30th Chinese Control Conference (CCC), Yantai, China, 22–24 July 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 4351–4354.
  22. Abbasi, M.A.; Khan, A.Q.; Mustafa, G.; Abid, M.; Khan, A.S.; Ullah, N. Data-Driven Fault Diagnostics for Industrial Processes: An Application to Penicillin Fermentation Process. IEEE Access 2021, 9, 65977–65987.
  23. Yang, C.M.; Hou, J. Fed-batch fermentation penicillin process fault diagnosis and detection based on support vector machine. Neurocomputing 2016, 190, 117–123.
  24. Oyetunde, T.; Liu, D.; Martin, H.G.; Tang, Y.J.J. Machine learning framework for assessment of microbial factory performance. PLoS ONE 2019, 14, e0210558.
  25. Pinto, J.; de Azevedo, C.R.; Oliveira, R.; von Stosch, M. A bootstrap-aggregated hybrid semi-parametric modeling framework for bioprocess development. Bioproc. Biosyst. Eng. 2019, 42, 1853–1865.
  26. Bayer, B.; Diaz, R.D.; Melcher, M.; Striedner, G.; Duerkop, M. Digital Twin Application for Model-Based DoE to Rapidly Identify Ideal Process Conditions for Space-Time Yield Optimization. Processes 2021, 9, 1109.
  27. Kiran, A.U.M.; Jana, A.K. Control of continuous fed-batch fermentation process using neural network based model predictive controller. Bioproc. Biosyst. Eng. 2009, 32, 801–808.
  28. Kim, J.W.; Park, B.J.; Oh, T.H.; Lee, J.M. Model-based reinforcement learning and predictive control for two-stage optimal control of fed-batch bioreactor. Comput. Chem. Eng. 2021, 154, 107465.
  29. Wang, G.; Haringa, C.; Noorman, H.; Chu, J.; Zhuang, Y.P. Developing a Computational Framework To Advance Bioprocess Scale-Up. Trends Biotechnol. 2020, 38, 846–856.
  30. Kuschel, M.; Siebler, F.; Takors, R. Lagrangian Trajectories to Predict the Formation of Population Heterogeneity in Large-Scale Bioreactors. Bioengineering 2017, 4, 27.
  31. Bayer, B.; Duerkop, M.; Striedner, G.; Sissolak, B. Model Transferability and Reduced Experimental Burden in Cell Culture Process Development Facilitated by Hybrid Modeling and Intensified Design of Experiments. Front. Bioeng. Biotechnol. 2021, 9, 740215.
  32. Liu, Y.; Wang, Z.J.; Xia, J.Y.; Haringa, C.; Liu, Y.P.; Chu, J.; Zhuang, Y.P.; Zhang, S.L. Application of Euler-Lagrange CFD for quantitative evaluating the effect of shear force on Carthamus tinctorius L. cell in a stirred tank bioreactor. Biochem. Eng. J. 2016, 114, 212–220.
  33. Yeoh, J.W.; Jayaraman, S.S.; Tan, S.G.; Jayaraman, P.; Holowko, M.B.; Zhang, J.; Kang, C.W.; Leo, H.L.; Poh, C.L. A model-driven approach towards rational microbial bioprocess optimization. Biotechnol. Bioeng. 2021, 118, 305–318.