From Shallow to Deep Bioprocess Hybrid Modeling- Review

Contributor: Jorge M. Mendes

Hybrid neural network (HNN) modeling is the combination of artificial neural networks (ANNs) with prior knowledge in a mathematical framework. There are two main approaches to incorporating prior knowledge: design and training methods. Design approaches use prior knowledge to define the network structure, while training approaches use it to guide parameter estimation. Both approaches reduce data dependency, making models less sensitive to sparse and noisy data, and improving their descriptive and predictive capabilities compared to pure ANNs. HNNs are a powerful tool for understanding complex processes like bioprocesses and accelerating product development. Bioprocess modeling is challenging due to nonlinearity, dynamics, and uncertainty. Traditional models based on physical and chemical laws can be overly simplistic or hard to calibrate. Data-driven ANN models lack interpretability and generalization. HNNs combine the strengths of both approaches, enhancing the accuracy, robustness, and efficiency of bioprocess modeling by integrating prior knowledge with ANNs.

artificial neural network
hybrid neural network
deep learning
hybrid model
bioprocess
digitalization
Industry 4.0

1. Introduction

Hybrid neural network (HNN) modeling may be defined as the combination of artificial neural networks (ANNs) with prior knowledge in a common mathematical structure. According to Thompson and Kramer ^[1], there are two main approaches to embodying prior knowledge in neural network models, namely design and training approaches. In design approaches, prior knowledge dictates the structure of the model. Prior knowledge is used to define the ANN topology (e.g., number of layers, types of layers, types of nodes), to modularize the network, or to include non-network mathematical equations (e.g., physical laws) in tandem with the ANN model. In training approaches, prior knowledge shapes the parameter estimation problem in the form of variable constraints, network weight constraints, or the definition of the loss function. In both design and training approaches, the inclusion of prior knowledge reduces data dependency: in design approaches, the dimension of the parameter space is reduced (i.e., the ANN is smaller, and the number of weights is lower), whereas in training approaches, the feasible region of the parameter space is reduced. In both cases, training becomes less sensitive to sparse and noisy data, and the final model has better descriptive and predictive power than a pure ANN model.

2. Design Approaches

In design approaches, prior knowledge shapes the structure of the HNN model. Design approaches may be further subdivided into modular and semiparametric design. In modular design, the overall model is decoupled into smaller interconnected modules. A biochemical process typically comprises several interconnected unit operations. Instead of using one large ANN to describe the full process, smaller ANNs may be interconnected to match the process topology (prior knowledge). Whereas in a single ANN all input nodes are connected to all output nodes, in modular ANNs the connectivity is sparser, the number of parameters is reduced, and the data requirements are therefore also reduced. The modular ANN model is also more transparent, as the input/output variables of the modules have physical meaning. The hierarchical neural network model proposed by Mavrovouniotis and Chang is an example of modular ANN design ^[2]. A more recent example is the knowledge-based ANN concept proposed in ^[3].
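The parameter reduction obtained with modular design can be illustrated with a simple weight count. The following is a minimal sketch in Python; the process layout (three unit operations, each with four inputs and two outputs) and the hidden-layer sizes are hypothetical values chosen only for illustration, not taken from any cited study.

```python
def dense_weight_count(layer_sizes):
    """Count weights and biases of a fully connected feedforward network."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical process: three unit operations, each with 4 inputs and 2 outputs.
# A single monolithic ANN sees all 12 inputs and predicts all 6 outputs.
monolithic = dense_weight_count([12, 20, 6])

# Modular design: one small ANN per unit operation (sizes are illustrative).
modular = 3 * dense_weight_count([4, 5, 2])

print(f"monolithic ANN: {monolithic} parameters")
print(f"modular ANNs:   {modular} parameters")
```

With these illustrative sizes, the modular design needs fewer than a third of the parameters of the monolithic network, which is the mechanism by which the data requirements are reduced.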

The most frequently reported approach is semiparametric design (Figure 1A,B), in which physical laws are directly incorporated in the model structure. Instead of a large ANN, a smaller one is combined with physical laws in the form of a semiparametric model. By definition, semiparametric models combine parametric and nonparametric functions in the same model structure. Parametric functions are derived from prior knowledge of first principles, well-established mechanisms, and/or empirical correlations. They have a fixed mathematical structure and a fixed number of parameters with physical interpretation. In contrast, ANNs are nonparametric functions entirely derived from process data. They have a loose structure without physical interpretation. Both model components are trained together.

Figure 1. Different methods to embody prior knowledge in HNN models: (A) Serial semiparametric hybrid structure. (B) Parallel semiparametric hybrid structure. (C) Physics-informed neural network structure.

Semiparametric HNN models can be classified as serial or parallel (Figure 1A,B). In serial structures (Figure 1A), the parametric equations (prior knowledge) cover only some parts of the process. The ANN learns from data the cause–effect relationships of those process parts that lack prior knowledge. One example of a serial semiparametric structure is the Psichogios and Ungar bioreactor hybrid model ^[4], where a feedforward neural network (FFNN), a nonparametric model that calculates the bioreaction kinetics, is connected in series with macroscopic material balance equations (parametric model). Other examples are the general bioreactor hybrid model ^[5] and the neural ordinary differential equation (neural ODE) model ^[6]^[7].
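The serial arrangement can be sketched in a few lines of Python. This is a minimal illustration, not the model of any cited paper: the batch material balances, the yield coefficient `yield_xs`, the network weights, and the explicit Euler integrator are all assumptions made for the example. A small tanh network supplies the specific growth rate, and the material balances propagate the state.

```python
import math

def ffnn_kinetics(S, w):
    """Tiny FFNN (one tanh hidden layer): substrate S -> specific growth rate.

    The weights are illustrative placeholders; in practice they are
    estimated from process data.
    """
    hidden = [math.tanh(w1 * S + b1) for w1, b1 in zip(w["W1"], w["b1"])]
    out = sum(w2 * h for w2, h in zip(w["W2"], hidden)) + w["b2"]
    return max(out, 0.0)  # keep the rate non-negative (assumption of this sketch)

def simulate_serial_hybrid(X0, S0, w, yield_xs=0.5, dt=0.01, n_steps=1000):
    """Integrate batch material balances with explicit Euler.

    dX/dt = mu * X,  dS/dt = -mu * X / yield_xs,  with mu given by the FFNN.
    """
    X, S = X0, S0
    trajectory = [(0.0, X, S)]
    for k in range(1, n_steps + 1):
        mu = ffnn_kinetics(S, w)       # nonparametric part (data-driven)
        dX = mu * X                    # parametric part: biomass balance
        dS = -mu * X / yield_xs        # parametric part: substrate balance
        X, S = X + dt * dX, S + dt * dS
        trajectory.append((k * dt, X, S))
    return trajectory

# Hypothetical weights and initial conditions.
weights = {"W1": [0.8, 0.3], "b1": [0.0, -0.5], "W2": [0.2, 0.1], "b2": 0.05}
traj = simulate_serial_hybrid(X0=0.1, S0=10.0, w=weights)
t_end, X_end, S_end = traj[-1]
print(f"t = {t_end:.1f}: X = {X_end:.3f}, S = {S_end:.3f}")
```

Because the material balances are hard-coded, the simulated trajectory conserves the quantity X + yield_xs * S regardless of the network weights. This is one way in which the parametric part constrains the ANN.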

In the case of parallel semiparametric structures (Figure 1B), a full parametric model stemming from prior knowledge exists but is not sufficiently accurate to describe the process. The ANN runs in parallel to compensate for parametric model inaccuracies. The parametric model takes priority in describing the process outputs. First, the parametric model parameters are estimated to minimize the model–process mismatch. Then, the residuals of the parametric model are calculated over the data input space. If the residuals contain relevant information beyond the noise level, an ANN is trained to extract the cause–effect relationship from the parametric model residuals. The size of the ANN compensator is inversely related to the data variance explained by the parametric model. In the limit, if the explained variance is sufficiently high, the ANN compensator is not needed. Conversely, if the explained variance is negative, the parametric model should be removed, since a pure ANN will perform better than the hybrid structure. Examples of parallel semiparametric hybrid models are provided by, e.g., Côté et al. ^[8], Piron et al. ^[9] and Peres et al. ^[10]. The bioreactor model proposed by Thompson and Kramer is simultaneously parallel and serial ^[1]. The biological kinetics are described with a Monod-type kinetic model (parametric) connected in parallel with a radial basis function network (RBFN) compensator (nonparametric). The RBFN performs an additive correction of the Monod-type kinetic model outputs. The corrected kinetics are then connected in series with macroscopic material balances (parametric) in a similar way to the model reported by Psichogios and Ungar ^[4].
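The parallel workflow described above (fit the parametric model, inspect its residuals, then train a compensator on them) can be sketched as follows. Everything in this example is an assumption made for illustration: the synthetic noise-free data, the one-parameter model y = a * x, the four-unit tanh compensator, and the plain gradient-descent training, which stands in for whatever optimizer a real study would use.

```python
import math
import random

def fit_slope(xs, ys):
    """Least-squares estimate of a in the parametric model y = a * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def explained_variance(ys, residuals):
    y_mean = sum(ys) / len(ys)
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - sum(r * r for r in residuals) / ss_tot

def compensator(x, p):
    """Small tanh network; p = (W1, b1, W2, b2)."""
    W1, b1, W2, b2 = p
    return sum(w2 * math.tanh(w1 * x + c) for w1, c, w2 in zip(W1, b1, W2)) + b2

def train_compensator(xs, residuals, n_hidden=4, lr=0.05, epochs=2000, seed=0):
    """Full-batch gradient descent on the mean squared error of the residual fit."""
    rng = random.Random(seed)
    W1 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    b1 = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    W2, b2 = [0.0] * n_hidden, 0.0   # the compensator starts as a zero correction
    n = len(xs)
    for _ in range(epochs):
        gW1, gb1, gW2, gb2 = [0.0] * n_hidden, [0.0] * n_hidden, [0.0] * n_hidden, 0.0
        for x, r in zip(xs, residuals):
            h = [math.tanh(w1 * x + c) for w1, c in zip(W1, b1)]
            err = sum(w2 * hj for w2, hj in zip(W2, h)) + b2 - r
            gb2 += 2.0 * err / n
            for j in range(n_hidden):
                gW2[j] += 2.0 * err * h[j] / n
                back = 2.0 * err * W2[j] * (1.0 - h[j] ** 2) / n
                gW1[j] += back * x
                gb1[j] += back
        W1 = [w - lr * g for w, g in zip(W1, gW1)]
        b1 = [b - lr * g for b, g in zip(b1, gb1)]
        W2 = [w - lr * g for w, g in zip(W2, gW2)]
        b2 -= lr * gb2
    return W1, b1, W2, b2

# Synthetic data (assumption): linear trend plus an effect the parametric model misses.
xs = [i / 25.0 - 1.0 for i in range(51)]
ys = [2.0 * x + 0.5 * math.sin(3.0 * x) for x in xs]

a = fit_slope(xs, ys)                                   # step 1: parametric fit
residuals = [y - a * x for x, y in zip(xs, ys)]         # step 2: residuals
ev = explained_variance(ys, residuals)
params = train_compensator(xs, residuals)               # step 3: ANN on residuals

mse_parametric = sum(r * r for r in residuals) / len(xs)
mse_hybrid = sum((y - (a * x + compensator(x, params))) ** 2
                 for x, y in zip(xs, ys)) / len(xs)
print(f"explained variance of parametric model: {ev:.3f}")
print(f"MSE parametric: {mse_parametric:.5f}, MSE hybrid: {mse_hybrid:.5f}")
```

The hybrid prediction is the sum of the parametric output and the compensator output, so the parametric model keeps priority and the network only has to explain what remains.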

3. Training Approaches

Prior knowledge may also be incorporated in HNN models through the training method. It may dictate constraints on process variables, e.g., concentrations or reaction rates of irreversible reactions must be positive. Variable inequality constraints, network weight inequality constraints, and loss function regularizers may be introduced in the training method to enforce such desired output behavior ^[1]. Several methods have been developed to enforce ANN output monotonicity, convexity, concavity, or smoothness by adding parameter constraints and loss function regularizers ^[11]^[12]. More recently, physics-informed neural networks (PINNs) have emerged for modeling computational fluid dynamics (CFD) problems based on the Navier–Stokes equations ^[13] or partial differential equations (PDEs) in general. The innovative aspect of PINNs is that physical equations are embodied in a “pure” ANN structure via the training approach (Figure 1C). PINNs use a deep neural network (DNN) to parameterize state variables over independent variables (time and spatial coordinates). Automatic differentiation (AD) is applied to obtain partial derivatives of the state variables with respect to time and spatial coordinates and to calculate the PDE agreement error (the terms on the right and left sides of the PDEs must agree with each other). Two different sets of residuals are simultaneously minimized during training: (i) the measurement residuals between calculated and measured state variables; (ii) the Navier–Stokes PDE agreement (physics) residuals. PINNs have been shown to converge to PDE solutions obtained using numerical discretization methods. Moreover, PINNs have been shown to seamlessly integrate data and mathematical models in flow problems ^[14]. A key advantage of PINNs in relation to semiparametric HNNs is that numerical integration, or any other numerical method inherent to the parametric model, is avoided. This may be a substantial advantage in the case of PDEs and stiff systems of ODEs. Possible disadvantages are, however, that the exact mapping of the physical laws to the DNN structure is not guaranteed and that the physical laws may not be obeyed in case of extrapolation.
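The two residual sets of a PINN loss can be illustrated with a deliberately small example. The sketch below makes several simplifying assumptions that are not in the source: the Navier–Stokes equations are replaced by the scalar decay equation du/dt = -k u, the DNN by a single tanh hidden layer, and the AD framework by a hand-written forward-mode dual-number class. The rate constant `k`, the weighting factor `weight`, and all network parameters are hypothetical.

```python
import math

class Dual:
    """Forward-mode AD number: a value and its derivative with respect to t."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def dual_tanh(z):
    t = math.tanh(z.val)
    return Dual(t, (1.0 - t * t) * z.dot)

def network(t, params):
    """u(t) parameterized by one tanh hidden layer; t is a Dual."""
    W1, b1, W2, b2 = params
    out = Dual(b2)
    for w1, c, w2 in zip(W1, b1, W2):
        out = out + w2 * dual_tanh(w1 * t + c)
    return out

def pinn_loss(params, t_data, u_data, t_colloc, k=0.5, weight=1.0):
    """Return (total, measurement term, physics term) for du/dt = -k * u."""
    data_term = sum((network(Dual(t), params).val - u) ** 2
                    for t, u in zip(t_data, u_data)) / len(t_data)
    physics_term = 0.0
    for t in t_colloc:
        u = network(Dual(t, 1.0), params)          # seed dt/dt = 1
        physics_term += (u.dot + k * u.val) ** 2   # left side minus right side
    physics_term /= len(t_colloc)
    return data_term + weight * physics_term, data_term, physics_term

# Hypothetical parameters, measurements, and collocation points.
params = ([1.0, -0.5], [0.1, 0.2], [0.7, 0.3], 0.05)
t_data = [0.0, 1.0, 2.0]
u_data = [math.exp(-0.5 * t) for t in t_data]
t_colloc = [0.1 * i for i in range(21)]

total, data_term, physics_term = pinn_loss(params, t_data, u_data, t_colloc)
print(f"loss = {total:.4f} (data {data_term:.4f}, physics {physics_term:.4f})")
```

Training would then minimize `total` with respect to `params`. Note that no numerical integration of the differential equation appears anywhere in the loss, which is the advantage highlighted above. The physics term is only evaluated at the collocation points, which is consistent with the caveat about extrapolation.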

4. General Bioreactor Hybrid Model

Since the pioneering works by Psichogios and Ungar ^[4], Thompson and Kramer ^[1], and Schubert et al. ^[15]^[16], attempts were made to propose a bioreactor HNN structure that covers a wide range of problems ^[5]^[17]^[18]^[19]^[20]. The general bioreactor hybrid model (Figure 2) is a serial semiparametric structure that combines an ANN and a system of ODEs with information feedback. Prior knowledge is represented by a system of ODEs derived from macroscopic material balances and/or intracellular material balances. These may be divided into a state–space model and a measurement model that computes observable process outputs. The ANN is used to model cell properties (lacking fundamental knowledge) as a function of the process state and exogenous inputs. The observable outputs are compared with training examples under a supervised learning scheme.

Figure 2. General deep hybrid model for bioreactor systems. The model has parametric functions, f(.) and h(.), with fixed mathematical structure (typically material/energy balance equations). Some process properties, v, lacking mechanistic explanation are modelled using a feedforward neural network (FFNN) as a function of the process state, x; exogenous inputs, u; and time, t. The FFNN is a nonparametric function with loose structure that must be identified from process data, given the absence of explanatory mechanisms for that particular part of the process. The model is dynamic in nature, with the state vector, x, and the observable outputs, y, changing over time.
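The structure described in the caption can be written compactly as follows. This is a sketch of one plausible form: the symbols f, h, v, x, u, t, and y come from the caption, whereas the exact argument lists of f and h, the weight vector w, and the initial condition x_0 at t_0 are assumptions introduced here for illustration.

```latex
\begin{aligned}
\frac{dx}{dt} &= f\big(x, u, v, t\big), \qquad x(t_0) = x_0
  && \text{(state-space model, parametric)} \\
v &= \mathrm{FFNN}(x, u, t;\, w)
  && \text{(cell properties, nonparametric)} \\
y &= h(x)
  && \text{(measurement model, parametric)}
\end{aligned}
```

Because v enters y only through the integration of the first equation, the dependence of y on the network weights w is indirect.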

Most previous studies have adopted a shallow HNN configuration based on three-layer FFNNs (with a single hidden layer with tanh activation function) connected in series with material balance equations. Shallow HNNs are typically trained in a least squares sense employing the Levenberg–Marquardt optimization algorithm. The computation of gradients follows the indirect sensitivity method originally proposed by Psichogios and Ungar ^[4] and detailed by Oliveira ^[5]. Sensitivity equations are required because the observable outputs, y, are not directly linked to the ANN outputs, v. Cross-validation is adopted to prevent overfitting. Recently, the general bioreactor HNN has been extended to deep multilayered FFNNs and deep learning ^[19]^[21]^[22]. Multiple hidden layers with rectified linear unit (ReLU) activation functions were adopted, connected in series with material balance equations. The ADAM algorithm ^[23] was applied to train the deep HNN in a weighted least squares sense. The objective function gradients were computed using modified semidirect sensitivity equations, thereby significantly reducing the CPU time ^[19]. Stochastic regularization based on random training examples and ANN weights dropout was adopted to mitigate overfitting. Deep HNN modeling was shown to systematically generalize better than shallow HNN modeling.
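As an illustration of ADAM training in a weighted least squares sense, the sketch below applies the standard ADAM update of Kingma and Ba ^[23] to a one-parameter toy problem. The model y = w * x, the data, the per-sample weights, and the hyperparameter values are hypothetical. The sensitivity equations and the stochastic regularization used for deep HNNs are deliberately left out.

```python
import math

def weighted_sse_grad(w, xs, ys, weights):
    """Gradient of sum_i weights_i * (w * x_i - y_i)^2 with respect to w."""
    return sum(2.0 * q * (w * x - y) * x for x, y, q in zip(xs, ys, weights))

def adam_step(w, grad, m, v, step, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """One ADAM update for a scalar parameter; step counts from 1."""
    m = beta1 * m + (1.0 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad    # second-moment estimate
    m_hat = m / (1.0 - beta1 ** step)              # bias corrections
    v_hat = v / (1.0 - beta2 ** step)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# Hypothetical data generated by y = 2 * x, with lower weights on later samples.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 4.0, 6.0]
weights = [1.0, 1.0, 0.5, 0.5]

w, m, v = 0.0, 0.0, 0.0
history = []
for step in range(1, 501):
    g = weighted_sse_grad(w, xs, ys, weights)
    w, m, v = adam_step(w, g, m, v, step)
    history.append(w)

print(f"estimated w after {len(history)} steps: {w:.3f}")
```

In the first iteration the bias-corrected ratio equals the sign of the gradient, so the parameter moves by approximately the learning rate regardless of the gradient magnitude. This insensitivity to gradient scaling is one reason adaptive methods are convenient when the loss is only reachable through sensitivity equations.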

This entry is adapted from the peer-reviewed paper 10.3390/fermentation9100922

References

Thompson, M.L.; Kramer, M.A. Modeling Chemical Processes Using Prior Knowledge and Neural Networks. AIChE J. 1994, 40, 1328–1340.
Mavrovouniotis, M.L.; Chang, S. Hierarchical Neural Networks. Comput. Chem. Eng. 1992, 16, 347–369.
Nagarajan, H.P.N.; Mokhtarian, H.; Jafarian, H.; Dimassi, S.; Bakrani-Balani, S.; Hamedi, A.; Coatanéa, E.; Gary Wang, G.; Haapala, K.R. Knowledge-Based Design of Artificial Neural Network Topology for Additive Manufacturing Process Modeling: A New Approach and Case Study for Fused Deposition Modeling. J. Mech. Des. 2019, 141, 021705.
Psichogios, D.C.; Ungar, L.H. A Hybrid Neural Network-first Principles Approach to Process Modeling. AIChE J. 1992, 38, 1499–1511.
Oliveira, R. Combining First Principles Modelling and Artificial Neural Networks: A General Framework. Comput. Chem. Eng. 2004, 28, 755–766.
Chen, R.T.Q.; Rubanova, Y.; Bettencourt, J.; Duvenaud, D. Neural Ordinary Differential Equations. arXiv 2018, arXiv:1806.07366.
Rackauckas, C.; Ma, Y.; Martensen, J.; Warner, C.; Zubov, K.; Supekar, R.; Skinner, D.; Ramadhan, A.; Edelman, A. Universal Differential Equations for Scientific Machine Learning. arXiv 2020, arXiv:2001.04385.
Côté, M.; Grandjean, B.P.A.; Lessard, P.; Thibault, J. Dynamic Modelling of the Activated Sludge Process: Improving Prediction Using Neural Networks. Water Res. 1995, 29, 995–1004.
Piron, E.; Latrille, E.; René, F. Application of Artificial Neural Networks for Crossflow Microfiltration Modelling: “Black-Box” and Semi-Physical Approaches. Comput. Chem. Eng. 1997, 21, 1021–1030.
Peres, J.; Oliveira, R.; Feyo De Azevedo, S. Knowledge Based Modular Networks for Process Modelling and Control. Comput. Chem. Eng. 2001, 25, 783–791.
Richman, R.; Wuthrich, M.V. Smoothness and Monotonicity Constraints for Neural Networks Using ICEnet. SSRN Electron. J. 2023.
Rosca, M.; Weber, T.; Gretton, A.; Mohamed, S. A Case for New Neural Network Smoothness Constraints. Proc. Mach. Learn. Res. 2020, 137, 21–32.
Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics-Informed Neural Networks: A Deep Learning Framework for Solving Forward and Inverse Problems Involving Nonlinear Partial Differential Equations. J. Comput. Phys. 2019, 378, 686–707.
Jin, X.; Cai, S.; Li, H.; Karniadakis, G.E. NSFnets (Navier-Stokes Flow Nets): Physics-Informed Neural Networks for the Incompressible Navier-Stokes Equations. J. Comput. Phys. 2021, 426, 109951.
Schubert, J.; Simutis, R.; Dors, M.; Havlík, I.; Lübbert, A. Hybrid Modelling of Yeast Production Processes—Combination of a Priori Knowledge on Different Levels of Sophistication. Chem. Eng. Technol. 1994, 17, 10–20.
Schubert, J.; Simutis, R.; Dors, M.; Havlík, I.; Lübbert, A. Bioprocess Optimization and Control: Application of Hybrid Modelling. J. Biotechnol. 1994, 35, 51–68.
Von Stosch, M.; Oliveira, R.; Peres, J.; Feyo De Azevedo, S. A Novel Identification Method for Hybrid (N)PLS Dynamical Systems with Application to Bioprocesses. Expert Syst. Appl. 2011, 38, 10862–10874.
Pinto, J.; de Azevedo, C.R.; Oliveira, R.; von Stosch, M. A Bootstrap-Aggregated Hybrid Semi-Parametric Modeling Framework for Bioprocess Development. Bioprocess Biosyst. Eng. 2019, 42, 1853–1865.
Pinto, J.; Mestre, M.; Ramos, J.; Costa, R.S.; Striedner, G.; Oliveira, R. A General Deep Hybrid Model for Bioreactor Systems: Combining First Principles with Deep Neural Networks. Comput. Chem. Eng. 2022, 165, 107952.
Teixeira, A.P.; Carinhas, N.; Dias, J.M.L.; Cruz, P.; Alves, P.M.; Carrondo, M.J.T.; Oliveira, R. Hybrid Semi-Parametric Mathematical Systems: Bridging the Gap between Systems Biology and Process Engineering. J. Biotechnol. 2007, 132, 418–425.
Pinto, J.; Costa, R.S.; Alexandre, L.; Ramos, J.; Oliveira, R. SBML2HYB: A Python Interface for SBML Compatible Hybrid Modeling. Bioinformatics 2023, 39, btad044.
Pinto, J.; Ramos, J.R.C.; Costa, R.S.; Oliveira, R. A General Hybrid Modeling Framework for Systems Biology Applications: Combining Mechanistic Knowledge with Deep Neural Networks under the SBML Standard. AI 2023, 4, 303–318.
Kingma, D.P.; Ba, J.L. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980.