Digital Twins, virtual representations of physical systems that mirror their behavior, enable real-time monitoring, analysis, and optimization. Understanding and identifying the temporal dependencies present in the multivariate time series data that characterize a system's behavior is crucial for improving the effectiveness of Digital Twins. Long Short-Term Memory (LSTM) networks have been used to represent complex temporal dependencies and to capture long-term relationships in Industrial Internet of Things (IIoT) data.
1. Introduction
Digital Twins connect the real and virtual worlds
[1][2], offering simulations, projections, and insights that can be applied to decision-making, optimization, and maintenance tasks
[3]. The Digital Twin can learn and capture the underlying patterns and dependencies of the dynamic system by examining the historical and real-time data of the multivariate time series and training an LSTM network with this data
[4]. This enables it to generate precise simulations and predictions. Although Digital Twin technology is still under development, it has the potential to revolutionize the way researchers manage assets: as virtual representations of physical assets, Digital Twins can be used to simulate, monitor, and optimize the performance of those assets
[5].
Temporal dependencies are the patterns and relationships that develop over time between the variables in multivariate time series data. These dependencies can include seasonality, trends, lagged relationships, sequential patterns, and other temporal structures. Temporal dependencies and multivariate time series analysis are crucial in many areas, including weather forecasting, industrial operations, and others
[6]. To make effective decisions and maximize system performance, it is essential to have the ability to accurately analyze and forecast
[7] the behavior of complex dynamic systems. Time series data can be used to analyze various phenomena, including household electricity usage, road occupancy rates, currency exchange rates, solar power generation, and even musical notation. In most cases, the collected data consist of multivariate time series (MTS), as when a local power company monitors the electricity consumption of numerous clients. Complex dynamic interdependencies between different series
[6] can be significant but challenging to capture and analyze. As science and technology continuously advance, systems used by people are becoming increasingly complex.
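The temporal structures listed above (trend, seasonality, lagged dependence) can be illustrated with a small synthetic series; all values and parameters below are made up purely for illustration:

```python
import numpy as np

# Hypothetical illustration: a univariate series combining the temporal
# structures named above (trend, seasonality, noise).
rng = np.random.default_rng(0)
t = np.arange(200)

trend = 0.05 * t                                 # slow upward drift
seasonality = 2.0 * np.sin(2 * np.pi * t / 24)   # daily-like cycle, period 24
noise = rng.normal(0.0, 0.3, size=t.size)

series = trend + seasonality + noise

# A simple diagnostic: the correlation between the series and its own value
# one full season earlier (lag 24) is high, revealing the seasonal dependency.
lag = 24
seasonal_corr = np.corrcoef(series[:-lag], series[lag:])[0, 1]
print(round(seasonal_corr, 2))   # high, since both trend and cycle align at lag 24
```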
Multivariate time series (MTS) and increasingly sophisticated data are required to explain complex systems
[7][8]. The system generates multiple variables at any given time, resulting in a multivariate time series denoted by the matrix
X = {X_1, X_2, ..., X_m}, which records the values of these m variables at successive time steps within the same period. In several areas, such as urban air quality forecasting
[9], traffic prediction
[10][11], the COVID-19 pandemic
[12], and the industrial sector
[13], it is essential to analyze the MTS data. Analysts frequently attempt to predict the future using historical data. Forecasting can be more precise when the interdependencies among distinct variables are effectively modeled. In general terms, researchers refer to the concurrent correlation between different variables in the MTS as spatial connections
[12], the correlation of a variable with its own past values as temporal dependency, and the correlation between different variables at different points in time as spatiotemporal linkage.
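A toy example, built on entirely synthetic data, of distinguishing the same-time (spatial) and lagged (temporal) dependencies just defined:

```python
import numpy as np

# Hypothetical sketch: an MTS X of shape (T, m), rows are time steps,
# columns are variables, all generated from one synthetic driver signal.
rng = np.random.default_rng(1)
T, m = 500, 3

driver = rng.normal(size=T)
X = np.empty((T, m))
X[:, 0] = driver                                 # variable 1
X[:, 1] = driver + 0.1 * rng.normal(size=T)      # spatially correlated with var 1
X[:, 2] = np.roll(driver, 2)                     # follows var 1 with a 2-step lag
X[:2, 2] = 0.0                                   # discard wrapped-around values

# Spatial dependency: correlation between different variables at the same time.
spatial = np.corrcoef(X[:, 0], X[:, 1])[0, 1]

# Temporal (lagged) dependency: variable 1 now vs. variable 3 two steps later.
temporal = np.corrcoef(X[:-2, 0], X[2:, 2])[0, 1]

print(spatial > 0.9, temporal > 0.9)   # True True
```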
Long Short-Term Memory (LSTM) networks, a type of Recurrent Neural Network (RNN) explicitly designed to capture and exploit temporal dependencies, offer an effective approach for analyzing multivariate time series data; their gating mechanism makes them particularly well suited to sequences with long-term dependencies. The concept of the Digital Twin represents one potential application of LSTM networks in time series analysis. A Digital Twin is a virtual representation of a real-world system or process; by integrating real-time input from sensors and other sources with the capabilities of LSTM models, a Digital Twin can accurately replicate the behavior and dynamics of the physical system it represents.
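To make the gating idea concrete, the following is a minimal, from-scratch sketch of a single LSTM step applied to multivariate input; the weight names, shapes, and initialization are illustrative and do not correspond to any particular library's API:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    x: input at time t, shape (n_in,)   (e.g., one row of an MTS)
    h_prev, c_prev: previous hidden/cell state, shape (n_hid,)
    W: (4*n_hid, n_in), U: (4*n_hid, n_hid), b: (4*n_hid,)
    """
    n_hid = h_prev.size
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0 * n_hid:1 * n_hid])   # input gate
    f = sigmoid(z[1 * n_hid:2 * n_hid])   # forget gate
    o = sigmoid(z[2 * n_hid:3 * n_hid])   # output gate
    g = np.tanh(z[3 * n_hid:4 * n_hid])   # candidate cell update
    c = f * c_prev + i * g                # cell state carries long-term memory
    h = o * np.tanh(c)                    # hidden state exposed to the next layer
    return h, c

rng = np.random.default_rng(2)
n_in, n_hid = 4, 8                        # e.g., 4 sensor channels
W = rng.normal(0, 0.1, (4 * n_hid, n_in))
U = rng.normal(0, 0.1, (4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)

h = np.zeros(n_hid)
c = np.zeros(n_hid)
X = rng.normal(size=(10, n_in))           # ten time steps of a toy MTS
for x_t in X:                             # unroll the cell over time
    h, c = lstm_step(x_t, h, c, W, U, b)

print(h.shape)                            # final hidden state summarizes the sequence
```

The additive update of the cell state `c` (rather than repeated multiplication, as in a plain RNN) is what lets gradients survive over long horizons.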
To build effective predictive models, identifying the appropriate lag order (the number of past observations used as inputs in a time series model) is crucial. The Autocorrelation Function (ACF) and the Partial Autocorrelation Function (PACF) are used for this purpose. A primary contribution of this research lies in its use of statistical methods for identifying lags in time series data. These methods help researchers comprehend the relationships between data points at different time stamps, a critical aspect of identifying and modeling temporal dependencies.
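As a hand-rolled illustration (in practice a library such as statsmodels would supply ACF/PACF estimators), the sample ACF can be computed directly; for an AR(1) process the ACF decays roughly geometrically, while the PACF cuts off after lag 1:

```python
import numpy as np

# Minimal sketch of the sample autocorrelation function (ACF).
def sample_acf(x, max_lag):
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([1.0] + [np.dot(x[:-k], x[k:]) / denom
                             for k in range(1, max_lag + 1)])

# Synthetic AR(1) process x_t = 0.8 * x_{t-1} + noise:
# its ACF should decay roughly as 0.8**k.
rng = np.random.default_rng(3)
x = np.zeros(2000)
for t in range(1, x.size):
    x[t] = 0.8 * x[t - 1] + rng.normal()

acf = sample_acf(x, max_lag=5)
print(np.round(acf[:3], 2))   # close to [1.0, 0.8, 0.64] in expectation
```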
Utilizing temporal dependencies in multivariate time series analysis through LSTM networks, in combination with the concept of Digital Twins, forms a powerful approach to understanding and predicting complex systems. These methodologies open up new avenues for optimization, proactive maintenance, and decision support across various industries, ultimately enhancing productivity, reliability, and overall performance.
2. Digital Twins
Recently, several industries, including manufacturing and the automobile industry
[14], have chosen to make Digital Twin a cornerstone of their technology. Digital Twin offers data fusion and the ability to replicate physical systems
[15].
In the field of processing time series data, the autoregressive model has traditionally been employed. This methodology assumes that the time series
[16] under investigation exhibit a linear relationship with their past values. It predicts future values based on linear modeling of previous values, possibly with a constant term and random error. Autoregressive Integrated Moving Average (ARIMA) is the model
[17] that most commonly employs the autoregressive model concept. It transforms a non-stationary time series into a stationary one
[18], after which the ARIMA model is applied to represent the data. However, because ARIMA assumes a linear relationship between the predicted value of the time series, its past values, and the noise term, it is suitable only for stationary (or stationarized) series and cannot effectively model many complex time series.
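The "integrated" step that transforms a non-stationary series into a stationary one can be sketched with first-order differencing; the series below is synthetic and purely illustrative:

```python
import numpy as np

# Sketch of the "I" in ARIMA: first-order differencing turns a linear-trend
# (non-stationary) series into one fluctuating around a constant mean.
rng = np.random.default_rng(4)
t = np.arange(300)
y = 0.5 * t + rng.normal(0.0, 1.0, size=t.size)   # trend + noise; mean grows with t

dy = np.diff(y)   # y_t - y_{t-1}

# The differenced series hovers near the trend slope instead of growing.
print(round(float(dy.mean()), 1))   # ≈ 0.5, the trend slope
```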
As deep learning advances, more researchers are exploring the use of deep learning to model the problem of multivariate time series analysis. Recurrent Neural Networks (RNNs) and their variations serve as representative models for sequence-based deep learning. However, it can be challenging for these models to converge due to issues like the vanishing gradient and exploding gradient
[19]. The vanishing gradient problem in RNNs has been partially addressed by LSTM, which is still utilized in many sequential models. The authors of
[20] integrated LSTM with the conventional genetic algorithm to forecast time series. The genetic algorithm selected the optimal LSTM structure, which was subsequently successfully tested on time series data from the petroleum industry. In another study, ref.
[18] employed LSTM for supply chain analysis and forecasting, achieving excellent results. LSTM was used to estimate the power load in a power compliance early warning system and assess several sets of time series
[19][20] generated by power consumption. In other work, the random forest method was successfully combined with LSTM to predict the price of carbon. Furthermore, ref.
[21] constructed a self-encoder network based on LSTM for forecasting daily precipitation time series data, while the authors of
[22] applied LSTM to analyze historical oil well production data and make predictions.
Time series data are present in all aspects of daily life. Researchers collect time series data by observing evolving variables produced by sensors over discrete time increments
[21]. Some examples of knowledge discovery in temporal data predate the work of
[22]. These earlier methods mostly handle point-based events and treat data purely as chronological sequences. As a result, the temporal arrangement of events is relatively simple, and the expressiveness of temporal relations such as “during” and “overlaps” is limited. In addition to the parallel and serial ordering of event sequences, when dealing with time series data for events that persist over time, researchers may encounter other intriguing temporal patterns
[23]. Examples of patterns that cannot be described as simple sequential ordering are “event A occurs during the time event B happens” and “event A’s occurrence time overlaps with that of event B and both of these events occur before event C appears.” However, it is suggested that temporal logic
[24] be used to express temporal patterns defined over categorical data. Temporal operators such as since, until, and next are utilized; for example, a pattern may state that event A always occurs until event B appears
[25]. Sequence data are typically processed using Recurrent Neural Networks (RNNs), a crucial class of neural network. However, RNNs are severely affected by vanishing- and exploding-gradient issues, which prevent them from capturing long-term dependencies. Long Short-Term Memory (LSTM), a particular type of RNN, adds a gate mechanism and can prevent back-propagated errors from vanishing or exploding
[23]. In contrast to other approaches, StemGNN
[26] uses a novel strategy to capture both inter-series correlations and temporal dependence simultaneously in the spectral domain.
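Returning to the interval relations mentioned above, relations such as “during” and “overlaps” can be expressed as simple predicates over (start, end) pairs, in the style of Allen's interval relations; the events below are hypothetical:

```python
# Hypothetical predicates for interval-based temporal relations;
# events are (start, end) pairs with start < end.
def during(a, b):
    """Event a happens entirely within event b."""
    return b[0] < a[0] and a[1] < b[1]

def overlaps(a, b):
    """Event a starts first and its end falls inside event b."""
    return a[0] < b[0] < a[1] < b[1]

machine_on = (0, 10)      # illustrative events on a shared timeline
alarm = (3, 5)
maintenance = (8, 12)

print(during(alarm, machine_on))          # True: alarm fires while the machine is on
print(overlaps(machine_on, maintenance))  # True: on-period overlaps the maintenance window
```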
Complex models built using Artificial Neural Networks (ANNs) and Deep Learning (DL) architectures typically require large amounts of training data, which can be a practical limitation.
When considering the dynamic system of the Digital Twin and Industrial Internet of Things applications, temporal dependencies are analyzed to discover temporal patterns within historical time series data, taking lags and missing data into account. Industry 4.0, the fourth industrial revolution, is characterized by the integration of digital technologies into manufacturing and other industrial processes. Information serves as the vital foundation for the mass-personalization concept, and cooperative, people-centered strategies are fundamental to achieving a significant degree of sustainability
[27]. Sensor technologies play a crucial role in Industry 4.0 by collecting data from the physical world in real time. These data can then be used to create Digital Twins, virtual representations of physical systems. However, environmental factors or inherent problems may cause sensors to be faulty
[28].
Some RNN, GRU, and LSTM models have been shown to handle very high proportions of missing values and delays in time series data
[29]. This is because they can learn the underlying patterns in the data even when a large amount of information is missing.
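In practice, such models are often fed imputed inputs together with a missingness mask as an extra channel, so the network can learn from the missingness pattern itself. A minimal forward-fill sketch (the data and fallback value are illustrative):

```python
import numpy as np

def forward_fill(x, fallback=0.0):
    """Replace each NaN with the most recent observed value (fallback if none)."""
    out = np.array(x, dtype=float)
    last = fallback
    for i in range(out.size):
        if np.isnan(out[i]):
            out[i] = last
        else:
            last = out[i]
    return out

raw = np.array([1.0, np.nan, np.nan, 4.0, np.nan, 6.0])
mask = (~np.isnan(raw)).astype(float)   # 1 where observed, 0 where missing

filled = forward_fill(raw)
print(filled)   # [1. 1. 1. 4. 4. 6.]
print(mask)     # [1. 0. 0. 1. 0. 1.]
```

The filled series and the mask would then be stacked as parallel input channels for the recurrent model.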