Recursive Decomposition–Reconstruction–Ensemble Method with Complexity Traits

Please note this is a comparison between Version 1 by Fang Wang and Version 2 by Dean Liu.

Oil price forecasting has attracted considerable interest from academics and policymakers in recent years because of its widespread impact on various economic fields and markets. A novel decomposition–reconstruction–ensemble method for crude oil price forecasting is therefore proposed. Based on the Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) technique, a recursive CEEMDAN decomposition–reconstruction–ensemble model that considers the complexity traits of crude oil data is constructed. In this model, the steps of mode reconstruction, component prediction, and ensemble prediction are driven by complexity traits. For illustration and verification purposes, the West Texas Intermediate (WTI) and Brent crude oil spot prices are used as the sample data. The empirical results demonstrate that the proposed model has better prediction performance than the benchmark models.

oil price forecasting
complexity trait
component reconstruction
recursive CEEMDAN algorithm

1. Introduction

Crude oil, which is the world’s most important chemical raw material and strategic resource, ensures the normal operation of the national economy and people’s livelihoods, and it is a critical support for the development of the entire modern industrial society. Crude oil plays an important role in the global economy, political situation, and military strength of various countries as a basic energy source. As a result, changes in crude oil prices have sparked widespread concern worldwide. Because of the interactive impact of various factors such as the global economy, exchange rate changes, speculative behavior, and geopolitics, the oil price always exhibits non-linearity, non-stationarity, and high complexity, which poses significant challenges to crude oil price forecasting.

In the literature, various linear and nonlinear models have been used separately or in combination to make forecasts (see, e.g., Buyuksahin & Ertekin [1]). Linear methods assume that a given time series is regular with no sudden movements. This assumption is problematic because sudden movements with variation and extreme values are normal in many real-world time series such as financial data and renewable energy data (see, e.g., Xu et al. [2]). Numerous nonlinear time series prediction methods (see, e.g., Kantz & Schreiber [3]) have been proposed in the literature to capture these nonlinearities. Conventional linear methods are better suited to time series without high volatility or multicollinearity, whereas Zhang et al. [4] and Elman [5] show that nonlinear methods have an advantage when modeling complex structures in time series with high accuracy. No universal model is suitable for all circumstances because each type of method outperforms the others in different domains. Capturing the general patterns in time series data with only one linear or nonlinear model appears to be difficult (see, e.g., Khashei & Bijari [6]). To overcome this limitation, Taskaya & Casey [7] proposed hybrid techniques with both linear and nonlinear models. The hybrid methodology is a synthesis of various prediction methods. It is usually a combination of traditional econometric models and AI algorithms (see, e.g., Wang et al. [8]) or a combination of different econometric models or AI algorithms.

In addition to the hybrid methodology, the ensemble learning algorithm is an important paradigm for overcoming the limitations of single methods. Both the hybrid methodology and the ensemble method address the shortcomings of single models. With the divide-and-conquer strategy (see, e.g., Yu et al. [9] and Dong et al. [10]), the decomposition–ensemble learning methods are an important branch of ensemble learning paradigms. Because making an individual prediction for every decomposed component is time-consuming, the number of components has to be reduced. Yu et al. [11] first proposed a decomposition–ensemble model with a reconstruction step that considered some data characteristics. Recently, Yu & Ma [12] introduced a memory-trait-driven reconstruction method into the decomposition and ensemble framework. Inspired by their work, a new model based on decomposition–ensemble learning with a reconstruction step that considers the complexity traits of the data is used to forecast crude oil prices. In this model, all steps of mode reconstruction, component prediction, and ensemble prediction are driven by complexity traits.
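As a rough illustration of what a complexity-driven reconstruction step can look like, the sketch below scores each decomposed component and sums the components into a high-complexity and a low-complexity group. The text does not say which complexity measure the model uses, so permutation entropy, the threshold of 0.6, and the names `permutation_entropy` and `reconstruct_by_complexity` are assumptions made only for this example.

```python
import math
import random


def permutation_entropy(series, order=3):
    """Normalized permutation entropy in [0, 1]; higher means more irregular."""
    counts = {}
    for i in range(len(series) - order + 1):
        window = series[i:i + order]
        # Ordinal pattern: the index order that would sort the window.
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        counts[pattern] = counts.get(pattern, 0) + 1
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(math.factorial(order))


def reconstruct_by_complexity(components, threshold=0.6):
    """Sum components into a high-complexity and a low-complexity series.

    The threshold is a hypothetical value chosen for illustration only.
    """
    n = len(components[0])
    high = [0.0] * n
    low = [0.0] * n
    for comp in components:
        target = high if permutation_entropy(comp) >= threshold else low
        for t, value in enumerate(comp):
            target[t] += value
    return high, low


# Two synthetic "modes": a smooth trend and an irregular noise term.
rng = random.Random(0)
trend = [0.1 * t for t in range(60)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(60)]
high, low = reconstruct_by_complexity([trend, noise])
```

Reducing many modes to a few reconstructed components in this way cuts the number of individual forecasts that have to be made, which is the motivation given above for the reconstruction step.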

2. Forecasting by Statistical Models

Statistical models, which are also known as random time series models, include exponential smoothing (ES) (see, e.g., Kourentzes et al. [13]), the auto-regressive integrated moving average (ARIMA) model (see, e.g., Guo [14]), the generalized auto-regressive conditional heteroskedasticity (GARCH) model (see, e.g., Zhang et al. [15]), the hidden Markov model (HMM) (see, e.g., Isah & Bon [16]), and vector auto-regression (VAR) (see, e.g., Mirmirani & Li [17]). For example, Zolfaghari & Gholami [18] showed that ARIMA models performed well in forecasting international crude oil prices. To modify the mean and variance of the log returns of crude oil prices, Zhu et al. [19] introduced a hidden Markov model to capture the effect of random events and subjective factors on time series fluctuations. Using a VAR model, Drachal [20] applied the global economic policy uncertainty index, production, the volatility index, and crude oil volatility to predict crude oil prices. Despite their simplicity and ease of implementation, these statistical models cannot directly process time series with nonlinear characteristics because of their linear correlation structure. More generally, conventional statistical and econometric models are constrained by stringent theoretical assumptions, including linearity, stationarity, and dependence on specific distributional properties. As a result, these methods may encounter limitations in accurately forecasting crude oil price series that are non-stationary, nonlinear, and characterized by complex dynamics. Meanwhile, as soft computing technology has advanced, many different intelligent algorithms have been developed and widely used in data prediction.
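To make the linear structure of these models concrete, the following is a minimal sketch of simple exponential smoothing, the most basic member of the ES family. The function name, the choice to initialise the level with the first observation, and the sample values are assumptions for illustration and are not taken from the cited studies.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the one-step-ahead forecast after smoothing the whole series."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]  # assumption: start the level at the first observation
    for value in series[1:]:
        # New level is a weighted average of the latest value and the old level.
        level = alpha * value + (1.0 - alpha) * level
    return level


# Made-up price values, used only to show the call.
forecast = simple_exponential_smoothing([10.0, 12.0, 14.0], alpha=0.5)
```

The forecast is a fixed weighted average of past observations, so it cannot react differently to different regimes, which is one way to see why such models struggle with nonlinear series.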

3. Forecasting by Artificial Intelligence and Machine Learning Methods

A crucial presumption in the application of econometric models is that the time series data under study follow a linear process. However, crude oil prices do not satisfy this requirement, which can result in less accurate forecasting outcomes. In contrast, various nonlinear intelligence and machine learning methods (e.g., the support vector machine (SVM) as applied by Yu et al. [21] and the extreme learning machine (ELM) as applied by Wang et al. [22]) do not depend on this assumption, and they can be applied to time series prediction tasks. Moreover, deep learning is gaining popularity in machine learning, since conventional machine learning techniques employ shallow structures. Artificial neural networks (ANNs) (see, e.g., Jammazi & Aloui [23]), back-propagation neural networks (BPNNs) (see, e.g., Khashei & Bijari [6]), long short-term memory (LSTM) networks (see, e.g., Urolagin et al. [24]), and convolutional neural networks (CNNs) (see, e.g., Li et al. [25]) can model time series with nonlinear characteristics and achieve high prediction precision. For example, Wang & Wang [26] created a crude oil price forecasting model that utilized a random Elman recurrent neural network, and the predictive power of the model was analyzed in comparison to other models. Yu et al. [27] incorporated the AI method of EELM into an ensemble model formulation to forecast crude oil prices, and the findings showed that the proposed ensemble learning paradigm statistically outperformed all investigated benchmark models. However, these models have some drawbacks, including local minima, over-fitting, and the need for large samples. Although ensemble models have been shown to outperform individual models, they remain susceptible to over-fitting and to being trapped in local extrema, which can limit their ability to generalize.

4. Forecasting by Hybrid Models

To overcome the limitations of the aforementioned techniques, hybrid models have been proposed. Researchers commonly employ a combination of econometric models and artificial intelligence algorithms, or a combination of different econometric models or different artificial intelligence algorithms. For example, Cheng et al. [28] predicted crude oil prices in 2018 using the vector error correction and nonlinear auto-regressive neural network (VEC-NAR) model. To enhance technical indicator-based crude oil price forecasting, He et al. [29] implemented a hybrid forecast approach using scaled principal component analysis (s-PCA). In-sample and out-of-sample performance comparisons revealed that the s-PCA model was superior to the compared models. Wang & Fang [30] developed a novel combination of the FNN model and a stochastic time effective function for crude oil price forecasting, i.e., the WT-FNN model, and the findings revealed that the WT-FNN model had the best predictive performance. Zhang et al. [15] offered a novel hybrid technique to predict crude oil prices based on the least squares support vector machine, particle swarm optimization, and the GARCH model. The experimental findings demonstrated that this approach could accurately estimate crude oil prices. To predict crude oil prices accurately, Wang et al. [31] employed a Markov model to implement the GARCH-MIDAS model for both short-term and long-term state conversion, and they found that the short-term predictions were more accurate. Like the hybrid approach, the proposed decomposition–ensemble method also takes into account the shortcomings of single models. The main difference is that ensemble learning employs several identical individual methods for ensemble prediction.

5. Forecasting by the Decomposition–Ensemble Learning Method

Recent studies have established a novel ensemble prediction approach called the decomposition ensemble to manage the challenge of forecasting nonlinear time series data. Similar to the hybrid method, this approach considers the limitations of single models. Ensemble learning employs multiple identical single techniques for ensemble prediction, whereas the hybrid model employs multiple distinct single models for combination prediction. Several significant studies have applied this approach to oil price prediction. For example, Li et al. [25] and Li et al. [32] decomposed the monthly crude oil futures price data into multiple modes using VMD. Then, they forecast each mode using an SVM that was optimized by a genetic algorithm and a BPNN that was optimized by a genetic algorithm. Using the Akaike information criterion (AIC) to determine a reasonable lag, Ding [33] proposed a decomposition ensemble model using ensemble empirical mode decomposition (EEMD) for crude oil forecasting. Yu et al. [9] used empirical mode decomposition (EMD) to decompose crude oil prices and the feedforward neural network (FNN) to forecast the components. Zheng et al. [34] recently proposed a method combining an empirical mode decomposition algorithm, quadratic surface support vector regression, and the autoregressive integrated moving average method for stock index and futures price forecasting. The study obtained better forecasting results than the direct forecasting model. However, the existing literature on constructing the decomposition–ensemble framework has some limitations. It primarily selects decomposition, reconstruction, prediction, and ensemble methods based on the characteristics of the model, rather than taking into account the characteristics of the data themselves. Therefore, the method proposed in this paper selects appropriate decomposition, reconstruction, prediction, and ensemble methods based on the specific traits of the data.
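The general decompose, forecast, and ensemble pipeline described above can be sketched in a few lines. This is a toy under stated assumptions, not the method of any cited study: a moving-average split stands in for EMD, EEMD, VMD, or CEEMDAN, an AR(1) model fitted by least squares stands in for the SVM, BPNN, or FNN component forecasters, and the ensemble step is a plain sum. All function names are hypothetical.

```python
def moving_average(series, window):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for t in range(len(series)):
        chunk = series[max(0, t - half): t + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def decompose(series, window=5):
    """Stand-in decomposition: a smooth component plus a residual component."""
    smooth = moving_average(series, window)
    residual = [x - s for x, s in zip(series, smooth)]
    return [smooth, residual]


def fit_ar1(series):
    """Least-squares fit of x[t] = a + b * x[t-1]; returns (a, b)."""
    x, y = series[:-1], series[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    var = sum((xi - mx) ** 2 for xi in x)
    # A constant component has zero variance; fall back to predicting its mean.
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / var if var else 0.0
    return my - b * mx, b


def decomposition_ensemble_forecast(series, window=5):
    """Forecast each component one step ahead, then sum the forecasts."""
    forecasts = []
    for comp in decompose(series, window):
        a, b = fit_ar1(comp)
        forecasts.append(a + b * comp[-1])
    return sum(forecasts)
```

A data-driven variant in the spirit of the proposed method would choose the decomposition, the per-component forecaster, and the ensemble rule according to measured traits of each component rather than fixing them in advance as this sketch does.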