Decomposition for Multivariant Traffic Time Series: Comparison
Please note this is a comparison between Version 1 by Lei Zhang and Version 2 by Alfred Zheng.

Data-driven modeling methods are widely used in applications and studies of traffic systems, which are complex and chaotic. The empirical mode decomposition (EMD) family provides a lightweight analytical method for non-stationary and non-linear data. However, much of the traffic data encountered in practice is multidimensional, so the EMD family cannot be applied to it directly.

  • multivariate traffic
  • empirical mode decomposition

1. Introduction

Because traffic systems are complex and chaotic, it is hard to estimate system states or identify modes by modeling the system directly. Data-driven modeling methods have therefore been widely used in applications and studies concerning intelligent traffic, and research on digital traffic has become a branch of intelligent traffic that includes traffic feature analysis, traffic flow estimation, bus arrival estimation, traffic jam identification, and the analysis of traffic events and accidents. In addition, the development of artificial intelligence empowers intelligent traffic and brings huge benefits. As one of the three key elements of artificial intelligence, data play an important role in applications and research.
Many studies have modeled traffic via time series. Chin and Quddus [1] used the random-effect negative binomial (RENB) model to investigate the elements appropriate for maintaining safety at road intersections. Brijs, Karlis and Wets [2] and Quddus [3] used integer-valued autoregressive Poisson models to model accident data and investigate the relation between accidents and some specific factors. Commandeur, Bijleveld, Bergel-Hayat et al. [4] and Saar [5] used autoregressive and moving-average models to investigate the correlation between traffic accidents and some other factors. For panel data analysis, F. Chen, Ma and S. Chen [6,7] introduced random-effect tobit models to investigate the relationship between traffic crashes and several factors such as traffic states, weather and surface conditions. Later, methods such as the full Bayesian hierarchical approach and multivariate Poisson lognormal models were used to investigate traffic crash modeling and factors regarding traffic accidents [8,9,10]. In recent years, while regression models have remained the foundation of time series data analysis, some machine learning methods have been introduced to improve performance. Tuli, Mitra and Crews [11] employed a random-effect negative binomial (RENB) model to investigate the demand for shared bicycles. Barroso, Albuquerque-Oliveira and Oliveira-Neto [12] introduced clustering methods to define traffic profiles and the daily traffic periods in trip analyses based on OD data. Chang, Huang, Chan et al. [13] introduced long-memory properties to investigate road fatality factors.
Traffic time series data contain multiple mode characteristics, so mode decomposition of such data is essential for better analysis and modeling. Commonly used mode decomposition methods for time-series data include the discrete wavelet transform (DWT), empirical mode decomposition (EMD) and variational mode decomposition (VMD). EMD, in particular, decomposes data adaptively and handles nonlinear and non-stationary data efficiently without a significant computational burden. EMD has been used in many traffic and transportation applications such as traffic data denoising [14,15], traffic infrastructure health monitoring [16], evaluation of evolving traffic flow dynamics [17] and time-variant detection [18], as well as prediction of section traffic flow [19], traffic speed [20] and metro passenger flow [21]. However, current empirical mode decomposition methods such as EMD, EEMD, CEEMD and CEEMDAN are not equipped to handle multivariate data directly. Since the traffic system generates many multivariate time series, such as trajectory data, there is a pressing need to extend classical empirical mode decomposition methods to deal with multivariate time series.
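For orientation, one sifting pass of classical univariate EMD can be sketched as follows. This is a minimal illustration under stated assumptions, not any cited implementation: the function names are hypothetical, the toy series is invented, and piecewise-linear envelopes stand in for the cubic splines normally used. The sketch shows that the procedure rests on scalar maxima and minima, which is why it does not carry over directly to vector-valued samples.

```python
import math

def local_extrema(x):
    """Return indices of strict interior local maxima and minima."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] > x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] < x[i + 1]:
            minima.append(i)
    return maxima, minima

def linear_envelope(x, knots):
    """Piecewise-linear envelope through x at the knot indices.

    Assumption: linear interpolation replaces the usual cubic spline,
    and the first and last samples are added as knots to cover the range.
    """
    knots = sorted(set([0, *knots, len(x) - 1]))
    env = []
    for left, right in zip(knots, knots[1:]):
        for i in range(left, right):
            w = (i - left) / (right - left)
            env.append((1 - w) * x[left] + w * x[right])
    env.append(x[knots[-1]])
    return env

def sift_once(x):
    """One sifting pass: subtract the mean of the upper and lower envelopes."""
    maxima, minima = local_extrema(x)
    upper = linear_envelope(x, maxima)
    lower = linear_envelope(x, minima)
    mean = [(u + l) / 2 for u, l in zip(upper, lower)]
    detail = [xi - mi for xi, mi in zip(x, mean)]
    return detail, mean

# Toy series (invented): a fast oscillation riding on a slow linear trend.
series = [math.sin(2 * math.pi * t / 10) + 0.02 * t for t in range(200)]
detail, mean = sift_once(series)
```

In a full EMD, this pass would be repeated on `detail` until it meets the IMF conditions, and the extracted IMF would then be subtracted from the series before extracting the next one.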

2. Decomposition for Multivariant Traffic Time Series 

The original empirical mode decomposition [22] provides a method to extract intrinsic mode functions (IMFs) from non-stationary time series signals. The conditions that an IMF satisfies and the procedures to extract IMFs are closely tied to extrema and envelope functions. For multidimensional signals, however, extrema and envelope functions in the usual sense do not exist, so the original EMD cannot be applied directly to such signals. Several researchers have proposed decomposition methods for complex-valued data. Tanaka and Mandic [23] decomposed complex-valued data into positive frequency components and negative frequency components. A band-pass filter was used so that both positive and negative frequency components are analytic signals, which means the real part of those components contains the complete information of the original signal. The classical EMD was then used to extract IMFs. This method made clever use of the band-pass filter and the characteristics of analytic signals, but the two sets of IMFs from the positive and negative components cannot be linked intuitively to the original signal.
Another idea is to extend the definition of the envelope function or extremum point. Bin Altaf, Gautama, Tanaka et al. [24] proposed a new definition of the extremum of a complex-valued data series in which the extremum points were found according to whether the first derivative changes its sign. The complex-valued envelope functions and their average can then be computed. This method decomposes the complex-valued signal directly without separating it into two parts, so the results are more intuitive. Rilling, Flandrin, Goncalves et al. [25] extended the “oscillation” in two dimensions to the “rotation” in three dimensions, so the task is to decompose the rotation modes, such as “rapid” and “slower” rotation, of complex-valued signals (data series). The extremum points were defined as the tangent points to the top, bottom, left and right.
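The four tangent points can be read as local extrema of the imaginary part (top and bottom) and of the real part (right and left) of the complex signal. The sketch below illustrates this reading on an invented toy signal. It is an interpretation for illustration only, not the implementation from [25], and the function name `tangent_points` is hypothetical.

```python
import cmath
import math

def tangent_points(z):
    """Indices where a complex trajectory touches its local top, bottom, left and right.

    Assumption: "top"/"bottom" are local maxima/minima of the imaginary part,
    and "right"/"left" are local maxima/minima of the real part.
    """
    def maxima(v):
        # Strict rise on the left, non-strict fall on the right (handles flat tops).
        return [i for i in range(1, len(v) - 1) if v[i - 1] < v[i] >= v[i + 1]]

    re = [c.real for c in z]
    im = [c.imag for c in z]
    return {
        "top": maxima(im),
        "bottom": maxima([-v for v in im]),
        "right": maxima(re),
        "left": maxima([-v for v in re]),
    }

# Toy signal (invented): a rapid rotation superimposed on a slower one.
z = [cmath.exp(2j * math.pi * t / 8) + 2 * cmath.exp(2j * math.pi * t / 64)
     for t in range(128)]
points = tangent_points(z)
```

Each of the four index sets would then be interpolated separately to form one envelope, and the mean of the four envelopes plays the role of the local mean in the univariate case.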
Those points were linked by a cubic spline to form the envelope functions. From the perspective of subsequent studies, this method uses fixed projections to calculate the extrema and the envelope, which may miss the combined effect of multiple variables. Rehman and Mandic [26] proposed an extension of EMD for trivariate signals in which projection directions were introduced to find the extrema and calculate envelope curves. To choose those directions, a sphere in signal space was built and multiple longitudinal lines were uniformly chosen on that sphere. A series of equidistant points on each longitudinal line was then taken as the projection directions. Along each projection direction, the maximum points of the input signal were found and the envelope curve was obtained by interpolating those maximum points. After that, the two authors proposed an advanced method for n-variate signals in which the projection directions were chosen more uniformly [27]. For n-variate input, low-discrepancy point sets were used to generate uniform points on the (n-1)-sphere as projection directions. This method was widely used in many subsequent studies. However, its computational cost is very high.
Inspired by some studies of non-temporal multidimensional empirical mode decomposition, Thirumalaisamy and Ansell [28] proposed a fast and adaptive multivariate EMD method in which an order statistics filter replaced the classical spline interpolation to reduce the computational cost, and Delaunay triangulation and separable filters were used to reduce the cost of the projection calculation. Even though this method was an improvement, the multivariate EMD algorithm remained somewhat complicated.
Fleureau, Kachenoura, Albera et al. [29,30] proposed a method to obtain a signal’s mean trend by interpolating barycenters computed from identified elementary oscillations. A D+1 dimension tangent vector of a D dimension signal was defined. The oscillation extremum defined in this method was the point where the norm of the “tangent” reaches a local minimum. Then, rather than calculating envelope curves to obtain the mean curve, the concept of the oscillation barycenter was introduced to calculate the mean curve directly. An oscillation barycenter was defined to be a point between two oscillation extremum points. The time coordinate of a barycenter was set to the midpoint in time between its two adjacent extremum points. The variate values of the barycenter were defined to be the average of the signal variate integrals between the two extremum points. The authors later improved this method by changing the calculation of the mean curve [31]. The envelope curves were reintroduced to calculate the mean curve: the even oscillation extrema and odd oscillation extrema were interpolated separately to obtain two envelope curves. This method extended the original EMD to multidimensional signals. However, the extremum identification algorithm may return false extremum points for discrete time series, because the differences of a sampled signal are not continuous. For example, the one-dimension signal [0.1, 0.5, 0.7, 0.9, 0.7, 0.1] attains the minimum norm of the “tangent” at both the third and fourth points, although only the fourth point is the extremum point.
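The ambiguity in this example can be reproduced numerically. The sketch below assumes forward differences with a unit time step, which is only one possible discretization and not necessarily the one used in [31]; the function name `tangent_norms` is hypothetical. Under this assumption several consecutive samples around the peak share the minimum tangent norm, and the exact set of tied samples depends on the difference scheme chosen.

```python
import math

def tangent_norms(x, dt=1.0):
    """Norm of the tangent (1, dx/dt) of a 1-D series.

    Assumption: dx/dt is approximated by the forward difference
    (x[i+1] - x[i]) / dt, so the last sample has no tangent.
    """
    return [math.hypot(1.0, (x[i + 1] - x[i]) / dt) for i in range(len(x) - 1)]

signal = [0.1, 0.5, 0.7, 0.9, 0.7, 0.1]
norms = tangent_norms(signal)

# Samples (0-based) whose tangent norm equals the minimum up to rounding error.
smallest = min(norms)
tied = [i for i, v in enumerate(norms) if math.isclose(v, smallest, rel_tol=1e-9)]

# The actual extremum of the series (0-based index of the largest value).
true_peak = max(range(len(signal)), key=signal.__getitem__)

print("samples with minimum tangent norm:", tied)
print("true extremum:", true_peak)
```

Because the minimum-norm criterion alone does not single out one sample here, an additional test (for example a sign change of the difference) would be needed to separate the true extremum from its neighbors.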