KPI Anomaly Detection | Encyclopedia MDPI

KPI Anomaly Detection: Comparison

Please note this is a comparison between Version 1 by Liangyin Chen and Version 2 by Vivi Li.

Anomaly detection is the foundation of intelligent operation and maintenance (O&M), and detection objects are evaluated by key performance indicators (KPIs). For almost all computer O&M systems, KPIs are usually the machine-level operating data. Moreover, these high-frequency KPIs show a non-Gaussian distribution and are hard to model, i.e., they are intricate KPI profiles. However, existing anomaly detection techniques are incapable of adapting to intricate KPI profiles. In order to enhance the performance under intricate KPI profiles, this study presents a seasonal adaptive KPI anomaly detection algorithm ASAD (Adaptive Seasonality Anomaly Detection) was presented.

KPI anomaly detection
intricate KPI profiles
adaptive seasonality anomaly detection

1. Introduction

Computer operation and maintenance is always a vital component in guaranteeing the high availability of the application systems. Operation and maintenance must evolve from manual detection to intelligent detection with the explosive increase in the volume of application data. According to Gartner’s report, more than 40% of global enterprises have replaced their outdated O&M systems with intelligent solutions as of 2020. In these intelligent systems, anomaly detection is critical to detect important performance indicators (KPIs) such as CPU utilization, memory utilization and so on. To ensure a stable and reliable O&M system, a rising number of researchers are investigating KPI anomaly detection methods ^[1][2][1,2].

Traditional statistics, supervised learning and unsupervised learning algorithms are the three types of KPI anomaly detection techniques. First, seasonal length is required as an input parameter by traditional statistical approaches such as Argus [3] and TSD [4], but it is frequently given manually. It may cause seasonality to be disrupted in intricate KPI profiles, leading to erroneous anomaly detection. Secondly, supervised learning algorithms such as Opperence [5] and EGADS [6] relied on classical statistical techniques, and they also did not recognize seasonal length under intricate KPI profiles. Finally, among unsupervised learning methods, Zhao, N. [7] developed a periodic adjustable approach called Period. This papentryr considers time series data to be related to daily human activities, and it directly assumed that the basic seasonal length of time-series data is 1 day. However, KPI time series data containing intricate KPI profiles are very common, the seasonal length in these non-Gaussian distributed data is difficult to estimate [8]. In general, there are three key challenges to overcome. To begin, precise seasonal characteristics are hard to extract from the intricate KPI time series data. Second, due to the long sub-sequence length, the clustering process will take too much time. Third, noise and anomalies in the KPI time series data could also result in bad sub-sequence clustering results. Facing the above problems, existing KPI anomaly detection algorithms cannot obtain good performance under intricate KPI profiles.

2. Background

KPI: Key Performance Indicator (KPI) consists of many background system metrics including CPU utilization, memory utilization, network throughput, system response time and so on. Above types of KPI time series data can cover the main information from hardware to software, and reflect the status of the entire system from the bottom up. In brief, it is the focus of the operation and maintenance system.
Intricate KPI Profiles: In KPI time series data, time is the independent variable and KPI value is the dependent variable. The shape of the KPI time series data graph is known as the KPI profile. For most operation and maintenance systems, as time passes, the KPI profile will take on new forms, i.e., the graph of KPI time series data usually contains many KPI profiles. In ouresearchers' work, the situation where many types of KPI profiles exist in KPI time series data graph is referred to as intricate KPI profiles.

Liu, D. et al.
[
5
]
2015	supervised learning
Laptev, N. et al. [6]	2015	supervised learning
Zhou et al. [13]	2019	ensemble learning
Himeur et al. [14]	2020	deep neural network
Himeur et al. [

[24] firstly discussed the anomaly detection in building energy consumption. It comprehensively introduced a method to classify existing algorithms based on different factors, such as the machine learning algorithm, feature extraction approach, detection level, computing platform, application scenario and privacy preservation. Then they introduced a new solution [14] to detect energy consumption anomalies. Besides micro-moment features extraction, they developed a deep neural network architecture for efficient abnormality detection and classification. In 2021, they also used the autoencoder and micro-moment analysis to detect abnormal energy usage [15]. To provide an explainable model, Deng et al. [16] propose a novel Graph Deviation Network (GDN) approach. It can learn a graph of relationships between sensors, and detects deviations from these patterns. Similarly, Chen et al. [17] presented a new framework for multivariate time series anomaly detection (GTA) that involves automatically learning a graph structure, graph convolution and modeling temporal dependency using a Transformer-based architecture. Recently, Zhou et al. [18] put forward an anomaly detection framework. Firstly, this captures more detailed data regarding the time series’ shape and morphology characteristics. Then, it utilizes interval representation to realize data visualization and mine the internal relationships. However, these supervised methods are unable to adapt to intricate KPI profiles due to the inherent lack of labeled anomalies in historical data. For unsupervised learning algorithms, in 2018 Xu, H. et al. [19] studied application monitoring problems based on VAE. In 2021, Thill et al. [25] designed a novel unsupervised temporal autoencoder architecture based on convolutional neural networks (TCN-AE). It can utilize the information from different time scales in the anomaly detection process. Then, Himeur et al. [20] developed two different schemes to detect abnormalities in energy consumption. These are an unsupervised abnormality detection based on one-class support vector machine (UAD-OCSVM) and a supervised abnormality detection based on micro-moments (SAD-M2). In the same year, Li et al. [21] proposed a clustering-based approach to detect anomalies concerning the amplitude and the shape of multivariate time series. They generate a set of multivariate subsequences by setting the sliding window. To improve the detection efficiency, Li et al. [22] proposed FluxEV, a fast and effective unsupervised anomaly detection framework. It can extract appropriate features to indicate the degree of abnormality, and make the features of anomalies as extreme as possible. Recently, Carmona et al. [23] presented a framework Neural Contextual Anomaly Detection (NCAD) that scales seamlessly from the unsupervised to supervised setting. It is a window-based approach which can facilitate learning the boundary between normal and anomalous classes by injecting generic synthetic anomalies into the available data. Moreover, it adopted the moments method to speed up the parameter estimation in the automatic thresholding. Although they achieved good performance, the defect is also unsuitable for intricate KPI profiles due to lack of significant seasonality in original data. In order to solve the above problem, Zhao, N. et al. [7] devised a periodic adaptable algorithm Period, to enhance the accuracy of KPI anomaly detection. The authors of this work assumed that the intricate KPI profiles had a 1-day seasonal length and split the KPI data. However, this strategy is not universal, because not all intricate KPI profiles have a 1-day seasonal length. In fact, KPIs may show distinct patterns in different time intervals, which are referred to as KPI profiles, such as weekly, quarterly or other imperfect or complex periodicity. To deal with the situation described above, a new algorithm to recognize intricate KPI profiles with uncertain seasonal lengths must be developed. Therefore, this enstrudy proposes an adaptive seasonality anomaly detection algorithm under intricate KPI profiles. The notations list of researchers'our research is shown in Table 2.

Table 2.

Notations List.

Notation	Description
S	seasonal component of KPI time series data
$S_{i}$	ith sample seasonal component from S
$s u m_{i}$	sum of sample points for ith sample sequence
$P O_{a l l}$	set of powers derived from the periodogram
$p o_{i}$	ith power from $P O_{a l l}$
$p e r i o d_{i}$	ith element in the candidate period set
15
]	2021	deep neural network
Deng et al. [16]	2021	graph deviation network
Chen et al. [17]	2021	transformer-based architecture
Zhou et al. [18]	2021	federated learning
Xu, H. et al. [19]	2018	unsupervised VAE

KPI Anomaly: KPI anomalies are data that do not meet expectations in KPI time series data
^[5]^[9][5,9]. Anomalies in KPIs are usually a sign that something is wrong with the system. For example, the system’s CPU utilization remains excessively high, indicating that the number of computing tasks executed by the system exceeds the typical level, posing a crash risk. Early detection of KPI deviations can aid in the diagnosis and analysis of issues.
Seasonality of Time Series Data: When time series data vary with seasonal influences, they are said to have seasonality [10]. For example, if time series data frequently exhibit fixed characteristics in a certain time interval, this can indicate that the data are seasonal. The seasonal length is the time between repetitions, and it occurs at an observed or predicted period.

3. KPI Anomaly Detection Algorithm

Existing KPI anomaly detection algorithms shown in Table 1 are divided into three categories, including traditional statistical, supervised learning and unsupervised learning algorithms [11]. For traditional statistical algorithms, Yaacob, A.H. et al. [12] studied the problem of network attack detection based on ARIMA in 2010. In 2012, Yan, H. et al. [3] developed the end-to-end service quality evaluation problem based on Holt–Winter. In 2013, Chen, Y. et al. [4] studied the view of web search response time based on TSD. The disadvantage of them is that they all need to input seasonal fitting parameters and cannot adapt to intricate KPI profiles.

Table 1.

KPI Anomaly Detection Algorithms.

Name	Time	Type
Yaacob, A.H. et al. [12]	2010	traditional statistical
Yan, H. et al. [3]	2012	traditional statistical
Chen, Y. et al. [4]	2013	traditional statistical
$p e r i o d_{S^{'}}$
$p e r i o d_{S^{'}}$
$p e r i o d_{S^{'}}$	period of scaled seasonal component
$S^{'}$	$S^{'}$
U	discrete cosine feature matrix
D	matrix after discrete cosine transform
Himeur et al. [20]	2021	unsupervised temporal autoencoder
Li et al. [21
Z	standard matrix for discrete cosine transform	]	2021	unsupervised learning
Li et al. [
Q	principal information of matrix D	22]	2021	fast unsupervised learning
Carmona et al. [23]	2021	unsupervised learning

For supervised learning algorithms, in 2015 Liu, D. et al. [5] proposed Opperence based on traditional statistical algorithms to solve the problems of service quality monitoring and performance anomaly detection. In the same year, Laptev N et al. [6] presented system anomaly monitoring based on traditional KPI anomaly detection methods. In 2019, Zhou et al. [13] designed an ensemble learning scheme based on extreme learning machine (ELM) algorithm and majority voting method to detect abnormal electricity consumption. In 2020, Himeur et al.