Test-Time Augmentation for Network Anomaly Detection

Test-Time Augmentation for Network Anomaly Detection: History

View Latest Version

Please note this is an old version of this entry, which may differ significantly from the current revision.

Subjects: Computer Science, Information Systems

Contributor:

Seffi Cohen

Niv Goldshlager

Bracha Shapira

Lior Rokach

Machine learning-based Network Intrusion Detection Systems (NIDS) are designed to protect networks by identifying anomalous behaviors or improper uses. In recent years, advanced attacks, such as those mimicking legitimate traffic, have been developed to avoid alerting such systems. Test-Time Augmentation for Network Anomaly Detection (TTANAD), which utilizes test-time augmentation to enhance anomaly detection from the data side.

NIDS
TTA
anomaly detection

1. Introduction

Network anomaly detection plays a crucial role in defending against a wide range of cyber attacks, as modern cyber threats become increasingly sophisticated and persistent in evading detection systems. Intrusion detection (ID) is the core element for network security [1]. The main objective of ID is to identify abnormal behaviors and attempts caused by intruders in the network and computer system [2]. Network Intrusion Detection Systems (NIDS) combine information from sensors that monitor different network points around the organization’s network. The sensors monitor the incoming and outgoing traffic and can collect informative network features such as packet payloads, IP addresses, ports, number of bytes transmitted, and other network flow characteristics [3]. NIDS can be broadly categorized into two main groups: Signature-based NIDS and Anomaly-based NIDS. Signature-based NIDS are static in that the detection methods rely solely on a fixed set called a knowledge database, which needs to be updated over time and requires more human effort and time [4]. On the other hand, Anomaly-based NIDS are dynamic because after the normal state of the network is learned, they can detect any irregular and anomalous events [5]. The learning involves creating a baseline profile representing normal network behavior based on historical network traffic or a malicious-free network traffic snapshot. As a result, anomaly-based NIDS are considered the most popular detection method because they can detect unknown attacks (zero-day attacks) [6]. In real-world cyberspace tasks, storing, transferring, and processing the huge amount of data captured by the sensors is a big issue [7]. Sampling techniques have been proposed in several works [8,9,10] in order to cope with this challenge. These techniques aim at taking a portion of the data that gives the same characteristics as the whole dataset. Brauckhoff et al. [11] detailed the complete processing chain from packet capture to the generation of anomaly detection and included temporal aggregation, which extracts statistics such as mean, standard deviation, etc., from the data that arrives during a time window with a length of T. Temporal aggregation is applied to achieve further data compression and to transform the traffic trace into the observation timescale of interest for anomaly detection [11].

Test-time augmentation (TTA) is an application of data augmentation techniques on the test set. TTA techniques generate multiple augmented copies for each test instance, predicting each of them and combining the results with the original instance’s prediction [12]. Intuitively, TTA produces different points of view at inference time, thus predicting the given test instance more robustly. Data augmentation can improve the model’s performance without changing its architecture. However, it requires more training resources since more training data are used [13]. TTA, on the other hand, is more efficient than data augmentation in the training phase because retraining the model is not required. Several studies, mostly from the vision domain, have used various test-time augmentation techniques in their work [14,15].

The TTA is commonly used in image classification tasks to improve the performance of machine learning models by augmenting the test data. It has been shown to provide a significant boost in the predictive performance of various machine learning models. However, no previous works have utilized TTA for network anomaly detection, primarily because TTA has been predominantly applied to image and text data. The lack of application of TTA in network anomaly detection presents an opportunity to explore the potential benefits of this technique for enhancing the performance of NIDS.

2. Network Anomaly Detection

Anomaly detection can be defined as identifying patterns in the data that do not conform to expected behavior in some context [16]. Anomaly detection modeling can be broadly categorized into several types of techniques: statistical methods, neighbor-based methods, and dimensionality-based methods [16]. In statistical methods, the low probability samples under the learned distribution will be considered as an anomaly. Neighbor-based methods assume that normal data has significantly more neighbors than anomalous data. Dimensionality reduction-based methods try to find an approximation of the data using a combination of attributes that capture the bulk of the variability in the data. Additionally, anomaly detection can be accomplished using reconstruction methods that reconstruct the input from latent space. The reconstruction error of anomalous instances will be higher as the model has been adapted to reconstruct only normal data [17]. Despite the progress in this field, detecting sophisticated attacks remains a significant challenge due to the evolving nature of threats and the increasing volume of network traffic. In our experiments, we used an Autoencoder as a reconstruction-based anomaly detector, an Isolation Forest as a statistical-based anomaly detector, and a Local Outlier Factor as a neighbor-based anomaly detector.

2.1. Autoencoder-Based Anomaly Detection

A method was proposed by Dau [18] that uses a replicator neural network, also referred to as an autoencoder, for anomaly detection. It can work in both single and multiple-class settings. The network is trained to reconstruct only “normal” observations, so it is assumed that normal samples should have low reconstruction error. Conversely, anomalous samples are expected to have higher reconstruction error because the network is not trained to replicate them. Autoencoders have been extensively studied for network intrusion detection (NID) [19,20,21,22,23,24]. However, a major weakness of autoencoder-based anomaly detectors is their struggle to identify anomalies in complex or noisy data accurately. This is because autoencoders aim to reproduce the input data closely. However, if the input data are complicated or noisy, the autoencoder may fail to capture the underlying patterns, failing to identify anomalies. Our proposed method, TTANAD, is designed to enhance the performance of various anomaly detection algorithms, including autoencoders, by providing additional perspectives on the test data through temporal augmentations.

2.2. Local Outlier Factor Anomaly Detection

The Local Outlier Factor (LOF) was proposed by Breunig [25] as an unsupervised anomaly detection technique that calculates the anomaly score based on the deviation of a data point’s local density compared to its neighbors. It classifies samples with significantly lower density than their neighbors as outliers. The method involves determining the local density of a sample using its k-nearest neighbors, and the LOF score of observation is calculated as the ratio of its k-nearest neighbors’ average local density to its own local density. Normal samples are expected to have a similar local density to their neighbors, while abnormal data are expected to have a much lower local density. LOF has been widely studied for network intrusion detection [25,26,27,28,29,30], but its internal density-based mechanism can make it less effective at detecting anomalies that are not well-separated from normal data points or are located in low-density regions of the data. Our proposed method addresses this weakness by providing augmented instances for each sample with different values. One of these augmented instances has a better chance of separating anomalies due to its feature. In addition to autoencoders, we also evaluate the effectiveness of our proposed method, TTANAD, by employing the LOF algorithm as one of the anomaly detectors in our experiments. This allows us to assess the performance improvements offered by TTANAD across different anomaly detection techniques.

2.3. Isolation Forest Anomaly Detection

The Isolation Forest method, introduced by Liu [31], is a technique for identifying anomalies by constructing decision trees. The method works by randomly selecting a feature and splitting the values of the selected feature, resulting in partitions. Anomalies are instances with short average path lengths on the trees as they are less common and require fewer splits to separate them from regular observations. Despite being widely used for Network Intrusion Detection [32,33,34,35,36], Isolation Forests are prone to be impacted by outliers and instances that significantly differ from the rest of the data, leading to possible false positive or false negative results. The use of TTA should improve robustness by providing more points of view for each instance. The isolation forest algorithm is another anomaly detector that we incorporate as part of our experiments, similar to autoencoders and LOF.

3. Test-Time Augmentation

Test-time augmentation is the process of producing several enhanced copies of each sample in the test set, applying a prediction for each, then returning an ensemble of those predictions. TTA was extensively shown to improve results in many domains, most notably the vision domain. In Alexnet [15] the authors also applied TTA by averaging the predictions on ten randomly cropped parts of the inference image. Cohen et al. [37] proposed Test-Time Augmentation for the tabular anomaly Detection technique, a TTA-based method to improve anomaly detection performance on all kinds of tabular data. Shanmugam et al. [12] determine the augmentations used in TTA by setting an appropriate weight for each augmentation created. Their method significantly outperforms existing approaches by focusing on the factors influencing TTA augmentation and finding the optimal weight per augmentation. A study by Cohen et al. [38] presented state-of-the-art results using TTA to predict Intensive Care Unit (ICU) survival. Although TTA has been successfully applied to images, text, and tabular data, its application to network anomaly detection has not been extensively explored.

This entry is adapted from the peer-reviewed paper 10.3390/e25050820

© Text is available under the terms and conditions of the Creative Commons Attribution (CC BY) license; additional terms may apply. By using this site, you agree to the Terms and Conditions and Privacy Policy.