1. Introduction
Machine learning (ML) approaches are changing our perceptions of the world and affecting every aspect of our lives in the age of technology, such as autopilot, facial recognition, and spam detection. A distinctive feature of ML is that instead of designing the solution with coding, the programmer creates a method to discover the key to a problem using samples of other issues that have been solved. ML techniques can produce satisfactory results in many situations since machine-generated features are typically more reliable and representative than hand-crafted features
[1]. Furthermore, ML procedures’ training and evaluation phases are generally constructed assuming they are executed in a secure environment
[1]. Therefore, the use of ML has expanded drastically, especially in
cybersecurity, depending on providing a secure environment for users and institutions. Furthermore, due to the efficiency of ML in automatically extracting useful information from massive databases, the use of ML in cybersecurity meets the development of cyberattacks
[2].
Various network attacks affect the users’ or institutions’ network systems, such as denial of service (DoS), distributed denial of service (DDoS), and SQL injection. Thus, cybersecurity specialists propose and utilize various types of defensive methods against network attacks, such as firewalls,
intrusion detection systems (IDS), and intrusion prevention systems (IPSs). Such defense methods are used to detect or deter unauthorized network attacks that may affect network users in a harmful way.
2. Intrusion Detection System Based on ML
In general, machine learning techniques can be divided into two categories
[3].
2.1. Supervised Machine Learning
Supervised learning depends on meaningful information in labeled data. The most common goal in supervised learning (and, therefore, in IDS) is classification. Nevertheless, manually labeling data is costly and time-intensive. As a result, the fundamental barrier to supervised learning is the lack of adequate labeled data.
2.2. Unsupervised Learning
Unsupervised learning recovers useful feature information from unlabeled data, making training material much more straightforward. On the other hand, unsupervised learning approaches often perform worse in terms of detection than supervised learning methods.
IDS is a type of computer security software that seeks to identify a wide range of security breaches, from attempted break-ins by outsiders to system penetrations by insiders
[2]. Furthermore, the essential functions of IDSs are to monitor hosts and networks, evaluate computer system activity, produce warnings, and react to abnormal behavior
[3]. Moreover, one of the significant constraints of typical intrusion detection systems (IDS) is filtering and decreasing false alarms
[4]. In addition, many IDSs improve their performance by utilizing neural networks (NN) for deep learning. Furthermore, deep neural network (
DNN)-based IDS systems have been created to improve tremendous data learning, processing, and a range of assaults for future prediction
[5].
Various machine learning techniques have been used to build intrusion detection models; the following paragraphs summarize the most commonly used techniques.
2.3. Artificial Neural Network (ANN)
An ANN is designed to function in the same way as human brains. An ANN comprises numerous hidden layers, an input layer, and an output layer. Units in neighboring strata are interconnected. Furthermore, it has an excellent fitting ability, particularly for nonlinear functions.
2.4. Deep Neural Network (DNN)
The parameters of a DNN are initially learned using unlabeled data in an unsupervised feature learning stage, and then the network is tweaked using labeled data in a supervised learning stage. The unsupervised feature learning step is mainly responsible for DNN’s remarkable performance.
Furthermore, DNN plays a crucial role in cybersecurity; therefore, DNN could understand the abstract, high-level properties of
APT assaults even if they use the most complex evasion strategies
[6].
2.5. Support Vector Machine (SVM)
In SVMs, the goal is to locate a hyperplane of maximum
margin separation in the n-dimensional feature space. Because a small number of support vectors control the separation hyperplane, SVMs can produce satisfactory results even with small-scale training data. SVMs, on the other hand, are susceptible to noise around the hyperplane. SVMs excel at solving linear problems. Kernel functions are commonly used with nonlinear data. The original nonlinear data can be split using a kernel function that transfers the original space into a new space. SVMs and other machine-learning algorithms are rife with kernel trickery.
2.6. Generative Adversarial Network (GAN)
A GAN model has two subnetworks, one for the generator and one for the discriminator. The generator’s goal is to create synthetic data that looks like actual data, whereas the discriminator’s goal is to tell the
difference between synthetic and natural data. As a result, the generator and discriminator complement each other
[3][7].
Furthermore, GANs are a trendy study area at present. They are being utilized to augment data in attack detection, which helps alleviate the problem of IDS dataset scarcity. GANs, on the other hand, are adversarial learning algorithms that can improve model detection accuracy by including adversarial samples in the training set.
A comprehensive survey of supervised and unsupervised learning techniques used in IDS can also be found in
[3].
3. Adversarial Machine Learning
In AML, an opponent tries to trick the system into selecting the incorrect course of action. In other words, it causes the ML model to misclassify the data, producing inaccurate results. The adversarial sample is a critical element of an adversarial attack. An input to an ML model that has been altered constitutes an adversarial sample. An adversarial sample is a single data point that, for a given dataset containing attributes x and a label y, leads a classifier to predict a different label on x’ from y even if x’ is almost identical to x. One of the various optimization techniques referred to as “adversarial attack techniques” is used to produce adversarial samples. To create adversarial samples, an optimization problem must be solved to identify the minimal
perturbation that maximizes loss for the neural network.
The adversary’s optimization objective is to calculate a perturbation with a tiny norm that would change the classifier’s output. Where the disturbance is
δ[8].
3.1. Adversarial Game-Theoretic
Adversarial learning is a type of ML in which two entities, the learner and the adversary, try to develop a prediction mechanism for data relevant to a specific problem but with various goals. The goal of learning the prediction mechanism is for the learner to predict or classify the data accurately. On the other hand, the adversary’s goal is to force the learner to
make inaccurate predictions about inputs in the future
[9][10].
Game theory is an attractive technique for adversarial learning because it allows the mathematical representation of the behavior of the learner and the adversary in terms of defensive and attack methods, as well as figuring out how to reduce the learner’s loss from adversarial examples
[10].
In general, the exciting objective of this game theory is to figure out how to achieve equilibrium between the two players (classifier and adversarial).
In contrast, for example, is an IDS-based ML classifier that classifies traffic as “benign” or “malicious.” On the other hand, the attacker creates adversarial samples to influence the IDS’s accuracy, allowing it to misinterpret benign traffic as malicious and vice versa. Moreover, AML, based on a game theory perspective, is divided into two categories: simultaneous and sequential games.
In the simultaneous game, each player selects his or her approach without knowing what the other player seems to be doing. In the other game, one player takes on the role of leader and decides on a plan before the other players, who then play optimally against the leader’s approach. In summary, the attacker
will know what affects the model most based on recognizing the model first, i.e., manipulating the features. This type can be divided into Stackelberg games, where the adversary acts as a leader. In this game, the classifier is the follower; for example, the IDS classifier will follow the leader, which is the adversary. The classifier (follower) attempts to discover the adversary features to enhance the ability to discover the adversarial methods.
In addition to Stackelberg games, the other category is a learner as a leader. In this game, the adversary is the follower; for example, in the ML-based IDS, the adversary will follow the IDS classifier to discover its strategies to craft a suitable adversarial sample that could affect the ML-based IDS model.
In the Sequential game theory, the attacker attempts to learn about the model before crafting his attacks, which is an analogy to a white-box attack since it is based on the attacker’s knowledge of the model
[10].
3.2. Adversarial Threat Model
3.2.1. Adversary Capabilities
If an attacker has direct or physical access to the defense system, any cybersecurity protection can ultimately be defeated
[11]. Thus, five particular
things are under the attacker’s control to implement adversarial attacks against the ML/DL model
[12]:
Training Data:
It denotes the availability of the dataset used to train the ML models. It can be read-only, write-only, or completely inaccessible.
Feature Set:
It seeks to understand the features of the ML models used to carry out its detection. It might take the shape of complete, limited, or no knowledge.
Moreover, it is worth noting that the size of an ML model’s feature set can be abused as an attack surface. The fact that an enemy can alter any feature analyzed by a model is a significant challenge
[13].
The authors of
[14] pointed out that large feature sets have more features, giving an adversary more opportunities to manipulate them. As a result, larger feature sets may be perturbed more quickly than smaller feature sets, which have fewer modifiable features and require more perturbation.
Detection Model:
The trained ML model included in the IDS and utilized to carry out the detection was described in detail. There may be zero, some, or all of this knowledge.
However, the detection model (ML-NIDS) demands high administrative rights. In other words, it can be allowed for a small number of carefully chosen devices
[15]. Therefore, assuming that an infected host will grant the attacker access to the NIDS that holds its detection model is unreasonable
[11][16]. Therefore, it is difficult for an attacker to access the detection models.
Oracle:
This component indicates the potential for receiving feedback from an attacker’s input to the ML output. This input may be small, limitless, or nonexistent.
Depth Manipulation:
It represents adversary manipulation that may change the traffic volume or one or more features in the examined feature space.
As a result, these capabilities clarify that implementing the black-box attack is within reach. It is expected that it will not have the same impact as the white-box attack. Indeed, it is considered a weak attack
[17].
3.2.2. Adversaries Challenges
The authors of
[18] mentioned three challenges faced by creating adversarial instances:
Some adversarial attacks are only suited for specific ML or DL models, which means they do not fit other models.
- b.
-
Regulating the Size of the Perturbation
The adversary’s size should not be too small or too large since this would impact their actual purpose.
- c.
-
Maintaining Adversarial Stability in Real-World Systems
Certain adversarial instances cannot be transformed, such as blurring.
3.2.3. Potential Threats
- a.
-
Security Threats
The following security threats in ML can be classified as adversarial attacks based on their intent to attack the ML/DL model
[19][20]:
There are two sorts of influence attacks: (1) causative, which seeks to gain control over training data, and (2) exploratory, which exploits the ML model’s misclassification without interfering with the model’s training.
- c.
-
Adversaries’ Goals in Network Security
In this situation, the CIA triad and privacy were used because they are more appropriate for hostile categorization of the enemy’s aims in the network security sector
[8].
- i.
-
Confidentiality
This attack aims to acquire confidential information shared by two parties, A and B, by intercepting
communication between them. This occurs in the context of AML, in which network security tasks are performed using ML algorithms.
- ii.
-
Integrity
This attack aims to affect the ML model and lead it to misclassification by implementing malicious activities without interfering with regular system functions, but the attacker chooses the model’s output, increasing the false-negative rate. This includes a poisoning attack that affects the training data.
- iii.
-
Availability
During operations, the adversary compromises the system’s functioning to deny service to users. One method is to increase the misclassification or dramatically alter the model’s output to decrease its performance and cause it to crash, increasing the false-positive rate.
- iv.
-
Privacy
The adversary attempts to acquire sensitive user data and essential knowledge about the model architecture from the ML model. For example, equation-solving attacks
[21] may be used against cloud services that offer ML through APIs and models such as multilayer perceptrons, binary logistic regression, and multiclass logistic regression. The attacker should be able to learn about the model and its architecture as a result of these attempts. Moreover, privacy attacks can be categorized as “model inversion attacks” and “membership inference attacks.” Furthermore, inversion attacks are divided into two categories. The first attack uses the person’s unique label generated by a facial recognition system to rebuild a face image. The second assault can obtain the victim’s identity by extracting a clean image from a blurred image by attacking the system
[22][23].
The membership inference attack has access to the model as a black-box attack (this attack will be presented in the following section). It is determined only if a data point is part of the training data for a learning model. Furthermore, in
[24], the authors were able to create membership inference attacks against the Google prediction API and Amazon ML to identify whether a data point belonged to the training set.
- d.
-
Attack Specificity
An attack’s specificity may be described in two ways. First: targeted assault, this attack is aimed at a single input sample or a group of samples, and an adversary attempts to impersonate an authenticated person in a facial recognition/biometric system
[25]. Second: not-targeted attacks, the ML model in this attack fails randomly. Additionally, nontargeted assaults are more straightforward to execute than targeted attacks because they offer more options and room to reroute the output
[20].
- e.
-
Adversarial Attacks
Attacks based on the Level of Knowledge
Adversarial samples can be created using a variety of approaches. These approaches vary in complexity, speed of creation, and performance. Manual perturbation of the input data points is a crude method of creating such samples. On the other hand, manual perturbations of massive datasets are time-consuming to create and may be less precise. Automatically assessing and finding characteristics that best differentiate between target values is one of the more complex ways. These characteristics are discretely disturbed to reflect values comparable to those representing target values different from their own
[10]. Moreover, adversaries may fully understand the ML system or have a limited understanding.
- i.
-
White-Box Attack
This is frequently the case when the ML model is open source, and everyone has access. Thus, in this attack, there is a thorough understanding of the network architecture and the parameters that resulted from training. Furthermore, four of the most well-known white-box assaults for autonomously creating perturbed samples are
[22]:
The fast gradient sign method (FGSM) was proposed by
[26] as a fast method for producing adversarial samples. At each pixel, they perform a one-step gradient update in the direction of the gradient sign. However, this attack includes changing the value of each feature in the input concerning the neural networks. Its focus is on rapidly generating adversarial samples; therefore, it is not regarded as a powerful assault
[27].
The authors in
[28] devised a Jacobian-based saliency map attack (JSMA), which is an excellent saliency adversarial map under L0 distance. The most influential characteristics are used when modest input variances cause substantial output changes.
Carlini and Wagner
[25] devised a tailored approach to avoid defensive distillation. Most hostile detection defenses are vulnerable to CW attacks. The details of this attack can be found in
[22][29].
Moosavi-Dezfooli suggested DeepFool in
[30] to discover the shortest distance between the original input and the judgment boundary of adversarial cases.
-
Basic Iterative Method (
BIM)
This method is responsible for carrying out gradient calculations repeatedly in small steps; it expands the FGSM. To prevent significant changes in traffic characteristics, the value of the perturbation is trimmed
[31].
Furthermore, an experimental study presented these methods utilized in crafting adversaries in detail; it can be found in
[32]. Generally, when deciding on an adversarial attack, the authors in
[33] stated there is a trade-off. JSMA, for example, uses more computing resources than FGSM but alters fewer features. In addition, DeepFool-based techniques can be considered potent adversaries
[34].
- ii.
-
Black-Box Attack
In this attack, assume no prior knowledge of the paradigm and analyze the paradigm’s vulnerability using information from the settings or previous inputs.
Furthermore, two ways are utilized to learn more about the classification algorithm. Firstly, the attacker might alter the malicious samples several times until they are misclassified to identify the model’s parameters to differentiate between malware and benign samples. The attacker can also create a substitute model of the detection system and then use the transferability aspect of ML to create adversarial samples that fool both the substitute classifier and the actual detector
[12][35].
Furthermore, the authors in
[36] implement a black box attack against an ML model by developing a different model to take the place of the target ML model. Thus, the substitute model crafts adversarial samples based on understanding the substitute model and the migration of adversarial samples. Moreover, black-box attacks include
[37][38]:
ZOO does not compute the gradient directly. Instead, ZOO used the
symmetric difference quotient approach to estimate the gradient, which resulted in a higher computing cost. To estimate the gradient, knowledge of the structure of the DNN network is not necessary
[22].
This attack deceives a DNN without understanding its network topology by modifying the value of only one pixel of a clear picture. DNN is vulnerable to very-low-dimension attacks with minimal information
[22].
- iii.
-
Gray-Box Attack
To grow from the black box to the white box, the adversary undergoes an iterative learning process that uses inference processes to gather additional understanding of the model. Thus, it may have partial knowledge of the model.
When knowledge is restricted, such as in gray-box and black-box scenarios, privacy attacks might be performed to learn more about the targeted ML classifier
[39].
- f.
-
Evasion vs. Poisoning Attacks
Avoid the system by injecting adversarial samples, which do not affect the training data
[40]. Thus, the objective is to misclassify malware samples as benign while the model operates
[12][41]. Furthermore, evasion attacks can be classified as (1) error-generic evasion attacks, in which the attacker is interested in deceiving classification regardless of what the classifier predicts as the output class. (2) error-specific evasion attacks; the attacker seeks to deceive classification but misclassifies the adversarial samples as a specific class
[39].
In cybersecurity, adversarial assaults are created using a thorough grasp of computer systems and their security rules. One of the six types of attacks against intrusion detection systems is poisoning
[42]. Thus, in this attack, the enemy seeks to contaminate the training data by introducing precisely planned samples, ultimately jeopardizing the learning process
[40]. Furthermore, poisoning attacks can be classified as (1) error-generic poisoning attacks, in which the attacker attempts to cause a denial of service by causing as many classification errors as possible. (2) Error-specific poisoning attacks: in this situation, the attacker’s goal is to produce certain misclassifications.
4. Machine Learning Adversaries against IDS
4.1. White-Box Attacks against IDS
4.1.1. A White-Box Attack against MLP
In
[43], the authors presented an application of an evasion attack in a white-box setting by using a Jacobian-based saliency map attack (JSMA) against an MLP (multilayer perceptron) model, which is considered an IDS-ML by using two different datasets: CICIDS
[44] and TRAbID
[45] to classify network traffic. Furthermore, the adversary creates hostile samples with minor differences from the actual testing samples to deceive the MLP model during testing and to prove that the attackers can exploit the vulnerabilities to escape intrusion detection systems and misclassification. Moreover, the MLP model achieved an accuracy of almost 99.5% and 99.8% in detecting malware intrusions. Despite this success, the precision was down by about 22.52% and 29.87% for CICIDS and TRAbID, respectively, after applying an evasion attack.
4.1.2. A White-Box Attack against DNN
The research in
[46] crafted adversarial attacks against DNN-based intrusion detection systems to evaluate the robustness of the DNN against these attacks. Furthermore, the authors used SDL-KDD datasets containing standard and Five-categorized attack samples. Moreover, comparing the DNN performance in classification with SVM resulted in similar performances. As a result, the authors concluded that the attacks designed by FGSM and projected gradient descent (PGD) could notably affect the DNN model.
4.1.3. A White-Box Attack against IDS in Industrial Controlling Systems (ICS)
The authors of
[47] presented experimental research about adversarial attacks against IDS in
industrial control systems (ICS). First, they crafted adversarial samples by using a Jacobian-based saliency map. Second, they evaluated the IDS (Random Forest, J48) after exposure to these adversarial samples. Finally, they suggested some solutions for enhancing the robustness of IDSs against adversarial attacks in the practical module of adversarial attacks against IDS methods.
Additionally, this attack was crafted by insiders, for instance, administrators. As a result, the attacker was already aware of the classifier system. According to the authors, the random forest and J48 performance had decreased. In addition, the J48 achieved a lower level of robustness than the random forest model, which achieved a high level of robustness in facing these adversarial attacks. The authors applied these adversarial attacks on two IDSs, which may not affect other MLs. The authors recommend that these attacks be used on other IDS-ML systems.
4.1.4. Monte Carlo (MC)
Researchers in
[48] simulated a white-box assault named a Monte Carlo (MC) simulation for the random generation of adversarial samples and compared their samples to several machine-learning models to clarify performance across a wide range of platforms and detect the vulnerability in NIDS, which assists the organizations in protecting their networks. In addition, the researchers used the NSL-KDD and UNSW-NB15 datasets for evaluation. Then, they confirmed that these two techniques could deceive 11 ML models. Based on their findings, the MLP model has the best accuracy under adversarial attacks with an 83.27% classification rate, followed by the BAG at 80.20% and the LDA at 79.68% with the NSL-KDD dataset. In particular, attackers with knowledge of a target network NIDS may use the most efficient perturbation process to attack that network.
4.2. Black-Box Attacks against IDS
4.2.1. FGSM, PGD, and CW Attacks against GAN
The research in
[49] presented the contribution of generated adversarial network (GAN) attacks against black-box IDS. GAN’s main contribution is misclassifying between actual and adversarial samples. According to the authors, a GAN attack achieved a high rate of compromise and misclassification on IDS. Moreover, they evaluated the research using the NSL-KDD
[50] dataset. Moreover, the experiments resulted in a higher rate of GAN attacks: about 87.18% against NB compared to other attack algorithms.
4.2.2. Deceiving GAN by Using FSGM
A research
paper [51] proposed crafting adversarial attacks to deceive network intrusion detection systems. They trained the GAN classifier and made it robust against adversarial attacks.
Furthermore, the GAN discriminator was evaluated for distinguishing the samples generated from the generator and classifying them as “attack” or “non-attack.” Then, GAN was deceived by using the fast-sign gradient method (FSGM). As a result, the GAN classifier had been deceived by adversarial attacks and misclassified the “attack” samples as “non-attack.” As a limitation, the authors did not mention how to address these attacks and defend against them.
4.2.3. A Black-Box Attack Using GAN
The authors in
[52] crafted black-box attacks using GAN to improve the performance of IDS in detecting adversarial attacks. This experimental research used the KDD99 dataset. Furthermore, they trained the IDS models to detect all kinds of attacks by using GAN since IDSs have difficulty facing new attacks. Moreover, they compared the IDS’s performance before the attacks, during the attacks, and after the GAN training. As a result, the GAN training increased the performance of the IDS. As a limitation, this GAN training worked only for IDSs and did not evolve for other networks.
4.2.4. IDSGAN
Researchers in
[53] devised a black-box attack against an IDS to evade detection. The model’s objective is to provide malicious feature records of the attack traffic that can trick and evade defensive system detection and, ultimately, direct the evasion assault in real networks. The IDSGAN evaluation demonstrated its effectiveness in producing adversarial harmful traffic records of various attacks, effectively lowering the detection rates of various IDS models to near zero.
4.2.5. DIGFuPAS
The DIGFuPAS module was presented in
[54] that crafted adversarial samples using a Wasserstein GAN (WGAN) attack to deceive an IDS in SDN (software-defined networks) in a black-box manner. In addition, they compared nine ML/DL algorithms using two datasets: NSL-KDD and CICIDS-2018. If the detection capability of IDS in SDN deteriorates, they propose adding DIGFuPAS. More specifically, DIGFuPAS-generated assaults were used to repeatedly train the IDS to tackle new threats in SDN-enabled networks preemptively and to evaluate the resilience of the IDS against altered attack types.
4.2.6. Anti-Intrusion Detection AutoEncoder (AIDAE)
A research work
[55] presented a novel scheme named anti-intrusion detection autoencoder (AIDAE) for adversarial features to deceive an IDS by using GAN.
4.2.7. DDoS Attack by Using GAN
To highlight the IDS vulnerabilities, the authors of
[56] devised a DDoS attack using GAN to fool the IDS and determine the robustness of the IDS in detecting DDoS attacks. Additionally, they improved the training of the IDS for defense. Thus, the authors conducted their experiment in three stages. First, they deceived the black-box IDS by generating adversarial data. Then, they trained the IDS with the adversarial data. Finally, they created adversarial data in order to deceive the IDS.