Use of Self-Supervised Learning against DDoS Attacks: Comparison
Please note this is a comparison between Version 1 by Josue Genaro Almaraz-Rivera and Version 2 by Sirius Huang.

The Internet of Things (IoT), projected to exceed 30 billion active device connections globally by 2025, presents an expansive attack surface. The frequent collection and dissemination of confidential data on these devices exposes them to significant security risks, including user information theft and denial-of-service attacks.

  • computer vision
  • contrastive learning
  • DDoS attacks
  • deep learning
  • Intrusion Detection System
  • IoT networks
  • self-supervised learning

1. Introduction

The Internet of Things (IoT) encompasses a broad range of applications, spanning from smart homes to smart cities. It embodies the integration of physical objects, such as wireless healthcare devices, agricultural irrigation systems, and smart grid electric panels, with internet connectivity [1]. The global count of IoT connections is projected to exceed 30 billion by 2025 (IoT active device connections worldwide from 2010 to 2025. https://www.statista.com/statistics/1101442/iot-number-of-connected-devices-worldwide/, accessed on 10 July 2023), thereby amplifying the attack surface susceptible to security breaches. These breaches primarily include denial-of-service attacks (DoS and DDoS) [2], as well as unauthorized data extraction, given the frequent collection and exchange of confidential data by IoT devices [3].
Given the momentum that 5G networks [4] and Software-Defined Networking (SDN) [5] lend to IoT expansion [6], Artificial Intelligence (AI) has become a crucial tool for developing Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS) [7][8]. These AI-empowered systems scrutinize traffic within a host or a network, trigger alerts, and counter potential threats in real time. However, achieving the anticipated high detection rates requires datasets that reflect contemporary attack scenarios and network traffic patterns.
Indeed, the scarcity of recent and robust data collections has been identified as a significant gap in contemporary research [9][10][11]. Given the heterogeneous and resource-constrained nature of IoT devices [12], popular datasets like CIC-IDS2017 [13] may not be suitable for training the necessary machine learning (ML) and deep learning (DL) models, mainly because their testbeds lack IoT devices. Consequently, alternative datasets have been proposed in the literature, including, but not limited to, Bot-IoT [14], TON_IoT [15], CIC IoT [16], and LATAM-DDoS-IoT [17].
Nonetheless, while the availability of the aforementioned datasets addresses the issue of suitable data quality for the IoT, the success of AI-based IDSs and IPSs is also contingent upon the chosen training strategy. Supervised learning necessitates copious amounts of labeled data to construct predictive models. In contrast, unsupervised learning does not require such ground truth information, but it presents challenges with generalization [18], specifically its limited ability to adapt to unseen, related data. This becomes especially relevant in the face of a rapidly evolving threat landscape, with new types of attacks emerging daily.
Self-Supervised Learning (S-SL) is a promising solution to challenges such as the demand for vast amounts of manually labeled data and the need for robust generalization [19]. S-SL is also considered suitable for dealing with small and imbalanced datasets [20]. This approach bridges supervised and unsupervised learning. Initially, a model undergoes pre-training without labels, employing either auxiliary pretext tasks or contrastive learning, with the objective of capturing latent representations of the knowledge domain. Subsequently, this pre-trained model is fine-tuned using labeled data for specific downstream tasks, like attack detection or malware family classification [21]. Even though labeled data are still required for this later phase, Few-Shot Learning (FSL), which targets strong learning performance from a limited number of labeled training samples [22], has been shown to be sufficient to obtain strong performance [23].
S-SL stands as a promising direction for ML advancements [24]. Today’s landscape features a plethora of models adept at leveraging this pioneering training methodology to extract insights from vast amounts of unlabeled data. Examples include Barlow Twins [25], SimCLR [26], Vision Transformers [27], Bootstrap Your Own Latent (BYOL) [28], and Momentum Contrast (MoCo) [29].
Contrastive learning, a training strategy for S-SL, aims to draw similar (or positive) examples closer while distancing dissimilar (or negative) examples [30]. This method capitalizes on data augmentation techniques to learn robust feature representations.
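The pull-together, push-apart objective described above can be illustrated with a minimal NumPy sketch of the NT-Xent loss, the contrastive loss used by SimCLR. It is an illustration only and not necessarily the loss used by any study discussed here; the function name `nt_xent_loss` and the temperature of 0.5 are assumptions chosen for the example, and the embeddings are assumed to be non-zero vectors.

```python
import numpy as np

def nt_xent_loss(z_a, z_b, temperature=0.5):
    """NT-Xent loss for N positive pairs (z_a[i], z_b[i]).

    z_a and z_b hold embeddings of two augmented views of the same N samples.
    The temperature is an assumed value, not one taken from the text.
    """
    z = np.concatenate([z_a, z_b], axis=0).astype(np.float64)  # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit length: dot product = cosine
    sim = z @ z.T / temperature                       # pairwise similarities, (2N, 2N)
    np.fill_diagonal(sim, -np.inf)                    # a sample is never its own negative

    n = z_a.shape[0]
    # Row i (view a) has its positive at row i + n (view b), and vice versa.
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))

    # Cross-entropy of picking the positive among all other samples in the batch.
    return -log_prob[np.arange(2 * n), positives].mean()
```

The loss is small when each sample is most similar to its own augmented view and grows when views are matched with the wrong sample, which is the behavior the paragraph above describes.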

2. Detection of DDoS Attacks against IoT Networks

This section presents related research on the creation of synthetic images for DDoS attack detection against IoT networks, as well as the learning strategies used to train the corresponding AI-based IDSs/IPSs. While the existing literature reports various detection techniques, including those that analyze network traffic at the flow level through recurrent models such as Recurrent Neural Networks, Long Short-Term Memory, and Gated Recurrent Units [31][32], this section concentrates specifically on pattern recognition through visual representations, a recognized research avenue that holds considerable potential for enhancing security measures in IoT environments [33][34][35].
In [36], the authors trained a ResNet-34 architecture in a supervised way using the CICDDoS2019 dataset [37]. This dataset was chosen since it includes 11 different types of denial-of-service attacks (e.g., SYN flood and UDP flood) described by 80 traffic features. To transform the flow-level traffic into images, the authors employed min-max normalization [38]: each feature’s value was re-scaled between 0 and 1 and subsequently multiplied by 255. The resulting input images for the model measured 224 × 224 pixels and had three channels. For training, Stochastic Gradient Descent (SGD) [39] with a learning rate of 0.0001 and a momentum of 0.9 was used. The model was trained for 10 epochs for binary classification and for 50 epochs for multiclass classification. The proposed solution achieved an accuracy of 99.99% and 87.06% for the binary and multiclass problems, respectively. Notably, while the authors devised an AI-based solution for denial-of-service attack detection in IoT networks, they neither tested their model in an environment with IoT devices nor sourced a dataset from IoT traffic. The CICDDoS2019 dataset they used originates from a testbed setup involving a victim web server and Windows PCs.
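The min-max image transformation described above for the ResNet-34 study can be sketched as follows. The scaling step (re-scale each feature to 0 to 1, then multiply by 255) follows the text. How the per-flow feature values are laid out over a 224 × 224 × 3 image is not stated here, so the tiling and channel replication in `flow_to_image` are purely an assumption for illustration, as are both function names.

```python
import numpy as np

def minmax_to_uint8(features):
    """Re-scale each feature column to [0, 1], then multiply by 255."""
    features = np.asarray(features, dtype=np.float64)
    lo = features.min(axis=0)
    hi = features.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # constant columns map to 0, avoiding 0/0
    scaled = (features - lo) / span
    return np.round(scaled * 255).astype(np.uint8)

def flow_to_image(pixel_row, side=224, channels=3):
    """ASSUMPTION: repeat one flow's pixel values until they fill a
    side x side plane, then copy that plane into every channel.
    The original layout used by the authors is not given in the text."""
    reps = -(-side * side // pixel_row.size)  # ceiling division
    plane = np.tile(pixel_row, reps)[: side * side].reshape(side, side)
    return np.repeat(plane[:, :, None], channels, axis=2)

# Usage with three toy flows and three toy features (not real traffic data).
flows = np.array([[0.0, 10.0, 5.0],
                  [5.0, 20.0, 5.0],
                  [10.0, 30.0, 5.0]])
pixels = minmax_to_uint8(flows)
image = flow_to_image(pixels[2])  # array of shape (224, 224, 3), dtype uint8
```

Computing the minimum and maximum per feature over the training set only, and reusing them at test time, would avoid leaking test statistics into the transformation; the sketch does not show that split.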
Reference [40] proposed an anomaly-based IDS using ResNet-50 with one-dimensional convolutional layers. This system was trained using three different datasets, namely the NSL-KDD (NSL-KDD dataset. https://www.unb.ca/cic/datasets/nsl.html, accessed on 12 July 2023), CIC-IDS2017, and UNSW-NB15 [41], covering several categories of attacks, including denial-of-service, reconnaissance, and brute force. The input data were not transformed into images; instead, the sequential traffic was fed into the model for classification purposes. The proposed smart IDS outperformed other AI models, such as Decision Trees, Random Forests, and Support Vector Machines; on UNSW-NB15, for instance, it reached a maximum accuracy of 92.18% and an F1 score of 89%. Nevertheless, the experiments conducted in the paper considered neither the S-SL paradigm nor IoT traffic.
The authors of [18] created a network-based IDS based on S-SL and grayscale images obtained from preprocessing the UNSW-NB15 dataset. Contrastive learning was applied, with a data augmentation policy that included operations such as vertical flipping and random cropping. For the AI approach, the authors utilized the BYOL model, which consists of two neural networks (online and target) that learn from one another through data augmentation. The BoTNet [42] encoder was selected as the feature extractor, and the generalization ability of the proposed IDS was evaluated with fine-tuning on the NSL-KDD, KDD CUP 99 [43], CIC-IDS2017, and CIDDS_001 [44] datasets. Even though these S-SL experiments outperformed purely supervised learning models, in some cases by more than 5% in terms of accuracy, this work was not tested on an IoT-related scenario.
In [45], a custom model based on BYOL was proposed, pre-trained using S-SL and contrastive learning on the UNSW-NB15 dataset. Regarding data augmentation, the authors applied masking, which consisted of randomly assigning a value of zero to a predefined percentage of the features of each input sample. The transferability of the proposed model was evaluated on the Bot-IoT dataset, yielding an accuracy of 99.83% and an F1 score of 99.82%. Although these experiments address the IoT domain, the pre-training phase used the UNSW-NB15 dataset, which may have negatively affected the quality of the model's feature representations for IoT networks.
The authors of [46] used S-SL and contrastive learning along with the UNSW-NB15, CIC-IDS2017, and CSE-CIC-IDS2018 (CSE-CIC-IDS2018 dataset. https://registry.opendata.aws/cse-cic-ids2018, accessed on 13 July 2023) datasets to create a network-based IDS using a custom model with a Multi-layer Perceptron (MLP) as the backbone. With respect to the data augmentation strategy, the authors generated adversarial examples based on [47]. The accuracy for DoS attack detection was 97.63% using the MLP model with the S-SL strategy, compared to 54.34% for the MLP model without the S-SL pre-training process. Although these results reflect the potential of S-SL when compared to a purely supervised training strategy, the work presented in [46] might benefit from extending its experimentation to more testbeds, such as those of smart homes and industrial IoT environments.
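The feature-masking augmentation described above for the BYOL-based model (randomly setting a predefined percentage of each sample's features to zero) can be sketched as below. The text does not give the percentage, so the default `mask_ratio` of 0.2 is an assumption, as are the function name and the choice to draw an independent mask for every sample.

```python
import numpy as np

def mask_features(batch, mask_ratio=0.2, rng=None):
    """Return a copy of `batch` with a fixed share of each row's features zeroed.

    mask_ratio is an assumed value; the text only says "a predefined percentage".
    """
    rng = np.random.default_rng() if rng is None else rng
    batch = np.asarray(batch, dtype=np.float64)
    n_samples, n_features = batch.shape
    n_masked = int(round(mask_ratio * n_features))

    # Sorting random keys gives an independent random feature order per sample;
    # the first n_masked positions of that order are the features to zero out.
    order = rng.random((n_samples, n_features)).argsort(axis=1)
    masked = batch.copy()
    rows = np.arange(n_samples)[:, None]
    masked[rows, order[:, :n_masked]] = 0.0
    return masked
```

Calling the function twice on the same batch yields two differently masked views of each flow, which can serve as a positive pair for contrastive pre-training.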