Because of its weaknesses, the Internet of Things is vulnerable to attacks and security threats
[2][3][4][5][6][7][8][9][10][11]. Researchers have attempted to categorize the attacks, vulnerabilities, and security concerns affecting the Internet of Things so that solutions can be identified more easily. For example, some researchers categorized the vulnerabilities according to the layers of the IoT architecture and noted that physical security hardening is lacking. Insecure data storage and transfer, lack of visibility and device management, botnets, insecure passcodes, insecure ecosystem interfaces, and AI-based attacks have all been concerns for devices on the IoT. Some academics emphasized IoT’s vulnerabilities and security risks, while others did not. Among these issues, the researchers pointed out that because IoT employs traditional network architecture, it inherits its flaws
[6]. In addition, the rise in terminal devices (end nodes) with limited processing capabilities was one of the most significant and powerful vulnerabilities exploited by attackers
[7]. End node manufacturers design them to work without paying attention to security concerns. As a result, these devices must be monitored and managed to protect networks from threats and reach a higher degree of IoT security
[2]. The Open Web Application Security Project (OWASP) compiled a comprehensive list of the attack areas in IoT systems as part of its Internet of Things project. In addition, it recorded the applications in which unauthorized actions could be encountered
[12]. Following is a summary of the IoT attack areas:
Intrusion detection systems (IDSs) are essential techniques for preserving network security, and they are installed at a critical location in the network
[12][13]. Traditional systems comprise a data source, a preprocessing stage, and a decision-making technique. The process begins with the collection of raw data from host or network traffic. By analyzing the network data traffic, an intrusion detection system can classify the network behavior as normal or abnormal
[12] and then pass the extracted features to the decision-making method to recognize threats
[13]. Three main ways to detect intrusions are signature-based IDS, anomaly-based IDS, and a hybrid of signature- and anomaly-based IDS
[13]. Dynamic anomaly-based network detection systems are flexible and superior to static signature-based network intrusion systems because the former can detect new attacks
[14]. They use artificial intelligence (AI) algorithms that comprise both machine learning (ML) and deep learning (DL) architectures. Signature-based IDSs, on the other hand, detect signatures and patterns and match them against predefined signatures of misuse, which can be ineffective against unknown attacks
[13]. The three significant categories of intrusion detection systems are host intrusion detection systems (HIDSs), network intrusion detection systems (NIDSs), and network node intrusion detection systems (NNIDSs)
[13]. The HIDS is installed on machines throughout the network, as well as on other parts of the physical and virtual networks. The NIDS protects vulnerable parts of the network where the opportunities for attack are high. IDSs use network- or host-based methods to recognize and deflect attacks. These methods search for attack signatures, that is, patterns that indicate malicious or suspicious activity. Depending on where an IDS searches for the pattern, either in network traffic or in log files, it is classified as network- or host-based
[15].
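The difference between the signature-based, anomaly-based, and hybrid approaches described above can be illustrated with a minimal sketch. It is not taken from any of the cited systems: the signature patterns, the packets-per-second feature, and the z-score threshold of 3 are all hypothetical choices made for illustration.

```python
from statistics import mean, stdev

# Hypothetical signature database: byte patterns of already known attacks.
SIGNATURES = {
    "sql_injection": b"' OR 1=1",
    "path_traversal": b"../../",
}


def signature_match(payload):
    """Return the names of all known signatures found in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value whose z-score against normal traffic exceeds the threshold."""
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


def hybrid_detect(payload, packets_per_second, baseline):
    """Check known signatures first, then fall back to anomaly detection."""
    hits = signature_match(payload)
    if hits:
        return "signature:" + ",".join(hits)
    if is_anomalous(packets_per_second, baseline):
        return "anomaly"
    return "normal"


# Assumed packets-per-second readings observed during normal operation.
normal_rates = [100, 102, 98, 101, 99]

known_attack = hybrid_detect(b"id=1' OR 1=1", 101, normal_rates)
unknown_attack = hybrid_detect(b"GET /new-exploit", 500, normal_rates)
benign = hybrid_detect(b"GET /index.html", 101, normal_rates)
```

In this sketch the second request contains no known signature, so only the anomaly check can flag it, which mirrors the limitation of signature matching with unknown attacks.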
Machine learning methods are extensively used to build network intrusion detection systems because of their ability to learn to recognize new intrusions
[16]. Developing accurate algorithms that can cluster, classify, and predict requires sufficiently large datasets, particularly for supervised machine learning techniques such as support vector machines (SVMs) and naïve Bayes. Decision trees are valued for their simplicity, rapid adaptability, and accuracy, and neural networks have been widely used to characterize anomaly and misuse patterns
[12][16]. Accuracy and interpretability are essential factors of artificial intelligence models. To achieve accuracy and interpretability, machine learning and deep learning techniques must be considered. For example, black-box algorithms provide higher accuracy, while white-box algorithms provide feature engineering
[14].
2. Machine Learning Techniques Used in IoT Traffic
Rose et al.
[17] generated a dataset and developed a model to investigate the possibility of using network profiling and machine learning to protect IoT against cyber-attacks. The authors proposed an anomaly-based intrusion detection system that profiles and monitors all networked devices continuously and actively to identify IoT device tampering attempts and suspicious network transactions. They evaluated the proposed methodology’s performance using normal and malicious network traffic on the Cyber-Trust testbed. The experimental findings show that the proposed anomaly detection system produces good results, with 98.35% accuracy and 0.98% false-positive alerts.
Ali et al.
[18] present a general machine learning strategy for identifying IoT devices and evaluate the trained models against four publicly available datasets. NFStream was used to extract 85 attributes from packet capture (.pcap) files so that machine learning models could better identify the IoT devices in the network. The authors used the information gain approach to select 20 features and trained six machine learning models in the tests. In the training phase, the authors achieved high accuracy, reaching 99% for IoT device identification using random forest and naïve Bayes classifiers.
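The information gain ranking mentioned above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes features that have already been discretized, and the feature names, values, and device labels in the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on one discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_features(rows, labels, k):
    """Rank feature names by information gain and keep the k best."""
    scores = {
        name: information_gain([row[name] for row in rows], labels)
        for name in rows[0]
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical, already discretized flow records and their device labels.
rows = [
    {"protocol": "tcp", "dst_port": "554", "ttl": "64"},
    {"protocol": "tcp", "dst_port": "554", "ttl": "128"},
    {"protocol": "udp", "dst_port": "1883", "ttl": "64"},
    {"protocol": "tcp", "dst_port": "1883", "ttl": "128"},
]
labels = ["camera", "camera", "thermostat", "thermostat"]

selected = top_k_features(rows, labels, k=2)
```

With real flow statistics, most attributes are continuous and would first need to be binned or scored with an estimator designed for continuous data; the selected subset would then be passed to the classifiers.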
El-Sayed et al.
[19] examined and compared seven supervised learning algorithms of varying complexity to select the best one. The algorithms were separated into two groups: the CNN classifiers, which included a two-layer CNN, a four-layer CNN, and VGG16, and the ordinary classifiers, which included logistic regression, support vector machine, and K-nearest neighbors. Experimental findings show that the SVM algorithm obtains the best performance of 94% on MobileNetv2 features because of its rapid and stable training with fewer resources than the other models. Le K-H et al.
[20] present IMIDS, an intelligent intrusion detection system (IDS) for IoT devices. IMIDS’s core is a lightweight convolutional neural network model that can categorize numerous cyber threats and surpasses its competitors with an average F-measure of 97.22%. Furthermore, after being further trained on the data supplied by the attack data generator, IMIDS’s detection performance increased significantly. These findings show that IMIDS may be used as an IDS in IoT.
Joo et al.
[21] proposed a deep learning-based IoT intrusion detection system. First, the classification was performed with a CNN; the best score was 86.2%. Second, for the hybrid technique, machine learning classifiers were employed in place of the fully connected layers of the vanilla CNN, which delivered roughly 87% with the extra tree classifier. Finally, the Xception model was merged with the bidirectional GRU, yielding the best accuracy at 95.6%. For quicker identification and classification of new (zero-day) malware, Bendiab et al.
[22] propose a novel IoT technique that analyzes malware traffic based on DL and visual representation. The proposed technique detects malicious network traffic at the packet level, reducing detection time and giving promising results thanks to the deployed deep learning. To test the efficacy of the proposed technique, the authors created a dataset of 1000 .pcap files of benign and malware traffic obtained from several network traffic sources. The findings for the Residual Neural Network (ResNet50) are quite encouraging, with a detection rate of 94.50% for malware traffic.
Six machine learning (ML) approaches were tested for their ability to identify MQTT-based attacks
[23]. Features were evaluated at three abstraction levels: packet-based, unidirectional flow, and bidirectional flow. A simulated MQTT dataset was created and used for training and assessment. The experimental findings showed that the evaluated ML models were sufficient for the IDS needs of MQTT-based networks. Furthermore, the findings highlight the significance of employing flow-based features to distinguish MQTT-based attacks from benign traffic, whereas packet-based features are sufficient for traditional networking attacks. The best model reached an accuracy of 99.04%. Sapre et al.
[24] employed the KDDCup99 and the NSL-KDD, two widely used intrusion detection datasets, in their study. Their major objective was to thoroughly compare both datasets by analyzing the performance of multiple machine learning (ML) classifiers trained on them using a more extensive range of classification criteria than prior studies. Because the classifiers trained on the NSL-KDD dataset were 20.18% less accurate on average, the authors concluded that the NSL-KDD dataset is of better quality than the KDDCup99 dataset. This is because classifiers trained on the KDDCup99 dataset were biased toward the redundancies within it, allowing them to attain a higher accuracy of 96.83%. Liu et al.
[25] looked at attacks that might affect sensors and networks in IoT scenarios using the NSL-KDD dataset. The authors investigated eleven machine learning techniques and reported their results in identifying the introduced attacks. Through numerical analysis, they showed that tree-based approaches and ensemble methods surpass the other machine learning methods evaluated. With 97% accuracy, a 90.5% Matthews correlation coefficient (MCC), and a 99.6% area under the curve (AUC), XGBoost is the best of the supervised algorithms. Furthermore, the expectation-maximization (EM) technique, which is an unsupervised approach, performs exceptionally well in identifying attacks in the NSL-KDD dataset and beats the naïve Bayes classifier by 22.0% in terms of accuracy.
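Several of the studies above report accuracy together with false-positive rates or the MCC. As a minimal sketch of how these indicators relate for a binary detector, the function below derives them from confusion-matrix counts; the counts used in the two examples are hypothetical and are not drawn from any of the cited papers.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, false-positive rate, and MCC from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention the MCC is reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "false_positive_rate": false_positive_rate,
        "mcc": mcc,
    }


# A reasonably balanced detector (hypothetical counts).
balanced = binary_metrics(tp=90, fp=5, tn=95, fn=10)

# A detector that labels everything as benign on imbalanced traffic.
always_benign = binary_metrics(tp=0, fp=0, tn=95, fn=5)
```

The second example reaches 95% accuracy while detecting no attacks at all, and its MCC of 0 exposes this, which is one reason the MCC is often reported alongside accuracy on imbalanced intrusion datasets.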
To distinguish benign from malicious nodes, Amouri et al.
[26] used a methodology that consists of two stages. In the first stage, the data are collected by dedicated sniffers (DSs), and the correctly classified instances (CCIs) are then generated and regularly sent to the super node (SN). In the second stage, the SN applies a linear regression method to the CCIs collected from the different DSs to distinguish benign from malicious nodes. The detection characterization is shown for several extreme cases in the network using two mobility models, namely random waypoint (RWP) and Gauss Markov (GM). Black hole and distributed denial of service (DDoS) attacks are the two malicious activities used in the work. Nodes in high-velocity scenarios showed detection rates of over 98%, while nodes in low-velocity scenarios showed detection rates of approximately 90%. Fenanir et al.
[27] created a lightweight intrusion detection system (IDS) in two machine learning steps. First, a filter-based method was used to select features because of its low computational cost. Second, the classification algorithm for the system was chosen by comparing logistic regression (LR), naïve Bayes (NB), decision tree (DT), random forest (RF), k-nearest neighbor (KNN), support vector machine (SVM), and multilayer perceptron (MLP). The DT method was ultimately chosen for the system due to its excellent performance across various datasets. The study’s outcomes might help in choosing the optimal feature selection approach for machine learning; the best reported result is 98% accuracy.
Islam et al.
[28] pointed out numerous types of IoT threats and discussed shallow IDSs in the IoT environment (such as decision tree (DT), random forest (RF), and support vector machine (SVM)), as well as DL-based IDSs (deep neural network (DNN), deep belief network (DBN), long short-term memory (LSTM), stacked LSTM, and bidirectional LSTM (Bi-LSTM)). The models’ performance was assessed using five standard datasets: NSL-KDD, IoTDevNet, DS2OS, IoTID20, and the IoT Botnet dataset. The shallow and deep machine learning-based IDSs were evaluated using several performance indicators such as accuracy, precision, recall, and F1-score. According to the research, a deep learning IDS surpasses shallow machine learning in detecting IoT threats; the best result of the studies is an accuracy of 98.79%. Using features from the UNSW-NB15 dataset, Ahmad et al.
[29] suggest feature clusters based on flow, Message Queuing Telemetry Transport (MQTT), and Transmission Control Protocol (TCP). According to the authors, this addresses overfitting, the curse of dimensionality, and the unbalanced dataset. The proposed method applied supervised machine learning (ML) methods such as random forest (RF), support vector machine, and artificial neural networks to the clusters. The model reaches 98.67% and 97.37% accuracy using RF in binary and multiclass classification, respectively. With the cluster-based approaches, using RF on flow and MQTT features, TCP features, and top features from both clusters yielded classification accuracies of 96.96%, 91.4%, and 97.54%, respectively. A two-stage hybrid technique was proposed by Saba et al. in
[30]. To increase the accuracy of the proposed system, the genetic algorithm (GA) is first used to select relevant features. The support vector machine (SVM), ensemble classifier, decision tree, and other well-known machine learning (ML) algorithms are then applied. Using the NSL-KDD dataset, they attained 99.8% accuracy with 10-fold cross-validation. Based on a hybrid convolutional neural network model, Smys et al.
[31] suggested an intrusion detection system for IoT networks that can identify many forms of attacks. The proposed model may be used in a variety of IoT scenarios. The proposed study is validated and compared with machine learning and deep learning models. The hybrid model is more sensitive to threats in the IoT network, with a 98.6% accuracy rate. Papafotikas et al.
[32] propose a digital system incorporating a machine learning (ML)-based clustering method for identifying suspicious activities from the current-supply dissipation characteristics of the device. The K-means clustering algorithm accompanied by supervised training is used in this prototype system. This research demonstrated the successful identification of suspicious activity in intelligent IoT devices. Similarly, a study in
[33] proposed an IDS approach using a fused machine learning model. Three datasets, namely KDD, CUP-99, and NetML-2020, were fused under a newly built machine learning-based architecture. The trained model achieved a promising accuracy of 95.18%.
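The K-means clustering step mentioned in this paragraph can be sketched in a few lines. This is only an illustration of the clustering idea, not the prototype from the cited work: the supply-current readings (in mA), the choice of two clusters, and the initialization from the minimum and maximum reading are all assumptions, and the supervised training stage is omitted.

```python
def kmeans_1d(values, centroids, iterations=20):
    """Plain K-means on scalar readings, starting from the given centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach every reading to its nearest centroid.
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster.
        updated = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


# Hypothetical supply-current readings in mA; two of them are unusually high.
readings = [10.1, 9.8, 10.3, 10.0, 25.2, 9.9, 24.8]

centroids, clusters = kmeans_1d(readings, centroids=[min(readings), max(readings)])
suspicious = clusters[1]
```

In this toy run, the readings near 25 mA end up in their own cluster, which a later stage could label as suspicious activity.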
Further, several researchers have comprehensively surveyed and emphasized the significance of machine learning and deep learning models in IDSs for IoT networks
[34][35][36][37], especially in conjunction with cloud computing, namely the Cloud of Things security aspect
[38]. This is mainly because it involves several intermediate public networks and stakeholders, making it more vulnerable to attacks.
Table 1 summarizes related work approaches, including the techniques used, dataset type, and the respective study’s advantages and disadvantages.
Table 1. Summary of the research.