Haque, S., El-Moussa, F., Komninos, N., & Muttukrishnan, R. (2023, August 22). Data-Driven Attack Detection Trends in IoT. In Encyclopedia. https://encyclopedia.pub/entry/48341

Data-Driven Attack Detection Trends in IoT

This entry is adapted from the peer-reviewed paper 10.3390/s23167191

The Internet of Things (IoT) has become intertwined with everyday life in the domestic, corporate and industrial spheres, to the point that the modern world is hard to imagine without it. However, irrespective of the convenience, ease and connectivity it provides, the security issues and attacks faced by this technological framework are equally alarming and undeniable. To address these security issues, researchers race against evolving technology, trends and attacker expertise.

Keywords: IoT; datasets; machine learning; cyberattack

1. Introduction

Technology evolves rapidly, and keeping up with it is especially difficult in the field of computing. This can be mainly attributed to the advancements made in semiconductor chips, which are continuously improved and exploited for research purposes. Some of the most recent and relevant buzz terms are machine learning (ML), federated learning (FL), blockchain and Internet of Things (IoT). These technologies can be combined with one another to improve their individual outputs or efficiency and to generate an alternate byproduct or result. For example, FL can be used to ensure or enhance data privacy in the IoT, and ML can be used to make automated predictions in IoT devices. Blockchain, in turn, can be used to improve trust and transparency in data transactions in IoT networks.

IoT is a term coined by Kevin Ashton in 1999 ^[1] that only gained traction in 2013. Since 2017, IoT has grown tremendously and will continue to do so at an even greater rate according to market and industry surveys ^[2]^[3]^[4]^[5]^[6]. IoT has penetrated every sector of life, encompassing transportation, health, communication, agriculture, homes, etc., with even traditional devices having become ‘smart’, e.g., smart locks, smart cars, smart fridges, smart lights, smart speakers and smart watches. According to ^[7], as of 2020, there was an equal number of IoT and non-IoT devices in the world, and the number of the former is estimated to triple by 2025. While making life easier, this explosive growth has introduced many related concerns, such as the need for more speed, storage, capabilities, efficiency, etc., which researchers are continually trying to address.

One of the biggest growing concerns, however, is the security and privacy of users, data, devices and the IoT network, which are often overlooked by both manufacturers and consumers. Implementing failsafe systems can be a painstaking process, yet the failure to do so can lead to serious repercussions for both individual users and companies. Cybercrimes are very common and already impact existing home IoT networks. A recent incident reported by the British Broadcasting Corporation (BBC), for instance, revealed how a family became suspects in a cybercrime that involved child abuse, to the detriment of their domestic life, income and mental health. The crime had most likely been committed by hacking their Wireless Fidelity (Wi-Fi) router, whose default password settings had not been changed ^[8]. Most cyberattacks result from exploiting security vulnerabilities, such as weak/default password usage, poor update management, insecure interfaces, lack of user and data privacy, poor user awareness, lack of vendor standardization and many more.

Numerous steps must be continually taken to ensure that cybersecurity is maintained. These include raising user awareness through cyber education, implementing security policies, deploying security software and tools (such as antivirus, firewalls, etc.) and, more recently, applying automated measures using machine and deep learning (DL) techniques. Exhaustive research has been carried out for conventional network and data security, but such work is severely lacking in emerging fields such as IoT. For example, numerous datasets have been created by various researchers on general-purpose networks, the earliest of which, known as the DARPA (Defense Advanced Research Projects Agency) dataset, dates back to 1998 ^[9]. Other datasets, found in ^[10]^[11]^[12], have been used to design intrusion detection and prevention systems (IDSs and IPSs, respectively). Among the datasets widely used to train ML algorithms for IoT networks, older ones, such as Knowledge Discovery in Databases (KDD) and Network Security Laboratory Knowledge Discovery in Databases (NSL-KDD), are believed to have shortcomings: the KDD dataset contains a large number of duplicate records that could skew the training process ^[13], and NSL-KDD, though an improvement over KDD, does not include more recent attack classes or IoT network properties. UNSW-NB15 ^[14] (by the University of New South Wales) and CIC-IDS2017 and CIC-IDS2018 ^[15] (by the Canadian Institute for Cybersecurity) are more recent datasets used for IoT ML training, but because they are not primarily concerned with IoT networks, the attack detection they support is limited.
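
The effect of duplicate records on class balance can be illustrated with a minimal sketch. The records below are hypothetical toy rows (the field layout and values are assumptions for illustration, not actual KDD data); the point is only that exact duplicates inflate the apparent weight of a class until they are removed.

```python
from collections import Counter


def deduplicate(records):
    """Return records with exact duplicates removed, keeping first-seen order."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Hypothetical toy rows: (protocol, service, src_bytes, label).
records = [
    ("tcp", "http", 181, "normal"),
    ("tcp", "http", 181, "normal"),
    ("icmp", "ecr_i", 1032, "smurf"),
    ("icmp", "ecr_i", 1032, "smurf"),
    ("icmp", "ecr_i", 1032, "smurf"),
    ("tcp", "telnet", 0, "guess_passwd"),
]

before = Counter(rec[-1] for rec in records)
unique_records = deduplicate(records)
after = Counter(rec[-1] for rec in unique_records)

print("label counts before:", dict(before))
print("label counts after: ", dict(after))
```

Before deduplication, half of the toy rows carry the same label; afterwards each label appears once, so a learner is no longer biased toward the most frequently repeated record.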

2. What Are the Datasets Created Specifically for the Study of IoT Networks and Their Security?

The survey addresses this question by finding datasets that have been created using IoT devices in either a simulated environment or a physical network. In most cases, the IoT networks created are exposed to attacks and the network behavior is studied and analyzed under various attack conditions. Benign and attack data are collected and used to train ML and DL algorithms to create intrusion detection systems (IDSs). Ten datasets were found that are being studied and experimented on as part of this survey. Brief descriptions of these datasets are given below, while details of their attack capabilities can be found in Table 1.

Bot-IoT ^[16] is a simulated dataset created to study and analyze network forensics using ML and DL techniques. It is based on five IoT scenarios consisting of a weather station, a smart fridge, motion-activated lights, a remotely activated garage door and a smart thermostat. These simulated environments were exposed to three categories of attacks that are commonly exploited by botnets (bots): information gathering (port scans, operating system (OS) fingerprinting); denial of service (Transmission Control Protocol (TCP), User Datagram Protocol (UDP) and Hypertext Transfer Protocol (HTTP), for both denial of service (DoS) and distributed denial of service (DDoS)); and information theft (keylogging and data theft). This dataset consists of more than 72 million packet capture (PCAP) records. The distribution of attack records is not uniform, however, with the information theft attacks having the fewest records.
IoT Network Intrusion Dataset ^[17] (IoTNID) was created using two real devices: a camera and a speaker. The dataset consists of reconnaissance, man-in-the-middle (MiTM), DoS and Mirai attacks. All the attack packets except those of Mirai were generated using the Nmap tool, while the Mirai attack packets were generated using a laptop.
IoT-23 ^[18] is a dataset created using three physical IoT devices: a Philips HUE smart Light Emitting Diode (LED) light, an Amazon Echo device and a Somfy smart door lock. These devices were set up to model 20 different malware scenarios and 3 benign scenarios (one for each device). Each malware scenario was exposed to a botnet (bot) attack, such as Mirai, Gafgyt, Torii, etc. This dataset was manually analyzed to provide benign and attack traffic features.
MedBIoT ^[19] is a dataset that tries to emulate a medium-sized network consisting of 80 simulated devices and 3 real devices. The devices used were a switch, a light bulb, a lock and a fan. The setup was exposed to three types of botnets: Mirai, BASHLITE and Torii. This dataset aims to provide data for intrusion detection of botnets.
MQTT-IoT ^[20] is a dataset based on a publish/subscribe messaging protocol called Message Queuing Telemetry Transport (MQTT), which is used in the application/middleware layer. It is based on a simulated setup comprising 12 IoT sensors in four different attack scenarios (Table 1) and one benign scenario. This dataset was intended to be used for intrusion detection using ML techniques.
MQTTset ^[21] is another dataset based on the MQTT communication protocol, in this case aimed at aiding the application of ML techniques in MQTT networks. The setup was simulated using eight different sensors of the following types: temperature, light, humidity, carbon monoxide (CO) gas, motion, smoke, door and fan to exploit five MQTT network attacks. This dataset removes features such as source and destination IP (Internet Protocol) addresses, port addresses and communication times among others that can be found in other datasets and focuses mainly on MQTT-based features.
N-BaIoT ^[22]: The Network-based Detection of IoT (N-BaIoT) dataset was created using nine IoT devices, namely, two doorbells, one thermostat, one baby monitor, four security cameras and one webcam. These devices were of different makes and models. The network setup was exposed to two types of botnet attacks: Mirai and BASHLITE. Each of these botnets has other attacks, as specified in Table 1. This dataset comprises both benign and attack traffic intended for the study and detection of botnet attacks.
ToN_IoT ^[23] is a dataset that aims at addressing the properties of both IoT and IIoT by collecting data from telemetric sources, operating systems and network data, hence the name ToN_IoT. Nine types of attacks were studied on the seven types of sensors specified in Table 1. This dataset explores the interaction of network elements across the edge, fog and cloud layers and tries to provide data for intrusion detection in large-scale IoT network scenarios.
Edge-IIoTset ^[24]: This is another dataset that was created to study IoT and IIoT devices and networks. Its design architecture consists of seven layers and 12 IoT devices (e.g., sound detection sensor, ultrasonic sensor, etc.) and IIoT devices (servo motor, stepper motor, etc.). The testbed was tested with 15 attacks, which were grouped into 5 broad attack categories.
CICIoT2023 ^[25] is an IoT-based dataset that is the largest (as of 2023) in terms of the number of devices used to set up the network topology and the number of attacks studied. A total of 105 devices were used to design the testbed, and 33 attacks were carried out on the network for data collection, which were broadly classified into 7 attack categories. These attacks were carried out on the IoT devices using other IoT devices. This dataset also included Zigbee and Z-wave devices along with other IoT devices.
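
Several of the descriptions above note uneven class distributions, such as the small share of information theft records in Bot-IoT. One common way to compensate during training is inverse-frequency class weighting. The sketch below is an illustration under stated assumptions: the label counts are made up (they are not the real Bot-IoT counts), and the weighting heuristic is a generic one, not a method attributed to any of the dataset authors.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical, heavily skewed label column (not real dataset counts).
labels = ["ddos"] * 900 + ["reconnaissance"] * 90 + ["theft"] * 10
weights = inverse_frequency_weights(labels)

for cls, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{cls:15s} weight = {weight:.3f}")
```

Rare classes receive proportionally larger weights, so misclassifying a scarce theft record costs the learner more than misclassifying one of the many DDoS records.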

Table 1. IoT datasets summary.

	Year	Testbed Setup	Device Used	Attacks	Normal Traffic Gen Tool	Attack Traffic Gen Tool	Network Sim Tool	Packet Capture Tool
Bot-IoT ^[26]	2018	Virtual	5 devices simulated: smart refrigerator, smart garage door, weather monitoring system, smart lights, smart thermostat	Information gathering (service and OS scanning), denial of service (TCP, UDP, HTTP DoS and TCP, UDP, HTTP DDoS), information theft (keylogging, data theft)	Ostinato software ^[27]	Hping3 ^[28], Nmap ^[29], xprobe2 ^[30], golden-eye ^[31], Metasploit ^[32]	Node-red ^[33]	Tshark ^[34]; features extracted with Argus ^[35]
N-BaIoT ^[36]	2018	Real	9 real devices of types: doorbell, thermostat, baby monitor, security camera, webcam	BASHLITE (scan, junk, UDP flooding, TCP flooding, COMBO attack) and Mirai (scan, ack flooding, syn flooding, UDP flooding, UDP plain flooding)	N/A	Binaries and source code of BASHLITE and Mirai, respectively	N/A	Wireshark ^[37]
IoTNID ^[17]	2019	Real	2 real devices: Wi-Fi camera, speaker	Scanning (host, port, OS), man-in-the-middle, DoS attacks, Mirai (UDP, ACK, HTTP flooding, brute force)	N/A	Nmap	N/A	Monitor mode of wireless network adapter
IoT-23 ^[38]	2020	Real	3 physical: speaker, light bulb, door lock	Mirai, Torii, Hide and Seek, Muhstik, Hakai, Internet Relay Chat Botnet (IRCBot), Hajime, Trojan, Kenjiro, Okiru, Gafgyt	N/A	Malware sample in a Raspberry Pi	N/A	Zeek ^[39]; features extracted with Zeek
MedBIoT ^[40]	2020	Mixed	80 virtual, 3 physical: switch, light bulb, lock, fan	Botnet malware: Mirai, BASHLITE and Torii	Scripts to trigger actions	Mirai and BashLite source codes, Torii sample	Docker ^[41]	tcpdump ^[42]; features extracted with Splunk ^[43]
MQTT-IoT ^[44]	2020	Virtual	12 MQTT sensors simulated	Aggressive scan, UDP scan, Sparta Secure Shell (SSH) brute force, MQTT brute-force attack	“Publish” MQTT command	Nmap, MQTT-PWN ^[45]	Virtual machines, VLC ^[46]	tcpdump
MQTTset ^[47]	2020	Virtual	10 simulated devices: temperature, light intensity, humidity, CO gas, motion, smoke, door opening/closure and fan status	Flooding denial of service, MQTT Publish flood, Slow DoS against Internet of Things Environments (SlowITe), malformed data, brute-force authentication	IoT-Flock ^[48]	MQTT-malaria ^[49], IoT-Flock, Message Queuing Telemetry Transport Security Assistant (MQTTSA) ^[50]	IoT-Flock	Eclipse Mosquitto ^[51]
ToN_IoT ^[52]	2020	Mixed	7 simulated sensors: fridge, garage door, GPS tracker, modbus, motion light, thermostat, weather sensor	Scanning, DoS, DDoS, ransomware, backdoor, injection, cross-site scripting, password and man-in-the-middle attacks	JavaScript in Node-RED	Nmap, Nessus ^[53], Python script, Metasploitable3, bash scripts on DVWA ^[54] and Security Shepherd ^[55], CeWL (Custom Word List generator) ^[56], Hydra ^[57], Ettercap tool ^[58]	NSX-VMware ^[59], Node-RED	Data logger on Node-RED server, Zeek
Edge-IIoT ^[60]	2022	Real	12 physical IoT and IIoT devices	DoS/DDoS (TCP SYN, UDP, HTTP, ICMP), information gathering (port scan, OS fingerprinting, vulnerability scan), MiTM (DNS and ARP spoofing), injection attack (XSS, SQL injection, uploading attack), malware (backdoor, password cracking, ransomware)	N/A	Hping3, Slowhttptest ^[61], Nmap, Netcat ^[62], Xprobe2, Nikto ^[63], Ettercap, XSSer ^[64], SQLmap ^[65], CeWL, OpenSSL cryptography toolkit ^[66]	N/A	Wireshark, Zeek and Tshark for feature extraction
CICIoT2023 ^[67]	2023	Real	67 IoT devices, 38 Zigbee and Z-wave devices	33 attacks in 7 categories (DDoS, DoS, Recon, web-based, brute force, spoofing, Mirai)	N/A	Hping3, udp-flood, slowloris, golang-httpflood, nmap, fping ^[68], DVWA, remot3d ^[69], BeEF ^[70], hydra, Ettercap, Mirai code	N/A	Wireshark, tcpdump and dpkt package for feature extraction

3. Are There Any Similarities or Differences among These Datasets?

To address this question, the IoT-related datasets found in the literature were compared. All the datasets surveyed vary with respect to the number and types of devices used in the setup; the type of setup (simulated, real or mixed); the attacks the devices were exposed to, etc., as shown in Table 1. However, there are similarities among them, which are discussed below:

Features: Bot-IoT is the earliest IoT dataset considered and has been utilized by a number of researchers to train ML techniques for intrusion detection. Even though this dataset employs the MQTT protocol, similar to the MQTT-IoT and MQTTset datasets, its feature set has no MQTT-based features, such as those found in the latter two, which are the only datasets that contain MQTT-related features. From Table 2, which shows the features common among the datasets studied, it can be seen that N-BaIoT and MedBIoT share 100 features with each other but have no features in common with the other datasets. Similarly, MQTT-IoT and MQTTset have MQTT-related features that are not found in other datasets. Over 15 features common to the ToN_IoT and IoT-23 datasets were also seen.
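
A feature-overlap comparison of this kind reduces to set operations. The sketch below is a minimal illustration: the dataset names and feature names are placeholders invented for the example (the actual overlaps are those listed in Table 2), and the Jaccard index is simply one convenient similarity measure, not one used in the survey.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| between two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Placeholder feature sets (hypothetical names, for illustration only).
feature_sets = {
    "flow_dataset_a": {"src_ip", "dst_ip", "src_port", "dst_port", "proto", "duration"},
    "flow_dataset_b": {"src_ip", "dst_ip", "src_port", "dst_port", "proto", "conn_state"},
    "mqtt_dataset": {"mqtt_msgtype", "mqtt_len", "mqtt_conflag_uname"},
}

# Shared features for every pair of datasets.
overlap = {
    (x, y): feature_sets[x] & feature_sets[y]
    for x, y in combinations(sorted(feature_sets), 2)
}

for (x, y), shared in overlap.items():
    score = jaccard(feature_sets[x], feature_sets[y])
    print(f"{x} vs {y}: {len(shared)} shared, Jaccard = {score:.2f}")
```

A pair with an empty intersection, like the flow-based and MQTT-only placeholder sets here, cannot be merged or cross-evaluated without first deriving a common feature representation.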

The most common features found amongst the datasets were the five-tuple network flow features (source/destination IP address, source/destination port and protocol) and timestamps. Studies differ in how they treat these features. Some, such as ^[47], removed common features like the source/destination IP and port addresses, as well as communication times, from their MQTTset to allow the identification of features independent of a particular connection/communication. Others, such as ^[71], used these features in the IoT Network Intrusion Dataset to carry out ML training and testing for attack detection. These features, while important in identifying a network flow, carrying out network configurations and troubleshooting, could skew the ML training process, leading to overfitting and misleadingly high prediction rates. Other features, such as sequence or identification numbers, found in IoT-23, Bot-IoT, Edge-IIoT and IoTNID, could have similar effects.
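
Removing connection-specific fields before training can be sketched as a simple filter. The column names below are assumptions chosen for illustration (each dataset uses its own naming), and the choice of which fields to drop is a judgment call rather than a fixed rule.

```python
# Hypothetical column names for connection-identifying fields.
IDENTIFYING_FEATURES = frozenset(
    {"src_ip", "dst_ip", "src_port", "dst_port", "timestamp", "seq"}
)


def strip_identifiers(flow, drop=IDENTIFYING_FEATURES):
    """Return a copy of the flow record without connection-identifying fields."""
    return {name: value for name, value in flow.items() if name not in drop}


# A made-up flow record.
flow = {
    "src_ip": "192.168.1.10",
    "dst_ip": "192.168.1.1",
    "src_port": 51234,
    "dst_port": 1883,
    "proto": "tcp",
    "timestamp": 1690000000.5,
    "seq": 42,
    "duration": 0.8,
    "src_bytes": 512,
}

training_features = strip_identifiers(flow)
print(training_features)
```

With the addresses, ports and timestamps removed, a model trained on the remaining fields cannot simply memorize which testbed host launched the attacks.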

Most datasets have one or more of three label features (attack, category and subcategory) that are used to tag a flow as benign, attack or a type of attack. The attack label tags a traffic flow as either benign or attack traffic, sometimes denoted as 0 and 1, respectively. The category and subcategory labels are used in datasets that contain a number of different attack types and classes, e.g., the category indicates that a flow belongs to a DoS attack, while the subcategory indicates whether it was a UDP, TCP, HTTP or ICMP (Internet Control Message Protocol) DoS attack. These labels are not used as input features in the training process, however, but as the ground truth against which the performance of ML models is measured. The category and subcategory labels are useful for supervised learning, where the model is trained to detect the related attack class, while the attack label is useful for both supervised and unsupervised learning. In datasets where the labels are not explicitly given, such as N-BaIoT and MQTTset, the PCAP or comma-separated values (CSV) files are collected and organized separately for each attack type or normal class for easy identification.
Attacks: This is another important characteristic of an IoT dataset, as this would determine the type of attack an IDS would be able to detect when trained with the particular dataset. Table 3 shows the types of attacks carried out in the test environment to create the datasets. The attacks have been categorized to show the layer of architecture they belong to. As IoT networks do not have a standardized architecture yet, such as the Open Systems Interconnection (OSI) model used in a conventional network, the attacks have been mapped to the OSI model depending on the layer the attack exploits.

For example, an application-layer attack targets the highest layer of the OSI model, exploiting the application-level protocols and services. Some of the attacks seen in this category were cross-site scripting (XSS), SQL injection and HTTP DoS attacks. The most common transport-layer attacks seen in these datasets were TCP and UDP DDoS/DoS attacks, which exploit the weaknesses of transport-layer protocols to overwhelm network resources. Attacks at other layers, such as ICMP flood/DoS attacks in the network layer, were also observed, while only ARP (Address Resolution Protocol) spoofing was seen in the datalink layer. No physical-layer attacks have been studied in these datasets. Other malware or botnet attacks are more difficult to classify, as they can span multiple layers.
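
The layer mapping described above can be expressed as a lookup table. This is a minimal sketch covering only the attacks named in the preceding paragraph; the key spellings and the fallback bucket name are assumptions made for the example.

```python
# Attack-to-layer mapping taken from the examples in the text above.
ATTACK_LAYER = {
    "xss": "application",
    "sql injection": "application",
    "http dos": "application",
    "tcp dos": "transport",
    "udp dos": "transport",
    "icmp flood": "network",
    "arp spoofing": "datalink",
}

# Hypothetical bucket for malware/botnet traffic that spans several layers.
FALLBACK_LAYER = "multi-layer"


def layer_of(attack):
    """Look up the OSI layer an attack exploits, ignoring case and padding."""
    return ATTACK_LAYER.get(attack.strip().lower(), FALLBACK_LAYER)


def group_by_layer(attacks):
    """Group attack names by the layer they map to."""
    grouped = {}
    for attack in attacks:
        grouped.setdefault(layer_of(attack), []).append(attack)
    return grouped


grouped = group_by_layer(["XSS", "UDP DoS", "ARP spoofing", "Mirai", "ICMP flood"])
print(grouped)
```

Botnets such as Mirai fall into the fallback bucket here, mirroring the observation that such attacks are hard to assign to a single layer.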

Some datasets, such as N-BaIoT, IoT-23 and MedBIoT, contained traffic related to botnet attacks only. The IoT-23 dataset contains the highest number of different botnets, while Mirai and BASHLITE are the most common types seen across all the datasets. DoS and reconnaissance attacks are the next most common attacks found in these datasets. Attacks related to IoT protocols, such as MQTT attacks, were contained only in the MQTT-IoT and MQTTset datasets. Attacks related to other IoT protocols, such as Constrained Application Protocol (CoAP) attacks, have not been explored. It was seen that as more datasets are created, their complexity in terms of the number of devices or attacks explored increases, as in CICIoT2023.
Devices Used: Table 1 shows the types of devices used in the experimental setups of the different datasets. There is a huge difference in the number and types of devices chosen for each dataset, ranging from just 2 devices in IoTNID to 105 devices in CICIoT2023. MedBIoT uses 83 devices in its setup, of which 80 are virtual devices and 3 are physical devices. The MQTT-IoT dataset simulates 12 MQTT sensors to study the MQTT features and attacks, while CICIoT2023 incorporates Zigbee and Z-wave devices in its setup. ToN_IoT and Edge-IIoT have included the modbus protocol and motor sensors to allow these datasets to be used for IIoT studies.

Table 2. Feature comparison among IoT datasets.

Common Features	Bot-IoT	N-BaIoT	IoTNID	IoT-23	MedBIoT	MQTT-IoT	MQTTset	ToN_IoT	Edge-IIoT	CICIoT2023
Source IP address	✓		✓	✓		✓		✓	✓
Destination IP address	✓		✓	✓		✓		✓	✓
Source port	✓		✓	✓		✓		✓	✓
Destination ports	✓		✓	✓		✓		✓	✓
Transport-layer protocols	✓		✓	✓		✓		✓		✓
Timestamp	✓		✓	✓				✓	✓	✓
Total duration	✓		✓	✓				✓		✓
Source bytes	✓			✓				✓
Destination bytes	✓			✓				✓
Service				✓				✓
Connection state				✓				✓
Missed bytes				✓				✓
Number of bytes per source IP				✓				✓
Number of bytes per destination IP				✓				✓
Number of packets per source IP				✓				✓		✓
Number of packets per destination IP				✓				✓		✓
MQTT message type						✓	✓	✓
MQTT message length						✓	✓	✓
User Name MQTT flag						✓	✓
Password MQTT flag						✓	✓
Will retain MQTT flag						✓	✓
Will flag MQTT flag						✓	✓
Clean MQTT flag						✓	✓
Reserved MQTT flag						✓	✓
All 100 of MedBIoT features		✓			✓
Label/attack	✓			✓		✓		✓	✓
Subcategory	✓
Category	✓							✓	✓	✓

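Table 3. Attack distribution in IoT datasets, mapped to the layer each attack exploits (A: application layer; N: network layer; T: transport layer; D: datalink layer; M: malware/botnet attacks that span multiple layers).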

Dataset	Attack	A	N	T	D	M
Bot-IoT	Information gathering (service and OS scanning)		✓
	TCP, UDP DoS/DDoS			✓
	HTTP DoS/DDoS, information theft (keylogging, data theft)	✓
N-BaIoT	BASHLITE/Mirai scan		✓
	Mirai (ack flooding, syn flooding, UDP flooding, UDP plain flooding), BASHLITE (junk, UDP flooding, TCP flooding, COMBO attack)			✓
	BASHLITE COMBO attack					✓
IoTNID	Scanning (host, port, OS)		✓
	Man-in-the-middle	✓	✓
	DoS attacks, Mirai (UDP, ACK)			✓
	Mirai (HTTP flooding, brute force)	✓
IoT-23	Mirai, Torii, Hide and Seek, Muhstik, Hakai, Internet Relay Chat Botnet (IRCBot), Hajime, Trojan, Kenjiro, Okiru, Gafgyt					✓
MedBIoT	Botnet malware: Mirai, BASHLITE and Torii					✓
MQTT-IoT	Aggressive scan		✓	✓
	UDP scan			✓
	Sparta Secure Shell (SSH) brute force, MQTT brute-force attack	✓
MQTTset	Flooding denial of service		✓	✓
	MQTT Publish flood, Slow DoS against Internet of Things Environments (SlowITe), malformed data, brute-force authentication	✓
ToN_IoT	Scanning		✓
	DoS, DDoS, and man-in-the-middle attacks		✓	✓
	Ransomware, backdoor, injection, cross-site scripting, password	✓
Edge-IIoT	DoS/DDoS (ICMP), MiTM (DNS spoofing)		✓
	MiTM (ARP spoofing),				✓
	DoS/DDoS (TCP SYN, UDP)			✓
	Information gathering (port scan, OS fingerprinting, vulnerability scan),		✓	✓
	HTTP DoS/DDoS, injection attack (XSS, SQL injection, uploading attack), malware (backdoor, password cracking, ransomware)	✓
CICIoT2023	ACK fragmentation, UDP flood, UDP plain flood, RSTFIN flood, PSHACK flood, TCP flood, SYN flood, synonymous IP flood			✓
	ICMP flood, ICMP fragmentation, DNS spoofing, ping sweep, OS scan, vulnerability scan, port scan, host discovery, GREIP flood, Greeth flood		✓
	SlowLoris, HTTP flood, SQL injection, command injection, backdoor malware, uploading attack, XSS, browser hijacking, dictionary brute-force	✓
	ARP spoofing				✓

4. What ML and DL Techniques Have Been Applied to These Datasets for Attack Detection?

These IoT datasets have been created to facilitate the study of the behavior of network parameters under different attacks and to devise means of either detecting or preventing attacks in a network. Any IDS designed with these datasets will be signature-based, meaning that it matches the characteristics of a network flow against the attack flows it was trained on. An anomaly detection solution, on the other hand, is trained to detect any traffic that deviates from the norm and to alert the system. This has the added advantage that attack traffic may be easily identified. However, unlike a signature-based IDS, it cannot identify the type of attack.
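
The contrast between the two approaches can be sketched in a few lines. Everything below is an illustrative assumption: the field names, the two hand-written rules, the benign traffic rates and the three-sigma threshold are invented for the example and do not come from any of the surveyed datasets or studies.

```python
from statistics import mean, stdev

# Signature side: named rules over flow fields (hypothetical rules).
SIGNATURES = [
    ("udp_flood", lambda f: f.get("protocol") == "udp" and f.get("pkts_per_sec", 0) > 1000),
    ("syn_flood", lambda f: f.get("protocol") == "tcp" and f.get("syn_ratio", 0) > 0.9),
]


def match_signature(flow):
    """Return the name of the first matching attack signature, or None."""
    for name, rule in SIGNATURES:
        if rule(flow):
            return name
    return None


# Anomaly side: learn a baseline from benign traffic only.
def fit_baseline(benign_rates):
    """Summarize benign packet rates as (mean, sample standard deviation)."""
    return mean(benign_rates), stdev(benign_rates)


def is_anomalous(rate, baseline, threshold=3.0):
    """Flag rates more than `threshold` standard deviations from the mean."""
    mu, sigma = baseline
    return abs(rate - mu) > threshold * sigma


baseline = fit_baseline([10, 12, 9, 11, 13, 10])  # made-up benign rates

udp_flow = {"protocol": "udp", "pkts_per_sec": 5000}
icmp_flow = {"protocol": "icmp", "pkts_per_sec": 5000}

for flow in (udp_flow, icmp_flow):
    print(flow["protocol"], match_signature(flow), is_anomalous(flow["pkts_per_sec"], baseline))
```

In this toy setup the ICMP flow matches no signature but is still flagged as anomalous, while only the signature matcher can attach a name to the UDP flood.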

It can be seen that newer ML techniques, such as DL, are gaining prominence. An advantage of DL algorithms over traditional ML algorithms is that their performance can be improved by tuning their underlying hyperparameters. However, they can take longer to train and test ^[72] and incur more processing overhead than their ML counterparts. For these reasons, researchers have adopted the same approach with DL as with ML: selecting the smallest, most informative set of features from a dataset to train an algorithm. It can be seen in ^[73], among other studies, that the runtime is reduced with a smaller feature set without significantly affecting the efficiency of the algorithm.
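
Selecting a reduced feature set can be sketched with a simple filter method. The scoring function below (absolute difference of class means divided by the overall standard deviation) is one basic heuristic chosen for illustration; it is not the selection method of any cited study, and the feature names and values are made up. It assumes binary labels with both classes present.

```python
from statistics import mean, pstdev


def separation_score(values, labels):
    """|mean(attack) - mean(benign)| / overall std; 0.0 for constant features."""
    spread = pstdev(values)
    if spread == 0:
        return 0.0
    attack = [v for v, y in zip(values, labels) if y == 1]
    benign = [v for v, y in zip(values, labels) if y == 0]
    return abs(mean(attack) - mean(benign)) / spread


def select_top_k(columns, labels, k):
    """Return the names of the k features with the highest separation score."""
    ranked = sorted(
        columns,
        key=lambda name: separation_score(columns[name], labels),
        reverse=True,
    )
    return ranked[:k]


# Made-up feature columns; labels: 0 = benign, 1 = attack.
columns = {
    "pkts_per_sec": [10, 12, 11, 900, 950, 1000],
    "mean_pkt_len": [500, 520, 480, 60, 64, 62],
    "ttl": [64, 64, 64, 64, 64, 64],
    "noise": [3, 9, 5, 4, 8, 6],
}
labels = [0, 0, 0, 1, 1, 1]

selected = select_top_k(columns, labels, k=2)
print(selected)
```

The constant and noisy columns are discarded, so a model trained on the two remaining features has half as many inputs to process per flow.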

Some scientists, on the other hand, have tried to combine algorithms or create different ones similar to ensemble techniques ^[71]^[74]. Overall, it was seen from ^[47]^[73] and ^[75], for example, that tree-based algorithms, such as random trees (RTs), random forests (RFs), etc., performed better on average compared to others. Algorithms like Naïve Bayes (NB), though faster, had poorer performance comparatively ^[52]^[76]^[77]. It was also observed that the most commonly used ML algorithms were tree-based, while neural networks (NNs) are the most common for DL algorithms.

Despite various efforts, it was seen that some classes in the datasets did not yield promising results. For example, ^[75] found the prediction of benign traffic in IoT-23 to be poor, while ^[76] reported low precision rates for data theft and keylogging attack classes. Understanding the reasons behind these outcomes is important so that the datasets can be improved and newer ones without the same shortcomings can be generated in order to yield better detection results.
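
Weak classes of this kind are easy to miss when only overall accuracy is reported, which is why per-class precision and recall are useful. The sketch below uses made-up predictions purely for illustration; the class names echo those mentioned above, but the numbers are not results from any cited study.

```python
def per_class_metrics(y_true, y_pred):
    """Compute precision and recall separately for every class."""
    report = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        report[cls] = {"precision": precision, "recall": recall}
    return report


# Hypothetical ground truth and predictions.
y_true = ["dos"] * 6 + ["benign"] * 3 + ["keylogging"]
y_pred = ["dos"] * 6 + ["benign", "benign", "keylogging"] + ["dos"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
report = per_class_metrics(y_true, y_pred)

print(f"accuracy = {accuracy:.2f}")
for cls, scores in report.items():
    print(f"{cls:11s} precision = {scores['precision']:.2f} recall = {scores['recall']:.2f}")
```

In this toy case the overall accuracy looks acceptable even though the rare keylogging class is never detected correctly, which is the kind of shortfall a per-class breakdown exposes.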

5. Any Other Methods Applied to These Datasets for Attack Detection?

It was observed that an approach different from the more traditional ML or DL is now on the rise. Known as federated learning (FL), it allows participating devices (in this case IoT devices or sensors) to retain their individual data (instead of sharing it with a server or datacenter) and to collaboratively train a shared prediction model. This method promotes privacy, as node data are not exposed. Another advantage of this method is that data from devices can be non-IID (that is, not independent and identically distributed), meaning the devices could train the model at different times with different data sizes or parameters. This is a huge advantage, as IoT sensors differ in terms of their characteristics and the amount of information they gather.
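
The aggregation step at the heart of this idea can be sketched as a sample-weighted average of client model parameters, in the spirit of the widely used federated averaging scheme. This is a minimal illustration under stated assumptions: the text above does not name a specific aggregation rule, models are reduced to flat lists of floats, and the client updates are made-up numbers.

```python
def federated_average(client_updates):
    """Average client weight vectors, weighting each by its local sample count.

    client_updates: list of (weights, n_samples) tuples, where every
    weights list has the same length. Raw data never leaves the clients;
    only these parameter vectors are shared with the aggregator.
    """
    total_samples = sum(n for _, n in client_updates)
    n_params = len(client_updates[0][0])
    return [
        sum(weights[i] * n for weights, n in client_updates) / total_samples
        for i in range(n_params)
    ]


# Two hypothetical IoT clients with different amounts of local data.
updates = [
    ([0.2, 1.0], 100),  # small sensor, 100 local samples
    ([0.4, 0.0], 300),  # busier gateway, 300 local samples
]

global_weights = federated_average(updates)
print(global_weights)
```

Weighting by sample count lets devices with uneven amounts of data contribute proportionally, which matches the observation that IoT sensors differ in how much information they gather.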

An increasing number of studies using FL have been seen in the last two years. Seven of the datasets discussed in this study have been explored by researchers using FL. It is more common to see DL or neural networks (NNs) used in FL than traditional ML algorithms. This can be attributed to the fact that DL and NN models are better at learning and computing complex patterns in data through the use of multiple layers and deep architectures. This also reduces the need for manual feature engineering, as DL and NN algorithms can automatically deduce important features in the data used. A key difference between FL and ML is that models, rather than data, are transferred between the devices and the training/testing server, which preserves the privacy of the data. This is made possible with the use of transfer learning, where DL models can be pre-trained and deployed on the IoT devices, thereby reducing the need to train models from scratch. However, despite these benefits, DL algorithms consume more resources than ML algorithms, e.g., in terms of training time, memory consumption and computational time, which adds to the overheads of IoT devices, as they are usually limited in resources.

References

Ashton, K. That ‘Internet of Things’ Thing. RFID JOURNAL. 22 June 2009. Available online: https://www.rfidjournal.com/that-internet-of-things-thing (accessed on 20 June 2021).
CISCO. Cisco Annual Internet Report (2018–2023) White Paper; CISCO: San Jose, CA, USA, 2020.
Lheureux, B.; Velosa, A.; Thielemann, K.; Schulte, W.R.; Litan, A.; Pace, B. Predicts 2020: As IoT Use Proliferates, So Do Signs of Its Increasing Maturity and Growing Pains; Gartner: Hong Kong, China, 2019.
Hewlett Packard Enterprise. The Internet of Things: Today and Tomorrow; Hewlett Packard Enterprise: Hong Kong, China, 2019.
Ericsson. Connected Industries A Guide to Enterprise Digital Transformation Success A Report on Digital Transformation; Ericsson: Stockholm, Sweden, 2020.
The Economist Intelligence Unit. The IoT Business Index 2020: A Step Change in Adoption; The Economist Intelligence Unit: London, UK, 2020.
IoT Analytics. State of the IoT 2020: 12 Billion IoT Connections. 2020. Available online: https://iot-analytics.com/state-of-the-iot-2020-12-billion-iot-connections-surpassing-non-iot-for-the-first-time/ (accessed on 4 July 2021).
Wakefield, J. ‘Did Weak Wi-fi Password Lead the Police to Our Door?’—BBC News. 2021. Available online: https://www.bbc.co.uk/news/technology-57156799 (accessed on 24 May 2021).
1998 DARPA Intrusion Detection Evaluation Dataset|MIT Lincoln Laboratory. Available online: https://www.ll.mit.edu/r-d/datasets/1998-darpa-intrusion-detection-evaluation-dataset (accessed on 7 October 2021).
Ferrag, M.A.; Maglaras, L.; Moschoyiannis, S.; Janicke, H. Deep learning for cyber security intrusion detection: Approaches, datasets, and comparative study. J. Inf. Secur. Appl. 2020, 50, 102419.
Hindy, H.; Brosset, D.; Bayne, E.; Seeam, A.K.; Tachtatzis, C.; Atkinson, R.; Bellekens, X. A Taxonomy of Network Threats and the Effect of Current Datasets on Intrusion Detection Systems. IEEE Access 2020, 8, 104650–104675.
Ring, M.; Wunderlich, S.; Scheuring, D.; Landes, D.; Hotho, A. A Survey of Network-based Intrusion Detection Data Sets. Comput. Secur. 2019, 86, 147–167.
Choudhary, S.; Kesswani, N. Analysis of KDD-Cup’99, NSL-KDD and UNSW-NB15 Datasets using Deep Learning in IoT. In Procedia Computer Science; Elsevier B.V.: Amsterdam, The Netherlands, 2020; pp. 1561–1573.
Moustafa, N.; Slay, J. UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In Proceedings of the 2015 Military Communications and Information Systems Conference, MilCIS 2015, Canberra, ACT, Australia, 10–12 November 2015; Institute of Electrical and Electronics Engineers: Piscataway, NJ, USA, 2015.
Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization. In Proceedings of the International Conference on Information Systems Security and Privacy, Funchal, Portugal, 22–24 January 2018.
IEEE DataPort. Nour Moustafa. The Bot-IoT Dataset. 2019. Available online: https://ieee-dataport.org/documents/bot-iot-dataset (accessed on 10 June 2023).
IEEE DataPort. IoT Network Intrusion Dataset. Available online: https://ieee-dataport.org/open-access/iot-network-intrusion-dataset (accessed on 10 June 2023).
IoT-23 Dataset: A Labeled Dataset of Malware and Benign IoT Traffic—Stratosphere IPS. Available online: https://www.stratosphereips.org/datasets-iot23 (accessed on 10 June 2023).
MedBIoT Data Set. Available online: https://cs.taltech.ee/research/data/medbiot/ (accessed on 10 June 2023).
IEEE DataPort. MQTT-IoT-IDS2020: MQTT Internet of Things Intrusion Detection Dataset. Available online: https://ieee-dataport.org/open-access/mqtt-iot-ids2020-mqtt-internet-things-intrusion-detection-dataset (accessed on 10 June 2023).
Kaggle. MQTTset. 2020. Available online: https://www.kaggle.com/cnrieiit/mqttset (accessed on 10 June 2023).
Meidan, Y. UCI Machine Learning Repository: Detection_of_IoT_botnet_attacks_N_BaIoT Data Set. 2018. Available online: https://archive.ics.uci.edu/dataset/442/detection+of+iot+botnet+attacks+n+baiot (accessed on 10 June 2023).
IEEE DataPort. ToN_IoT Datasets. Available online: https://ieee-dataport.org/documents/toniot-datasets (accessed on 10 June 2023).
IEEE DataPort. Edge-IIoTset: A New Comprehensive Realistic Cyber Security Dataset of IoT and IIoT Applications: Centralized and Federated Learning. Available online: https://ieee-dataport.org/documents/edge-iiotset-new-comprehensive-realistic-cyber-security-dataset-iot-and-iiot-applications (accessed on 10 June 2023).
UNB. CIC IoT Dataset 2023. Available online: https://www.unb.ca/cic/datasets/iotdataset-2023.html (accessed on 10 June 2023).
Koroniotis, N.; Moustafa, N.; Sitnikova, E.; Turnbull, B. Towards the Development of Realistic Botnet Dataset in the Internet of Things for Network Forensic Analytics: Bot-IoT Dataset. Future Gener. Comput. Syst. 2019, 100, 779–796.
Ostinato Traffic Generator for Network Engineers. Available online: https://ostinato.org/ (accessed on 10 June 2023).
Kali Linux Tools. Hping3. Available online: https://www.kali.org/tools/hping3/ (accessed on 11 May 2023).
Nmap: The Network Mapper—Free Security Scanner. Available online: https://nmap.org/ (accessed on 11 May 2023).
Kali Linux Tools. Xprobe. Available online: https://www.kali.org/tools/xprobe/ (accessed on 11 May 2023).
Kali Linux Tools. Goldeneye. Available online: https://www.kali.org/tools/goldeneye/ (accessed on 11 May 2023).
Metasploit. Penetration Testing Software, Pen Testing Security. Available online: https://www.metasploit.com/ (accessed on 11 May 2023).
Node-RED. Available online: https://nodered.org/ (accessed on 11 May 2023).
Tshark. Available online: https://www.wireshark.org/docs/man-pages/tshark.html (accessed on 11 May 2023).
Openargus. Available online: https://openargus.org/ (accessed on 11 May 2023).
Meidan, Y.; Bohadana, M.; Mathov, Y.; Mirsky, Y.; Shabtai, A.; Breitenbacher, D.; Elovici, Y. N-BaIoT: Network-based Detection of IoT Botnet Attacks Using Deep Autoencoders. IEEE Pervasive Comput. 2018, 17, 12–22.
Wireshark · Go Deep. Available online: https://www.wireshark.org/ (accessed on 11 May 2023).
Parmisano, A.; Garcia, S.; Erquiaga, M.J. Aposemat IoT-23: A Labeled Dataset with Malicious And Benign IoT Network Traffic—Stratosphere IPS. 2020. Available online: https://www.stratosphereips.org/blog/2020/1/22/aposemat-iot-23-a-labeled-dataset-with-malicious-and-benign-iot-network-traffic (accessed on 19 June 2021).
The Zeek Network Security Monitor. Available online: https://zeek.org/ (accessed on 11 May 2023).
Guerra-Manzanares, A.; Medina-Galindo, J.; Bahsi, H.; Nõmm, S. MedBIoT: Generation of an IoT Botnet Dataset in a Medium-sized IoT Network. In Proceedings of the 6th International Conference on Information Systems Security and Privacy, SCITEPRESS—Science and Technology Publications, Valletta, Malta, 25–27 February 2020; pp. 207–218.
Docker: Accelerated Container Application Development. Available online: https://www.docker.com/ (accessed on 11 May 2023).
TCPDUMP & LIBPCAP. Available online: https://www.tcpdump.org/ (accessed on 11 May 2023).
Splunk. The Key to Enterprise Resilience. Available online: https://www.splunk.com/ (accessed on 11 May 2023).
Hindy, H.; Bayne, E.; Bures, M.; Atkinson, R.; Tachtatzis, C.; Bellekens, X. Machine Learning Based IoT Intrusion Detection System: An MQTT Case Study (MQTT-IoT-IDS2020 Dataset). June 2020. Available online: http://arxiv.org/abs/2006.15340 (accessed on 16 February 2021).
Mqtt-pwn. Available online: https://en.kali.tools/all//?tool=2801 (accessed on 11 May 2023).
VideoLAN. Available online: https://www.videolan.org/ (accessed on 11 May 2023).
Vaccari, I.; Chiola, G.; Aiello, M.; Mongelli, M.; Cambiaso, E. MQTTset, a New Dataset for Machine Learning Techniques on MQTT. Sensors 2020, 20, 6578.
GitHub. ThingzDefense/IoT-Flock. Available online: https://github.com/ThingzDefense/IoT-Flock (accessed on 11 May 2023).
GitHub. etactica/mqtt-malaria. Available online: https://github.com/etactica/mqtt-malaria (accessed on 11 May 2023).
MQTTSA. Available online: https://sites.google.com/fbk.eu/mqttsa (accessed on 11 May 2023).
Eclipse Mosquitto. Available online: https://mosquitto.org/ (accessed on 11 May 2023).
Alsaedi, A.; Moustafa, N.; Tari, Z.; Mahmood, A.; Anwar, A. TON_IoT Telemetry Dataset: A New Generation Dataset of IoT and IIoT for Data-Driven Intrusion Detection Systems. IEEE Access 2020, 8, 165130–165150.
Nessus. Available online: https://www.cs.cmu.edu/~dwendlan/personal/nessus.html (accessed on 11 May 2023).
Kali Linux Tools. Dvwa. Available online: https://www.kali.org/tools/dvwa/ (accessed on 11 May 2023).
OWASP Foundation. OWASP Security Shepherd. Available online: https://owasp.org/www-project-security-shepherd/ (accessed on 11 May 2023).
Kali Linux Tools. Cewl. Available online: https://www.kali.org/tools/cewl/ (accessed on 11 May 2023).
Kali Linux Tools. Hydra. Available online: https://www.kali.org/tools/hydra/ (accessed on 11 May 2023).
Ettercap. Available online: https://www.ettercap-project.org/index.html# (accessed on 11 May 2023).
VMware NSX. Networking and Security Virtualization. Available online: https://www.vmware.com/uk/products/nsx.html (accessed on 11 May 2023).
Ferrag, M.A.; Friha, O.; Hamouda, D.; Maglaras, L.; Janicke, H. Edge-IIoTset: A New Comprehensive Realistic Cyber Security Dataset of IoT and IIoT Applications for Centralized and Federated Learning. IEEE Access 2022, 10, 40281–40306.
Kali Linux Tools. Slowhttptest. Available online: https://www.kali.org/tools/slowhttptest/ (accessed on 11 May 2023).
Netcat—SecTools Top Network Security Tools. Available online: https://sectools.org/tool/netcat/ (accessed on 11 May 2023).
Kali Linux Tools. Nikto. Available online: https://www.kali.org/tools/nikto/ (accessed on 11 May 2023).
XSSer: Cross Site ‘Scripter’. Available online: https://xsser.03c8.net/ (accessed on 11 May 2023).
Sqlmap. Available online: https://sqlmap.org/ (accessed on 11 May 2023).
GitHub. openssl/openssl. Available online: https://github.com/openssl/openssl (accessed on 11 May 2023).
Neto, E.C.P.; Dadkhah, S.; Ferreira, R.; Zohourian, A.; Lu, R.; Ghorbani, A.A. CICIoT2023: A real-time dataset and benchmark for large-scale attacks in IoT environment. Sensors 2023, 23, 5941.
Fping. Available online: https://fping.org/ (accessed on 11 May 2023).
Remot3d. Available online: https://kalilinuxtutorials.com/remot-3d-tool-large-pentesters/ (accessed on 11 May 2023).
BeEF. Available online: https://beefproject.com/ (accessed on 10 June 2023).
Ullah, I.; Mahmoud, Q.H. A Scheme for Generating a Dataset for Anomalous Activity Detection in IoT Networks. In Advances in Artificial Intelligence; Lecture Notes in Computer Science (including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Cham, Switzerland, 2020; Volume 12109 LNAI, pp. 508–520.
Dutta, V.; Choraś, M.; Pawlicki, M.; Kozik, R. A Deep Learning Ensemble for Network Anomaly and Cyber-Attack Detection. Sensors 2020, 20, 4583.
Alsamiri, J.; Alsubhi, K. Internet of Things Cyber Attacks Detection using Machine Learning. Int. J. Adv. Comput. Sci. Appl. 2019, 10, 627–634.
Kozik, R.; Pawlicki, M.; Choraś, M. A new method of hybrid time window embedding with transformer-based traffic data classification in IoT-networked environment. Pattern Anal. Appl. 2021, 24, 1441–1449.
Stoian, N.-A. Machine Learning for Anomaly Detection in IoT Networks: Malware Analysis on the IoT-23 Data Set. Bachelor’s Thesis, University of Twente, Enschede, The Netherlands, 2020.
Shafiq, M.; Tian, Z.; Bashir, A.K.; Du, X.; Guizani, M. IoT malicious traffic identification using wrapper-based feature selection mechanisms. Comput. Secur. 2020, 94, 101863.
Das, A.; Ajila, S.A.; Lung, C.H. A Comprehensive Analysis of Accuracies of Machine Learning Algorithms for Network Intrusion Detection. In Machine Learning for Networking; Lecture Notes in Computer Science (including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Berlin/Heidelberg, Germany, 2020; pp. 40–57.

© Text is available under the terms and conditions of the Creative Commons Attribution (CC BY) license; additional terms may apply. By using this site, you agree to the Terms and Conditions and Privacy Policy.

Information

Subjects: Computer Science, Information Systems

Contributors: Safwana Haque, Fadi El-Moussa, Nikos Komninos, Rajarajan Muttukrishnan

Update Date: 23 Aug 2023
