The history of intrusion detection systems (IDSs) and intrusion prevention systems (IPSs) can be traced back to an academic paper written in 1986
[15]. The Stanford Research Institute developed the Intrusion Detection Expert System (IDES) using statistical anomaly detection, signatures, and profiles to detect malicious network behaviors. In the early 2000s, IDSs became a security best practice, with few organizations adopting IPSs due to concerns about blocking harmless traffic. The focus was on detecting exploits rather than vulnerabilities. The latter part of 2005 saw the growth of IPS adoption, with vendors creating signatures for vulnerabilities rather than individual exploits
[16]. The capacity of IPSs also increased, allowing more of the network to be monitored.
Next-generation intrusion prevention systems (NGIPSs), which include capabilities such as application and user control, were developed during this period, marking a significant turning point. Sandboxing and emulation features were added to defend against zero-day malware. By 2016, most businesses had deployed next-generation firewalls (NGFWs), which contain IDS/IPS functionality. High-fidelity machine learning is the current focus for tackling threat detection and file analysis
[17].
The groundbreaking academic publication “An Intrusion-Detection Model” by Dorothy E. Denning, which inspired the creation of IDES, is one of the earliest studies to address intrusion detection in networks. Building on this model, the Stanford Research Institute combined statistical anomaly detection, signatures, and profiles to identify hostile network behaviors. Since then, significant turning points in the development of IPS technology have been reached, such as the shift to NGIPSs and NGFWs
[18].
2. Convolutional Neural Networks
CNNs are a specialized subclass of artificial neural networks (ANNs) that are particularly well suited for analyzing visual data. CNNs are designed to automatically and adaptively learn spatial hierarchies of features, which is particularly beneficial for tasks such as image recognition, object detection, and even medical image analysis. The concept of residual learning, as introduced by Kaiming He et al., further enhances the capabilities of CNNs by allowing them to benefit from deeper architectures without suffering from degradation or vanishing gradients
[19].
Unlike traditional ANNs, in which each neuron is connected to all neurons in the preceding and following layers, CNNs employ local connectivity, linking each neuron to a localized region of the input space. Yann LeCun’s paper emphasizes that this local connectivity is crucial for the efficient recognition of localized features in images
[20]. Furthermore, CNNs share parameters across different regions of the input, which significantly reduces the number of trainable parameters; in traditional ANNs, each weight is unique, leading to far more parameters and higher computational costs. CNNs are therefore inherently designed to recognize the same feature regardless of its location in the input space, a form of spatial invariance that traditional ANNs lack. Notably, they often employ deeper architectures, which are made computationally feasible through techniques such as residual learning, as discussed in Kaiming He et al.’s paper.
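To make the effect of local connectivity and weight sharing concrete, the following minimal PyTorch sketch (an illustrative example with arbitrarily chosen layer sizes, not drawn from the cited works) compares the number of trainable parameters in a small convolutional layer with that of a fully connected layer mapping the same input to an output of the same size:

```python
import torch.nn as nn

def n_params(module: nn.Module) -> int:
    """Count the trainable parameters of a module."""
    return sum(p.numel() for p in module.parameters())

# A 3x3 convolution with 8 shared filters vs. a fully connected layer,
# both mapping a 1x28x28 input to an 8x28x28 output.
conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)
dense = nn.Linear(in_features=28 * 28, out_features=8 * 28 * 28)

print(f"convolutional layer: {n_params(conv):,} parameters")    # 80
print(f"fully connected layer: {n_params(dense):,} parameters")  # 4,923,520
```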
The Inception architecture, introduced by Christian Szegedy et al., is another example of a deep yet computationally efficient network
[21]. CNNs are designed to be computationally efficient, particularly when dealing with high-dimensional data, since the architecture leverages local connectivity and parameter sharing to reduce computational requirements. The concept of residual learning, as discussed in the paper by Kaiming He et al., allows CNNs to be trained efficiently even when the network is very deep.
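As a brief sketch of the residual idea referred to above (a minimal simplification, not the exact building block used by He et al.), a residual block adds the input of a small stack of layers back to its output, so the stack only has to learn a residual correction to the identity mapping:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: the stacked layers learn F(x) and the
    block outputs F(x) + x, easing the training of very deep networks."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.body(x) + x)  # identity shortcut connection

x = torch.randn(1, 16, 32, 32)
print(ResidualBlock(16)(x).shape)  # torch.Size([1, 16, 32, 32])
```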
Notably, several distinctive architectural elements are associated with CNNs, including filters (kernels) and pooling layers. Filters are learnable weight matrices that are crucial for feature extraction; they slide, or convolve, across the input image to produce feature maps. Yann LeCun’s paper highlights the effectiveness of gradient-based learning techniques in training these filters
[22]. Pooling layers reduce the spatial dimensions of the input, thereby decreasing computational complexity and increasing the network’s tolerance to small variations in the input. They are also particularly useful in making the network less prone to overfitting.
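The following short PyTorch sketch (purely illustrative; the layer sizes are assumptions) shows how a set of convolutional filters produces feature maps and how a pooling layer halves their spatial dimensions:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 28)                      # one single-channel 28x28 image
conv = nn.Conv2d(1, 8, kernel_size=3, padding=1)   # 8 learnable 3x3 filters
pool = nn.MaxPool2d(kernel_size=2)                 # 2x2 max pooling

features = conv(x)        # -> (1, 8, 28, 28): one feature map per filter
pooled = pool(features)   # -> (1, 8, 14, 14): spatial dimensions halved
print(features.shape, pooled.shape)
```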
CNNs can be effectively combined with other types of neural networks, like recurrent neural networks (RNNs), for sequential data processing tasks such as video analysis and natural language processing. Additionally, CNNs can be integrated with traditional machine learning algorithms, like support vector machines (SVMs), for tasks like classification, thereby creating a hybrid model that leverages the strengths of both methodologies. In summary, CNNs offer a robust, adaptable, and computationally efficient approach to a wide range of machine learning tasks. Their unique architecture, as validated by seminal research papers, makes them highly effective for tasks involving spatial hierarchies and structured grid data.
3. Extreme Gradient Boosting
XGBoost is an optimized distributed gradient boosting approach designed to be highly efficient and flexible. It has gained immense popularity in machine learning competitions and is widely regarded as the “go-to” algorithm for structured data. XGBoost has been optimized for both computational speed and model performance, making it highly desirable for real-world applications
[23]. Decision-tree-based techniques offer several advantages [24].
One of the most significant advantages of decision trees is their ease of interpretation. They can be visualized, and the decision-making process can be easily understood, even by non-experts. Decision trees are computationally inexpensive to build, evaluate, and interpret compared to algorithms like support vector machines (SVMs)
[25] or ANNs. Unlike other algorithms that require extensive pre-processing, decision trees can handle missing values without imputation, making them more robust. Decision trees can also capture complex non-linear relationships in the data, which linear models may not capture effectively. Further, this approach can be used for both classification and regression tasks, making it very versatile.
Gini impurity is a metric used to quantify the disorder, or impurity, of a set of items, and it is a common choice for the “criterion” parameter in decision tree implementations. Lower Gini impurity values indicate “purer” nodes, and the metric is used to decide the optimal feature to split on at each node of the tree.
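In its standard form, for a node whose items belong to $C$ classes with class proportions $p_i$, the Gini impurity is

\[
G = 1 - \sum_{i=1}^{C} p_i^{2},
\]

so a perfectly pure node (all items from a single class) has $G = 0$, while an even mixture of classes yields the maximum impurity.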
Further advantages of XGBoost stem from its use of ensemble learning [26]. Ensemble methods, particularly boosting algorithms such as XGBoost, are less susceptible to overfitting than single estimators because each new model is fitted to the errors of the current ensemble. By combining several models, ensemble methods can average out biases and reduce variance, thus minimizing the risk of overfitting. Ensemble methods often achieve higher predictive accuracy than individual models. XGBoost, in particular, has been shown to outperform deep learning models on certain types of data sets, especially when the data are tabular.
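As a brief, hedged illustration of how XGBoost is typically applied to tabular data (a generic usage sketch assuming the xgboost and scikit-learn packages, with illustrative hyperparameters rather than the configuration used in this work):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

# Synthetic tabular data standing in for a real (e.g., network-traffic) data set.
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# A small gradient-boosted tree ensemble; hyperparameter values are illustrative only.
model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1,
                      subsample=0.8, eval_metric="logloss")
model.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```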
The objective function optimized by XGBoost includes both a loss term and a regularization term, making it adaptable to different problems:
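In the standard formulation from the XGBoost literature, for a model composed of $K$ regression trees $f_k$, the regularized objective is

\[
\mathcal{L}(\phi) = \sum_{i} l\left(\hat{y}_i, y_i\right) + \sum_{k=1}^{K} \Omega\left(f_k\right), \qquad \Omega(f) = \gamma T + \frac{1}{2}\lambda \lVert w \rVert^{2},
\]

where $l$ is a differentiable loss function measuring the difference between the prediction $\hat{y}_i$ and the target $y_i$, $T$ is the number of leaves in a tree, $w$ denotes the leaf weights, and $\gamma$ and $\lambda$ are regularization coefficients that penalize model complexity.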
4. Metaheuristic Optimization
Metaheuristic optimization algorithms have gained significant attention in the field of computational intelligence for their ability to solve complex optimization problems that are often NP-hard. Traditional optimization algorithms, such as gradient-based methods, often get stuck in local optima and are not well suited for solving problems with large, complex search spaces. In contrast, metaheuristics offer several advantages
[27].
Additionally, addressing the challenges of multi-objective optimization problems has been a focal point for many works, leading to the development of various multi-objective evolutionary algorithms
[28]. However, a common hurdle in these algorithms is the delicate balance required between diversity and convergence. This balance critically impacts the quality of solutions derived from the algorithms
[29].
Designed to explore the entire solution space, metaheuristics often find a near-optimal solution within a reasonable amount of time. They are problem-independent, meaning they can be applied to a wide range of optimization problems without requiring problem-specific modifications. Metaheuristics are highly scalable and can handle problems with a large number of variables and constraints. They are less sensitive to initial conditions and can adapt to changes in the problem environment. Metaheuristics can find near-optimal solutions to NP-hard problems in polynomial time, which is a significant advantage over traditional methods that often fail to find feasible solutions within a reasonable time frame.
Algorithms in this family often draw inspiration from natural phenomena, social behaviors, and physical processes. Notable examples include genetic algorithms (GAs) [30], inspired by the process of natural selection and genetics; particle swarm optimization (PSO) [31], based on the social behavior of bird flocking and fish schooling; ant colony optimization (ACO) [32], which mimics the foraging behavior of ants searching for the shortest path; and the firefly algorithm (FA) [33], which draws inspiration from the courting rituals of fireflies. More recent examples include the salp swarm algorithm (SSA) [34], the whale optimization algorithm (WOA) [35], and the COLSHADE [36] optimization algorithm.
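To illustrate the general structure shared by these population-based methods, the following is a minimal particle swarm optimization sketch in Python (a simplified textbook-style variant with assumed, illustrative parameter values, not one of the cited implementations); it minimizes a simple benchmark function by letting particles move toward their personal and global best positions:

```python
import numpy as np

def sphere(x):
    """Benchmark objective: the sphere function, minimized at the origin."""
    return np.sum(x ** 2)

def pso(objective, dim=5, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = np.random.default_rng(seed)
    pos = rng.uniform(-5.0, 5.0, size=(n_particles, dim))   # particle positions
    vel = np.zeros_like(pos)                                 # particle velocities
    pbest = pos.copy()                                       # personal best positions
    pbest_val = np.array([objective(p) for p in pos])        # personal best scores
    gbest = pbest[np.argmin(pbest_val)].copy()               # global best position

    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Velocity update: inertia + cognitive (personal) + social (global) pulls.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, objective(gbest)

best, best_val = pso(sphere)
print("best value found:", best_val)
```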
Metaheuristics are a popular approach among researchers for improving hyperparameter selection. Many examples exist in the literature, with some interesting ones originating from medical applications
[37]. Further applications include time-series forecasting
[38,39] and computer security [40,41,42]. Hybridization techniques have also shown great promise when applied to metaheuristic algorithms, often producing algorithms that demonstrate performance improvements on given tasks
[43].