Federated Learning for Intrusion Detection Systems in IoV

Federated Learning for Intrusion Detection Systems in IoV: History

Please note this is an old version of this entry, which may differ significantly from the current revision.

Contributor:

The Internet of Vehicles (IoV) has garnered significant attention from researchers and automotive industry professionals due to its expanding range of applications and services aimed at enhancing road safety and driver/passenger comfort. However, the massive amount of data spread across this network makes securing it challenging. The IoV network generates, collects, and processes vast amounts of valuable and sensitive data that intruders can manipulate. An intrusion detection system (IDS) is the most typical method to protect such networks. An IDS monitors activity on the road to detect any sign of a security threat and generates an alert if a security anomaly is detected. Federated Learning (FL) is a decentralized machine learning technique, FL allows model training on client devices while maintaining user data privacy.

Internet of Vehicles (IoV)
intrusion detection system (IDS)
Federated Learning (FL)

1. Introduction

The rapid expansion of the Internet of Things (IoT) has led to a number of novel applications, such as smart cities, smart grids, and the Internet of Vehicles (IoV). When these smart objects take the form of interconnected vehicles over the internet, the IoT becomes the IoV. Significant interest in IoV technologies has emerged due to substantial advancements in the smart automobile industry. IoV networks are integrated and open network systems that connect vehicles, human intelligence, neighboring environments, and public networks. These networks aim to increase road safety, reduce human error-related accidents, and mitigate congestion. This is accomplished by continuously monitoring traffic congestion. However, despite the numerous benefits offered by the IoV, several issues must be addressed to safeguard the lives of all road users. The IoV is vulnerable to cyberattacks, which threaten its stability, robustness, and can lead to vehicle unavailability and traffic accidents. Since communication in these networks requires the involvement of multiple components, they are susceptible to a broad array of attacks. Thus, ensuring their security requires advanced intrusion detection systems (IDSs) that can address potential cyberattacks. IDSs excel at identifying anomalies and attacks in the network’s data during communications between vehicles and various devices. Given that the IoV is a relatively new network paradigm, new and ever-evolving attacks against it continue to emerge. The IoV network creates a huge amount of data very quickly, especially when there are cyberattacks.The accuracy of machine learning and deep learning approaches makes them a preferred choice in this high-stakes environment ^[1]. Nevertheless, the need to store and transmit data to a centralized server may compromise privacy and security. In contrast, Federated Learning (FL), a decentralized learning approach that protects privacy, trains models locally before sending only the parameters to the centralized server.

2. Federated Learning for Intrusion Detection Systems in Internet of Vehicles

2.1. Intrusion Detection Systems Based on Federated Learning

The emergence of IDSs that utilize FL represents a significant advancement in cybersecurity. This innovative technique ensures the security of networked environments while upholding data privacy ^[2]. Unlike conventional IDSs that depend on centralized data analysis, FL-based IDSs operate on a decentralized principle. Within this innovative framework, each device independently generates localized ML models by leveraging their own data inputs. These models are subsequently improved through a collaborative learning process, where devices communicate changes to the models rather than exchanging raw data ^[3]. Ongoing research efforts continuously enhance this approach, leading to the emergence of FL-based IDSs as a potential future in the pursuit of secure and privacy-conscious network defense mechanisms ^[4].

Motivation to Adapt Federated Learning in Intrusion Detection Systems

The incorporation of FL into IDSs is driven by the significant demand for heightened security and privacy in our increasingly interconnected society. Despite the notable advancements made by ML and DL in the field of IDSs, various limitations associated with these technologies must be acknowledged, particularly concerning data privacy and communication efficiency. FL addresses these challenges by facilitating localized model training without compromising the privacy of raw data, thereby safeguarding individual privacy while promoting collaborative learning.

FL facilitates decentralized, real-time threat detection in contexts such as the IoT or IoV, where various geographically scattered devices generate data. The IDS’s capacity to adapt to local contexts allows it to detect and recognize distinct threats peculiar to individual environments. The motivations for implementing FL in IDSs revolve around several essential elements, including the following ^[5]:

Privacy preservation: FL enables collaborative model training while ensuring the privacy of sensitive raw data. Data privacy is of utmost importance in contexts where it holds significant value, such as the healthcare, finance, or government sectors. FL guarantees the protection of individual privacy by maintaining data locally and exchanging model updates. This approach aligns with legal and ethical requirements around privacy.
Data efficiency: Data efficiency is a significant concern in conventional centralized systems, as transmitting substantial amounts of raw data to a central server may prove unfeasible. This is particularly true when there are constraints on available bandwidth or communication costs are high. FL addresses this issue by focusing on lowering the volume of data transferred. Specifically, only updates to the model are exchanged, resulting in a substantial reduction in communication overhead.
Adaptability and customization: The adaptability and customization of FL models allow for their adaptation to specific local settings. In the IDS field, various contexts may encounter distinct and specific threats. FL permits individual devices to customize their intrusion detection models based on their unique threat landscapes, ensuring precision in identifying potential threats.
Continuous learning: Continuous learning is essential in the security field as threats perpetually evolve. FL permits the ongoing updating of models as new data become accessible. The capacity to adapt in real time ensures that IDSs remain effective in the face of developing threats, providing a significant advantage in dynamic situations.
Robustness and fault tolerance: The inherent robustness of FL systems is based on their ability to withstand and recover from faults. In the event of a device failure or offline status, the system can maintain operation by utilizing the remaining functional devices ^[6]. The maintenance of fault tolerance is of the utmost importance in guaranteeing uninterrupted intrusion detection capabilities inside diverse and large-scale networks.
Decentralization and edge computing: The utilization of FL facilitates decentralized learning, which aligns with edge computing principles, wherein data processing occurs in close proximity to its origin. In scenarios like IoT or IoV, where devices are dispersed geographically, FL enables localized learning, ensuring prompt reactions to potential risks without dependence on a central server.

These elements make FL a compelling and viable approach for enhancing the efficacy and confidentiality aspects of IDSs in diverse settings.

2.2. Related Surveys

A few reviews have focused on the topic of FL-based IDSs. Table 1 succinctly outlines the primary differentiators. For instance, ref. ^[5] offers a comprehensive survey of FL-based IDS approaches and discusses the difficulties and challenges of using these methods. Meanwhile, the authors of ^[7] focus on the current scientific progress of FL applications in attack detection problems for IoT and explore these applications. The extensive review presented in ^[8] draws from an analysis of 39 research papers published from 2018 to March 2022, with a specific focus on the IoT. The analysis examined evaluation variables related to IoT, particularly concerning FL, and identified and dis-cussed prospects and unresolved issues pertaining to FL-based IoT. The authors of ^[9] also provided an overview and comparison of six studies that use FL to enhance IDS effectiveness for IoT. In the absence of specific datasets for assessing FL, the authors emphasized data partitioning modeling among clients. Additionally, they investigated the modeling of bias in the test data to assess its impact on the effectiveness of the ML model. The authors of ^[10] discussed the implementation of FL-based IDSs in various domains and highlighted distinctions between different architectural configurations. Their structured literature analysis offers a reference architecture that can be used as a set of principles for comparing and designing FL-based IDS. Despite significant progress in FL for IDS development, a comprehensive survey exploring FL for IDS applications within the context of IoV is conspicuously lacking.

Table 1. Summary of related surveys on Federated Learning-based IDS.

Survey Title	Year	Main Focus	Key Contributions	IDS	IoV
Survey ^[5]	2021	FL-based IDS	Discussion on the role of FL in intrusion detection - Comprehensive review of ML/DL/FL in intrusion detection - Highlighting open research challenges	✓	X
Survey ^[7]	2022	FL in IDS within (IoT) domain	Understanding of federated learning, privacy preservation, and anomaly detection in network systems, with a particular focus on applications in IoT and related domains.	✓	X
Survey ^[9]	2022	FL-based IDS	- Review of FL system architectures - Review of Evaluation Datasets - Comparative analysis of proposed systems Open challenges and future directions	✓	X
Survey ^[8]	2022	FL-based IoT	Organizing and reviewing FL-based IoT domains - Creating a taxonomy to organize various aspects of FL-based IoT Providing some research questions about the FL-based IoT area and answering them Reviewing evaluation factors Focusing on open issues and future research challenges	X	X
Survey ^[10]	2022	FL-based IDS	Review of FL application in attack detection and mitigation Proposal of a reference architecture Establishment of a taxonomy Identification of open issues and research directions	✓	X
Survey (10.3390/fi15120403)	2023	FL-based IDS in IoV environment	Offer of a generic taxonomy for describing FL systems A well-organized literature review on IDSs based on FL in an IoV environment. Highlighting challenges and potential future directions based on the existing literature.	✓	✓

Note: In this table, ✓: indicates that the survey discussed the relevant aspect of Federated Learning (FL) or Intrusion Detection Systems (IDS), while X signifies that the aspect was not discussed in the survey.

2.3. Comparative Analysis of Federated Learning-Based Intrusion Detection Systems for Internet of Vehicles

The analysis of the research papers aided us in formulating the following conclusions:

Dataset: The selection of a dataset is a crucial aspect when evaluating the effectiveness and resilience of proposed solutions in the field of IDS based on FL within the context of IoV. Given the dynamic and complex nature of IoV, it is imperative to use datasets that can accurately depict real-world vehicular communication scenarios, encompassing both normal and malicious activities. These datasets play a fundamental role in training and evaluating IDS models, enabling them to effectively identify threats within the IoV environment. The following describes the datasets utilized in the provided papers to assess the efficacy of various IDS solutions. Three of the papers, namely ^[11]^[12]^[13], employed the CAN-intrusion dataset (OTIDS), which was sourced from the Hacking and Countermeasure Research Lab at Korea University. This dataset provides a comprehensive representation of intrusion scenarios within in-vehicle networks, making it suitable for assessing IDSs specifically designed for vehicular contexts. By contrast, refs. ^[14]^[15]^[16] employed the VeReMi dataset for their experimental analysis. The publicly accessible VeReMi dataset was explicitly developed for analyzing mechanisms to detect misbehavior in VANETs. The authors of ^[17]^[18]^[19] employed the Car-Hacking dataset derived from the “Car Hacking: Attack & Defense Challenge” competition held in 2020. Additionally, some papers used simulated datasets, such as ^[20], where a simulated dataset was employed to evaluate the effectiveness of their proposed approach in vehicle-to-vehicle and ve-hicle-to-infrastructure scenarios. The authors of ^[21] employed a simulated attack dataset consisting of simulated Sybil attack flows and normal traffic flows in their experimental analysis. Meanwhile, the simulations in ^[22] were conducted using the authors’ proprietary dataset. Although the NSL-KDD and CIC-IDS 2017 datasets are not dedicated to IoVs and are primarily general intrusion detection datasets, the authors of ^[23]^[24] conducted their experiments on these datasets to evaluate the performance of their proposed methods. Finally, ref. ^[25] utilized the [RAKGZ20] dataset to evaluate the authors’ proposed solutions. These datasets collectively offer a comprehensive view of various intrusion detection scenarios, particularly within automotive networks.
Attacks detected: Within the domain of FL-based IDSs for IoV, numerous research papers have put forth methodologies to identify a diverse range of cyber threats. DoS attacks ^[11]^[12]^[13]^[18]^[19]^[22]^[23] and constant attacks ^[14]^[15]^[16] are the most frequently discussed types of attacks in the literature. In addition, some authors emphasized specific attacks, such as the Sybil assault ^[21] and the black hole attack ^[20]. Several papers also explored detecting advanced attacks in in-vehicle networks, including adversarial attacks like fuzzy attacks ^[12]^[13]^[17]^[18]^[19], flooding attacks ^[17]^[25], and spoofing attacks ^[11]^[17]^[18]^[19]^[22]. These studies highlighted the diverse and persistent nature of cyber threats in the IoV environment, underscoring the critical need for robust IDS solutions. IDSs based on FL in IoV not only demonstrate the adaptability and robustness of FL techniques but also illustrate the essential role these techniques play in protecting the future of connected vehicular systems against a wide array of cyberattacks.
ML models: Researchers have turned to more powerful ML models to construct resilient FL-based IDSs capable of addressing challenges posed by vehicular networks. These models, tailored to meet the unique requirements of vehicular communication, offer promising ways to detect and mitigate potential attacks. To improve detection capacities and ensure vehicular safety, numerous ML models based on FL in IoV have been implemented in the field of IDS. The following summarizes the ML models utilized in the proposed solutions.

-

Long short-term memory (LSTM): This architecture of recurrent neural networks is prominently featured in articles ^[11]^[14]^[16]^[17]^[22]. One notable advantage of this approach is its proficiency in identifying patterns over different time intervals, making it well-suited for analyzing time-series data such as network traffic.

-

Deep convolutional neural network (DCNN): Papers such as ^[15]^[19] utilized DCNNs to effectively handle structured grid data, including images or time-series data. These DCNNs possess the capability to automatically and adaptively learn spatial hierarchies.

-

Support vector machine (SVM): ref. ^[19] utilized SVM, a supervised ML approach applicable to both classification and regression tasks.

-

Statistical adversarial detector: As explicitly stated in ^[13], this approach employs statistical techniques to identify adversarial examples.

-

Random forest: refs. ^[12]^[20] employed the random forest algorithm, an ensemble learning technique. This algorithm constructs numerous decision trees during the training phase and determines the class output by selecting the mode of the classes for classification.

The utilization of a wide array of ML models in the articles highlights the intricate and multifaceted characteristics of intrusion detection in IoV. Researchers have used diverse techniques, such as recurrent networks like LSTM, capable of capturing temporal relationships, and ensemble methods like random forest, which provide robustness. These approaches enhance the security and dependability of vehicular networks.
Communication patterns: Most of the articles provided solutions formulated according to the client–server mode of operation, as exemplified by ^[18]^[19]^[20]^[23]^[24], among others. In this mode, clients engage in the process of training their models on a local level without sharing raw data. Subsequently, the model updates are transmitted to the server, the central entity responsible for aggregating them. This procedure guarantees the protection of data privacy and minimizes the necessity of data centralization. Meanwhile, some papers adopted a federated architecture with an edge-offloading technique ^[15]^[25]. As mentioned above, this approach diminishes latency and reduces dependence on a remote cloud server. As discussed in the publications mentioned above, the client–server mode of operation emphasizes the shifting paradigm of decentralized data processing in IoV. FL-based IDSs not only protect users’ data privacy but also pave the way for more effective and scalable security solutions in rapidly developing vehicular networks. These systems enable vehicles to train models locally, with central servers aggregating the training results.
Communication synchronization: The communication synchronization mode, whether synchronous or asynchronous, significantly impacts the efficiency and effectiveness of the FL process. Ref. ^[11] discussed the operational characteristics of synchronous FL, which involves a single launch point and a single aggregate point for the global model. In this model, the beginning of each iteration occurs concurrently for all clients, and the federated aggregation process is performed without establishing a predetermined objective for the learning rounds. In ^[14], the authors presented a synchronous FL approach, and ref. ^[20] introduced a conventional synchronous FL protocol. This protocol is considered appropriate for a wide range of FL scenarios, including those involving bottlenecks. On the other hand, ref. ^[17] preferred an asynchronous mode, which can provide greater flexibility in dynamic settings and effectively handle frequent model changes and bottlenecks. This strategy enables increased adaptability in the learning process, accommodating partial updates from clients that may impact convergence performance. Nevertheless, not all research explicitly addressed this matter. Most of the publications did not specify their operational mode concerning synchronization. The variations mentioned above highlight the varied approaches that researchers have utilized to enhance the effectiveness of IDSs within the rapidly changing environment of IoV. In summary, while synchronous FL was a prevalent technique in the suggested solutions, some studies acknowledged the advantages of asynchronous methods, particularly in environments characterized by frequent updates and potential bottlenecks.
Evaluation metrics: The evaluation of the efficacy of FL-based IDS systems relied on ML measures that assess the effectiveness of the analytic model. These metrics include accuracy, precision, recall, and F-measure. A limited number of research publications examined the effects of FL. In particular, ref. ^[15] discussed the consensus time, which is impacted by the quantity of FL workers and the number of created blocks. The study additionally assessed the effectiveness of the FL-enabled edge node by manipulating the reward and accuracy of the local model. This evaluation considered various elements, including the reward, energy consumption, and processing overhead. Moreover, the researchers did not overlook the significance of accuracy as a fundamental measure for evaluating the efficacy of their proposed solution. The paper also addressed the issues associated with recruiting FL workers, highlighting the possibility of bias and imbalance when selection is primarily predicated on reputation. The authors proposed various strategies to address these difficulties, such as including randomization in the selection procedure. In addition, in ^[21], the authors considered the “number of global aggregations (NGA)” as an evaluation metric. They presented information regarding the number of global aggregations performed in the proposed system and other state-of-the-art baseline frameworks. Their research demonstrated how many global aggregations are necessary for different numbers of communication rounds (R) to achieve the desired level of accuracy. The FLEMDS framework proposed in the study necessitates a reduced number of global aggregations in comparison to the baseline frameworks to attain a comparable level of accuracy.
Aggregation model: In the domain of distributed ML, the combination of data or model updates from several nodes holds significant importance in determining the overall performance and efficiency of the system. The aggregation process has been extensively explored in contemporary research, with numerous novel approaches and models offered in recent research papers. These aggregation models aim to successfully harness the collective intelligence of all participating nodes while simultaneously overcoming problems such as data heterogeneity, communication overheads, and adversarial threats.

The examined literature suggested a range of aggregation models to improve the effectiveness and precision of distributed systems, particularly in the domain of FL. One of the most common aggregation models used is the federated averaging method, where local model updates are averaged to produce a global model ^[14]^[15]^[17]^[18]^[19]^[25]. This approach is simple yet impactful, particularly in situations involving non-identically and independently distributed (non-IID) data ^[26]. An alternative methodology uses weighted federated averaging, as described in several papers ^[20]^[21]^[23]. This technique involves assigning varying weights to local models, considering factors such as the quantity of data samples or the quality of the model. Secure aggregation is another widely employed model aggregation technique in the field of FL, as observed in ^[11]. In this technique, various cryptographic techniques, including SMPC, are employed to consolidate data while preserving the confidentiality of the unprocessed updates. The authors of ^[24] used the Bagging Classifier technique as aggregation model in their developed solution. This technique aggregates the predictions of multiple models to produce a single, more accurate model. The resulting supermodel, created by the central server, exhibits better robustness than the individual edge device models.

Each aggregation method provides specific benefits designed to address the challenges and requirements of dispersed learning settings. As technology advances and increasingly intricate situations arise, these models are expected to continue to develop, facilitating the implementation of more resilient and effective distributed learning systems. The ongoing investigation and advancement of aggregation models serve as evidence of the dynamic characteristics of ML research and its dedication to optimizing the utilization of distributed nodes’ collective intelligence.
Optimization algorithms: The utilization of FL in IDSs presents a new and innovative method for addressing the issues related to data privacy and effective model training in IoV. Advanced algorithms play a pivotal role in optimizing FL models. For instance, the federated proximal algorithm has been used to fine-tune model parameters, ensuring optimal performance in detecting intrusions ^[11]. Similarly, some studies have adopted federated stochastic gradient descent (federated SGD) to optimize the parameters of the proposed IDS models ^[14]^[15]^[18]. Furthermore, some papers utilized other optimization techniques, such as the Adam optimizer ^[17]^[23], a fuzzy logic-based technique ^[21], Bayesian optimization (BO) ^[19], and the federated averaging (FedAvg) algorithm ^[25]. The authors of ^[24] used the grid search method for hyperparameter tuning as an optimization algorithm in their solution. This method is employed to optimize the Cat Boost model, a gradient boosting algorithm that utilizes decision trees as the classifier model for edge devices. The grid search technique exhaustively searches over a specified set of hyperparameters to improve the model’s accuracy.

The integration of optimization approaches, combined with the decentralized nature of FL, holds the potential to deliver resilient and effective IDSs for the IoV environment. Decentralizing the learning process and applying complex optimization algorithms not only enhances detection capabilities but also ensures that modern concerns regarding privacy and efficiency within the IoV landscape are effectively addressed. This represents a significant advancement for the industry.
Real-time processing: A critical aspect of FL-based IDSs is their ability to process data in real time, ensuring timely detection and response to potential threats. Refs. ^[11]^[14]^[18] highlighted the significance of real-time processing for IDSs, especially when dealing with vehicular networks. In addition, ref. ^[19] introduced ImageFed IDS, a system designed for real-time inference. It employs a lightweight image-based feature extraction for CAN packets, making it suitable for real-time applications. On the other hand, some papers supported a batch processing approach rather than real-time processing ^[17]^[21]^[23]. Some papers did not explicitly mention whether their proposed solutions are designed for real-time or batch processing. Nevertheless, all the papers emphasized the importance of real-time processing in IDSs for IoV, with various solutions and methodologies proposed to achieve this objective. The operational significance of IDSs for vehicle networks increases as these networks undergo continuous evolution and encounter a diverse range of cyber threats. The research presented in these papers offers solutions and approaches that contribute to the development of a more secure and responsive IoV environment by emphasizing the significance of real-time processing.
Data distribution: Five articles, namely ^[11]^[14]^[19]^[23]^[24], that specifically addressed the issues and implications associated with imbalances in data distribution in FL scenarios. They stressed the importance of dealing with this problem to achieve robust and stable model performance. The authors of ^[11] emphasized that in real FL contexts, the data distributed across many nodes or devices may exhibit non-IID characteristics. These characteristics sometimes arise due to an imbalanced distribution of data, wherein certain data classes may be overrepresented in one node while being underrepresented in another. To overcome this difficulty, the study suggested an IDS that uses FL to help handle imbalanced data distribution. The authors of ^[14] examined the vulnerability of models to adversarial attacks, particularly when confronted with data imbalance. The presence of an imbalance in vulnerability can be exploited by adversarial examples, resulting in the misclassification of benign data. The authors presented various techniques for identifying these adversarial examples, indirectly addressing the difficulties associated with data imbalance. From another perspective, the authors of ^[19] showed that data distribution among vehicles in FL scenarios, particularly in the context of IoV, might exhibit a significant imbalance. This imbalance can potentially impact the overall performance of the global model. The paper introduced various methodologies aimed at alleviating the repercussions of this imbalance, thereby ensuring the robustness of the FL framework. The issues presented by imbalanced data distribution were also addressed in ^[23]. The authors highlighted the potential emergence of unexpected attack behaviors in the context of IoV development. The absence of comprehensive analysis and systematic gathering of various attack behaviors has resulted in an imbalanced distribution of sample data categories within intrusion detection for IoV. Consequently, this disparity leads to diminished accuracy in detection. The authors proposed an intrusion detection approach integrating FL and a memory-augmented autoencoder (FL-MAAE) to tackle this issue. They have considered the problem posed by imbalanced data distribution in their produced solution, hence ensuring the continued effectiveness of the model. Lastly, the proposed framework in ^[24] employs the Synthetic Minority Over-sampling Technique (SMOTE) to tackle the issue of class imbalance in the dataset. This approach of oversampling minority classes helps to create a more balanced dataset, which in turn allows for a more accurate and representative evaluation of the classification models. Addressing data imbalance is critical for guaranteeing the resilience and dependability of ML models, particularly in distributed learning scenarios such as FL.
The overhead: One of the primary issues frequently encountered in the domain of IDSs based on FL is the significant overhead associated with these systems. The effectiveness and responsiveness of IDSs in IoV contexts can be significantly affected by overhead, including computing, communication, and storage expenses. Addressing this overhead is crucial to ensure the seamless operation of these systems without compromising their primary function of identifying and mitigating threats. In ^[11], the term “overhead” refers to the complexity of the algorithms offered, and the authors stressed how important it is to reduce this complexity as much as possible to ensure efficient operations. In addition, ref. ^[14] discussed overhead in the context of communication costs, emphasizing the relevance of minimizing overhead to improve system performance. Overhead was explored in relation to the computing expenses of the proposed approaches in ^[15], which emphasized the necessity of striking a balance between accuracy and computational efficiency in the methods offered. The research presented in ^[21] investigated the overhead caused by the consensus process in blockchains and suggested that using a lightweight consensus method can reduce overhead and increase scalability. The topic of overhead was discussed in the context of data transmission in ^[12], which emphasized the significance of effective data-sharing systems to reduce overhead. Lastly, ref. ^[19] provided a comparative analysis of various solutions.

3. Summary

The deployment of Federated Learning on Internet of Vehicles devices: Deploying an FL-enabled IDS architecture on real IoV devices presents many challenges. One notable obstacle involves the presence of resource constraints since IoV devices frequently have restricted processing capabilities and memory capacities, making the efficient execution of intricate FL algorithms difficult. This challenge can be exacerbated when employing deep learning techniques, as they often require more computational resources than traditional ML ^[27]. To overcome these restrictions, a prevailing approach involves the implementation of intermediate nodes positioned at the network edge. These nodes serve as clients for FL, receiving data from end devices. Real-time processing poses an additional challenge in the context of IDSs in IoV. These IDSs need to effectively evaluate incoming data and promptly identify any instances of intrusion, requiring the implementation of algorithms that strike a delicate balance between accuracy and processing speed. Consequently, more work is needed to examine the real-world constraints of FL-enabled IDS techniques in IoV contexts to ensure optimal levels of security and efficiency.
Limitations of existing FL-enabled IDS datasets for IoV: The current datasets available for FL-enabled IDSs in the context of IoV exhibit various limitations. The issue of data diversity presents a notable obstacle as datasets may lack a comprehensive representation of the wide range of real-world scenarios and driving conditions, resulting in the development of biased models. Data imbalance is a significant issue that warrants attention, as specific categories of security threats may be inadequately represented in the dataset, posing challenges for the FL-enabled IDS to detect these less frequent intrusions accurately and efficiently. Data quality is essential, as any inaccuracies or noise present in the data can significantly impact the learning process, potentially leading to the development of intrusion detection models that are less reliable and potentially misleading. Furthermore, the issue of data privacy poses a significant constraint in the context of IoV. The data generated by IoV systems frequently encompass confidential personal and vehicular details, thereby presenting a formidable obstacle in creating extensive datasets that simultaneously safeguard users’ privacy. The concern regarding the scalability of current datasets becomes particularly significant as IoV networks experience rapid expansion. These constraints must be acknowledged and addressed to create resilient IDSs that effectively capture the complicated nature of actual IoV settings while upholding user privacy and data integrity.
Aggregator as a bottleneck: In the context of IoV scenarios involving FL-enabled IDS, the aggregator frequently becomes a bottleneck despite being a central component. The processing capacity of the aggregator can be overwhelmed by the sheer volume of incoming information if data from multiple vehicles are sent to the aggregator for model training and updating ^[27]. The influx of data, especially in extensive IoV networks, has the potential to result in delays when it comes to aggregating and updating FL models. Furthermore, given the real-time nature of intrusion detection in vehicle contexts, introducing any delay at the aggregator level can impede prompt responses to security threats. The challenge of balancing the requirement for comprehensive model updates with the practical constraints of aggregators is of utmost importance. This necessitates using innovative approaches in distributed computing, efficient algorithms, and optimized communication protocols. These measures are necessary to address the bottleneck and ensure the smooth operation of FL-enabled IDSs in IoV scenarios.
Client selection: Identifying suitable clients for FL-enabled IDSs in the IoV context presents a significant challenge. During each training iteration, the coordinator can choose a specific subset of devices to engage as FL clients in the training procedure. The environments in which IoV operates exhibit a high degree of dynamism, characterized by the continuous movement of vehicles within and beyond the network coverage area. The dynamic nature of the environment poses difficulties in maintaining a consistent group of clients who actively participate in the training of FL models. For instance, specific devices may not be accessible during a particular round due to mobility issues or disruptions in connectivity. In addition, the criteria for selection need to consider factors such as the device’s current state, its battery life, its computational and networking capabilities, and even the precision of the ML technique. The client selection process can significantly impact the accuracy achieved and, consequently, the detection of potential security breaches within the framework of an IDS approach. Striking a balance in the client selection process, where a diverse, accurate, and current dataset is maintained, necessitates the utilization of advanced algorithms and real-time decision-making to manage the ever-changing pool of participating vehicles effectively. Addressing this challenge is essential to maintain the integrity and accuracy of FL-enabled IDSs in IoV scenarios. Therefore, future strategies for devising an efficient client selection process in IoV systems must consider the dynamic nature of device conditions throughout each training iteration.
Security attacks: In the context of FL-enabled IDSs in IoV scenarios, security attacks pose a severe threat. Attackers can exploit vulnerabilities inherent in the FL architecture ^[28]. These exploits can manifest as various types of attacks, including data poisoning ^[29], where adversaries inject deceptive data into the training process to manipulate the IDS model ^[5]. Model inversion attacks can also occur, in which attackers attempt to deduce confidential data from the trained model. In addition, the confidentiality and integrity of data might be compromised by eavesdropping attacks that specifically target the communication channels established between vehicles and the central server. To address these security concerns, robust security measures are essential, including strong encryption, secure communication protocols, anomaly detection techniques, and continuous monitoring. Preserving security in FL-enabled IDSs within IoV scenarios is of utmost importance for protecting against a diverse range of potential cyberattacks and maintaining the efficiency of IDSs in interconnected vehicular networks.
Privacy concerns: Privacy considerations emerge as a significant challenge in the context of FL-enabled IDSs in IoV scenarios. Although the primary purpose of FL is to address the privacy concerns associated with centralized learning methods, FL may still inadvertently disclose information from the training data of individual clients. FL relies on data provided by individual vehicles for the purpose of training models, raising issues concerning user privacy and data confidentiality. Within IoV, vehicles can generate substantial quantities of sensitive data, including location information, driving behavior, and recordings of communication. The central issue revolves around the need to effectively utilize this data for training IDS models while safeguarding the privacy of both vehicle owners and occupants. As a result, there has been a notable surge of interest has occurred in implementing privacy-preserving methodologies in the field of FL ^[30]. These methodologies include differential privacy techniques, SMPC, and homomorphic encryption. However, using these advanced approaches often entails a trade-off in terms of precision and effectiveness, potentially compromising the IDS’s ability to identify attacks. Deploying these advanced methods is necessary to strike a balance between the need for effective intrusion detection, strict privacy requirements, and meeting user expectations. Further research is required to find the optimal balance between privacy and performance to develop efficient IDS methodologies.
Communication efficiency: Implementing FL-enabled IDSs within IoV introduces a significant challenge in terms of communication efficiency. In IoV scenarios, where vehicles are in constant motion, transmitting substantial amounts of data to train FL models on a central server can strain network bandwidth and result in significant communication overhead. This challenge is further exacerbated by the need for real-time intrusion detection, where rapid responses are crucial. Optimizing communication protocols and data transmission techniques is essential to alleviate the network’s burden while ensuring the timely delivery of relevant data to the central server for model updates. Future research in this field is oriented towards developing sophisticated communication-efficient techniques tailored specifically for IoV scenarios. Approaches such as model quantization, edge computing, and strategic data sampling can be leveraged to minimize the volume of transferred data, thereby enhancing communication efficiency. Balancing the requirement for extensive data exchange with the constraints imposed by network bandwidth is essential for the effective implementation of FL-enabled IDSs in dynamic and bandwidth-limited IoV environments. Research efforts also focus on exploring 5G and beyond-5G technologies, which hold the potential to provide increased bandwidth and reduced latency. These advancements can significantly transform the communication landscape of FL-enabled IDSs in IoV.
Encryption standards: Encryption standards play a significant and multifaceted role in the context of FL-enabled IDSs within IoV. Ensuring the security and privacy of sensitive vehicular data during the transmission process is of paramount importance. The main challenge lies in adopting encryption standards that combine robustness and efficiency to effectively manage the substantial volumes of data transmitted between vehicles and central servers. Moreover, within the FL framework, which entails collaborative model training on various devices, selecting encryption methods that can protect data while preserving the integrity of the collaborative learning process is a complex task ^[31]. Future advancements in this field primarily focus on developing encryption techniques that successfully reconcile the requirements of security, efficiency, and the necessity for collaborative learning. Research efforts aim to establish standardized encryption protocols tailored specifically for IoV settings. These protocols are intended to ensure data security and integrity while facilitating seamless model updates and promoting collaborative learning within a broad spectrum of vehicular networks.
Edge computing: Incorporating edge computing into FL-enabled IDSs within IoV introduces both challenges and potential solutions. While local data processing on devices has the potential to alleviate network bandwidth demands, it also brings about issues related to resource limitations and data diversity. IoV devices, often constrained in terms of available resources, face difficulties when attempting to execute computationally intensive FL algorithms on the device itself. Furthermore, ensuring consistency and accuracy in updating models across various vehicles with different hardware configurations and data formats presents a significant challenge. Future research in this domain seeks to enhance the effectiveness of edge computing methodologies, facilitating efficient local data processing and collaborative learning while mitigating the variations in device capabilities. Leveraging edge computing, IDSs empowered by FL in IoV can realize benefits such as reduced communication overhead and improved real-time intrusion detection capabilities ^[32]. This, in turn, contributes to the establishment of more secure and responsive vehicular networks
Optimization of Federated Learning and intrusion detection system parameters: FL predominantly relies on deep learning models that involve a diverse set of trainable parameters, which the user can configure. Additionally, IDSs are highly sensitive to these parameters. The next research avenue in FL-enabled IDSs for IoV involves optimizing FL and IDS parameters, as this directly impacts performance and training effectiveness ^[5]. Given the dynamic and diverse nature of IoV environments, it becomes imperative to identify the most suitable parameters for FL algorithms. This includes determining appropriate learning rates, aggregation methods, and local model parameters. In addition, customizing these parameters for specific intrusion detection tasks and diverse vehicular datasets can significantly improve the performance and accuracy of FL-enabled IDSs ^[10]. Future research should explore these factors in greater depth, utilizing methodologies such as hyperparameter tuning and adaptive learning algorithms ^[10]. By optimizing these parameters, researchers can finely tailor FL-enabled IDSs to suit specific IoV scenarios. This optimization process ensures effective collaboration, precise intrusion detection, and minimized communication overhead, ultimately paving the way for the development of more robust and responsive vehicular security systems.
Heterogeneity and interpretability of the Federated Learning model: In the realm of FL-enabled IDSs for IoV, the heterogeneity and interpretability of FL models are of paramount importance. Heterogeneity stems from the distinct characteristics of vehicular data and the varying capabilities of different vehicles and their sensors. Coordinating multiple models for effective collaboration, especially in real-time intrusion detection, introduces a high degree of complexity. Moreover, prioritizing the interpretability of these models is crucial, as it enables a comprehensive understanding of the rationale behind intrusion alerts. This understanding is valuable for both developers and end-users. Future research endeavors are geared towards developing approaches that harmonize these diverse models, ensuring their seamless integration to enhance intrusion detection accuracy Simultaneously, researchers are dedicated to enhancing the interpretability of FL models through methodologies like explainable AI, which provides insights into the decision-making processes of these models. By effectively addressing these challenges, FL-enabled IDSs in IoV can achieve a state of equilibrium that encompasses various data sources, model interpretability, and efficient intrusion detection. This, in turn, fosters confidence and comprehension among stakeholders in vehicular security.
Big data management: Effective management of big data poses a significant challenge within the context of FL-enabled IDSs in IoV. The sheer volume, velocity, and diversity of data generated by vehicles require robust storage, processing, and analysis capabilities ^[33]. The integration of FL-enabled IDSs necessitates the use of extensive data for training and model updates. Efficiently handling this vast amount of data is paramount. The complexity lies in maintaining timely data collection, aggregation, and storage while preserving real-time intrusion detection capabilities, particularly when considering the limited resources of vehicle networks. Future studies will concentrate on creating distributed and scalable storage systems, better data processing algorithms, and advanced data analytics methods. By addressing big data management challenges, FL-enabled IDSs in IoV can harness the wealth of vehicular data efficiently, enhancing the precision and agility of IDSs in dynamic and networked vehicular environments.
Sparse data: Vehicle data, especially regarding specific types of security threats, can be sparse and unevenly distributed across vehicles. Data sparsity may lead to biased models, as they might not adequately capture certain types of intrusions. Consequently, this limitation can hinder the overall effectiveness of the IDS. Addressing the issue of sparse data requires innovative methodologies, such as data augmentation, imputation approaches, or customized algorithms designed to handle incomplete datasets effectively ^[34]. Future research efforts aim to develop algorithms that can successfully enable FL models to learn from limited and irregular data. By effectively tackling the issue of sparse data, FL-enabled IDSs in IoV can enhance their precision, ensuring a more comprehensive and nuanced understanding of various intrusion patterns across different vehicular scenarios.
Stability: Stability is a significant challenge within the context of FL-enabled IDSs in IoV. The inherent instability of the FL process is introduced by the dynamic nature of vehicular networks, characterized by the continuous changes in the composition and positions of vehicles. This variability can potentially disrupt the FL environment, affecting the consistency and accuracy of the IDS models. Maintaining stability requires the implementation of robust systems to address fluctuations in participation rates, network disconnections, and intermittent data availability ^[33]. Future research aims to develop algorithms that can adapt dynamically to changes in the network, ensuring the stability of FL models, even when confronted with evolving IoV scenarios. By addressing this challenge, FL-enabled IDSs in IoV can consistently perform at a high level, providing reliable capabilities for detecting unauthorized access despite the everchanging characteristics of vehicular networks.
Reliability: Applications related to intelligent transportation and unmanned aerial vehicle detection demand a high level of reliability due to their safety-critical nature. Failures in meeting reliability standards can lead to severe consequences, including significant loss of life and property. Achieving reliability in the context of intrusion detection within a diverse and dynamic vehicle network presents significant challenges. Maintaining constant and accurate IDS performance is complicated by factors such as network latency, fluctuations in data quality, and the reliability of data transfer from individual vehicles. To ensure reliability, robust FL algorithms are needed to manage data discrepancies, adapt to changing network conditions, and effectively integrate data from diverse vehicles. Moreover, the timely and accurate deployment of intrusion detection solutions depends on the reliability of model updates and communication protocols. Future research aims to enhance the reliability of IDSs in IoV by refining FL algorithms, improving data preprocessing methods, and optimizing communication protocols. This will ultimately ensure the consistent and reliable operation of FL-enabled IDSs across diverse IoV environments.
Real-time data: In the context of vehicle environments, responding promptly to security threats is crucial for ensuring passenger safety and network security. Swift and effective intrusion detection relies on processing the substantial volume of real-time data provided by vehicles. The primary challenge lies in developing FL algorithms capable of handling this increased data volume efficiently, with a focus on enabling timely anomaly or intrusion identification. Moreover, optimizing communication protocols to efficiently transmit relevant real-time data to central servers for model updates is of paramount importance. Future research in this area is directed towards creating FL models that combine lightweight characteristics with high-performance capabilities. This involves exploring the use of edge computing for local real-time analysis and improving communication protocols to facilitate seamless and swift sharing of real-time data ^[32]. By effectively addressing this challenge, the utilization of FL-enabled IDSs in IoV can offer immediate responses to security threats, thereby enhancing the overall safety and security of vehicular networks.

This entry is adapted from the peer-reviewed paper 10.3390/fi15120403

References

Alladi, T.; Kohli, V.; Chamola, V.; Yu, F.R. Securing the internet of vehicles: A deep learning-based classification framework. IEEE Netw. Lett. 2021, 3, 94–97.
Alazab, M.; RM, S.P.; Parimala, M.; Maddikunta, P.K.R.; Gadekallu, T.R.; Pham, Q.V. Federated learning for cybersecurity: Concepts, challenges, and future directions. IEEE Trans. Ind. Inform. 2021, 18, 3501–3509.
Rani, P.; Sharma, C.; Ramesh, J.V.N.; Verma, S.; Sharma, R.; Alkhayyat, A.; Kumar, S. Federated Learning-Based Misbehaviour Detection for the 5G-Enabled Internet of Vehicles. IEEE Trans. Consum. Electron. 2023.
Rashid, M.M.; Khan, S.U.; Eusufzai, F.; Redwan, M.A.; Sabuj, S.R.; Elsharief, M. A Federated Learning-Based Approach for Improving Intrusion Detection in Industrial Internet of Things Networks. Network 2023, 3, 158–179.
Agrawal, S.; Sarkar, S.; Aouedi, O.; Yenduri, G.; Piamrat, K.; Alazab, M.; Bhattacharya, S.; Maddikunta, P.K.R.; Gadekallu, T.R. Federated learning for intrusion detection system: Concepts, challenges and future directions. Comput. Commun. 2022, 195, 346–361.
Li, T.; Sahu, A.K.; Talwalkar, A.; Smith, V. Federated learning: Challenges, methods, and future directions. IEEE Signal Process. Mag. 2020, 37, 50–60.
Belenguer, A.; Navaridas, J.; Pascual, J.A. A review of federated learning in intrusion detection systems for iot. arXiv 2022, arXiv:2204.12443.
Hosseinzadeh, M.; Hemmati, A.; Rahmani, A.M. Federated learning-based IoT: A systematic literature review. Int. J. Commun. Syst. 2022, 35, e5185.
Fedorchenko, E.; Novikova, E.; Shulepov, A. Comparative review of the intrusion detection systems based on federated learning: Advantages and open challenges. Algorithms 2022, 15, 247.
Lavaur, L.; Pahl, M.-O.; Busnel, Y.; Autrel, F. The evolution of federated learning-based intrusion detection and mitigation: A survey. IEEE Trans. Netw. Serv. Manag. 2022, 19, 2309–2332.
Yang, J.; Hu, J.; Yu, T. Federated AI-enabled in-vehicle network intrusion detection for internet of vehicles. Electronics 2022, 11, 3658.
Aliyu, I.; Feliciano, M.C.; Van Engelenburg, S.; Kim, D.O.; Lim, C.G. A Blockchain-Based Federated Forest for SDN-Enabled In-Vehicle Network Intrusion Detection System. IEEE Access 2021, 9, 102593–102608.
Aliyu, I.; Engelenburg, S.V.; Mu’Azu, M.B.; Kim, J.; Lim, C.G. Statistical Detection of Adversarial Examples in Blockchain-Based Federated Forest In-Vehicle Network Intrusion Detection Systems. IEEE Access 2022, 10, 109366–109384.
Uprety, A.; Rawat, D.B.; Li, J. Privacy preserving misbehavior detection in IoV using federated machine learning. In Proceedings of the 2021 IEEE 18th Annual Consumer Communications & Networking Conference (CCNC), Las Vegas, NV, USA, 9–12 January 2021; pp. 1–6.
Boualouache, A.; Brik, B.; Senouci, S.-M.; Engel, T. On-Demand Security Framework for 5GB Vehicular Networks. IEEE Internet Things Mag. 2023, 6, 26–31.
Xu, Q.; Zhang, L.; Ou, D.; Yu, W. Secure Intrusion Detection by Differentially Private Federated Learning for Inter-Vehicle Networks. Transp. Res. Rec. 2023, 2677, 421–437.
Driss, M.; Almomani, I.; e Huma, Z.; Ahmad, J. A federated learning framework for cyberattack detection in vehicular sensor networks. Complex Intell. Syst. 2022, 8, 4221–4235.
Zainudin, A.; Akter, R.; Kim, D.-S.; Lee, J.-M. FedIoV: A Federated Learning-Assisted Intrusion Messages Detection in Internet of Vehicles. 2022, pp. 305–306. Available online: https://www.dbpia.co.kr/Journal/articleDetail?nodeId=NODE11197063 (accessed on 6 October 2023).
Taslimasa, H.; Dadkhah, S.; Neto, E.C.P.; Xiong, P.; Iqbal, S.; Ray, S.; Ghorbani, A.A. ImageFed: Practical Privacy Preserving Intrusion Detection System for In-Vehicle CAN Bus Protocol. In Proceedings of the 2023 IEEE 9th Intl Conference on Big Data Security on Cloud (BigDataSecurity), IEEE Intl Conference on High Performance and Smart Computing, (HPSC) and IEEE Intl Conference on Intelligent Data and Security (IDS), New York, NY, USA, 6–8 May 2023; pp. 122–129.
Hbaieb, A.; Ayed, S.; Chaari, L. Federated learning based IDS approach for the IoV. In Proceedings of the 17th International Conference on Availability, Reliability and Security, Vienna, Austria, 23–26 August 2022; pp. 1–6.
Vinita, L.J.; Vetriselvi, V. Federated Learning-based Misbehaviour detection on an emergency message dissemination scenario for the 6G-enabled Internet of Vehicles. Hoc Netw. 2023, 144, 103153.
Yu, T.; Hua, G.; Wang, H.; Yang, J.; Hu, J. Federated-lstm based network intrusion detection method for intelligent connected vehicles. In Proceedings of the ICC 2022-IEEE International Conference on Communications, Seoul, Republic of Korea, 16–20 May 2022; pp. 4324–4329.
Xing, L.; Wang, K.; Wu, H.; Ma, H.; Zhang, X. FL-MAAE: An Intrusion Detection Method for the Internet of Vehicles Based on Federated Learning and Memory-Augmented Autoencoder. Electronics 2023, 12, 2284.
Sebastian, A. Enhancing Intrusion Detection in Internet of Vehicles Through Federated Learning. arXiv 2023, arXiv:2311.13800.
Korba, A.A.; Boualouache, A.; Brik, B.; Rahal, R.; Ghamri-Doudane, Y.; Senouci, S.M. Federated Learning for Zero-Day Attack Detection in 5G and Beyond V2X Networks. In AlgoTel 2023-25èmes Rencontres Francophones sur les Aspects Algorithmiques des Télécommunications. 2023. Available online: https://hal.science/hal-04087452/ (accessed on 22 September 2023).
Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated optimization in heterogeneous networks. Proc. Mach. Learn. Syst. 2020, 2, 429–450.
Campos, E.M.; Saura, P.F.; González-Vidal, A.; Hernández-Ramos, J.L.; Bernabé, J.B.; Baldini, G.; Skarmeta, A. Evaluating Federated Learning for intrusion detection in Internet of Things: Review and challenges. Comput. Netw. 2022, 203, 108661.
Billah, M.; Mehedi, S.T.; Anwar, A.; Rahman, Z.; Islam, R. A systematic literature review on blockchain enabled federated learning framework for internet of vehicles. arXiv 2022, arXiv:2203.05192.
Shejwalkar, V.V. Quantifying and Enhancing the Security of Federated Learning. 2023. Available online: https://www.cics.umass.edu/event/20230426/quantifying-and-strengthening-security-federated-learning (accessed on 6 September 2023).
Ruzafa-Alcázar, P.; Fernández-Saura, P.; Mármol-Campos, E.; González-Vidal, A.; Hernández-Ramos, J.L.; Bernal-Bernabe, J.; Skarmeta, A.F. Intrusion detection based on privacy-preserving federated learning for the industrial IoT. IEEE Trans. Ind. Inform. 2021, 19, 1145–1154.
Duy, P.T.; Hao, H.N.; Chu, H.M.; Pham, V.H. A Secure and Privacy Preserving Federated Learning Approach for IoT Intrusion Detection System. In Proceedings of the Network and System Security: 15th International Conference, NSS 2021, Tianjin, China, 23 October 2021; Springer International Publishing: Berlin/Heidelberg, Germany, 2021; pp. 353–368.
Duan, Q.; Huang, J.; Hu, S.; Deng, R.; Lu, Z.; Yu, S. Combining Federated Learning and Edge Computing Toward Ubiquitous Intelligence in 6G Network: Challenges, Recent Advances, and Future Directions. IEEE Commun. Surv. Tutor. 2023, 25, 2892–2950.
Danba, S.; Bao, J.; Han, G.; Guleng, S.; Wu, C. Toward collaborative intelligence in IoV systems: Recent advances and open issues. Sensors 2022, 22, 6995.
Thonglek, K.; Takahashi, K.; Ichikawa, K.; Nakasan, C.; Leelaprute, P.; Iida, H. Sparse communication for federated learning. In Proceedings of the 2022 IEEE 6th International Conference on Fog and Edge Computing (ICFEC), Messina, Italy, 16–19 May 2022; pp. 1–8.

© Text is available under the terms and conditions of the Creative Commons Attribution (CC BY) license; additional terms may apply. By using this site, you agree to the Terms and Conditions and Privacy Policy.