Federated Meta-Learning for Driver Distraction Detection: Comparison
Please note this is a comparison between Version 1 by Zihan Guo and Version 2 by Sirius Huang.

Driver distraction detection (3D) is essential in improving the efficiency and safety of transportation systems. Federated learning (FL) is emerging as a feasible solution that can train models without private and sensitive information leaving its local repository. Although various FL-based solutions have been proposed to upgrade the model learning paradigm of 3D, existing methods, given the requirements for user privacy and the data growth seen in real-world scenarios, remain insufficient to address four emerging challenges, i.e., data accumulation, communication optimization, data heterogeneity, and device heterogeneity.

  • federated learning
  • meta-learning
  • incremental federated meta-learning
  • driver distraction detection

1. Introduction

Currently, even though vehicles are upgraded to support a higher level of autonomy, humans are still their primary operators. Therefore, driver distraction is still a major problem that can disrupt and jeopardize transportation systems [1,2]. In general, driver distraction occurs when a driver’s attention is diverted, leading to a delay in recognizing vital information to keep vehicles running safely [3]. Especially with the proliferation of in-vehicle multimedia devices and personal smart gadgets, diverse in-vehicle activities exacerbate driver distraction. To prevent potential hazards and incidents, warnings to distracted drivers need to be fast and precise, which shows the necessity of driver distraction detection (3D).
With the rapid development of advanced technologies, e.g., Artificial Intelligence and Internet of Things (IoT), the sensing, communication, and computing capabilities of in-vehicle devices have improved. Intelligent vehicle systems are often equipped with rich computing capabilities to support various tasks. In particular, data-driven deep learning approaches have been widely developed and applied to support 3D, e.g., with core models trained on driver face poses [4], driving actions [5], electroencephalography signals [6], and other sensed information to detect distractions such as unfocused eyesight [7], inattention [6], and inappropriate operation [5].
Traditional deep learning methods process data centrally; that is, vehicles need to upload signals, images, and other sensed data to a central server. After collecting the sensed data, the server trains the required model on the data consolidated from multiple sensing devices, i.e., smart vehicles. However, the data to be transmitted may contain private or sensitive information, such as travel trajectories and passenger profiles, and are vulnerable to interception and attack over the network connections between vehicles and servers. Under the restrictions of recently announced data protection laws and regulations, more isolated data silos are formed and become unbreakable barriers to applying centralized model learning solutions [8]. Therefore, federated learning (FL) is emerging as a feasible solution that can train models without private and sensitive information leaving its local repository [8,9].
Even though various solutions have been proposed that use FL to upgrade the model learning paradigm of 3D [10,11,12], they are still incapable of handling the dynamics and heterogeneity encountered in the daily usage of 3D. First, most recent research is conducted under predefined experimental settings, in which clients possess preassigned data and exchange model parameters directly without any optimization. In more realistic scenarios, however, data are sensed continuously. Since current solutions focus on old data, applying them directly to incremental data may make model learning inefficient and lead to catastrophic forgetting of knowledge [13]. Second, even though current FL-based solutions do not need to transmit raw data, it is still costly to train high-performance models because of frequent and excessive client–server interactions [9]. Finally, the availability of local data and the computing power of moving vehicles may change over time and place, which may make current solutions inefficient at accommodating not only heterogeneous devices but also data with various distributions, uneven sizes, and missing label classes [9,14,15].

2. Challenges and Related Solutions

This section first introduces four challenges regarding the dynamics and heterogeneity encountered in incremental federated meta-learning (IFM), and accordingly, related solutions are discussed.

2.1. Emerging Challenges

First, as for the dynamics in real-world scenarios, the following two critical challenges are faced by 3D:
  • C1.1 Data Accumulation. While the 3D service is installed, the vehicles can continuously sense driver status and increase the samples to be used for model updates. In comparison to static scenarios where the training samples will not change frequently, data accumulation can cause pre-trained knowledge to be obsolete in processing new data [13,16].
  • C1.2 Communication Optimization. To train a model jointly, 3D services require frequent interaction between the clients and the server. Even though in IFM, model parameters are exchanged instead of entire data, which can reduce the network traffic [8], the client–server interaction frequency increases to update the model iteratively, resulting in high latency to update the model on the fly [9].
Moreover, the heterogeneity embedded in 3D is represented by two main aspects, namely:
  • C2.1 Data Heterogeneity. Due to the restriction in IFM, sensed data are stored locally to protect user privacy, and as a result, the local data of different users may be non-iid (non-independent and identically distributed), e.g., with different sample distributions, uneven data quality, etc. Such heterogeneity can significantly complicate the learning process of IFM [15,18].
  • C2.2 Device Heterogeneity. The devices to support 3D may have different configurations in terms of software and hardware, e.g., operating systems, sensing capabilities, storage spaces, computing power, etc. Moreover, more than one service may run in parallel on a device, and as a result, the availability of learning resources may vary among devices. Therefore, how to select proper clients becomes an emerging challenge in IFM to remedy the impact of such heterogeneity [9,17,18,19,20,21].

2.2. Related Solutions

To tackle the abovementioned challenges, related solutions are proposed.

2.2.1. Solutions to Data Accumulation

The data accumulation of IFM can be addressed by timely updating of global models or by optimizing local training patterns. In incremental scenarios, if the global model or training task is not updated adequately and in a timely manner, performance will be poor [22,23]. Current research commonly adopts a predefined configuration for model learning [8,15,24]. Moreover, without modifying the model structure, several methods optimize local training patterns to improve knowledge retention on both old and new samples. For example, Wei et al. [22] proposed a method named FedKL utilizing knowledge lock to maintain the previously learned knowledge. Yoon et al. [25] introduced FedWeIT, allowing clients to leverage indirect experience from other clients to support continuous learning of federated knowledge. Le et al. [26] suggested a weighted processing strategy for model updating to prevent catastrophic forgetting. However, achieving the optimal performance of these methods makes training less efficient, especially when processing non-iid data.

2.2.2. Solutions to Communication Optimization

There are two major approaches for communication optimization, i.e., minimizing the number of data exchanges or reducing the size of the data transmitted. Specifically, the first approach can be achieved by reducing model upload frequency [27,28], adjusting aggregation schedules [28,29,30], and optimizing network topology [9,31]. In addition, technologies such as knowledge distillation [10,24] and sparse compression [25,32] can be used to compress parameters exchanged without degrading model performance. Finally, the significance of each model layer can be determined in order to perform layer-wise uploading based on user similarity [33], model similarity [34,35], etc.

2.2.3. Solutions to Data Heterogeneity

Data heterogeneity, in general, can be addressed by knowledge distillation or meta-learning. Specifically, as for knowledge distillation, Lin et al. [24] proposed the FedDF framework, combining federated learning with knowledge distillation. Shang et al. [10] presented FedBiKD, which is a simple and effective federated bidirectional knowledge distillation framework. Moreover, meta-learning, as the process of learning how to learn, can guide local learning for better performance. There are many meta-learning algorithms, e.g., Model-Agnostic Meta-learning (MAML) [36], First-Order Model-Agnostic Meta-learning (FOMAML) [37], and Reptile [38]. The joint utilization of meta-learning algorithms and federated learning enables quick, personalized, and heterogeneity-supporting training [14,15,39]. Federated meta-learning (FM) offers various similar applications in transportation to overcome data heterogeneity, such as parking occupancy prediction [40,41] and bike volume prediction [42].

2.2.4. Solutions to Device Heterogeneity

In general, client heterogeneity can be resolved by client selection prior to task start and by weighting during global aggregation. To simplify the learning process, random or full client selection is commonly utilized [8,26,34], under the prerequisite that all clients are available with little performance disparity. Thus, more advanced strategies are designed to mitigate the unreliability among clients, e.g., a compensatory first-come-first-merge algorithm adopted by Wu et al. [43], and the dynamic selection based on the status and availability of clients considered by Huang et al. [44]. Moreover, aggregation weights are also widely discussed. Particularly, the size of local samples [8,31,34] is the most common weight, but it has drawbacks in handling IFM, as the size of samples can change over time and place. Hence, weights relevant to the characteristics of devices are introduced, such as information richness [30], temporal weight [28,30], etc. As summarized in Table 1, existing methods focus more on solving the optimization issues related to communication (i.e., C1.2), and also present visible progress in addressing the two challenges in heterogeneity (i.e., C2.1 and C2.2). However, a solution that can resolve all four challenges encountered in IFM is still missing. Therefore, ICMFed is proposed with dedicated model learning and adaptation processes to not only boost the learning performance but also improve the service quality.
Table 1. The overview of related works (○ NOT SUPPORTED and ● SUPPORTED).
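
As an illustration of the sample-size weighting discussed under Solutions to Device Heterogeneity, the following minimal Python sketch aggregates client parameters with weights proportional to local sample counts. The function name `aggregate_by_sample_size`, the flat-list parameter representation, and the example numbers are assumptions made for illustration only; this is not the aggregation rule of ICMFed or of any cited method.

```python
from typing import List, Sequence


def aggregate_by_sample_size(
    client_params: Sequence[Sequence[float]],
    sample_counts: Sequence[int],
) -> List[float]:
    """Average client parameter vectors, weighted by local sample count.

    Hypothetical helper: each client is represented by a flat list of floats.
    """
    if not client_params or len(client_params) != len(sample_counts):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(sample_counts)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_params[0])
    if any(len(p) != dim for p in client_params):
        raise ValueError("all clients must share the same parameter dimension")

    aggregated = [0.0] * dim
    for params, count in zip(client_params, sample_counts):
        weight = count / total  # share of all samples held by this client
        for i, value in enumerate(params):
            aggregated[i] += weight * value
    return aggregated


# Hypothetical example: three vehicles holding different amounts of local data.
params = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
counts = [10, 30, 60]
global_params = aggregate_by_sample_size(params, counts)
```

Because local sample counts grow as data accumulate, the weights in such a scheme would have to be recomputed every round. This is one reason the text notes that sample-size weighting has drawbacks in incremental settings.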