2. Data-Based Approaches
Several studies have attempted to alleviate the non-IID issue among clients. Zhao et al. [26] improved training on non-IID data by constructing a small, globally shared, uniformly distributed data subset for all clients. Similarly, Seo et al. [27] mitigated the quality degradation problem in FL via data sharing, using an auction approach to reduce cost while satisfying system requirements for maximizing model quality and resource efficiency. In [28], the authors assume that a small fraction of clients are willing to share their datasets, and the server collects data from these clients in a centralized manner to aid in updating the global model. Although such data-sharing methods achieve significant performance improvements, they run counter to the original intention of FL and pose a privacy threat. Moreover, without access to the clients' original data, the server cannot obtain the global data distribution and therefore cannot build a globally shared IID dataset.
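To make the data-sharing idea concrete, the following sketch (hypothetical function names, assuming data are held as NumPy arrays) shows how a server-side pool could be turned into a small, class-balanced shared subset that each client mixes into its local training set:

```python
import numpy as np

def build_shared_subset(pool_x, pool_y, per_class=10, num_classes=10, seed=0):
    """Draw a small, class-balanced (uniformly distributed) subset from a
    server-side data pool, in the spirit of data-sharing approaches [26]."""
    rng = np.random.default_rng(seed)
    idx = []
    for c in range(num_classes):
        candidates = np.flatnonzero(pool_y == c)
        idx.extend(rng.choice(candidates, size=per_class, replace=False))
    idx = np.array(idx)
    return pool_x[idx], pool_y[idx]

def augment_client_data(client_x, client_y, shared_x, shared_y):
    """Each client trains on its local data plus the globally shared subset."""
    return (np.concatenate([client_x, shared_x]),
            np.concatenate([client_y, shared_y]))
```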
3. Algorithm-Based Approaches
Another line of research addresses the negative impact of heterogeneous data by designing algorithms that enhance the local training phase or improve the global aggregation process. In [11], the authors introduce an algorithm called SCAFFOLD, which uses control variates to correct local updates, preventing “client drift”, and leverages the similarity among client data to accelerate the convergence of FL.
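The core of SCAFFOLD's local phase can be sketched as follows; the function names and the `grad_fn` callback are illustrative assumptions, not the authors' code:

```python
import numpy as np

def scaffold_local_update(x_global, c_global, c_local, grad_fn, lr=0.1, steps=10):
    """One client's local phase in SCAFFOLD [11] (simplified sketch).

    x_global : global model weights (flattened np.ndarray)
    c_global : server control variate
    c_local  : this client's control variate
    grad_fn  : callable returning a stochastic gradient at given weights
               (an assumption of this sketch)
    """
    y = x_global.copy()
    for _ in range(steps):
        g = grad_fn(y)
        # The control-variate correction counteracts client drift.
        y = y - lr * (g - c_local + c_global)
    # "Option II" update of the local control variate from the paper.
    c_local_new = c_local - c_global + (x_global - y) / (steps * lr)
    return y, c_local_new
```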
Li et al. [12] balance the optimization difference between the global and local objectives by adding a regularization term to the local objective.
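A minimal sketch of such a regularized local update, assuming a proximal-style penalty toward the global weights (in the style of FedProx) and a user-supplied gradient callback:

```python
import numpy as np

def proximal_local_update(w_global, grad_fn, mu=0.01, lr=0.1, steps=10):
    """Local training with a proximal regularization term as in [12] (sketch).

    The extra term mu * (w - w_global) is the gradient of the penalty
    (mu/2) * ||w - w_global||^2, which discourages the local model from
    drifting away from the global model.
    """
    w = w_global.copy()
    for _ in range(steps):
        g = grad_fn(w) + mu * (w - w_global)  # gradient of the proximal term
        w = w - lr * g
    return w
```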
In addition, the authors of [13] introduced a normalized averaging algorithm called FedNova, which normalizes each client's local update by its number of local training iterations, ensuring fast error convergence while maintaining objective consistency.
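The server-side normalization can be sketched as follows for the plain-SGD case; variable names are illustrative:

```python
import numpy as np

def fednova_aggregate(w_global, client_updates, client_steps, client_weights):
    """FedNova-style normalized averaging [13] (simplified, plain-SGD case).

    client_updates : list of accumulated updates (w_global - w_local_i)
    client_steps   : number of local iterations tau_i for each client
    client_weights : data-size weights p_i summing to 1
    """
    # Normalize each client's update by its number of local steps, so that
    # clients running more iterations do not dominate the average.
    normalized = [d / tau for d, tau in zip(client_updates, client_steps)]
    # The effective step count preserves the overall update magnitude.
    tau_eff = sum(p * tau for p, tau in zip(client_weights, client_steps))
    avg = sum(p * d for p, d in zip(client_weights, normalized))
    return w_global - tau_eff * avg
```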
The authors of [14] propose FedRS, which constrains the updates of the classifier weights of locally missing classes during local training via the classification layer of the neural network.
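One way to realize such a restriction is to scale the logits of locally missing classes before the softmax; the sketch below illustrates this idea with an assumed boolean mask of locally observed classes:

```python
import numpy as np

def restricted_softmax(logits, observed_mask, alpha=0.5):
    """Restricted-softmax idea in the spirit of FedRS [14] (sketch).

    Logits of classes absent from the client's local data are scaled by
    alpha < 1, which limits the erroneous updates their classifier weights
    would otherwise receive during local training.
    """
    scale = np.where(observed_mask, 1.0, alpha)  # 1 for locally seen classes
    z = logits * scale
    z = z - z.max(axis=-1, keepdims=True)        # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)
```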
MOON [15] is a model-contrastive federated learning approach that introduces a contrastive loss on the client side, using the representations produced by the global model and the historical local models to correct each client's local model updates.
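The contrastive term can be sketched as follows, with the global model's representation as the positive pair and the previous local model's representation as the negative pair (NumPy is used here purely for illustration):

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def moon_contrastive_loss(z_local, z_global, z_prev, tau=0.5):
    """MOON's model-contrastive loss [15] (sketch).

    z_local  : representation from the current local model
    z_global : representation of the same input from the global model (positive)
    z_prev   : representation from the previous local model (negative)
    """
    pos = np.exp(cosine(z_local, z_global) / tau)
    neg = np.exp(cosine(z_local, z_prev) / tau)
    # Pull the local representation toward the global one, away from the old one.
    return -np.log(pos / (pos + neg))
```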
Similarly, the authors of [16] proposed FedProc, a prototypical contrastive federated learning approach that designs a global prototypical contrastive loss for local network training and uses class prototypes as global knowledge to correct each client's local training.
The authors of [18] present a contribution-dependent weighting design named FedAdp. It measures the alignment between each client's local objective and the global objective of the FL system from the gradient information observed during training, and assigns a different aggregation weight to each participating client.
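A simplified sketch of such alignment-based weighting follows; the Gompertz-style mapping and the hyperparameter `alpha` are illustrative choices rather than the paper's exact formulation:

```python
import numpy as np

def alignment_weights(client_grads, global_grad, alpha=5.0):
    """Contribution-aware aggregation weights in the spirit of FedAdp [18] (sketch).

    The angle between each client's gradient and the global gradient measures
    how well the local objective aligns with the global one; a Gompertz-type
    mapping turns the angle into a contribution score, and the scores are
    normalized into aggregation weights via a softmax.
    """
    def angle(g, G):
        cos = np.dot(g, G) / (np.linalg.norm(g) * np.linalg.norm(G))
        return np.arccos(np.clip(cos, -1.0, 1.0))

    thetas = np.array([angle(g, global_grad) for g in client_grads])
    scores = alpha * (1.0 - np.exp(-np.exp(-alpha * (thetas - 1.0))))
    e = np.exp(scores)
    return e / e.sum()  # smaller angle -> larger aggregation weight
```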
Zhang et al. [19] address the challenge of direct model aggregation by transferring knowledge from the local models to the global model through data-free distillation.
Long et al. [29] propose FedCD, which removes the classifier bias caused by non-IID data by introducing hierarchical prototype contrastive learning, global information distillation, and related techniques to capture the class distributions of clients.
4. System-Based Approaches
In addition, several studies have designed client selection policies for the server. In [20], the authors estimate how non-IID each client's data is by analyzing the differences between local model weights, and assign a higher selection probability to clients with a lower degree of non-IID data so that they participate in FL training more frequently. However, the underlying assumption of an accessible IID public dataset is difficult to satisfy in the real world.
Wu et al. [21] use the inner product between each local model gradient and the global model gradient to determine the subset of clients participating in model aggregation, so that clients contributing more to reducing the global loss are selected with higher probability.
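A minimal sketch of this selection rule, assuming flattened gradient vectors are available on the server:

```python
import numpy as np

def select_by_gradient_alignment(client_grads, global_grad, k):
    """Select the k clients whose local gradients have the largest inner
    product with the global gradient, following the idea in [21] (sketch).

    A larger inner product indicates a larger expected contribution to
    reducing the global loss.
    """
    scores = np.array([np.dot(g, global_grad) for g in client_grads])
    return np.argsort(scores)[::-1][:k]  # indices of the top-k clients
```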
Other studies design client selection strategies around the local training loss. Goetz et al. [25] evaluate the contribution of each client's data in every FL round according to its local loss value, compute a corresponding evaluation score, and select an optimized subset of clients based on these scores.
Cho et al. [23] theoretically demonstrate that biasing client selection toward clients with larger local loss values improves the convergence rate compared with random selection.
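A small sketch of loss-biased selection, here using a softmax over reported local losses (the temperature parameter is an illustrative assumption):

```python
import numpy as np

def loss_based_selection(local_losses, k, temperature=1.0, seed=0):
    """Loss-biased client selection (sketch of the idea behind [25] and [23]).

    Clients with larger local loss receive a higher selection probability,
    here via a softmax over loss values; [23] shows that such biased
    selection can converge faster than uniform random selection.
    """
    rng = np.random.default_rng(seed)
    losses = np.asarray(local_losses, dtype=float)
    p = np.exp(losses / temperature)
    p /= p.sum()
    return rng.choice(len(losses), size=k, replace=False, p=p)
```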
Other studies employ reinforcement learning to select clients for the server. Chen et al. [30] use a UCB (upper confidence bound) approach to heuristically select participating clients in each optimization round, using the cosine-distance weights (CDW) between the historical global model and the current local model to measure each client's contribution and assign rewards.
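The bandit view can be sketched as follows; the exploration constant `c` and the reward bookkeeping are illustrative assumptions:

```python
import numpy as np

def ucb_scores(avg_rewards, pull_counts, round_t, c=2.0):
    """UCB-style client scores for selection in the spirit of [30] (sketch).

    avg_rewards : running mean reward per client; in [30] the reward is
                  derived from the cosine-distance weight (CDW) between the
                  historical global model and the client's current local model
    pull_counts : how many times each client has been selected so far
    """
    avg_rewards = np.asarray(avg_rewards, dtype=float)
    pull_counts = np.asarray(pull_counts, dtype=float)
    # Exploration bonus shrinks as a client is selected more often
    # (unselected clients are clamped to one pull in this simplification).
    bonus = np.sqrt(c * np.log(round_t) / np.maximum(pull_counts, 1.0))
    return avg_rewards + bonus  # select the clients with the highest scores
```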
Moreover, the authors of [22] proposed an experience-driven control framework that uses deep reinforcement learning to intelligently select clients in each FL round, reducing the dimensionality of the clients' local model weights and using them as states to enhance the performance of the global model.
Xiao et al. [31] proposed a client selection strategy based on clustering and bi-level sampling: a subset of candidate clients is first constructed using MD sampling, and a WPCS mechanism then collects the clients' weighted per-label mean class scores to perform clustering and select the final clients.