Privacy Protection in Mobile Edge Computing

Data sharing and analysis among different devices in mobile edge computing is valuable for social innovation and development. The main obstacle to achieving this goal is the risk to data privacy. Therefore, existing studies mainly focus on enhancing data privacy-protection capability. On the one hand, federated learning avoids direct data leakage by converting raw data into model parameters for transmission. On the other hand, privacy-protection techniques further strengthen the security of federated learning against inference attacks. However, privacy-protection techniques may reduce training accuracy while improving security. In particular, trading off data security against accuracy is a major challenge in dynamic mobile edge computing scenarios. To address this issue, we propose FLPP, a federated-learning-based privacy-protection scheme. We then build a layered adaptive differential privacy model to dynamically adjust the privacy-protection level in different situations. Finally, we design a differential evolution algorithm to derive the most suitable privacy-protection policy for achieving the optimal overall performance. Simulation results show that FLPP achieves an advantage of 8%~34% in overall performance, demonstrating that our scheme enables data to be shared both securely and accurately.

  • mobile edge computing
  • privacy protection
  • differential privacy

1. Introduction

With the rise of mobile edge computing (MEC), massive amounts of data are being generated by a wide variety of sensors, controllers and smart devices \cite{7786106}. In the era of the Internet of Everything, data utilization is key to enabling innovation, driving growth and solving major challenges \cite{9083958}. Through data mining, we can reveal hidden patterns, trends and correlations. This information helps us make optimal decisions, for instance, the precise diagnosis and treatment of diseases in the medical field, or the optimization of traffic flow and resource allocation in urban planning. Evidently, the integrated utilization of data can bring great value and benefits \cite{9139976}.

However, it is often difficult to derive value from the data of a single user. More user data needs to be involved in analysis and refinement to obtain comprehensive information \cite{10.1007/978-981-13-0695-2_8}. In traditional centralized machine learning, data is stored on a central server. This leads to the isolated data island effect, i.e., data cannot be fully utilized and shared. Meanwhile, data privacy protection has become a key issue because of the centralization of users' sensitive personal data \cite{8436047}. In mobile edge computing scenarios, data from mobile devices generally should not be shared with others. Therefore, breaking down isolated data islands while ensuring data privacy is a pressing issue \cite{10.1145/3298981}.

Federated learning (FL) \cite{pmlr-v54-mcmahan17a}, a new technology paradigm based on cryptography and machine learning, can achieve information mining without centralizing local data. It unites data distributed across different mobile devices and trains a unified global model that carries more comprehensive information, thereby solving the problem of isolated data islands. Clients and the server exchange information through model parameters rather than the original data, which improves data privacy \cite{9055478}.
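
The following minimal sketch illustrates this parameter-exchange workflow. It is illustrative only: the function names, the least-squares local objective, and the unweighted parameter average are our simplifying assumptions, not the exact form of the cited FedAvg algorithm (which weights clients by data size).

```python
import numpy as np

def local_update(global_weights, local_data, lr=0.01, epochs=1):
    """Hypothetical client step: train locally and return parameters, never raw data."""
    w = global_weights.copy()
    X, y = local_data
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of a simple least-squares loss
        w -= lr * grad
    return w

def federated_round(global_weights, client_datasets):
    """One aggregation round: only model parameters travel between clients and server."""
    client_weights = [local_update(global_weights, d) for d in client_datasets]
    return np.mean(client_weights, axis=0)  # simplified unweighted FedAvg-style average
```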

However, federated learning also introduces several security and privacy risks \cite{MOTHUKURI2021619}. One of the main threats is the model inference attack. Although communication is channeled through model parameters, Zhu et al. \cite{zhu2019deep} revealed that the exchanged parameters may still leak private information about the training data. They demonstrated that the original training data, including image and text data, can be reconstructed from the gradients. This poses a new challenge for data privacy-preserving techniques based on federated learning.
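
The attack in \cite{zhu2019deep} is based on gradient matching. The sketch below conveys the general idea under our own assumptions (function names, optimizer settings and the soft-label trick are ours, not the paper's exact procedure): the attacker optimizes dummy data until its gradients match the intercepted ones.

```python
import torch

def gradient_inversion(model, loss_fn, observed_grads, x_shape, y_shape, steps=300):
    """Recover an approximation of a victim's training batch from its shared gradients."""
    dummy_x = torch.randn(x_shape, requires_grad=True)  # attacker's guess at the inputs
    dummy_y = torch.randn(y_shape, requires_grad=True)  # attacker's guess at the labels
    optimizer = torch.optim.LBFGS([dummy_x, dummy_y])

    def closure():
        optimizer.zero_grad()
        loss = loss_fn(model(dummy_x), dummy_y.softmax(dim=-1))
        grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
        # Match the dummy gradients against the intercepted real gradients.
        grad_diff = sum(((g - og) ** 2).sum() for g, og in zip(grads, observed_grads))
        grad_diff.backward()
        return grad_diff

    for _ in range(steps):
        optimizer.step(closure)
    return dummy_x.detach(), dummy_y.detach()
```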

2. Privacy Protection in Mobile Edge Computing

Existing studies enhance the security of federated learning by combining it with a variety of privacy-protection techniques, mainly homomorphic encryption (HE), secure multi-party computation (SMPC) and differential privacy (DP) \cite{9084352}. Extensive research demonstrates that the combination of federated learning with these privacy-protection techniques can provide sufficiently strong security.

Fang et al. \cite{fang2021privacy} proposed a multi-party privacy-preserving machine learning framework, named PFMLP, based partially on HE and federated learning. It preserves training accuracy while also improving training efficiency. Xu et al. \cite{xu2023privacy} proposed a privacy-protection scheme that applies HE in IoT-FL scenarios and is highly adaptable to current IoT architectures. Zhang et al. \cite{9014272} proposed a privacy-enhanced federated-learning (PEFL) scheme to protect the gradients against an untrusted server. This is mainly enabled by encrypting participants' local gradients with a Paillier homomorphic cryptosystem. The HE approach can improve the security of federated learning, although it incurs a large computation load. This poses a challenge for the limited computing capability of devices in mobile edge computing scenarios.
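
As a concrete illustration of Paillier-based aggregation, the sketch below uses the third-party `phe` (python-paillier) library; the toy gradient values and workflow are our assumptions, not PEFL's exact protocol. The key property is that ciphertexts can be summed by an untrusted server, while only the key holder can decrypt the aggregate.

```python
from functools import reduce
from phe import paillier  # third-party python-paillier library

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Each client encrypts its local gradient vector element-wise.
client_grads = [[0.12, -0.05], [0.08, 0.10], [-0.02, 0.07]]  # toy values
encrypted = [[public_key.encrypt(g) for g in grads] for grads in client_grads]

# Paillier is additively homomorphic: the server can sum ciphertexts
# without learning any individual client's gradient.
aggregated = [reduce(lambda a, b: a + b, column) for column in zip(*encrypted)]

# Only the private-key holder recovers the (aggregate) plaintext.
avg_grads = [private_key.decrypt(c) / len(client_grads) for c in aggregated]
print(avg_grads)  # approximately [0.06, 0.04]
```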

Kalapaaking et al. \cite{kalapaaking2022smpc} proposed a federated-learning framework that combines SMPC-based aggregation and encrypted inference methods. This framework maintains both data and model privacy. Houda et al. \cite{abou2023mitfed} presented a novel framework, called MiTFed, that allows multiple software-defined network (SDN) domains to collaboratively build a global intrusion detection model without sharing their sensitive datasets. The scheme incorporates SMPC techniques to securely aggregate local model updates. Sotthiwat et al. \cite{9499372} proposed encrypting a critical part of the model parameters (gradients) to prevent deep leakage from gradient attacks. Fereidooni et al. \cite{9474309} presented SAFELearn, a generic design for efficient private FL systems that protects against inference attacks. In addition, recent studies \cite{10086673,8875319,9724512} on secret sharing techniques, a form of SMPC, also show promise for securing federated learning and data sharing. The above studies achieve the secure construction of models but incur communication overhead that becomes unaffordable with a large number of participants.
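
To make the secret-sharing primitive concrete, here is a minimal additive-secret-sharing sketch; the prime modulus, fixed-point encoding and two-aggregator setup are illustrative assumptions, not any cited scheme's parameters.

```python
import random

PRIME = 2**61 - 1  # all arithmetic is done modulo a large prime

def make_shares(secret, n_parties):
    """Split an integer secret into n additive shares (mod PRIME)."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    """Recombine shares; only the full set reveals the secret."""
    return sum(shares) % PRIME

# Example: three clients secret-share fixed-point-encoded updates
# between two aggregators.
grads = [17, 42, -5]  # toy integer-encoded updates
shared = [make_shares(g % PRIME, 2) for g in grads]

# Each aggregator sums only the shares it holds; combining the two partial
# sums yields the aggregate without exposing any individual update.
partials = [sum(s[i] for s in shared) % PRIME for i in range(2)]
total = reconstruct(partials)  # == (17 + 42 - 5) mod PRIME
```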

The differential privacy technique is a good way to avoid this computation load and communication overhead. Wang et al. \cite{wang2020trusted} proposed a collaborative filtering recommendation system based on federated learning and end--edge--cloud computing. The exposure of private data is further prevented by adding Laplace noise to the training model through DP. Wei et al. \cite{9069945} proposed a novel DP-based framework, NbAFL, in which artificial noise is added to parameters on the clients' side before aggregation. The optimal trade-off between performance and privacy level is achieved by selecting the number of clients participating in FL. Zhao et al. \cite{9325934} proposed an anonymous and privacy-preserving federated-learning scheme for the mining of industrial big data, which leverages differential privacy on shared parameters. They also tested the effect of different privacy levels on accuracy. Adnan et al. \cite{Adnan2022} conducted a case study applying a differentially private federated-learning framework to the analysis of histopathology images, among the largest and perhaps most complex medical images. Their work indicates that differentially private federated learning is a viable and reliable framework for the collaborative development of machine learning models in medical image analysis. However, the DP privacy level in these works is fixed, so it cannot adapt to dynamically changing sets of participating aggregation clients. In particular, a non-IID data distribution with a fixed privacy level may slow down FL model training in reaching the anticipated accuracy.
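
The common mechanism in these DP-based schemes is to perturb each local update before it leaves the client. A hedged sketch follows; the clipping threshold, the epsilon value, and the use of the clipping norm as the sensitivity bound are our simplifying assumptions.

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, epsilon=0.5):
    """Clip a local update and add Laplace noise before sharing it.

    A smaller epsilon means more noise and therefore stronger privacy but
    lower training accuracy -- the trade-off discussed in the text above.
    """
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))  # bound the update's norm
    scale = clip_norm / epsilon  # Laplace scale b = sensitivity / epsilon (simplified)
    return clipped + np.random.laplace(0.0, scale, size=update.shape)
```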

In summary, the DP technique with adjustable privacy levels is clearly more suitable for privacy protection in federated learning for mobile edge computing. To this end, we propose FLPP, a privacy-protection scheme based on federated learning that adaptively determines a privacy-level strategy, aiming to jointly optimize the accuracy and security of the training model.
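
FLPP searches for its privacy-level policy with a differential evolution algorithm. The sketch below shows a textbook DE/rand/1/bin loop purely to illustrate that search strategy; the objective callback, bounds and hyperparameters are our assumptions, not the paper's exact solver.

```python
import numpy as np

def differential_evolution(objective, dim, bounds=(0.1, 10.0),
                           pop_size=20, F=0.5, CR=0.9, generations=100):
    """Maximize `objective` over a vector of per-layer privacy levels (illustrative)."""
    lo, hi = bounds
    pop = np.random.uniform(lo, hi, size=(pop_size, dim))  # candidate epsilon vectors
    fitness = np.array([objective(x) for x in pop])
    for _ in range(generations):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[np.random.choice(others, 3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)   # DE/rand/1 mutation
            cross = np.random.rand(dim) < CR            # binomial crossover mask
            cross[np.random.randint(dim)] = True        # guarantee one mutated gene
            trial = np.where(cross, mutant, pop[i])
            f_trial = objective(trial)
            if f_trial > fitness[i]:                    # greedy selection
                pop[i], fitness[i] = trial, f_trial
    return pop[np.argmax(fitness)]
```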
