| Paper | Scope: Challenges | Scope: Group of Techniques | Scope: Different Application Domains |
|---|---|---|---|
| Al-Rakhami et al. [13] | 0/6 | 2/8 | 1/6 |
| Wang et al. [14] | 1/6 | 4/8 | 4/6 |
| Verbraeken et al. [16] | 1/6 | 0/8 | 0/6 |
| Zhou et al. [15] | 2/6 | 4/8 | 0/6 |
| Dianlei Xu et al. [10] | 6/6 | 3/8 | 0/6 |
| The researchers' work | 6/6 | 8/8 | 6/6 |
3. Research Methodology
The research methodology used in this paper consists of a Systematic Literature Review (SLR), where a rigorous protocol of searching the literature is defined and applied to extract information that answers specific research questions. The use of this methodology enables impartial results and an auditable process. This section details the methodology used in the review.
According to Brereton et al. [17], an SLR is performed procedurally through distinct processes. Their proposal includes an initial phase called ‘Plan Review’, which includes: (i) specifying research questions; (ii) developing the review protocol; (iii) validating the review protocol. In the second phase, ‘Conduct Review’, the following are carried out: (iv) identifying relevant research; (v) selecting primary studies; (vi) assessing study quality; (vii) extracting required data; (viii) synthesising data. In the last phase, ‘Document Review’, the report with the reviewed findings is produced and validated through two activities: (ix) writing the review report, and (x) validating the report.
4. Answering the RQs
4.1. RQ1—Research Challenges in Edge Intelligence (EI)
In this section, the researchers summarize the challenges faced by the Edge Intelligence (EI) paradigm that the analyzed studies either mentioned or aimed to tackle. The discussion presented in this section aims to provide answers to RQ1: What are the main challenges and open issues in the distributed learning field?
As mentioned earlier, performing ML techniques at the edge of the network promises several benefits, but it also raises several challenges. As this field is still in its infancy, solutions to these challenges are still being investigated. The surveyed studies tackle several challenges, which can be broadly grouped into the six categories displayed in Table 2 and described in what follows.
Table 2. Challenges in distributed machine learning in edge computing.
| Challenge | Description |
|---|---|
| CH1 | Running ML/DL on devices with limited resources |
| CH2 | Ensuring energy efficiency without compromising the accuracy |
| CH3 | Communication efficiency |
| CH4 | Ensuring data privacy and security |
| CH5 | Handling failure in edge devices |
| CH6 | Heterogeneity and low quality of data |
CH1 consists of dealing with the typically low processing power of edge devices. Edge devices often have little processing capacity, especially when compared to the powerful data centers in the cloud. Many ML applications, on the other hand, require computational power that exceeds the capabilities of resource-constrained IoT and edge devices. Limited resources also include memory and storage capacities. NN and ML algorithms generally require storing and accessing the parameters that describe the model architecture and the weight values forming the classification model. With limited storage, it may not be possible to have continued access to the original training data, or the data may have been removed altogether to free up space. Therefore, a significant challenge is reducing memory access and storing the data locally to avoid costly reads and writes to external memory modules.
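To make the storage constraint in CH1 concrete, the following back-of-the-envelope sketch estimates the memory needed for a model's weights and checks it against a device budget. The parameter count, the 8 MiB budget, and the function names are hypothetical values chosen for illustration; they do not come from any of the surveyed studies.

```python
def model_size_bytes(num_params, bytes_per_param=4):
    """Storage needed for the weights alone (4 bytes per parameter for Float32)."""
    return num_params * bytes_per_param


def fits_on_device(num_params, device_bytes, bytes_per_param=4):
    """Return True if the weights alone fit within the device's memory budget."""
    return model_size_bytes(num_params, bytes_per_param) <= device_bytes


# Hypothetical figures, chosen only for illustration.
params = 5_000_000             # number of weights in the model
device_memory = 8 * 1024 ** 2  # an 8 MiB memory budget on an edge device

size_mib = model_size_bytes(params) / 1024 ** 2
print(f"Float32 weights: {size_mib:.1f} MiB, fits: {fits_on_device(params, device_memory)}")
print(f"1 byte per weight fits: {fits_on_device(params, device_memory, bytes_per_param=1)}")
```

Under these assumed numbers the Float32 weights do not fit in the budget, while storing each weight in a single byte would. Activations, code, and runtime buffers are ignored here, so the real requirement would be higher.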
CH2 consists of ensuring the energy efficiency of edge devices without compromising the accuracy of the system. In general, the higher the complexity of the required processing, the more energy is consumed. Edge devices can be battery-powered. In these cases, the energy consumption of algorithms must be minimized to reach energy efficiency. However, this should be done with care so as not to compromise the quality of the data generated and the decisions/inferences made. So, there is an important trade-off to be managed.
CH3 concerns communication issues: edge intelligence models must take into account that devices might face poor connectivity. In such cases, model updates in training tasks may be delayed. Valerio, Passarella and Conti [18] claim that inference is highly sensitive to the available communication bandwidth. Challenges in communication include network traffic, bandwidth fluctuations, and intermittent or unavailable connectivity.
CH4 is related to data privacy and security. Several edge intelligence applications, such as those in healthcare, handle sensitive data. Thus, distributed ML algorithms must be able to preserve user privacy and information security when data are transferred between devices. Distributed Edge Intelligence (EI) has multiple points in the ML workflow that are vulnerable to malicious attacks or to leakage of confidential or important data.
CH5 is the challenge posed by failures in edge devices. Since devices might fail at some point, the distributed algorithm must consider ways to overcome this situation. Lastly, heterogeneity and lack of quality in the available data give rise to challenge CH6. For most ML algorithms, especially in supervised machine learning, high accuracy depends on high-quality training data. However, this condition is often not met in edge intelligence scenarios, where the collected data are sparse and unlabelled [10]. Distributed edge intelligence must handle data that come from different sources, arrive in different formats, and are subject to noise. To attain good accuracy, the application must handle noise and heterogeneity in the sensed data used as input.
Table 3 presents references to each of the described challenges, as well as studies that propose approaches to tackle them. The table is intended only to give an overview of the number of papers addressing each challenge. Challenge CH1 is the one with the most papers in the literature. All of the cited works are described in more detail later in this paper.
Table 3. References to the challenges of Edge Intelligence.
4.2. RQ2—Techniques and Strategies
Here, the researchers focus on three main aspects, namely: (i) the system architecture, (ii) how the ML tasks are distributed among the devices, and (iii) the underlying adopted techniques. The researchers classify the various approaches used in distributed learning based on these three aspects. The researchers identified nine groups of techniques and strategies, described in what follows: Federated Learning; Model Partitioning; Right-Sizing; Edge Pre-Processing; Scheduling; Cloud Pre-Training; Edge Only; Model Compression; and Other Techniques.
4.3. RQ3—Frameworks for Edge Intelligence
This section describes the studies that provided answers to RQ3 of this survey. Table 4 lists the main frameworks currently used in distributed ML applications. The table also associates each framework with the corresponding EI group of techniques or its main related strategy.
Table 4. EI frameworks.
| Framework | Groups of Techniques or Strategies | Comments |
|---|---|---|
| Neurosurgeon [73] | Model Partitioning | Lightweight scheduler to automatically partition DNN computation between edge devices and cloud at the granularity of NN layers |
| JointDNN [74] | Model Partitioning | JointDNN provides an energy- and performance-efficient method of querying some layers on the mobile device and some layers on the cloud server. |
| H. Li et al. [32] | Model Partitioning | They divide the NN layers and deploy the part with the lower ones (closer to the input) into edge servers and the part with higher layers (closer to the output) into the cloud for offloading processing. They also propose an offline and an online algorithm that schedules tasks in Edge servers. |
| Musical Chair [75] | Model Partitioning | Musical Chair aims at alleviating the compute cost and overcoming the resource barrier by distributing their computation: data parallelism and model parallelism. |
| AAIoT [76] | Model Partitioning | Accurate segmenting NNs under multi-layer IoT architectures |
| MobileNet [43] | Model Compression, Model Selector | Presented by Google Inc., the two hyperparameters introduced allow the model builder to choose the right sized model for the specific application. |
| SqueezeNet | Model Compression | It is a reduced DNN that achieves AlexNet-level accuracy with 50 times fewer parameters |
| Tiny-YOLO | Model Compression | Tiny YOLO is a very lite NN and is hence suitable for running on edge devices. It has an accuracy that is comparable to the standard AlexNet for small class numbers but is much faster. |
| BranchyNet | Right-Sizing | Open source DNN training framework that supports the early-exit mechanism. |
| TeamNet [77] | Model Compression, Transfer Learning | TeamNet trains shallower models using the similar but downsized architecture of a given SOTA (state of the art) deep model. The master node compares its uncertainty with the worker’s and selects the one with the least uncertainty as to the final result. |
| OpenEI [43] | Model Compression, Data Quantization, Model Selector | The algorithms are optimized by compressing the size of the model, quantizing the weight. The model selector will choose the most suitable model based on the developer’s requirement (the default is accuracy) and the current computing resource. |
| TensorFlow Lite [78] | Data Quantization | TensorFlow’s lightweight solution, which is designed for mobile and edge devices. It leverages many optimization techniques, including quantized kernels, to reduce the latency. |
| QNNPACK (Quantized Neural Networks PACKage) [79] | Data Quantization | Developed by Facebook, is a mobile-optimized library for high-performance NN inference. It provides an implementation of common NN operators on quantized 8-bit tensors. |
| ProtoNN [80] | Model Compression | Inspired by k-Nearest Neighbor (KNN) and could be deployed on the edges with limited storage and computational power. |
| EMI-RNN [81] | Right-Sizing | It requires 72 times less computation than standard Long Short-Term Memory (LSTM) networks and improves its accuracy by 1%. |
| CoreML [82] | Model Compression, Data Quantization | Published by Apple, it is a deep learning package optimized for on-device performance to minimize memory footprint and power consumption. Users are allowed to integrate the trained machine learning model into Apple products, such as Siri, Camera, and QuickType. |
| DroNet [34] | Model Compression, Data Quantization | The DroNet topology was inspired by residual networks and was reduced in size to minimize the bare image processing time (inference). The numerical representation of weights and activations reduces from the native one, 32-bit floating-point (Float32), down to a 16-bit fixed point one (Fixed16). |
| Stratum [83] | Model Selector, Dynamic Scheduling | Stratum can select the best model by evaluating a series of user-built models. A resource monitoring framework within Stratum keeps track of resource utilization and is responsible for triggering actions to elastically scale resources and migrate tasks, as needed, to meet the ML workflow’s Quality of Services (QoS). ML modules can be placed on the edge of the Cloud layer, depending on user requirements and capacity analysis. |
| Efficient distributed deep learning (EDDL) [54] | Model Compression, Model Partitioning, Right-Sizing | A systematic and structured scheme based on balanced incomplete block design (BIBD) used in situations where the dataflows in DNNs are sparse. Vertical and horizontal model partition and grouped convolution techniques are used to reduce computation and memory. To speed up the inference, BranchyNet is utilized. |
| In-Edge AI [5] | Federated Learning | Utilizes the collaboration among devices and edge nodes to exchange the learning parameters for better training and inference of the models. |
| Edgence [84] | Blockchain | Edgence (EDGe + intelligENCE) is proposed to serve as a blockchain-enabled edge-computing platform to intelligently manage massive decentralized applications in IoT use cases. |
| FederatedAveraging (FedAvg) [85] | Federated Learning | Combines local stochastic gradient descent (SGD) on each client with a server that performs model averaging. |
| SSGD [86] | Federated Learning | System that enables multiple parties to jointly learn an accurate neural network model for a given objective without sharing their input datasets. |
| BlockFL [87] | Blockchain, Federated Learning | Mobile devices’ local model updates are exchanged and verified by leveraging blockchain. |
| Edgent [6] | Model Partitioning, Right-Sizing | Adaptively partitions DNN computation between the device and edge, in order to leverage hybrid computation resources in proximity for real-time DNN inference. DNN right-sizing accelerates DNN inference through the early exit at a proper intermediate DNN layer to further reduce the computation latency. |
| PipeDream [88] | Model Partitioning | PipeDream keeps all available GPUs productive by systematically partitioning DNN layers among them to balance work and minimize communication. |
| GoSGD [89] | Gossip Averaging | Method to share information between different threads based on gossip algorithms and showing good consensus convergence properties. |
| Gossiping SGD [90] | Gossip Averaging | Asynchronous method that replaces the all-reduce collective operation of synchronous training with a gossip aggregation algorithm. |
| GossipGraD [91] | Gossip Averaging | Asynchronous communication of gradients for further reducing the communication cost. |
| INCEPTIONN [92] | Data Quantization | Lossy-compression algorithm for floating-point gradients. The framework reduces the communication time by 70.9–80.7% and offers 2.2–3.1× speedup over the conventional training system while achieving the same level of accuracy. |
| Minerva [93] | Data Quantization, Model Compression | Quantization analysis minimizes bit widths without exceeding a strict prediction error bound. Compared to a 16-bit fixed-point baseline, Minerva reduces power consumption by 1.5×. Minerva identifies operands that are close to zero and removes them from the prediction computation such that model accuracy is not affected. Selective pruning further reduces power consumption by 2.0× on top of bit width quantization. |
| AdaDeep [94] | Model Compression | Automatically selects a combination of compression techniques for a given DNN that will lead to an optimal balance between user-specified performance goals and resource constraints. AdaDeep enables up to 9.8× latency reduction, 4.3× energy efficiency improvement, and 38× storage reduction in DNNs while incurring negligible accuracy loss. |
| JALAD [95] | Data Quantization, Model Partitioning | Data compression by jointly considering compression rate and model accuracy. A latency-aware deep decoupling strategy to minimize the overall execution latency is employed. Decouples a deep NN to run a part of it at edge devices and the other part inside the conventional cloud. |
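The FedAvg entry in Table 4 combines local SGD on each client with model averaging on a server. As a rough illustration of that idea, the sketch below applies it to a one-parameter linear model in plain Python. The toy data, the learning rate, and the function names `local_sgd` and `fedavg_round` are assumptions made for this example and are not taken from [85]; the server here weights each client's model by its number of samples.

```python
def local_sgd(w, data, lr=0.05, epochs=5):
    """Run plain SGD for a 1-D linear model y = w * x on one client's data."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2.0 * (w * x - y) * x  # derivative of the squared error w.r.t. w
            w -= lr * grad
    return w


def fedavg_round(w_global, clients, lr=0.05, epochs=5):
    """One round: every client trains locally, then the server averages the models."""
    total = sum(len(data) for data in clients)
    updates = [(local_sgd(w_global, data, lr, epochs), len(data)) for data in clients]
    # Weighted average of the client models, weighted by local sample count.
    return sum(w * n for w, n in updates) / total


# Hypothetical toy data: three clients whose samples all follow y = 3x.
# Only model weights leave a client; the raw (x, y) pairs are never shared.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5)],
    [(1.5, 4.5), (1.0, 3.0), (2.0, 6.0)],
]

w = 0.0
for _ in range(20):
    w = fedavg_round(w, clients)
print(f"global weight after 20 rounds: {w:.6f}")
```

On this toy problem the global weight approaches the slope used to generate the data. A real deployment would also have to deal with client sampling, non-identically distributed data, and devices dropping out, none of which is modelled here.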
When correlating the EI strategies with the frameworks, it is possible to notice some interesting associations. There are ten of these techniques and strategies, of which only three together account for more than 60% of the papers: (i) Model Compression with 24%, (ii) Model Partitioning with 20%, and (iii) Data Quantization with 17%. Federated Learning, Right-Sizing, Gossip Averaging and Model Selector correspond to 9% each. Each of the others accounts for less than 8%. Figure 1 illustrates these ten classes of strategies.
Figure 1. Edge Intelligence strategies.
Among these strategies, Model Compression is the most suitable for training and testing on raw data and for reducing dimensionality in real time. This strategy allows ML algorithms to respond faster while using less bandwidth, power and processing. In addition, this technique has proven to be more economical and better for data security, since the processing is performed entirely at the edge. In terms of algorithms, the most common is the DNN paradigm of machine learning, which segments models into successive parts (layers). This structure allows each part to be deployed at a different site (model partitioning). DNNs also enable compression techniques such as removing nodes or layers, allowing a whole model to be deployed on resource-constrained devices.
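As an illustration of compression by removing nodes, the sketch below prunes the hidden units of a small two-layer network, keeping only those with the largest outgoing-weight magnitude. This is a generic pruning heuristic written for this example, not the method of any framework listed in Table 4; the weights, the L1-norm scoring rule, and the function names are all assumptions.

```python
def prune_hidden_units(w_in, w_out, keep):
    """Keep the `keep` hidden units with the largest outgoing-weight magnitude.

    w_in[h]  : list of input weights feeding hidden unit h
    w_out[o] : list of weights from every hidden unit to output o
    """
    n_hidden = len(w_in)
    # Score each hidden unit by the L1 norm of its outgoing weights.
    scores = [sum(abs(row[h]) for row in w_out) for h in range(n_hidden)]
    ranked = sorted(range(n_hidden), key=lambda h: scores[h], reverse=True)
    kept = sorted(ranked[:keep])
    new_in = [w_in[h] for h in kept]
    new_out = [[row[h] for h in kept] for row in w_out]
    return new_in, new_out, kept


def count_params(w_in, w_out):
    """Total number of weights in both layers (biases are ignored)."""
    return sum(len(r) for r in w_in) + sum(len(r) for r in w_out)


# Hypothetical weights: 3 inputs, 4 hidden units, 2 outputs.
w_in = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.2], [0.5, 0.1, 0.1], [-0.3, 0.2, 0.0]]
w_out = [[0.9, 0.01, -0.7, 0.02], [-0.8, 0.02, 0.6, -0.01]]

small_in, small_out, kept = prune_hidden_units(w_in, w_out, keep=2)
print("kept hidden units:", kept)
print("parameters:", count_params(w_in, w_out), "->", count_params(small_in, small_out))
```

In practice a pruned network is usually fine-tuned afterwards to recover accuracy, a step this sketch leaves out.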
EI techniques tackle latency problems when part or all of the process is performed on edge devices, decreasing data traffic on the network and, consequently, the inherent delay in data transmission. Regarding security and privacy issues, it is possible to train and infer on sensitive data partially or fully at the edge, which avoids propagating the data throughout the network, where they are susceptible to attacks.
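The latency trade-off described above can be made concrete with a simple model: the end-to-end latency of a layer-wise partition is the edge compute time for the first layers, plus the time to transmit the intermediate data, plus the cloud compute time for the remaining layers. The sketch below searches all split points exhaustively. The per-layer timings, data sizes, bandwidths, and the function name `best_split` are hypothetical; this is not the algorithm of any specific framework in Table 4.

```python
def best_split(edge_ms, cloud_ms, out_kb, input_kb, bandwidth_kb_per_ms):
    """Return (split, latency_ms): layers [0, split) run on the edge, the rest in the cloud."""
    n = len(edge_ms)
    best = None
    for split in range(n + 1):
        if split == 0:
            sent = input_kb            # nothing runs on the edge: send the raw input
        elif split == n:
            sent = 0.0                 # everything runs on the edge: nothing is sent
        else:
            sent = out_kb[split - 1]   # send the output of the last edge layer
        latency = (sum(edge_ms[:split])
                   + sent / bandwidth_kb_per_ms
                   + sum(cloud_ms[split:]))
        if best is None or latency < best[1]:
            best = (split, latency)
    return best


# Hypothetical per-layer measurements for a four-layer network.
edge_ms = [4.0, 6.0, 8.0, 10.0]      # compute time per layer on the edge device
cloud_ms = [0.5, 0.5, 1.0, 1.0]      # compute time per layer in the cloud
out_kb = [200.0, 50.0, 10.0, 1.0]    # output size of each layer in kilobytes
input_kb = 600.0                     # size of the raw input in kilobytes

split, latency = best_split(edge_ms, cloud_ms, out_kb, input_kb, bandwidth_kb_per_ms=5.0)
slow_split, slow_latency = best_split(edge_ms, cloud_ms, out_kb, input_kb, bandwidth_kb_per_ms=0.5)
print(f"fast link: split after layer {split}, latency {latency} ms")
print(f"slow link: split after layer {slow_split}, latency {slow_latency} ms")
```

With these assumed numbers, the faster link favours running three layers on the edge and the last one in the cloud, while the slower link favours running the whole model on the edge, because transmission cost dominates. The time to return the final result is ignored in this model.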
4.4. RQ4—Edge Intelligence Application Domains
In this section, the researchers present a taxonomy to characterize the application domains where EI has been adopted, providing inputs to answer RQ4. Based on the researched articles, it was possible to group the applications into six main domains: (i) Industry, (ii) Surveillance, (iii) Security, (iv) Intelligent Transport, (v) Health, and (vi) Energy Management. Other domains may still emerge as new research appears. Figure 2 illustrates this taxonomy up to a third level. Table 5 shows the works that tackle these domains. Figure 3 summarizes the number of publications in each of the six domains.
Figure 2. EI application domains.
Figure 3. Publications by domain application.
Table 5. Application domains and corresponding works.