Distributed edge intelligence is a disruptive research area that enables the execution of machine learning and deep learning (ML/DL) algorithms close to where data are generated. Since edge devices are more limited and heterogeneous than typical cloud devices, many hindrances have to be overcome to fully extract the potential benefits of such an approach (such as data-in-motion analytics).
1. Introduction
With the rise of the Internet of Things (IoT), a large number of smart applications are being built that take advantage of connecting several types of devices to the internet. These applications generate massive amounts of data that need to be processed promptly to produce valuable and actionable information. Edge intelligence (EI) refers to moving the execution of machine learning tasks, either partially or entirely, from the remote cloud closer to the IoT/edge devices. Examples of edge devices are smartphones, access points, gateways, smart routers and switches, new generation base stations, and micro data centers.
Some edge devices have considerable computing capabilities (although always much smaller than those of cloud processing centers), but most are characterized by very limited capabilities. Currently, with the increasing development of MEMS (Micro–Electro–Mechanical Systems) devices, there is a tendency to carry out part of the processing within the data-producing devices themselves (sensors) [1][2][3][4]. Performing processing on resource-limited devices raises several challenges, including the need to adapt complex algorithms and to divide the processing among several nodes.
Therefore, in edge intelligence, it is essential to promote collaboration between devices to compensate for their lower computing capacity. Synonyms of this concept found in the literature include distributed learning, edge/fog learning, distributed intelligence, edge/fog intelligence and mobile intelligence [5][6][7].
Leveraging edge intelligence mitigates some drawbacks of running ML tasks entirely in the cloud, such as:
- High latency [8]: offloading intelligence tasks to the edge enables faster inference by reducing the delay inherent in transmitting data through the network backbone;
- Security and privacy issues [9][10]: sensitive data can be used for training and inference entirely at the edge, preventing their risky propagation throughout the network, where they are susceptible to attacks. Moreover, edge intelligence can derive non-sensitive information that could then be submitted to the cloud without further processing;
- The need for a continuous internet connection: in locations where connectivity is poor or intermittent, ML/DL tasks can still be carried out;
- Bandwidth degradation: edge computing can perform part of the processing on raw data and transmit only the resulting (filtered/aggregated/pre-processed) data to the cloud, thus saving network bandwidth. Transmitting large amounts of data to the cloud burdens the network and impacts the overall Quality of Service (QoS) [11];
- Power waste [12]: transmitting unnecessary raw data through the internet consumes power, decreasing energy efficiency on a large scale.
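The bandwidth saving described in the list above can be illustrated with a minimal sketch of edge-side aggregation. It assumes a hypothetical stream of numeric sensor readings and hypothetical helpers named `summarize_window` and `aggregate_at_edge`; none of these come from the surveyed works. Each window of raw samples is reduced to a few summary statistics before anything would be sent to the cloud.

```python
import json
import statistics


def summarize_window(readings):
    """Reduce one window of raw readings to a small summary record."""
    return {
        "count": len(readings),
        "mean": round(statistics.fmean(readings), 3),
        "min": min(readings),
        "max": max(readings),
    }


def aggregate_at_edge(stream, window_size):
    """Split a raw stream into fixed-size windows and summarize each one.

    A trailing partial window is summarized as well, so no reading is dropped.
    """
    summaries = []
    for start in range(0, len(stream), window_size):
        summaries.append(summarize_window(stream[start:start + window_size]))
    return summaries


# Hypothetical raw temperature samples collected by an edge device.
raw = [20.0 + 0.1 * (i % 10) for i in range(1000)]
summaries = aggregate_at_edge(raw, window_size=100)

# Compare the size of the two payloads when serialized as JSON.
raw_bytes = len(json.dumps(raw).encode("utf-8"))
summary_bytes = len(json.dumps(summaries).encode("utf-8"))
print(f"raw payload: {raw_bytes} bytes, aggregated payload: {summary_bytes} bytes")
```

Whether such a summary is acceptable depends on the application: aggregation discards detail, so it only suits tasks whose cloud-side analysis does not need the individual samples.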
The steps for data processing in ML vary according to the specific technique in use, but generally occur in a well-defined life cycle, which can be represented by a workflow. Model building is at the heart of any ML technique, but the complete life cycle of a learning process involves a series of steps, from data acquisition and preparation to model deployment into a production environment. When adopting the Edge intelligence paradigm, it is necessary to carefully analyze which steps in the ML life cycle can be successfully executed at the edge of the network. Typical steps that have been investigated for execution at the edge are data collection, pre-processing, training and inference.
2. Related Work
Several surveys addressing edge intelligence have been published recently. However, they adopt perspectives different from the one adopted in this SLR. Al-Rakhami et al. [13] propose and analyze a framework based on the distributed edge/cloud paradigm using Docker technology, which provides a very lightweight and effective virtualization solution. This solution can be used to manage, deploy and distribute applications onto clusters (e.g., of small board devices such as the Raspberry Pi). It combines various benefits with lower costs by processing data at the edge instead of on central servers. However, the authors rely on experiments to support the proposal of a new framework, and their work does not mention any of the nine groups of techniques the researchers present in this work.
The survey by Wang et al. [14] is centered on the connection between Deep Learning and the edge, either to apply DL in optimizing the edge or to use the edge to run DL algorithms. The study is divided into five fronts: DL applications on edge; DL inference in edge; edge computing for DL; DL training at the edge; DL for optimizing the edge. The paper discusses hardware and virtualization aspects. Concerning the (groups of) techniques and strategies, it is mostly restricted to Federated Learning and the optimization of the edge with DL.
Xu et al. [10] approach edge intelligence from the perspectives of edge caching, edge training, edge inference, and edge offloading in a very comprehensive way. The researchers discuss all these aspects in this work but also explore additional techniques and strategies related to pre-processing, federated learning, and scheduling. One intersection of that paper with the researchers' work is the overlap of three groups of techniques (Federated Learning, Edge Pre-processing and Scheduling). However, the researchers extend the discussion to more groups of techniques.
The work presented by Zhou et al. [15] covers the path from artificial intelligence to edge AI, showing a generalized representation of the application architecture used in the lifecycle management of ML. The edge layer comprises sensors/actuators, edge analytics, and logging and monitoring. The fog layer comprises visualization, live streaming engines, batch processing, data ingestion, storage, and ML model development platforms and libraries. The researchers' paper covers several more domains in which edge intelligence is used, which are not present in that survey. Compared to these other surveys, the researchers analyze the literature more comprehensively, including a discussion on application domains of edge intelligence and their correlation with identified techniques.
Verbraeken et al. [16] provide an extensive overview of the current state of the art, outlining the challenges and opportunities of distributed machine learning over conventional machine learning and discussing the techniques used for distributed machine learning. The paper follows the same line of research as Wang et al. [14], with a focus on machine learning applied to distributed environments. To this end, it explores the various types of algorithms used to solve problems with ML.
Table 1 compares the researchers' work with the other surveys mentioned in this section. In summary, the main gaps of the analyzed works concern aspects such as “Techniques and Strategies” on the edge. The table also shows the aspects of “Challenges” and “Different Application Domains”, where edge intelligence can be used.
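Table 1. Comparison of existing surveys.
| Paper | Challenges | Scope: Group of Techniques | Scope: Different Application Domains |
| Al-Rakhami et al. [13] | 0/6 | 2/8 | 1/6 |
| Wang et al. [14] | 1/6 | 4/8 | 4/6 |
| Verbraeken et al. [16] | 1/6 | 0/8 | 0/6 |
| Zhou et al. [15] | 2/6 | 4/8 | 0/6 |
| Dianlei Xu et al. [10] | 6/6 | 3/8 | 0/6 |
| The researchers' work | 6/6 | 8/8 | 6/6 |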
| CH1 |
|
[ |
66 |
67 |
] |
|
4.2. RQ2—Techniques and Strategies
Here, the researchers focus on three main aspects, namely: (i) the system architecture, (ii) how the ML tasks are distributed among the devices, and (iii) the underlying adopted techniques. The researchers classify the several approaches used in distributed learning based on these three aspects. The researchers identified nine groups of techniques and strategies, described in what follows: Federated learning; Model partitioning; Right-sizing; Edge pre-processing; Scheduling; Cloud pre-training; Edge only; Model Compression; and Other techniques.
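As an illustration of the first group, Federated learning, the following is a minimal sketch of the federated averaging idea: each client runs local stochastic gradient descent on its own data and a server averages the resulting models. The one-dimensional linear model, the synthetic client data, and the function names `local_sgd` and `federated_averaging` are assumptions made for this example; they are not taken from any surveyed framework.

```python
import random


def local_sgd(weights, data, lr=0.05, epochs=5):
    """Run plain SGD on one client's data for a 1-D linear model y = w*x + b."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            # Gradient of the squared error 0.5 * err**2 with respect to w and b.
            w -= lr * err * x
            b -= lr * err
    return (w, b)


def federated_averaging(client_datasets, rounds=30):
    """Server loop: broadcast the model, collect local updates, average them."""
    global_weights = (0.0, 0.0)
    total = sum(len(d) for d in client_datasets)
    for _ in range(rounds):
        updates = [local_sgd(global_weights, d) for d in client_datasets]
        # Weight each client's model by its share of the training samples.
        w = sum(len(d) / total * u[0] for d, u in zip(client_datasets, updates))
        b = sum(len(d) / total * u[1] for d, u in zip(client_datasets, updates))
        global_weights = (w, b)
    return global_weights


random.seed(0)
# Hypothetical clients: each holds private samples of y = 2x + 1 plus small noise.
clients = []
for n in (20, 40, 60):
    xs = [random.uniform(-1.0, 1.0) for _ in range(n)]
    clients.append([(x, 2.0 * x + 1.0 + random.gauss(0.0, 0.01)) for x in xs])

w, b = federated_averaging(clients)
print(f"learned w={w:.2f}, b={b:.2f}")
```

Only model parameters travel between the clients and the server in this sketch; the raw samples stay on the device that produced them, which is the property that makes the approach attractive for privacy-sensitive edge scenarios.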
4.3. RQ3—Frameworks for Edge Intelligence
This section describes the studies that provided answers to the RQ3 of this survey.
Table 4 lists the main frameworks currently used in distributed ML applications. The table also correlates each framework with the corresponding EI group of techniques or the main related strategy.
Table 4. EI frameworks.
| Framework | Groups of Techniques or Strategies | Comments |
2/8 |
|
61 |
| Neurosurgeon [ |
| 1/6 |
|
7273] |
|
| Running ML/DL on devices with limited resources |
|
62][6263]
|
| Model Partitioning |
|
| Lightweight scheduler to automatically partition DNN computation between edge devices and cloud at the granularity of NN layers |
|
| Wang et al. [14] |
|
| CH2 |
|
| JointDNN [7374] |
| 1/6 |
|
| [10][15][1920][2223][2425][2627][3738] |
|
| 4/8 |
|
|
| [8][2021][2425][2728][2930][3233] |
| 4/6 |
|
[ | 33 | 34 |
| Verbraeken et al. [16] |
|
| 1/6 |
|
| 0/8 |
|
| CH2 |
|
| Ensuring energy efficiency without compromising the accuracy |
|
] | [ | 3435][4546][4849 |
| Model Partitioning |
| ][5253][5556][5758][6162][6263][6364] |
|
| JointDNN provides an energy- and performance-efficient method of querying some layers on the mobile device and some layers on the cloud server. |
|
| CH3 |
|
|
| CH3 |
| Communication efficiency |
| H. Li et al. [3132] |
|
| [10 |
| 0/6 |
|
] | [2021][2425][2526][4243][6263] |
|
| Model Partitioning |
|
| [16][1718][1920][2930][3031][3132][3233][3536][3940][4243][5253][6465][6566] |
|
| They divide the NN layers and deploy the part with the lower ones (closer to the input) into edge servers and the part with higher layers (closer to the output) into the cloud for offloading processing. They also propose an offline and an online algorithm that schedules tasks in Edge servers. |
|
| Zhou et al. [15] |
|
|
| CH4 |
| 2/6 |
|
| CH4 |
|
| Musical chair [7475] |
| Ensuring data privacy and security |
| [10][2021][2324][3233][4041][6667] |
|
| 4/8 |
|
| Model Partitioning |
|
| [8][ |
| 0/6 |
|
|
| Dianlei Xu et al. [10] |
|
| CH5 |
| 6/6 |
|
|
| Handling failure in edge devices |
| 3/8 |
|
| 0/6 |
|
| The researchers' work |
|
| 6/6 |
|
| 8/8 |
|
| 6/6 |
|
3. Research Methodology
The research methodology used in this paper consists of a Systematic Literature Review (SLR), where a rigorous protocol of searching the literature is defined and applied to extract information that answers specific research questions. The use of this methodology enables impartial results and an auditable process. This section details the methodology used in the review.
According to Brereton et al. [17], an SLR is performed through distinct processes grouped into three phases. The initial phase, ‘Plan Review’, includes: (i) specifying research questions; (ii) developing the review protocol; (iii) validating the review protocol. In the second phase, ‘Conduct Review’, the following are carried out: (iv) identifying relevant research; (v) selecting primary studies; (vi) assessing study quality; (vii) extracting required data; (viii) synthesising data. In the last phase, ‘Document Review’, the report with the reviewed findings is produced and validated through: (ix) writing the review report, and (x) validating the report.
4. Answering the RQs
4.1. RQ1—Research Challenges in Edge Intelligence (EI)
In this section, the researchers summarize the challenges faced by the Edge Intelligence (EI) paradigm that the analyzed studies either mentioned or aimed to tackle. The discussion presented in this section aims to provide answers to RQ1: What are the main challenges and open issues in the distributed learning field?
As mentioned earlier, performing ML techniques at the edge of the network promises several benefits, but it also raises several challenges. As this field is still in its infancy, solutions to such challenges are still being investigated. The surveyed studies tackle several challenges, which can be broadly grouped into six categories, displayed in Table 2 and described in what follows.
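Table 2. Challenges in distributed machine learning in edge computing.
| CH1 | Running ML/DL on devices with limited resources |
| CH2 | Ensuring energy efficiency without compromising the accuracy |
| CH3 | Communication efficiency |
| CH4 | Ensuring data privacy and security |
| CH5 | Handling failure in edge devices |
| CH6 | Heterogeneity and low quality of data |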
CH1 consists of dealing with the typically low processing power of edge devices. Edge devices often have little processing capacity, especially when compared to the powerful data centers in the cloud. On the other hand, many ML applications require computational power that exceeds the capabilities of resource-constrained IoT and edge devices. Limited resources also include memory and storage capacities. NN and ML algorithms generally require storing and accessing the parameters that describe the model architecture and the weight values forming the classification model. With limited storage, it may not be possible to have continued access to the original training data, or the data may have been removed altogether to free up space. Therefore, a significant challenge is reducing memory accesses and storing data locally to avoid costly reads and writes to external memory modules.
CH2 consists of ensuring the energy efficiency of edge devices without compromising the accuracy of the system. In general, the higher the complexity of the required processing, the more energy is consumed. Edge devices can be battery-powered. In these cases, the energy consumption of algorithms must be minimized to reach energy efficiency. However, this should be done with care so as not to compromise the quality of the data generated and the decisions/inferences made. So, there is an important trade-off to be managed.
CH3 concerns communication issues: edge intelligence models must consider that the devices might face poor connectivity. In such cases, model updates in training tasks may be delayed. Valerio, Passarella and Conti [18] claim that inference is highly sensitive to the available communication bandwidth. Challenges in communication include network traffic, fluctuations in bandwidth, and intermittent or unavailable connectivity.
CH4 is related to data privacy and security. Several edge intelligence applications, such as those in healthcare, handle sensitive data. Thus, distributed ML algorithms must preserve user privacy and information security when data are transferred between devices. In the ML workflow, distributed edge intelligence has multiple points of vulnerability to malicious attacks or to leakage of confidential or important data.
CH5 is the challenge posed by failures in edge devices. Since devices might fail at some point, the distributed algorithm must consider ways to overcome this situation. Lastly, heterogeneity and lack of quality in the available data give rise to challenge CH6. For most ML algorithms, especially in supervised machine learning, high accuracy depends on high-quality training data. However, such data are often unavailable in edge intelligence scenarios, where the collected data are sparse and unlabelled [10]. Distributed edge intelligence has to handle data that come from different sources, in different formats, and are subject to noise. To attain good accuracy, the application must handle noise and heterogeneity in the sensed data used as input.
Table 3 presents references to each of the described challenges, as well as studies that propose approaches to tackle them. This table is only intended to give an overview of the number of papers per challenge. It can be observed that CH1 is the challenge with the most papers in the literature. All of the cited works are described in more detail later in this paper.
Table 3. References to the challenges of Edge Intelligence.
| | References | Works That Tackle the Challenges |
| CH1 | [10][15][19][20][21][22][23][24][25][26][27][28] |
[ |
94 |
95 |
] |
|
|
Data Quantization |
|
Model Partitioning |
|
|
Data compression by jointly considering compression rate and model accuracy. A latency-aware deep decoupling strategy to minimize the overall execution latency is employed. Decouples a deep NN to run a part of it at edge devices and the other part inside the conventional cloud. |
|
When correlating the EI strategies with frameworks, it is possible to notice some interesting associations. There are ten of these techniques and strategies, of which three together account for more than 60% of the papers: (i) Model Compression with 24%, (ii) Model Partitioning with 20%, and (iii) Data Quantization with 17%. Federated Learning, Right-Sizing, Gossip Averaging and Model Selector correspond to 9% each. The others have less than 8%. Figure 1 illustrates these ten classes of strategies.
Figure 1. Edge Intelligence strategies.
Among these strategies, Model Compression is the most suitable for training and testing on raw data while reducing dimensionality in real time. This strategy allows ML algorithms to respond faster while using less bandwidth, power and processing. In addition, this technique has proven to be more economical and better for data security, since the processing is performed entirely at the edge. In terms of algorithms, the most common is the DNN paradigm of machine learning, which segments models into successive parts (layers). This structure allows each part to be deployed at a distinct site (model partitioning). DNNs also lend themselves to compression techniques such as removing nodes or layers, allowing a whole model to be deployed on resource-constrained devices.
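The layer-wise partitioning idea can be sketched as a search over split points: the first layers run on the edge device, the intermediate output is transferred, and the remaining layers run in the cloud. The sketch below is only an illustration under stated assumptions. The `Layer` record, the per-layer timings, the output sizes and the `best_split` function are all hypothetical, and it is not the algorithm of any particular framework.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    edge_ms: float    # hypothetical time to run this layer on the edge device
    cloud_ms: float   # hypothetical time to run this layer in the cloud
    out_kb: float     # hypothetical size of the layer's output


def best_split(layers, input_kb, bandwidth_kb_per_ms):
    """Return (split, latency): layers[:split] run on the edge, the rest in the cloud.

    split == 0 means cloud-only and split == len(layers) means edge-only.
    """
    best = None
    for split in range(len(layers) + 1):
        edge = sum(layer.edge_ms for layer in layers[:split])
        cloud = sum(layer.cloud_ms for layer in layers[split:])
        if split == len(layers):
            transfer = 0.0  # the result stays on the edge device
        else:
            # Ship either the raw input or the last edge layer's output.
            payload = input_kb if split == 0 else layers[split - 1].out_kb
            transfer = payload / bandwidth_kb_per_ms
        latency = edge + transfer + cloud
        if best is None or latency < best[1]:
            best = (split, latency)
    return best


# Hypothetical four-layer network with made-up costs.
layers = [
    Layer("conv1", edge_ms=10.0, cloud_ms=1.0, out_kb=800.0),
    Layer("pool1", edge_ms=2.0, cloud_ms=0.2, out_kb=200.0),
    Layer("conv2", edge_ms=15.0, cloud_ms=1.5, out_kb=100.0),
    Layer("fc", edge_ms=30.0, cloud_ms=2.0, out_kb=1.0),
]

for bandwidth in (1.0, 10.0, 100.0):
    split, latency = best_split(layers, input_kb=600.0, bandwidth_kb_per_ms=bandwidth)
    print(f"bandwidth={bandwidth} KB/ms -> split after {split} layers, {latency:.1f} ms")
```

With these made-up numbers the preferred split moves from edge-only at low bandwidth to cloud-only at high bandwidth, with an intermediate split in between, which is the trade-off that partitioning strategies exploit.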
EI techniques tackle latency problems when part or all of the process is performed on edge devices, which decreases data traffic on the network and, consequently, the delay inherent in data transmission. Regarding security and privacy issues, it is possible to train and infer on sensitive data partially or fully at the edge, preventing their risky propagation throughout the network, where they are susceptible to attacks.
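Data Quantization, one of the most frequent strategies identified above, reduces the amount of data that has to be stored or transmitted by representing values with fewer bits. The following is a minimal sketch of symmetric 8-bit linear quantization, assuming a small list of hypothetical floating-point weights; the helper names `quantize_int8` and `dequantize` are illustrative and do not correspond to the API of any library.

```python
def quantize_int8(values):
    """Symmetric linear quantization of floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One shared scale maps the largest magnitude to 127.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map the integers back to approximate float values."""
    return [q * scale for q in quantized]


weights = [0.82, -0.41, 0.05, -1.27, 0.33, 0.0]  # hypothetical float32 weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

max_error = max(abs(a - b) for a, b in zip(weights, restored))
bytes_fp32 = 4 * len(weights)   # 4 bytes per 32-bit float
bytes_int8 = 1 * len(q) + 4     # 1 byte per value plus one float32 scale
print(q, f"max error={max_error:.4f}", f"{bytes_fp32} -> {bytes_int8} bytes")
```

The rounding error per value is bounded by half the scale, so the accuracy impact depends on how widely the original values are spread; production toolchains typically add calibration and per-channel scales, which this sketch omits.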
4.4. RQ4—Edge Intelligence Application Domains
In this section, the researchers present a taxonomy to characterize the application domains where EI has been adopted, providing inputs to answer RQ4. The researched articles could be grouped into six main domains: (i) Industry, (ii) Surveillance, (iii) Security, (iv) Intelligent Transport, (v) Health, and (vi) Energy Management. Other domains may still emerge from new research. Figure 2 illustrates this taxonomy up to a third level. Table 5 shows the works that tackle these domains. Figure 3 summarizes the number of publications in each of the six domains.
Figure 2. EI application domains.
Figure 3. Publications by domain application.
Table 5. Application domains and corresponding works.
| Domains |
|
| Works That Approach the Theme |
|
| Industry (8) |
|
| [8][2728][4748][4950][6465][9596][9697][9798] |
| [14][2425][2728][2829][2930][3031][3132][3233][3334][3435][3536][3637][3738][3839][3940][4041][4142][4243][4344][4445][4546][4647][4748][4849][4950][5051][5152][5253][5354][5455][5556][5657][5758][5859][5960][6061][ |
| Surveillance (5) |
|
| [3839][6566][9899][99100][100101] |
|
| Security (4) |
|
| [9][2526][6768][101102] |
|
| Intelligent Transport Systems (ITS) (13) |
|
| [2223][3637][3940][4142][4748][4849][5253][5455][7172][102103][103104][104105][105106], |
| 9][10][2021][4041][4748][6768][6869][6970] |
|
| Musical Chair aims at alleviating the compute cost and overcoming the resource barrier by distributing their computation: data parallelism and model parallelism. |
|
| CH5 |
|
| AAIoT [7576] |
| Health (14) |
| [10][2324] |
|
|
| Model Partitioning |
|
| [13][2122][2425][4344][4445][4748][6364][66 |
| – |
|
67 | ][106107][107108][108109][109110][110111][111112] |
|
| Accurate segmenting NNs under multi-layer IoT architectures |
|
| CH6 |
|
| MobileNet [4243] |
|
|
| Energy Management (4) |
| [10] |
| [3435][2021][4041][7071][7172] |
|
| Model Compression |
| Model Selector |
| [ |
| [10][3435] |
| Presented by Google Inc., the two hyperparameters introduced allow the model builder to choose the right sized model for the specific application. |
|
46 | 47][4748][112113]
|
| Squeezenet | Model Compression | A reduced DNN that achieves AlexNet-level accuracy with 50 times fewer parameters. |
| Tiny-YOLO | Model Compression | Tiny-YOLO is a very light NN and is hence suitable for running on edge devices. Its accuracy is comparable to that of the standard AlexNet for small class numbers, but it is much faster. |
| BranchyNet | Right-Sizing | Open source DNN training framework that supports the early-exit mechanism. |
| TeamNet [77] | Model Compression; Transfer Learning | TeamNet trains shallower models using the similar but downsized architecture of a given SOTA (state of the art) deep model. The master node compares its uncertainty with the worker’s and selects the one with the least uncertainty as the final result. |
| OpenEI [43] | Model Compression; Data Quantization; Model Selector | The algorithms are optimized by compressing the size of the model and quantizing the weights. The model selector chooses the most suitable model based on the developer’s requirement (the default is accuracy) and the current computing resources. |
| TensorFlow Lite [78] | Data Quantization | TensorFlow’s lightweight solution, designed for mobile and edge devices. It leverages many optimization techniques, including quantized kernels, to reduce latency. |
| QNNPACK (Quantized Neural Networks PACKage) [79] | Data Quantization | Developed by Facebook, it is a mobile-optimized library for high-performance NN inference. It provides an implementation of common NN operators on quantized 8-bit tensors. |
| ProtoNN [80] | Model Compression | Inspired by k-Nearest Neighbor (KNN), it can be deployed on edge devices with limited storage and computational power. |
| EMI-RNN [81] | Right-Sizing | It requires 72 times less computation than standard Long Short-Term Memory networks (LSTM) and improves accuracy by 1%. |
| CoreML [82] | Model Compression; Data Quantization | Published by Apple, it is a deep learning package optimized for on-device performance to minimize memory footprint and power consumption. Users can integrate a trained machine learning model into Apple products, such as Siri, Camera, and QuickType. |
| DroNet [34] | Model Compression; Data Quantization | The DroNet topology was inspired by residual networks and was reduced in size to minimize the bare image processing time (inference). The numerical representation of weights and activations is reduced from the native 32-bit floating-point format (Float32) to a 16-bit fixed-point one (Fixed16). |
| Stratum [83] | Model Selector; Dynamic Scheduling | Stratum can select the best model by evaluating a series of user-built models. A resource monitoring framework within Stratum keeps track of resource utilization and is responsible for triggering actions to elastically scale resources and migrate tasks, as needed, to meet the ML workflow’s Quality of Service (QoS). ML modules can be placed on the edge or the cloud layer, depending on user requirements and capacity analysis. |
| Efficient distributed deep learning (EDDL) [54] | Model Compression; Model Partitioning; Right-Sizing | A systematic and structured scheme based on balanced incomplete block design (BIBD), used in situations where the dataflows in DNNs are sparse. Vertical and horizontal model partitioning and grouped convolution techniques are used to reduce computation and memory. To speed up inference, BranchyNet is utilized. |
| In-Edge AI [5] | Federated Learning | Utilizes the collaboration among devices and edge nodes to exchange the learning parameters for better training and inference of the models. |
| Edgence [84] | Blockchain | Edgence (EDGe + intelligENCE) is proposed to serve as a blockchain-enabled edge-computing platform to intelligently manage massive decentralized applications in IoT use cases. |
| FederatedAveraging (FedAvg) [85] | Federated Learning | Combines local stochastic gradient descent (SGD) on each client with a server that performs model averaging. |
| SSGD [86] | Federated Learning | System that enables multiple parties to jointly learn an accurate neural network model for a given objective without sharing their input datasets. |
| BlockFL [87] | Blockchain; Federated Learning | Mobile devices’ local model updates are exchanged and verified by leveraging blockchain. |
| Edgent [6] | Model Partitioning; Right-Sizing | Adaptively partitions DNN computation between the device and edge, in order to leverage hybrid computation resources in proximity for real-time DNN inference. DNN right-sizing accelerates DNN inference through an early exit at a proper intermediate DNN layer to further reduce the computation latency. |
| PipeDream [88] | Model Partitioning | PipeDream keeps all available GPUs productive by systematically partitioning DNN layers among them to balance work and minimize communication. |
| GoSGD [89] | Gossip Averaging | Method to share information between different threads based on gossip algorithms, showing good consensus convergence properties. |
| Gossiping SGD [90] | Gossip Averaging | Asynchronous method that replaces the all-reduce collective operation of synchronous training with a gossip aggregation algorithm. |
| GossipGraD [91] | Gossip Averaging | Asynchronous communication of gradients for further reducing the communication cost. |
| INCEPTIONN [92] | Data Quantization | Lossy-compression algorithm for floating-point gradients. The framework reduces the communication time by 70.9 to 80.7% and offers a 2.2 to 3.1× speedup over the conventional training system while achieving the same level of accuracy. |
| Minerva [93] | Data Quantization; Model Compression | Quantization analysis minimizes bit widths without exceeding a strict prediction error bound. Compared to a 16-bit fixed-point baseline, Minerva reduces power consumption by 1.5×. Minerva identifies operands that are close to zero and removes them from the prediction computation such that model accuracy is not affected. Selective pruning further reduces power consumption by 2.0× on top of bit width quantization. |
| AdaDeep [94] | Model Compression | Automatically selects a combination of compression techniques for a given DNN that will lead to an optimal balance between user-specified performance goals and resource constraints. AdaDeep enables up to 9.8× latency reduction, 4.3× energy efficiency improvement, and 38× storage reduction in DNNs while incurring negligible accuracy loss. |
| JALAD |