Large-Scale Service Function Chaining in Smart City: Comparison
Please note this is a comparison between Version 1 by Prohim Tam and Version 2 by Rita Xu.

Smart cities leverage the Internet of Things (IoT) to collect data from various sources and employ data-driven approaches to improve the management, evaluation, and decision-making processes. From a core network perspective, service function chaining (SFC) is an enabling paradigm for elastically controlling the massive network services (NS) from IoT-empowered smart cities. SFC can effectively enforce policies and regulations set by city authorities, including data retention policies, content filtering, compliance checks, etc. Moreover, SFC optimizes service delivery, resource efficiency, quality of experience (QoE)/quality of service (QoS), and service-specific routing. To activate all the beneficial factors, mobile network operators need solutions to reflect SFC orchestration policies while ensuring efficient resource utilization and preserving QoS in large-scale networking congestion states.

  • graph neural networks
  • quality of service
  • service function chaining
  • smart city
  • virtual network functions

1. IoT Service Composition and QoS Class Identifier (QCI) for Smart City Applications

IoT signifies the advancement of smart cities by offering technical functionalities, such as sensor installation, information exchange protocols, and massive stream data, to enable the integration of smart technologies, industries, and management [1]. On top of these functionalities, IoT service composition in smart cities involves combining multiple services from various data sources to develop impactful policies. The existing service composition mechanisms were initially designed for static enterprise services and lack the ability to address the scalability challenges posed by IoT systems [2]. Semantic web service composition has leveraged semantic descriptions to enhance the efficiency and effectiveness of discovering and composing IoT services, which addresses challenges of heterogeneity and dynamism [3]. Furthermore, machine or deep (reinforcement) learning offers various solutions that can streamline IoT service composition by discovering services from diverse sources, selecting them based on criteria such as functionality and QoS, and automating the composition process (e.g., deep reinforcement learning for moving IoT services [4], genetic algorithms for QoS-based composition [5], and machine learning-driven QoS-aware service composition [6]). Key QoS factors for analyzing service compositions include availability, response time, scalability, cost, and reliability, which ensure that the composed services are responsive, cost-effective, and flexible for end users and urban development initiatives [7]. QCI is used to define the characteristics and requirements of different traffic types, which ensures that the network controller can effectively prioritize data flows based on specific demands. While there is no specific standard for labeling smart city use cases, certain QCIs from 3GPP TS 23.203 V12.2.0 [8] can be used to represent the relevant example services.
Table 1 presents the QCI-index, resource type, priority level, packet delay budget (PDB), and packet error loss rate (PELR) for smart city examples. In terms of resource types, guaranteed bit rate (GBR) assures minimum bandwidth to the end users even if the network is congested. In contrast, non-GBR provides the end users with best-effort service, with no guarantee that the end user will always get the requested bandwidth. PDB and PELR are upper-bound thresholds for the maximum tolerable delay and packet loss between the end user and the policy and charging enforcement function. Each QCI-index is associated with a smart city use case and can be described with examples as follows:
Table 1. Background studies on standardized QCI-associated smart city examples.

QCI-Index | Resource Type | Priority Level | PDB | PELR | Smart City Examples
QCI 1 (Conversational Voice) | GBR | 2 | 100 ms | 10^-2 | Smart Emergency Services
QCI 2 (Conversational Video) | GBR | 3 | 150 ms | 10^-3 | Smart Surveillance
QCI 4 (Buffered Streaming) | GBR | 5 | 300 ms | 10^-6 | Smart PIDs
QCI 70 (Mission Critical Data) | Non-GBR | 5.5 | 200 ms | 10^-6 | Smart Infrastructure Control
QCI 79 (V2X messages) | Non-GBR | 6.5 | 50 ms | 10^-2 | Smart Transportation
QCI 9 (Background) | Non-GBR | 9 | 300 ms | 10^-6 | Smart Waste Management

  • QCI 1 (conversational voice) is typically used for voice communication services, including smart emergency services (e.g., emergency response systems, public safety networks, or other government services).
  • QCI 2 (conversational video) is designed for real-time video surveillance systems in traffic monitoring, object detection, facial recognition, or other visual monitoring applications.
  • QCI 4 (buffered streaming) is suitable for smart public information displays (PIDs) such as real-time information and interactive features to engage and inform the public (e.g., digital signage, message boards, or even live event streaming).
  • QCI 70 (mission critical data) is designed for services that require low latency and high reliability, including smart applications to control critical infrastructure in the city (e.g., utility grid management, building automation, or electricity distribution).
  • QCI 79 (V2X messages) targets the components of modern urban mobility by gathering vehicle-to-vehicle, vehicle-to-infrastructure, vehicle-to-pedestrian, and vehicle-to-network information to develop a smart transportation system. With guaranteed performance indicators, smart transportation offers traffic flow optimization and safety enhancement in the city.
  • QCI 9 (background) is intended for applications that have lower priority and can tolerate delay, including sensor data collection, environmental monitoring, waste management, or other non-real-time data flows.
Table 1 provides a background for specifying smart city use cases and their required metrics, in accordance with the standard that network providers and operator policies should follow. With all these services activated, urban life becomes more efficient, sustainable, and livable.
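As a minimal sketch of how the Table 1 thresholds could be checked in code, the snippet below stores each row in a dictionary and tests whether a measured delay and loss rate stay within the PDB and PELR. The names `QciProfile`, `QCI_TABLE`, `meets_qci`, and `by_priority` are hypothetical, and the values are copied from Table 1 rather than taken directly from the 3GPP specification.

```python
from typing import NamedTuple


class QciProfile(NamedTuple):
    resource_type: str   # "GBR" or "Non-GBR"
    priority: float      # lower value means higher priority
    pdb_ms: float        # packet delay budget, upper bound in ms
    pelr: float          # packet error loss rate, upper bound
    example: str         # smart city example from Table 1


# Rows copied from Table 1 (illustrative container, not a 3GPP API).
QCI_TABLE = {
    1: QciProfile("GBR", 2, 100, 1e-2, "Smart Emergency Services"),
    2: QciProfile("GBR", 3, 150, 1e-3, "Smart Surveillance"),
    4: QciProfile("GBR", 5, 300, 1e-6, "Smart PIDs"),
    70: QciProfile("Non-GBR", 5.5, 200, 1e-6, "Smart Infrastructure Control"),
    79: QciProfile("Non-GBR", 6.5, 50, 1e-2, "Smart Transportation"),
    9: QciProfile("Non-GBR", 9, 300, 1e-6, "Smart Waste Management"),
}


def meets_qci(qci: int, delay_ms: float, loss_rate: float) -> bool:
    """Return True if the measured delay and loss respect the QCI bounds."""
    profile = QCI_TABLE[qci]
    return delay_ms <= profile.pdb_ms and loss_rate <= profile.pelr


def by_priority() -> list:
    """QCI indices ordered from most to least urgent."""
    return sorted(QCI_TABLE, key=lambda q: QCI_TABLE[q].priority)
```

For instance, `meets_qci(79, 40, 1e-3)` holds because 40 ms is inside the 50 ms budget of QCI 79, while a 60 ms measurement would violate it.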

2. Related Works on GNN-Based Optimization

Deep learning integration has been used for various aspects of network optimization [9][10] and in the context of large-scale SFC management in smart cities [11][12]. GNN is a class of deep learning models designed for analyzing and extracting hidden features from graph-structured data. GNN operates by propagating information across nodes and edges in a graph, which captures key relationships and dependencies within the data [13][14]. From a networking management and orchestration (MANO) perspective, GNN can generate representations from graph-based SFC data and predict the chain performance, which can be used for various tasks such as load balancing and congestion control [15]. In [16], a new neural network architecture for SFC based on GNN was proposed. The encoder-decoder architecture produced representations of the network topology and estimated probabilities of neighbor nodes and VNF processing. The GNN-based model outperformed the baseline deep neural network model and provided flexibility to topology changes. In [17], a knowledge-defined networking system was proposed to predict the optimal path for SFC deployment and traffic steering using GNN. A GNN-based model, RouteNet [18], was used to extract hidden information on network topology, routing, and traffic metrics for predicting the delay and loss ratio from source to destination.

3. Working Flow of Large-Scale SFC with GNN

The orchestration policy consists of two primary objectives, namely QoS guarantee and efficient VNF backup, for ensuring high availability and fault tolerance under large-scale, highly congested SFC request rates. The system model prioritizes each smart city service by criticality, following the upper-bound delay and remaining resources. The focus is on GNN node classification that detects the efficiency of VNF instances following the service criticality. To avoid single-server failure, when the duplicating decision is set, the proposed large-scale SFC with GNN (LS-SFC-GNN) spreads the VM-VNF placement across different physical nodes. Each physical node with a set of VNF placements has an assigned feature vector. In the use case of large-scale SFC for smart city applications, the features contain 6-tuple information as follows: (1) node indicator and the output decision variable in the initial timeslot, (2) resource capacity, (3) expected latency, (4) current loading, (5) operating status (whether the VNF node is currently operational or on standby), and (6) service upper-bound requirement. The input features are used to create feature vectors for each VNF node in the SFC graph.
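The 6-tuple described above could be represented as follows. This is only an illustrative sketch: the class name `VnfNodeFeatures`, the field names, the units, and the min-max scaling step are assumptions, since the text lists the six features but does not specify their encoding.

```python
from dataclasses import dataclass, astuple


@dataclass
class VnfNodeFeatures:
    decision: int               # (1) node indicator / decision variable in the initial timeslot
    capacity: float             # (2) resource capacity (unit assumed)
    expected_latency_ms: float  # (3) expected latency
    current_load: float         # (4) current loading, assumed to be a 0..1 ratio
    operational: int            # (5) operating status: 1 = operational, 0 = standby
    upper_bound_ms: float       # (6) service upper-bound requirement


def to_vector(features: VnfNodeFeatures) -> list:
    """Flatten the 6-tuple into a plain feature vector."""
    return [float(x) for x in astuple(features)]


def normalize(vectors: list) -> list:
    """Min-max scale each feature column to [0, 1] (an assumed preprocessing step)."""
    columns = list(zip(*vectors))
    scaled = []
    for row in vectors:
        scaled.append([
            (v - min(col)) / (max(col) - min(col)) if max(col) > min(col) else 0.0
            for v, col in zip(row, columns)
        ])
    return scaled
```

Scaling keeps features with large ranges, such as latency bounds in milliseconds, from dominating features such as the binary operating status.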

Once the feature vector is obtained, message aggregation is executed. For each node and VNF, messages from the neighboring nodes and sequential instances are processed jointly. Three aggregation methods are used for the following three conditions: (1) all neighboring nodes are related to the current service chain, so the proposed system can capture the cumulative impact of neighboring nodes on the feature representation of the central node; (2) all neighboring nodes are balanced for future duplications; and (3) link bandwidth capacities differ and carry bias values that guide the next VNF instances toward duplication on a different physical node. After the aggregated message is obtained, the algorithm combines it with the current feature using the update function to get the hidden features. The update function can be a neural network layer or a sequence of layers.
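The text does not name the three aggregation operators, so the sketch below assumes one plausible reading: a sum for the cumulative case, a mean for the balanced case, and a bandwidth-weighted mean for the biased case. The function names and the fixed-weight ReLU update are hypothetical stand-ins for learned layers.

```python
def aggregate_sum(messages: list) -> list:
    """Condition (1): accumulate the impact of all neighbors in the chain."""
    return [sum(col) for col in zip(*messages)]


def aggregate_mean(messages: list) -> list:
    """Condition (2): treat all neighbors as balanced."""
    return [sum(col) / len(messages) for col in zip(*messages)]


def aggregate_weighted(messages: list, bandwidths: list) -> list:
    """Condition (3): bias neighbors by link bandwidth capacity."""
    total = sum(bandwidths)
    return [
        sum(w * v for w, v in zip(bandwidths, col)) / total
        for col in zip(*messages)
    ]


def update(hidden: list, message: list, w_self: float = 0.5, w_msg: float = 0.5) -> list:
    """Combine the current feature with the aggregated message.

    A fixed linear mix followed by ReLU; in a trained GNN these
    weights would be learned parameters.
    """
    return [max(0.0, w_self * h + w_msg * m) for h, m in zip(hidden, message)]
```

With neighbor messages `[[1.0, 2.0], [3.0, 4.0]]` and link bandwidths `[100, 300]`, the weighted aggregate leans toward the second neighbor, which is how a higher-capacity link could steer the next duplication.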

The integrated GNN consists of multiple layers of message passing. In each layer, the nodes exchange information with their neighbors and update their features. The number of layers determines how many rounds of message aggregation and update are performed. After all layers of message passing, the final feature vectors represent the nodes in the graph and capture information from the local and global neighborhoods. A classification head is then applied to the final node representations to predict the class probabilities of the decision variables. The objective is to indicate whether the node requires duplication or is still efficient in the current timeslot. The GNN module is iteratively executed to minimize the loss using backpropagation and gradient descent. Gradients are computed for the model parameters, which can be learned weights in the aggregation and update functions.
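To illustrate the classification head and the gradient-descent step, here is a deliberately small sketch in plain Python. It assumes a sigmoid head with binary cross-entropy loss and updates only the head weights; the study itself builds its GNN in PyTorch and also learns the aggregation and update weights. The label convention (0 = requires duplication, 1 = still efficient) is an assumption, and all function names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: list, bias: float, h: list) -> float:
    """Probability that a node with final representation h is still efficient."""
    return sigmoid(sum(w * x for w, x in zip(weights, h)) + bias)


def bce_loss(weights: list, bias: float, H: list, y: list) -> float:
    """Mean binary cross-entropy over all nodes."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for h, t in zip(H, y):
        p = predict(weights, bias, h)
        total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return total / len(H)


def train_step(weights: list, bias: float, H: list, y: list, lr: float = 0.01):
    """One gradient-descent step; d(loss)/d(logit) = p - t for sigmoid + BCE."""
    n = len(H)
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for h, t in zip(H, y):
        err = predict(weights, bias, h) - t
        for j, x in enumerate(h):
            grad_w[j] += err * x / n
        grad_b += err / n
    new_weights = [w - lr * g for w, g in zip(weights, grad_w)]
    return new_weights, bias - lr * grad_b
```

Repeating `train_step` lowers `bce_loss` on the training nodes; a full implementation would backpropagate through every message-passing layer rather than the head alone.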

Once the GNN output is obtained, the policy manages and orchestrates VNF instances while leveraging SDN flow rule installation. Because this study targets large-scale SFC request rates, the placement of VNF backups and duplication decisions are emphasized under the following orchestration conditions:

  • If the current loading and operating status weight the class probability toward decision variable output 0, the orchestration policy duplicates that particular VNF on other physical nodes. The allocation's decision variable is re-configured to alter the placement, and the SDN controller installs the flow rule accordingly.
  • Otherwise, if the total time consumption approximation from the GNN output reflects 0, the decision variable (allocation) is reconsidered to check for alternative VNF instances that can be matched.
  • Furthermore, resource allocation properties (CPU, memory, disk) are re-adjusted to ensure that the approximation satisfies the QoS constraints.
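The first condition above can be sketched as a simple rule that duplicates a VNF on a different physical node when the classifier leans toward output 0. The 0.5 threshold, the "most free capacity" tie-break, the `defer` fallback, and every name below are assumptions made for illustration; the SDN flow rule installation is not modeled.

```python
def pick_backup_node(current_node: str, free_capacity: dict, demand: float):
    """Choose another physical node with enough free capacity (anti-affinity)."""
    candidates = {
        node: cap
        for node, cap in free_capacity.items()
        if node != current_node and cap >= demand
    }
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


def orchestrate(vnf: str, p_efficient: float, current_node: str,
                free_capacity: dict, demand: float, threshold: float = 0.5) -> dict:
    """Map the GNN class probability to a placement action.

    p_efficient is the predicted probability of class 1 (still efficient);
    values below the threshold are read as output 0 (duplicate).
    """
    if p_efficient >= threshold:
        return {"vnf": vnf, "action": "keep", "node": current_node}
    target = pick_backup_node(current_node, free_capacity, demand)
    if target is None:
        # No other node can host the copy; leave the decision for a later timeslot.
        return {"vnf": vnf, "action": "defer", "node": current_node}
    return {"vnf": vnf, "action": "duplicate", "node": target}
```

Excluding the current node from the candidates is what keeps a single-server failure from taking down both the original instance and its backup.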

4. Experiment Settings

Figure 1 presents the simulation architecture with multi-purpose VNFs and chaining isolation, which supports the SFC orchestration policy of offering alternative VNF instances and duplicating them on different VM resource blocks. The figure shows the ingress data sources and egress end-user interfaces of each smart city service. An intelligent service function (ISF) is proposed to perform modern services; however, due to the lack of real-world open-source data on smart city SFC, the execution delay of each ISF is approximated by following the maximum thresholds of computing time in each QCI class. Each VM placement is determined based on the criticality and resource consumption of each service, corresponding to the nodes in the experiment.

Figure 1. Use cases of smart city applications from ingress data sources to the egress end-user interfaces, namely (a) smart emergency services, (b) smart surveillance, (c) smart PIDs, (d) smart infrastructure control, (e) smart transportation, and (f) smart waste management.

For example, the case of smart surveillance begins with live video camera streaming to feed service S1, namely video analytics (object detection). The primary focus of this study is on scaling SFC; therefore, the intelligent model responsible for executing S1 is configured to be well-trained with high accuracy to preemptively address any potential issues. S1 is allocated 5 VMs, which is more than the other services in the chain, because it demands significant computational capacity for early-stage video input and preprocessing. Different numbers of VMs are replicated for the various services in the chain to ensure that each service receives the appropriate level of computational resources and can operate optimally. The allocation of VMs and vCPU is based on the model output, specific resource requirements, criticality weights, and resource consumption of each service. However, the allocation of VMs is dynamic in the experiments, meaning that replication can be changed when overloading congestion is detected in other services.

Table 2 gives the detailed deployment properties of each use case in the experiment, including bandwidth (Mbps), vCPU, RAM, and replication VMs. The modular architecture with multi-purpose VNFs is designed to serve various purposes within the SFC process by globalizing the service objectives and expecting to be chained in a flexible and scalable way (using the LS-SFC-GNN output).

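Table 2. Deployment properties of each use case with its proposed ISFs.

Use Case | ISF | Bandwidth (Mbps) | vCPU | RAM (GB) | Replication VMs
Smart Emergency Services | Emergency call or data handling | 100 | 4 | 8 | 3
Smart Emergency Services | Action modelling and recommendation | 50 | 2 | 4 | 5
Smart Emergency Services | Resource dispatch | 80 | 3 | 6 | 3
Smart Emergency Services | Emergency response coordination | 120 | 5 | 10 | 2
Smart Surveillance | Object Detection | 150 | 6 | 12 | 5
Smart Surveillance | Video Streaming | 200 | 8 | 16 | 3
Smart Surveillance | Anomaly Detection | 100 | 4 | 8 | 2
Smart Surveillance | Facial Recognition | 80 | 3 | 6 | 2
Smart PIDs | API for data aggregation | 100 | 4 | 8 | 3
Smart PIDs | Content scheduling | 50 | 2 | 4 | 4
Smart PIDs | Real-time data feeds | 120 | 5 | 10 | 3
Smart PIDs | Display analytics | 80 | 3 | 6 | 4
Smart Infrastructure Control | API for data aggregation | 120 | 6 | 12 | 2
Smart Infrastructure Control | Real-time analytics and detection | 150 | 6 | 12 | 5
Smart Infrastructure Control | Command settings | 80 | 3 | 6 | 3
Smart Infrastructure Control | Predictive maintenance | 120 | 5 | 10 | 2
Smart Transportation | Vehicle state gathering | 100 | 4 | 8 | 5
Smart Transportation | Traffic prediction | 100 | 4 | 8 | 2
Smart Transportation | Route optimization | 120 | 5 | 10 | 3
Smart Transportation | Policy-making | 80 | 3 | 6 | 2
Smart Waste Management | API for data aggregation | 100 | 4 | 8 | 3
Smart Waste Management | Waste collection scheduling | 60 | 3 | 6 | 4
Smart Waste Management | Driving route planning | 100 | 5 | 10 | 3
Smart Waste Management | Environment impact evaluation | 70 | 4 | 8 | 3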
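As a small worked example on the Table 2 values, the snippet below totals the resources of the smart surveillance chain. It assumes that the vCPU and RAM columns are per-VM figures, which the table does not state explicitly; the names `SURVEILLANCE` and `totals` are illustrative.

```python
# (ISF, bandwidth Mbps, vCPU, RAM GB, replication VMs), copied from Table 2
SURVEILLANCE = [
    ("Object Detection", 150, 6, 12, 5),
    ("Video Streaming", 200, 8, 16, 3),
    ("Anomaly Detection", 100, 4, 8, 2),
    ("Facial Recognition", 80, 3, 6, 2),
]


def totals(rows: list) -> dict:
    """Aggregate chain-wide demand, assuming vCPU and RAM are per VM."""
    return {
        "vms": sum(r[4] for r in rows),
        "vcpu": sum(r[2] * r[4] for r in rows),
        "ram_gb": sum(r[3] * r[4] for r in rows),
    }
```

Under that per-VM assumption, the surveillance chain needs 12 VMs, 68 vCPUs, and 136 GB of RAM, with object detection alone accounting for 30 of the vCPUs.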

The high-performance hosting infrastructure listed in Table 3 is used for splitting the computational demands of the large-scale smart city simulation (one experiment runtime, one use case). The maximum tolerable delay, ranging from 5 ms to 15 ms per ISF (post-train), indicates that the data points flow through the well-trained model with converged accuracies. This assumption follows the real-time data processing and delivery needed to meet the stringent requirements of 4 VNFs per smart city SFC. The simulation models a wide range of SFC request rates, which vary from 100 to 1000 requests per second. The delay on links is constrained to a maximum of 2 ms. Within 2000 timeslots, 5 different congestion levels are configured to input the high rates of requests and generate large-scale congestion to answer the research questions. PyTorch is used for building GNN models, and further hyperparameters of GNN, such as learning rate, batch size, number of epochs, dropout rate, and activation function, are set to 0.01, 64, 1000, 0.3, and ReLU-Sigmoid, respectively.
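The stated bounds allow a rough back-of-the-envelope estimate of the processing and link delay of one chain. The sketch below is pure arithmetic on those bounds, not a result from the study; in particular, the count of 5 links (ingress, three inter-VNF hops, egress) and a best-case link delay of zero are assumptions.

```python
def chain_delay_bounds(n_vnfs: int = 4, isf_min_ms: float = 5.0,
                       isf_max_ms: float = 15.0, link_max_ms: float = 2.0,
                       n_links: int = 5) -> tuple:
    """Return (best, worst) delay in ms for one service chain.

    best: every ISF at its minimum delay and negligible link delay.
    worst: every ISF and every link at its maximum delay.
    """
    best = n_vnfs * isf_min_ms
    worst = n_vnfs * isf_max_ms + n_links * link_max_ms
    return best, worst


def within_budget(budget_ms: float, **kwargs) -> bool:
    """Check the worst-case chain delay against a delay budget."""
    _, worst = chain_delay_bounds(**kwargs)
    return worst <= budget_ms
```

With the defaults, the range is 20 ms to 70 ms, so a chain stays inside a 100 ms budget even in the worst case but would need ISFs running faster than their 15 ms ceiling to fit a 50 ms budget.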

Table 3. Key simulation parameters.

Parameter | Specifications
Hosting infrastructure | Intel(R) Xeon(R) Silver 4280 CPU @ 2.10 GHz, 128 GB, NVIDIA Quadro RTX 4000 GPU
Maximum tolerable delay per ISF (post-train) | 5 ms to 15 ms
SFC request rate | 100/s to 1000/s
Number of VNFs in a single chain | 4
Delay on links | ≤ 2 ms
Simulation timeslot | 2000 (5 congestion levels)
GNN platform | Python (PyTorch)
Learning rate | 0.01
Batch size | 64
Number of epochs | 1000
Dropout rate | 0.3
Activation function | ReLU and Sigmoid

The experiment results are captured under this setting. The simulation leverages the GNN-based approach with the parameters and specifications outlined in Table 3. For further discussion of the results on this platform setup, see https://doi.org/10.3390/electronics12194018.

 
