Large-Scale Service Function Chaining in Smart City

Please note this is a comparison between Version 2 by Rita Xu and Version 1 by Prohim Tam.

Smart cities leverage the Internet of Things (IoT) to collect data from various sources and employ data-driven approaches to improve management, evaluation, and decision-making processes. From a core network perspective, service function chaining (SFC) is an enabling paradigm for elastically controlling the massive network services (NS) from IoT-empowered smart cities. SFC can effectively enforce policies and regulations set by city authorities, including data retention policies, content filtering, and compliance checks. Moreover, SFC optimizes service delivery, resource efficiency, quality of experience (QoE)/quality of service (QoS), and service-specific routing. To realize these benefits, mobile network operators need solutions that reflect SFC orchestration policies while ensuring efficient resource utilization and preserving QoS under large-scale network congestion.

graph neural networks
quality of service
service function chaining
smart city
virtual network functions

1. IoT Service Composition and QoS Class Identifier (QCI) for Smart City Applications

IoT signifies the advancement of smart cities by offering technical functionalities, such as sensor installation, information exchange protocols, and massive stream data, to enable the integration of smart technologies, industries, and management [1]. On top of these functionalities, IoT service composition in smart cities involves combining multiple services from various data sources to develop impactful policies. Existing service composition mechanisms were initially designed for static enterprise services and therefore lack the ability to address the scalability challenges posed by IoT systems [2]. Semantic web service composition leverages semantic descriptions to make discovering and composing IoT services more efficient and effective, which addresses the challenges of heterogeneity and dynamism [3]. Furthermore, machine learning and deep (reinforcement) learning offer various solutions that can streamline IoT service composition by discovering services from diverse sources, selecting them based on criteria such as functionality and QoS, and automating the composition process (e.g., deep reinforcement learning for moving IoT services [4], genetic algorithms for QoS-based composition [5], and machine learning-driven QoS-aware service composition [6]). Key QoS factors to analyze for service compositions include availability, response time, scalability, cost, and reliability, which ensure that the composed services are responsive, cost-effective, and flexible for end users and urban development initiatives [7]. QCI defines the characteristics and requirements of different traffic types, which ensures that the network controller can effectively prioritize data flows based on specific demands. While there is no specific standard for labeling smart city use cases, certain QCIs from 3GPP TS 23.203 V12.2.0 [8] can be used to represent the relevant example services.
Table 1 presents the QCI index, resource type, priority level, packet delay budget (PDB), and packet error loss rate (PELR) for smart city examples. In terms of resource types, guaranteed bit rate (GBR) assures a minimum bandwidth to end users even if the network is congested. In contrast, non-GBR provides end users with best-effort service, with no guarantee that the requested bandwidth is always available. PDB and PELR represent upper-bound thresholds, namely the maximum tolerable delay and packet loss between the end user and the policy and charging enforcement function. Each QCI index is associated with a smart city use case, described with examples as follows:

Table 1. Background studies on standardized QCI-associated smart city examples.

QCI-Index	Resource Type	Priority Level	PDB	PELR	Smart City Examples
QCI 1 (Conversational Voice)	GBR	2	100 ms	10⁻²	Smart Emergency Services
QCI 2 (Conversational Video)	GBR	3	150 ms	10⁻³	Smart Surveillance
QCI 4 (Buffered Streaming)	GBR	5	300 ms	10⁻⁶	Smart PIDs
QCI 70 (Mission Critical Data)	Non-GBR	5.5	200 ms	10⁻⁶	Smart Infrastructure Control
QCI 79 (V2X messages)	Non-GBR	6.5	50 ms	10⁻²	Smart Transportation
QCI 9 (Background)	Non-GBR	9	300 ms	10⁻⁶	Smart Waste Management

QCI 1 (conversational voice) is typically used for voice communication services, including smart emergency services (e.g., emergency response systems, public safety networks, or other government services).
QCI 2 (conversational video) is designed for real-time video surveillance systems in traffic monitoring, object detection, facial recognition, or other visual monitoring applications.
QCI 4 (buffered streaming) is suitable for smart public information displays (PIDs) such as real-time information and interactive features to engage and inform the public (e.g., digital signage, message boards, or even live event streaming).
QCI 70 (mission critical data) is designed for services that require low latency and high reliability, including smart applications to control critical infrastructure in the city (e.g., utility grid management, building automation, or electricity distribution).
QCI 79 (V2X messages) is intended for modern urban mobility, gathering vehicle-to-vehicle, vehicle-to-infrastructure, vehicle-to-pedestrian, and vehicle-to-network information to develop a smart transportation system. With guaranteed performance indicators, smart transportation offers traffic flow optimization and safety enhancement in the city.
QCI 9 (background) is intended for applications that have lower priority and can tolerate delay, including sensor data collection, environmental monitoring, waste management, or other non-real-time data flows.

Table 1 provides a background for specifying smart city use cases and their required metrics, in accordance with the standard that network providers and operator policies should follow. With all these services activated, urban life becomes more efficient, sustainable, and livable.
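
As a minimal sketch of how a controller could consume Table 1, the mapping below stores each QCI profile and derives a serving order from the priority level. The names `QciProfile`, `QCI_TABLE`, `scheduling_order`, and `within_budget` are hypothetical and not part of the original study; the numeric values are copied from Table 1, and a lower priority level is treated as more urgent, as in the 3GPP QCI convention.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class QciProfile:
    qci: int
    traffic_type: str
    resource_type: str  # "GBR" or "Non-GBR"
    priority: float     # lower value = served first
    pdb_ms: int         # packet delay budget in milliseconds
    pelr: float         # packet error loss rate (upper bound)
    example: str        # smart city example from Table 1


# Values copied from Table 1 (hypothetical container, for illustration only).
QCI_TABLE: Dict[int, QciProfile] = {
    1: QciProfile(1, "Conversational Voice", "GBR", 2, 100, 1e-2, "Smart Emergency Services"),
    2: QciProfile(2, "Conversational Video", "GBR", 3, 150, 1e-3, "Smart Surveillance"),
    4: QciProfile(4, "Buffered Streaming", "GBR", 5, 300, 1e-6, "Smart PIDs"),
    70: QciProfile(70, "Mission Critical Data", "Non-GBR", 5.5, 200, 1e-6, "Smart Infrastructure Control"),
    79: QciProfile(79, "V2X messages", "Non-GBR", 6.5, 50, 1e-2, "Smart Transportation"),
    9: QciProfile(9, "Background", "Non-GBR", 9, 300, 1e-6, "Smart Waste Management"),
}


def scheduling_order(table: Dict[int, QciProfile]) -> List[QciProfile]:
    """Sort profiles so that the most urgent priority level comes first."""
    return sorted(table.values(), key=lambda p: p.priority)


def within_budget(qci: int, delay_ms: float, loss_rate: float) -> bool:
    """Check a measured flow against the PDB and PELR upper bounds of its QCI."""
    profile = QCI_TABLE[qci]
    return delay_ms <= profile.pdb_ms and loss_rate <= profile.pelr
```

For instance, `within_budget(79, 40, 0.005)` holds, whereas a V2X flow measured at 60 ms exceeds its 50 ms PDB.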

2. Related Works on GNN-Based Optimization

Deep learning integration has been used for various aspects of network optimization [9,10] and in the context of large-scale SFC management in smart cities [11,12]. GNN is a class of deep learning models designed for analyzing and extracting hidden features from graph-structured data. GNN operates by propagating information across nodes and edges in a graph, which captures key relationships and dependencies within the data [13,14]. From a network management and orchestration (MANO) perspective, GNN can generate representations from graph-based SFC data and predict the chain performance, which can be used for various tasks such as load balancing and congestion control [15]. In [16], a new neural network architecture for SFC based on GNN was proposed. The encoder-decoder architecture produced representations of the network topology and estimated the probabilities of neighbor nodes and VNF processing. The GNN-based model outperformed the baseline deep neural network model and was flexible to topology changes. In [17], a knowledge-defined networking system was proposed to predict the optimal path for SFC deployment and traffic steering using GNN. A GNN-based model, RouteNet [18], was used to extract hidden information on network topology, routing, and traffic metrics for predicting the delay and loss ratio from source to destination.

3. Working Flow of Large-Scale SFC with GNN

The orchestration policy consists of two primary objectives, namely QoS guarantee and efficient VNF backup, for ensuring high availability and fault tolerance under large-scale, highly congested SFC request rates. The system model prioritizes each smart city service by criticality, following the upper-bound delay and remaining resources. Researchers focus on GNN node classification that detects the efficiency of VNF instances following the service criticality. To avoid single-server failure, when the duplicating decision is set, the proposed large-scale SFC with GNN (LS-SFC-GNN) spreads the VM-VNF placement across different physical nodes. Each physical node with a set of VNF-f placement has an assigned feature vector. In the use case of large-scale SFC for smart city applications, the features contain 6-tuple information as follows: (1) node indicator and the output decision variable in the initial timeslot, (2) resource capacity, (3) expected latency, (4) current loading, (5) operating status (whether the VNF node is currently operational or on standby), and (6) service upper-bound requirement. The input features are used to create feature vectors for each VNF node in the SFC graph.
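
The 6-tuple above could be encoded along the following lines. This is a sketch under stated assumptions rather than the encoding used in the study: the class `VnfNodeFeatures`, its field names, and the units are hypothetical, and because item (1) carries two values (node indicator and initial decision variable), the flattened vector here has seven numeric entries.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VnfNodeFeatures:
    node_id: int                # (1) node indicator
    initial_decision: int       # (1) output decision variable in the initial timeslot
    resource_capacity: float    # (2) e.g., available vCPUs (unit is an assumption)
    expected_latency_ms: float  # (3)
    current_load: float         # (4) e.g., utilization in [0, 1] (assumption)
    operational: bool           # (5) True = operational, False = standby
    upper_bound_ms: float       # (6) service upper-bound requirement

    def to_vector(self) -> List[float]:
        """Flatten the features into a numeric vector for the GNN input."""
        return [
            float(self.node_id),
            float(self.initial_decision),
            self.resource_capacity,
            self.expected_latency_ms,
            self.current_load,
            1.0 if self.operational else 0.0,
            self.upper_bound_ms,
        ]


def build_feature_matrix(nodes: List[VnfNodeFeatures]) -> List[List[float]]:
    """Stack one feature vector per VNF node of the SFC graph."""
    return [n.to_vector() for n in nodes]
```

In practice the raw values would likely be normalized before training, which is omitted here.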

Once the feature vectors are obtained, message aggregation is executed. For each node and VNF, messages from the neighboring nodes and sequential instances are processed jointly. Researchers use 3 different aggregation methods for the 3 following conditions: (1) if all neighboring nodes are related to the current service chain, so that the proposed system can capture the cumulative impact of neighboring nodes on the feature representation of the central node, (2) if all neighboring nodes are balanced for future duplications, and (3) if there are different link bandwidth capacities whose bias values guide the next VNF instances toward duplication on a different physical node. After the aggregated message is obtained, the algorithm combines it with the current feature using the update function to obtain the hidden features. The update function can be a neural network layer or a sequence of layers.
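
The text does not name the three aggregation methods, so the sketch below picks three common choices purely as assumptions: a sum for the cumulative-impact case, a mean for the balanced case, and a bandwidth-weighted sum for the case of unequal link capacities. The `update` function is likewise a toy stand-in (a ReLU over a fixed weighted combination) for what would be a learned neural network layer.

```python
from typing import List, Sequence

Vector = List[float]


def aggregate_sum(messages: Sequence[Vector]) -> Vector:
    """Assumed for condition (1): accumulate the impact of all neighbors."""
    return [sum(col) for col in zip(*messages)]


def aggregate_mean(messages: Sequence[Vector]) -> Vector:
    """Assumed for condition (2): treat neighbors as equally balanced."""
    n = len(messages)
    return [sum(col) / n for col in zip(*messages)]


def aggregate_weighted(messages: Sequence[Vector], link_bandwidth: Sequence[float]) -> Vector:
    """Assumed for condition (3): bias neighbors by their link bandwidth share."""
    total = sum(link_bandwidth)
    weights = [b / total for b in link_bandwidth]
    return [sum(w * x for w, x in zip(weights, col)) for col in zip(*messages)]


def update(current: Vector, aggregated: Vector,
           w_self: float = 0.5, w_neigh: float = 0.5) -> Vector:
    """Toy update: ReLU of a fixed combination of own and aggregated features."""
    return [max(0.0, w_self * c + w_neigh * a) for c, a in zip(current, aggregated)]
```

With two neighbor messages `[1.0, 2.0]` and `[3.0, 4.0]`, the sum yields `[4.0, 6.0]`, the mean `[2.0, 3.0]`, and link bandwidths of 100 and 300 shift the weighted result toward the second neighbor.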

The integrated GNN consists of multiple layers of message passing. In each layer, the nodes exchange information with their neighbors and update their features. The number of layers is reflected in the message aggregation and update. After all layers of message passing, the final feature vectors represent the nodes in the graph and capture information from the local and global neighborhoods. Researchers then apply a classification head to the final node representations to predict the class probabilities of the decision variables. The objective is to indicate whether the node requires duplication or is still efficient in the current timeslot. The GNN module is executed iteratively to minimize the loss using backpropagation and gradient descent. Researchers compute the gradients for the model parameters, which can be learned weights in the aggregation and update functions.
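
To illustrate how stacking layers widens the neighborhood a node can see, the following sketch runs a forward pass on a three-node chain with fixed, hand-picked weights. It is not the trained LS-SFC-GNN model: the mean aggregator, the weights, and the names `message_passing_layer`, `forward`, and `classify` are assumptions, and training by backpropagation is omitted.

```python
import math
from typing import Dict, List

Features = Dict[str, List[float]]


def relu(x: float) -> float:
    return max(0.0, x)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def message_passing_layer(h: Features, neighbors: Dict[str, List[str]],
                          w_self: float, w_neigh: float) -> Features:
    """One layer: mean-aggregate neighbor features, then apply a ReLU update."""
    out: Features = {}
    for node, feat in h.items():
        nbrs = neighbors.get(node, [])
        agg = [0.0] * len(feat)
        if nbrs:
            agg = [sum(h[n][i] for n in nbrs) / len(nbrs) for i in range(len(feat))]
        out[node] = [relu(w_self * x + w_neigh * a) for x, a in zip(feat, agg)]
    return out


def forward(h: Features, neighbors: Dict[str, List[str]], num_layers: int,
            w_self: float = 0.5, w_neigh: float = 0.5) -> Features:
    """Apply num_layers rounds of message passing."""
    for _ in range(num_layers):
        h = message_passing_layer(h, neighbors, w_self, w_neigh)
    return h


def classify(h: Features, head_weights: List[float], bias: float = 0.0) -> Dict[str, float]:
    """Classification head: a sigmoid over a linear readout of each node."""
    return {n: sigmoid(sum(w * x for w, x in zip(head_weights, f)) + bias)
            for n, f in h.items()}


# Toy chain a - b - c; only node "a" starts with a non-zero feature.
NEIGHBORS = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
FEATURES = {"a": [1.0], "b": [0.0], "c": [0.0]}
```

After one layer, node `c` still holds `[0.0]` because `a` is two hops away; after two layers it holds `[0.125]`, showing that the signal from `a` has arrived through `b`.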

After the GNN output is obtained, the policy manages and orchestrates VNF instances while leveraging SDN flow rule installation. Researchers study large-scale SFC request rates; therefore, VNF backup placement and duplication decisions are emphasized under the following orchestration conditions:

If the current loading and operating status weight the class probability toward decision variable output 0, the orchestration policy duplicates that particular VNF on other physical nodes. The allocation decision variable is reconfigured to alter the placement, and the SDN controller installs the flow rule accordingly.
Otherwise, if the total time consumption approximation from the GNN output reflects 0, researchers reconsider the decision variable (allocation) to check for alternative VNF instances that can be matched.
Furthermore, resource allocation properties (CPU, Memory, Disk) are re-adjusted to ensure that the approximation satisfies the QoS constraints.
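
The conditions above can be read as a small decision routine. The sketch below is one possible interpretation, not the authors' implementation: `VnfDecision`, `orchestrate`, and the action strings are hypothetical, output 0 of the classifier is read as "duplicate", output 0 of the delay approximation is read as "QoS bound not met", and the resource re-adjustment is applied in the second branch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VnfDecision:
    vnf: str
    host: str
    efficiency_output: int  # GNN class output; 0 is read here as "duplicate"
    delay_output: int       # GNN delay approximation; 0 is read here as "QoS not met"


def pick_other_host(current: str, hosts: List[str]) -> Optional[str]:
    """Choose a different physical node to avoid single-server failure."""
    return next((h for h in hosts if h != current), None)


def orchestrate(decision: VnfDecision, hosts: List[str]) -> List[str]:
    """Translate GNN outputs into orchestration actions (illustrative strings)."""
    actions: List[str] = []
    if decision.efficiency_output == 0:
        target = pick_other_host(decision.host, hosts)
        if target is not None:
            actions.append(f"duplicate {decision.vnf} on {target}")
            actions.append(f"install flow rule towards {decision.vnf}@{target}")
    elif decision.delay_output == 0:
        actions.append(f"select alternative instance for {decision.vnf}")
        actions.append(f"re-adjust CPU/memory/disk for {decision.vnf}")
    return actions
```

A VNF that is classified as efficient and meets its delay bound produces no action, so the existing placement and flow rules stay untouched.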

4. Experiment Settings

This section presents the simulation architecture with multi-purpose VNFs and chaining isolation to support the SFC orchestration policy of offering alternative VNF instances and duplicating them on different VM resource blocks. Figure 1 represents the ingress data sources and egress end-user interfaces of each smart city service. Researchers propose an intelligent service function (ISF) to perform modern services; however, due to the lack of real-world open-source data on smart city SFC, researchers approximate the execution delay of each ISF by following the maximum computing-time threshold of each QCI class. Each VM placement is determined based on the criticality and resource consumption of each service, corresponding to the nodes in the experiment.

Figure 1. Use cases of smart city applications from ingress data sources to the egress end-user interfaces, namely (a) smart emergency services, (b) smart surveillance, (c) smart PIDs, (d) smart infrastructure control, (e) smart transportation, and (f) smart waste management.

For example, the smart surveillance case begins with live video camera streaming that feeds service S1, namely video analytics (object detection). The primary focus of this study is on scaling SFC; therefore, the intelligent model responsible for executing S1 is configured to be well trained with high accuracy to preemptively address any potential issues. S1 is allocated 5 VMs, which is more than the other services in the chain, because it demands significant computational capacity for early-stage video input and preprocessing. Researchers replicated different numbers of VMs for the various services in the chain to ensure that each service receives the appropriate level of computational resources and can operate optimally. The allocation of VMs and vCPUs is based on the model output, specific resource requirements, criticality weights, and resource consumption of each service. However, the allocation of VMs is dynamic in the experiments, meaning that replication can be changed when overload congestion is detected in other services.

Table 2 gives the detailed deployment properties of each use case in the experiment, including bandwidth (Mbps), vCPU, RAM, and replication VMs. The modular architecture with multi-purpose VNFs is designed to serve various purposes within the SFC process by globalizing the service objectives, and the VNFs are expected to be chained in a flexible and scalable way (using the LS-SFC-GNN output).

Table 2. Deployment properties of each use case with its proposed ISFs.

Use Case	ISF	Bandwidth (Mbps)	vCPU	RAM (GB)	Replication VMs
Smart Emergency Services	Emergency call or data handling	100	4	8	3
	Action modelling and recommendation	50	2	4	5
	Resource dispatch	80	3	6	3
	Emergency response coordination	120	5	10	2
Smart Surveillance	Object Detection	150	6	12	5
	Video Streaming	200	8	16	3
	Anomaly Detection	100	4	8	2
	Facial Recognition	80	3	6	2
Smart PIDs	API for data aggregation	100	4	8	3
	Content scheduling	50	2	4	4
	Real-time data feeds	120	5	10	3
	Display analytics	80	3	6	4
Smart Infrastructure Control	API for data aggregation	120	6	12	2
	Real-time analytics and detection	150	6	12	5
	Command settings	80	3	6	3
	Predictive maintenance	120	5	10	2
Smart Transportation	Vehicle state gathering	100	4	8	5
	Traffic prediction	100	4	8	2
	Route optimization	120	5	10	3
	Policy-making	80	3	6	2
Smart Waste Management	API for data aggregation	100	4	8	3
	Waste collection scheduling	60	3	6	4
	Driving route planning	100	5	10	3
	Environment impact evaluation	70	4	8	3
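
As a worked example of reading Table 2, the snippet below totals the demand of the smart surveillance chain. It assumes that every replica VM is provisioned with the full per-ISF vCPU and RAM listed in the table, which the source does not state explicitly; `SMART_SURVEILLANCE` and `aggregate_demand` are hypothetical names.

```python
from typing import Dict, List, Tuple

# (ISF, bandwidth in Mbps, vCPU, RAM in GB, replication VMs), copied from Table 2.
SMART_SURVEILLANCE: List[Tuple[str, int, int, int, int]] = [
    ("Object Detection", 150, 6, 12, 5),
    ("Video Streaming", 200, 8, 16, 3),
    ("Anomaly Detection", 100, 4, 8, 2),
    ("Facial Recognition", 80, 3, 6, 2),
]


def aggregate_demand(isfs: List[Tuple[str, int, int, int, int]]) -> Dict[str, int]:
    """Total VMs, vCPUs, and RAM, assuming each replica gets the full allocation."""
    return {
        "vms": sum(r for *_, r in isfs),
        "vcpu": sum(c * r for _, _, c, _, r in isfs),
        "ram_gb": sum(m * r for _, _, _, m, r in isfs),
    }
```

Under this assumption the chain occupies 12 VMs, 68 vCPUs (6·5 + 8·3 + 4·2 + 3·2), and 136 GB of RAM.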

The high-performance hosting infrastructure listed in Table 3 is used to split the computational demands of the large-scale smart city simulation (one experiment runtime, one use case). The maximum tolerable delay, ranging from 5 ms to 15 ms per ISF (post-train), indicates that data points flow through the well-trained model with converged accuracies. This assumption follows real-time data processing and delivery to meet the stringent requirements of four VNFs per smart city SFC. The simulation models a wide range of SFC request rates, from 100 to 1000 requests per second. The delay on links is constrained to a maximum of 2 ms. Within 2000 timeslots, 5 different congestion levels are configured to inject high request rates and generate large-scale congestion to answer the research questions. PyTorch is used for building the GNN models, and the GNN hyperparameters, namely learning rate, batch size, number of epochs, dropout rate, and activation function, are set to 0.01, 64, 1000, 0.3, and ReLU-Sigmoid, respectively.

Table 3. Key simulation parameters.

Parameter	Specifications
Hosting infrastructure	Intel(R) Xeon(R) Silver 4280 CPU @ 2.10 GHz, 128 GB, NVIDIA Quadro RTX 4000 GPU
Maximum tolerable delay per ISF (post-train)	5 ms to 15 ms
SFC request rate	100/s to 1000/s
Number of VNFs in a single chain	4
Delay on links	≤ 2 ms
Simulation timeslot	2000 (5 congestion levels)
GNN platform	Python (PyTorch)
Learning Rate	0.01
Batch Size	64
Number of Epochs	1000
Dropout Rate	0.3
Activation Function	ReLU and Sigmoid

This setting is followed to capture the experimental results. The simulation leverages the GNN-based approach with the parameters and specifications outlined in Table 3. For further discussion of the results on this platform setup, see https://doi.org/10.3390/electronics12194018.
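
The parameters of Table 3 can be gathered into a single configuration object, as sketched below. `SimulationConfig` and `chain_delay_bounds_ms` are hypothetical names, and the delay estimate rests on simplifying assumptions that are not in the source: ISFs are processed sequentially, only the three links between the four VNFs are counted, and queueing delay is ignored.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SimulationConfig:
    isf_delay_ms: Tuple[int, int] = (5, 15)           # per-ISF delay range (post-train)
    request_rate_per_s: Tuple[int, int] = (100, 1000)
    vnfs_per_chain: int = 4
    max_link_delay_ms: float = 2.0
    congestion_levels: int = 5
    learning_rate: float = 0.01
    batch_size: int = 64
    epochs: int = 1000
    dropout: float = 0.3
    activations: Tuple[str, str] = ("ReLU", "Sigmoid")


def chain_delay_bounds_ms(cfg: SimulationConfig,
                          links: Optional[int] = None) -> Tuple[float, float]:
    """Rough best/worst processing-plus-link delay of one chain (assumptions above)."""
    n_links = cfg.vnfs_per_chain - 1 if links is None else links
    best = cfg.vnfs_per_chain * cfg.isf_delay_ms[0]  # link delay may approach 0
    worst = cfg.vnfs_per_chain * cfg.isf_delay_ms[1] + n_links * cfg.max_link_delay_ms
    return best, worst
```

With the default values, the estimate spans 4 × 5 = 20 ms to 4 × 15 + 3 × 2 = 66 ms per chain; counting additional ingress and egress links would raise the upper figure accordingly.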

References

Kim, T.-H.; Ramos, C.; Mohammed, S. Smart City and IoT. Future Gen. Comput. Syst. 2017, 76, 159–162.
Arellanes, D.; Lau, K.-K. Evaluating IoT Service Composition Mechanisms for the Scalability of IoT Systems. Future Gen. Comput. Syst. 2020, 108, 827–848.
Rhayem, A.; Mhiri, M.B.A.; Gargouri, F. Semantic Web Technologies for the Internet of Things: Systematic Literature Review. Internet Things 2020, 11, 100206.
Neiat, A.; Bouguettaya, A.; Bahutair, M. A Deep Reinforcement Learning Approach for Composing Moving IoT Services. IEEE Trans. Serv. Comput. 2022, 15, 2538–2550.
AllamehAmiri, M.; Derhami, V.; Ghasemzadeh, M. QoS-Based Web Service Composition Based on Genetic Algorithm. J. AI Data Min. 2013, 1, 63–73.
De Sanctis, M.; Muccini, H.; Vaidhyanathan, K. Data-driven Adaptation in Microservice-based IoT Architectures. In Proceedings of the IEEE International Conference on Software Architecture Companion (ICSA-C) 2020, Salvador, Brazil, 16–20 March 2020; pp. 59–62.
Asghari, P.; Amir, M.R.; Hamid, H.S.J. Service Composition Approaches in IoT: A Systematic Review. J. Netw. Comput. Appl. 2018, 120, 61–77.
3GPP TS 23.203 V17.2.0; Technical Specification Group Services and System Aspects; Policy and Charging Control Architecture. ETSI: Valbonne, France, 2021.
Khan, M.T.; Adholiya, A. Machine Learning-Based Application for Predicting 5G/B5G Service. In Proceedings of the 2023 13th International Conference on Cloud Computing, Data Science & Engineering (Confluence), Noida, India, 19–20 January 2023.
Zou, D.; Sun, G.; Li, Z.; Xi, G.; Wang, L. Incremental Strategy-based Residual Regression Networks for Node Localization in Wireless Sensor Networks. KSII Trans. Internet Inf. Syst. 2022, 16, 2627–2647.
Zhou, F.; Yu, P.; Feng, L.; Qiu, X.; Wang, Z.; Meng, L.; Kadoch, M.; Gong, L.; Yao, X. Automatic Network Slicing for IoT in Smart City. IEEE Wirel. Commun. 2020, 27, 108–115.
Kirimtat, A.; Krejcar, O.; Kertesz, A.; Tasgetiren, M.F. Future Trends and Current State of Smart City Concepts: A Survey. IEEE Access 2020, 8, 86448–86467.
Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. arXiv 2017, arXiv:1609.02907.
Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P.S. A Comprehensive Survey on Graph Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 4–24.
Tam, P.; Song, I.; Kang, S.; Ros, S.; Kim, S. Graph Neural Networks for Intelligent Modelling in Network Management and Orchestration: A Survey on Communications. Electronics 2022, 11, 3371.
Heo, D.; Lange, S.; Kim, H.-G.; Choi, H. Graph Neural Network Based Service Function Chaining for Automatic Network Control. In Proceedings of the 2020 21st Asia-Pacific Network Operations and Management Symposium (APNOMS), Daegu, Republic of Korea, 22–25 September 2020.
Rafiq, A.; Khan, T.A.; Afaq, M.; Song, W.-C. Service Function Chaining and Traffic Steering in SDN Using Graph Neural Network. In Proceedings of the 2020 International Conference on Information and Communication Technology Convergence (ICTC), Jeju, Republic of Korea, 21–23 October 2020.
Rusek, K.; Suarez-Varela, J.; Almasan, P.; Barlet-Ros, P.; Cabellos-Aparicio, A. RouteNet: Leveraging Graph Neural Networks for Network Modeling and Optimization in SDN. IEEE J. Sel. Areas Commun. 2020, 38, 2260–2270.