DRL-Based Load-Balancing Routing Scheme for 6G

Please note this is a comparison between Version 2 by Jiaxin Song and Version 4 by Camila Xu.

With the rapid development of the space-air-ground integrated network (SAGIN), satellite communication systems, which offer wide coverage and place few demands on the geographical environment, are gradually becoming a key competing technology for 6G. Low Earth orbit (LEO) satellite networks feature low transmission delay, low propagation loss, and global coverage, and have become a main research focus in contemporary satellite communication.

low earth orbit
satellite routing algorithm
deep reinforcement learning

1. Introduction

In recent years, the space-air-ground integrated network (SAGIN) has developed rapidly, and satellite communications are a key link in the 6G SAGIN. They can make up for the shortcomings of 5G terrestrial networks, improve network coverage, and ensure the fault tolerance of the system. They can also be combined with artificial intelligence, big data, the Internet of Things, and other technologies to provide users with diversified services. Satellite communications cover a larger area and are more globally adaptable than traditional terrestrial networks, and they are gradually becoming a main competing technology for the next generation of communications. Low Earth orbit (LEO) constellations play a central role in satellite communications. Compared with geosynchronous and medium Earth orbit constellations, LEO constellations stand out for their low transmission delay, low propagation loss, and ability to cover the globe. These features make LEO an attractive choice for a variety of applications, including internet services, global positioning, and remote sensing. Low delay and low loss suit time-sensitive applications such as real-time communications, while global coverage suits applications that require connectivity in remote or hard-to-reach locations. LEO has therefore attracted great attention in recent years, leading to new technologies and algorithms that improve its performance and efficiency. As SAGIN develops, traditional terrestrial communication networks alone can no longer meet future demands, and LEO satellite communication has become a promising direction.

In a low-Earth orbit satellite network, inter-satellite links (ISLs) ensure communication between satellites. Compared to terrestrial communication networks, LEO satellite networks change their topology more frequently, have longer inter-satellite link delays, and have more frequently changing link states in multi-user areas. Due to high-speed dynamic changes, the cost of traditional path selection methods increases significantly. Therefore, routing protocols for terrestrial applications are difficult to use directly in LEO satellite networks. LEO satellite routing technology is also a supporting technology for the integration of 6G SAGIN remote sensing, communication and computing. Therefore, it is necessary to study routing algorithms in low-orbit satellite networks.

Most of the existing satellite routing algorithms are developed based on terrestrial network routing algorithms. Most of these algorithms are based on the shortest path. Due to the difference in satellite density between high and low latitudes and the difference in user distribution density [1], the load difference between satellites of the same constellation is large. In addition, with the high-speed movement of satellites, the high-load coverage area between satellites also changes rapidly. Therefore, traditional routing algorithms have encountered difficulties in meeting the current development of satellite networks.

Deep learning offers strong perception, reinforcement learning offers strong decision-making, and deep reinforcement learning (DRL) combines the two. In DRL, an agent makes decisions by interacting with the environment and obtains feedback through trial and error, learning to maximize rewards and minimize penalties [2]. Because of this combined perception and decision-making ability, more and more scholars apply DRL to computer vision [3,4], speech recognition, autonomous driving [4,5], and other fields. DRL is also applicable to LEO satellite networks: an agent can perceive topology changes, load changes, and network parameters such as latency and bandwidth, and can make the decisions that best match network service needs.

2. Low Earth orbit satellite routing

LEO satellite constellations can be divided into two categories: the Walker Delta constellation, based on inclined orbits, and the Walker Star constellation, based on polar orbits. As shown in Figure 1, the Iridium constellation [9] is a typical polar-orbit constellation that many scholars have studied as a LEO constellation because of its representative structure and easy-to-construct mathematical model.

Figure 1

Illustration of the Iridium constellation.

The orbits and topology of the Iridium constellation are shown in Figure 2. Because the satellites move and their connectivity changes, the topology changes rapidly. Satellites in counter-rotating orbits cannot establish communication links with each other. In addition, when a satellite passes over a pole, its communication links change. Because of the challenges brought by this dynamic topology, satellite network routing algorithms have attracted a lot of research interest. Work in this field falls mainly into two types: centralized routing algorithms and distributed routing algorithms.

Figure 2

Orbit map of the Iridium satellite.

Distributed routing algorithms can adapt to the dynamic scenarios of satellite networks because each satellite determines the next hop from the state of its neighboring satellites, such as remaining bandwidth and queue utilization. When the state of a neighboring satellite changes, the algorithm can sense it quickly and choose a routing strategy that fits the new environment. Drawing on the design ideas of existing terrestrial distributed routing algorithms, the authors of [10] study a routing method for satellite networks. By taking full account of the characteristics of LEO satellites, the use of onboard buffer space is improved. In [11], packets are classified and a corresponding routing method is designed.
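
The neighbor-based decision described above can be illustrated with a minimal Python sketch. It is not the method of any cited work: the `NeighborState` fields, the `choose_next_hop` function, the linear cost, and the weights are all hypothetical choices made only to show how remaining bandwidth and queue utilization could feed a local next-hop decision.

```python
from dataclasses import dataclass


@dataclass
class NeighborState:
    """Locally observable state of one neighboring satellite (hypothetical fields)."""
    name: str
    remaining_bandwidth: float  # fraction of link capacity still free, 0..1
    queue_utilization: float    # fraction of the output queue in use, 0..1
    hops_to_destination: int    # assumed known from the constellation grid


def choose_next_hop(neighbors, w_bw=1.0, w_queue=1.0, w_hops=0.5):
    """Return the neighbor with the lowest cost. The weights are illustrative."""
    def cost(n):
        # More hops and fuller queues raise the cost; free bandwidth lowers it.
        return (w_hops * n.hops_to_destination
                + w_queue * n.queue_utilization
                - w_bw * n.remaining_bandwidth)
    return min(neighbors, key=cost)


neighbors = [
    NeighborState("up", remaining_bandwidth=0.2, queue_utilization=0.9, hops_to_destination=3),
    NeighborState("right", remaining_bandwidth=0.8, queue_utilization=0.1, hops_to_destination=4),
    NeighborState("down", remaining_bandwidth=0.5, queue_utilization=0.5, hops_to_destination=5),
]
best = choose_next_hop(neighbors)
```

In this toy input the neighbor with the fewest hops is heavily loaded, so the lightly loaded neighbor one hop farther away is preferred. Because only neighbor state is consulted, the decision can be recomputed whenever that state changes.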

Unlike distributed routing algorithms, centralized algorithms need global information about the satellite network [12]. A master control node first collects the global information and then computes the routing paths. Once it has the routing results, it transmits the entire routing policy to the other nodes. In [13], the authors design an improved distributed hierarchical routing protocol (DHRP) for satellite networks. The protocol sets up master nodes and candidate nodes, and it achieves better routing performance than traditional discrete relaxation algorithms (DRAs). The authors of [14] propose a hybrid global-local load balancing routing (HGL) algorithm; however, it is ineffective when large-scale traffic flows change suddenly. In [15], the authors propose a probabilistic ISL routing (PIR) algorithm in which communication delay is used to evaluate path selection performance. The algorithm also takes into account the cost of inter-satellite links.
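
As a rough illustration of the centralized workflow (collect the global topology, compute paths, distribute the result), the following Python sketch runs Dijkstra's algorithm over a small graph. The four-node topology, its link costs, and the helper names `shortest_path` and `build_routing_table` are assumptions made for this example and are not taken from the cited protocols.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra over a dict-of-dicts graph {node: {neighbor: link_cost}}."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == target:
            break  # the distance to the target is final once it is popped
        for v, cost in graph.get(u, {}).items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist[target]


def build_routing_table(graph, source):
    """Next hop from `source` to every other reachable node."""
    table = {}
    for target in graph:
        if target == source:
            continue
        path, _ = shortest_path(graph, source, target)
        if path:
            table[target] = path[1]
    return table


# Hypothetical global view held by the master control node.
topology = {
    "A": {"B": 1.0, "C": 1.0},
    "B": {"A": 1.0, "D": 1.0},
    "C": {"A": 1.0, "D": 3.0},
    "D": {"B": 1.0, "C": 3.0},
}
path, cost = shortest_path(topology, "A", "D")
table = build_routing_table(topology, "A")
```

In a centralized scheme the master node would compute a table like this for every satellite and send it out, which is why the global view must be refreshed whenever the topology or link costs change.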

Although the above algorithms have made great progress in adapting to the dynamics of LEO satellites, a major flaw remains: they do not account for satellite load.

3. Load balancing of low-Earth orbit satellites

The length of an inter-satellite link between LEO satellites varies with the latitude of the satellites. Traditional shortest-path routing relies only on path length, which causes traffic to aggregate at higher latitudes [1,16]. Figure 3 shows a 3D schematic of the satellite traffic distribution created with the NS3 network simulation software. Black dots represent LEO satellites, while line segments represent inter-satellite links that carry traffic. The thickness and color of each segment correspond to its traffic and bandwidth utilization, respectively: thicker lines indicate higher traffic, while darker colors indicate higher bandwidth utilization. Notably, congestion in the LEO satellite network occurs mainly over high-latitude and densely populated areas. In addition, the uneven distribution of ground gateway stations creates a load imbalance in the satellite network. User mobility and global population distribution are also key factors affecting the distribution of traffic flow [17]. The high mobility of satellites causes the heavily loaded coverage areas to change rapidly [18,19].

Figure 3

Traffic distribution graph.
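
The latitude dependence of inter-satellite link length noted at the start of this section can be illustrated with simple spherical geometry. The sketch below is an idealization rather than a model from the cited works: it assumes two satellites at the same latitude in adjacent orbital planes whose longitude separation equals the plane spacing, and it uses a 780 km altitude and a 30 degree plane spacing purely as illustrative values. Under those assumptions the link is a chord of length 2·r·cos(latitude)·sin(spacing/2).

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def inter_plane_isl_length_km(latitude_deg, plane_spacing_deg, altitude_km):
    """Chord length between two satellites at the same latitude in adjacent planes.

    Idealized geometry: spherical Earth, equal latitude, and a longitude
    separation equal to the plane spacing.
    """
    r = EARTH_RADIUS_KM + altitude_km
    lat = math.radians(latitude_deg)
    half_spacing = math.radians(plane_spacing_deg) / 2.0
    return 2.0 * r * math.cos(lat) * math.sin(half_spacing)


# Illustrative values only: 780 km altitude, 30 degrees between adjacent planes.
lengths = {lat: inter_plane_isl_length_km(lat, 30.0, 780.0) for lat in (0, 30, 60)}
```

Because the link length scales with the cosine of the latitude, inter-plane links are shortest near the poles. A router that minimizes path length alone therefore tends to cross between planes at high latitudes, which is consistent with the traffic aggregation described above.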

A path-based load-balanced satellite routing algorithm is proposed in [17,20] to minimize the maximum network traffic. The algorithm avoids traffic aggregation at high latitudes by treating all inter-satellite links as having the same path length and giving all paths the same priority. The authors of [21] divide the transmission area into a heavy-load range and a light-load range, taking into account the relationship between the reverse slot and the gateway station. In the heavy-load range, a congestion indicator is combined with least-weight path selection to handle uneven traffic flow distribution. However, this approach requires link-state information for the entire network and cannot make decisions in real time [22]. The elastic load balancing (ELB) algorithm is proposed in [23,24] to exchange congestion information between satellite nodes. It uses queue occupancy to determine whether a satellite node is idle or busy. When a node is marked as busy, it sends messages asking its neighbors to reduce their transmission rate. As a result, ELB achieves its load-balancing goal and avoids traffic congestion. The TLR algorithm proposed in [25] considers both the current congestion state and the likely congestion state of the next hop. In [26], the authors propose an iterative Dijkstra mechanism to select the best transmission path for load-balanced routing. The authors of [27] also consider link latency to further improve routing performance. In [28], cooperative game theory is used in LEO routing to balance the trade-off between load and transmission delay. In [29], fuzzy theory is used to meet the needs of different users. In [30], transmission cost and route convergence are evaluated as key performance indicators, and an on-demand dynamic routing algorithm is proposed in combination with orbit prediction. In [31], energy consumption is also taken into account to improve the quality of service for users.
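
The queue-occupancy rule described for ELB can be sketched in a few lines of Python. This is only an illustration of the idea, not the published algorithm: the single `busy_threshold` of 0.8, the `reduction_factor` of 0.5, and the function names are hypothetical, and the sketch assumes the busy node asks its neighbors to lower the rate at which they send traffic to it.

```python
def classify_node(queue_length, queue_capacity, busy_threshold=0.8):
    """Label a node 'busy' or 'idle' from its queue occupancy ratio."""
    if queue_capacity <= 0:
        raise ValueError("queue_capacity must be positive")
    occupancy = queue_length / queue_capacity
    return "busy" if occupancy >= busy_threshold else "idle"


def requested_rates(node_state, neighbor_rates, reduction_factor=0.5):
    """Rates the node asks each neighbor to use when sending traffic to it.

    An idle node leaves the rates unchanged; a busy node asks every
    neighbor to scale its rate by `reduction_factor` (an assumed value).
    """
    if node_state != "busy":
        return dict(neighbor_rates)
    return {name: rate * reduction_factor for name, rate in neighbor_rates.items()}


state = classify_node(queue_length=90, queue_capacity=100)
new_rates = requested_rates(state, {"north": 10.0, "east": 4.0})
```

A neighbor that receives a reduced rate would then shift the remaining traffic to an alternative next hop, which is how congestion information exchanged between nodes translates into load balancing.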

The existing literature highlights the advantages of LEO routing in terms of load performance [32,33]. However, challenges related to insufficient local optimization and weak dynamic adaptability remain unresolved, which could hinder the development of LEO satellite networks.

4. Machine learning-based satellite routing

Complex satellite network environments and dynamic inter-satellite links make satellite routes difficult to compute. Reinforcement learning is widely used in a variety of emerging industries because of its ability to deal with sequential decision problems [34]. Content caching is investigated in [35], where Q-learning is used in cloud content delivery systems. In [36], Q-learning is used to identify congested links in the Internet of Things (IoT) to improve fault tolerance. Similarly, Q-learning is used in [37] to increase the throughput of wireless sensor networks (WSNs) and address device energy consumption. In [38], deep reinforcement learning is used to solve the optimal allocation of cache, bandwidth, and other resources in the Internet of Vehicles. In [39], the authors use the deep deterministic policy gradient (DDPG) algorithm [40] to design a centralized satellite routing algorithm for space-ground integrated networks [41,42]. The decision-making center of the proposed strategy is located on the ground. It obtains the traffic information of the entire network in real time and sends the routing information to the relevant satellites after making a decision. However, because the delay of satellite communication is very large, routing decisions cannot be made in time. The strategy also increases the network burden and is therefore not suitable for large-scale use. In [43], a routing algorithm based on the multi-agent deep deterministic policy gradient (MADDPG) [44] is proposed. The routing strategy is trained centrally and then deployed on each satellite, which solves some of the problems of the above centralized routing algorithm. However, centralized training has limitations in obtaining sufficient data, and this becomes increasingly difficult as the scale of the network and the training complexity of deep reinforcement learning algorithms increase.
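
To make the Q-learning idea mentioned above concrete, the following Python sketch shows a tabular Q-learning agent that picks a next hop. It is a generic textbook formulation, not the algorithm of any cited work: the `QRouter` class, the state encoding as a (current node, destination) pair, the action names, and the reward of minus one per hop are all assumptions for illustration.

```python
import random
from collections import defaultdict


class QRouter:
    """Tabular Q-learning agent that picks a next hop (illustrative sketch)."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.alpha = alpha           # learning rate
        self.gamma = gamma           # discount factor
        self.epsilon = epsilon       # exploration probability
        self.rng = random.Random(seed)

    def select(self, state, actions):
        """Epsilon-greedy choice among the available next hops."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(actions)
        return max(actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, next_actions):
        """Standard Q-learning temporal-difference update."""
        best_next = max((self.q[(next_state, a)] for a in next_actions), default=0.0)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])


# Hypothetical usage: the state is (current satellite, destination satellite).
router = QRouter(alpha=0.5, gamma=0.9, epsilon=0.0)
state = ("sat_1", "sat_9")
# Forwarding east cost one hop (reward -1) and led to a state with no further actions.
router.update(state, "east", reward=-1.0, next_state=("sat_2", "sat_9"), next_actions=[])
choice = router.select(state, ["east", "west"])
```

A table like this grows with the number of states, which is one reason the deep reinforcement learning approaches discussed in this section replace it with a neural network.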