DRL-Based Load-Balancing Routing Scheme for 6G


Please note this is a comparison between Version 2 by Jiaxin Song and Version 1 by Jiaxin Song.

With the rapid development of the space-air-ground integrated network (SAGIN), satellite communication systems, which offer wide coverage and place few demands on the geographical environment, are gradually becoming a main competitive technology for 6G. Low Earth orbit (LEO) satellite networks feature low transmission delay, low propagation loss, and global coverage, and they have become a main research focus of contemporary satellite communication.

low earth orbit
satellite routing algorithm
deep reinforcement learning

1. Introduction

In recent years, the space-air-ground integrated network (SAGIN) has developed rapidly, and satellite communication is a key link in the 6G SAGIN. It can make up for the shortcomings of 5G terrestrial networks, extend network coverage, and improve the fault tolerance of the system. It can also be combined with artificial intelligence, big data, the Internet of Things, and other technologies to provide users with diversified services. Satellite communication covers a larger area and adapts better to global deployment than traditional terrestrial networks, and it is gradually becoming a main competitive technology for next-generation communications. Low Earth orbit (LEO) constellations play a particularly important role. Compared with geosynchronous and medium Earth orbit constellations, LEO stands out for its low transmission delay, low propagation loss, and global coverage. These features make LEO an attractive choice for a variety of applications, including Internet services, global positioning systems, and remote sensing. Its low delay and low propagation loss suit time-sensitive applications such as real-time communication, while its global coverage suits applications that need connectivity in remote or hard-to-reach locations. As a result, LEO has attracted great attention in recent years, driving the development of new technologies and algorithms to improve its performance and efficiency. With the rapid development of SAGIN, traditional terrestrial communication networks can no longer meet future needs, and LEO satellite communication has become a promising direction.

In an LEO satellite network, inter-satellite links (ISLs) carry the communication between satellites. Compared with terrestrial communication networks, LEO satellite networks change their topology more frequently, have longer link delays, and experience more frequent link-state changes in multi-user areas. Because of these high-speed dynamics, the cost of traditional path selection methods increases significantly, so routing protocols designed for terrestrial networks are difficult to use directly in LEO satellite networks. LEO satellite routing is also a supporting technology for the integration of remote sensing, communication, and computing in the 6G SAGIN. It is therefore necessary to study routing algorithms for LEO satellite networks.

Most existing satellite routing algorithms are derived from terrestrial network routing algorithms, and most of them are based on the shortest path. Because satellite density differs between high and low latitudes and user density is unevenly distributed [1], the load varies widely among satellites of the same constellation. In addition, as satellites move at high speed, the heavily loaded coverage areas also change rapidly. Traditional routing algorithms therefore struggle to meet the needs of current satellite networks.

Deep learning is strong at perception and reinforcement learning is strong at decision-making; deep reinforcement learning (DRL) combines the two. In DRL, an agent makes decisions by interacting with the environment and receives feedback through trial and error, learning to maximize rewards and minimize penalties [2]. Because of this combined perception and decision-making ability, a growing number of researchers have applied DRL to computer vision [3,4], speech recognition, autonomous driving [4,5], and other fields. DRL is also applicable to LEO satellite networks: it can perceive topology changes, load changes, and network parameters such as delay and bandwidth, and it can make routing decisions according to the service requirements of the network.

2. Low Earth orbit satellite routing

LEO satellite constellations can be divided into two categories: Walker Delta constellations, which use inclined orbits, and Walker Star constellations, which use polar orbits. As shown in Figure 1, the Iridium constellation [9] is a typical polar-orbit constellation. Because of its representative structure and easy-to-build mathematical model, many researchers use it as a reference LEO constellation.

Figure 1. Illustration of the Iridium constellation.

The orbits and topology of the Iridium constellation are shown in Figure 2. Because the satellites move and their connections change, the topology changes rapidly. Satellites in counter-rotating orbits cannot establish communication links with each other, and the links of a satellite change when it passes over a pole. These challenges of a dynamic topology have attracted considerable research interest in satellite network routing algorithms. Work in this field falls into two main types: centralized routing algorithms and distributed routing algorithms.

Figure 2. Orbit map of the Iridium satellites.

Distributed routing algorithms can adapt to the dynamic scenarios of satellite networks, because each satellite determines the next hop from the state of its neighboring satellites, such as remaining bandwidth and queue utilization. When the state of a neighbor changes, the algorithm senses the change quickly and adjusts its routing decision to the dynamic environment. Building on the design ideas of existing terrestrial distributed routing algorithms, the authors of [10] studied routing methods for satellite networks; by taking full account of the characteristics of LEO satellites, they improved the use of on-board buffer space. In [11], packets are classified and a corresponding routing method is designed for each class.
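The local decision described above can be sketched in a few lines. This is a minimal illustration, not the method of [10] or [11]: the `NeighborState` fields, the weights, and the linear cost function are all assumptions chosen for clarity.

```python
from dataclasses import dataclass


@dataclass
class NeighborState:
    """Locally observed state of one neighboring satellite (hypothetical fields)."""
    name: str
    remaining_bandwidth: float  # fraction of ISL capacity still free, 0..1
    queue_utilization: float    # fraction of the neighbor's buffer in use, 0..1
    hops_to_destination: int    # hop count to the destination via this neighbor


def link_cost(n: NeighborState, w_hops=1.0, w_queue=2.0, w_bw=2.0) -> float:
    """Illustrative cost: fewer hops, emptier queue, more free bandwidth are better."""
    return (w_hops * n.hops_to_destination
            + w_queue * n.queue_utilization
            + w_bw * (1.0 - n.remaining_bandwidth))


def choose_next_hop(neighbors: list[NeighborState]) -> str:
    """Pick the neighbor with the lowest local cost; no global view is needed."""
    if not neighbors:
        raise ValueError("no reachable neighbor")
    return min(neighbors, key=link_cost).name


if __name__ == "__main__":
    candidates = [
        NeighborState("up", 0.2, 0.9, 3),     # shortest, but congested
        NeighborState("right", 0.8, 0.1, 4),  # one hop longer, lightly loaded
    ]
    print(choose_next_hop(candidates))
```

With these assumed weights, the lightly loaded neighbor is preferred even though it is one hop farther from the destination. The decision uses only neighbor state, which is what lets a distributed scheme react quickly to local changes.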

Unlike distributed routing algorithms, centralized algorithms need global information about the satellite network [12]. A master control node first collects the global information and then computes the routing paths. Once the routing results are available, it distributes the complete routing policy to the other nodes. The authors of [13] designed an improved distributed hierarchical routing protocol (DHRP) for satellite networks. The protocol sets up master nodes and candidate nodes, and it outperforms the traditional discrete relaxation algorithm (DRA) in routing performance. The authors of [14] propose a hybrid global-local load balancing routing (HGL) algorithm; however, it is ineffective when large-scale traffic changes suddenly. In [15], the authors propose a probabilistic ISL routing (PIR) algorithm, which uses communication delay to evaluate path selection and also takes the cost of inter-satellite links into account.
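The collect, compute, and distribute cycle of a centralized scheme can be sketched as follows. This is a generic illustration using Dijkstra's algorithm, not the protocol of any cited work; the four-satellite topology and its delay values are hypothetical.

```python
import heapq


def dijkstra_next_hops(graph, source):
    """Return {destination: next hop from source} using the link weights in graph."""
    dist = {source: 0.0}
    first_hop = {}
    heap = [(0.0, source, None)]
    while heap:
        d, node, hop = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for nbr, weight in graph[node].items():
            nd = d + weight
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                # The first hop is the neighbor itself when leaving the source.
                nh = nbr if node == source else hop
                first_hop[nbr] = nh
                heapq.heappush(heap, (nd, nbr, nh))
    return first_hop


def build_routing_tables(graph):
    """Master node: compute one next-hop table per satellite from the global view."""
    return {sat: dijkstra_next_hops(graph, sat) for sat in graph}


if __name__ == "__main__":
    # Hypothetical global link-state view; weights stand for link delay in ms.
    topology = {
        "S1": {"S2": 10.0, "S3": 15.0},
        "S2": {"S1": 10.0, "S4": 12.0},
        "S3": {"S1": 15.0, "S4": 5.0},
        "S4": {"S2": 12.0, "S3": 5.0},
    }
    tables = build_routing_tables(topology)
    print(tables["S1"])  # the table the master would send to satellite S1
```

Every table depends on the full `topology` dictionary, so any link change forces the master node to collect state and recompute again. That dependence is the main cost of the centralized approach in a fast-changing LEO topology.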

Although the above algorithms have made great progress in adapting to the dynamics of LEO satellites, their failure to account for satellite load remains a major shortcoming.

3. Load balancing of low Earth orbit satellites

The length of an inter-satellite link between LEO satellites varies with the latitude of the satellites. Traditional shortest-path routing relies only on path length, which causes traffic to aggregate at higher latitudes [1,16]. Figure 3 shows a 3D schematic of the satellite traffic distribution created with the NS-3 network simulation software. Black dots represent LEO satellites, and line segments represent inter-satellite links that carry traffic. The thickness and color of each segment correspond to its traffic and bandwidth utilization, respectively: thicker lines indicate higher traffic, and darker colors indicate higher bandwidth utilization. Notably, the LEO satellite network experiences congestion mainly in high-latitude and densely populated areas. In addition, the uneven distribution of ground gateway stations unbalances the load of the satellite network. User mobility and global population distribution are also key factors affecting the traffic distribution [17]. The high mobility of the satellites causes the heavily loaded coverage areas to change rapidly [18,19].

Figure 3. Traffic distribution graph.
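The dependence of ISL length on latitude can be made concrete with a simple approximation. The sketch below assumes that the inter-plane ISL length of a polar constellation scales with the cosine of the latitude. The 4000 km equatorial length and the function name are hypothetical, chosen only for illustration.

```python
import math


def inter_plane_isl_length(equator_length_km: float, latitude_deg: float) -> float:
    """Approximate inter-plane ISL length, assuming it scales with cos(latitude)."""
    return equator_length_km * math.cos(math.radians(latitude_deg))


if __name__ == "__main__":
    base_km = 4000.0  # hypothetical inter-plane ISL length at the equator
    for lat in (0, 30, 60):
        print(lat, round(inter_plane_isl_length(base_km, lat), 1))
```

Under this assumption, a link at 60° latitude is half as long as the same link at the equator. A routing metric based only on length therefore favors high-latitude crossings, which is consistent with the traffic aggregation described above.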

A path-based load-balancing satellite routing algorithm is proposed in [17,20] with the aim of minimizing the maximum network flow. The algorithm avoids traffic aggregation at high latitudes by treating all inter-satellite links as having the same path length and giving all paths the same priority. The authors of [21] divide the transmission area into a heavy-load range and a light-load range, taking into account the relationship between the reverse slot and the gateway stations. The heavy-load range uses a congestion indicator and handles the uneven traffic distribution by choosing the path with the smallest weight. However, this approach requires link-state information for the entire network and cannot make decisions in real time [22]. The elastic load balancing (ELB) algorithm proposed in [23,24] exchanges congestion information between satellite nodes to balance load and avoid congestion. Queue occupancy determines whether a satellite node is idle or busy; when a node is marked as busy, it sends messages to its neighbors asking them to reduce their transmission rate. The TLR algorithm proposed in [25] considers both the current congestion state and the possible congestion state of the next hop. The authors of [26] propose an iterative Dijkstra mechanism to select the best transmission path for load-balanced routing. The authors of [27] consider link latency to further improve routing performance. Among other explorations of LEO routing algorithms, [28] uses cooperative game theory to balance the trade-off between load and transmission delay. In [29], fuzzy theory is used to meet the needs of different users. In [30], transmission overhead and route convergence are evaluated as key performance indicators, and an on-demand dynamic routing algorithm is proposed in combination with trajectory prediction. Energy consumption is also taken into account in [31] to improve the quality of service for users.
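The busy/idle signaling idea behind ELB can be sketched as follows. This is a simplified illustration, not the algorithm of [23,24]: the 0.8 occupancy threshold, the 0.5 rate reduction factor, and the `SatelliteNode` class are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class SatelliteNode:
    name: str
    queue_capacity: int
    queue_length: int = 0
    neighbors: list = field(default_factory=list, repr=False)
    # rate_to[x] is the normalized rate at which this node sends to neighbor x.
    rate_to: dict = field(default_factory=dict)

    def is_busy(self, busy_threshold: float = 0.8) -> bool:
        """A node counts as busy when its queue occupancy reaches the threshold."""
        return self.queue_length / self.queue_capacity >= busy_threshold

    def notify_neighbors(self, reduction: float = 0.5) -> None:
        """If busy, ask every neighbor to scale down the rate it sends to this node."""
        if not self.is_busy():
            return
        for nbr in self.neighbors:
            nbr.rate_to[self.name] = nbr.rate_to.get(self.name, 1.0) * reduction


if __name__ == "__main__":
    a = SatelliteNode("A", queue_capacity=10, queue_length=9)  # 90% full: busy
    b = SatelliteNode("B", queue_capacity=10, queue_length=2)  # 20% full: idle
    a.neighbors.append(b)
    b.neighbors.append(a)
    a.notify_neighbors()
    b.notify_neighbors()
    print(b.rate_to)  # B now sends to A at a reduced rate
    print(a.rate_to)  # A received no request from B
```

Only queue occupancy and neighbor-to-neighbor messages are involved, so the decision stays local. A fuller treatment would also reroute the traffic that the neighbors hold back, which this sketch leaves out.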

The existing literature highlights the advantages of LEO routing in terms of load performance [32,33]. However, challenges related to insufficient local optimization and weak dynamic adaptability remain unresolved, which may hinder the development of LEO satellite networks.

4. Machine learning-based satellite routing

The complex satellite network environment and the dynamic inter-satellite links make satellite routing difficult to compute. Reinforcement learning is widely used in a variety of emerging fields because of its ability to handle sequential decision problems [34]. Content caching is investigated in [35], where a Q-learning algorithm is used in a cloud content delivery system. In [36], Q-learning is used to identify congested links in the Internet of Things (IoT) and improve fault tolerance. Similarly, Q-learning is used in [37] to increase the throughput of wireless sensor networks (WSNs) and address the energy consumption of devices. In [38], deep reinforcement learning is used to solve a model for the optimal allocation of cache, bandwidth, and other resources in the Internet of Vehicles. The authors of [39] use the deep deterministic policy gradient (DDPG) [40] to design a centralized satellite routing algorithm for space-ground integrated networks [41,42]. The decision center of the proposed strategy is located on the ground. It obtains the traffic information of the entire network in real time and sends the routing information to the relevant satellites after making a decision. The disadvantage of this strategy is that the delay of satellite communication is very large, so routing decisions cannot be made in time. It also increases the network burden and is therefore not suitable for large-scale use. A routing algorithm based on the multi-agent deep deterministic policy gradient (MADDPG) [44] is proposed in [43]. Its routing policy is trained centrally and then deployed on each satellite, which solves some of the problems of the centralized routing algorithm above. However, centralized training is limited by the difficulty of obtaining sufficient data, and this becomes harder as the network grows and the training complexity of deep reinforcement learning algorithms increases.
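To show how reinforcement learning can treat routing as a sequential decision problem, the sketch below applies generic tabular Q-learning to next-hop selection. It is not the method of any cited work. The four-node directed topology, the link costs (assumed to lump delay and a load penalty together), and the hyperparameters are all hypothetical.

```python
import random

# Hypothetical directed topology: LINKS[node][next_hop] = cost, where the cost
# is assumed to combine propagation delay and a queueing (load) penalty.
LINKS = {
    "A": {"B": 1.0, "C": 5.0},
    "B": {"C": 1.0, "D": 4.0},
    "C": {"D": 1.0},
    "D": {},
}


def train_q_routing(links, destination, episodes=500, alpha=0.5,
                    gamma=0.9, epsilon=0.2, seed=0):
    """Learn Q[(node, next_hop)] with tabular Q-learning; reward = -link cost."""
    rng = random.Random(seed)
    q = {(n, h): 0.0 for n, hops in links.items() for h in hops}
    sources = [n for n in links if n != destination]
    for _ in range(episodes):
        node = rng.choice(sources)
        while node != destination:
            hops = list(links[node])
            if rng.random() < epsilon:
                nxt = rng.choice(hops)                        # explore
            else:
                nxt = max(hops, key=lambda h: q[(node, h)])   # exploit
            # Value of the best action at the next node (0 at the destination).
            future = max((q[(nxt, h)] for h in links[nxt]), default=0.0)
            target = -links[node][nxt] + gamma * future
            q[(node, nxt)] += alpha * (target - q[(node, nxt)])
            node = nxt
    return q


def greedy_path(q, links, source, destination):
    """Follow the highest-valued next hop from source until the destination."""
    path = [source]
    while path[-1] != destination:
        node = path[-1]
        path.append(max(links[node], key=lambda h: q[(node, h)]))
    return path


if __name__ == "__main__":
    q_table = train_q_routing(LINKS, "D")
    print(greedy_path(q_table, LINKS, "A", "D"))
```

Each node needs only its own Q-values to pick a next hop, so a learned table of this kind could in principle be used on board. The table grows with the number of nodes and neighbors, which is one reason deep variants such as DDPG and MADDPG replace it with neural networks for larger constellations.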