Automatic Navigation Approaches for Flying Robots

Automatic Navigation Approaches for Flying Robots: History

Please note this is an old version of this entry, which may differ significantly from the current revision.

Contributor: Kaiyu Hu ,

Huanlin Li

, Jiafan Zhuang ,

Zhifeng Hao

, Zhun Fan

Various approaches to achieve autonomous flight have been proposed in the literature, which can be broadly categorized into the following three types: (1) Trajectory-based optimization methods: These methods involve designing a set of optimal trajectories that the robot should follow to reach its destination; (2) Imitation-learning-based methods; (3) Reinforcement-learning-based methods.

unmanned aerial vehicle
autonomous navigation
aerial robots

1. Introduction

Flying robots are among the most flexible man-made robots developed to date. With their high level of maneuverability, these robots can navigate through complex and challenging environments, including natural forests and modern urban buildings, and they can reach areas that most other human-made robots cannot. Their flexibility has led to the development of numerous applications, including environmental mapping ^[1]^[2], patrol inspection ^[3], search and rescue ^[4], logistics automation ^[5], entertainment performances ^[6], and agricultural automation ^[7]. Flying robots have an incredible level of autonomy, thereby enabling them to achieve previously impossible missions. To this end, the autonomous navigation capability of a flying robot enables it to safely interact with the environment and fly to its destination automatically without human intervention.

The autonomous navigation capability of the flying robot enables the aircraft to safely interact with the environment and fly to its destination automatically without human intervention. After years of continuous development and research ^[8]^[9], many previous research efforts have led to the development of flying robots in the field of full autonomy, thus freeing the hands of human expert pilots. However, developing end-to-end navigation methods that are capable of robust flight at high speeds in complex environments is a long-standing challenge that remains unsolved. Traditional trajectory-based optimization methods need to prebuild a mathematical model for the environment, which usually requires a map construction procedure, such as the ESDF map ^[10]^[11]^[12]^[13]^[14]^[15]^[16]. However, the mapping procedure tends to be time-consuming, and it is difficult to meet the real-time requirements. Imitation-learning-based methods train a policy generator based on a large amount of expert experiences ^[17]^[18]. However, by only imitating the human experience, the learned policy generator cannot handle unseen scenarios and would make a inappropriate decision.

In recent years, learning-based end-to-end algorithms such as soft actor–critic (SAC) ^[19] combined with convolutional neural networks have been investigated. SAC is an offline deep reinforcement learning (DRL) algorithm that employs a stochastic policy to maximize the sample utilization. These end-to-end algorithms can directly map the visual observations, attitude, and desired target information of a flying robot to the action output of the agent. In related work ^[20], a proposed sensor-level DRL-based policy surpassed traditional algorithms in complex pedestrian scenario navigation tasks using a ground robot platform, which was greatly impressive. Although neural-network-based methods may be less explainable, they are still preferred, since they do not require rigorous mathematical proofs or tedious theoretical analyses. In the study of Xue et al. ^[21], seven ranging sensors were used for the perception of the environment, and a reinforcement learning approach based on an actor–critic framework was used to the achieve autonomous navigation of the UAV in an unknown environment. Similarly, in Zhang et al.’s study ^[22], more than seven laser ranging sensors were used to sense the environment, and an improved TD3-based algorithm was used to realize an autonomous navigation task for a UAV in a multi-obstacle environment. However, both methods are dependent on ranging sensors, and neither of them can accurately perceive the environment in the UAV’s forward direction.

2. Automatic Navigation Approaches for Flying Robots

In the field of automatic navigation for flying robots, various approaches to achieve autonomous flight have been proposed in the literature, which can be broadly categorized into the following three types: (1) Trajectory-based optimization methods: These methods involve designing a set of optimal trajectories that the robot should follow to reach its destination. They commonly rely on mathematical models and algorithms to generate the trajectories, and they typically require accurate information about the environment and the robot’s dynamics; (2) Imitation-learning-based methods: These methods require a large amount of expert experience to fit an AI model that performs well in specific environments, but they have poorer generalization and exploration capabilities; (3) Reinforcement-learning-based methods: These methods represent a promising approach to achieving autonomous flight, which involves training an intelligent agent to learn how to navigate by interacting with its environment and receiving feedback in the form of rewards or penalties. Reinforcement-learning-based methods require a large amount of data for training, but they can use simulation software to obtain this data, thus making them more cost-effective than other methods. Additionally, the agent can continuously explore and learn from its environment, thus ultimately achieving comparable results to human experts.

2.1. Trajectory-Based Optimization Algorithms

Fast-Planner ^[23] and EGO-Planner ^[24] utilize certain search rules to find collision-free paths and optimize those paths for dynamic feasibility and smoothness. Fast-Planner features its stability, which is based on the approach of projecting depth images onto point clouds to construct ESDF maps and subsequently performing a path search and trajectory optimization. Since the planning algorithm needs to operate on the constructed ESDF map, the delay of the observation information becomes more prominent. This also means that, in order to achieve better performance, the speed of the flying robot must be strictly limited. Moreover, it should be noted that, due to the adaptive modification of the target point by trajectory optimization, Fast-Planner is not suitable for tasks in challenging environments requiring high-precision navigation. In navigation experiments conducted in complex scenes, the planner may exhibit conservative behaviors, because the target point does not impose a sufficient constraint on the behaviors, thereby resulting in a higher likelihood of task timeout without completion.

EGO-Planner is an improved planning algorithm that is based on Fast-Planner with an improved decision-making ability. This reduces the probability of task timeout while increasing the success rate. Interestingly, even when the planning horizon of EGO-Planner is increased several times, the algorithm still boldly explores and plans trajectories filled with exciting maneuvers such as frequent emergency turns for flying robots. In addition, the planner requires frequent restarts to reduce data errors.

For navigation in complex unknown environments, these typical algorithms combine online mapping and traditional planning algorithms. From an engineering perspective, splitting the navigation task into environmental perception and local planning is attractive, because each component can be performed in parallel, thereby making the overall system more efficient and interpretable. However, there is a time–space mismatch between the output of the perception module and the joint debugging of the planner, which makes the interaction between different stages amenable to compound errors to a large extent. Additionally, their sequential nature introduces additional delays that make maneuvering at high speeds and with agility difficult. Although these issues can be mitigated to some extent by manual tuning with expert knowledge, the divide-and-conquer principle that prevails in autonomous flight research in unknown environments commonly imposes fundamental limits on the speed and agility that flying robotic systems can achieve.

2.2. Imitation-Learning-Based Algorithms

Imitation-learning-based agents learn how to navigate by observing the trajectories of human experts or other robots that have completed specific tasks. Typically, a large volume of observational data is collected and used to train a neural network policy that can replicate an expert’s decision-making process. The policy then predicts the next action to be taken from the input observation data and achieves the navigation goal by executing those actions. Imitation-learning-based algorithms are simple to train and, with sufficient training data, robots can learn how to navigate on their own. However, if the training data are insufficient or noisy, the policy may fail to make an optimal decision. Additionally, since the algorithm learns and selects actions based on existing data, it may be unable to handle situations it has never seen before. Typical published studies ^[18]^[25]^[26] used an imitation learning algorithm to train a policy as closely as possible to the expert’s behavior. However, the policy was heavily dependent on the input experience.

2.3. Deep-Reinforcement-Learning-Based Algorithms

Recently, research on end-to-end robot navigation using DRL has become increasingly popular. Yarats et al. ^[27] proposed a SAC_AE policy with regularization constraints on the decoder loss. Then, in ^[28], Huang et al. used the regularized SAC_AE policy (SAC_RAE) to complete a distributed multiUAV collision avoidance task, where the flying robots were able to avoid each other and reach the target point using only the depth image from a front-facing deep camera. However, the validity of this policy was not well-demonstrated, because the experiment was conducted in an unobstructed open space. Following the success of the transformer ^[29] in the CV field, a combination of the transformer and reinforcement learning has been proposed in several works ^[30]^[31]^[32]. In these works, transformers were used to extract feature information from observation, which was then input into the policy network for learning, thereby achieving satisfactory results in their task scenarios.

However, scholars have noticed that the introduction of transformer modules in DRL may make policy training more challenging. Nevertheless, the literature suggests that it is theoretically possible to use vision transformers to build an encoder network for the perception module, which takes in all observation information (including depth images and agent state information), extracts latent variables, and computes the attention between them. In practice, transformer modules may lead to unstable learning, particularly in situations in which the agent’s action set is rich and continuous. Therefore, to address this issue, scholars explored methods to increase the receptive field of convolutional modules, rather than relying solely on the large receptive field advantage of transformer modules.

This entry is adapted from the peer-reviewed paper 10.3390/drones7100609

References

Christiansen, M.P.; Laursen, M.S.; Jørgensen, R.N.; Skovsen, S.; Gislum, R. Designing and testing a UAV mapping system for agricultural field surveying. Sensors 2017, 17, 2703.
Leduc, M.B.; Knudby, A.J. Mapping wild leek through the forest canopy using a UAV. Remote Sens. 2018, 10, 70.
Kim, S.; Kim, D.; Jeong, S.; Ham, J.W.; Lee, J.K.; Oh, K.Y. Fault diagnosis of power transmission lines using a UAV-mounted smart inspection system. IEEE Access 2020, 8, 149999–150009.
Tian, Y.; Liu, K.; Ok, K.; Tran, L.; Allen, D.; Roy, N.; How, J.P. Search and rescue under the forest canopy using multiple UAVs. Int. J. Robot. Res. 2020, 39, 1201–1221.
Chen, Z.; Alonso-Mora, J.; Bai, X.; Harabor, D.D.; Stuckey, P.J. Integrated task assignment and path planning for capacitated multi-agent pickup and delivery. IEEE Robot. Autom. Lett. 2021, 6, 5816–5823.
Loquercio, A.; Kaufmann, E.; Ranftl, R.; Dosovitskiy, A.; Koltun, V.; Scaramuzza, D. Deep drone racing: From simulation to reality with domain randomization. IEEE Trans. Robot. 2019, 36, 1–14.
Ju, C.; Son, H.I. Modeling and control of heterogeneous agricultural field robots based on Ramadge–Wonham theory. IEEE Robot. Autom. Lett. 2019, 5, 48–55.
Wu, K.; Wang, H.; Esfahani, M.A.; Yuan, S. Learn to navigate autonomously through deep reinforcement learning. IEEE Trans. Ind. Electron. 2021, 69, 5342–5352.
Hu, H.; Zhang, K.; Tan, A.H.; Ruan, M.; Agia, C.; Nejat, G. A sim-to-real pipeline for deep reinforcement learning for autonomous robot navigation in cluttered rough terrain. IEEE Robot. Autom. Lett. 2021, 6, 6569–6576.
Oleynikova, H.; Burri, M.; Taylor, Z.; Nieto, J.; Siegwart, R.; Galceran, E. Continuous-time trajectory optimization for online uav replanning. In Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Daejeon, Republic of Korea, 9–14 October 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 5332–5339.
Ding, W.; Gao, W.; Wang, K.; Shen, S. An efficient b-spline-based kinodynamic replanning framework for quadrotors. IEEE Trans. Robot. 2019, 35, 1287–1306.
Zhou, B.; Pan, J.; Gao, F.; Shen, S. Raptor: Robust and perception-aware trajectory replanning for quadrotor fast flight. IEEE Trans. Robot. 2021, 37, 1992–2009.
Oleynikova, H.; Taylor, Z.; Fehr, M.; Siegwart, R.; Nieto, J. Voxblox: Incremental 3d euclidean signed distance fields for on-board mav planning. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1366–1373.
Han, L.; Gao, F.; Zhou, B.; Shen, S. Fiesta: Fast incremental euclidean distance fields for online motion planning of aerial robots. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 4423–4430.
Ratliff, N.; Zucker, M.; Bagnell, J.A.; Srinivasa, S. CHOMP: Gradient optimization techniques for efficient motion planning. In Proceedings of the 2009 IEEE International Conference on Robotics and Automation, Kobe, Japan, 12–17 May 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 489–494.
Gao, F.; Lin, Y.; Shen, S. Gradient-based online safe trajectory generation for quadrotor flight in complex environments. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 3681–3688.
Ishihara, K.; Kanervisto, A.; Miura, J.; Hautamaki, V. Multi-task learning with attention for end-to-end autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 2902–2911.
Loquercio, A.; Kaufmann, E.; Ranftl, R.; Müller, M.; Koltun, V.; Scaramuzza, D. Learning high-speed flight in the wild. Sci. Robot. 2021, 6, eabg5810.
Haarnoja, T.; Zhou, A.; Hartikainen, K.; Tucker, G.; Ha, S.; Tan, J.; Kumar, V.; Zhu, H.; Gupta, A.; Abbeel, P.; et al. Soft actor-critic algorithms and applications. arXiv 2018, arXiv:1812.05905.
Fan, T.; Long, P.; Liu, W.; Pan, J. Distributed multi-robot collision avoidance via deep reinforcement learning for navigation in complex scenarios. Int. J. Robot. Res. 2020, 39, 856–892.
Xue, Y.; Chen, W. A UAV Navigation Approach Based on Deep Reinforcement Learning in Large Cluttered 3D Environments. IEEE Trans. Veh. Technol. 2022, 72, 3001–3014.
Zhang, S.; Li, Y.; Dong, Q. Autonomous navigation of UAV in multi-obstacle environments based on a Deep Reinforcement Learning approach. Appl. Soft Comput. 2022, 115, 108194.
Zhou, B.; Gao, F.; Wang, L.; Liu, C.; Shen, S. Robust and efficient quadrotor trajectory generation for fast autonomous flight. IEEE Robot. Autom. Lett. 2019, 4, 3529–3536.
Zhou, X.; Wang, Z.; Ye, H.; Xu, C.; Gao, F. Ego-planner: An esdf-free gradient-based local planner for quadrotors. IEEE Robot. Autom. Lett. 2020, 6, 478–485.
Karnan, H.; Warnell, G.; Xiao, X.; Stone, P. Voila: Visual-observation-only imitation learning for autonomous navigation. In Proceedings of the 2022 International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 2497–2503.
Watkins-Valls, D.; Xu, J.; Waytowich, N.; Allen, P. Learning your way without map or compass: Panoramic target driven visual navigation. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October–24 January 2021; IEEE: Piscataway, NJ, USA, 2020; pp. 5816–5823.
Yarats, D.; Zhang, A.; Kostrikov, I.; Amos, B.; Pineau, J.; Fergus, R. Improving sample efficiency in model-free reinforcement learning from images. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; pp. 10674–10681.
Huang, H.; Zhu, G.; Fan, Z.; Zhai, H.; Cai, Y.; Shi, Z.; Dong, Z.; Hao, Z. Vision-based Distributed Multi-UAV Collision Avoidance via Deep Reinforcement Learning for Navigation. In Proceedings of the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Kyoto, Japan, 23–27 October 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 13745–13752.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762.
Chen, L.; Lu, K.; Rajeswaran, A.; Lee, K.; Grover, A.; Laskin, M.; Abbeel, P.; Srinivas, A.; Mordatch, I. Decision transformer: Reinforcement learning via sequence modeling. Adv. Neural Inf. Process. Syst. 2021, 34, 15084–15097.
Huang, W.; Zhou, Y.; He, X.; Lv, C. Goal-guided Transformer-enabled Reinforcement Learning for Efficient Autonomous Navigation. arXiv 2023, arXiv:2301.00362.
Esslinger, K.; Platt, R.; Amato, C. Deep Transformer Q-Networks for Partially Observable Reinforcement Learning. arXiv 2022, arXiv:2206.01078.

© Text is available under the terms and conditions of the Creative Commons Attribution (CC BY) license; additional terms may apply. By using this site, you agree to the Terms and Conditions and Privacy Policy.