Deep Reinforcement Learning for Vision-Based Navigation of UAVs

This entry is adapted from the peer-reviewed paper 10.3390/drones7040245

Unmanned Aerial Vehicles (UAVs), also known as drones, have advanced greatly in recent years. They are used for transportation, photography, climate monitoring, and disaster relief, largely because they can carry out these operations efficiently and safely. Drone design is still far from flawless, however, and detecting and preventing collisions remains a major challenge. In this context, this research describes a methodology for developing a drone system that operates autonomously, without human intervention. It applies reinforcement learning algorithms to train a drone to avoid obstacles autonomously in discrete and continuous action spaces based solely on image data. The research compares three reinforcement learning strategies, namely Deep Q-Networks (DQN), Proximal Policy Optimization (PPO), and Soft Actor-Critic (SAC), for avoiding both stationary and moving obstacles. The novelty of this research lies in its comprehensive assessment of the advantages, limitations, and future research directions of obstacle detection and avoidance for drones using different reinforcement learning techniques. The findings could have practical implications for the development of safer and more efficient drones.

autonomous navigation; collision avoidance; deep reinforcement learning; drones

1. Introduction

UAVs are reshaping the aviation industry and have emerged as potential successors to conventional aircraft. Because they need no pilot on board, they are especially convenient for reaching remote regions. Today, drones are available in a variety of shapes, sizes, ranges, specifications, and equipment. They serve various purposes, such as transporting commodities, taking photos and videos, monitoring climate change, and conducting search operations after natural disasters ^[1]. The development of UAVs has a significant impact on the market, the economy, and a variety of industries. Although drones have been available for a long time, their capacity for autonomous obstacle avoidance has only recently gained attention from the scientific community, and humans remain heavily involved in most drone operations. Drones with autonomous capabilities, however, can be extremely useful in emergencies, since they can provide immediate situational awareness and direct response efforts without a pilot on-site. Security and inspection tasks can also be addressed using autonomous drone systems. Researchers initially focused on self-navigating drones that, given predefined GPS coordinates for the departure point and the destination, could determine the optimal route and arrive at the destination without human assistance ^[2]. Although GPS navigation is claimed to avoid collisions, a GPS-guided drone can still collide with a tree, a building, or another drone during its flight.

Different strategies have been implemented in vision-based navigation systems to tackle the problem of obstacle detection and avoidance. The most popular are based on geometric relations, fuzzy logic, potential fields, and neural networks ^[3]. In recent years, deep reinforcement learning (DRL) has grown rapidly in both research and applications. Reinforcement learning (RL) can address a variety of complicated decision-making tasks that were previously beyond the capacity of machines, bringing them closer to solving real-world problems as a human would. Detecting and avoiding obstacles with a high degree of accuracy remains a challenging task for autonomous drones. Applying reinforcement learning to collision avoidance can yield a model that adapts dynamically to the environment. In RL-based techniques, the agent learns in a virtual world that is analogous to a real-world setting, and the trained model is then applied to a real drone for testing ^[4]. When training RL models for effective collision avoidance, it is imperative to minimise the gap between the training environment and the real world, and to cover different obstacle avoidance scenarios during training. Choosing a suitable reinforcement learning model is also essential to obtaining good results.
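The agent-environment interaction described above can be illustrated with a minimal sketch. The two-lane corridor, the class name `ToyCorridorEnv`, the reward values, and both policies are hypothetical choices made for illustration; they are not the simulator or the policies used in the paper.

```python
class ToyCorridorEnv:
    """Hypothetical two-lane corridor. State is (x, lane); the agent moves one
    cell forward per step and may switch lane while doing so."""

    def __init__(self, length=6, obstacles=((3, 0),)):
        self.length = length
        self.obstacles = set(obstacles)
        self.state = (0, 0)

    def reset(self):
        self.state = (0, 0)
        return self.state

    def step(self, action):
        # action 0: keep lane and advance; action 1: switch lane and advance
        x, lane = self.state
        if action == 1:
            lane = 1 - lane
        x += 1
        self.state = (x, lane)
        if self.state in self.obstacles:
            return self.state, -1.0, True   # collision ends the episode
        if x == self.length - 1:
            return self.state, 1.0, True    # reached the end of the corridor
        return self.state, 0.0, False


def run_episode(env, policy):
    """Run one episode and return the sum of rewards collected."""
    state = env.reset()
    total, done = 0.0, False
    while not done:
        action = policy(state)
        state, reward, done = env.step(action)
        total += reward
    return total


def straight_policy(state):
    """Always keeps the current lane."""
    return 0


def dodge_policy(state):
    """Hand-written rule: switch lane when the next cell holds the obstacle."""
    x, lane = state
    return 1 if (x + 1, lane) == (3, 0) else 0


print(run_episode(ToyCorridorEnv(), straight_policy))  # -1.0 (collides)
print(run_episode(ToyCorridorEnv(), dodge_policy))     # 1.0 (reaches the goal)
```

In an RL setting the hand-written `dodge_policy` would be replaced by a learned policy that is updated from the rewards returned by `step`.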

2. The Autonomous Navigation of UAVs

Autonomous Drone Based on Computer Vision and Geometric Relations ^[3]^[5]^[6]^[7]^[8]^[9]^[10]: Early research in autonomous drone navigation focused on computer vision algorithms and geometric relationships to detect and navigate around obstacles. Various approaches, including real-time obstacle detection algorithms and adaptive navigation algorithms based on geometric relationships, were explored. These methods, although computationally inexpensive, had limitations in fully addressing the navigation problem. Researchers also investigated techniques using SIFT detectors, SURF feature matching, and template matching for obstacle identification, emphasizing the challenges of frontal collision detection with monocular imagery.

Autonomous Drone Based on Supervised Learning ^[11]^[12]^[13]: Supervised learning methods for drone navigation involved training models to recognize patterns and relationships between input data and output labels. Examples include control algorithms for drones navigating forest environments and object detection algorithms using deep neural networks (DNN) and computer vision. While these approaches showed improvements in obstacle-distance estimation accuracy and computational efficiency, they often relied heavily on specific features and preplanned GPS tracks.

Autonomous Drone Based on Reinforcement Learning ^[4]^[14]^[15]^[16]^[17]: Reinforcement learning (RL) emerged as a promising approach for drone navigation, allowing agents to interact with an environment, optimize behaviour, and adapt to dynamic scenarios. Researchers explored RL algorithms such as Nature-DQN, Dueling-DQN, Deep Deterministic Policy Gradient (DDPG), and soft actor-critic methods. The cited studies compare RL algorithms in discrete and continuous action spaces and demonstrate their effectiveness in training drones for obstacle avoidance. However, challenges remain in scaling RL algorithms to handle large state spaces and in addressing the complexity of navigating around dynamic obstacles.
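The difference between discrete and continuous action spaces can be sketched as follows. Value-based methods such as DQN choose among a finite set of actions, while actor-critic methods such as DDPG and SAC are typically used with real-valued actions. The action table, the axis convention, and `MAX_SPEED` below are assumptions made for illustration and are not taken from the cited studies.

```python
# Hypothetical discrete action set: each index maps to a fixed
# (vx, vy, vz) velocity command in m/s. The axis convention is assumed.
DISCRETE_ACTIONS = {
    0: (1.0, 0.0, 0.0),    # forward
    1: (0.0, 1.0, 0.0),    # right
    2: (0.0, -1.0, 0.0),   # left
    3: (0.0, 0.0, 1.0),    # up
    4: (0.0, 0.0, -1.0),   # down
}

MAX_SPEED = 2.0  # assumed speed limit in m/s


def discrete_to_velocity(index):
    """Discrete case: the policy outputs an index into a fixed table."""
    return DISCRETE_ACTIONS[index]


def continuous_to_velocity(action):
    """Continuous case: the policy outputs three real numbers, which are
    clipped to [-1, 1] and scaled to the assumed speed limit."""
    return tuple(MAX_SPEED * max(-1.0, min(1.0, a)) for a in action)


print(discrete_to_velocity(0))                   # (1.0, 0.0, 0.0)
print(continuous_to_velocity((0.5, -2.0, 0.0)))  # (1.0, -2.0, 0.0)
```

A discrete table is simpler to learn over but limits the drone to a few fixed manoeuvres, whereas continuous commands allow smoother trajectories at the cost of a larger search space.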

3. Research Gap

Drone missions may vary with weather, lighting, and terrain, and existing methods may not cope with such changes. It is therefore crucial to develop approaches to drone navigation that are robust and adaptable to different environments. The literature reviewed above shows that researchers detect obstacles using, among other methods, geometric relations, supervised learning, and neural networks. Because models based on geometric relations do not require training on large datasets, they can be more advantageous than other models. However, such models rely heavily on the precise position and orientation of the drone relative to its surroundings. As a consequence, drones may have difficulty estimating their position in environments with complex geometry or lighting conditions that change over time. In these cases, the drone's position estimate may become inaccurate, resulting in navigation errors. Hence, most research has focused on supervised learning and reinforcement learning.

Both reinforcement learning and supervised learning learn a mapping from inputs to outputs, but they differ in the feedback they receive: reinforcement learning uses rewards and punishments as signals for good and bad behaviour, whereas supervised learning is given the correct action for each input. Supervised learning therefore requires labelled data for all possible scenarios, which makes it hard to capture the complex and dynamic drone flight environment in a single dataset. This problem can be mitigated by increasing the dataset size, but doing so can be time-consuming and expensive. Furthermore, a supervised model is limited to the task it was trained on and does not adapt to changes in the environment.
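As an illustration of rewards and punishments acting as learning signals, the sketch below shows one possible per-step reward for collision avoidance. The function name, the reward terms, and all numeric values are assumptions chosen for illustration; the source does not specify the reward used in the paper.

```python
def step_reward(collided, reached_goal, prev_dist, curr_dist,
                collision_penalty=-100.0, goal_bonus=100.0,
                progress_scale=1.0, step_cost=-0.1):
    """Hypothetical reward for one time step.

    collided      -- True if the drone hit an obstacle on this step
    reached_goal  -- True if the drone arrived at its target
    prev_dist     -- distance to the target before the step (metres)
    curr_dist     -- distance to the target after the step (metres)
    """
    if collided:
        return collision_penalty        # punishment for bad behaviour
    if reached_goal:
        return goal_bonus               # reward for completing the task
    # Otherwise reward progress towards the target and charge a small
    # cost per step so that hovering in place is not attractive.
    return progress_scale * (prev_dist - curr_dist) + step_cost


print(step_reward(True, False, 5.0, 5.0))   # -100.0
print(step_reward(False, True, 1.0, 0.0))   # 100.0
```

Unlike a supervised label, this signal never states which action was correct; the agent has to discover that from the rewards it accumulates.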

Significant research has been conducted, but most of it has focused on navigating around static obstacles. Using reinforcement learning to avoid moving obstacles is possible but more challenging, as the drone has to predict the future movement of each obstacle and adjust its trajectory accordingly. It is therefore essential to extend reinforcement learning to collisions with dynamic obstacles. One way is to train the drone in a simulation environment across different scenarios containing dynamic obstacles. By doing so, the drone learns to deal more robustly with a wide range of obstacles and environments.
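Training across varied scenarios with dynamic obstacles can be sketched as a scenario sampler that draws a new obstacle layout for each episode. The class `MovingObstacle`, the function `sample_scenario`, the constant-velocity motion model, and the arena size and speed limits are all hypothetical and serve only to illustrate the idea.

```python
import random
from dataclasses import dataclass


@dataclass
class MovingObstacle:
    """Hypothetical obstacle moving at constant velocity in a 2-D arena."""
    x: float
    y: float
    vx: float
    vy: float

    def position_at(self, t):
        """Position after t seconds under the constant-velocity assumption."""
        return (self.x + self.vx * t, self.y + self.vy * t)


def sample_scenario(rng, n_static=3, n_moving=2, arena=20.0, max_speed=1.5):
    """Draw one training scenario: static obstacles have zero velocity,
    moving obstacles get a random velocity within the assumed limit."""
    obstacles = []
    for _ in range(n_static):
        obstacles.append(MovingObstacle(
            rng.uniform(0, arena), rng.uniform(0, arena), 0.0, 0.0))
    for _ in range(n_moving):
        obstacles.append(MovingObstacle(
            rng.uniform(0, arena), rng.uniform(0, arena),
            rng.uniform(-max_speed, max_speed),
            rng.uniform(-max_speed, max_speed)))
    return obstacles


rng = random.Random(0)  # fixed seed so that scenarios are reproducible
scenario = sample_scenario(rng)
print(len(scenario))  # 5
```

Calling `sample_scenario` at the start of every episode exposes the agent to many different layouts instead of a single fixed one.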

4. Conclusion

This study applies reinforcement learning algorithms to train a drone to avoid obstacles autonomously in discrete and continuous action spaces based solely on image data. Its novelty lies in a comprehensive assessment of the advantages, limitations, and future research directions of obstacle detection and avoidance for drones using different reinforcement learning techniques. The study compares three reinforcement learning strategies, namely Deep Q-Networks (DQN), Proximal Policy Optimization (PPO), and Soft Actor-Critic (SAC), for avoiding both stationary and moving obstacles. The standout finding is the superior performance of SAC, which proved effective at navigating complex 3D environments with dynamic actors. DQN also showed promising results, outperforming PPO owing to its better sample efficiency. A key takeaway is that off-policy algorithms, exemplified by SAC and DQN, were more efficient at collision avoidance than on-policy algorithms such as PPO. Overall, this study advances the understanding of reinforcement learning techniques in the context of drone autonomy, offering a foundation for further research and development in the pursuit of safer and more effective unmanned aerial vehicles.
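One commonly cited reason for the sample efficiency of off-policy methods such as DQN and SAC is that they store past transitions in a replay buffer and reuse them for many updates, whereas on-policy methods such as PPO collect fresh data for each round of updates. The sketch below is a generic replay buffer written for illustration; the class name, the capacity, and the seed are assumptions, and it is not the implementation used in the study.

```python
import random
from collections import deque


class ReplayBuffer:
    """Generic fixed-size replay buffer for off-policy learning."""

    def __init__(self, capacity, seed=0):
        # A deque with maxlen drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Return a random mini-batch of stored transitions, without
        replacement, so old experience can be reused for new updates."""
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


buffer = ReplayBuffer(capacity=3)
for step in range(5):
    # Placeholder transitions; real states would be camera images.
    buffer.add(step, 0, 0.0, step + 1, False)

print(len(buffer))  # 3, because the two oldest transitions were dropped
batch = buffer.sample(2)
```

Each stored transition can appear in many mini-batches over the course of training, which is what allows the same simulated flight data to contribute to several gradient updates.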

References

Mohsan, S.A.H.; Khan, M.A.; Noor, F.; Ullah, I.; Alsharif, M.H. Towards the Unmanned Aerial Vehicles (UAVs): A Comprehensive Review. Drones 2022, 6, 147.
Kan, M.K.; Okamoto, S.; Lee, J.H. Development of drone capable of autonomous flight using GPS. In Proceedings of the International MultiConference of Engineers and Computer Scientists Vol II, Hong Kong, China, 14–16 March 2018.
Aldao, E.; González-deSantos, L.M.; Michinel, H.; González-Jorge, H. UAV Obstacle Avoidance Algorithm to Navigate in Dynamic Building Environments. Drones 2022, 6, 16.
Shin, S.-Y.; Kang, Y.-W.; Kim, Y.-G. Obstacle Avoidance Drone by Deep Reinforcement Learning and Its Racing with Human Pilot. Appl. Sci. 2019, 9, 5571.
Martins, W.M.; Braga, R.G.; Ramos, A.C.B.; Mora-Camino, F. A Computer Vision Based Algorithm for Obstacle Avoidance. In Information Technology—New Generations; Springer International Publishing: New York, NY, USA, 2018; pp. 569–575.
Sarmalkar, K.; Jain, S.; Hangal, S. Vision based Obstacle Avoidance System for Autonomous Aerial Systems, NTASU 2020. IJERT 2021, 9, 730–734. Available online: www.ijert.org (accessed on 10 January 2023).
Guo, J.; Liang, C.; Wang, K.; Sang, B.; Wu, Y. Three-Dimensional Autonomous Obstacle Avoidance Algorithm for UAV Based on Circular Arc Trajectory. Int. J. Aerosp. Eng. 2021, 2021, 1–13.
Al-Kaff, A.; García, F.; Martín, D.; de La Escalera, A.; Armingol, J. Obstacle Detection and Avoidance System Based on Monocular Camera and Size Expansion Algorithm for UAVs. Sensors 2017, 17, 1061.
Aswini, N.; Uma, S.V. Obstacle Detection in Drones Using Computer Vision Algorithm. In Communications in Computer and Information Science; Springer: Berlin/Heidelberg, Germany, 2019; Volume 968, pp. 104–114.
Mori, T.; Scherer, S. First results in detecting and avoiding frontal obstacles from a monocular camera for micro unmanned aerial vehicles. In Proceedings of the 2013 IEEE International Conference on Robotics and Automation, Karlsruhe, Germany, 6–10 May 2013; pp. 1750–1757.
Mannar, S.; Thummalapeta, M.; Saksena, S.K.; Omkar, S. Vision-based Control for Aerial Obstacle Avoidance in Forest Environments. IFAC-PapersOnLine 2018, 51, 480–485.
Zhai, X.; Liu, K.; Nash, W.; Castineira, D. Smart Autopilot Drone System for Surface Surveillance and Anomaly Detection via Customizable Deep Neural Network. In Proceedings of the International Petroleum Technology Conference, Dhahran, Saudi Arabia, 13–15 January 2020.
Fang, R.; Cai, C. Computer vision based obstacle detection and target tracking for autonomous vehicles. MATEC Web Conf. 2021, 336, 07004.
Yang, S.; Meng, Z.; Chen, X.; Xie, R. Real-time obstacle avoidance with deep reinforcement learning Three-Dimensional Autonomous Obstacle Avoidance for UAV. In Proceedings of the 2019 International Conference on Robotics, Intelligent Control and Artificial Intelligence—RICAI 2019, Shanghai, China, 20–22 September 2019; pp. 324–329.
Roghair, J.; Niaraki, A.; Ko, K.; Jannesari, A. A Vision Based Deep Reinforcement Learning Algorithm for UAV Obstacle Avoidance. In Intelligent Systems and Applications; Springer: Berlin/Heidelberg, Germany, 2022; Volume 294, pp. 115–128.
Rubí, B.; Morcego, B.; Pérez, R. Quadrotor Path Following and Reactive Obstacle Avoidance with Deep Reinforcement Learning. J. Intell. Robot Syst. 2021, 103, 62.
Xue, Z.; Gonsalves, T. Vision Based Drone Obstacle Avoidance by Deep Reinforcement Learning. AI 2021, 2, 366–380.

© Text is available under the terms and conditions of the Creative Commons Attribution (CC BY) license; additional terms may apply.

Information

Subjects: Computer Science, Artificial Intelligence

Contributors:

Amudhini P. Kalidas

Christy Jackson Joshua

Update Date: 13 Dec 2023
