Due to their high flexibility, quadrotor unmanned aerial vehicles (QUAVs) have gained significant popularity in a variety of applications, including parcel delivery [1], precision agriculture [2][3][4], search and rescue [5][6], and surveillance [7]. In these scenarios, the QUAV is typically required to navigate autonomously to a target position. In dense environments, however, unexpected obstacles can obstruct the path and lead to collisions, and the task becomes even more challenging in a multi-agent system [8][9]. Obstacle avoidance is therefore crucial for safe path planning: the QUAV must find an unobstructed path to the target while respecting its physical limitations. Conventional approaches to obstacle-free path planning include Dijkstra's algorithm, the probabilistic roadmap (PRM), rapidly-exploring random trees (RRT), and the artificial potential field (APF) [10][11][12][13]. Since these approaches make no assumptions about the agent's physical realization, they are applicable to QUAV navigation. APF, however, faces a critical challenge in dense environments: when multiple obstacles narrow the passageway to the target, the repulsive forces they generate conflict with the attractive force toward the target, causing the agent to oscillate and making it difficult to reach the target smoothly.
2. Reducing Oscillations for Obstacle Avoidance in a Dense Environment Using Deep Reinforcement Learning and Time-Derivative of an Artificial Potential Field
Three major approaches to obstacle-free path planning can be identified. Since they place no restrictions on the type of agent being used, all three are applicable to QUAV obstacle avoidance. The first approach is grid search, exemplified by algorithms such as Dijkstra, A*, and D* [10][15][16]. These algorithms are guaranteed to find a viable least-cost path when the environment is finite. However, they suffer from the curse of dimensionality, as the number of vertices to be explored grows exponentially with the dimensionality of the workspace; the need for higher control resolutions further increases the computational burden. The second approach is sampling-based search, with PRM [11] and RRT [12] as typical examples. For example, Farooq et al. [17] guided a QUAV around dangerous zones in a dynamic environment by computing the PRM. Compared to grid search, these algorithms can quickly produce paths in high-dimensional spaces. However, since sampling-based methods rely on global information, such as environment boundaries, they are not well suited to real-time control, where sensor data are used to generate the action for the next step. The third approach is APF, which assumes that both the target and the obstacles generate potential fields that influence the movement of the agent. By following the negative gradient of the summed potential field, the agent is attracted to the target while being repelled from nearby obstacles. For example, Ma’arif et al. [18] used APF to guide a single QUAV to reach a target while avoiding collisions with dynamic obstacles. APF is widely used in QUAV navigation owing to its ease of implementation and its ability to guide the QUAV using only local information.
Despite its numerous advantages, APF has certain inherent limitations [19]. One well-known limitation is that the agent can easily become trapped in local minima, where the attractive and repulsive forces cancel each other out. Algorithms designed to address this problem generally fall into three categories. The first is local minimum removal. For example, Kim et al. [20] employed harmonic functions to construct potential fields, allowing the locations of singularities to be selected and local minima in free space to be eliminated. The second is local minimum escape (LME). For instance, Park et al. [21] combined APF with simulated annealing, which introduces randomness into the agent's actions and enables it to escape from local minima. Wang et al. [22] proposed the Left Turning scheme, which effectively handles U-shaped obstacles and helps the agent escape local minima. Lai et al. [23] proposed a dynamic step-size adjustment method to help multi-UAV systems escape local minima. The third is local minimum avoidance. For instance, Doria et al. [24] utilized deterministic annealing to expand the repulsive area and shrink the attractive area, allowing the agent to avoid local minima at the beginning, when the potential function is convex due to a high initial temperature. Additionally, Ge et al. [25] addressed a specific case of the local minimum problem known as goals non-reachable with obstacles nearby (GNRON). They incorporated the relative distance between the agent and the target into the repulsive potential function; as the agent approaches the target, this function approaches zero, reducing the repulsive force in the target's vicinity and overcoming the local minimum problem. Overall, these approaches highlight the efforts made to overcome the limitations of APF and improve its performance in various scenarios.
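The idea behind the GNRON fix can be illustrated with a short sketch: the classical repulsive potential is multiplied by a power of the agent–target distance so that it vanishes at the goal. The function names, gains, and influence radius below are illustrative assumptions rather than the exact formulation in [25]:

```python
def rep_potential_classic(rho, eta=1.0, rho0=2.0):
    # Classical repulsive potential: depends only on the obstacle distance rho
    if rho >= rho0:
        return 0.0
    return 0.5 * eta * (1.0 / rho - 1.0 / rho0) ** 2

def rep_potential_gnron(rho, d_goal, n=2, eta=1.0, rho0=2.0):
    # GNRON-style repulsion: scaled by the agent-target distance d_goal, so the
    # repulsive potential (and hence the force) vanishes as the agent reaches the goal
    return rep_potential_classic(rho, eta, rho0) * d_goal ** n
```

At the goal itself (`d_goal = 0`) the modified potential is exactly zero even with an obstacle well inside the influence radius, which is what removes the repulsion-induced deadlock near the target.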
APF faces another dilemma: oscillations can occur in narrow passages with densely distributed obstacles nearby. Previous research on oscillation reduction is relatively limited and mainly draws inspiration from techniques in optimization theory. For instance, Ren et al. [26] proposed Levenberg's modification of Newton's method as a solution to oscillation problems. This approach incorporates second-order information through the Hessian matrix; in addition, they adjusted the control law to maintain a constant speed, ensuring smooth movement of the agents. Biswas et al. [27] compared first-order gradient descent with two second-order approaches and concluded that Levenberg–Marquardt is more effective at generating smoother trajectories and improving convergence speed.
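A damped (Levenberg-style) Newton step on a potential function can be sketched as follows, using finite differences for the gradient and Hessian; the function names, step sizes, and damping value are illustrative assumptions, not the cited controllers:

```python
import numpy as np

def numerical_grad(U, q, eps=1e-5):
    # Central-difference gradient of the scalar potential U at configuration q
    g = np.zeros_like(q)
    for i in range(q.size):
        dq = np.zeros_like(q)
        dq[i] = eps
        g[i] = (U(q + dq) - U(q - dq)) / (2.0 * eps)
    return g

def numerical_hess(U, q, eps=1e-4):
    # Finite-difference Hessian, built column by column from the gradient
    n = q.size
    H = np.zeros((n, n))
    for i in range(n):
        dq = np.zeros_like(q)
        dq[i] = eps
        H[:, i] = (numerical_grad(U, q + dq) - numerical_grad(U, q - dq)) / (2.0 * eps)
    return H

def levenberg_step(U, q, lam=1.0):
    # Damped Newton step: solve (H + lam*I) dq = -grad. A large lam recovers
    # (scaled) gradient descent; a small lam recovers the pure Newton step.
    g = numerical_grad(U, q)
    H = numerical_hess(U, q)
    return np.linalg.solve(H + lam * np.eye(q.size), -g)
```

The damping term regularizes the Hessian where the potential is ill-conditioned, which is what smooths the step directions compared with raw gradient descent.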
The second branch of oscillation reduction algorithms introduces virtual obstacles or targets. Similar to the local-minimum-escape techniques used for tackling local minima, methods in this branch often employ escape maneuvers once oscillations are detected. For instance, Zhao et al. [28] enhanced a manipulator's predictive ability by incorporating dynamic virtual target points and utilized an extreme-point jump-out function to escape oscillations. Zhang et al. [29] employed a tangent APF to avoid local oscillations and introduced the back-virtual-obstacle-setting-strategy APF algorithm, which enables the agent to return to previous steps and withdraw from concave obstacles. In a rule-based fashion, Zheng et al. [30] specified a condition for adding obstacles, compelling the resultant force to deflect when its angle to the obstacle center is too small. The dynamic step-size adjustment method of Lai et al. [23] can also escape the jitter area around a local minimum, but it does not address oscillations in other situations, such as narrow passageways.
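The detection step that such escape-based methods depend on can be sketched in a few lines: flag oscillation when recent displacement vectors keep reversing direction, then place a virtual obstacle inside the jitter region. The window size, reversal threshold, and centroid placement are illustrative assumptions, not values from the cited works:

```python
import numpy as np

def is_oscillating(traj, window=6, cos_thresh=-0.5):
    # Flag oscillation when recent displacement vectors keep reversing direction
    if len(traj) < window + 1:
        return False
    recent = traj[-(window + 1):]
    steps = [recent[i + 1] - recent[i] for i in range(window)]
    reversals = 0
    for a, b in zip(steps[:-1], steps[1:]):
        denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-12
        if np.dot(a, b) / denom < cos_thresh:
            reversals += 1
    return reversals >= window // 2

def virtual_obstacle(traj, window=6):
    # Place a virtual obstacle at the centroid of the oscillation region, so the
    # added repulsion pushes the agent out of the jitter area
    return np.mean(traj[-(window + 1):], axis=0)
```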
Oscillations can also be mitigated by modifying the repulsive forces. Tran et al. [31] estimated the projection of the repulsive force vector onto the attractive force vector and subtracted this component, preventing the agent from moving in the direction opposite to the attractive force. Martis et al. [32] introduced vortex potential fields to achieve seamless cooperative collision avoidance between mobile agents, validated through Lyapunov stability analysis. For larger obstacles with irregular shapes, Szczepanski [33] combined the benefits of repulsive APF and vortex APF by defining multiple layers around the surface of the obstacles; this approach surpassed traditional APF and pure vortex APF in terms of path smoothness.
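Two of these force modifications can be sketched compactly: removal of the repulsive component that directly opposes attraction (in the spirit of [31]), and a 2-D vortex field obtained by rotating the repulsive force 90 degrees so the agent slides around an obstacle instead of being pushed straight back. Function names and the handedness convention are illustrative assumptions:

```python
import numpy as np

def remove_opposing_component(f_rep, f_att):
    # Subtract the part of the repulsive force that directly opposes attraction,
    # so the agent is never driven backwards along the attractive direction
    att_norm = np.linalg.norm(f_att)
    if att_norm < 1e-12:
        return f_rep
    u = f_att / att_norm
    proj = np.dot(f_rep, u)
    return f_rep - proj * u if proj < 0.0 else f_rep

def vortex_force(f_rep, clockwise=True):
    # 2-D vortex field: rotate the repulsive force 90 degrees so the agent
    # circulates around the obstacle rather than bouncing off it
    fx, fy = f_rep
    return np.array([fy, -fx]) if clockwise else np.array([-fy, fx])
```

Because the vortex force is always tangential, it redirects the agent without fighting the attractive force head-on, which is why vortex-based variants tend to produce smoother paths.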
There are also works designed specifically for QUAV path planning and obstacle avoidance. For example, Meradi et al. [34] proposed a quaternion-based nonlinear model predictive control method for QUAV obstacle avoidance. Valencia et al. [35] constructed a QUAV obstacle detection and avoidance system using a monocular camera. Gageik et al. [36] discussed the use of complementary low-cost sensors for QUAV collision avoidance. However, oscillation reduction in APF-based obstacle-free path planning for QUAVs remains largely under-explored.
With advances in deep reinforcement learning (DRL), motion controllers based on DRL have gained widespread use in QUAV path planning and obstacle avoidance by virtue of their strong fitting capability. In this context, the APF can be seen as an upper layer that directly or indirectly influences the agent's current pursuit position, while the low-level motion control task is handled by the DRL agent. For instance, the RL agent may employ convolutional neural networks to receive information about the surrounding potential energy and generate estimated rewards for various actions [37]. Xing et al. [38] combined an enhanced APF method with the deep deterministic policy gradient algorithm to navigate autonomous underwater vehicles.