2. Multi-Robot Path Planning Approaches
Figure 1 presents the classification of multi-robot path-planning algorithms, and it is divided into three categories: classical approaches, heuristic algorithms, and artificial intelligence (AI)-based approaches. The subcategories are linked to the primary categories and only display the significant subcategories. The classical approaches include the Artificial potential field, sampling-based, and graph-based approaches. The heuristic algorithms mainly consist of A* and D* search algorithms. The AI-based approaches are the most common algorithms for multi-robot systems, and the bio-inspired approaches take most of the attention. Metaheuristic has been applied to most of the research, and the famous algorithms are PSO and GA. From
[26], GA and PSO are the most commonly used approaches.
Figure 1.
Classification of multi-robot path planning approaches.
2.1. Classical Approaches
2.1.1. Artificial Potential Field (APF)
The APF uses its control force for path planning, and the control force sums up the attractive and the repulsive potential field. The illustration of APF is shown in
Figure 2; the blue force indicates the attractive field, and the yellow force represents the repulsive field. The APF establishes path-planning optimization and dynamic particle models, and the additional control force updates the APF for multi-robot formation in a realistic and known environment
[27]. Another APF-based approach is presented for a multi-robot system in a warehouse.
Figure 2.
Illustration of the APF algorithm.
It uses the priority strategy and solves the drawbacks of traffic jams, local minima, collisions, and non-reachable targets
[28]. An innovative APF algorithm is proposed to find all possible paths under a discrete girded environment. It implements a time-efficient deterministic scheme to obtain the initial path and then uses enhanced GA to improve it
[29]. A potential field-based controller in
[30] supports robots to follow the designed path, avoid collision with nearby robots, and distribute the robots stochastically across different paths in topologically distinct classes.
An improved APF is proposed to overcome the traditional APF’s shortcomings, including target unreachable and local minimum in
[31] for real-time performance with dynamic obstacles for realizing local path planning. A collision avoidance strategy and risk assessment are proposed based on the improved APF and the fuzzy inference system for multi-robot path planning under a completely unknown environment
[32].
APF is applied in the approximate cost function in
[33], and integral reinforcement learning is developed for the minimum time-energy strategy in an unknown environment, converting the finite horizon problem with constraints to an infinite horizon optimal control problem. APF is introduced for the reward functions and integrates Deep Deterministic Policy Gradient and Model Predictive Control to address uncertain scenes
[34].
2.1.2. Sampling-Based
The rapidly exploring random tree (RRT) searches high-dimensional and nonconvex space by using a space-filling tree randomly, and the tree is built incrementally from samples to grow towards unreached areas. The sampling-based approach’s outline is demonstrated in
Figure 3, and the generated path is highlighted in green. For a multi-robot centralized approach, multi-robot path-planning RRT performs better in optimizing the solution and exploring search space in an urban environment than push and rotate, push and swap, and the Bibox algorithm
[35]. The discrete-RRT extends the celebrated RRT algorithm in the discrete graph with a speedy exploration of the high-dimensional space of implicit roadmaps
[36].
Figure 3.
Demonstration of the RRT algorithm.
2.2. Heuristic Algorithms
2.2.1. A* Search
The A* search algorithm is one of the most common heuristic algorithms in path planning.
Figure 4 shows the simple example of the gird-based A* algorithm, and the path is highlighted in green. It uses the heuristic cost to determine the optimal path on the map. The relaxed-A* is used to provide an optimal initial path and fast computation, and Bezier-splines are used for continuous path planning to optimize and control the curvature of the path and restrict the acceleration and velocity
[17].
Figure 4.
Simple example of the A* algorithm.
A two-level adaptive variable neighborhood search algorithm is designed to be integrated with the A* search algorithm for the coupled mission planning framework. It models the path planning problem and the integrated sensor allocation to minimize travel costs and maximize the task profit
[37][46]. For the multi-AGV routing problem, the improved A* algorithm plans the global path and uses a dynamic RRT algorithm to obtain a passable local path with kinematic constraints, avoiding collisions in the grid map
[38][47].
Additionally, Ref.
[39][48] utilized the A* algorithm for the predicted path and generated a flyable path by cubic B-spline in real-time for guidance with triple-stage prediction. With the computational efficiency of cluster algorithms and A*, the proposed planning strategy supports online implementation. An optimal multi-robot path-planning approach is proposed with EA* algorithm with assignment techniques and fault-detection algorithm for the unknown environment based on the circle partitioning concept in
[40][49]. A proposed navigation system integrates a modified A* algorithm, auction algorithm, and insertion heuristics to calculate the paths for multiple responders. It supports connection with a geo-database, information collection, path generation in dynamic environments, and spatio-temporal data analysis
[41][50].
The D* algorithm is employed for multi-robot symbiotic navigation in a knowledge-sharing mechanism with sensors
[8]. It allows robots to inform other robots about environmental changes, such as new static obstacles and path blockage, and it can be extended for real-time mobile applications. Additionally, D* Lite is applied with artificial untraversable vertex to avoid deadlocks and collisions for real-time robot applications, and D* Lite has fast re-planning abilities
[9].
A cloud approach is developed with D* Lite and multi-criteria decision marking to offer powerful processing capabilities and shift computation load to the cloud from robots in the multi-robot system with a high level of autonomy
[42][51]. An integrated framework is proposed based on D* Lite, A*, and uniform cost search, and it is used for multi-robot dynamic path-planning algorithms with concurrent and real-time movement
[43][52].
2.2.2. Other Heuristic Algorithms
A constructive heuristic approach is presented to perceive multiple regions of interest. It aims to find the robot’s path with minimal cost and cover target regions with heterogeneous multi-robot settings
[6]. Conflict-Based Search is proposed for multi-agent path planning problems in the train routing problem for scheduling multiple vehicles and setting paths in
[44][53]. For multi-robot transportation, a primal-dual-based heuristic is designed to solve the path planning problem as the multiple heterogeneous asymmetric Hamiltonian path problem, solving in a short time
[45][54]. The linear temporal logic formula is applied to solve the multi-robot path planning by satisfying a high-level mission specification with Dijkstra’s algorithm in
[46][55].
A modified Dijkstra’s algorithm is introduced for robot global path planning without intersections, using a quasi-Newton interior point solver to smooth local paths in tight spaces
[47][56]. Moreover, cognitive adaptive optimization is developed with transformed optimization criteria for adaptively offering the accurate approximation of paths in the proposed real-time reactive system; it takes into account the unknown operation area and nonlinear characteristics of sensors
[18].
The Grid Blocking Degree (GBD) is integrated with priority rules for multi-AGV path planning, and it can generate a conflict-free path for AGV to handle tasks and update the path based on real-time traffic congestion to overcome the problems caused by most multi-AGV path planning is offline scheduling
[48][57]. Heuristic algorithms, minimization techniques, and linear sum assignment are used in
[49][58] for multi-UAV coverage path and task planning with RGB and thermal cameras.
[50][59] designed the extended Angular-Rate-Constrained-Theta* for a multi-agent path-planning approach to maintaining the formation in a leader–follower formation.
Figure 5 displays the overview of the mentioned heuristic algorithms. The heuristic algorithms are widely used in path planning, and the heuristic cost functions are developed to evaluate the paths. The algorithms can provide the complete path in a grid-like map. However, for the requirement of flexibility and robustness, bio-inspired algorithms are proposed.
Figure 5.
Search algorithms.
2.3. Bio-Inspired Techniques
2.3.1. Particle Swarm Optimization (PSO)
PSO is one of the most common metaheuristic algorithms in multi-robot path planning problems and formation. The flowchart of PSO is shown in
Figure 6. It is a stochastic optimization algorithm based on the social behavior of animals, and it obtains global and local search abilities by maintaining a balance between exploitation and exploration
[51][60]. Ref.
[52][61] presents an interval multi-objective PSO using an ingenious interval update law for updating the global best position and the crowding distance of risk degree interval for the particle’s local best position. PSO is employed for multiple vehicle path planning to minimize the mission time, and the path planning problem is formulated as a multi-constrained optimization problem
[53][62], while the approach has low scalability and execution ability.
Figure 6.
Flowchart of the PSO algorithm.
An improved PSO is developed with differentially perturbed velocity, focusing on minimizing the maximum path length and arrival time with a multi-objective optimization problem
[54][63]. The time stamp segmentation model handles the coordination cost. Improved PSO is combined with modified symbiotic organisms searching for multi-UAV path planning, using a B-spline curve to smooth the path in
[55][64]. For a non-stationary environment, improved PSO and invasive weed optimization are hybrids for planning a path for each robot in the multi-robot system, balancing diversification and intensification, and avoiding local minima
[56][65].
PSO is adapted for a leader–follower strategy in multi-UAV path planning with obstacle avoidance
[51][60]. A distributed cooperative PSO is proposed for obtaining a safe and flyable path for a multi-UAV system, and it is combined with an elite keeping strategy and the Pythagorean hodograph curve to satisfy the kinematic constraints in
[57][66]. The enhanced PSO is improved by greedy strategy and democratic rule in human society inspired by sine and cosine algorithms. The projected algorithm can generate a deadlock-free path with preserving a balance between intensification and diversification
[58][67].
For the multi-robot path planning issue, a coevolution-based PSO is proposed to adjust the local and goal search abilities and solve the stagnation problem of PSO with evolutionary game theory in
[59][68]. An improved gravitational search algorithm is integrated with the improved PSO for a new methodology for multi-robot path planning in the clutter environment, and it updates the particle positions and gravitational search algorithm acceleration with PSO velocity simultaneously
[60][69].
A hybrid algorithm of democratic robotics PSO and improved Q-learning is proposed to balance exploitation and exploration, and it is fast and available for a real-time environment. However, it cannot guarantee the completeness of the path, and it is hard to achieve robot cooperation
[61][70]. PSO-based and a B-Spline data frame solver engine is developed for uninterrupted collision-free path planning. It is robust to deal with current disturbances and irregular operations and provides quick obstacle avoidance for real-time implementation
[15].
A wireless sensor network is presented for locating obstacles and robots in a dynamic environment. It combines a jumping mechanism PSO algorithm and a safety gap obstacle avoidance algorithm for multi-robot path planning
[7]. The jumping mechanism PSO estimates the inertia weight based on fitness value and updates the particles. The safety gap obstacle avoidance algorithm focuses on robots struck when avoiding obstacles. Ref.
[62][71] designed the hybrid GA and PSO with fuzzy logic controller for multi-AGV conflict-free path planning with rail-mounted gantry and quay cranes; however, it is inapplicable to real-time scheduling.
2.3.2. Genetic Algorithm (GA)
GA is widely utilized for solving optimization problems as an adaptive search technique, and it is based on a genetic reproduction mechanism and natural selection
[63][72]. The flowchart of GA is indicated in
Figure 7. Ref.
[64][73] uses GA and reinforcement learning techniques for multi-UAV path planning, considers the number of vehicles and a response time, and a heuristic allocation algorithm for ground vehicles. GA solves the Multiple Traveling Sales Person problem with the stop criterion and the cost function of Euclidean distance, and Dubins curves achieve geometric continuity while the proposed algorithm cannot avoid the inter-robot collision or support online implementation
[16].
Figure 7.
Flowchart of the GA algorithm.
A 3D sensing model and a cube-based environment model are involved in describing a complex environment, and non-dominated sorting GA is modified to improve the convergence speed for the Pareto solution by building a voyage cost map by the R-Dijkstra algorithm in
[65][74] as an omnidirectional perception model for multi-robot path planning. Ref.
[66][75] applies the sensors in the area to obtain a minimal cost and solves the traveling salesman, and GA is adapted for persistent cooperative coverage.
Efficient genetic operators are developed to generate valid solutions on a closed metric graph in a reasonable time and are designed for multi-objective GA for multi-agent systems
[67][76]. GA assigns the regions to each robot, sets the visiting orders, and uses simultaneous localization and mapping to create the global map in
[68][77] for coverage path planning. Ref.
[69][78] presents GA to optimize the integration of motion patterns that represent the priority of the neighbor cell and divides the target environment into cell areas, and then uses a double-layer strategy to guarantee complete coverage.
A domain knowledge-based operator is proposed to improve GA by obtaining the elite set of chromosomes, and the proposed algorithm can support robots that have multiple targets
[70][79]. For intelligent production systems, the improved GA is aimed at complicated multi-AGV path planning and maneuvering scheduling decision with time-dependent and time-independent variables. It first addresses AGV resource allocation and transportation tasks, and then solves the transportation scheduling problem
[71][80].
An improved GA was presented with three-exchange crossover heuristic operators than the traditional two-exchange operators, which consider double-path constraints for multi-AGV path planning
[72][81]. Ref.
[63][72] proposed a boundary node method with a GA for finding the shortest collision-free path for 2D multi-robot system and using a path enhancement method to reduce the initial path length. Due to the short computational time, it can be used for real-time navigation, while it can only be implemented in a known environment without dynamic obstacles.
A high degree of GA is employed for optimal path planning under a static environment at offline scheduling, and online scheduling is aimed to solve conflicts between AGVs for the two-stage multi-AGV system
[73][82]. The evolution algorithm is used for planning a real-time path for multi-robot cooperative path planning with a unique chromosome coding method, redefining mutation and crossover operator in
[74][83].
2.3.3. Ant Colony Optimization (ACO)
Ants will move along the paths and avoid the obstacle, marking available paths with pheromone, and the ACO treats the path with higher pheromone as the optimal path. The principle of ACO is demonstrated in
Figure 8, and the path with a higher pheromone is defined as the optimal path marked by green. For collision-free routing and job-shop scheduling problems, an improved ant colony algorithm is enhanced by multi-objective programming for a multi-AGV system
[75][84].
Figure 8.
Changes of the ACO algorithm with different timeslots.
For multi-UGVs, a continuous ACO-based path planner focuses on coordination and path planning. It is integrated with an adaptive waypoint-repair method and a probability-based random-walk strategy to balance exploration and exploitation and improve the algorithm’s performance, resolving the coordination with a velocity-shifting optimization algorithm
[76][85].
K-degree smoothing and the improved ACO are integrated as a coordinated path planning strategy for the multi-UAV control and precise coordination strategy in
[77][86]. Voronoi models the environment by considering various threats, and the improved ACO’s pheromone update method and heuristic information are redefined for path planning, then using a k-degree smoothing method for the path smoothing problem. For precision agriculture and agricultural processes, ACO, Bellman–Held–Karp, Christofides, and Nearest Neighbor based on K-means clustering are used for the optimization path of multi-UAV
[78][87].
2.3.4. Pigeon-Inspired Optimization (PIO)
Pigeon navigation tools inspired PIO, and it uses two operators for evaluating the solutions. Social-class PIO is proposed to improve the performances and convergence capabilities of standard PIO with inspiring by the inherent social-class character of pigeons
[79][88], and it is combined with time stamp segmentation for multi-UAV path planning. Ref.
[80][89] analyzing and comparing the changing trend of fitness value of local and global optimum positions to improve the PIO algorithm as Cauchy mutant PIO method, and the plateau topography and wind field, control constraints of UAVs are modeled for cooperative strategy and better robustness.
2.3.5. Grey Wolf Optimizer (GWO)
GWO is inspired by the hunting behavior and leadership of grey wolves, and it obtain the solutions by searching, encircling, and attacking prey. An improved grey wolf optimizer is employed for the multi-constraint objective optimization model for multi-UAV collaboration under the confrontation environment. It considers fuel consumption, space, and time
[81][90]. The improvements of the grey wolf optimizer are individual position updating, population initialization, and decay factor updating. An improved hybrid grey wolf optimizer is proposed with a whale optimizer algorithm in a leader–follower formation and fuses a dynamic window approach to avoid dynamic obstacles
[82][91].
The leader–follower formation controls the followers to track their virtual robots based on the leader’s position and considers the maximum angular and linear speed of robots. Ref.
[83][92] proposed a hybrid discrete GWO to overcome the weakness of traditional GWO, and it updates the grey wolf position vector to gain solution diversity with faster convergence in discrete domains for multi-UAV path planning, using greedy algorithms and the integer coding to convert between discrete problem space and the grey wolf space.
2.4. Artificial Intelligence
2.4.1. Fuzzy Logic
Fuzzy logic uses the principle of “degree of truth” for computing the solutions. It can be applied for controlling the robot without the mathematical model but cannot predict the stochastic uncertainty in advance. As a result, a probabilistic neuro-fuzzy model is proposed with two fuzzy level controllers and an adaptive neuro-fuzzy inference system for multi-robot path planning and eliminating the stochastic uncertainties with leader–follower coordination
[84][104]. The fuzzy C-means or the K-means methods filter and sort the camera location points, then use A* as a path optimization process for the multi-UAV traveling salesman problem in
[5].
For collision avoidance and autonomous mobile robot navigation, Fuzzy-wind-driven optimization and a singleton type-1 fuzzy logic system controller are hybrids in the unknown environment in
[85][105]. The wind-driven optimization algorithm optimizes the function parameters for the fuzzy controller, and the controller controls the motion velocity of the robot by sensory data interpretation. Ref.
[86][106] proposed a reverse auction-based method and a fuzzy-based optimum path planning for multi-robot task allocation with the lowest path cost.
2.4.2. Machine Learning
Machine learning simulates the learning behavior to obtain the solutions. It is used for path planning, embracing mobile computing, hyperspectral sensing, and rapid telecommunication for the rapid agent-based robust system
[87][107]. Kernel smooth techniques, reinforcement learning, and the neural network are integrated for greedy actions for multi-agent path planning in an unknown environment
[10] to overcome the shortcomings of traditional reinforcement learning, such as high time consumption, slow learning speed, and disabilities of learning in an unknown environment.
The self-organizing neural network has self-learning abilities and competitive characteristics for the multi-robot system’s path planning and task assignment. Ref.
[88][108] combined it with Glasius Bio-inspired neural network for obstacle avoidance and speed jump while the environment changes have not been considered in this approach. The biological-inspired self-organizing map is combined with a velocity synthesis algorithm for multi-robot path planning and task assignment. The self-organizing neural network supports a set of robots to reach multiple target locations and avoid obstacles autonomously for each robot with updating weights of the winner by the neurodynamic model
[89][109].
Convolution Neural networks analyze image information to find the exact situation in the environment, and Deep q learning achieves robot navigation in a noble multi-robot path-planning algorithm
[90][110]. This algorithm learns the mutual influence of robots to compensate for the drawback of conventional path-planning algorithms. In an unknown environment, a bio-inspired neural network is developed with the negotiation method, and each neuron has a one-to-one correspondence with the position of the grid map
[91][111]. A biologically inspired neural network map is presented for task assignment and path planning, and it is used to calculate the activity values of robots in the maps of each target and select the winner with the highest activity value, then perform path planning
[92][112]. The simple neural network diagram is exhibited in the following
Figure 9.
Figure 9.
Diagram of a three-layer neural network.
Moreover, a multi-agent path-planning algorithm based on deep reinforcement learning is proposed, providing high efficiency
[93][113]. Another multi-agent reinforcement learning is developed in
[94][114], and it constructs a node network and establishes an integer programming model to extract the shortest path. The improved Q-learning plans the collision-free path for a single robot in a static environment and then uses the algorithm to achieve collision-free motion among robots based on prior knowledge in
[95][115]. The reinforcement learning framework is applied to optimize the quality of service and path planning, describe the users’ requirements, and consider geometric distance and risk by reinforcement learning reward matrix with a sigmoid-like function
[96][116].
An attention neural network is used for generating the multimachine collaborative path planning as attention reinforcement learning, and it can meet high real-time requirements
[97][117]. A deep Q-network is implemented with a Q-learning algorithm in a deep reinforcement learning algorithm for a productive neural network to handle multi-robot path planning with faster convergence
[98][118]. The meta-reinforcement learning is designed based on transfer learning in
[99][119], and it improves proximal policy optimization by covariance matrix adaptation evolutionary strategies to avoid static and dynamic obstacles.
Multi-agent reinforcement learning is improved by an iterative single-head attention mechanism for multi-UAV path planning, and it calculates robot interactions for each UAV’s control decision-making
[100][120]. Fuzzy reinforcement learning is proposed for the continuous-time path-planning algorithm, combining a modified Wolf-PH and fuzzy Q-iteration algorithm for cooperative tasks
[101][121].