The traditional method of finding missing people deploys fixed cameras at hotspots to capture images and relies on human operators to identify targets in those images. However, this approach incurs high costs in deploying enough cameras to avoid blind spots, and a great deal of time and human effort is spent identifying possible targets. Further, most AI-based search systems focus on improving the human body recognition model, without considering how to speed up the search in order to shorten the search time and improve the search efficiency. As the technology of the unmanned aerial vehicle (UAV) has seen significant progress, a number of applications have been proposed for it due to its unique characteristics, such as high mobility and flexible integration with different equipment, including sensors and cameras.
1. Introduction
As the technology of the unmanned aerial vehicle (UAV) has seen significant progress in recent years, a number of applications have been proposed for it ^{[1]}^{[2]} due to its unique characteristics, such as high mobility and flexible integration with different equipment, including sensors and cameras ^{[3]}^{[4]}. The researchers of ^{[5]} explore algorithms for the formation movement of UAV swarms, with the objective of facilitating simultaneous adjustments to the formation shape while the UAV swarm is in motion. Signal transmission is another highly significant topic in UAV control; the research in ^{[6]} proposes automatic modulation classification utilizing deep learning in this context. The study in ^{[7]} addresses improvements to existing GNSS systems, such as GPS positioning, tackling issues related to inaccuracies. The researchers propose a time-differenced carrier phase (TDCP) derivation-controlled GNSS/IMU integration scheme to successfully acquire vehicle information such as the relative position and heading. Real-world tests demonstrate that this method exhibits higher accuracy compared to traditional algorithms. In recent years, the increasing integration of UAVs with various interdisciplinary domains has also been observed. Koopman operators are mathematical tools used to describe the evolution of nonlinear dynamic systems. The work of ^{[8]} proposes robust tube-based model predictive control with Koopman operators, while ^{[9]} integrates Koopman operators with the control of UAVs. Furthermore, there exist various UAV path planning problems and related studies, such as the capacitated arc routing problem (CARP). The objective of CARP is to find the shortest path in a mixed graph with undirected edges and directed arcs, minimizing the distance of the path while considering capacity constraints for objects moving on the graph.
In ^{[10]}, the study introduces a memetic algorithm based on Two_Arch2 (MATA), which simultaneously considers multiple optimization objectives for the path planning problem, including the total cost, makespan, carbon emissions, and load utilization rate.
Recently, UAVs have been used for search and rescue (SAR) missions to find missing persons at the scene of a natural disaster or when an emergency event occurs ^{[11]}^{[12]}^{[13]}. The issue of missing persons is a challenging societal problem, particularly when involving minors. Children, due to their smaller stature, are susceptible to disappearance within large crowds, especially in crowded places such as amusement parks, making it difficult to notice their absence. Unfortunately, they generally exhibit a lower level of vigilance towards unfamiliar individuals, rendering them vulnerable to abduction. As the duration of a missing person’s search is prolonged, the probability of encountering a perilous situation escalates, imposing significant psychological distress upon parents.
However, there is a limited amount of research aimed at identifying specific individuals, such as missing persons, and researchers have primarily relied on fixed cameras installed in specific areas. This limitation prevents the continuous tracking of targets, leading to difficulties in inferring their actual positions due to the limited perspective and potential blind spots. Furthermore, most of the existing works on search and rescue adopt unmanned aerial vehicles (UAVs) ^{[14]}^{[15]}^{[16]} and employ indiscriminate search algorithms, without prioritizing the areas where the search target may be located, resulting in inefficient search operations and excessive UAV power consumption.
2. Traditional Unmanned Aerial Vehicle Path Planning Methods for Search and Rescue Operations
Several search and rescue methods have been proposed recently ^{[14]}^{[15]}^{[16]}. In ^{[14]}, the sweep line search method conducts a thorough search from left to right, as illustrated in Figure 1. Meanwhile, ref. ^{[15]} introduces the spiral search, which navigates the designated search area in a spiral pattern, as depicted in Figure 2. Both methods are uncomplicated, with time complexity linear in the size of the search area. Differing from these two methods, refs. ^{[16]}^{[17]} introduce block-based methods. These approaches offer the advantage of categorizing the whole search area into blocks with and without search targets. Figure 3 demonstrates the relationship between the UAV's perspective and its altitude concerning the search blocks when the whole search area is partitioned ^{[17]}. Through the traveling salesman problem (TSP) ^{[18]} approach, the shortest path visiting only the blocks containing search targets can be computed, provided those blocks have been recognized in advance. However, the four methods mentioned above do not prioritize the block searching sequence in proximity to the search target, which results in inadequate search efficiency. Therefore, taking inspiration from block-based approaches, this research assigns a priority to every block based on the likelihood that it contains potential targets, which are automatically recognized in real time using the YOLOv5 model. In contrast to ^{[16]}, which primarily focuses on finding the shortest path, this research emphasizes improving the search efficiency to yield the shortest search time by searching the block with the highest priority first.
Figure 1. Sweep line search.
Figure 2. Spiral search.
Figure 3. The relationship between the altitude of the UAV and the partitioned search area.
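As a minimal illustration of this idea (a sketch under assumed inputs, not the algorithm from the cited works), the following Python fragment orders search blocks by a priority score, such as a per-block detector confidence, so that the most promising block is visited first:

```python
def prioritized_search_order(blocks):
    """Order block coordinates so that the block most likely to contain
    the target (highest priority score) is searched first.

    blocks: dict mapping (row, col) -> priority score; higher scores
    mean the block should be visited sooner.
    """
    return sorted(blocks, key=lambda cell: blocks[cell], reverse=True)

# Illustrative 2x2 grid; in practice the scores could come from
# real-time detections rather than fixed constants.
grid = {(0, 0): 0.1, (0, 1): 0.9, (1, 0): 0.4, (1, 1): 0.2}
order = prioritized_search_order(grid)
# A sweep-line search would instead visit (0,0), (0,1), (1,0), (1,1)
# regardless of where the target is likely to be.
```

The contrast with the sweep line and spiral methods is that the visiting order here adapts to the evidence gathered so far, rather than following a fixed geometric pattern.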
3. Search Target Recognition Techniques
3.1. Color Space Exchange
The RGB color space is the most widely used color space, where RGB denotes red, green, and blue. It mirrors the well-known concept of the primary colors of light, where mixing these colors yields various levels of brightness and chromaticity. However, the RGB color space depends strongly on the lighting conditions, meaning that the perceived color of an object can change with variations in brightness. In addition, the three elements of the RGB color space are highly correlated, so a change in one element alters the perceived color. Therefore, using the RGB color space for the color extraction of objects is not ideal ^{[19]}. In contrast, the HSV color space ^{[20]} is more intuitive and easily understood. It separates the brightness value (V) from the color chrominance, which is further divided into hue (H) and saturation (S). Because these elements have a relatively weak correlation with each other, the HSV color space is highly suitable for feature color extraction and easy to control. In applications involving color recognition, we can convert detected images from the RGB color space to the HSV color space with Equation (1).
$$H={\mathrm{cos}}^{-1}\left(\frac{\frac{1}{2}\left[\left(R-G\right)+\left(R-B\right)\right]}{\sqrt{{\left(R-G\right)}^{2}+\left(R-B\right)\left(G-B\right)}}\right)\phantom{\rule{0ex}{0ex}}S=1-\frac{3\left[\mathrm{min}\left(R,G,B\right)\right]}{R+G+B}\phantom{\rule{0ex}{0ex}}V=\frac{\mathrm{max}\left(R,G,B\right)}{255}$$
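As an illustrative sketch of Equation (1), assuming 8-bit RGB inputs (this is not code from the cited works), the conversion can be written directly in Python. Since the arccos form only yields hues in [0°, 180°], the hue is reflected to 360° − H when B > G:

```python
import math

def rgb_to_hsv(r, g, b):
    """Convert 8-bit RGB to (H, S, V) per Equation (1).

    H is in degrees [0, 360), S and V are in [0, 1].
    """
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    h = math.degrees(math.acos(num / den)) if den != 0 else 0.0
    if b > g:                       # arccos covers only 0..180 degrees
        h = 360.0 - h
    s = 1 - 3 * min(r, g, b) / (r + g + b) if (r + g + b) else 0.0
    v = max(r, g, b) / 255
    return h, s, v
```

For example, pure red (255, 0, 0) maps to H = 0°, S = 1, V = 1, and pure blue (0, 0, 255) to H = 240° after the reflection step.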
3.2. Extracting Feature Colors of Image
The feature color extraction process in ^{[20]} involves first segmenting the elements of an image’s HSV color space, followed by the conversion of each element (H, S, V) into a histogram of oriented gradient (HOG). Since the HOG divides each element into several element intervals, the segmentation proportions for each element can be determined. Then, selecting the interval with the highest proportion for each element, we can obtain their respective numerical values (H, S, V). These values represent the HSV feature colors for the image.
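A simplified sketch of this idea follows, using a plain per-channel histogram in place of the HOG-style binning described in ^{[20]}; the bin counts and value ranges here are assumptions for illustration only:

```python
import numpy as np

def extract_feature_color(hsv_image, bins=(18, 10, 10)):
    """Pick a dominant (H, S, V) feature color by histogramming each
    channel separately and taking the centre of the tallest bin.

    hsv_image: float array of shape (height, width, 3), with
               H in [0, 360) and S, V in [0, 1].
    """
    ranges = [(0.0, 360.0), (0.0, 1.0), (0.0, 1.0)]
    feature = []
    for ch in range(3):
        counts, edges = np.histogram(hsv_image[..., ch],
                                     bins=bins[ch], range=ranges[ch])
        peak = int(np.argmax(counts))           # interval with highest proportion
        feature.append((edges[peak] + edges[peak + 1]) / 2)  # bin centre
    return tuple(feature)
```

For a uniformly colored region the result is simply the bin containing that color; for a clothing crop it approximates the garment's dominant hue, saturation, and brightness.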
3.3. Transformation of Color Space
After experimentation, it has been observed that certain issues arise when directly calculating color distances in the HSV color space. Specifically, when the saturation (S) is low, the k-nearest neighbors (KNN) ^{[21]} decision is often mistakenly classified as gray, regardless of how the hue (H) changes. To address this, the extracted feature colors in HSV are transformed into the RGB color space using Equation (2) ^{[20]}. This transformation maps the hue (h) to a sector index h_{i} and calculates the variables p, q, t from the hue sector to determine which combination of the attributes (p, q, t, v) yields the RGB values. The calculated RGB values (r_{0}, g_{0}, b_{0}) are then subjected to Euclidean distance computation ^{[22]} against pre-established RGB color table values (r_{1}, g_{1}, b_{1}) to determine the color distance (d), as illustrated in Equation (3). Subsequently, the KNN algorithm is employed to identify the color of the clothing based on this computed distance.
$$\begin{array}{l}{h}_{i}=\left\lfloor \frac{h}{60}\right\rfloor \\ f=\frac{h}{60}-{h}_{i}\\ p=v\times \left(1-s\right)\\ q=v\times \left(1-f\times s\right)\\ t=v\times \left(1-\left(1-f\right)\times s\right)\end{array}\phantom{\rule{0ex}{0ex}}\left(r,g,b\right)=\left\{\begin{array}{ll}\left(v,t,p\right),&\mathrm{if}\text{ }{h}_{i}=0\\ \left(q,v,p\right),&\mathrm{if}\text{ }{h}_{i}=1\\ \left(p,v,t\right),&\mathrm{if}\text{ }{h}_{i}=2\\ \left(p,q,v\right),&\mathrm{if}\text{ }{h}_{i}=3\\ \left(t,p,v\right),&\mathrm{if}\text{ }{h}_{i}=4\\ \left(v,p,q\right),&\mathrm{if}\text{ }{h}_{i}=5\end{array}\right.$$
$$d=\sqrt{{\left({r}_{1}-{r}_{0}\right)}^{2}+{\left({g}_{1}-{g}_{0}\right)}^{2}+{\left({b}_{1}-{b}_{0}\right)}^{2}},$$
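A minimal Python sketch of Equations (2) and (3), assuming h is given in degrees and s, v in [0, 1], might look as follows (an illustration, not the original implementation):

```python
import math

def hsv_to_rgb(h, s, v):
    """Equation (2): map (h, s, v) back to an RGB triple in [0, 1]."""
    hi = int(h // 60) % 6            # hue sector index h_i
    f = h / 60 - int(h // 60)        # fractional position within the sector
    p = v * (1 - s)
    q = v * (1 - f * s)
    t = v * (1 - (1 - f) * s)
    # One RGB combination of (p, q, t, v) per hue sector.
    return [(v, t, p), (q, v, p), (p, v, t),
            (p, q, v), (t, p, v), (v, p, q)][hi]

def color_distance(c0, c1):
    """Equation (3): Euclidean distance between two RGB triples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c0, c1)))
```

For instance, a fully saturated hue of 240° converts to (0, 0, 1), i.e. pure blue, and the distance between black and white is √3 in this normalized range.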
3.4. KNearest Neighbors (KNN) Color Classification
K-nearest neighbors (KNN) ^{[21]} is a fundamental classification and regression algorithm. After obtaining the HSV feature colors of an image and calculating the color distances using Equation (3), these distances are compared against a pre-established RGB color table. The color distances are sorted, the K colors with the smallest distances are selected, and a vote is held among these neighboring colors; the color with the most votes is the final result of the KNN algorithm.
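The voting step can be sketched as follows; the color table entries (names and RGB shades) are hypothetical examples, not the table used in the original work:

```python
from collections import Counter
import math

# Hypothetical reference color table with several shades per color name;
# the values below are illustrative only.
COLOR_TABLE = [
    ("red",  (255, 0, 0)),     ("red",  (200, 30, 30)),
    ("blue", (0, 0, 255)),     ("blue", (30, 30, 200)),
    ("gray", (128, 128, 128)), ("gray", (100, 100, 100)),
]

def knn_color(rgb, k=3):
    """Classify an RGB triple by majority vote among its k nearest
    table entries, using the Euclidean distance of Equation (3)."""
    nearest = sorted(COLOR_TABLE,
                     key=lambda entry: math.dist(rgb, entry[1]))[:k]
    votes = Counter(name for name, _ in nearest)
    return votes.most_common(1)[0][0]
```

With several shades per color name, the vote makes the decision robust to a single table entry being unusually close to the query color.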
3.5. UAV Systems for Human Detection
The work in ^{[23]} proposes an approach utilizing an automated human detection system on UAVs to identify human bodies, discussing the hardware configuration of UAVs and realtime human recognition capabilities. Ref. ^{[24]} presents a comprehensive human activity recognition algorithm, where the UAV first identifies whether the object is a person and subsequently recognizes various human activities, such as throwing, walking, and digging. Additionally, the study introduces various image stabilization techniques. The research of ^{[15]} focuses on achieving human body recognition using a CNN. Due to the difficulty in acquiring datasets, data augmentation is employed to enhance the training outcomes. The study compares the training outcomes using various architectures and outlines the algorithm’s path planning as a spiral search. The focus of the study in ^{[25]} lies in the application of UAVs for commercial transportation, aiming to achieve successful human body recognition using UAVs. The research encompasses the design of five distinct scenarios, revealing that the distance variation between the UAV and the human body has a more significant impact on the recognition success compared to the quality of the camera. In the context of search and rescue operations for swimmers, ref. ^{[26]} proposes a methodology that integrates global navigation satellite system (GNSS) techniques with computer vision algorithms to locate individuals in distress. Refs. ^{[27]}^{[28]} primarily focus on the training of human detection models. Ref. ^{[27]} introduces a modified YOLOv8 architecture by incorporating the SC3T module into the final layer and training the model using images captured from a UAV perspective. The emphasis of the study lies in the recognition performance. The experimental results are evaluated using confusion matrices and the mean average precision. 
The findings reveal that, across the precision rate, recall rate, and mAP, the modified YOLOv8 outperforms both the original YOLOv5 and YOLOv8 models. Ref. ^{[28]} primarily utilizes YOLOv5 for human detection and further employs a Haar cascade classifier to identify specific body parts (head, upper body, lower body). The final results indicate that YOLOv5 achieves 98% average precision (AP), while the Haar cascade classifier attains approximately 78% AP. Table 1 presents a comparison of related studies on human detection using UAVs. It can be seen that most of the related methods focus on improving the human body recognition model, without considering how to speed up the search in order to shorten the search time and improve the search efficiency.
Table 1. Comparison of related studies of UAV human detection.

| Study | Human Body Recognition Model | Dataset Used | Recognition of Human Clothing Types and Colors | Segmentation of the Search Area | Dynamic Route Planning for Search | Integration of Human Body and Clothing/Pant Color Recognition with Dynamic Route Planning |
|---|---|---|---|---|---|---|
| ^{[23]} | Motion detection outputs a score of human confidence | No | No | No | No | No |
| ^{[24]} | CNN | UCF-ARG dataset | No, proposes human activity classification algorithm | No | No | No |
| ^{[15]} | CNN | Self-developed captured dataset | No | No | No, spiral search | No |
| ^{[25]} | DNN with MobileNet V2 SSDLite | COCO dataset | No | No | Yes, estimates the person and moves in his direction with GPS | |
| ^{[26]} | CNN with Tiny YOLOv3 | COCO dataset + self-developed swimmers dataset | No | No | No | No |
| ^{[27]} | CNN with modified YOLOv8 | Self-developed UAV-view real-world dataset | No | No | No | No |
| ^{[28]} | CNN with YOLOv5 and Haar cascade classifier | VisDrone dataset + COCO128 dataset | No, proposes a human body region classification algorithm | No | No | No |
| HWF | CNN with YOLOv5 | VisDrone dataset + self-developed drone-clothing dataset | Yes, uses KNN color recognition | Yes | Yes, proposes the hierarchical human-weight-first (HWF) path planning algorithm | Yes, proposes the integrated YOLOv5 and HWF framework |