Object Detection for Small Water Floater

Object Detection for Small Water Floater: Comparison

Please note this is a comparison between Version 1 by Fuxun Chen Chen and Version 3 by Peter Tang.

Object detection is one of the most widely used applications in UAV missions. Detection of small objects in unmanned aerial vehicle (UAV) images remains a persistent challenge due to the limited pixel values and interference from background noise.

small objects
object detection
improved YOLOv5
water surface floaters
UAV images

1. Introduction

Water is the source of life, and oceans and rivers cover about 71% of the Earth’s surface [1]. Rivers constitute a vital component of the global water cycle and serve a crucial function in facilitating the transfer [2]. However, recent global economic expansion and urbanization have caused significant damage to the natural environment [3] and severe water pollution [4]. Real-time monitoring of water environments has therefore become crucial as water quality continues to deteriorate. At present, images of the water’s surface are prone to overexposure or dim lighting, depending on the intensity of sunlight, which reduces the information available about target features and makes recognition difficult.

Object detection is one of the most widely used applications in UAV missions. Due to the angle and flight height of UAV photography, small objects account for a larger proportion of UAV images than of general scenes [5]. Small object detection has been one of the key difficulties in the field of object detection [6], owing to problems such as invalid image feature information and blurred object features. Many scholars have proposed solutions to this problem. Xia et al. proposed an automated driving system (ADS) data acquisition and analytics platform for vehicle trajectory extraction, reconstruction, and evaluation. In addition to collecting various sensor data, the platform can also use deep learning to detect small targets, such as vehicle trajectories [7]. Liu et al. proposed a model for tassel detection in maize, named YOLOv5-tassel. The authors trained the YOLOv5-tassel model on UAV remote sensing images and achieved an mAP value of 44.7%, which is an improvement over FCOS, RetinaNet, and YOLOv5 in detecting small tassels in maize [8]. Liu et al. proposed multibranch parallel feature pyramid networks (MPFPN) to detect small objects in UAV images. The MPFPN model applies the SSAM attention module [9] to attenuate the effect of background noise and uses a cascade architecture in the Fast R-CNN stage to achieve more powerful localization [10]. Chen et al. proposed small object detection networks based on a classification-oriented super-resolution generative adversarial network (CSRGAN), a model that adds classification branches and introduces classification losses into a typical SRGAN [11]. The experimental results demonstrate that CSRGAN outperforms VGG16 in classification [12]. In this study, UAV technology and deep learning are applied together to small object detection.
However, images of the water’s surface are prone to overexposure or dim lighting, depending on the intensity of sunlight, which reduces the information available about a water surface floater’s features and makes recognition difficult. Deep learning models also face problems when detecting small water surface floaters: such objects lose some of their features during down-sampling, which limits the extraction of global features and leads to low detection accuracy and poor feature recognition. Therefore, the direct application of the original YOLOv5 to UAV image object detection is not very effective. At the same time, UAVs are recognized as a powerful complement to conventional water environment monitoring and assessment and have gradually been applied to water environment detection. This study therefore proposes an efficient and accurate method based on YOLOv5 for the detection of small water surface floaters in UAV-captured images. The model can effectively locate water surface floaters and thus assist in salvage work.
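The loss of small-object features during down-sampling can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes the default YOLOv5 detection strides of 8, 16, and 32, and a hypothetical floater that is 12 × 12 pixels in the input image. The function names are invented for this example.

```python
# Illustrative only: how many feature-map cells a small object covers
# after down-sampling. Strides 8/16/32 are assumed (the default YOLOv5
# detection strides); the 12x12-pixel object size is a made-up example.

def cells_covered(obj_w_px, obj_h_px, stride):
    """Return the object's footprint (width, height) in feature-map cells."""
    return obj_w_px / stride, obj_h_px / stride


def footprint_table(obj_w_px, obj_h_px, strides=(8, 16, 32)):
    """Map each stride to the object's footprint at that scale."""
    return {s: cells_covered(obj_w_px, obj_h_px, s) for s in strides}


if __name__ == "__main__":
    for stride, (w, h) in footprint_table(12, 12).items():
        print(f"stride {stride:>2}: {w:.2f} x {h:.2f} cells")
```

Under these assumptions the object spans well under one cell at the coarsest scale, which is one way to see why small floaters are hard to recover from deep feature maps.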

2. Object Detection

Computer vision research is mainly concerned with image classification, object detection, object tracking, semantic segmentation, and instance segmentation. Object detection is one of the most fundamental and challenging tasks in the field of computer vision [13]. Exploring efficient real-time object detection models has been a hot research topic in recent years [14]. Traditional object detection methods usually include region proposal, feature extraction, feature fusion, and classifier training, all of which require the laborious manual production of object features and must be completed step by step [15]. Traditional object detection methods therefore suffer from massive redundancy, time-consuming computation, and low accuracy. With the rapid development of deep learning theory, deep learning has made great progress in computer vision tasks such as image classification and object detection [16], and object detection has entered a new phase. In the 2012 ImageNet competition, A. Krizhevsky et al. used a convolutional neural network (CNN) [17] to substantially improve image classification results compared with traditional methods. CNN gives region proposals through a region proposal network (RPN) [18] at a low cost, which can significantly improve the efficiency of object detection. In 2014, R. Girshick et al. [19] used a region convolutional neural network for the first time in the field of object detection, and the detection results were much improved. Compared to traditional methods, this end-to-end network is more popular because it reduces complex steps such as data preprocessing and manual design of object features [20]. Since then, deep learning has developed rapidly in object detection and has been widely used in practice [21].
There are two major categories of mainstream deep learning object detection algorithms. The first is one-stage object detectors, such as the YOLO series, SSD [22], RetinaNet [23], and FCOS [24]. The second is two-stage object detectors, such as R-CNN [25], Fast R-CNN [26], Faster R-CNN [27], and Mask R-CNN [28]. For example, the two-stage object detector Faster R-CNN integrates an RPN (region proposal network) on top of Fast R-CNN. Faster R-CNN mainly includes the shared convolutional layer module, the RPN, and the Fast R-CNN detector. However, because two-stage object detectors generate a large number of candidate boxes, require heavy computation, and detect slowly, they are not suitable for real-time detection tasks. In 2015, J. Redmon et al. proposed a one-stage object detector, You Only Look Once (YOLO) [29], which directly obtains the class and bounding box information of the object and greatly improves the object detection speed. Ma et al. proposed an improved YOLOv3 [30], which has stronger noise immunity and better generalization ability. Deep learning has also been applied successfully to the detection of water surface floaters. van Lieshout et al. constructed a plastic floating debris monitoring dataset using videos collected from five different waterway bridges in Jakarta, Indonesia, and used Faster R-CNN to first detect regions that may contain plastic floating materials and then Inception V2 [31], pre-trained on COCO [32], to determine whether these regions contain plastic [33]. Li et al. acquired ocean images of the South China Sea using a camera mounted on an unmanned ship and used a fusion model based on YOLOv3 and DenseNet [34] to detect sea surface objects [35]. Stofa et al. selected the DenseNet model to detect ships in remotely sensed images and, by fine-tuning the hyperparameters, settled on a batch size of 16 and a learning rate of 0.0001. The model uses the Adam optimizer [36] with optimized parameter settings [37]. In contrast, YOLOv5 [38] discards the candidate box generation phase and directly performs classification and regression on the objects, which improves the real-time detection speed of the object detection algorithm, and the model complexity of YOLOv5 is about 10% lower than that of YOLOv4 [39]. YOLOv5 has a stronger generalization capability and a lighter network structure than other object detection models. Moreover, the computational resources of UAV platforms are generally limited, so the impact of the model on object detection speed must be considered. A one-stage object detector is more suitable for object detection on UAV platforms than a two-stage object detector. Therefore, the YOLOv5 model is selected as the base model for small object detection of water surface floaters in this research.
Small objects usually have lower resolution and less distinctive features, so achieving precise detection of small objects is a hot issue in the field of object detection, and many scholars have conducted research on it. Kim et al. inserted a high-resolution processing module (HRPM) and a sigmoid fusion module (SFM), which not only reduced computational complexity but also improved the detection accuracy of small targets. They obtained good detection results on drone reconnaissance images and small vehicles [40]. Wang et al. proposed a bidirectional attention network called BANet, which addressed the inaccurate and inefficient detection of small and multiple targets. The model achieved an AP improvement of 0.55–2.93% compared to YOLOX on the VOC2012 dataset and an AP improvement of 0.3–1.01% on the MSCOCO2017 dataset [41]. Yang et al. proposed QueryDet, which uses a new query mechanism called cascade sparse query (CSQ) to speed up inference and calculate detection results using sparse queries. The model obtains high-resolution feature maps while avoiding redundant calculations in the background area. QueryDet is applied in FCOS and Faster R-CNN and is tested on the COCO dataset and the VisDrone dataset for small objects, achieving better results than the original algorithms in terms of accuracy and inference speed [42]. Liu et al. addressed the problem of low accuracy in small target recognition by changing the ROI alignment method, which reduced the quantization error of Faster R-CNN and improved its accuracy by 7% compared to the original model [43].
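The quantization error mentioned for ROI handling in Faster R-CNN can be illustrated numerically. The following is a minimal sketch, not the method of any cited work: it assumes a feature stride of 16 and a made-up 22 × 16 pixel box, and uses simple rounding as a stand-in for the integer quantization of RoI Pooling, whereas an RoI Align style projection keeps fractional coordinates.

```python
# Sketch of RoI quantization error. The feature stride of 16 and the box
# coordinates are assumed values chosen for illustration.

def roi_pool_coords(box, stride):
    """RoI Pooling style: project to the feature map and round to integers."""
    return tuple(round(c / stride) for c in box)


def roi_align_coords(box, stride):
    """RoI Align style: project to the feature map and keep fractions."""
    return tuple(c / stride for c in box)


def quantization_shift_px(box, stride):
    """Per-coordinate shift, in image pixels, introduced by rounding."""
    pooled = roi_pool_coords(box, stride)
    return tuple(abs(p * stride - c) for p, c in zip(pooled, box))


if __name__ == "__main__":
    box = (100, 60, 122, 76)  # x1, y1, x2, y2 in image pixels (hypothetical)
    print("pooled :", roi_pool_coords(box, 16))
    print("aligned:", roi_align_coords(box, 16))
    print("shift  :", quantization_shift_px(box, 16))
```

For a box this small, a shift of a few pixels per edge is a large fraction of the object, which suggests why reducing quantization error matters most for small targets.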

3. Object Detection in UAV-Captured Images

UAVs have been widely applied in industries such as agriculture, forestry, electric power, atmospheric detection, and mapping. Compared with traditional methods, UAVs [44] have the advantages of flexibility and mobility, high efficiency and energy saving, diversified results, and low operating costs. Lan et al. combined UAV technology and deep learning to detect diseased plants and abnormal areas in orchards using the lightweight Swin-T YOLOX model. The improved model raised detection accuracy by 1.9% compared with the original YOLOX [45]. Bajić et al. used an improved YOLOv5 for object detection of unexploded remnants of war based on UAV thermal imaging, and the accuracy was above 90% for all 11 detection objects [46]. Liang et al. proposed a UAV-based low-altitude remote-sensing helmet detection system, which raised the AP value of small object detection to 88.7% and significantly improved the network’s detection performance for small objects [47]. Wang et al. built enhanced CSPDarknet (ECSP) and weighted aggregate feature re-extraction pyramid (WAFR) modules based on the YOLOX-nano network, which solved the problem of low recognition accuracy for grazing livestock caused by the small number of occupied pixels [48]. However, because objects in UAV images occupy a limited number of pixels and offer few features, detection accuracy is low. Thus, small object detection based on UAV imagery remains a significant challenge.
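To make "limited number of pixels" concrete, the sketch below buckets bounding boxes by pixel area. It assumes the COCO-style size convention (small below 32 × 32 pixels, medium below 96 × 96, large otherwise); the works cited above may define "small" differently, and the function names and example boxes are hypothetical.

```python
# COCO-style size buckets by box area in pixels (an assumed convention):
# small < 32*32, medium < 96*96, large otherwise.

def size_bucket(width_px, height_px):
    """Classify a box as 'small', 'medium', or 'large' by its pixel area."""
    area = width_px * height_px
    if area < 32 * 32:
        return "small"
    if area < 96 * 96:
        return "medium"
    return "large"


def small_fraction(boxes):
    """Fraction of (width, height) boxes that fall in the 'small' bucket."""
    if not boxes:
        return 0.0
    small = sum(1 for w, h in boxes if size_bucket(w, h) == "small")
    return small / len(boxes)


if __name__ == "__main__":
    # Made-up (width, height) pairs, in pixels.
    boxes = [(10, 12), (20, 20), (50, 40), (120, 100)]
    print(small_fraction(boxes))
```

A statistic like this can be computed over a dataset's annotations to check how strongly it is dominated by small objects before choosing a detector.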

4. Object Detection for Small Water Floater

As deep learning is increasingly applied to object detection, some researchers have also applied it to water surface floater detection. The application of deep learning to floating objects on the water’s surface has contributed greatly to their timely detection and removal and to improving the supervision of rivers and lakes. Li et al. collected marine images from the South China Sea using a camera mounted on an unmanned boat and obtained a dataset containing 4000 images after data augmentation. A fusion model based on YOLOv3 and DenseNet was used to detect sea surface vessel targets [35]. Zhang et al. used a dataset extracted from three days of video monitoring of a location on the Beijing North Canal to detect plastic bags and bottles floating on the water’s surface. The detection model was Faster R-CNN with VGG16 as the feature extractor, in which conv4_3 and conv5_3 were selected from VGG16 for feature fusion to improve the detection accuracy of small objects [49]. In 2020, van Lieshout et al. constructed a plastic floating garbage monitoring dataset using videos collected from five different waterway bridges in Jakarta, Indonesia, and identified plastic floating objects through two rounds of detection. Faster R-CNN was used to detect regions that may contain plastic floating objects, and then Inception V2 pre-trained on COCO was used to determine whether these regions contain plastic [33].
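The two-round scheme described last (a region detector followed by a classifier on each candidate region) can be sketched generically. This is a hedged outline, not the cited authors' implementation: `propose_regions` and `classify_region` are hypothetical callables standing in for a Faster R-CNN detector and an Inception V2 classifier, and the 0.5 threshold is an assumed value.

```python
# Two-round detection sketch. `propose_regions` and `classify_region` are
# hypothetical callables, not APIs of any specific library.

def two_round_detect(image, propose_regions, classify_region, min_score=0.5):
    """Keep proposed boxes whose classifier score is at least min_score."""
    kept = []
    for box in propose_regions(image):         # round 1: candidate regions
        score = classify_region(image, box)    # round 2: plastic or not
        if score >= min_score:
            kept.append((box, score))
    return kept


if __name__ == "__main__":
    # Toy stand-in: the "image" is a dict mapping box -> plastic probability.
    toy_image = {(0, 0, 10, 10): 0.9, (5, 5, 20, 20): 0.2}
    result = two_round_detect(
        toy_image,
        propose_regions=lambda img: list(img.keys()),
        classify_region=lambda img, box: img[box],
    )
    print(result)
```

Separating the two rounds lets the first stage favor recall while the second stage filters false positives, at the cost of running a classifier once per candidate region.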