Object Detection for Small Water Floater


Please note this is a comparison between Version 2 by Fuxun Chen Chen and Version 1 by Fuxun Chen Chen.

Object detection is one of the most widely used applications in unmanned aerial vehicle (UAV) missions. Detecting small objects in UAV images remains a persistent challenge because of the limited pixel values and interference from background noise.

small objects
object detection
improved YOLOv5
water surface floaters
UAV images

1. Introduction

Water is the source of life, and oceans and rivers cover about 71% of the Earth’s surface [1]. Rivers constitute a vital component of the global water cycle and serve a crucial function in facilitating the transfer [2]. However, recent global economic expansion and urbanization have caused significant damage to the natural environment [3] and severe water pollution [4]. As water quality continues to deteriorate, real-time monitoring of water environments has become crucial.

Object detection is one of the most widely used applications in UAV missions. Because of the shooting angle and flight height of UAV photography, small objects account for a larger proportion of UAV images than of general scenes [5]. Small object detection has long been one of the key difficulties in object detection [6], with problems such as invalid image feature information and blurred object features. Many scholars have proposed solutions. Xia et al. proposed an automated driving system (ADS) data acquisition and analytics platform for vehicle trajectory extraction, reconstruction, and evaluation. In addition to collecting various sensor data, the platform can also use deep learning to detect small targets, such as vehicle trajectories [7]. Liu et al. proposed a model for tassel detection in maize, named yolov5-tassel. The authors trained the yolov5-tassel model on UAV remote sensing images and achieved an mAP of 44.7%, an improvement over FCOS, RetinaNet, and YOLOv5 in detecting small maize tassels [8]. Liu et al. proposed multibranch parallel feature pyramid networks (MPFPN) to detect small objects in UAV images. The MPFPN model applies the SSAM attention module [9] to attenuate the effect of background noise and uses a cascade architecture in the Fast R-CNN stage to achieve more powerful localization [10]. Chen et al. proposed small object detection networks based on a classification-oriented super-resolution generative adversarial network (CSRGAN), a model that adds classification branches and introduces classification losses into a typical SRGAN [11]. The experimental results demonstrate that CSRGAN outperforms VGG16 in classification [12]. In this study, UAV technology and deep learning are applied together to small object detection.
However, the water’s surface is prone to overexposure or dim light depending on the intensity of sunlight, which leaves less information about a floater’s features and makes recognition difficult. Deep learning models also face problems when detecting small water surface floaters: small floaters lose some of their features during down-sampling, which weakens global feature extraction and leads to low detection accuracy and poor feature recognition. Therefore, applying the original YOLOv5 directly to UAV image object detection is not very effective. At the same time, UAVs are known as a powerful complement to conventional water environment assessment and have been gradually applied to water environment detection. This study therefore proposes an efficient and accurate method based on YOLOv5 for detecting small water surface floaters in UAV-captured images. The model can effectively locate water surface floaters and thus assist in floater salvage work.
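The loss of small-object features during down-sampling can be made concrete with a little arithmetic. The sketch below assumes the three detection strides (8, 16, 32) of a stock YOLOv5 model; the function names and the example object sizes are illustrative and do not come from the source.

```python
# Assumption: a stock YOLOv5 predicts on feature maps down-sampled by 8, 16 and 32.
STRIDES = (8, 16, 32)

def footprint_in_cells(object_px, stride):
    """Side length, in feature-map cells, covered by an object object_px pixels wide."""
    return object_px / stride

def footprints(object_px):
    """Footprint of the object on every detection scale."""
    return {stride: footprint_in_cells(object_px, stride) for stride in STRIDES}

if __name__ == "__main__":
    for size in (16, 32, 96):  # hypothetical object sizes in pixels
        print(size, footprints(size))
    # A 16-pixel floater covers 2 cells at stride 8, 1 cell at stride 16,
    # and only half a cell at stride 32.
```

Once an object covers less than one cell, its features are mixed with the surrounding water in that cell, which is one way to read the feature loss described above.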

2. Object Detection

Computer vision research is mainly concerned with image classification, object detection, object tracking, semantic segmentation, and instance segmentation. Object detection is one of the most fundamental and challenging tasks in the field [13], and efficient real-time object detection models have been a hot research topic in recent years [14]. Traditional object detection methods usually include region proposal, feature extraction, feature fusion, and classifier training, all of which require laborious hand-crafting of object features and must be completed step by step [15]. As a result, traditional methods suffer from massive redundancy, time-consuming computation, and low accuracy. With the rapid development of deep learning theory, deep learning has made great progress in computer vision tasks such as image classification and object detection [16], and object detection has entered a new phase. In the 2012 ImageNet competition, A. Krizhevsky et al. used a convolutional neural network (CNN) [17] to greatly improve image classification results over traditional methods. A region proposal network (RPN) [18] generates region proposals at low cost, which can significantly improve the efficiency of object detection. In 2014, R. Girshick et al. [19] used a region-based convolutional neural network for the first time in object detection, and the detection results improved considerably.
Compared to traditional methods, this end-to-end network is more popular because it removes complex steps such as data preprocessing and manual design of object features [20]. Since then, deep learning has developed rapidly in object detection and has been widely used in practice [21]. Mainstream deep learning object detection algorithms fall into two major categories. The first is one-stage object detectors, such as the YOLO series, SSD [22], RetinaNet [23], and FCOS [24]. The second is two-stage object detectors, such as R-CNN [25], Fast R-CNN [26], Faster R-CNN [27], and Mask R-CNN [28]. For example, the two-stage detector Faster R-CNN integrates a region proposal network (RPN) on top of Fast R-CNN and mainly consists of a shared convolutional layer module, the RPN, and the Fast R-CNN detector. However, because two-stage detectors produce a large number of predicted boxes, require heavy computation, and detect slowly, they are not suitable for real-time detection tasks. In 2015, J. Redmon et al. proposed a one-stage object detector, You Only Look Once (YOLO) [29], which directly obtains the class and bounding box information of the object and greatly improves detection speed. Ma et al. proposed an improved YOLOv3 [30], which has stronger noise immunity and better generalization ability. Deep learning has also been applied successfully to the detection of water surface floaters. van Lieshout et al. constructed a plastic floating debris monitoring dataset using videos collected from five different waterway bridges in Jakarta, Indonesia. They used Faster R-CNN to first detect regions that may contain floating plastic and then Inception V2 [31], pre-trained on COCO [32], to determine whether these regions contain plastic [33]. Li et al. acquired ocean images of the South China Sea using a camera mounted on an unmanned ship and used a fusion model based on YOLOv3 with DenseNet [34] to detect sea surface objects [35]. Stofa et al. selected the DenseNet model to detect ships in remotely sensed images and, by fine-tuning the hyperparameters, settled on a batch size of 16 and a learning rate of 0.0001. The model uses the Adam optimizer [36] with optimized parameter settings [37]. In contrast, YOLOv5 [38] discards the candidate box generation stage and directly performs classification and regression on the objects, which improves real-time detection speed, and its model complexity is about 10% lower than that of YOLOv4 [39]. YOLOv5 has a stronger generalization capability and a lighter network structure than other object detection models. Moreover, the computational resources of UAV platforms are generally limited, so the impact of the model on detection speed must be considered. One-stage object detectors are therefore more suitable than two-stage detectors for object detection on UAV platforms.
Therefore, the YOLOv5 model is selected as the original model for small object detection of water surface floaters in this study.

Small objects usually have lower resolution and less distinctive features, so achieving precise detection of small objects is a hot issue in the field of object detection. Many scholars have conducted extensive research on small object detection. Kim et al. inserted a high-resolution processing module (HRPM) and a sigmoid fusion module (SFM), which not only reduced computational complexity but also improved the detection accuracy of small targets. They obtained good detection results on drone reconnaissance images and small vehicles [40]. Wang et al. proposed a bidirectional attention network called BANet, which addressed inaccurate and inefficient detection of small and multiple targets. Compared to YOLOX, the model achieved an AP improvement of 0.55–2.93% on the VOC2012 dataset and an AP improvement of 0.3–1.01% on the MSCOCO2017 dataset [41]. Yang et al. proposed QueryDet, which uses a new query mechanism called cascade sparse query (CSQ) to speed up inference and compute detection results using sparse queries. The model obtains high-resolution feature maps while avoiding redundant calculations in background areas. QueryDet was applied to FCOS and Faster R-CNN and tested on the COCO and VisDrone datasets for small objects, achieving better accuracy and inference speed than the original algorithms [42]. Liu et al. addressed the problem of low accuracy in small target recognition by changing the ROI alignment method, which reduced the quantization error of Faster R-CNN and improved its accuracy by 7% compared to the original model [43].
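The quantization error mentioned for Faster R-CNN arises when a region of interest given in image pixels is snapped to the integer grid of a coarser feature map. The minimal sketch below illustrates only that general effect; it is not the specific method of Liu et al. [43], and the stride of 16, the box coordinates, and the function names are assumptions chosen for illustration.

```python
import math

def to_feature_coords(box, stride=16):
    """Map an image-space box (x1, y1, x2, y2) to continuous feature-map coordinates."""
    return tuple(c / stride for c in box)

def snap_to_grid(box, stride=16):
    """Round each coordinate to the nearest feature-map cell (quantized ROI)."""
    return tuple(math.floor(c / stride + 0.5) for c in box)

def max_quantization_error_px(box, stride=16):
    """Largest coordinate shift, in image pixels, introduced by snapping to the grid."""
    exact = to_feature_coords(box, stride)
    snapped = snap_to_grid(box, stride)
    return max(abs(s - e) for s, e in zip(snapped, exact)) * stride

if __name__ == "__main__":
    small_box = (100, 60, 124, 84)  # hypothetical 24 x 24 pixel object
    print(to_feature_coords(small_box))          # (6.25, 3.75, 7.75, 5.25)
    print(snap_to_grid(small_box))               # (6, 4, 8, 5)
    print(max_quantization_error_px(small_box))  # 4.0
```

A 4-pixel shift is negligible for a large object but is one sixth of the side of this 24-pixel box, which suggests why alignment methods that keep fractional coordinates tend to help small targets.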

3. Object Detection in UAV-Captured Images

UAVs have been widely applied in industries such as agriculture, forestry, electric power, atmospheric detection, and mapping. Compared with traditional methods, UAVs [44] offer flexibility and mobility, high efficiency and energy savings, diversified results, and low operating costs. Lan et al. combined UAV technology and deep learning to detect diseased plants and abnormal areas in orchards using the lightweight Swin-T YOLOX model. The improved model raised detection accuracy by 1.9% compared with the original YOLOX [45]. Bajić et al. used an improved YOLOv5 for object detection of unexploded remnants of war based on UAV thermal imaging, and the accuracy was above 90% for all 11 detection objects [46]. Liang et al. proposed a UAV-based low-altitude remote sensing helmet detection system, which raised the AP of small object detection to 88.7% and significantly improved the network’s detection performance for small objects [47]. Wang et al. built enhanced CSPDarknet (ECSP) and weighted aggregate feature re-extraction pyramid (WAFR) modules based on the YOLOX-nano network, which addressed the low recognition accuracy for grazing livestock caused by the small number of occupied pixels [48]. However, because objects in UAV images occupy a limited number of pixels and offer few features, detection accuracy remains low. Thus, small object detection based on UAV imagery remains a significant challenge.

4. Object Detection for Small Water Floater

As deep learning is increasingly applied to object detection, some researchers have also applied it to water surface floater detection. This work has contributed greatly to the timely detection and handling of floating objects and to better supervision of rivers and lakes. Li et al. collected marine images from the South China Sea using a camera mounted on an unmanned boat and obtained a dataset of 4000 images after data augmentation. A fusion model based on YOLOv3 and DenseNet was used to detect sea surface vessel targets [35]. Zhang et al. used a dataset extracted from three days of video monitoring at a location on the Beijing North Canal to detect plastic bags and bottles floating on the water’s surface. The detection model was Faster R-CNN with VGG16 as the feature extractor, in which conv4_3 and conv5_3 were selected from VGG16 for feature fusion to improve the detection accuracy of small objects [49]. In 2020, van Lieshout et al. constructed a plastic floating garbage monitoring dataset using videos collected from five different waterway bridges in Jakarta, Indonesia, and identified floating plastic objects through two rounds of detection. Faster R-CNN was used to detect regions that may contain floating plastic objects, and then Inception V2 pre-trained on COCO was used to determine whether these regions contain plastic [33].
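Fusing conv4_3 and conv5_3 means combining a finer feature map with a coarser one. The source does not describe how Zhang et al. [49] performed the fusion, so the sketch below assumes one common pattern: nearest-neighbour upsampling of the deeper map by a factor of two, followed by channel concatenation. In a standard VGG16, conv5_3 has half the spatial resolution of conv4_3; the tiny list-based feature maps and the function names here are purely illustrative.

```python
def upsample_nearest_2x(fmap):
    """Double the height and width of a feature map stored as [channel][row][col]."""
    upsampled = []
    for channel in fmap:
        rows = []
        for row in channel:
            wide = [value for value in row for _ in range(2)]  # repeat each column
            rows.append(wide)
            rows.append(list(wide))                            # repeat each row
        upsampled.append(rows)
    return upsampled

def fuse_by_concat(shallow, deep):
    """Upsample the deep map to the shallow map's size, then stack the channels."""
    deep_up = upsample_nearest_2x(deep)
    shallow_size = (len(shallow[0]), len(shallow[0][0]))
    deep_size = (len(deep_up[0]), len(deep_up[0][0]))
    if shallow_size != deep_size:
        raise ValueError("spatial sizes do not match after upsampling")
    return shallow + deep_up

if __name__ == "__main__":
    shallow = [[[0] * 4 for _ in range(4)]]  # stand-in for conv4_3: 1 channel, 4 x 4
    deep = [[[1, 2], [3, 4]]]                # stand-in for conv5_3: 1 channel, 2 x 2
    fused = fuse_by_concat(shallow, deep)
    print(len(fused))   # 2
    print(fused[1][0])  # [1, 1, 2, 2]
```

The fused map keeps the finer grid of the shallow layer, which preserves the location of small objects, while each cell also carries the more semantic response of the deeper layer.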