Object Detection for Small Water Floater: Comparison
Please note this is a comparison between Version 2 by Fuxun Chen Chen and Version 3 by Peter Tang.

Object detection is one of the most widely used applications in UAV missions. Detecting small objects in unmanned aerial vehicle (UAV) images remains a persistent challenge because such objects occupy very few pixels and are easily obscured by background noise.

  • small objects
  • object detection
  • improved YOLOv5
  • water surface floaters
  • UAV images

1. Introduction

Water is the source of life, and oceans and rivers cover about 71% of the Earth’s surface [1]. Rivers constitute a vital component of the global water cycle and serve a crucial function in facilitating the transfer [2]. However, recent global economic expansion and urbanization have caused significant damage to the natural environment [3] and severe water pollution [4]. As water quality continues to deteriorate, real-time monitoring of water environments has become crucial.
Object detection is one of the most widely used applications in UAV missions. Because of the shooting angle and flight height of UAV photography, small objects make up a larger proportion of UAV images than of general scenes [5]. Small object detection has long been one of the key difficulties in object detection [6], with problems such as insufficient image feature information and blurred object features. Many scholars have proposed solutions. Xia et al. proposed an automated driving system (ADS) data acquisition and analytics platform for vehicle trajectory extraction, reconstruction, and evaluation. In addition to collecting various sensor data, the platform can also use deep learning to detect small targets, such as vehicle trajectories [7]. Liu et al. proposed a model for tassel detection in maize, named YOLOv5-tassel. The authors trained the YOLOv5-tassel model on UAV remote sensing images and achieved an mAP of 44.7%, an improvement over FCOS, RetinaNet, and YOLOv5 in detecting small maize tassels [8]. Liu et al. proposed multibranch parallel feature pyramid networks (MPFPN) to detect small objects in UAV images. The MPFPN model applies the SSAM attention module [9] to attenuate the effect of background noise and uses a cascade architecture in the Fast R-CNN stage to achieve more accurate localization [10]. Chen et al. proposed a small object detection network based on a classification-oriented super-resolution generative adversarial network (CSRGAN), a model that adds classification branches and introduces classification losses into a typical SRGAN [11]. The experimental results demonstrate that CSRGAN outperforms VGG16 in classification [12]. In this research, UAV technology and deep learning are applied together to small object detection.
However, images of the water’s surface are prone to overexposure or dim lighting, depending on the intensity of sunlight, which reduces the feature information available for a water surface floater and makes recognition difficult. Deep learning models also face problems when detecting small water surface floaters: such objects lose some of their features during down-sampling, which weakens the model’s ability to extract global features and leads to low detection accuracy and poor feature recognition. Therefore, applying the original YOLOv5 directly to object detection in UAV images is not very effective. At the same time, UAVs are recognized as a powerful complement to conventional water environment monitoring and assessment and have gradually been applied to water environment detection. This research therefore proposes an efficient and accurate method based on YOLOv5 for detecting small water surface floaters in UAV-captured images. The model can effectively locate water surface floaters and thus assist in salvage work.
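The loss of small-object features during down-sampling can be illustrated with simple arithmetic. The sketch below assumes the three detection strides commonly used by YOLOv5 (8, 16, and 32) and a few hypothetical object sizes in pixels; the function names are illustrative and are not taken from the cited work.

```python
# Strides of the three YOLOv5 detection heads (assumed default configuration).
STRIDES = (8, 16, 32)


def footprint_on_feature_map(object_px: float, stride: int) -> float:
    """Side length, in feature-map cells, of a square object `object_px` wide."""
    return object_px / stride


def footprints(object_px: float, strides=STRIDES) -> dict:
    """Map each stride to the object's footprint on that feature map."""
    return {s: footprint_on_feature_map(object_px, s) for s in strides}


if __name__ == "__main__":
    # Hypothetical object sizes: a small floater, a medium one, a large one.
    for size in (16, 32, 96):
        cells = ", ".join(
            f"stride {s}: {c:.2f} cells" for s, c in footprints(size).items()
        )
        print(f"{size:>3} px object -> {cells}")
```

Under these assumptions, a 16-pixel floater covers only half a cell on the stride-32 feature map, which is one way to see why deep, heavily down-sampled features carry little information about small objects.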

2. Object Detection

Computer vision research is mainly concerned with image classification, object detection, object tracking, semantic segmentation, and instance segmentation. Object detection is one of the most fundamental and challenging tasks in computer vision [13], and efficient real-time object detection models have been an active research topic in recent years [14]. Traditional object detection methods usually include region proposal, feature extraction, feature fusion, and classifier training, all of which require laborious manual design of object features and must be completed step by step [15]. As a result, traditional methods suffer from massive redundancy, time-consuming computation, and low accuracy. With the rapid development of deep learning theory, deep learning has made great progress in computer vision tasks such as image classification and object detection [16], and object detection has entered a new phase. In the 2012 ImageNet competition, A. Krizhevsky et al. used a convolutional neural network (CNN) [17] to improve image classification results substantially over traditional methods. CNN-based detectors can generate region proposals at low cost through a region proposal network (RPN) [18], which significantly improves the efficiency of object detection. In 2014, R. Girshick et al. [19] applied a region-based convolutional neural network (R-CNN) to object detection for the first time, and the detection results improved considerably. Compared to traditional methods, this end-to-end approach is more popular because it removes complex steps such as data preprocessing and manual design of object features [20]. Since then, deep learning has developed rapidly in object detection and has been widely used in practice [21]. Mainstream deep learning object detection algorithms fall into two major categories. The first is one-stage object detectors, such as the YOLO series, SSD [22], RetinaNet [23], and FCOS [24].
The second is two-stage object detectors, such as R-CNN [25], Fast R-CNN [26], Faster R-CNN [27], and Mask R-CNN [28]. For example, the two-stage detector Faster R-CNN integrates a region proposal network (RPN) on top of Fast R-CNN and mainly consists of a shared convolutional layer module, the RPN, and the Fast R-CNN detector. However, because two-stage detectors generate a large number of candidate boxes, they are computationally expensive and slow, which makes them unsuitable for real-time detection tasks. In 2015, J. Redmon et al. proposed a one-stage object detector, You Only Look Once (YOLO) [29], which directly predicts the class and bounding box of each object and greatly improves detection speed. Ma et al. proposed an improved YOLOv3 [30], which has stronger noise immunity and better generalization ability. Deep learning has also been applied successfully to the detection of water surface floaters. Van Lieshout et al. constructed a plastic floating debris monitoring dataset using videos collected from five different waterway bridges in Jakarta, Indonesia. They first used Faster R-CNN to detect regions that may contain floating plastic and then used Inception V2 [31], pre-trained on COCO [32], to determine whether these regions actually contain plastic [33]. Li et al. acquired ocean images of the South China Sea using a camera mounted on an unmanned ship and used a fusion model based on YOLOv3 and DenseNet [34] to detect sea surface objects [35]. Stofa et al. selected the DenseNet model to detect ships in remotely sensed images. Through hyperparameter tuning, they settled on a batch size of 16 and a learning rate of 0.0001, with the Adam optimizer [36] used for training [37].
In contrast, YOLOv5 [38] discards the candidate box generation phase and performs classification and regression on the objects directly, which improves real-time detection speed; its model complexity is also about 10% lower than that of YOLOv4 [39]. YOLOv5 has a stronger generalization capability and a lighter network structure than other object detection models. Moreover, the computational resources of UAV platforms are generally limited, so the effect of the model on detection speed must be considered. One-stage object detectors are therefore more suitable for UAV platforms than two-stage object detectors, and the YOLOv5 model is selected as the baseline model for small object detection of water surface floaters in this research. Small objects usually have lower resolution and less distinctive features, so achieving precise detection of small objects is an active problem in object detection, and many scholars have studied it. Kim et al. inserted a high-resolution processing module (HRPM) and a sigmoid fusion module (SFM), which not only reduced computational complexity but also improved the detection accuracy of small targets. They obtained good detection results on drone reconnaissance images and small vehicles [40]. Wang et al. proposed a bidirectional attention network called BANet, which addressed inaccurate and inefficient detection of small and multiple targets. The model achieved an AP improvement of 0.55–2.93% over YOLOX on the VOC2012 dataset and an AP improvement of 0.3–1.01% on the MSCOCO2017 dataset [41]. Yang et al. proposed QueryDet, which uses a new query mechanism called cascade sparse query (CSQ) to speed up inference and compute detection results using sparse queries.
The model obtains high-resolution feature maps while avoiding redundant calculations in background areas. QueryDet was applied to FCOS and Faster R-CNN and tested on the COCO dataset and the VisDrone dataset for small objects, achieving better accuracy and inference speed than the original algorithms [42]. Liu et al. addressed the problem of low accuracy in small target recognition by changing the ROI alignment method, which reduced the quantization error of Faster R-CNN and improved its accuracy by 7% compared to the original model [43].
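The AP and mAP figures quoted above depend on matching predicted boxes to ground-truth boxes by intersection over union (IoU). The following is a minimal sketch of that overlap measure, assuming axis-aligned boxes in (x1, y1, x2, y2) pixel coordinates; the box values in the usage section are hypothetical and chosen only to show how sensitive small objects are to localization error.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # The same 2-pixel horizontal shift applied to a small and a large box.
    small = iou((0, 0, 10, 10), (2, 0, 12, 10))
    large = iou((0, 0, 100, 100), (2, 0, 102, 100))
    print(f"10 px box, 2 px shift:  IoU = {small:.3f}")
    print(f"100 px box, 2 px shift: IoU = {large:.3f}")
```

In this toy comparison, the same 2-pixel localization error lowers the IoU of the 10-pixel box far more than that of the 100-pixel box, which is consistent with the difficulty of small object detection described in this section.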

3. Object Detection in UAV-Captured Images

UAVs have been widely applied in industries such as agriculture, forestry, electric power, atmospheric detection, and mapping. Compared with traditional methods, UAVs [44] offer flexibility and mobility, high efficiency and energy savings, diversified results, and low operating costs. Lan et al. combined UAV technology and deep learning to detect diseased plants and abnormal areas in orchards using the lightweight Swin-T YOLOX model, which improved detection accuracy by 1.9% over the original YOLOX [45]. Bajić et al. used an improved YOLOv5 to detect unexploded remnants of war in UAV thermal imagery, and the accuracy was above 90% for all 11 object classes [46]. Liang et al. proposed a helmet detection system based on UAV low-altitude remote sensing, which raised the AP for small object detection to 88.7% and significantly improved the network’s detection performance for small objects [47]. Wang et al. built enhanced CSPDarknet (ECSP) and weighted aggregate feature re-extraction pyramid (WAFR) modules on the YOLOX-nano network, which addressed the low recognition accuracy for grazing livestock caused by the small number of pixels they occupy [48]. However, because objects in UAV images occupy few pixels and present few features, detection accuracy is low. Thus, small object detection in UAV imagery remains a significant challenge.

4. Object Detection for Small Water Floater

As deep learning is increasingly applied to object detection, some researchers have also applied it to water surface floater detection. Applying deep learning to floating objects on the water’s surface has contributed greatly to their timely detection and handling and to better supervision of rivers and lakes. Li et al. collected marine images from the South China Sea using a camera mounted on an unmanned boat and obtained a dataset containing 4000 images after data augmentation. A fusion model based on YOLOv3 and DenseNet was used to detect sea surface vessel targets [35]. Zhang et al. used a dataset extracted from three days of video monitoring at a location on the Beijing North Canal to detect plastic bags and bottles floating on the water’s surface. The detection model was Faster R-CNN with VGG16 as the feature extractor, in which the conv4_3 and conv5_3 layers of VGG16 were fused to improve the detection accuracy of small objects [49]. In 2020, van Lieshout et al. constructed a plastic floating garbage monitoring dataset using videos collected from five different waterway bridges in Jakarta, Indonesia, and identified plastic floating objects through two rounds of detection. Faster R-CNN was used to detect regions that may contain plastic floating objects, and then Inception V2 pre-trained on COCO was used to determine whether these regions contain plastic [33].
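The two-round scheme described for [33] (a detector proposes candidate regions, then a separate classifier confirms each one) can be sketched in a few lines. This is an illustrative outline only, not the authors' implementation: `propose` and `classify` are hypothetical stand-ins for Faster R-CNN and Inception V2, the image is a toy nested list, and the 0.5 threshold is an assumption.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), assumed pixel coordinates
Image = Sequence[Sequence[float]]


def crop(image: Image, box: Box) -> List[List[float]]:
    """Cut the region covered by `box` out of a row-major image."""
    x1, y1, x2, y2 = box
    return [list(row[x1:x2]) for row in image[y1:y2]]


def two_round_detect(
    image: Image,
    propose: Callable[[Image], List[Box]],
    classify: Callable[[List[List[float]]], float],
    threshold: float = 0.5,  # assumed confidence cut-off
) -> List[Tuple[Box, float]]:
    """Round 1 proposes candidate regions; round 2 keeps those the classifier accepts."""
    kept = []
    for box in propose(image):
        score = classify(crop(image, box))
        if score >= threshold:
            kept.append((box, score))
    return kept


# Hypothetical stand-ins so the sketch runs without any trained model.
def toy_propose(image: Image) -> List[Box]:
    return [(0, 0, 2, 2), (2, 2, 4, 4)]


def toy_classify(patch: List[List[float]]) -> float:
    values = [v for row in patch for v in row]
    return sum(values) / len(values)  # mean intensity as a fake confidence score


if __name__ == "__main__":
    toy_image = [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    print(two_round_detect(toy_image, toy_propose, toy_classify))
```

Separating proposal from confirmation lets the second model reject false candidates, at the cost of running two networks per frame.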