Small Object Detection and Traffic Sign Detection

Small Object Detection and Traffic Sign Detection: Comparison

Please note this is a comparison between Version 1 by 赖华清华清赖 and Version 4 by Jessie Wu.

Traffic sign detection is easily affected by weather changes, partial occlusion, and variations in light intensity, which increases potential safety hazards in practical autonomous driving applications.

small object detection
multi-scale feature fusion
loss function
data

1. Introduction

The traffic sign detection system is an important part of an intelligent transportation system. It provides the driver with current road traffic information and helps ensure the operational safety of intelligent vehicle control systems. In recent years, because of its far-reaching impact on traffic safety, this field has been studied extensively by many researchers.

Traditional traffic sign detection algorithms rely mainly on color segmentation. They combine features such as shape and contour for feature extraction, and then recognize traffic signs by classifying these features with classifiers ^[1][2][3][4][5][6]. The handcrafted features used in traditional techniques are labor-intensive to design and are not robust enough to handle complex, changeable traffic environments. In recent years, traffic sign detection algorithms based on deep convolutional neural networks have been widely developed. They fall into two main categories: two-stage object detection algorithms, represented by the region-based convolutional network (R-CNN) series ^[7][8][9], and one-stage object detection algorithms, represented by the you only look once (YOLO) series ^[10][11][12] and the single shot multibox detector (SSD) series ^[13][14]. Two-stage algorithms have achieved remarkable accuracy, but their lack of real-time performance makes most of them difficult to apply to practical detection tasks. Researchers are more interested in one-stage algorithms because they predict object categories and generate bounding boxes simultaneously, which makes them suitable for detection tasks with strict real-time requirements. Zhang et al. ^[15] introduced a multi-scale spatial pyramid pooling block based on the YOLOv3 ^[10] algorithm, aiming to locate and classify traffic signs accurately in real time.
The mean average precision (mAP) of this algorithm on the Tsinghua-Tencent 100K (TT100K) dataset ^[16] was satisfactory, but it ran at only 23.81 frames per second (FPS). Wu et al. ^[17] proposed a traffic sign detection model based on SSD ^[13] combined with a receptive field module (RFM) and a path aggregation network (PAN) ^[18]. It achieved 95.4% and 95.9% mAP on the German Traffic Sign Detection Benchmark (GTSDB) dataset ^[19] and the CSUST Chinese Traffic Sign Detection Benchmark (CCTSDB) dataset ^[20], respectively, but it places high demands on the storage capacity and computing power of the device. Yan et al. ^[21] proposed an auxiliary-information-enhanced YOLO algorithm based on YOLOv5, which achieved 84.8% mAP at 100.7 FPS on the TT100K dataset. However, its robustness in complex scenes, such as extreme weather and lighting changes, has not been verified.
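The text does not give the configuration of Zhang et al.'s multi-scale spatial pyramid pooling block. To illustrate the general idea, the sketch below follows the SPP module popularized by YOLOv3-SPP. The input feature map is max-pooled in parallel at several kernel sizes, each with stride 1 and "same" padding, and the pooled copies are concatenated with the input along the channel axis. As a result, the spatial size is preserved while each location aggregates context at several receptive-field sizes. The kernel sizes (5, 9, 13), the function names, and the use of plain Python lists as tensors are all assumptions made for illustration.

```python
from typing import List, Sequence

Channel = List[List[float]]   # one H x W grid
FeatureMap = List[Channel]    # C x H x W


def max_pool_same(channel: Channel, k: int) -> Channel:
    """Stride-1 max pooling with 'same' padding.

    Padding is treated as -inf, which is equivalent to clipping the
    k x k window at the borders, so the output keeps the input's H x W.
    """
    if k % 2 != 1:
        raise ValueError("kernel size must be odd to preserve H x W")
    h, w = len(channel), len(channel[0])
    r = k // 2
    return [
        [
            max(
                channel[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            )
            for j in range(w)
        ]
        for i in range(h)
    ]


def spp_block(fmap: FeatureMap, kernel_sizes: Sequence[int] = (5, 9, 13)) -> FeatureMap:
    """Hypothetical SPP block: identity branch plus one max-pooled copy per
    kernel size, concatenated along channels -> (1 + len(kernel_sizes)) * C."""
    out: FeatureMap = list(fmap)  # identity branch
    for k in kernel_sizes:
        out.extend(max_pool_same(ch, k) for ch in fmap)
    return out


if __name__ == "__main__":
    fmap = [[[0, 0, 0, 9, 0, 0, 0]]]  # C=1, H=1, W=7: one strong activation
    for ch in spp_block(fmap):
        print(ch)
```

In the example, a single activation spreads to wider neighborhoods as the kernel grows. For a C × H × W input and three kernel sizes, the output has 4C channels at the same H × W. In a real network, the concatenated stack is typically followed by a 1 × 1 convolution that reduces the channel count again.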

Research on traffic sign detection in harsh environments, such as fog, strong light, and insufficient light, has attracted the attention of many scholars. Hnewa et al. ^[22] proposed a novel multi-scale domain adaptive YOLO framework that extracts domain-invariant features from blurred, long-distance image regions and significantly improves detection on foggy image datasets. Fan et al. ^[23] proposed a multi-scale traffic sign detection algorithm based on an attention mechanism, which effectively reduces the effect of illumination changes on traffic sign detection. Zhou et al. ^[24] proposed a high-resolution attention network for traffic sign classification that copes with the complex conditions of icy and snowy environments. However, each of these methods targets a single type of scene and cannot be applied effectively to multi-scene detection tasks.

2. Small Object Detection

There are usually two ways to define small objects. The first is a relative definition: an object is regarded as small if its size is smaller than 0.12% of the original image size. This research uses this definition as its reference. The second is an absolute definition: the object must be smaller than 32 × 32 pixels. Because such objects occupy very few pixels and carry little visual information, small object detection has long been a difficult problem in object detection. At present, the main approaches to small object detection are multi-scale fusion, receptive field enhancement, high-resolution detection, and context-aware detection. In high-resolution detection ^[25][26], high-resolution feature maps are constructed and used for prediction to capture fine details, but contextual information is lost. To obtain the contextual information of an object, several methods ^[27][28] use top-down and bottom-up paths to fuse features from different layers, which can greatly enlarge the receptive field. In this research, a feature pyramid network (FPN) ^[29] + PAN was used as the network's feature fusion module. In addition, a multiple attention mechanism was introduced in the model backbone to enhance context learning and expand the receptive field, thereby improving the accuracy of small object detection.
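The two definitions of "small" above can be written as simple predicates on a bounding box. The sketch below makes two assumptions. First, "size" in the relative definition is read as area (the text does not specify this). Second, the absolute definition is also compared by area, following the COCO convention that small objects have area < 32². The function names are hypothetical.

```python
def is_small_relative(box_w: float, box_h: float, img_w: float, img_h: float,
                      ratio: float = 0.0012) -> bool:
    """Relative definition: box area is below `ratio` (0.12%) of the image area.

    Assumption: 'size' is interpreted as area.
    """
    return (box_w * box_h) / (img_w * img_h) < ratio


def is_small_absolute(box_w: float, box_h: float, side: float = 32) -> bool:
    """Absolute definition: box area is below side x side (32 x 32) pixels,
    compared by area as in the COCO small-object rule (area < 32**2)."""
    return box_w * box_h < side * side


if __name__ == "__main__":
    # Hypothetical 40 x 40 sign in a 2048 x 2048 frame (about 0.038% of the area).
    print(is_small_relative(40, 40, 2048, 2048))  # True
    print(is_small_absolute(40, 40))              # False: 1600 >= 1024
```

The two rules can disagree. In a 2048 × 2048 frame, the relative threshold corresponds to roughly 71 × 71 pixels (0.0012 × 2048² ≈ 5033 px²), so a 40 × 40 sign is small under the relative definition but not under the absolute one.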

3. Traffic Sign Detection

The key to traffic sign detection is extracting distinguishable features. Because computing power and available dataset sizes were limited, the performance of traditional methods depends on the effectiveness of manually extracted features, as in color-based ^[30][31] and shape-based methods ^[32][33]. These methods are also easily affected by factors such as extreme weather, illumination changes, varying shooting angles, and obstacles, and they can only be applied to limited scenes.

To promote traffic sign detection in real scenes, many authors have published excellent traffic sign datasets, such as the Laboratory for Intelligent and Safe Automobiles (LISA) dataset ^[34], GTSDB, CCTSDB, and TT100K. Because the TT100K dataset covers partial occlusion, illumination changes, and viewing angle changes, it is closer to real scenes than the other datasets. With the development of deep learning and the publication of several excellent public datasets, deep-learning-based traffic sign detection algorithms now perform significantly better than traditional ones. Zhang et al. ^[35] combined Cascade R-CNN ^[8] with a sample balance method to detect traffic signs and achieved good detection results on both CCTSDB and GTSDB. Sun et al. ^[36] proposed a feature-expression-enhanced SSD detection algorithm, which achieved 81.26% and 90.52% mAP on TT100K and CCTSDB, respectively. However, its detection speed was only 22.86 FPS and 25.08 FPS, respectively, which falls short of real-time performance. Liu et al. ^[37] proposed a symmetric traffic sign detection algorithm that reduces latency by lowering the network's computational overhead. At the same time, it improves detection performance in complex environments with scale and illumination changes, achieving 97.8% mAP and 84 FPS on the CCTSDB dataset. However, integrating multiple modules leads to insufficient acquisition of global information.
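The mAP figures quoted in this section (for example, 81.26%, 90.52%, and 97.8%) are averages of per-class average precision (AP). The cited works' exact evaluation protocols, such as the IoU threshold and interpolation scheme, are not stated here. The sketch below therefore makes three assumptions:

* AP uses the common PASCAL VOC-style all-point interpolation.
* Each detection has already been matched against ground truth (for example, at IoU ≥ 0.5) and flagged as a true or false positive.
* Per-class APs are averaged without weighting.

The function and class names are hypothetical.

```python
from typing import Dict, List, Tuple

Detection = Tuple[float, bool]  # (confidence score, matched a ground-truth box?)


def average_precision(detections: List[Detection], num_gt: int) -> float:
    """AP for one class using all-point interpolation (VOC 2010+ style)."""
    if num_gt == 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Add sentinels, then make precision non-increasing as recall grows.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Area under the interpolated precision-recall curve.
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


def mean_average_precision(per_class: Dict[str, Tuple[List[Detection], int]]) -> float:
    """mAP as the unweighted mean of per-class AP values."""
    aps = [average_precision(dets, n) for dets, n in per_class.values()]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    results = {
        "speed_limit_40": ([(0.9, True), (0.8, False), (0.7, True)], 2),  # AP = 5/6
        "no_parking": ([(0.6, True)], 1),                                   # AP = 1.0
    }
    print(round(mean_average_precision(results), 4))  # 0.9167
```

Because mAP averages over classes, the protocol choices (IoU threshold, interpolation, class set) change the reported number. This is one reason mAP values from different papers and datasets are not always directly comparable.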