Small Object Detection and Traffic Signs Detection: Comparison
Please note this is a comparison between Version 4 by Jessie Wu and Version 3 by 赖华清.

The detection of traffic signs is easily affected by changes in the weather, partial occlusion, and light intensity, which increases the number of potential safety hazards in practical applications of autonomous driving.

  • small object detection
  • multi-scale feature fusion
  • loss function
  • data

1. Introduction

The traffic sign detection system is an important part of an intelligent transportation system. It can effectively provide the driver with current road traffic information, and it can also ensure the operational safety of the intelligent vehicle control system. In recent years, due to the far-reaching impact of this technology on traffic safety, this field has been deeply studied by many researchers.
Traditional traffic sign detection algorithms mainly rely on color segmentation, combining features such as shape and contour for feature extraction, and then recognize traffic signs by completing feature classification through classifiers [1][2][3][4][5][6]. The handcrafted features in traditional techniques are labor-intensive to design and lack sufficient robustness to deal with complex and changeable traffic environments. In recent years, traffic sign detection algorithms based on deep convolutional neural networks have been widely developed. They are mainly divided into two categories: the two-stage object detection algorithms represented by the region-based convolutional network (R-CNN) series [7][8][9], and the one-stage object detection algorithms represented by the you only look once (YOLO) series [10][11][12] and the single shot multibox detector (SSD) series [13][14]. The two-stage algorithms have achieved remarkable accuracy, but their lack of real-time performance makes most of these methods difficult to apply to practical detection tasks. Researchers are more concerned with one-stage algorithms because they can predict object categories and generate bounding boxes simultaneously, making them well suited for detection tasks with high real-time requirements. Zhang et al. [15] introduced a multi-scale spatial pyramid pooling block into the YOLOv3 [10] algorithm, aiming to accurately realize the real-time localization and classification of traffic signs. The mean average precision (mAP) of the algorithm on the Tsinghua-Tencent 100K (TT100K) dataset [16] was satisfactory, but it ran at only 23.81 frames per second (FPS). Wu et al. [17] proposed a traffic sign detection model based on SSD [13] combined with a receptive field module (RFM) and path aggregation network (PAN) [18], which achieved a 95.4% and 95.9% mAP on the German Traffic Sign Detection Benchmark (GTSDB) dataset [19] and CSUST Chinese Traffic Sign Detection Benchmark (CCTSDB) dataset [20], respectively, but it has high requirements for the storage capacity and computing power of the device. Yan et al. [21] proposed an auxiliary information enhanced YOLO algorithm based on YOLOv5, which achieved an 84.8% mAP at a detection speed of 100.7 FPS on the TT100K dataset, but its robustness against complex scenes such as extreme weather and lighting changes has not been verified.
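Throughput figures like the FPS numbers quoted above are obtained by timing inference over a sequence of frames. A minimal sketch of such a measurement, where the `detect` callable and the frame list are placeholders rather than any of the cited models:

```python
import time

def measure_fps(detect, frames, warmup=5):
    """Estimate detection throughput in frames per second.

    `detect` is any callable that runs inference on one frame.
    The first `warmup` frames are excluded from timing, since
    initial runs are often slower (caching, lazy initialization).
    """
    for f in frames[:warmup]:
        detect(f)                       # warm-up, not timed
    start = time.perf_counter()
    for f in frames[warmup:]:
        detect(f)
    elapsed = time.perf_counter() - start
    return (len(frames) - warmup) / elapsed
```

In practice, reported FPS also depends on batch size, input resolution, and hardware, so figures from different papers are only roughly comparable.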
The detection of traffic signs in harsh environments such as fog, strong light, and insufficient light has attracted the attention of many scholars. Hnewa et al. [22] proposed a novel multi-scale domain adaptive YOLO framework, which extracts domain-invariant features from blurred long-distance image regions and performs well on foggy image datasets. Fan et al. [23] proposed a multi-scale traffic sign detection algorithm based on an attention mechanism, which can effectively reduce the effect of illumination changes on traffic sign detection. Zhou et al. [24] proposed an attention network based on high-resolution traffic sign classification to overcome the complicating factors of icy and snowy environments. However, each of the above methods targets a single scene type and cannot be effectively applied to multi-scene detection tasks.

2. Small Object Detection

There are usually two ways to define small objects. One is a relative size definition: an object is regarded as small if its size is less than 0.12% of the original image area; this research takes this definition as its reference. The other is an absolute size definition, in which the object must be smaller than 32 × 32 pixels. Because small objects occupy so few pixels, they carry little feature information, and small object detection has always been a difficult topic in the field of object detection. At present, multi-scale fusion, receptive field enlargement, high-resolution detection, and context-aware detection are the main approaches to small object detection. In high-resolution detection [25][26], high-resolution feature maps are established and predicted to obtain fine details, but context information is lost. In addition, to obtain the context information of the object, several methods [27][28] use top-down and bottom-up paths to fuse the features of different layers, which can greatly increase the receptive field. In this work, the feature pyramid network (FPN) [29] + PAN was used as the feature fusion module of the network, and a multiple attention mechanism was introduced in the model backbone to enhance the learning of context and expand the receptive field, effectively improving the accuracy of small object detection.
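The two definitions above can be made concrete with a small check. The 0.12% relative threshold and the 32 × 32 absolute threshold follow the text; the example image and box sizes in the usage note are illustrative, not from any cited dataset:

```python
def is_small_object(box_w, box_h, img_w, img_h,
                    rel_thresh=0.0012, abs_thresh=32):
    """Classify a bounding box as 'small' under the two common definitions.

    Relative: box area below 0.12% of the image area.
    Absolute: box area below 32 x 32 pixels.
    Returns a (relative_small, absolute_small) pair, since the two
    definitions can disagree depending on image resolution.
    """
    relative = (box_w * box_h) < rel_thresh * (img_w * img_h)
    absolute = (box_w * box_h) < abs_thresh * abs_thresh
    return relative, absolute
```

For example, a 40 × 40 sign in a 2048 × 2048 image is small under the relative definition but not the absolute one, while a 20 × 20 box in a 640 × 480 image is the reverse; this is why a survey must state which definition it adopts.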

3. Traffic Signs Detection

The key to traffic sign detection is to extract distinguishable features. Due to limitations in computing power and available dataset size, the performance of traditional methods depends on the effectiveness of manually extracted features, such as those in color-based [30][31] and shape-based methods [32][33]. These methods are also easily affected by factors such as extreme weather, illumination changes, variable shooting angles, and obstacles, and can only be applied to limited scenes. In order to promote traffic sign detection in real scenes, many authors have published excellent traffic sign datasets, such as the Laboratory for Intelligent and Safe Automobiles (LISA) dataset [34], GTSDB, CCTSDB, and TT100K. Since the TT100K dataset covers partial occlusion, illumination changes, and viewing angle changes, it is closer to real scenes than the other datasets. With the development of deep learning technology and the publication of several excellent public datasets, the performance of deep-learning-based traffic sign detection algorithms has improved significantly over traditional algorithms. Zhang et al. [35] used Cascade R-CNN [8] combined with a sample balancing method to detect traffic signs, achieving ideal detection results on both CCTSDB and GTSDB. Sun et al. [36] proposed a feature-expression-enhanced SSD detection algorithm, which achieved an 81.26% and 90.52% mAP on TT100K and CCTSDB, respectively. However, the detection speed of this algorithm was only 22.86 FPS and 25.08 FPS, which could not achieve real-time performance. Liu et al. [37] proposed a symmetric traffic sign detection algorithm, which reduces latency by cutting the computing overhead of the network and, at the same time, improves traffic sign detection performance in complex environments with scale and illumination changes, achieving a 97.8% mAP and 84 FPS on the CCTSDB dataset.
However, the integration of multiple modules limits the network's ability to capture global information.
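The mAP figures cited throughout are built from IoU-based matching of predictions to ground-truth boxes. A minimal sketch of that matching step at a single IoU threshold; the corner-coordinate box format and the greedy strategy are simplifying assumptions, and real benchmarks additionally average precision over classes and recall levels:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def match_detections(preds, gts, thresh=0.5):
    """Greedily match predictions (assumed sorted by confidence) to
    ground-truth boxes; each ground truth can be matched at most once.
    Returns (true positives, false positives) at one IoU threshold."""
    unmatched = list(gts)
    tp = fp = 0
    for p in preds:
        best = max(unmatched, key=lambda g: iou(p, g), default=None)
        if best is not None and iou(p, best) >= thresh:
            unmatched.remove(best)
            tp += 1
        else:
            fp += 1
    return tp, fp
```

Counting matches this way per confidence level yields the precision–recall curve from which average precision, and then mAP over classes, is computed.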

References

  1. Zhang, T.; Zou, J.; Jia, W. Fast and robust road sign detection in driver assistance systems. Appl. Intell. 2018, 48, 4113–4127.
  2. Wang, C.W.; You, W.H. Boosting-SVM: Effective learning with reduced data dimension. Appl. Intell. 2013, 39, 465–474.
  3. Souani, C.; Faiedh, H.; Besbes, K. Efficient algorithm for automatic road sign recognition and its hardware implementation. J. Real-Time Image Process. 2014, 9, 79–93.
  4. Yu, L.; Xia, X.; Zhou, K. Traffic sign detection based on visual co-saliency in complex scenes. Appl. Intell. 2019, 49, 764–790.
  5. Greenhalgh, J.; Mirmehdi, M. Real-time detection and recognition of road traffic signs. IEEE Trans. Intell. Transp. Syst. 2012, 13, 1498–1506.
  6. Berkaya, S.K.; Gunduz, H.; Ozsen, O.; Akinlar, C.; Gunal, S. On circular traffic sign detection and recognition. Expert Syst. Appl. 2016, 48, 67–75.
  7. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 28, 1137–1149.
  8. Cai, Z.; Vasconcelos, N. Cascade r-cnn: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6154–6162.
  9. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
  10. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
  11. Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A single-stage object detection framework for industrial applications. arXiv 2022, arXiv:2209.02976.
  12. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv 2022, arXiv:2207.02696.
  13. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14. Springer International Publishing: Cham, Switzerland, 2016; pp. 21–37.
  14. Fu, C.Y.; Liu, W.; Ranga, A.; Tyagi, A.; Berg, A.C. Dssd: Deconvolutional single shot detector. arXiv 2017, arXiv:1701.06659.
  15. Zhang, H.; Qin, L.; Li, J.; Guo, Y.; Zhou, Y.; Zhang, J.; Xu, Z. Real-time detection method for small traffic signs based on Yolov3. IEEE Access 2020, 8, 64145–64156.
  16. Zhu, Z.; Liang, D.; Zhang, S.; Huang, X.; Li, B.; Hu, S. Traffic-sign detection and classification in the wild. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2110–2118.
  17. Wu, J.; Liao, S. Traffic sign detection based on SSD combined with receptive field module and path aggregation network. Comput. Intell. Neurosci. 2022, 2022, 4285436.
  18. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8759–8768.
  19. Houben, S.; Stallkamp, J.; Salmen, J.; Schlipsing, M.; Igel, C. Detection of traffic signs in real-world images: The German Traffic Sign Detection Benchmark. In Proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN), Dallas, TX, USA, 4–9 August 2013; IEEE: New York, NY, USA, 2013; pp. 1–8.
  20. Zhang, J.; Zou, X.; Kuang, L.D.; Wang, J.; Sherratt, R.S.; Yu, X. CCTSDB 2021: A more comprehensive traffic sign detection benchmark. In Human-Centric Computing and Information Sciences; Springer: Berlin/Heidelberg, Germany, 2022; p. 12.
  21. Yan, B.; Li, J.; Yang, Z.; Zhang, X.; Hao, X. AIE-YOLO: Auxiliary Information Enhanced YOLO for Small Object Detection. Sensors 2022, 22, 8221.
  22. Hnewa, M.; Radha, H. Multiscale domain adaptive yolo for cross-domain object detection. In Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 19–22 September 2021; IEEE: New York, NY, USA, 2021; pp. 3323–3327.
  23. Fan, B.B.; Yang, H. Multi-scale traffic sign detection model with attention. Proc. Inst. Mech. Eng. Part D J. Automob. Eng. 2021, 235, 708–720.
  24. Zhou, K.; Zhan, Y.; Fu, D. Learning region-based attention network for traffic sign recognition. Sensors 2021, 21, 686.
  25. Bai, Y.; Zhang, Y.; Ding, M.; Ghanem, B. Sod-mtgan: Small object detection via multi-task generative adversarial network. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 206–221.
  26. Bai, Y.; Zhang, Y.; Ding, M.; Ghanem, B. Finding tiny faces in the wild with generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 21–30.
  27. Liu, S.; Huang, D. Receptive field block net for accurate and fast object detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 385–400.
  28. Yu, F.; Koltun, V.; Funkhouser, T. Dilated residual networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 472–480.
  29. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125.
  30. Gómez-Moreno, H.; Maldonado-Bascón, S.; Gil-Jiménez, P.; Lafuente-Arroyo, S. Goal evaluation of segmentation algorithms for traffic sign recognition. IEEE Trans. Intell. Transp. Syst. 2010, 11, 917–930.
  31. Salti, S.; Petrelli, A.; Tombari, F.; Fioraio, N.; Di Stefano, L. Traffic sign detection via interest region extraction. Pattern Recognit. 2015, 48, 1039–1049.
  32. Fang, C.Y.; Chen, S.W.; Fuh, C.S. Road-sign detection and tracking. IEEE Trans. Veh. Technol. 2003, 52, 1329–1341.
  33. Barnes, N.; Zelinsky, A.; Fletcher, L.S. Real-time speed sign detection using the radial symmetry detector. IEEE Trans. Intell. Transp. Syst. 2008, 9, 322–332.
  34. Møgelmose, A.; Liu, D.; Trivedi, M.M. Detection of US traffic signs. IEEE Trans. Intell. Transp. Syst. 2015, 16, 3116–3125.
  35. Zhang, J.; Xie, Z.; Sun, J.; Zou, X.; Wang, J. A cascaded R-CNN with multiscale attention and imbalanced samples for traffic sign detection. IEEE Access 2020, 8, 29742–29754.
  36. Sun, C.; Wen, M.; Zhang, K.; Meng, P.; Cui, R. Traffic sign detection algorithm based on feature expression enhancement. Multimed. Tools Appl. 2021, 80, 33593–33614.
  37. Liu, Y.; Shi, G.; Li, Y.; Zhao, Z. M-YOLO: Traffic sign detection algorithm applicable to complex scenarios. Symmetry 2022, 14, 952.