Enhanced Traffic Sign Recognition with Ensemble Learning

Enhanced Traffic Sign Recognition with Ensemble Learning: Comparison

Please note this is a comparison between Version 1 by Xin Roy Lim and Version 2 by Camila Xu.

Traffic sign recognition plays a crucial role in the functioning of autonomous vehicles. The ability to accurately identify and interpret traffic signs is necessary for autonomous vehicles to navigate roads safely and efficiently. Machine learning techniques are used to train and test models on traffic sign data, including prohibitory, danger, mandatory, and other signs.

traffic sign recognition
convolutional neural network
ensemble learning

1. Introduction

The technology market continues to grow, as consumers demand new and innovative products. Leading technology companies, such as Microsoft, Tesla, and Ford, are investing in the development of autonomous vehicles. However, the recent rise in the number of reported car crashes involving autonomous vehicles, highlights the need for advanced and accurate machine learning algorithms.

Traffic sign recognition plays a crucial role in the functioning of autonomous vehicles ^[1][2][3][1,2,3]. The ability to accurately identify and interpret traffic signs is necessary for autonomous vehicles to navigate roads safely and efficiently. Machine learning techniques are used to train and test models on traffic sign data, including prohibitory, danger, mandatory, and other signs. The goal of these models is to achieve a high level of accuracy in recognizing traffic signs, which will contribute to the development of “smarter autonomous cars” and improve driver safety through advanced driver alert systems.

2. Enhanced Traffic Sign Recognition with Ensemble Learning

In Siniosoglou et al. (2021) [4], a deep autoencoder algorithm was proposed, to detect and recognize traffic signs. The authors used the Carla Traffic Signs Recognition Dataset (CATERED), with 43 classes, for recognition. The dataset contains 94478 traffic signs images. The testing phase was split into two scenarios, which were a centralized detection system and a decentralized system. In both scenarios, the proposed method achieved over 90.00% accuracy. The highest accuracy obtained from the centralized detection system was 99.19%, while the highest accuracy obtained from the decentralized system was 94.19%. Kerim and Efe (2021) [5] introduced an artificial neural network (ANN), combining various models. The authors rearranged and combined the GTSRB and TSRD datasets due to imbalance in traffic signs classes. The new dataset contained 10 classes, arbitrarily. Data augmentation such as translation, rotation, noising, blurring, etc., were used in the proposed method. The authors constructed two experiments, one with the HOG feature and the other one with the combination of color, HOG, and LBP features. In the first experiment, the method achieved a lower accuracy, at 80%, while the second experiment achieved 95% accuracy. The proposed method showed that using color, HOG, and LBP features, significantly enhanced the accuracy of classification. In Li et al. (2019) [6], traffic sign recognition using CNN was proposed. There were two datasets used in this paper, which were GTSRB and BTSD, with more than 50,000 and 7000 images, respectively. The hyperparameter settings used in the experiment were gamma set to 0.1, learning rate of 0.001, with step values of 24,000 and 48,000, for 60,000 iterations. The results obtained from the proposed method were 97.4% accuracy for GTSRB and 98.1% accuracy for BTSD. From the experiment, it showed that the proposed method improved the existing results. Yazdan and Varshosaz (2021) [7] introduced a different method to recognize traffic signs. The proposed method utilized a minimal amount of common images to recognize signs. The suggested solution created a new orthogonal picture of the traffic sign, eliminated the need for numerous images in the training database, and identified the traffic sign. The created image was put through a template matching procedure and compared to an image database that holds a single photograph, that was shot in front of each sign. The orthoimage was created from stereo pictures for this purpose. Therefore, the orthoimage was employed, rather than comparing the image obtained from the urban roadways to the database. The performance achieved from this method was 93.1% accuracy. Bangquan et al. (2019) [8] presented a traffic sign recognition method using an efficient convolutional neural network (ENet). The dataset used in the experiment was GTSRB, with 43 classes of traffic sign, and split into 39,209 images for the training set and 12,630 for the test set. The training process of ENet, applied the Adam optimizer with the soft max cross-entropy loss function. Two pre-trained models of CNN, i.e., VGG16 and LeNet, were used in the network. ENet with the LeNet algorithm achieved an accuracy of 98.6%, whereas VGG16 achieved 96.7% accuracy using GTSRB. Mehta et al. (2019) [9] came up with a deep CNN method for traffic sign classification. Several activation functions, optimizers, and dropout rate were tuned in the experiment. The dataset used was BTSD, which was retrieved from the video clips. Convolutional neural networks with the Adam optimizer, a dropout rate of 0.3, and softmax activation functions, achieved the highest accuracy, of 97.06%. A lightweight CNN was proposed in Zhang et al. (2020) [10], for traffic sign classification. Two models were tested in this paper, namely teacher network and student network. The datasets used for training and testing were GTSRB and BTSC. The main purpose of the teacher network was to train the network before passing it to the student network, with fewer layers, and to enhance the network’s capacity to identify traffic signs. The result, after pruning, obtained from GTSRB was 99.38% accuracy, while from BTSC was 98.89% accuracy. In Jonah and Orike (2021) [11], there were several CNNs trained and tested for traffic sign recognition. The models used were VGG16, ResNet50, and the proposed CNN. The dataset used in the paper was GTSRB, with 43 traffic signs classes, of 34,799 training images, 4410 validation images, and 12,630 testing images. The results showed that VGG16 achieved 95.5% accuracy, ResNet50 achieved 95.4% accuracy, and the proposed CNN model achieved 96.0% accuracy, which was the best performance among the others. Vincent et al. (2020) [12] presented a traffic sign recognition with CNN on the GTSRB dataset. The dataset contained 34,799 training images, 4410 validation images, and 12,630 testing images. The proposed CNN model consisted of four fully connected layers, four convolutional layers, two pooling layers, and one flattening layer. The proposed method achieved 98.44% accuracy for the GTSRB dataset. In Madan et al. (2019) [13], a method using a hybrid combination of histogram of gradients (HOG) features and speed up robust features (SURF), with a CNN classifier, was proposed to classify traffic signs. The only dataset used in the paper was the GTSRB dataset, which contains 39,029 training images. The HOG features with SURF added, were directly supplied to the CNN classifier. The accuracy of the suggested pipeline, employing the fundamental design, was 98.07%. By employing a branching CNN architecture, the method’s performance was improved, with a higher accuracy, of 98.48%. Serna and Ruichek (2018) [14] used several CNN models to perform traffic sign recognition. The traffic sign datasets used were the European Traffic Sign Dataset (ETSD), which was self-collected by the authors, and also GTSRB. Traffic data from six European nations, namely Belgium, Croatia, France, Germany, the Netherlands, and Sweden made up the dataset. The models used in this paper were LeNet-5, IDSIA, URV, CNN asymmetricK, and CNN 8-layers. The results demonstrated that the CNN asymmetricK and CNN 8-layers achieved almost the same accuracy for both datasets, but CNN 8-layers achieved a slightly higher accuracy, of 99.37% accuracy for GTSRB and 98.99% accuracy for ETSD. A combined CNN (CCNN) was used in Chen et al. (2017) [15], to solve traffic sign recognition, where two CNNs with a basic network were used to determine the probability of the superclass and subclass of the traffic signs. Based on the color, form, and function of the sign, the 43 subclasses of traffic signs were classified into five superclasses, namely red circular prohibitory signs, red triangular danger signs, blue circular mandatory signs, black circular derestriction signs, and other signs. The results retrieved from both models were 97.96% accuracy and 98.26% accuracy. The experiments also proved that CCNN with data augmentation performed better. Zheng and Jiang (2022) [16] proposed traffic sign recognition with several CNN models and vision transformer (ViT) models. Three datasets were used in the paper: GTSRB, Indian Cautionary Traffic Sign, and TSRD (Chinese Traffic Sign Detection Benchmark), which contain 43, 15, and 103 traffic sign classes, respectively. The CNN models used in the experiment were VGG16, ResNet, DenseNet, MobileNet, SqueezeNet, ShuffleNet, and MnasNet. Whereas, the ViT used Real-Former, Sinkhorn Transformer, Nyströmformer, and Transformer in Transformer (TNT). The result obtained by the CNN for GTSRB was 98.82% accuracy with the DenseNet model, 99.11% accuracy with ShuffleNet, for the Indian dataset, and 99.42% accuracy with DenseNet, for the Chinese dataset. On the other hand, ViT achieved 86.03% accuracy for GTSRB with RealFormer, 97.10% accuracy for the Indian dataset without any ViT models, and 95.05% accuracy for the Chinese dataset with TNT. The experimental results suggested that transformers are less competitive than CNNs in the task of classifying traffic signs. Another CNN model was presented in Usha et al. (2021) [17] for traffic sign recognition. The dataset used was the GTSRB dataset, which consists of 43 classes, with a total of 39,209 images. Convolution, pooling, and drop out layers made up the proposed CNN architecture. The characteristics from the data were retrieved at each layer, to aid in categorizing the image. The training of the model was executed for only 15 epochs and achieved an accuracy of 97.8%. Fang et al. (2022) [18] introduced a method for traffic sign recognition with MicronNet-BN-Factorization (MicronNet-BF). The dataset used was the GTSRB dataset, with 43 classes, which contains 39,209 training images and 12,630 testing images. Several datasets, such as BTSC, MNIST, SVHN, Cifar10, and Cifar100, were used to compare the results of the proposed methods. A small deep neural network called MicronNet, was suggested for embedded devices to classify traffic signs. The enhanced MicronNet-BF, that fused batch normalization, factorization, and MicronNet, obtained the best result on the GTSRB, which was 99.38% accuracy, using only 1.41 s. Fu and Wang (2021) [19] introduced a method using prototypes of traffic signs, with the pairing of a multi-scale convolutional network (MSCN) and a multi-column deep neural network (MCDNN). There were two datasets used in the experiment, where TSRD was applied for the prototype and GTSRB was used as the testing dataset. MSCNs were trained using pre-training datasets that had undergone different pre-processing. The proposed method achieved an accuracy of 90.13%. Aziz and Youssef (2018) [20] proposed traffic sign recognition using feature extraction and an extreme learning machine (ELM), for classification. The two datasets used in the experiment were the GTSRB and BTSC datasets. There were three feature extraction techniques used, which were HOG, compound local binary patterns (CLBP), and Gabor features. The features were subsequently passed into ELM for classification. The accuracies obtained from the proposed methods were 99.10% for GTSRB and 98.30% for BTSC. It was proved that the proposed method performed better than SVM and KNN. Soni et al. (2019) [21] proposed a method using HOG and LBP features, together with PCA and SVM. TSRD, which consists of 6164 images, with 58 classes, where 4170 are training images and 1994 are testing images, was utilized. There were three main categories of traffic sign identified, which were forbidden sign with red circular shape, mandatory sign with blue circular shape, and warning sign with black triangular shape. For the traffic sign recognition, HOG and LBP were applied to extract the features of each traffic sign. The experiments were conducted with four methods, namely, HOG with SVM, HOG with PCA and SVM, LBP with SVM, and LBP with PCA and SVM. The best performing method was LBP with PCA and SVM classifier, where it achieved 84.44% accuracy. Table 1 provides a summary of the existing works in traffic sign recognition.

Table 1.

Summary of the existing literature on traffic sign recognition.

Author	Algorithm	Dataset	Accuracy (%)
Siniosoglou et al. (2021) [4]	Deep autoencoder	CATERED	99.19
Kerim and Efe (2021) [5]	ANN	GTSRB	95
Li et al. (2019) [6]	CNN	GTSRB	97.4
		BTSD	98.1
Yazdan and Varshosaz (2021) [7]	Normalized cross-correlation (NCC)	BTSD	93.10
Bangquan et al. (2019) [8]	LeNet	GTSRB	98.6
	VGG16		96.7
Mehta et al. (2019) [9]	CNN	BTSD	97.06
Zhang et al. (2020) [10]	CNN	GTSRB	99.38
		BTSC	98.89
Jonah and Orike (2021) [11]	VGG16	GTSRB	95.5
	ResNet50		95.4
	CNN		96.0
Vincent et al. (2020) [12]	CNN	GTSRB	98.44
Madan et al. (2019) [13]	Basic CNN	GTSRB	98.07
	Branching CNN		98.48
Serna and Ruichek (2018) [14]	CNN	GTSRB	99.37
		ETSD	98.99
Chen et al. (2017) [15]	MCNN	GTSRB	97.96
	MCNN		98.26
Zheng and Jiang (2022) [16]	DenseNet	GTSRB	98.82
		CCTSDB	99.42
	ShuffleNet	ICTS	99.11
	RealFormer	GTSRB	86.03
	TNT	CCTSDB	95.05
Usha et al. (2021) [17]	CNN	GTSRB	97.80
Fang et al. (2022) [18]	MicronNet-BF	GTSRB	99.38
Fu and Wang (2021) [19]	MSCN + MCDNN	TSRD (train), GTSRB (test)	90.13
Aziz and Youssef (2018) [20]	HOG, CLBP, Gabor, ELM	GTSRB	99.10
		BTSC	98.30
Soni et al. (2019) [21]	LBP, HOG, PCA, SVM	TSRD (Chinese)	84.44