Computer Vision Applications in Intelligent Transportation Systems


Subjects: Transportation Science & Technology

As technology continues to develop, computer vision (CV) applications are becoming increasingly widespread in the intelligent transportation systems (ITS) context. These applications are developed to improve the efficiency of transportation systems, increase their level of intelligence, and enhance traffic safety. Advances in CV play an important role in solving problems in the fields of traffic monitoring and control, incident detection and management, road usage pricing, and road condition monitoring, among many others, by providing more effective methods.

intelligent transportation systems
computer vision
automatic number plate recognition (ANPR)
traffic sign detection
vehicle detection
pedestrian detection
lane line detection
obstacle detection
anomaly detection
structural damage detection
autonomous vehi

1. Introduction

Smart city technologies are an important element of effectively managing the rapid industrialization of the world today, as they can help to address the economic and environmental problems resulting from the increase in urban populations. Smart cities, which integrate traditional infrastructure and public services with technology to create a more efficient, sustainable, and accessible system while meeting the needs of city residents, also transform the traditional understanding of city management. Intelligent transportation systems (ITS), which are among the key components of smart cities, are developed to improve transportation safety and mobility, reduce environmental impact, promote sustainable transportation development, and enhance productivity [1].

ITSs offer modern solutions to transportation-related problems, such as traffic jams and accidents, and help to ensure the safety of road users by utilizing data collected from surrounding vehicles, infrastructure, and other networks. ITS applications exist in a variety of forms, including collaborative highway maneuvers, sharing road safety information, optimization of traffic signals, and autonomous driving [2]. ITS, which can be defined as integrated transportation management systems consisting of advanced data communication, information processing, and traffic management technologies, can instantly process real-time data collected from heterogeneous sources and analyze it to support better decision making [3].

Decisions that were formerly made based on human experience can now be made using computers by digitizing information. Moreover, predictions and forecasting can also be improved through the use of new-generation artificial intelligence (AI) algorithms. Thanks to AI technologies, it is possible to develop systems that can make decisions based on data. These technologies have also led to radical changes in many areas, including public transportation and transportation systems, and have helped to make different modes of transportation safer, greener, smarter, and more efficient [4]. Yuan et al. [5] divide AI applications in the field of ITS into three main categories, namely (i) detection/recognition, (ii) prediction, and (iii) management. Machine learning (ML) methods, a sub-branch of AI, act as the brain function of ITS and determine the accuracy, reliability, and smartness of the systems. In particular, in recent years, it has been observed that deep learning (DL) methods, which are a subset of ML methods, are being effectively utilized in classification and prediction works in different areas of ITS [3].

Computer vision (CV) is an AI field that enables machines to derive meaningful information from digital images, videos, and other visual inputs, as well as to act based on this information [6]. CV, in which both ML and DL methods are used, addresses image and video processing problems and offers solutions that can be used in the process of automating transportation systems and making them safer. CV techniques are actively used in various ITS applications, such as automatic license plate detection and recognition, traffic sign detection and recognition, vehicle detection and classification, pedestrian detection, obstacle and lane line detection, anomaly detection in video surveillance cameras, vehicle and passenger tracking, structural damage detection, and autonomous vehicle applications. CV methods are appealing in these applications largely due to their cost-effectiveness, as well as the wide range of applications that CV can support [7].

CV methods used in ITS are categorized and examined under 10 headings, as shown in Figure 1.

Figure 1. Computer Vision Applications in Intelligent Transportation Systems.

The main contributions of this survey are as follows:
CV applications in the field of ITS, along with the methods used, datasets, performance evaluation criteria, and success rates, are examined in a holistic and comprehensive way.
The problems and application areas addressed by CV applications in ITS are investigated.
The potential effects of CV studies on the transportation sector are evaluated.
The applicability, contributions, shortcomings, challenges, future research areas, and trends of CV applications in ITS are summarized.
Suggestions are made that will aid in improving the efficiency and effectiveness of transportation systems, increasing their safety levels, and making them smarter through CV studies in the future.
This research surveys over 300 studies that shed light on the development of CV techniques in the field of ITS. These studies have been published in journals listed in top electronic libraries and presented at leading conferences. The survey further presents recent academic papers and review articles that can be consulted by researchers aiming to conduct detailed analysis of the categories of CV applications.
It is believed that this survey can provide useful insights for researchers working on the potential effects of CV techniques, the automation of transportation systems, and the improvement of the efficiency and safety of ITS.

2. Computer Vision Studies in the Field of ITS

2.1. Evolution of Computer Vision Studies

While there are many methods used in CV studies in the literature, the methods most commonly used in the field of ITS are summarized in the following sections.

2.1.1. Handcrafted Techniques

Early CV researchers focused primarily on the use of different handcrafted spatiotemporal features and traditional image-processing methods [8]. Handcrafted features are computed by manually designed algorithms using the information present in the image itself, rather than being learned from data. These features have been widely used in previous works using traditional ML approaches for object recognition.

Deformable part-based models, integral channel features (ICF), aggregated channel features (ACF), histograms of oriented gradients (HOG), local binary patterns (LBPs), scale-invariant feature transform (SIFT), Gabor filters, local ternary patterns (LTPs), local phase quantization (LPQ), rotation-invariant co-occurrence local binary patterns, completed local binary patterns, rotated local binary pattern images, and globally rotation-invariant multi-scale co-occurrence local binary patterns are among the handcrafted techniques that were used to extract features from images in previous studies [9]. Newer approaches, such as convolutional neural networks (CNNs), do not require such handcrafted features, as they can learn features from the image data.
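As an illustration of what a handcrafted feature looks like in practice, the following is a minimal sketch of the basic local binary pattern (LBP) operator, assuming a grayscale image stored as a nested list. The function names `lbp_code` and `lbp_histogram` are hypothetical, and the neighbour ordering is one common convention rather than the one used in any particular cited study.

```python
def lbp_code(patch):
    """Return the 8-bit LBP code of a 3x3 grayscale patch (list of 3 rows)."""
    center = patch[1][1]
    # Neighbours visited clockwise, starting at the top-left pixel (assumed order).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        # A neighbour at least as bright as the centre contributes a 1 bit.
        if patch[r][c] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Histogram of LBP codes over all interior pixels of a 2D image."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    return hist
```

The resulting 256-bin histogram is the kind of fixed, hand-designed feature vector that would then be passed to a traditional classifier such as an SVM.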

2.1.2. Machine Learning and Deep Learning Methods

Machine learning, one of the most prominent subfields of AI, deals with the design and creation of algorithms for the recognition of complex patterns and decision making based on experimental data [10]. Problems handled with ML methods can be broadly categorized into (i) supervised, (ii) unsupervised, and (iii) reinforcement learning. In supervised learning, the goal is to estimate an output by taking feature vectors as inputs. Here, the ML algorithm fits a model that maps the input values to the output values, and the model then attempts to estimate the output for test data that it has never seen before. If the model assigns the input data to discrete categories, the task is classification; if it estimates continuous values from the input values, the task is regression. For both problems, the data must be labeled beforehand. The most frequently used algorithms for classification are the support vector machine (SVM), ensemble learning, k-nearest neighbors, and random forest (RF). Support vector regression and Gaussian process regression models are used in the literature for regression. Supervised learning models have been used for the classification of vehicles [11,12], classification of traffic lights [13], recognition and classification of license plate characters [14,15,16,17], detection of traffic signs [18], detection of pedestrians [19,20], etc.
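To make the supervised classification setting concrete, here is a minimal sketch of a k-nearest neighbors classifier, one of the algorithms listed above. The feature vectors (vehicle length and height in metres) and the labels are made-up illustrative values, not data from any cited study, and `knn_predict` is a hypothetical helper name.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest
    labeled training points (Euclidean distance)."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled data: (length_m, height_m) -> vehicle class.
points = [(4.2, 1.5), (4.5, 1.4), (4.0, 1.6), (9.0, 3.5), (10.5, 3.8), (8.5, 3.2)]
labels = ["car", "car", "car", "truck", "truck", "truck"]
```

Calling `knn_predict(points, labels, (4.3, 1.5))` classifies an unseen vehicle from its labeled neighbours, which is the "estimate the output of unseen test data" step described above.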

Since assigning labels to millions of data points is a laborious and inefficient process, unlabeled data can be grouped through the use of unsupervised learning algorithms. Using different mathematical infrastructures, these algorithms classify data according to their own criteria. Among the unsupervised learning algorithms, methods such as k-means, density-based spatial clustering of applications with noise (DBSCAN), and the Gaussian Mixture Model (GMM) are used to identify groups and clusters. Unsupervised learning models have been used for the recognition of license plates [21], detection of obstacles [22], detection of road cracks [23], etc.
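The clustering idea can be sketched with a small k-means implementation. This is a simplified illustration under stated assumptions: the centroids are initialized from the first k points so that the run is reproducible (practical implementations usually use random or k-means++ initialization), and the function name `kmeans` and the sample points are hypothetical.

```python
import math


def kmeans(points, k, iterations=20):
    """Group unlabeled points into k clusters by alternating between
    assigning points to the nearest centroid and recomputing centroids."""
    centroids = [tuple(p) for p in points[:k]]  # assumed deterministic init
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        assignments = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignments
```

No labels are supplied at any point; the grouping emerges only from the distances between the points.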

Based on the idea that there may be no available training data in some cases, reinforcement learning models have been developed, inspired by the knowledge acquisition processes of infants. These algorithms utilize a type of learning that tries to find the steps that a subject (a robot, an autonomous vehicle, etc.) must perform in order to receive the highest reward in the environment. Subjects working according to the reward–punishment mechanism perform actions in an attempt to understand the environment. After a range of these actions have been performed, the steps that lead to the highest reward score are saved, and these turn into behaviors. There are studies in the literature in which reinforcement learning methods were used in traffic signal control systems [24], traffic timing applications [25], and for the detection of lane lines [26].
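The reward-driven learning loop described above can be sketched with tabular Q-learning, one common reinforcement learning algorithm (the entry does not say which algorithms the cited studies used). The environment here is a hypothetical five-cell corridor with a reward at the right end, not a traffic signal model, and all names and hyperparameters are illustrative assumptions.

```python
import random

N_STATES, GOAL = 5, 4      # states 0..4 on a line; state 4 is terminal
ACTIONS = (-1, +1)         # action 0 moves left, action 1 moves right


def step(state, action):
    """Deterministic toy environment: reward 1.0 only on reaching GOAL."""
    nxt = min(max(state + action, 0), GOAL)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward


def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            a = rng.randrange(2)                   # purely random exploration
            nxt, reward = step(state, ACTIONS[a])
            target = reward + gamma * max(q[nxt])  # q[GOAL] is never updated
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q


q_table = q_learning()
# Greedy policy: the action with the highest learned value in each state.
policy = [row.index(max(row)) for row in q_table[:GOAL]]
```

After training, the saved action values play the role of the "behaviors" mentioned above: the greedy policy simply picks, in each state, the action that led to the highest expected reward.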

Although traditional ML methods such as SVM [16,19,27], Bayesian networks [28], and the Kalman filter (KF) [29,30] were used in early ITS research [3], the problem-solving capabilities of algorithms have improved over time due to the development of hardware resources and the increasing amount of training data. As can be seen in Figure 2, while the concept of ML was dominant in the years between 1980 and 2010 [31], it was observed that these classical ML algorithms had difficulties processing large amounts of data; in response, artificial neural network (ANN)-based models began to emerge. However, since classical ANN models were also insufficient for processing big data, modern ANN structures were developed, which led to the development of DL models. In classical ML pipelines, feature extraction and classification are performed by separate models, whereas DL models perform both steps within a single neural network.

Figure 2. Evolution of Artificial Intelligence, Machine Learning, and Deep Learning.

2.1.3. Deep Neural Networks (DNNs)

A DNN is an ANN built from multiple layers: an input layer, one or more hidden layers, and an output layer. As a groundbreaking innovation, DNNs have produced satisfactory results on basic tasks such as the classification, detection, and segmentation of objects. Thus, AI technologies have become important in the field of ITS thanks to DNNs.

There are many types of DNN models, which are used for different purposes. For example, deep belief networks (DBNs) have been used for facial recognition [32] and crack detection [33]; stacked auto-encoder (SAE) networks have been used for object detection [34], image compression [35], and video retrieval [36]; restricted Boltzmann machines (RBMs) have been used for face recognition [37]; and YOLO (You Only Look Once)-based DL methods have been utilized in object detection [38] tasks.

2.1.4. Convolutional Neural Networks (CNNs)

In the field of CV, the DNN most widely used to extract features from images is the CNN. In essence, CNNs try to imitate the working principles of the human brain and visual cortex, making use of multiple layers to recognize objects. One of the outstanding strengths of CNNs is their ability to classify objects into thousands of classes. Other advantages of CNNs include their relative robustness to image noise, along with their robustness to rotation and changes in the position of objects in an image. Their biggest disadvantages are their long training time and the need for a large training dataset [39]. The use of graphics cards and parallel processors during training contributes positively to the training and classification time of CNN models.
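The core operation of a CNN layer can be sketched as a small sliding-window convolution followed by a ReLU activation. This is a minimal single-channel illustration, assuming no padding and a stride of 1; in a real CNN the kernel values are learned during training, whereas here a hand-picked vertical-edge kernel is used, and the function names are hypothetical.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and sum the
    element-wise products. As in most DL frameworks, the kernel is not
    flipped, so strictly speaking this is a cross-correlation."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            row.append(sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw)
            ))
        out.append(row)
    return out


def relu(feature_map):
    """Keep positive responses and zero out the rest."""
    return [[max(0, v) for v in row] for row in feature_map]


# A 4x4 image with a dark-to-bright vertical edge, and an edge-detecting kernel.
image = [[0, 0, 1, 1] for _ in range(4)]
kernel = [[-1, 1], [-1, 1]]
feature_map = relu(conv2d_valid(image, kernel))
```

The feature map responds only where the edge lies, regardless of the row, which illustrates how the same small kernel detects a pattern at different positions in the image.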

Variants of CNN networks are widely used in CV studies in the field of ITS. There are a number of CNN-based studies in the literature, such as those focused on automatic license plate recognition [40,41], traffic sign detection and recognition [25,42,43,44,45,46,47,48,49,50,51], vehicle detection [52,53,54,55], pedestrian detection [56,57,58,59,60], lane line detection [61,62,63], obstacle detection [64], video anomaly detection [65,66,67,68], structural damage detection [69,70,71,72,73,74,75,76,77,78], and steering angle detection [79,80,81,82] in autonomous vehicles. The most popular and advanced CNN-based architectures in the literature [83,84] are presented in Figure 3.

Figure 3. CNN Architectures.

2.1.5. Recurrent Neural Networks (RNNs)

RNNs are specially designed for modeling sequence data. The RNN is a powerful DL method, as it can directly learn the mapping between input and output sequences. However, traditional RNNs suffer from the vanishing gradient problem. Long short-term memory (LSTM) networks were developed to solve this problem. An LSTM network is a type of RNN that can learn order dependence in sequence prediction tasks. In LSTM networks, memory cells are designed to maintain their state over time and learn long-term dependencies. RNNs have been used for license plate recognition [85], lane line detection [63], and crack classification [76] tasks, as well as in autonomous vehicle applications [86].

The gated recurrent unit (GRU) is a simplified variant of LSTM that does not maintain a separate memory cell. The GRU is faster to train, while retaining the LSTM's resilience to the vanishing gradient problem.

Convolutional LSTM networks have been used for the detection of anomalies in videos [87,88,89,90,91,92], as well as in autonomous vehicle applications [86,93], while a convolutional GRU network was used for video anomaly detection [94].

2.1.6. Generative Adversarial Networks (GANs)

The GAN is an approach based on generative modeling that uses DL methods to produce high-quality images.

Generative modeling is an unsupervised learning method that involves automatically discovering and learning regularities or patterns in the input data, which the model can then use to generate new examples that could plausibly have been drawn from the original dataset. GANs train generative models using two sub-models, called the generator and the discriminator. The generator is trained indirectly through the discriminator, which is itself an ANN that is updated dynamically and gauges how realistic its input appears. Rather than minimizing the difference from a particular image, the generator learns in an unsupervised manner to fool the discriminator. GANs have been widely used in recent video anomaly detection studies [95,96,97,98,99].
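The interplay between the two sub-models is commonly written as a minimax objective, following the original GAN formulation. The notation below is introduced here for illustration and does not appear in the entry: $G$ is the generator, $D$ the discriminator, $p_{\text{data}}$ the distribution of real images, and $p_z$ the noise distribution from which the generator samples its input.

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator tries to maximize $V$ by scoring real images close to 1 and generated images close to 0, while the generator tries to minimize it by producing images that the discriminator scores as real.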

2.1.7. Other Methods

Hybrid methods combine multiple ML or DL methods within a single CV pipeline. This approach has many intelligent transportation applications, such as license plate recognition [85,100,101], video anomaly detection [68,89,92,102], automatic license plate recognition [25,103], vehicle detection [11,12,53,55], pedestrian detection [58,104], lane line detection [63,105], obstacle detection [106,107,108,109,110], structural damage detection [111,112,113], and autonomous vehicle applications [13,114,115].

Vaswani et al. [116] introduced an encoder–decoder architecture based on attention layers, named the transformer. A transformer neural network takes an input sentence in the form of a sequence of vectors, converts it into a vector called an encoding, and then decodes it back into another sequence. An essential part of the transformer is the attention mechanism, which represents how important other tokens in an input are for the encoding of a given token. Transformers are used for image classification, object detection, and image compression in CV applications. In the field of ITS, they have been used in license plate recognition [85], pedestrian detection [117], and driver distraction detection [118] studies.
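The attention mechanism can be sketched as scaled dot-product attention, the form used in the transformer of [116]. This is a minimal illustration on plain Python lists, assuming a single attention head and no learned projection matrices; the function names and the toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """For each query, weight every value by how well the query matches
    the corresponding key (dot product scaled by sqrt of key dimension)."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        w = softmax(scores)
        weights.append(w)
        outputs.append([
            sum(wi * v[j] for wi, v in zip(w, values))
            for j in range(len(values[0]))
        ])
    return outputs, weights
```

Each row of `weights` sums to one and expresses how important every other token is for the encoding of the token that issued the query.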

2.2. Computer Vision Functions

Of the data generated in the field of ITS, visual data are among the most voluminous. CV studies enable the analysis of both images and videos and provide detailed information about the traffic situation. Figure 4 presents some of the basic functions performed by CV techniques in the field of ITS. As can be seen from the figure, CV methods play a significant role in performing basic functions such as (i) classification, (ii) object detection, (iii) semantic segmentation, and (iv) instance segmentation [119].

Figure 4. Basic Functions Performed by Computer Vision Techniques in the Field of ITS.

Object classification can be performed by using CV techniques to process the image or video data obtained by the cameras. A label can be assigned automatically to each sub-object in the image. To achieve this, the objects are divided into parts and given to the model.

Another function performed using CV techniques is object detection. The detection of traffic objects such as vehicles and pedestrians in an image plays a vital role in the development of many applications. Important functions, such as detecting traffic density, detecting pedestrians that suddenly appear on the road, or detecting the locations of other vehicles for autonomous vehicles, can be performed with DL-based object detection models. The main feature that distinguishes object detection from classification is that, in addition to classifying each relevant object in the image, a detection model determines the coordinates of the region in which the object is located. AI models of this kind therefore perform both classification and regression. Once the corner coordinates of its bounding box are known, the object can be localized in the image by the machine.
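Because a detector outputs box coordinates, its predictions are usually compared with ground-truth boxes through the intersection over union (IoU) measure, on which detection metrics such as mAP are built. IoU is not named in the text above; the sketch below assumes boxes given as `(x_min, y_min, x_max, y_max)` corner coordinates, and the function name `iou` is hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as
    (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if any.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A predicted box is typically counted as a correct detection when its IoU with a ground-truth box of the same class exceeds a chosen threshold; the threshold value depends on the benchmark.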

In semantic segmentation, every pixel belonging to an object is classified. As can be seen in Figure 4, cars are automatically marked in blue and pedestrians in red by CV techniques. Grouping all pixels of an object and assigning the appropriate class to each is a challenging problem. Semantic segmentation models assign all objects of the same type to a single class, without distinguishing between individual objects. However, vehicles and pedestrians in traffic sometimes need to be identified individually. Under these circumstances, instance segmentation methods are used. The purpose of instance segmentation, like that of semantic segmentation, is to assign classes to pixels. With instance segmentation, however, objects belonging to the same class can be separated from one another, even if they overlap.
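The difference between the two outputs can be sketched with toy masks. The representation below is an assumption made for illustration: an instance mask stores one integer id per pixel (0 for background), and a separate dictionary maps each id to its class. The helper names are hypothetical.

```python
def to_semantic(instance_mask, instance_class):
    """Collapse an instance mask into a semantic mask by replacing every
    instance id with its class name."""
    return [
        [instance_class.get(pixel, "background") for pixel in row]
        for row in instance_mask
    ]


def count_instances(instance_mask, instance_class, cls):
    """Count distinct objects of class `cls`; this is only possible
    because the instance mask keeps objects of the same class apart."""
    ids = {
        pixel for row in instance_mask for pixel in row
        if instance_class.get(pixel) == cls
    }
    return len(ids)


# Two adjacent cars (ids 1 and 2) next to background (id 0).
instance_mask = [[0, 1, 1, 2],
                 [0, 1, 2, 2]]
instance_class = {1: "car", 2: "car"}
semantic_mask = to_semantic(instance_mask, instance_class)
```

In `semantic_mask` the two cars merge into one "car" region, whereas `count_instances` can still report two separate vehicles from the instance mask.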

A framework, adapted from [120], outlining which problems in the field of ITS can be solved with CV techniques is presented in Table 1.

Table 1. Problems Addressed by CV Techniques in the Field of ITS.

Computer Vision Function	Problem and Application Areas	Sample Datasets	Performance Metrics
Object Detection	Problem: Boxing the objects in the image/video and finding their coordinates in the image. Applications: detection of traffic lights; detection and classification of traffic signs; pedestrian detection; detection of vehicle type and vehicle counting	COCO; Cityscapes; ImageNet; LISA; GTSDB (German Traffic Sign Detection Benchmark); Pascal VOC; CIFAR-10/CIFAR-100	mAP (mean average precision); accuracy; precision; recall; AP (average precision); RMSE (root mean squared error)
Object Segmentation	Problem: Classifying the pixels of the objects in the image and thus obtaining the individual masks of the objects. Applications: speed estimation; determination and tracking of road lines; route optimization	COCO; Cityscapes; BDD100K; KITTI; LISA	mAP
Image Enhancement	Problem: Restoring images that have been corrupted by low lighting, haze, rain, and fog. Applications: removal of raindrops on images obtained from camera sensors; bringing low-resolution objects up to high resolution; sharpening blurry images; conversion of fish-eye camera systems to Cartesian coordinate system	REDS	PSNR (peak signal-to-noise ratio)
Object Tracking	Problem: Tracking objects in video. Applications: tracking of pedestrians and vehicles; vehicle speed detection; route extraction	MOT19	MOTA (multiple object tracking accuracy)
Event Identification/Prediction	Problem: Making sense of what happened in the video. Applications: accident recognition/prediction; congestion estimation; detection of dangerous situations and routes	UCF101; Kinetics600	Accuracy; mAP
Anomaly Detection	Problem: Detection of abnormal behavior in transportation systems. Applications: pedestrians/objects suddenly appearing on the road; anomalies that may arise in rail systems; detection of improper driver behaviors (drowsy/drunk driving, text messaging, cell phone use, etc.); detection of traffic rule violations and suspicious vehicles with automatic license plate recognition systems	UCSD Ped1; UCSD Ped2; Avenue; UMN; UCF Real World; Street Scene; CIFAR-10/CIFAR-100; ShanghaiTech	AUC (area under curve); accuracy; mAP
Density Analysis	Problem: Determining the density of pedestrians, passengers, or vehicles. Applications: density analysis in public transport contexts; automatic detection of traffic jams; determination of vehicle density in parking lots; detecting the density of pedestrians in certain locations	Oxford 5K; UCSD; Mall; UCF_CC_50; ShanghaiTech; WorldExpo’10	MAE (mean absolute error); MSE (mean squared error)
Image/Event Search	Problem: Extraction of certain vehicles, pedestrians, or license plates from existing visual archives. Applications: searching of target plates in traffic camera archives for law enforcement units; searching digital archives for people or vehicles for security purposes; detection of similar objects belonging to a certain object category	Oxford 5K; Pascal VOC	Accuracy
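Several of the performance metrics listed in Table 1 have short closed-form definitions. The following sketch implements the standard textbook forms of MAE, MSE, RMSE, PSNR, precision, and recall on flat lists of numbers; the function names are hypothetical, and the default `max_value=255.0` assumes 8-bit images.

```python
import math


def mae(pred, true):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def mse(pred, true):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)


def rmse(pred, true):
    """Root mean squared error."""
    return math.sqrt(mse(pred, true))


def psnr(pred, true, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    error = mse(pred, true)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / error)


def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, and
    false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For example, a crowd-counting model that predicts 10, 12, and 9 pedestrians for three frames that each contain 10 has an MAE of 1.0, while a detector with 8 true positives, 2 false positives, and 8 missed objects has a precision of 0.8 and a recall of 0.5.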

3. Computer Vision Applications in Intelligent Transportation Systems

ITSs have made many contributions to transport systems, including improving transport safety, increasing transport system efficiency, aiding law enforcement, and boosting energy conservation and emissions reduction. CV applications play an important role in this context and are thus of interest to researchers. In the Web of Science (WoS) database, there are more than a thousand studies that have been published in the field of CV in ITS since 2000. Since the field of ITS is multi-disciplinary, it has been observed that these publications extend across multiple scientific publication categories, such as electrical/electronic engineering, computer science, transportation science technology, civil engineering, telecommunications, and automation control systems.

Research into the use of CV methods in road transport systems was presented in [7] and [121], while a comprehensive review of traditional CV techniques for traffic analysis systems, with a particular focus on urban environments, was presented in [122]. However, those studies do not cover the state-of-the-art CV methods developed within the last decade. ML techniques have been used effectively to make transportation systems more efficient, especially in recent years. Current research notes that traditional ML models are now being replaced by new learning techniques and that DL techniques are widely used in ITS. A comprehensive study focusing on the use of DL models to increase the intelligence of transportation systems was presented by Wang et al. in [3]. The authors explored the use of DL models in various transportation applications, including (i) traffic sign recognition, (ii) traffic flow prediction, (iii) traffic speed prediction, and (iv) travel time prediction. The applicability and shortcomings of DL models in the context of ITS, as well as evolving future trends, were also discussed.

This entry is adapted from the peer-reviewed paper 10.3390/s23062938

© Text is available under the terms and conditions of the Creative Commons Attribution (CC BY) license; additional terms may apply.