
Kowalczyk, P.; Izydorczyk, J. Bounding Box Based Perception Modules. Encyclopedia. Available online: https://encyclopedia.pub/entry/22276 (accessed on 01 September 2024).
Bounding Box Based Perception Modules

Perception modules use raw data streams obtained from sensors mounted on the car, such as camera, radar, or lidar devices, to recognize and interpret the surroundings. The raw data collected by the sensors must be properly interpreted and processed to be understood by the computer. This type of analysis is carried out by algorithms supported mainly by trained neural networks (detectors). Most perception modules based on computer vision use bounding boxes to mark the recognized objects in each frame of the video stream.

Keywords: quality metrics; image detection; bounding box

1. Perception Module

Perception modules use raw data streams obtained from sensors mounted on the car, such as camera, radar, or lidar devices, to recognize and interpret the surroundings. The raw data collected by the sensors must be properly interpreted and processed to be understood by the computer. This type of analysis is carried out by algorithms supported mainly by trained neural networks (detectors). Most perception modules based on computer vision use bounding boxes to mark the recognized objects in each frame of the video stream. The bounding boxes studied here are rectangles whose sides are parallel to the sides of the frame, so each box can be stored as the four coordinates of two opposite corners. Each perception module in a vehicle is specialized in a particular task. Bounding boxes are used to mark objects such as pedestrians, other vehicles, their lights (separately), road signs, speed bumps, and traffic lights in streams of video data [1][2][3][4][5][6]. Based on this description, the vehicle steering system can make decisions in accordance with pre-programmed protocols and logic called ADAS (Advanced Driver Assistance Systems). It is therefore fundamentally important that this description be adequate, detailed, and reliable, to secure the basis on which the decisions are made.
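Since the boxes are axis-aligned, two opposite corners fully determine each one. A minimal sketch of such a representation (the class and field names are illustrative, not taken from the entry):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BoundingBox:
    # Axis-aligned rectangle stored as two opposite corners,
    # in image coordinates (origin at the top-left of the frame).
    x1: float  # left edge
    y1: float  # top edge
    x2: float  # right edge
    y2: float  # bottom edge

    @property
    def area(self) -> float:
        # Degenerate (inverted or empty) boxes are given zero area.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)
```

Storing corners rather than, say, center and size makes intersection tests simple coordinate comparisons, which matters when millions of frames are evaluated.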

2. Testing and Verification of Car Perception

In the process of developing perception modules, it is important to check the quality of their interpretation of sensor data. Such verification must be carried out regularly whenever essential changes are introduced into their algorithms. The data is gathered during the original collection process, which involves a fleet of test cars with sensors mounted on board and logging machines that save the raw streams (for example, video from a camera mounted at the front of the vehicle). When such data is returned to the laboratory, it can be reused in the resimulation process: the coordinated reconstruction of the time-ordered stream of sensor information, which is then fed into the input of the perception module version currently under test. In this way, perception module results for the recorded scenes are obtained. Those results consist of bounding boxes describing different elements of the surroundings recorded in the frames coming from the camera.
In order to train detectors and verify their effects, it is first necessary to describe exactly what in the collected sensor data should be found and interpreted by the perception modules. The reference against which detector results can be compared is called the ground truth (GT). To create this reference, the raw video data must be labeled manually, i.e., a description of the expected perception-module output is created frame by frame. This is handled by appropriately trained staff who analyze the collected sensor data and label it according to predefined principles; the work is laborious and time-consuming. Based on this additional data, it is finally possible to calculate the quality of the results obtained from the perception module, which in a broad context enables development and the evaluation of effectiveness in real conditions.
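A widely used baseline for comparing a detector box against its GT counterpart on a single frame is intersection over union (IoU). The entry develops its own class-specific measures, so the sketch below is only the standard reference point, with boxes given as `(x1, y1, x2, y2)` tuples:

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Overlap region: max of the lower corners, min of the upper corners.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

IoU ranges from 0 (disjoint) to 1 (identical), so a fixed threshold on it is the simplest possible definition of a correct recognition.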
To reduce the time and hardware complexity of such analysis, a high degree of automation is required; a well-designed algorithm also ensures reproducibility of the evaluation results. Research on the development of smart-vehicle perception is related to human safety and aims to minimize the number of road accidents and the damage they cause. It is therefore necessary to create a methodology that reliably and objectively assesses the quality of prototypes and enables quick and effective problem localization [7]. Requirements such as SOTIF [8] (Safety Of The Intended Functionality) are created so that all car manufacturers, researchers, and lawmakers can use a universal set of requirements, recommendations, and good practices. This document underlines the need to confirm the effectiveness of ADAS in different situations for all functionalities those systems provide.

3. Evaluation Methodology

The huge amount of work and resources devoted to acquiring, storing, and preparing these data motivates the creation of an evaluation methodology that makes the best use of them. To ensure an efficient development, testing, and validation process for perception modules, one must evaluate how well the system output depicts the stream of ground truth. The task therefore requires a methodology for comparing two rectangles, as well as sequences of them, that provides information relevant to the context of a module's specialization. The amount of data to be analyzed demands full automation and repeatability of the process. The methodology must also be clearly decisive, so it is important to design approaches that allow the synthesis of detailed conclusions and exploit the potential of the collected data. The goal is a novel evaluation methodology for perception modules used in automotive vehicles. The designed solutions should help engineers estimate the quality of detectors working on video data from a camera mounted at the front of a moving vehicle. The methodology is well suited to comparing GT and detector output in the form of bounding boxes. It provides tools to assess local quality in separate pictures and to summarize results over a sequence of frames (tracking quality), while respecting the need for a quick response in traffic conditions on the road. It can serve as a precise definition of correct recognition and highlights the fact that various types of objects, although all described by rectangles, require their own special approach to evaluation. This is achieved by focusing the measures on different aspects of the rectangles, which allows specific information about the comparison to be filtered out and assigned an appropriate meaning for the whole analysis.
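One way to focus measures on different aspects of the rectangles is to score each edge separately, so a class-specific evaluation can weight the aspects that matter. The sketch below is an illustrative assumption, not the entry's actual measure; the class-specific remarks in the docstring are likewise examples:

```python
def edge_errors(gt, det):
    """Per-edge deviation between a GT box and a detected box,
    normalized by the GT width or height.

    Returning edge-level errors (rather than one overlap score) lets an
    evaluation weight aspects per object class, e.g. the bottom edge for
    pedestrians (road contact point) or horizontal extent for vehicles.
    Boxes are (x1, y1, x2, y2) tuples with x2 > x1 and y2 > y1.
    """
    w = gt[2] - gt[0]  # GT width, normalizes horizontal errors
    h = gt[3] - gt[1]  # GT height, normalizes vertical errors
    return {
        "left":   abs(det[0] - gt[0]) / w,
        "top":    abs(det[1] - gt[1]) / h,
        "right":  abs(det[2] - gt[2]) / w,
        "bottom": abs(det[3] - gt[3]) / h,
    }
```

A class-specific quality score could then be a weighted combination of these four errors, with the weight vector acting as one of the calibration parameters.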
A quality measure can successfully serve as the basis for a matching algorithm, but beyond defining true positives it should be treated as a metric that is directly interpreted and passed to higher levels of the evaluation chain: bounding box sequence analysis. To help with the interpretation of results, ways to visualize the output were proposed, both for the quality of the GT representation and for alerts of false positives, which are a natural consequence of the matching algorithm. The methodology was presented separately for the following example object classes: pedestrians, moving vehicles, traffic lights, and signs. To adapt to different classes of objects and to different applications in the evaluation process (matching, quality summary, combining sequences of false positive results), the methodology has to rely on parameters. The calibration process, i.e., the meaning of all parameters and their influence on the final results, is described. [1][2][3][4][5][6][7][8]
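Using a quality measure as the basis of matching, with a threshold defining true positives, can be sketched as a greedy assignment. This is an illustrative sketch under assumed conventions (the entry's actual matching algorithm and parameters may differ); `quality` is any symmetric-in-role score such as IoU:

```python
def greedy_match(gt_boxes, det_boxes, quality, threshold=0.5):
    """Greedily match detections to ground-truth boxes by a quality measure.

    Pairs are taken in order of decreasing quality; each GT box and each
    detection is used at most once. Matches at or above the threshold are
    true positives (tp), unmatched detections are false positives (fp),
    and unmatched GT boxes are false negatives (fn).
    """
    pairs = sorted(
        ((quality(g, d), gi, di)
         for gi, g in enumerate(gt_boxes)
         for di, d in enumerate(det_boxes)),
        reverse=True,
    )
    used_gt, used_det, tp = set(), set(), []
    for q, gi, di in pairs:
        if q < threshold:
            break  # remaining pairs are sorted, so all are below threshold
        if gi in used_gt or di in used_det:
            continue  # one-to-one matching only
        used_gt.add(gi)
        used_det.add(di)
        tp.append((gi, di, q))
    fp = [di for di in range(len(det_boxes)) if di not in used_det]
    fn = [gi for gi in range(len(gt_boxes)) if gi not in used_gt]
    return tp, fp, fn
```

The retained quality values in `tp` are exactly the per-frame scores that can then be passed upward to sequence-level (tracking) analysis rather than being discarded after matching.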

References

  1. Jiang, S.; Liang, S.; Chen, C.; Zhu, Y.; Li, X. Class Agnostic Image Common Object Detection. IEEE Trans. Image Process. 2019, 28, 2836–2846.
  2. Kim, H.; Kim, C. Locator-Checker-Scaler Object Tracking Using Spatially Ordered and Weighted Patch Descriptor. IEEE Trans. Image Process. 2017, 26, 3817–3830.
  3. Long, Y.; Gong, Y.; Xiao, Z.; Liu, Q. Accurate Object Localization in Remote Sensing Images Based on Convolutional Neural Networks. IEEE Trans. Geosci. Remote Sens. 2017, 55, 2486–2498.
  4. Ren, W.; Huang, K.; Tao, D.; Tan, T. Weakly Supervised Large Scale Object Localization with Multiple Instance Learning and Bag Splitting. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 405–416.
  5. Zeng, X.; Ouyang, W.; Yan, J.; Li, H.; Xiao, T.; Wang, K.; Liu, Y.; Zhou, Y.; Yang, B.; Wang, Z.; et al. Crafting GBD-Net for Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 2109–2123.
  6. Zhang, X.; Cheng, L.; Li, B.; Hu, H. Too Far to See? Not Really!—Pedestrian Detection With Scale-Aware Localization Policy. IEEE Trans. Image Process. 2018, 27, 3703–3715.
  7. Uřičář, M.; Hurych, D.; Krizek, P.; Yogamani, S. Challenges in Designing Datasets and Validation for Autonomous Driving. arXiv 2019, arXiv:1901.09270.
  8. ISO/PAS 21448:2019; Road Vehicles—Safety of the Intended Functionality. International Organization for Standardization: Geneva, Switzerland, 2019.
Update Date: 27 Apr 2022