1. Perception Module
Perception modules use raw data streams from sensors mounted on the car, such as camera, radar or lidar devices, to recognize and interpret the surroundings. Raw data collected by the sensors must be properly processed and interpreted to be understood by a computer. This type of analysis is carried out by algorithms supported mainly by many trained neural networks (detectors). Most perception modules based on computer vision systems use bounding boxes to mark the recognized objects in each separate frame of the video stream. Bounding boxes studied in this work are rectangles with sides parallel to the sides of the frame, and thus they can be stored as the coordinates of two opposite corners (four numbers in total). Each perception module in a vehicle is specialized in a particular task. Bounding boxes are used to mark objects such as pedestrians, other vehicles, their lights (separately), road signs, speed bumps and traffic lights in streams of video data [1,2,3,4,5,6]. Based on this description, the vehicle steering system can make decisions in accordance with pre-programmed protocols and logic referred to as ADAS (Advanced Driver Assistance Systems). It is thus fundamentally important that this description is adequate, detailed and reliable, to secure the basis on which the decisions are made.
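As an illustration of the four-coordinate representation described above, a minimal sketch of such a bounding box could look as follows. The class name `BoundingBox`, the field names and the pixel-coordinate convention (origin at the top-left corner of the frame) are assumptions made for this example and are not taken from any particular perception module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned rectangle stored as two opposite corners.

    Hypothetical convention: pixel coordinates with x_min <= x_max
    and y_min <= y_max.
    """
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def __post_init__(self):
        # Reject corners given in the wrong order.
        if self.x_min > self.x_max or self.y_min > self.y_max:
            raise ValueError("expected x_min <= x_max and y_min <= y_max")

    @property
    def width(self) -> float:
        return self.x_max - self.x_min

    @property
    def height(self) -> float:
        return self.y_max - self.y_min

    @property
    def area(self) -> float:
        return self.width * self.height


# Example: a box spanning x from 10 to 50 and y from 20 to 100.
box = BoundingBox(10, 20, 50, 100)
print(box.width, box.height, box.area)
```

Storing the corners rather than a centre and a size keeps the representation identical for ground truth and detector output, which simplifies later comparison.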
2. Testing and Verification of Car Perception
In the process of developing perception modules, it is important to check the quality of their interpretation of data from sensors. Such verification must be carried out regularly whenever essential changes are introduced into their algorithms. Data collected by the sensors is saved during the original data collection process, which involves a fleet of test cars with sensors mounted on board and logging machines that save the raw data (for example, video from a camera mounted at the front of the vehicle). When such data is returned to the laboratory, it can be reused in the resimulation process, which is the coordinated reconstruction of the time-ordered stream of information from the sensors. Subsequently, this information is fed into the input of the perception module version that is currently being tested. In this way, perception module results for given scenes are obtained. Those results consist of bounding boxes describing different elements of the surroundings recorded in the frames coming from the camera. In order to train detectors and verify their effects, it is necessary to first describe exactly what, in the data collected by the sensors, should be found and interpreted by the perception modules.
The reference against which we can compare the results that come from the detectors is called ground truth (GT). To create this reference, raw video data needs to be labeled manually, which means describing the expected results of the perception modules frame by frame. This is handled by a staff of appropriately trained people who manually analyze the collected sensor data and label it. The labeling is done according to predefined principles, and this work is laborious and time-consuming. Based on these additional data, it is finally possible to calculate the quality of the results obtained from the perception module, which in a broad context enables their development and the evaluation of their effectiveness in real conditions. In order to reduce the time and hardware complexity of such analysis, a high degree of automation is required. A well-designed algorithm provides reproducibility of the evaluation results. Research on the development of smart vehicle perception is related to human safety and is intended to minimize the number of road accidents and the damage they cause. It is therefore necessary to create a methodology that will reliably and objectively assess the quality of prototypes and enable quick and effective problem localization [7]. Requirements such as SOTIF [8] (Safety Of The Intended Functionality) are created so that all car manufacturers, researchers and lawmakers can use a universal set of requirements, recommendations and good practices. This document underlines the need to confirm the effectiveness of ADAS in different situations for all functionalities those systems provide.
3. Evaluation Methodology
The huge amount of work and resources devoted to acquiring, storing and preparing the data motivates the creation of an evaluation methodology that makes the best use of it. There is a need to ensure the efficiency of the development, testing and validation process of perception modules, and to evaluate how well the system output depicts the stream of ground truth. The task therefore involves a methodology for comparing two rectangles, as well as sequences of them, that provides specific information relevant to the context of module specialization. The amount of data that needs to be analyzed introduces the need for full automation and repeatability of such a process. The methodology has to lead to clear decisions; it is therefore important to design approaches that allow detailed conclusions to be synthesized, so that the potential of the collected data is used.
The purpose of this article is to describe a novel evaluation methodology for perception modules used in automotive vehicles. The designed solutions should help engineers estimate the quality of detectors working on video data that comes from a camera mounted at the front of a moving vehicle. The methodology is well suited to comparing GT and detector output in the form of bounding boxes. It provides tools to assess local quality in separate pictures and to summarize results over a sequence of frames (tracking quality), while taking into account the need for quick response in road traffic conditions. The presented methodology can serve as a precise definition of correct recognition, one that highlights the fact that various types of objects, although all described by rectangles, require their own special approach to evaluation. This is achieved by focusing the measures on different aspects of the rectangles, which makes it possible to filter specific information about the comparison and assign appropriate meaning to it for the whole analysis.
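To make the idea of comparing two rectangles, and then a sequence of them, more concrete, the sketch below uses intersection over union (IoU), a widely used overlap score. It is only a stand-in chosen for illustration and is not the quality measure proposed in this article. The function names, the tuple layout `(x_min, y_min, x_max, y_max)` and the rule that a missing detection scores zero are assumptions of this example.

```python
def intersection_over_union(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    # Width and height of the overlap region, clamped at zero.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def mean_sequence_quality(pairs):
    """Average per-frame score over (gt_box, detected_box) pairs.

    Assumption of this sketch: a frame where the detector produced
    nothing (detected_box is None) contributes a score of 0.
    """
    scores = [
        intersection_over_union(gt, det) if det is not None else 0.0
        for gt, det in pairs
    ]
    return sum(scores) / len(scores) if scores else 0.0


gt = (0, 0, 10, 10)
det = (5, 0, 15, 10)  # same size, shifted right by half a width
single = intersection_over_union(gt, det)

# Three frames: perfect detection, shifted detection, missed object.
sequence = [(gt, gt), (gt, det), (gt, None)]
overall = mean_sequence_quality(sequence)
print(single, overall)
```

A single overlap ratio treats all kinds of deviation alike, which is one reason why class-specific measures focused on particular aspects of the rectangles can be more informative.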
The quality measure can be successfully used as a basis for the matching algorithm, but besides defining a true positive it should be treated as a metric that is directly interpreted and passed to the higher levels of the evaluation chain, namely bounding box sequence analysis. To help with the interpretation of results, we propose ways to visualize the output of this analysis, both for the quality of GT representation and for alerts of false positives, which are a natural consequence of the matching algorithm. The methodology is presented separately for the following example object classes: pedestrians, moving vehicles, traffic lights and signs. To adapt to different classes of objects and to different applications in the evaluation process (matching, quality summary, combining the sequence of false positive results), the methodology has to rely on parameters. The calibration process, that is, the meaning of all parameters and their influence on the final results, is described in this work.
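The role of a quality measure in matching, and the way false positives fall out of it, can be sketched as below. This is a generic greedy one-to-one matching under stated assumptions, not the algorithm of this article: the helper `iou` stands in for the quality measure, and the class names and threshold values in `MATCH_THRESHOLD` are hypothetical placeholders for calibrated parameters.

```python
# Hypothetical per-class thresholds; real values would come from calibration.
MATCH_THRESHOLD = {"pedestrian": 0.5, "vehicle": 0.6, "traffic_sign": 0.4}


def iou(a, b):
    """Stand-in quality measure for boxes (x_min, y_min, x_max, y_max)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_frame(gt_boxes, det_boxes, threshold, quality=iou):
    """Greedy one-to-one matching of GT and detections in a single frame.

    Returns (matches, missed_gt, false_positives), where matches is a list
    of (gt_index, det_index, score) and the other two are index lists.
    """
    # Score every GT/detection pair and visit the best pairs first.
    candidates = sorted(
        ((quality(g, d), gi, di)
         for gi, g in enumerate(gt_boxes)
         for di, d in enumerate(det_boxes)),
        reverse=True,
    )
    used_gt, used_det, matches = set(), set(), []
    for score, gi, di in candidates:
        if score < threshold:
            break  # all remaining pairs are below the threshold
        if gi in used_gt or di in used_det:
            continue  # each box may take part in at most one match
        matches.append((gi, di, score))
        used_gt.add(gi)
        used_det.add(di)
    missed_gt = [i for i in range(len(gt_boxes)) if i not in used_gt]
    false_positives = [i for i in range(len(det_boxes)) if i not in used_det]
    return matches, missed_gt, false_positives


gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
det_boxes = [(1, 0, 11, 10), (50, 50, 60, 60)]
matches, missed_gt, false_positives = match_frame(
    gt_boxes, det_boxes, MATCH_THRESHOLD["pedestrian"]
)
print(matches, missed_gt, false_positives)
```

Passing the threshold and the quality function as arguments reflects the parameter-reliant design described above: the same matching routine can be reused for different object classes by swapping in class-specific settings.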