Accurately estimating the pose of a vehicle is important for autonomous parking. The study of around view monitor (AVM)-based visual Simultaneous Localization and Mapping (SLAM) has gained attention due to its affordability, commercial availability, and suitability for parking scenarios characterized by rapid rotations and back-and-forth movements of the vehicle.
1. Introduction
One of the key technologies for the commercialization of fully autonomous parking is accurate estimation of the vehicle pose. Simultaneous Localization and Mapping (SLAM) technology has been developed for localizing the vehicle [1]. Recently, visual SLAM [2], which uses cameras as the main sensor, has demonstrated significant advances in autonomous vehicles; a front camera is commonly employed to estimate the vehicle pose by detecting changes in feature locations or pixel intensity variations within the image [3]. However, visual SLAM based on a front camera is less suitable for parking scenarios due to the narrow field of view (FOV) of the sensor and the motion bias problem [4], in which SLAM performance differs between forward and backward motion.
Instead of a front camera, the around view monitor (AVM) has also been investigated for application to visual SLAM in autonomous parking. The AVM provides a bird's eye view image composed from cameras facing in four different directions. Several studies [5,6,7] have applied AVM-based visual SLAM to parking scenarios, taking advantage of its wide FOV and the absence of motion bias. These studies use road-marking information as semantic features to avoid the deformation caused by Inverse Perspective Mapping (IPM). However, inaccurate pose estimation still occurs because of AVM distortion errors caused by uneven ground and inaccurate camera calibration.
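For context, the sketch below shows a minimal Inverse Perspective Mapping step for a single camera using a planar homography, which is how each view of the bird's eye image is typically produced before stitching. The pixel/ground correspondences and output size are illustrative assumptions rather than values from the cited systems.

```python
import cv2
import numpy as np

# Minimal Inverse Perspective Mapping (IPM) sketch for one AVM camera.
# The correspondences below are placeholders; in a real system they come from
# the extrinsic calibration of each camera with respect to the ground plane.

# Four ground-plane points as seen in the camera image (pixels).
image_pts = np.float32([[420, 560], [860, 560], [1100, 700], [180, 700]])
# The same four points in bird's-eye-view coordinates (pixels, top-down).
bev_pts = np.float32([[300, 100], [500, 100], [500, 400], [300, 400]])

# Homography that maps the camera image onto the ground plane.
H = cv2.getPerspectiveTransform(image_pts, bev_pts)

# Placeholder frame standing in for a real 1280x720 camera image.
camera_image = np.zeros((720, 1280, 3), dtype=np.uint8)
bev_view = cv2.warpPerspective(camera_image, H, (800, 600))

# A full AVM image repeats this warp for all four cameras and stitches the
# results; errors in the assumed flat ground plane or in the extrinsics
# appear as the distortion errors discussed above.
print(bev_view.shape)  # (600, 800, 3)
```

Because the homography assumes a perfectly flat ground plane and exact extrinsics, uneven ground or calibration errors translate directly into the AVM distortion discussed above.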
Although numerous approaches [8,9,10,11,12] have been explored to obtain accurate bird's eye view images, they still exhibit several limitations, including limited quantitative comparison between features observed in the AVM image and features in the real environment, as well as the substantial effort required to collect data. Alternatively, the methods in [5,6] avoid the influence of distortion errors by using an additional Inertial Measurement Unit (IMU) sensor with a pre-built map or by leveraging an externally provided High Definition (HD) vector map. The approach of [7] attempts to create an accurate map in real time using a sliding window fusion technique without additional information, but its pose estimation accuracy is insufficient for autonomous parking.
2. Visual SLAM
2.1. Front-Camera-Based Visual SLAM
In the field of visual SLAM, various methods use a front camera as the main sensor. These can be broadly categorized into direct and feature-based methods: direct methods operate on raw image intensities, while feature-based methods use only features extracted from the image. A representative direct method is Direct Sparse Odometry (DSO), which estimates the camera pose by minimizing the pixel intensity difference between two images [13]. ORB-SLAM2 [14] is a well-known feature-based method that estimates the camera pose by minimizing the distance between matched feature points. Another method, Semi-direct Visual Odometry (SVO), combines the advantages of both direct and feature-based methods [15]. However, these methods are not well suited to autonomous parking. A front camera with a narrow FOV can struggle to detect changes in the surroundings of the vehicle during rapid rotation. Moreover, the depth estimation in these methods causes a difference in SLAM performance between forward and backward motion [4].
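To make the distinction concrete, the following sketch contrasts the two residuals: direct methods minimize a photometric (intensity) error, whereas feature-based methods minimize a reprojection distance between matched keypoints. The toy images and correspondences are assumptions for illustration only.

```python
import numpy as np

def photometric_residual(img_ref, img_cur, pts_ref, pts_cur_proj):
    """Direct-method style residual: intensity difference between a pixel in
    the reference image and its predicted location in the current image."""
    r = []
    for (u0, v0), (u1, v1) in zip(pts_ref, pts_cur_proj):
        r.append(float(img_ref[v0, u0]) - float(img_cur[v1, u1]))
    return np.array(r)

def reprojection_residual(kp_cur, kp_map_proj):
    """Feature-based style residual: pixel distance between a detected
    keypoint and the projection of its matched map point."""
    return np.linalg.norm(np.asarray(kp_cur) - np.asarray(kp_map_proj), axis=1)

# Toy data (assumed): two identical 5x5 grayscale images and a few points.
img_ref = np.random.randint(0, 255, (5, 5)).astype(np.float64)
img_cur = img_ref.copy()
pts = [(1, 1), (2, 3), (4, 2)]
print(photometric_residual(img_ref, img_cur, pts, pts))       # zeros for identical images
print(reprojection_residual([(10.0, 12.0)], [(10.5, 11.0)]))  # pixel offset magnitude
```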
2.2. AVM-Based Visual SLAM
Various SLAM studies have proposed using the AVM as the main sensor for autonomous parking. Autonomous Valet Parking-SLAM (AVP-SLAM) [5] estimates the vehicle pose by matching current road-marking features against a road-marking map created in a preceding mapping step. AVP-Loc [6] matches current road-marking features to an HD vector map of the parking lot for vehicle localization. In [7], the edge information of both free space and road markings is used as input features to estimate the vehicle pose with Iterative Closest Point (ICP) [16]. However, the AVM distortion error was not taken into account in these studies. In [5], the inherent inaccuracies caused by distortion errors in the pre-built map necessitated an additional IMU to enhance localization accuracy. The method of [6] relies on an externally provided HD vector map to avoid distortion errors. Although the approach in [7] aims to create an accurate map in real time using a sliding window fusion technique without additional information, constructing maps that include distortion errors degrades the localization performance of autonomous vehicles.
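As a reference for the registration step mentioned above, the following is a minimal 2D ICP sketch on bird's-eye-view feature points, using brute-force nearest-neighbour matching and a closed-form SVD (Kabsch) rigid fit. It illustrates the principle only and is not the implementation used in [7].

```python
import numpy as np

def icp_2d(src, dst, iters=20):
    """Minimal 2D ICP: iteratively match each source point to its nearest
    destination point and solve the rigid transform (R, t) in closed form."""
    R, t = np.eye(2), np.zeros(2)
    for _ in range(iters):
        cur = src @ R.T + t
        # Nearest-neighbour correspondences (brute force for clarity).
        idx = np.argmin(((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1), axis=1)
        matched = dst[idx]
        # Closed-form rigid alignment via SVD (Kabsch).
        mu_s, mu_d = cur.mean(0), matched.mean(0)
        U, _, Vt = np.linalg.svd((cur - mu_s).T @ (matched - mu_d))
        R_step = Vt.T @ U.T
        if np.linalg.det(R_step) < 0:          # guard against reflection
            Vt[-1] *= -1
            R_step = Vt.T @ U.T
        t_step = mu_d - R_step @ mu_s
        R, t = R_step @ R, R_step @ t + t_step
    return R, t

# Toy example: bird's-eye-view road-marking points rotated by 5 degrees
# and shifted; the recovered R, t should match the applied transform.
theta = np.deg2rad(5.0)
R_true = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
pts = np.random.rand(200, 2) * 10.0
R_est, t_est = icp_2d(pts, pts @ R_true.T + np.array([0.3, -0.2]))
print(R_est, t_est)
```

Any systematic distortion in the bird's-eye-view points biases this alignment, which is why the AVM distortion error propagates into the map and the estimated pose.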
3. AVM Image Enhancement Techniques
3.1. AVM Image Modification Using Automatic Calibration
Several methods have been proposed for the automatic calibration of AVM images, using the shapes of extracted features to achieve an accurate bird's eye view. In [8], a method was proposed to estimate the extrinsic parameters of the four AVM cameras using point patterns. The method in [9] calibrates the AVM image by matching the gradient and position of the lane observed by the front and rear cameras with those seen by the left and right cameras. The method in [10] instead calibrates the AVM image so that detected parking lines are parallel or perpendicular to the vehicle. However, because these methods only perform relative comparisons between the images of each camera and do not quantitatively compare the AVM image with the real environment, they cannot be considered a complete solution to AVM distortion errors.
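As a simplified illustration of the idea in [10], the sketch below estimates a single yaw correction that snaps detected parking-line orientations back to parallel/perpendicular with the vehicle axis. The angle values are assumed detections, and reducing the full extrinsic refinement to one angle is an illustrative simplification, not the published method.

```python
import numpy as np

def yaw_correction_from_lines(line_angles_deg):
    """Estimate a yaw correction (degrees) that makes detected parking-line
    directions parallel or perpendicular to the vehicle axis (0 or 90 deg)."""
    angles = np.asarray(line_angles_deg, dtype=float)
    # Fold every angle into [-45, 45): both parallel and perpendicular lines
    # map near 0 when the calibration is perfect.
    folded = (angles + 45.0) % 90.0 - 45.0
    # Residual misalignment is the mean folded angle; apply its negation.
    return -float(np.mean(folded))

# Assumed detections: lines that should be at 0/90 deg but are consistently
# rotated by about +2 deg due to extrinsic calibration error.
print(yaw_correction_from_lines([2.1, 91.8, 1.9, 92.3]))   # ~ -2.0
```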
3.2. AVM Image Generation Using Deep Learning
In recent years, there has been a shift towards deep-learning-based AVM image generation methods that use neural networks (NN) as viewpoint transformation functions instead of homography-based geometric information. HDMapNet [11] uses a Multilayer Perceptron (MLP) to convert each camera image into a bird's eye view and then composes the AVM image by stitching the views according to the installed camera locations. Another method, BEVFormer [12], uses a Transformer model to create the bird's eye view without an additional stitching step. However, these methods require a large amount of training data.
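The sketch below illustrates the MLP view-transformation idea behind HDMapNet-style methods: a fully connected layer remaps the spatial layout of a perspective-view feature map into a bird's eye view grid for one camera. The feature sizes, grid dimensions, and class name are assumptions, not the published architecture.

```python
import torch
import torch.nn as nn

class MLPViewTransform(nn.Module):
    """Toy MLP view transformer: flattens a perspective-view feature map and
    predicts a bird's-eye-view feature map for one camera."""
    def __init__(self, c=64, hw_img=(12, 40), hw_bev=(50, 50)):
        super().__init__()
        self.hw_bev = hw_bev
        self.mlp = nn.Sequential(
            nn.Linear(hw_img[0] * hw_img[1], 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, hw_bev[0] * hw_bev[1]),
        )

    def forward(self, feat):                    # feat: (B, C, H_img, W_img)
        b, c, h, w = feat.shape
        bev = self.mlp(feat.view(b, c, h * w))  # remap spatial layout per channel
        return bev.view(b, c, *self.hw_bev)     # (B, C, H_bev, W_bev)

# One BEV feature map per camera; a full pipeline would place each map using
# the known camera mounting pose and fuse the four views into one AVM grid.
vt = MLPViewTransform()
bev_front = vt(torch.randn(1, 64, 12, 40))
print(bev_front.shape)                          # torch.Size([1, 64, 50, 50])
```

Because the mapping is learned rather than derived from geometry, it must be trained on many image/bird's-eye-view pairs, which is the data requirement noted above.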
To estimate the vehicle pose without pre-built maps or additional sensors, the Hybrid Bird's-Eye Edge-Based Semantic Visual SLAM algorithm [7] is selected. Rather than being used directly, it is modified in two aspects to improve pose estimation performance. First, parking line information is used as input data instead of edge information from semantic features and free space. Second, Generalized ICP (GICP) [17] is used instead of ICP for pose estimation; GICP is a point cloud registration algorithm developed from ICP that addresses inaccurate correspondences. The modified Hybrid Bird's-Eye Edge-Based Semantic Visual SLAM is then used to examine the impact of distortion errors, resulting from inaccurate AVM calibration, on the performance of AVM-based SLAM.
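To illustrate why GICP is substituted for plain ICP here, the sketch below shows the covariance-weighted (Mahalanobis) residual that GICP minimizes: correspondences that are uncertain along a direction, such as along a parking line, are down-weighted in that direction. The 2D points and covariance values are toy assumptions.

```python
import numpy as np

def gicp_residual(p_src, p_dst, C_src, C_dst, R):
    """Generalized-ICP style residual: the point-to-point error is weighted by
    the combined covariance of the matched points (squared Mahalanobis
    distance), so uncertain correspondences contribute less to the cost."""
    d = p_dst - R @ p_src                       # translation omitted for brevity
    C = C_dst + R @ C_src @ R.T                 # combined correspondence covariance
    return float(d @ np.linalg.inv(C) @ d)

# Toy 2D example: an elongated covariance along a parking line makes errors
# along the line cheap and errors across the line expensive.
C_line = np.diag([1.0, 0.01])                   # uncertain along x, confident along y
p_a, p_b = np.array([0.0, 0.0]), np.array([0.5, 0.05])
print(gicp_residual(p_a, p_b, C_line, C_line, np.eye(2)))
```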
This entry is adapted from the peer-reviewed paper 10.3390/s23187947