The Simultaneous Localization and Mapping (SLAM) technique has achieved astonishing progress and has generated considerable interest in the autonomous driving community. With its conceptual roots in navigation and mapping, SLAM outperforms some traditional positioning and localization techniques since it can support more reliable and robust localization, planning, and controlling to meet some key criteria for autonomous driving. In this study the authors first give an overview of the different SLAM estimation approaches and then discuss the applications of SLAM for autonomous driving with respect to different driving scenarios, vehicle system components, and the characteristics of the SLAM approaches. The authors then discuss some challenging issues and current solutions when applying SLAM to autonomous driving. Some quantitative quality analysis means to evaluate the characteristics and performance of SLAM systems and to monitor the risk in SLAM implementation are discussed. In addition, this study describes a real-world road test to demonstrate a multi-sensor-based modernized SLAM procedure for autonomous driving. The numerical results show that a high-precision 3D point cloud map can be generated by the SLAM procedure with the integration of Lidar and GNSS/INS.
1. Introduction
Autonomous (also called self-driving, driverless, or robotic) vehicle operation is a significant academic as well as an industrial research topic. It is predicted that fully autonomous vehicles will become an important part of total vehicle sales in the next decades. The promotion of autonomous vehicles draws attention to the many advantages, such as service for disabled or elderly persons, reduction in driver stress and costs, reduction in road accidents, elimination of the need for conventional public transit services, etc. [1][2].
A typical autonomous vehicle system contains four key parts: localization, perception, planning, and controlling (Figure 1). Positioning is the process of obtaining a (moving or static) object’s coordinates with respect to a given coordinate system. The coordinate system may be a local coordinate system or a geodetic datum such as WGS84. Localization is the process of estimating the carrier’s pose (position and attitude) in relation to a reference frame or a map. The perception system monitors the road environment around the host vehicle and identifies objects of interest such as pedestrians, other vehicles, traffic lights, signage, etc.
Figure 1. Functional components of an autonomous driving system.
By determining the coordinates of objects in the surrounding environment, a map can be generated. This process is known as Mapping.
Path planning is the step that utilizes localization, mapping, and perception information to determine the optimal path in subsequent driving epochs, guiding the automated vehicle from one location to another. This plan is then converted into action using the controlling system components, e.g., brake control before detected traffic lights.
All these parts are closely related. The location information for both vehicle and road entities can be obtained by combining the position, perception, and map information.
In contrast, localization and mapping can be used to support better perception. Accurate localization and perception information is essential for correct planning and controlling.
To achieve fully automated driving, there are some key requirements that need to be considered for the localization and perception steps. The first is accuracy. For autonomous driving, the information about where the road is and where the vehicle is within the lane supports the planning and controlling steps. To realize these, and to ensure vehicle safety, there is a stringent requirement for position estimation at the lane level, or even the “where-in-lane” level (i.e., the sub-lane level). Recognition range is important because the planning and controlling steps need enough processing time for the vehicle to react [3]. Robustness means the localization and perception should be robust to any changes while driving, such as driving scenarios (urban, highway, tunnel, rural, etc.), lighting conditions, weather, etc.
Traditional vehicle localization and perception techniques cannot meet all of the aforementioned requirements. For instance, GNSS errors occur as the signals may be distorted, or even blocked, by trees, urban canyons, tunnels, etc. Often an inertial navigation system (INS) is used to support navigation during GNSS signal outages, to continue providing position, velocity, and attitude information. However, inertial measurement biases need frequently estimated corrections or calibration, which is best achieved using GNSS measurements. Nevertheless, an integrated GNSS/INS system is still not sufficient since highly automated driving requires not only positioning of the host vehicle, but also the spatial characteristics of the objects in the surrounding environment. Hence perceptive sensors, such as Lidar and Cameras, are often used for both localization and perception. Lidar can acquire a 3D point cloud directly and map the environment, with the aid of GNSS and INS, to an accuracy that can reach the centimeter level in urban road driving conditions [4]. However, the high cost has limited the commercial adoption of Lidar systems in vehicles. Furthermore, its accuracy is influenced by weather (such as rain) and lighting conditions. Compared to Lidar, Camera systems have lower accuracy but are also affected by numerous error sources [5][6]. Nevertheless, they are much cheaper, smaller in size, require less maintenance, and use less energy. Vision-based systems can provide abundant environment information, similar to what human eyes can perceive, and the data can be fused with other sensors to determine the location of detected features.
A map with rich road environment information is essential for the aforementioned sensors to achieve accurate and robust localization and perception. Pre-stored road information makes autonomous driving robust to the changing environment and road dynamics. The recognition range requirement can be satisfied since an onboard map can provide timely information on the road network. Map-based localization and navigation have been studied using different types of map information. Google Maps is one example as it provides worldwide map information including images, topographic details, and satellite images [7], and it is available via mobile phone and vehicle apps. However, the use of maps will be limited by the accuracy of the maps, and in some selected areas the map’s resolution may be inadequate. In [8], the authors considered low-accuracy maps for navigation by combining data from other sensors. They detected moving objects using Lidar data and utilized a GNSS/INS system with a coarse open-source GIS map. Their results show their technique can successfully detect and track moving objects. A precise curb-map-based localization method that uses a 3D-Lidar sensor and a high-precision map is proposed in [9]. However, this method will fail when curb information is lacking, or obstructed.
Recently, so-called “high-definition” (HD) maps have received considerable interest in the context of autonomous driving since they contain very accurate, and large volumes of, road network information [10]. According to some major players in the commercial HD map market, 10–20 cm accuracy has been achieved [11][12], and it is predicted that in the next generation of maps, a few centimeters of accuracy will be reached. Such maps contain considerable information on road features, not only the static road entities and road geometry (curvature, grades, etc.), but also traffic management information such as traffic signs, traffic lights, speed limits, road markings, and so on. The autonomous car can use the HD map to precisely locate the host car within the road lane and to estimate the relative location of the car with respect to road objects by matching the landmarks which are recognized by onboard sensors with pre-stored information within the HD map.
Therefore, maps, especially HD maps, play several roles in support of autonomous driving and may be able to meet the stringent requirements of accuracy, precision, recognition range, robustness, and information richness. However, the application of the “map” for autonomous driving is also facilitated by Simultaneous Localization and Mapping (SLAM). SLAM is a process by which a moving platform builds a map of the environment and uses that map to deduce its location at the same time. SLAM, which is widely used in the robotics field, has been demonstrated [13][14] as being applicable for autonomous vehicle operations as it can support not only accurate map generation but also online localization within a previously generated map.
With appropriate sensor information (perception data, absolute and dead reckoning position information), a high-density and accurate map can be generated offline by SLAM. When driving, the self-driving car can locate itself within the pre-stored map by matching the sensor data to the map. SLAM can also be used to address the problem of DATMO (detection and tracking of moving objects) [15], which is important for detecting pedestrians or other moving objects. As the static parts of the environment are localized and mapped by SLAM, the dynamic components can concurrently be detected and tracked relative to the static objects or features. However, SLAM also has some challenging issues when applied to autonomous driving applications. For instance, “loop closure” can be used to reduce the accumulated bias within SLAM estimation in indoor or urban scenarios, but it is not normally applicable to highway scenarios.
2. Key SLAM Techniques
Since its initial introduction in 1986 [16], a variety of SLAM techniques have been developed. SLAM has its conceptual roots in geodesy and geospatial mapping [17].
In general, there are two types of approaches to SLAM estimation: filter-based and optimization-based. Both approaches estimate the vehicle pose states and map states at the same time. The vehicle pose includes the 3D or 2D vehicle position, but sometimes also velocity, orientation or attitude, depending on the sensor(s) used and on the application(s).
2.1. Online and Offline SLAM
Figure 2 and Figure 3 illustrate two general SLAM implementations: online SLAM and offline SLAM (sometimes referred to as full SLAM). According to [18], full SLAM seeks to calculate variables over the entire path along with the map, instead of just the current pose, while the online SLAM problem is solved by removing past poses from the full SLAM problem.
Figure 2. Description of online SLAM.
Figure 3. Description of offline SLAM.
Here, xk represents the vehicle pose (position, attitude, velocity, etc.) at time k. m is the map that consists of stored landmarks (f1–f4) with their position states. uk is the control input that represents the vehicle motion information between time epochs k − 1 and k, such as acceleration, turn angle, etc., which can be acquired from vehicle motion sensors such as wheel encoders or an inertial sensor. At some epoch k, the onboard sensors (such as Camera, Lidar, and Radar) will perceive the environment and detect one or more landmarks. The relative observations between the vehicle and all the observed landmarks are denoted as zk. With this information, the variables (including the vehicle pose and the map states) can be estimated.
The rectangle with blue background in Figure 2 and Figure 3 represents the state variables that are estimated in these two implementations. In most cases, for online SLAM, only the current vehicle pose xk+2 is estimated, while the map is generated and updated with the most recent measurements (uk+2 and zk+2), whereas in the case of the offline SLAM implementation, the whole trajectory of the vehicle is updated together with the whole map. All the available control and observation measurements will be utilized together for the offline SLAM implementation.
However, with the development of SLAM algorithms and increased computational capabilities, the full SLAM solution may be obtained in real-time with an efficient SLAM algorithm, which can then also be treated as an online problem. Therefore, whether a SLAM method is implemented online or offline may depend on whether the measurement inputs (control and observation) it requires come from current/past epochs or from future epochs, and on its processing time (real-time or not).
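To make the distinction concrete, the short sketch below (an illustration only; the dimensions and variable names are assumptions, not taken from this study) contrasts the state estimated by online SLAM with the state estimated by full SLAM for a small 2D example: the online state stays fixed in size because past poses are marginalized out, while the full-SLAM state grows with the trajectory.

```python
import numpy as np

# Illustrative sketch: how the estimated state differs between online SLAM and
# full (offline) SLAM for a 2D vehicle observing a few landmarks. All sizes are
# assumed values chosen only to show the structural difference.
N_LANDMARKS = 4      # f1..f4, each with a 2D position
N_EPOCHS = 100       # number of poses along the trajectory
POSE_DIM = 3         # (x, y, heading) per epoch
LM_DIM = 2           # (x, y) per landmark

# Online SLAM: only the current pose x_k is kept; past poses are marginalized out.
online_state = np.zeros(POSE_DIM + N_LANDMARKS * LM_DIM)

# Full SLAM: the whole trajectory x_1..x_k is estimated together with the map.
full_state = np.zeros(N_EPOCHS * POSE_DIM + N_LANDMARKS * LM_DIM)

print(online_state.size)  # 11  -> constant, independent of trajectory length
print(full_state.size)    # 308 -> grows with the number of epochs
```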
2.2. Filter-Based SLAM
Filter-based SLAM recursively solves the SLAM problem in two steps. Firstly, the vehicle and map states are predicted with process models and control inputs. In the next step, a correction of the predicted state is carried out using the current sensor observations. Therefore, filter-based SLAM is suitable for online SLAM.
Extended Kalman Filter-based SLAM (EKF-SLAM) represents a standard solution for the SLAM problem. It is derived from Bayesian filtering in which all variables are treated as Gaussian random variables. It consists of two steps: the time update (prediction) and the measurement update (filtering). At each time epoch the measurement and motion models are linearized (using the current state with a first-order Taylor expansion). However, since the linearization is not made around the true value of the state vector, but around the estimated value [19], the linearization error will accumulate and could cause a divergence of the estimation. Therefore, inconsistencies can occur.
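As an illustration of the predict/update cycle just described, the following minimal sketch runs one EKF-SLAM step for a single vehicle pose and a single landmark. The motion and measurement models, noise values, and variable names are assumptions made for the example; they are not the implementation used in any of the cited systems.

```python
import numpy as np

# Minimal EKF-SLAM sketch: one vehicle pose plus one landmark, one predict/update cycle.

def wrap(a):
    """Wrap an angle to [-pi, pi)."""
    return (a + np.pi) % (2 * np.pi) - np.pi

def predict(s, P, u, dt, Q):
    """Time update with a unicycle motion model. s = [x, y, theta, lx, ly]."""
    x, y, th, lx, ly = s
    v, w = u
    s_pred = np.array([x + v*dt*np.cos(th), y + v*dt*np.sin(th), wrap(th + w*dt), lx, ly])
    F = np.eye(5)                      # motion-model Jacobian (linearization point: estimate)
    F[0, 2] = -v*dt*np.sin(th)
    F[1, 2] =  v*dt*np.cos(th)
    return s_pred, F @ P @ F.T + Q

def update(s, P, z, R):
    """Measurement update with a range/bearing observation of the landmark."""
    x, y, th, lx, ly = s
    dx, dy = lx - x, ly - y
    q = dx*dx + dy*dy
    r = np.sqrt(q)
    h = np.array([r, wrap(np.arctan2(dy, dx) - th)])   # predicted observation
    H = np.array([[-dx/r, -dy/r,  0,  dx/r,  dy/r],    # observation Jacobian
                  [ dy/q, -dx/q, -1, -dy/q,  dx/q]])
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)       # Kalman gain
    innov = z - h
    innov[1] = wrap(innov[1])
    s_new = s + K @ innov
    s_new[2] = wrap(s_new[2])
    return s_new, (np.eye(5) - K @ H) @ P

# one cycle with synthetic numbers
s = np.array([0.0, 0.0, 0.0, 5.0, 2.0])          # initial pose and landmark guess
P = np.diag([0.1, 0.1, 0.05, 1.0, 1.0])
Q = np.diag([0.02, 0.02, 0.01, 0.0, 0.0])
R = np.diag([0.1, 0.01])
s, P = predict(s, P, u=(1.0, 0.1), dt=1.0, Q=Q)
s, P = update(s, P, z=np.array([4.2, 0.42]), R=R)
print(np.round(s, 3))
```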
Another issue related to EKF-SLAM is the continuous expansion of the map size, which makes the quadratic calculation cost of large-scale SLAM impractical. For autonomous driving, the complex road environment and long driving periods will introduce a large number of features, which makes real-time computation infeasible. A large number of algorithms have been developed in order to improve computational efficiency. For example, the Compressed Extended Kalman Filter (CEKF) [20] algorithm can significantly reduce computations by focusing on local areas and then extending the filtered information to the global map. Algorithms with sub-maps have also been used to address the computation issues [21][22][23][24]. A new blank map is used to replace the old map when the old one reaches a predefined map size. A higher-level map is maintained to track the link between each sub-map.
There are some other filter-based SLAM approaches, such as some variants of the Kalman Filter. One of them, the Information Filter (IF), is propagated with the inverse form of the state error covariance matrix, which makes this method more stable [25]. This method is more popular in multi-vehicle SLAM than in single-vehicle systems.
Another class of filter-based SLAM techniques is the Particle Filter (PF), which has become popular in recent years. The PF executes Sequential Monte-Carlo (SMC) estimation by a set of random point clusters (or particles) representing the Bayesian posterior. The Rao–Blackwellized Particle Filter was proposed in [26]. Fast-SLAM is a popular implementation that treats the robot position distribution as a set of Rao–Blackwellized particles, and uses an EKF to maintain local maps. In this way, the computational complexity of SLAM is greatly reduced. Real-time application is possible with Fast-SLAM [27], making online SLAM possible for autonomous driving. Another advantage over the EKF is that particle filters can cope with non-linear motion models [28]. However, according to [29][30], Fast-SLAM suffers from degeneration since it cannot forget the past: when the map is marginalized out and resampling is performed, statistical accuracy is lost.
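The following toy sketch illustrates the FastSLAM idea of Rao–Blackwellization: each particle samples the vehicle pose and carries its own small EKF for the landmark, and repeated resampling is where the degeneration mentioned above originates. The models, noise levels, and particle count are assumptions for illustration only.

```python
import numpy as np

# Toy FastSLAM-style step: particles sample the pose; each particle holds an
# independent EKF for one landmark (Rao-Blackwellization).

rng = np.random.default_rng(0)
N = 200                                   # number of particles
poses = np.zeros((N, 3))                  # sampled vehicle poses (x, y, theta)
lm_mu = np.tile([5.0, 2.0], (N, 1))       # per-particle landmark mean
lm_P = np.tile(np.eye(2), (N, 1, 1))      # per-particle landmark covariance
weights = np.full(N, 1.0 / N)
R = np.diag([0.1, 0.01])                  # range/bearing measurement noise

def step(u, z, dt=1.0):
    global poses, lm_mu, lm_P, weights
    v, w = u
    # 1. sample the motion model (the particle set represents the pose distribution)
    v_n = v + rng.normal(0, 0.1, N)
    w_n = w + rng.normal(0, 0.05, N)
    poses[:, 0] += v_n * dt * np.cos(poses[:, 2])
    poses[:, 1] += v_n * dt * np.sin(poses[:, 2])
    poses[:, 2] += w_n * dt
    # 2. per-particle EKF update of the landmark, plus importance weighting
    for i in range(N):
        dx, dy = lm_mu[i] - poses[i, :2]
        q = dx*dx + dy*dy
        r = np.sqrt(q)
        h = np.array([r, np.arctan2(dy, dx) - poses[i, 2]])
        H = np.array([[dx/r, dy/r], [-dy/q, dx/q]])     # Jacobian w.r.t. landmark position
        S = H @ lm_P[i] @ H.T + R
        K = lm_P[i] @ H.T @ np.linalg.inv(S)
        innov = z - h
        innov[1] = (innov[1] + np.pi) % (2*np.pi) - np.pi
        lm_mu[i] = lm_mu[i] + K @ innov
        lm_P[i] = (np.eye(2) - K @ H) @ lm_P[i]
        weights[i] *= np.exp(-0.5 * innov @ np.linalg.inv(S) @ innov)
    weights /= weights.sum()
    # 3. resample; repeated resampling erodes particle diversity (the degeneration issue)
    idx = rng.choice(N, size=N, p=weights)
    poses, lm_mu, lm_P = poses[idx], lm_mu[idx], lm_P[idx]
    weights[:] = 1.0 / N

step(u=(1.0, 0.0), z=np.array([4.3, 0.40]))
print(np.average(poses, axis=0, weights=weights))
```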
2.3. Optimization-Based SLAM
Full SLAM estimates all the vehicle pose and map states using the entire sensor data, and it is mostly optimization-based. Similar to filter-based SLAM, the optimization-based SLAM system consists of two main parts: the frontend and the backend. In the frontend step, the SLAM system extracts the constraints of the problem from the sensor data, for example, by performing feature detection and matching, motion estimation, loop closure detection, etc. Nonlinear optimization is then applied to acquire the maximum likelihood estimate at the backend.
Graph SLAM is one of the main classes of full SLAM, which uses a graphical structure to represent the Bayesian SLAM problem. All the platform poses along the whole trajectory and all the detected features are treated as nodes. Spatial constraints between poses are encoded in the edges between the nodes. These constraints result from observations, odometry measurements, and from loop closure constraints. After the graph construction, graph optimization is applied in order to optimize the graph model of the whole trajectory and map. To solve the full optimization and to calculate the Gaussian approximation of the posterior, a number of methods can be used, such as Gauss–Newton or Levenberg–Marquardt [31].
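A minimal pose-graph example is sketched below, assuming translation-only 2D poses so the mathematics stays compact: odometry edges and one loop-closure edge are turned into residuals and Jacobians, and a Gauss–Newton step solves the resulting normal equations. The graph and measurement values are invented for the illustration.

```python
import numpy as np

# Toy pose-graph optimization with Gauss-Newton. Poses are reduced to 2D positions
# (no heading); edges are relative displacement measurements plus one loop closure.

poses = np.array([[0.0, 0.0], [1.1, 0.0], [2.0, 0.1], [1.0, 1.1]])  # initial guess
edges = [   # (i, j, measured displacement from pose i to pose j)
    (0, 1, np.array([1.0, 0.0])),
    (1, 2, np.array([1.0, 0.0])),
    (2, 3, np.array([-1.0, 1.0])),
    (3, 0, np.array([-1.0, -1.0])),   # loop closure back to the start
]

def gauss_newton(poses, edges, iters=10):
    x = poses.copy().reshape(-1)
    n = x.size
    for _ in range(iters):
        H = np.zeros((n, n))          # normal matrix (sparse in real problems)
        b = np.zeros(n)
        for i, j, meas in edges:
            r = (x[2*j:2*j+2] - x[2*i:2*i+2]) - meas      # residual of one edge
            J = np.zeros((2, n))                          # Jacobian of that residual
            J[:, 2*i:2*i+2] = -np.eye(2)
            J[:, 2*j:2*j+2] = np.eye(2)
            H += J.T @ J
            b += J.T @ r
        H[:2, :2] += np.eye(2) * 1e6                      # gauge: anchor the first pose
        x -= np.linalg.solve(H, b)                        # Gauss-Newton step
    return x.reshape(-1, 2)

print(np.round(gauss_newton(poses, edges), 3))
```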
For graph-based SLAM, the size of its covariance matrix and the update time are constant after generating the graph; therefore graph SLAM has become popular for building large-scale maps. Reducing the computational complexity of the optimization step has become one of the main research topics for practical implementations of the high-dimensional SLAM problem. The key to solving the optimization step efficiently is the sparsity of the normal matrix. The fact that each measurement is only associated with a very limited number of variables makes the matrix very sparse. With Cholesky factorization and QR factorization methods, the information matrix and measurement Jacobian matrix can be factorized efficiently, and hence the computational cost can be significantly reduced. Several algorithms have been proposed, such as TORO and g2o. The sub-map method is also a popular strategy for solving large-scale problems [32][33][34][35][36]. The sub-maps can be optimized independently and are related to a local coordinate frame. The sub-map coordinates can be treated as pose nodes, linked with motion constraints or loop closure constraints. Thus, a global pose graph is generated. In this way the computational complexity and update time will be improved.
Smoothing and Mapping (SAM), another optimization-based SLAM algorithm, is a type of nonlinear least squares problem. Such a least squares problem can be solved incrementally by Incremental Smoothing and Mapping (iSAM) [37] and iSAM2 [38]. Online SLAM can be obtained with incremental SAM as it avoids unnecessary calculations with the entire covariance matrix. iSAM2 is more efficient as it uses a Bayes tree to obtain incremental variable re-ordering and fluid re-linearization.
SLAM++ is another incremental solution for nonlinear least squares optimization-based SLAM which is very efficient. Moreover, for online SLAM implementations, fast state covariance recovery is very important for data association, obtaining reduced state representations, active decision-making, and next-best-view planning [39][40]. SLAM++ has an advantage as it allows for incremental covariance calculation, which is faster than other implementations [40].
Table 1 is a summary of the characteristics of some typical SLAM techniques. Note that Graph SLAM utilizes all available observations and control information and can achieve very accurate and robust estimation results. It is suitable for offline applications and its performance relies on a good initial state guess. Filter-based SLAM is more suitable for small-scale environments and can be used for online estimation, but for complex environments real-time computation may be difficult with the traditional EKF-SLAM; other variants or FastSLAM should be considered. The incremental optimization methods can do incremental updating, so as to provide an optimal estimation of a large-scale map with very high efficiency and in real-time.
2.4. Sensors and Fusion Method for SLAM
New SLAM methods have appeared thanks to advances in sensor and computing technology. These methods are also optimization-based or filter-based at the backend estimation step, while the frontend step is highly dependent on the sensor modalities used. Two of the major sensors used for SLAM are Lidar and Camera. The Lidar method has become popular due to its simplicity and accuracy compared to other sensors [52]. The core of Lidar-based localization and mapping is scan-matching, which recovers the relative position and orientation of two scans or point clouds. Popular approaches for scan matching include the Iterative Closest Point (ICP) algorithm and its variants [53][54][55], and the normal distribution transform (NDT) [56]. These methods are highly dependent on a good initial guess, and are impacted by local minima [57][58]. Some other matching methods include probabilistic methods such as correlative scan matching (CSM) [59], feature-based methods [57][60], and others. Many of the scan-matching methods focus on being free of, or robust to, initialization error, but they still face the computational efficiency challenge.
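The sketch below shows the basic point-to-point ICP loop on synthetic 2D scans: alternate nearest-neighbour association with a closed-form (SVD) estimate of the rigid transform. It is a bare-bones illustration; production scan matchers add k-d trees, outlier rejection, robust kernels, and better initialization.

```python
import numpy as np

# Minimal point-to-point ICP for aligning two 2D scans (synthetic data).

def best_rigid_transform(src, dst):
    """Closed-form least-squares rotation R and translation t mapping src -> dst."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - mu_s).T @ (dst - mu_d))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # avoid a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, mu_d - R @ mu_s

def icp(src, dst, iters=20):
    R_total, t_total = np.eye(2), np.zeros(2)
    cur = src.copy()
    for _ in range(iters):
        # associate each source point with its nearest target point
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        matched = dst[d2.argmin(axis=1)]
        R, t = best_rigid_transform(cur, matched)
        cur = cur @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total

# synthetic test: rotate/translate a scan, then recover the transform
rng = np.random.default_rng(1)
scan = rng.uniform(-5, 5, size=(100, 2))
theta = 0.1
R_true = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
moved = scan @ R_true.T + np.array([0.3, -0.2])
R_est, t_est = icp(scan, moved)
print(np.round(R_est, 3), np.round(t_est, 3))
```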
Some ranging sensors that can be used for SLAM estimation are Radar and Sonar/ultrasonic sensors. Radar works in a similar manner to Lidar, but the system emits radio waves instead of light to measure the distance to objects. Furthermore, since Radar can observe the relative velocity between the sensor and the object using the measured Doppler shift [61], it can distinguish between stationary and moving objects, and can be used to discard moving objects during the map-building process [62]. Some research on using Radar for SLAM can be found in [42][62][63][64][65][66]. When compared to Lidar, lower price, lower power consumption, and less sensitivity to atmospheric conditions make Radar well suited for outdoor applications. However, Radar has lower measurement resolution, and its detections are sparser than Lidar's. Thus, it is harder to match Radar data and deal with the data association problem, which results in its 3D mapping being less accurate.
Sonar/ultrasonic sensors also measure the time-of-flight (TOF) to determine the distance to objects, by sending and receiving sound waves. Sonar-based SLAM was initially used for underwater [67][68] and indoor [69] applications. It has become popular due to its low cost and low power consumption. It is not affected by visibility restrictions and can be used with multiple surface types [70]. However, similar to Radar, it obtains sparse information and suffers from inaccurate feature extraction and long processing times. Thus, it is of limited use for high-speed vehicle applications. Moreover, Sonar/ultrasonic sensors have a limited sensing range and may be affected by environmental noise and by other platforms using ultrasound at the same frequency [71].
Camera is another popular sensor for SLAM. Different techniques have been developed, such as monocular [72][73], stereo [74][75][76][77], and multi-camera [78][79][80][81]. These techniques can be used in a wide range of environments, both indoor and outdoor. The single-camera system is easy to deploy; however, it suffers from scale uncertainty [82]. Stereo-camera systems can overcome the scale factor problem and can retrieve 3D structural information by comparing the same scene from two different perspectives [61]. Multi-camera systems have gained increasing interest, particularly as they achieve a large field of view [78] or are even capable of panoramic vision [81]. Such a system is more robust in complicated environments, while a single-sensor system may be very vulnerable to environmental interference [81]. However, the integration of Cameras requires additional software and hardware, and requires more calibration and synchronization effort [71][83]. Another special Camera, the RGB-D Camera, has been studied by the SLAM and computer vision communities [84][85][86][87][88][89][90][91] since it can directly obtain depth information. However, this system is mainly applicable in indoor environments because it uses infrared spectrum light and is therefore sensitive to external illumination [70].
Visual SLAM can also be classified as feature-based or direct SLAM, depending on how the measurements are used. Feature-based SLAM repeatedly detects features in images and utilizes descriptive features for tracking and depth estimation [92]. Some fundamental frameworks for the feature-based approach include MonoSLAM [72][93], PTAM [94], ORB-SLAM [95], and ORB-SLAM2 [96]. Instead of using any feature detectors and descriptors, the direct SLAM method uses the whole image. Examples of direct SLAM include DTAM [97], LSD-SLAM [73], and SVO [98]. A dense or semi-dense environment model can be acquired by these methods, which makes them more computationally demanding than feature-based methods. Engel et al. [74] extended LSD-SLAM from a monocular to a stereo model, while Caruso et al. [99] extended LSD-SLAM to an omnidirectional model. A detailed review of Visual SLAM can be found in [5] and [70][92][100][101].
Each of these perceptive sensors has its advantages and limitations. Lidar can provide precise and long-range observations, but with limitations such as being sensitive to atmospheric conditions, being expensive, and currently rather bulky. Radar systems are relatively low cost, but are more suitable for object detection than for 3D map building. Sonar/ultrasonic sensors are not suitable for high-speed platform applications. Cameras are low-cost, even when multiple Cameras are used. Cameras can also provide rich visual information. However, they are sensitive to environment texture and lighting, and in general, have high computational demands. Therefore, a popular strategy is to combine a variety of sensors, making the SLAM system more robust.
There are several strategies to integrate data from different sensors for SLAM. One is fusing independently processed sensor results to then obtain the final solution. In [102], a mapping method was proposed that merged two grid maps, which were generated individually from laser and stereo camera measurements, into a single grid map. In this method, the measurements of the different sensors need to be mapped to a joint reference system. In [103], a multi-sensor SLAM system was proposed that combined the 3-DoF pose estimation from laser readings, the 6-DoF pose estimation from a monocular visual system, and the inertial navigation estimation results to generate the final 6-DoF position using an EKF processing scheme. For this type of strategy, the sensors can provide redundancy and the system will be robust to possible single-sensor failure. A decision-making step may be needed to identify whether the data from each sensor is reliable, and to decide whether to adopt the estimation from that sensor modality or to ignore it. Another fusion strategy is using an assistant sensor to improve the performance of another sensor-based SLAM algorithm. The main sensor could be Lidar or Camera, while the assistant sensor could be any other type of sensor. In this strategy, the assistant sensor is used to overcome the limitations of the main sensor. The work in [104] incorporated visual information to provide a good initial guess on the rigid body transform, and then used this initial transformation to seed the ICP framework. Huang et al. [105] extracted the depth of point-based and line-based landmarks from the Lidar data. The proposed system used this depth information to guide camera tracking and also to support the subsequent point-line bundle adjustment to further improve the estimation accuracy.
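The first strategy (fusing independently processed results, with a decision step on reliability) can be illustrated by the small sketch below, which combines two independent pose estimates by inverse-covariance weighting after a simple consistency gate. The numbers, the gating threshold, and the fallback rule are assumptions, not taken from the cited systems.

```python
import numpy as np

# Illustrative loose fusion: two independently produced pose estimates (e.g., one
# from a laser SLAM and one from a visual pipeline) combined by information weighting.

def fuse(x1, P1, x2, P2, gate=9.0):
    """Information-weighted fusion of two pose estimates with a chi-square-style gate."""
    innov = x2 - x1
    S = P1 + P2
    if innov @ np.linalg.inv(S) @ innov > gate:
        # disagreement too large: fall back to the more confident estimate alone
        return (x1, P1) if np.trace(P1) < np.trace(P2) else (x2, P2)
    W1, W2 = np.linalg.inv(P1), np.linalg.inv(P2)
    P = np.linalg.inv(W1 + W2)
    return P @ (W1 @ x1 + W2 @ x2), P

laser_pose  = np.array([10.02, 5.01, 0.100])   # (x, y, heading) from laser SLAM
laser_cov   = np.diag([0.04, 0.04, 0.01])
visual_pose = np.array([10.10, 4.95, 0.080])   # from the visual pipeline
visual_cov  = np.diag([0.09, 0.09, 0.02])
pose, cov = fuse(laser_pose, laser_cov, visual_pose, visual_cov)
print(np.round(pose, 3))
```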
The above two strategies can be combined. In the work of [106], the fusion consists of two models: one deals with feature fusion and utilizes line feature information from an image to remove any “pseudo-segments”, which result from dynamic objects, in the laser segments; the other is a modified EKF SLAM framework that incorporates the state estimates obtained from the individual monocular and laser SLAM in order to reduce the pose estimation covariance and improve localization accuracy. This modified SLAM framework can run even when one sensor fails since the sensor SLAM processes are parallel to each other.
Some examples of tighter fusion can also be found in the literature. The work of [107] combined both the laser point cloud data and image feature point data as constraints and conducted a graph optimization with both of these constraints using a specific cost function. Furthermore, an image feature-based loop closure was added to this system to remove accumulated errors.
Inertial SLAM incorporates an inertial measurement unit (IMU) as an assistant sensor. The IMU can be fused with the Camera or Lidar to support pose (position, velocity, attitude) estimation. With an IMU, the attitudes, especially the heading, are observable [108]. The integration of IMU measurements can also improve the motion tracking performance during gaps in the observations. For instance, for a Visual SLAM, illumination changes, texture-less areas, or motion blur will cause losses of visual tracks [108]. For a Lidar system, the raw Lidar scan data may suffer from skewing caused by high-acceleration motion, such as moving fast or shaking suddenly, resulting in sensing errors that are difficult to account for [109]. The work of [110] used an IMU sensor to deal with fast velocity changes and to initialize the motion estimates for scan-matching Lidar odometry to support their LOAM system. The high-frequency IMU data between two Lidar scans can be used to de-skew Lidar point clouds and improve their accuracy [109].
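The de-skewing idea can be sketched as follows: each point of a sweep is re-expressed in the frame of the sweep start using the motion predicted from IMU data over the point's capture time. For brevity the sketch assumes a constant planar twist over the sweep; a real pipeline would interpolate the IMU trajectory per point.

```python
import numpy as np

# Motion de-skewing of a single Lidar sweep: every point is captured at a slightly
# different time while the vehicle moves, so each point is re-expressed in the frame
# of the sweep start. Constant planar velocity and yaw rate are assumed here.

def rot2d(a):
    return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

def deskew(points, timestamps, vel, yaw_rate):
    """points: (N, 2) in the sensor frame at capture time; timestamps: (N,) seconds
    since sweep start; vel: (2,) m/s in the start frame; yaw_rate: rad/s."""
    out = np.empty_like(points)
    for k, (p, tau) in enumerate(zip(points, timestamps)):
        R = rot2d(yaw_rate * tau)      # sensor orientation at capture time
        t = vel * tau                  # sensor position at capture time
        out[k] = R @ p + t             # point expressed in the sweep-start frame
    return out

# a 0.1 s sweep while driving forward at 10 m/s and turning slightly
pts = np.array([[10.0, 0.0], [0.0, 10.0], [-10.0, 0.0]])
ts = np.array([0.0, 0.05, 0.1])
print(np.round(deskew(pts, ts, vel=np.array([10.0, 0.0]), yaw_rate=0.2), 3))
```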
The fusion of inertial sensors can be as a simple assistant [111][112] or more tightly coupled [108][113][114][115]. For the simple assistant case, the IMU is mainly used to provide orientation information, such as to support the system initialization. The IMU is used as a prior for the whole system, and the IMU measurements are not used for further optimization. For the tightly coupled case, IMU data is fused with Camera/Lidar states to build up measurement models, and then to perform state estimation and feedback to the inertial navigation system to improve navigation performance [116]. Therefore the former method is more efficient than the latter; however, it is less accurate [117]. For the tightly coupled case, a Kalman filter could be used to correct the IMU states, even during GNSS outages [118].
2.5. Deep Learning-Based SLAM
Most of the aforementioned SLAM methods are geometric model-based methods, which build up models of the platform motion and the environment based on geometry. These methods have achieved great success in the past decade. However, they still face many challenging issues. For instance, Visual SLAM (VSLAM) is limited under extreme lighting conditions. For large-scale applications, the model-based methods need to deal with large amounts of information, such as features and dynamic obstacles. Recently, deep learning techniques, such as the data-driven approaches developed in the computer vision field, have attracted more attention. Many researchers have attempted to apply deep learning methods to SLAM problems.
Most of the current research activities focus on utilizing learning-based methods for VSLAM problems, since deep learning techniques have made breakthroughs in the areas of image classification, recognition, object detection, and image segmentation [119]. For instance, deep learning has been successfully applied to the visual odometry (VO) problem, which is an important element of VSLAM. Optical flow estimation is utilized in some learned VO models as the input [120][121][122][123][124]. The learning approaches can also be applied in an end-to-end manner without adopting any module of the conventional VO pipeline [125][126]. Wang et al. [125] introduced an end-to-end VO algorithm with deep Recurrent Convolutional Neural Networks (RCNNs) by combining CNNs with RNNs. With this algorithm, the pose of the camera is directly estimated from raw RGB images, and neither prior knowledge nor parameters are needed to recover the absolute scale [125]. Li et al. [127] proposed an Unsupervised Deep Learning based VO system (UnDeepVO) which is trained with stereo image pairs and then performs both pose estimation and dense depth map estimation with monocular images. Unlike the one proposed by Wang et al. [125], ground truth is not needed for UnDeepVO since it operates in an unsupervised manner.
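A minimal model in the spirit of such end-to-end recurrent-convolutional VO is sketched below (PyTorch): a CNN encodes stacked consecutive frames, an LSTM models the temporal dependency, and a linear head regresses a 6-DoF relative pose. The layer sizes, input resolution, and training details are placeholders and do not reproduce the published networks.

```python
import torch
import torch.nn as nn

# Minimal recurrent-convolutional VO sketch: CNN per frame pair, LSTM over time,
# linear regression of a 6-DoF relative pose. All hyper-parameters are placeholders.

class RCNNOdometry(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.encoder = nn.Sequential(            # input: two stacked RGB frames (6 channels)
            nn.Conv2d(6, 16, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.lstm = nn.LSTM(64 * 4 * 4, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 6)         # translation (3) + rotation (3)

    def forward(self, frames):
        # frames: (batch, seq_len, 6, H, W) -- consecutive image pairs stacked on channels
        b, s = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1))        # (b*s, 64, 4, 4)
        feats = feats.flatten(1).view(b, s, -1)           # (b, s, 1024)
        out, _ = self.lstm(feats)
        return self.head(out)                             # (b, s, 6) relative poses

model = RCNNOdometry()
dummy = torch.randn(2, 5, 6, 128, 384)   # 2 sequences of 5 stacked frame pairs
poses = model(dummy)
print(poses.shape)                        # torch.Size([2, 5, 6])
```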
The learning-based methods can also be combined with the VSLAM system to replace, or add on to, individual modules of traditional SLAM, such as image depth estimation [128][129][130], pose estimation [131][132][133], and loop closure [134][135][136][137], etc., to improve the traditional method. Li et al. [138] proposed a fully unsupervised deep learning-based VSLAM that contains several components, including Mapping-net, Tracking-net, Loop-net, and a graph optimization unit. This DeepSLAM method can achieve accurate pose estimation and is robust in some challenging scenarios, combining the important geometric models and constraints into the network architecture and the loss function.
Semantic perception of the environment and semantic segmentation are current research topics in the computer vision field. They can provide a high-level understanding of the environment and are extremely important for autonomous driving applications. The rapid development of deep learning can assist in the introduction of semantic information into VSLAM [139] for semantic segmentation [140][141][142], localization and mapping [143][144][145][146][147], and dynamic object removal [148][149][150][151]. Some detailed reviews of deep learning-based VSLAM can be found in [92][139][152][153][154].
Fusion with an inertial sensor can also benefit from deep learning techniques, especially the RNN, which has an advantage in integrating temporal information and helping to establish consistency between nearby frames [139]. The integration of visual and inertial data with an RNN, or with Long Short-Term Memory (LSTM), a variant of the RNN that allows it to learn long-term trends [155], has been proven to be more effective and convenient than traditional fusion [156][157][158]. According to Clark et al. [157], the data-driven approach eliminates the need for manual synchronization of the camera and IMU, and the need for manual calibration between the camera and IMU. It outperforms the traditional fusion method since it is robust to calibration errors and can mitigate sensor drift. However, to deal with the drift problem, a further extension of the learning-based visual-inertial odometry system to a larger SLAM-like system with loop-closure detection and global relocalization still needs to be investigated.
Compared to visual-based SLAM, the applications of deep learning techniques to laser scanners or Lidar-based SLAM are still in the early stages and can be considered a new and challenging problem [159]. Velas et al. [160] used a CNN for Lidar odometry estimation, with the IMU sensor supporting the rotation parameter estimation. The results are competitive with state-of-the-art methods such as LOAM. Li et al. [161] introduced an end-to-end Lidar odometry, LO-Net, which has high efficiency and high accuracy, and can handle dynamic objects. However, this method is trained with ground truth data, which limits its application to large-scale outdoor scenarios. Li et al. [162] designed a visual-Lidar odometry framework which is self-supervised, without using any ground truth labels. The results indicate that this VLO method outperforms other current self-supervised visual or Lidar odometry methods, and performs better than fully supervised VOs. Data-driven approaches also make semantic segmentation of Lidar data more accurate and faster, making it suitable for supporting autonomous vehicles [163][164][165]. Moving objects can be distinguished from static objects by LMNet [166] based on CNNs of 3D Lidar scans. One limitation of some cost-effective 3D Lidar applications for autonomous driving in challenging dynamic environments is their relatively sparse point clouds. In order to overcome this drawback, high-resolution camera images were utilized by Yue et al. [167] to enrich the raw 3D point cloud. ERFNet is employed to segment the image with the aid of the sparse Lidar data. Meanwhile, the sparsity invariant CNN (SCNN) is employed to predict the dense point cloud. Then the enriched point clouds can be refined by combining these two outputs using a multi-layer convolutional neural network (MCNN). Finally, Lidar SLAM can be performed with this enriched point cloud. Better target segmentation can be achieved with this Lidar data enrichment neural network method. However, due to the small training dataset, this method did not show improvement in SLAM accuracy with the enriched point cloud when compared to the original sparse point cloud. More training and further investigation of dynamic objects may be needed to satisfy autonomous driving application requirements [167].
The generation of complex deep learning architectures has contributed to achieving more accurate, robust, adaptive, and efficient computer vision solutions, confirming the great potential for their application to SLAM problems. The availability of large-scale datasets is still the key to boosting these applications. Moreover, with no need for ground truth, unsupervised learning is more promising for SLAM applications in autonomous driving. Compared to the traditional SLAM algorithms, data-driven SLAM is still in the development stage, especially for Lidar SLAM. In addition, combining multiple sensing modalities may overcome the shortcomings of individual sensors, for which the learning-based integration system still needs further investigation.
3. Applications of SLAM for Autonomous Driving
Depending on the different characteristics of SLAM techniques, there could be different applications for autonomous driving. One classification of the applications is whether they are offline or online. A map satisfying a high-performance requirement is typically generated offline, such as the High Definition (HD) map [1]. For this kind of 3D point cloud map, an offline map generation process ensures the accuracy and reliability of the map. Such maps can be pre-generated to support the real-time operation of autonomous vehicles.
3.1. High Definition Map Generation and Updating
As stated earlier, SLAM can be used to generate digital maps for autonomous driving, such as the HD map [1]. Due to the stringent requirements, high-quality sensors are used. Lidar is one of the core sensors for automated cars as it can generate maps with high-density 3D point clouds. High-end GNSS and INS technology are also used to provide accurate position information. Cameras can provide information that is similar to the information detected by human eyes. The fusion of sensor data and the analysis of road information to generate HD maps needs considerable computational power, which is not feasible in current onboard vehicle systems. Therefore, the HD map is built up offline, using techniques such as optimization-based SLAM. The offline map creation can be performed by driving the road network several times to collect information, and then all the collected perceptive sensor information and position information is processed together to improve the accuracy of the final map. An example of an HD map is shown in Figure 4 [2].
Figure 4. An image from a high definition map (https://here.com/) [2].
The road environment and road rules may change; for instance, the speed limit may be reduced due to road work, road infrastructure may be changed due to building development, and so on. Therefore, the HD map needs frequent updates. Such updates can utilize the data collected from any autonomous car. For example, the data is transmitted to central (cloud) computers where the update computations are performed. Other cars can receive such cloud-based updates and make a timely adjustment to driving plans. Jo et al. [3] proposed a SLAM change update (SLAMCU) algorithm, utilizing a Rao–Blackwellized PF approach for online vehicle position and (new) map state estimation. In the work of [4], a new feature layer of HD maps can be generated using Graph SLAM when a vehicle is temporarily stopped or in a parking lot. The new feature layer from one vehicle can then be uploaded to the map cloud and integrated with that from other vehicles into a new feature layer in the map cloud, thus enabling more precise and robust vehicle localization. In the work of Zhang et al. [5], real-time semantic segmentation and Visual SLAM were combined to generate semantic point cloud data of the road environment, which was then matched with a pre-constructed HD map to confirm map elements that have not changed, and to generate new elements when they appear, thus facilitating crowdsourced updates of HD maps.
3.2. Small Local Map Generation
SLAM can also be used for small local areas. One example is within parking areas. The driving speed in a parking lot is low, therefore the vision technique will be more robust than in other high-speed driving scenarios. The parking area could be unknown (a public parking lot or garage) or known (a home zone); both cases can benefit from SLAM. Since SLAM can be used without GNSS signals, it is suitable for vehicles in indoor or underground parking, using just the perceptive sensor and odometry measurements (velocity, turn angle) or IMU measurements. For unknown public parking areas, the position of the car and of the obstacles, such as pillars, sidewalls, etc., can be estimated at the same time, guiding the parking system. For home zone parking, the pre-generated map and a frequent parking trajectory can be stored within the automated vehicle system. Each time the car returns home, re-localization using the stored map can be carried out by matching detected features with the map. The frequent trajectory could be used for the planning and controlling steps.
An approach that utilizes multi-level surface (MLS) maps to locate the vehicle, and to calculate and plan the vehicle path within indoor parking areas, was proposed in [6]. In this study, graph-based SLAM was used for mapping, and the MLS map is then used to plan a global path from the start to the destination, and to robustly localize the vehicle with laser range measurements. In the work of [7], a grid map and an EKF SLAM algorithm were used with W-band radar for autonomous back-in parking. In this work, an efficient EKF SLAM algorithm was proposed to enable real-time processing. In [8], the authors proposed an around-view monitor (AVM)/Lidar sensor fusion method to recognize the parking lane and to provide rapid loop closing performance. The above studies have demonstrated that both filter-based SLAM and optimization-based SLAM can be used to support efficient and accurate vehicle parking assistance (local area mapping and localization), even without GNSS. In the work of Qin et al. [9], pose graph optimization is performed so as to achieve an optimized trajectory and a global map of a parking lot, with semantic features such as guide signs, parking lines, and speed bumps. These kinds of features are more (long-term) stable and robust than traditional geometrical features, especially in underground parking environments. An EKF was then used to complete the localization system for autonomous driving.
3.3. Localization within the Existing Map
In map-based localization, a map matching method is used to match “live” data with the map information, using methods such as the Iterative Closest Point (ICP), the Normal Distribution Transform (NDT), and others [1][10]. These algorithms can be linked to SLAM since SLAM executes loop closing and re-localization using similar methods. For a SLAM problem, the ability to recognize a previously mapped object or feature and to relocate the vehicle within the environment is essential for correcting the maps [11]. Therefore, the reuse of a pre-generated map to localize the vehicle can be considered an extension of a SLAM algorithm. In other words, the pre-generated and stored map can be treated as a type of “sensor” to support localization.
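As a simple illustration of using the stored map as a “sensor”, the sketch below refines a dead-reckoning pose guess by aligning a live 2D scan against a pre-generated point map, reusing the same point-to-point ICP idea sketched in Section 2.4. The map, scan, and initial guess are synthetic, and a real system would use NDT or feature matching with robust outlier handling.

```python
import numpy as np

# Localization within a pre-stored map: refine a 2D pose prior by aligning the
# live scan (sensor frame) to the stored map points (world frame). Synthetic data.

def rot2d(a):
    return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

def localize_in_map(scan, map_pts, pose_guess, iters=15):
    """Refine pose_guess = (x, y, heading) so that the scan fits the map."""
    x, y, th = pose_guess
    R, t = rot2d(th), np.array([x, y])
    for _ in range(iters):
        world = scan @ R.T + t                                  # scan under current pose
        d2 = ((world[:, None, :] - map_pts[None, :, :]) ** 2).sum(-1)
        target = map_pts[d2.argmin(axis=1)]                     # nearest stored map points
        mu_s, mu_t = world.mean(axis=0), target.mean(axis=0)
        U, _, Vt = np.linalg.svd((world - mu_s).T @ (target - mu_t))
        dR = Vt.T @ U.T
        if np.linalg.det(dR) < 0:                               # avoid a reflection
            Vt[-1] *= -1
            dR = Vt.T @ U.T
        dt = mu_t - dR @ mu_s
        R, t = dR @ R, dR @ t + dt                              # accumulate the correction
    return np.array([t[0], t[1], np.arctan2(R[1, 0], R[0, 0])])

rng = np.random.default_rng(2)
map_pts = rng.uniform(-20, 20, size=(400, 2))                   # pre-generated map
true_pose = np.array([2.0, -1.0, 0.15])
scan = (map_pts[:150] - true_pose[:2]) @ rot2d(true_pose[2])    # simulated live scan
print(np.round(localize_in_map(scan, map_pts, pose_guess=(1.5, -0.5, 0.1)), 3))
```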