Simultaneous Localization and Mapping System in Dynamic Environment

Simultaneous localization and mapping (SLAM) plays a crucial role in the field of intelligent mobile robots. However, the traditional Visual SLAM (VSLAM) framework is built on the strong assumption of a static environment, which does not hold in dynamic real-world environments. When a mobile robot loses frames in a dynamic environment, both the correctness of re-localization and the recall of loop closure detection degrade.

  • semantic topology graph
  • Visual SLAM
  • ORB-SLAM2
  • dynamic environment
  • mobile robots

1. Introduction

Simultaneous localization and mapping (SLAM) is one of the core problems in mobile robotics research [1][2]. Compared to laser sensors, vision sensors have the advantages of fine perception, low price, smaller size, and lighter weight. Thus, Visual SLAM (VSLAM) has made great progress in the last few decades. Among VSLAM algorithms, feature-based methods are widely used in long-term robot deployment because of their high efficiency and scalability. However, most existing SLAM systems rely on hand-crafted visual features, such as SIFT [3], the Shi–Tomasi method [4], and ORB [5], which may not provide consistent feature detection and association results in dynamic environments. For example, when either the scene or the viewpoint changes, ORB-SLAM2 frequently fails to recognize previously visited scenes because too little visual feature information remains [6]. The mobile robot must therefore rely on re-localization and loop closure detection to determine whether a scene has been visited before. Re-localization is generally accomplished by using loop closure detection to associate information: first, the bag-of-words model extracts candidate keyframes with high similarity to the current frame; then, frame-to-frame matching associates the local features of the current frame with each candidate frame until either all candidates are traversed or the information is successfully associated. In dynamic environments, however, dynamic objects reduce the number of matched local feature point pairs, which can vastly impair the performance of the Visual SLAM system [7]; as a result, re-localization and loop closure detection often fail. To address this challenge, some researchers have worked on feature removal [8][9], enhancing the stability of the system's visual odometry by removing feature points that fall on dynamic objects.
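To make the candidate-retrieval step concrete, the following Python sketch builds a toy bag-of-visual-words vocabulary from ORB descriptors and ranks keyframes by histogram similarity. It is a minimal illustration assuming OpenCV and scikit-learn; ORB-SLAM2 itself uses a pre-trained DBoW2 vocabulary, and the function names here are hypothetical.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def orb_descriptors(image):
    """Detect ORB keypoints and compute their binary descriptors."""
    orb = cv2.ORB_create(nfeatures=1000)
    _, desc = orb.detectAndCompute(image, None)
    return desc

def build_vocabulary(all_descriptors, k=64):
    """Cluster descriptors into k visual words (a toy stand-in for DBoW2)."""
    data = np.vstack(all_descriptors).astype(np.float32)
    return KMeans(n_clusters=k, n_init=4, random_state=0).fit(data)

def bow_histogram(desc, vocab):
    """Quantize descriptors to visual words and L1-normalize the histogram."""
    words = vocab.predict(desc.astype(np.float32))
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(np.float32)
    return hist / max(hist.sum(), 1.0)

def rank_candidates(query_hist, keyframe_hists):
    """Rank keyframes by the BoW L1 similarity score s = 1 - 0.5 * |a - b|_1."""
    scores = [1.0 - 0.5 * np.abs(query_hist - h).sum() for h in keyframe_hists]
    return np.argsort(scores)[::-1]
```

The top-ranked keyframes returned by `rank_candidates` would then be verified by frame-to-frame feature matching, as described above.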
In real dynamic scenes, re-localization and loop closure detection fail when feature information is limited [10]. The mobile robot therefore often has to return to a previously visited place to extract more feature information and complete the feature matching required for re-localization and loop closure detection. Combining Visual SLAM with deep learning can address this problem better than traditional methods [11]. The main idea of deep learning is to extract image features using a network trained in advance [12]. However, deep learning is computationally intensive and requires high-performance hardware, and it is difficult to construct suitable models to store the resulting high-dimensional feature vectors. Thus, in recent years, researchers have attempted to introduce semantic information into Visual SLAM [13][14]. Describing the environment with semantic information can effectively simplify the process of saving and comparing environmental information [15][16].
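As a minimal sketch of this idea, the snippet below extracts a global image descriptor with a pre-trained ResNet-18 from torchvision; the choice of backbone, input size, and pooling is an illustrative assumption rather than the method of the cited works, and the 512-dimensional output illustrates why storing and comparing such vectors at scale is costly.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

# Pre-trained backbone with its classification head removed, so the
# output of the global average pool serves as the image descriptor.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def global_descriptor(pil_image):
    """Return a 512-D L2-normalized descriptor for place comparison."""
    x = preprocess(pil_image).unsqueeze(0)  # (1, 3, 224, 224)
    f = backbone(x).squeeze(0)              # (512,)
    return f / f.norm()
```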
ORB-SLAM2 is the most widely used Visual SLAM framework [6]. Its re-localization and loop closure detection are accomplished mainly by matching feature points between the current frame and candidate keyframes. The number of extracted feature points therefore determines both the accuracy and the speed of re-localization and loop closure detection: the more feature points are extracted, the more accurate the localization and the faster the matching. In a dynamic environment, however, fast-moving dynamic objects degrade feature point extraction, which in turn reduces the accuracy of re-localization and loop closure detection. In contrast, the relative positions and distances of static objects remain unchanged regardless of the motion of dynamic objects.
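This static-layout invariance can be stated concretely: the matrix of pairwise distances between static objects is unchanged by rigid camera motion and by the motion of dynamic objects. The sketch below, which assumes static objects have already been detected, reduced to 3D centroids, and put into a known correspondence order (node matching is discussed in Section 2.2), checks this consistency; all names are hypothetical.

```python
import numpy as np

def distance_matrix(centroids):
    """Pairwise Euclidean distances between static-object centroids (N, 3)."""
    diff = centroids[:, None, :] - centroids[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def topology_consistent(scene_a, scene_b, tol=0.2):
    """Scenes match if inter-object distances agree within `tol` (meters)."""
    if scene_a.shape != scene_b.shape:
        return False
    return np.allclose(distance_matrix(scene_a), distance_matrix(scene_b), atol=tol)

# The same three static landmarks observed after a rigid motion
# (rotation + translation) still yield an identical distance matrix.
landmarks = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 1.5, 0.0]])
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
assert topology_consistent(landmarks, landmarks @ R.T + np.array([1.0, -0.5, 0.0]))
```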

2. Simultaneous Localization and Mapping System in Dynamic Environment

2.1. Re-Localization

ORB-SLAM2, as a sparse feature-based VSLAM method, is prone to tracking loss during position estimation, and its re-localization function is activated when frames are lost. Re-localization is realized by discriminant coordinate regression: RANSAC is first used to estimate the fundamental matrix describing the geometric relationship between two image positions, this matrix is combined with the camera intrinsic parameters to obtain the essential matrix, and the PnP algorithm is then applied to solve for the camera pose [17]. The purpose is to obtain enough matched feature points between the preceding and following sequence frames to recover the rotation and translation, so that the lost camera pose can be estimated and tracking can resume [18]. The core idea of re-localization is to find, among the previous keyframes, the one closest to the current frame. First, the BoW representation of the current frame is computed. Then, frames with high BoW similarity and more than 15 matched feature points are selected as the sequence of candidate keyframes. Finally, re-localization is deemed successful if the number of inlier points matching the current frame to a candidate frame exceeds 50. The specific process is shown in Figure 1. Because of dynamic objects in the environment, incorrect matches often occur during feature matching.
Figure 1. The process of re-localization.
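The geometric stages of this process can be sketched with OpenCV as follows, under the assumption of known camera intrinsics `K`; the 15-match and 50-inlier thresholds are those described above, while the helper functions themselves are hypothetical rather than ORB-SLAM2 code.

```python
import cv2
import numpy as np

MIN_BOW_MATCHES = 15  # candidate keyframes need more than 15 matched features
MIN_PNP_INLIERS = 50  # re-localization succeeds with more than 50 inliers

def two_view_geometry(pts_a, pts_b, K):
    """Estimate F with RANSAC, then derive the essential matrix E = K^T F K."""
    F, mask = cv2.findFundamentalMat(pts_a, pts_b, cv2.FM_RANSAC, 1.0, 0.999)
    E = K.T @ F @ K
    return F, E, mask

def try_relocalize(pts3d, pts2d, K):
    """Recover the lost camera pose from 3D-2D matches with a candidate keyframe.

    pts3d: (N, 3) map points of the candidate keyframe.
    pts2d: (N, 2) matched keypoints in the current frame.
    K:     (3, 3) camera intrinsic matrix.
    """
    if len(pts2d) <= MIN_BOW_MATCHES:
        return None  # too few matches: reject this candidate keyframe
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d.astype(np.float32), pts2d.astype(np.float32), K, None)
    if not ok or inliers is None or len(inliers) <= MIN_PNP_INLIERS:
        return None  # not enough geometrically consistent inliers
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> rotation matrix
    return R, tvec              # recovered camera pose
```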

2.2. Loop Closure Detection (LCD)

Loop closure detection (LCD) is the ability of a mobile robot to recognize a previously reached scene and close the loop. The basic process is to compare keyframes, compute their similarity, and determine whether the robot has passed through the same place, i.e., "returned to the origin". The LCD problem is thus one of correlating current and historical data to identify whether a location has been visited before, and its essence is to reduce the cumulative error in map construction [19]. With the development of computer vision, appearance-based LCD algorithms became the early mainstream, among which BoVW is the most common [20]. BoVW has been widely used in the LCD of VSLAM systems because of its high detection efficiency and retrieval accuracy; however, the presence of dynamic objects interferes with LCD judgments [21]. In recent years, the continuous development of deep learning in image recognition, computer vision, and mobile robotics has provided new solution ideas for the LCD module of SLAM systems. DS-SLAM [22] employed SegNet [23] to segment dynamic objects and removed the feature points located in dynamic regions of the scene to alleviate dynamic interference in loop detection. The authors of reference [24] proposed realizing LCD by integrating visual–spatial–semantic information with the features of a topological graph and a convolutional neural network: they first built semantic topological graphs from semantic information, then used random walk descriptors to characterize the topological graphs for graph matching, and finally computed geometric and appearance similarities to confirm loop closures. Reference [25] presents a strategy that models the visual scene as a semantic sub-graph, preserving only the semantic and geometric information from object detection. The authors used a sparse Kuhn–Munkres algorithm to speed up the search for correspondences among nodes, and leveraged the shape similarity and the Euclidean distance between objects in 3D space to measure image similarity through graph matching.
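As a hedged sketch of the node-correspondence step in reference [25], the snippet below uses SciPy's linear-sum-assignment solver (equivalent in result to the Kuhn–Munkres algorithm) to assign nodes between two semantic graphs using class labels and 3D centroid distances; the cost design and the `label_penalty` weight are illustrative assumptions, not the exact formulation of the cited paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_semantic_graphs(labels_a, pos_a, labels_b, pos_b, label_penalty=10.0):
    """Assign nodes of graph A to nodes of graph B by minimum total cost.

    labels_*: (N,) integer arrays of semantic class ids from object detection.
    pos_*:    (N, 3) arrays of object centroids in 3D space.
    Pairs with different semantic labels incur a large penalty, so the
    assignment prefers same-class objects that are close in Euclidean space.
    """
    cost = np.linalg.norm(pos_a[:, None, :] - pos_b[None, :, :], axis=-1)
    cost = cost + label_penalty * (labels_a[:, None] != labels_b[None, :])
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols)), cost[rows, cols].sum()
```

A loop closure candidate would then be accepted only if the total assignment cost, together with the shape-similarity check described above, falls below suitable thresholds.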

References

  1. Durrant-Whyte, H.; Bailey, T. Simultaneous localization and mapping: Part i. IEEE Robot. Autom. Mag. 2006, 13, 99–110.
  2. Cadena, C.; Carlone, L.; Carrillo, H.; Latif, Y.; Scaramuzza, D.; Neira, J.; Reid, I.; Leonard, J.J. Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age. IEEE Trans. Robot. 2016, 32, 1309–1332.
  3. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
  4. Shi, J.; Tomasi, C. Good features to track. In Proceedings of the 1994 IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 21–23 June 1994; pp. 593–600.
  5. Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G.R. ORB: An efficient alternative to SIFT or SURF. In Proceedings of the IEEE International Conference on Computer Vision, ICCV 2011, Barcelona, Spain, 6–13 November 2011.
  6. Mur-Artal, R.; Tardós, J.D. ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras. IEEE Trans. Robot. 2017, 33, 1255–1262.
  7. Tourani, A.; Bavle, H.; Sanchez-Lopez, J.L.; Voos, H. Visual SLAM: What are the current trends and what to expect? Sensors 2022, 22, 9297.
  8. Ai, Y.b.; Rui, T.; Yang, X.q.; He, J.l.; Fu, L.; Li, J.b.; Lu, M. Visual SLAM in dynamic environments based on object detection. Def. Technol. 2021, 17, 1712–1721.
  9. Theodorou, C.; Velisavljevic, V.; Dyo, V. Visual SLAM for dynamic environments based on object detection and optical flow for dynamic object removal. Sensors 2022, 22, 7553.
  10. Wang, Y.; Bu, H.; Zhang, X.; Cheng, J. YPD-SLAM: A real-time VSLAM system for handling dynamic indoor environments. Sensors 2022, 22, 8561.
  11. Mokssit, S.; Licea, D.B.; Guermah, B.; Ghogho, M. Deep learning techniques for visual SLAM: A survey. IEEE Access 2023, 11, 20026–20050.
  12. Wang, S.; Lv, X.; Liu, X.; Ye, D. Compressed holistic convnet representations for detecting loop closures in dynamic environments. IEEE Access 2020, 8, 60552–60574.
  13. Ge, G.; Zhang, Y.; Wang, W.; Jiang, Q.; Hu, L.; Wang, Y. Text-MCL: Autonomous mobile robot localization in similar environments using text-level semantic information. Machines 2022, 10, 169.
  14. Yang, S.; Fan, G.; Bai, L.; Zhao, C.; Li, D. SGC-VSLAM: A semantic and geometric constraints VSLAM for dynamic indoor environments. Sensors 2020, 20, 2432.
  15. Singh, G.; Wu, M.; Do, M.V.; Lam, S.-K. Fast semantic-aware motion state detection for visual SLAM in dynamic environment. IEEE Trans. Intell. Transp. Syst. 2022, 23, 23014–23030.
  16. Shao, C.; Zhang, L.; Pan, W. Faster R-CNN learning-based semantic filter for geometry estimation and its application in VSLAM systems. IEEE Trans. Intell. Transp. Syst. 2022, 23, 5257–5266.
  17. Sweeney, C.; Fragoso, V.; Höllerer, T.; Turk, M. gDLS: A Scalable Solution to the Generalized Pose and Scale Problem. In Proceedings of the Computer Vision—ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T., Eds.; Springer: Cham, Switzerland, 2014; pp. 16–31.
  18. Memon, A.R.; Wang, H.; Hussain, A. Loop closure detection using supervised and unsupervised deep neural networks for monocular SLAM systems. Robot. Auton. Syst. 2020, 126, 103470.
  19. Ma, J.; Wang, S.; Zhang, K.; He, Z.; Huang, J.; Mei, X. Fast and robust loop-closure detection via convolutional auto-encoder and motion consensus. IEEE Trans. Ind. Inform. 2022, 18, 3681–3691.
  20. Williams, B.; Cummins, M.; Neira, J.; Newman, P.; Reid, I.; Tardos, J. A comparison of loop closing techniques in monocular SLAM. Robot. Auton. Syst. 2009, 57, 1188–1197.
  21. Zhang, G.; Yan, X.; Ye, Y. Loop closure detection via maximization of mutual information. IEEE Access 2019, 7, 124217–124232.
  22. Yu, C.; Liu, Z.; Liu, X.-J.; Xie, F.; Yang, Y.; Wei, Q.; Fei, Q. DS-SLAM: A semantic visual SLAM towards dynamic environments. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018.
  23. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A deep convolutional encoder–decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 2481–2495.
  24. Yuan, Z.; Xu, K.; Zhou, X.; Deng, B.; Ma, Y. SVG-Loop: Semantic–visual–geometric information-based loop closure detection. Remote Sens. 2021, 13, 3520.
  25. Qin, C.; Zhang, Y.; Liu, Y.; Lv, G. Semantic loop closure detection based on graph matching in multi-objects scenes. J. Vis. Commun. Image Represent. 2021, 76, 103072.