The Methods Applied in Geo-Registration: Comparison

In augmented reality applications, geo-registration refers to the process of aligning and matching virtual objects with the geographic location and orientation of the real-world scene. Currently, there are three common methods for pose estimation: sensor-based approaches, vision-based approaches, and hybrid approaches. These methods have been extensively applied in numerous projects and research endeavors.

  • geo-registration
  • sensor-based methods
  • vision-based methods
  • hybrid method

1. Introduction

The rapid development of augmented reality (AR) technology is transforming digital interactions by seamlessly integrating virtual information with the real world. In outdoor settings, AR technology has been widely applied in the field of geospatial information [1]. It connects map information and virtual graphical objects [2] with the real environment, providing more accurate, efficient, and intuitive interactive experiences. The application of AR technology in geospatial information holds significant value and importance. Firstly, AR geospatial applications make unmanned aerial vehicle (UAV) photography [3] efficient and reliable [4], enabling accurate positioning and navigation. Secondly, AR geospatial applications offer intuitive navigation experiences. Whether locating destinations in urban areas [5], navigating for tourism [6] or emergencies [7], or investigating complex outdoor environments [8], AR maps enhance navigation with higher accuracy, real-time performance, and convenience. Furthermore, AR technology demonstrates immense potential in underground engineering construction: in underground pipeline laying [9] and mining surveys [10], AR systems provide a visual perception of the underground space, reducing errors and risks while enhancing work efficiency.

2. The Methods Applied in Geo-Registration

In augmented reality applications, geo-registration refers to the process of aligning and matching virtual objects with the geographic location and orientation of the real-world scene. Currently, there are three common methods for pose estimation: sensor-based approaches, vision-based approaches, and hybrid approaches. These methods have been extensively applied in numerous projects and research endeavors. However, pose estimation in outdoor scenarios still faces numerous challenges. Factors such as signal interference, environmental variations, lighting changes, feature scarcity, occlusions, and dynamic objects severely impact the accurate determination of the geographic north orientation, alignment with the terrain surface, and precision in coordinate system transformations. These factors can lead to cumulative errors in pose estimation.

2.1. Sensor-Based Methods

Localization in outdoor industrial environments typically relies on sensors such as GPS and magnetometers to obtain spatial coordinates and orientation [11]. Sensor-based approaches use non-visual sensors such as GPS, IMUs, and magnetometers to acquire the user's position and direction [12]; virtual information is then generated from geospatial databases and aligned with the real environment. These methods rely primarily on built-in, non-visual sensors (e.g., accelerometers, gyroscopes, magnetometers) to obtain the azimuth and tilt angles of the device [13][14]. By fusing this information with GPS localization data, they jointly provide position and orientation estimates. The sensors supply linear acceleration, angular velocity, and magnetic field strength, from which filtering and attitude-estimation algorithms infer the pose of the device. Sensor-based approaches are characterized by low cost, low complexity, and reasonable continuity, making them suitable for simple AR application scenarios. Behzadan and Kamat [2] demonstrated the effective use of GPS for real-time registration of virtual graphics in an outdoor AR setting. Accelerometers and gyroscopes can provide accurate and robust attitude initialization, as demonstrated by Tedaldi et al. [15]. However, these sensors also have limitations, such as drift and noise over time [16]. Moreover, the heading angle measured by a magnetometer, which is often integrated with accelerometers and gyroscopes, is highly susceptible to ambient magnetic field noise [17].
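As a minimal illustration of the kind of attitude algorithm referred to above (not the method of any cited work), the sketch below blends gyroscope integration with an accelerometer-derived tilt and a tilt-compensated magnetometer heading in a simple complementary filter. The sensor readings and the blending gain `alpha` are hypothetical, and angle wrap-around near ±180° is ignored for brevity.

```python
import numpy as np

def attitude_from_accel_mag(accel, mag):
    """Tilt (roll, pitch) from the gravity direction; heading (yaw) from the
    tilt-compensated magnetometer. All angles in radians."""
    ax, ay, az = accel / np.linalg.norm(accel)
    roll = np.arctan2(ay, az)
    pitch = np.arctan2(-ax, np.sqrt(ay**2 + az**2))
    # Rotate the magnetic field vector into the horizontal plane.
    mx, my, mz = mag / np.linalg.norm(mag)
    mx_h = mx * np.cos(pitch) + mz * np.sin(pitch)
    my_h = (mx * np.sin(roll) * np.sin(pitch) + my * np.cos(roll)
            - mz * np.sin(roll) * np.cos(pitch))
    yaw = np.arctan2(-my_h, mx_h)
    return np.array([roll, pitch, yaw])

def complementary_filter(prev_rpy, gyro, accel, mag, dt, alpha=0.98):
    """Blend short-term gyro integration (smooth, but drifting) with the
    long-term accel/mag reference (noisy, but drift-free).
    Small-angle treatment of the gyro rates is assumed."""
    gyro_rpy = prev_rpy + gyro * dt              # integrate angular velocity
    ref_rpy = attitude_from_accel_mag(accel, mag)
    return alpha * gyro_rpy + (1.0 - alpha) * ref_rpy

# Hypothetical single update: device nearly level, pointing roughly north.
rpy = np.zeros(3)
rpy = complementary_filter(rpy,
                           gyro=np.array([0.001, -0.002, 0.0005]),   # rad/s
                           accel=np.array([0.05, -0.03, 9.81]),      # m/s^2
                           mag=np.array([22.0, 1.5, -43.0]),         # uT
                           dt=0.01)
print("roll/pitch/yaw [deg]:", np.degrees(rpy))
```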
An inertial measurement unit (IMU) offers a fast response and a high update rate for capturing motion changes, but it is vulnerable to noise, drift, and measurement-range limitations, which affect the accuracy and stability of AR applications [18]. These errors cause deviations or oscillations in the orientation of virtual objects, degrading the accuracy and stability of alignment with the Earth's surface. Furthermore, errors accumulate across spatial rotations, so the conversion between coordinate systems cannot guarantee that objects remain rotationally consistent with the geographic reference frame. RTK-GPS offers centimeter-level positioning accuracy and works well during high-speed motion, but it may encounter failures and initialization difficulties in certain outdoor AR map scenarios [19]. In general, the distance between the mobile station and the reference station should not exceed 10-15 km, as longer baselines degrade positioning accuracy or even lead to failure [20]. Outdoor localization using consumer-grade sensors therefore remains a challenging problem [21] and calls for a method that combines the strengths and overcomes the weaknesses of different sensors.
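To make the coordinate-system conversion mentioned above concrete, the sketch below converts a geodetic position (latitude, longitude, height) into a local East-North-Up (ENU) frame anchored at the user, which is the kind of frame in which a virtual object would be placed relative to the device. The WGS-84 constants are standard; the example coordinates are hypothetical.

```python
import numpy as np

# WGS-84 ellipsoid constants
A = 6378137.0               # semi-major axis [m]
E2 = 6.69437999014e-3       # first eccentricity squared

def geodetic_to_ecef(lat_deg, lon_deg, h):
    """Convert latitude/longitude/height to Earth-Centered Earth-Fixed XYZ."""
    lat, lon = np.radians(lat_deg), np.radians(lon_deg)
    n = A / np.sqrt(1.0 - E2 * np.sin(lat) ** 2)   # prime vertical radius
    x = (n + h) * np.cos(lat) * np.cos(lon)
    y = (n + h) * np.cos(lat) * np.sin(lon)
    z = (n * (1.0 - E2) + h) * np.sin(lat)
    return np.array([x, y, z])

def ecef_to_enu(target_ecef, ref_lat_deg, ref_lon_deg, ref_h):
    """Express an ECEF point in the local East-North-Up frame at the reference."""
    lat, lon = np.radians(ref_lat_deg), np.radians(ref_lon_deg)
    ref_ecef = geodetic_to_ecef(ref_lat_deg, ref_lon_deg, ref_h)
    d = target_ecef - ref_ecef
    rot = np.array([
        [-np.sin(lon),               np.cos(lon),              0.0],
        [-np.sin(lat)*np.cos(lon), -np.sin(lat)*np.sin(lon), np.cos(lat)],
        [ np.cos(lat)*np.cos(lon),  np.cos(lat)*np.sin(lon), np.sin(lat)],
    ])
    return rot @ d   # [east, north, up] in metres

# Hypothetical example: a virtual marker roughly 100 m north-east of the user.
user = (40.0000, 116.0000, 50.0)
marker = geodetic_to_ecef(40.0008, 116.0008, 50.0)
print("marker in user ENU [m]:", ecef_to_enu(marker, *user))
```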
Combining RTK-GPS measurements, including position, velocity, and heading angle, with IMU data can yield improved results at a lower cost. IMUs have a high output frequency and can measure the six-degrees-of-freedom (6DOF) pose, which makes them well suited to short-term use in environments with limited visual texture and rapid motion [22]. These characteristics make IMUs a valuable complement to RTK-GPS signals, and their integration with RTK-GPS has been widely adopted and investigated [23][24]. One limitation, however, is the zero offset (bias) of the accelerometer and gyroscope, which leads to a significant pose offset over time [25]. Moreover, when a low-accuracy (consumer-grade) RTK-GPS is combined with a highly accurate IMU, north-finding errors become significant during substantial changes in altitude [26]. Furthermore, pose estimation that combines RTK-GPS and an IMU is limited to open areas because it relies on satellite availability [27].
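A loosely coupled RTK-GPS/IMU combination of the kind described above can be sketched as a small Kalman filter: IMU-derived acceleration propagates position and velocity at a high rate, and each lower-rate RTK-GPS fix corrects the accumulated drift. The sketch below is a one-dimensional illustration under assumed noise levels and a hypothetical bias, not a production integration scheme.

```python
import numpy as np

dt = 0.01                                   # IMU period [s]
F = np.array([[1.0, dt], [0.0, 1.0]])       # state transition for [pos, vel]
B = np.array([[0.5 * dt**2], [dt]])         # control matrix for acceleration
H = np.array([[1.0, 0.0]])                  # GPS observes position only
Q = np.diag([1e-4, 1e-3])                   # assumed process noise
R = np.array([[0.02 ** 2]])                 # assumed RTK position noise (2 cm)

x = np.zeros((2, 1))                        # state: position, velocity
P = np.eye(2)

def imu_predict(x, P, accel):
    """High-rate propagation with the IMU acceleration (dead reckoning)."""
    x = F @ x + B * accel
    P = F @ P @ F.T + Q
    return x, P

def gps_update(x, P, gps_pos):
    """Low-rate correction with an RTK-GPS position fix."""
    y = np.array([[gps_pos]]) - H @ x       # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P
    return x, P

# Hypothetical run: constant true acceleration, biased IMU, 1 Hz GPS fixes.
rng, true_pos, true_vel = np.random.default_rng(0), 0.0, 0.0
for k in range(500):
    true_vel += 0.2 * dt
    true_pos += true_vel * dt
    x, P = imu_predict(x, P, accel=0.2 + 0.05 + rng.normal(0, 0.02))  # bias + noise
    if k % 100 == 99:                        # a GPS fix arrives every second
        x, P = gps_update(x, P, true_pos + rng.normal(0, 0.02))
print("estimated pos/vel:", x.ravel(), " true:", true_pos, true_vel)
```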
Because such methods depend on sensor readings such as magnetometer data, deviations between sensors, environmental interference, or drift can destabilize their azimuth estimation. Sensor-based approaches are also prone to error accumulation: estimating position and pose from accelerometer and gyroscope data requires integration and filtering, which compounds errors over time. These issues leave sensor-based approaches with insufficient accuracy and stability in both position alignment and map motion tracking.
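The error accumulation described here can be shown with a few lines of arithmetic: a small, constant accelerometer bias, integrated twice to obtain position, grows quadratically with time. The bias value below is hypothetical but of consumer-grade magnitude.

```python
import numpy as np

dt = 0.01                       # 100 Hz sampling
t = np.arange(0.0, 60.0, dt)    # one minute of stationary data
bias = 0.05                     # assumed accelerometer bias [m/s^2]

accel = np.full_like(t, bias)   # device is at rest; only the bias is measured
vel = np.cumsum(accel) * dt     # first integration: velocity error grows ~ b*t
pos = np.cumsum(vel) * dt       # second integration: position error ~ 0.5*b*t^2

print(f"position error after 10 s: {pos[int(10/dt)-1]:6.2f} m")
print(f"position error after 60 s: {pos[-1]:6.2f} m")   # about 0.5*0.05*60^2 = 90 m
```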

2.2. Vision-Based Methods

An alternative approach is the use of vision-based methods. These can be viewed as a specific instance of sensor-based approaches in which the camera serves as the primary sensor. Unlike conventional sensor methods, vision-based approaches use cameras to capture real-world images, which are then processed with feature extraction, matching, and tracking techniques to detect and match feature points in the environment. By analyzing the positional changes of these feature points, the device's pose, the user's location and orientation, and the geographic position within the real environment can be estimated, allowing virtual information to be aligned with the real world [28]. With the advancement of spatial data acquisition technologies, recent studies have focused on geographic localization and registration using multi-source data, including satellite imagery and video data [29]. However, these methods are both complex and expensive. On the one hand, the heading-angle accuracy of some multi-source localization methods is insufficient for outdoor, wide-area AR applications [30]. On the other hand, their wide application is difficult because they require a substantial number of pre-matched georeferenced images or a large database of point clouds captured from the physical world [21]. A more efficient approach is to employ automated computer vision techniques to generate three-dimensional point clouds for positioning [31], but this still requires pre-generating point clouds and relies primarily on visual features. Camera-based pose estimation typically involves feature point detection, camera calibration, and visual geometric algorithms, which demand significant computational resources and algorithmic complexity. Even when map motion tracking is employed, the heavy computational load of feature recognition, localization, and mapping results in slow system response.
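The feature detection, matching, and pose recovery pipeline summarized above can be sketched with OpenCV. The sketch assumes two overlapping frames (the file names `frame1.jpg` and `frame2.jpg` are hypothetical) and a known camera intrinsic matrix; it recovers only the relative rotation and a scale-free translation direction, which is one reason such methods need georeferenced data or additional sensors for absolute geo-registration.

```python
import cv2
import numpy as np

# Assumed camera intrinsics (focal length and principal point in pixels).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0,   0.0,   1.0]])

img1 = cv2.imread("frame1.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical frames
img2 = cv2.imread("frame2.jpg", cv2.IMREAD_GRAYSCALE)

# 1. Detect and describe features (ORB used here as a fast, freely usable detector).
orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# 2. Match descriptors and keep the strongest correspondences.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:500]
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# 3. Estimate the essential matrix with RANSAC and recover the relative pose.
E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                  prob=0.999, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)

print("relative rotation:\n", R)
print("translation direction (unit scale):", t.ravel())
```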
Visual methods are better suited to pose estimation in local scenes because the camera is primarily used as a local sensor [32]. In outdoor global geo-registration and pose estimation, the camera is susceptible to environmental interference, which degrades image quality or causes the loss of feature points, and the blurring of visual features becomes more pronounced as the velocity of motion increases. As a consequence, visual data may not provide stable and reliable orientation estimates. The SIFT feature algorithm, for instance, is considered a relatively reliable visual feature algorithm [33], but it requires the computation of keypoint orientations, which can be affected by noise and lighting conditions; it also handles only small-angle rotations well and may fail under large-angle rotations. Neural-network-based methods extract image features more effectively than traditional visual feature algorithms, but they require large amounts of training data, and for rare or novel objects there may not be enough data to achieve satisfactory generalization [30]. In summary, visual methods have limitations in global position alignment and depend on abundant features and motion constraints from mobile augmented reality (MAR) devices for surface alignment.
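For reference, the keypoint orientations mentioned above are exposed directly by OpenCV's SIFT implementation; the short sketch below (with a hypothetical image file) prints them, and it is these dominant orientations that become unstable under noise and lighting changes.

```python
import cv2

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical image

sift = cv2.SIFT_create(nfeatures=500)
keypoints, descriptors = sift.detectAndCompute(img, None)

# Each keypoint carries the dominant gradient orientation (in degrees) that
# SIFT computes around it; unstable orientations degrade matching.
for kp in keypoints[:5]:
    print(f"pos=({kp.pt[0]:.1f}, {kp.pt[1]:.1f})  scale={kp.size:.1f}  "
          f"orientation={kp.angle:.1f} deg")
```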

2.3. Hybrid Method

Single sensors and vision alone are insufficient to achieve robust, accurate, and large-scale 6DOF localization in complex real-world environments. Hybrid methods integrate non-visual sensor data and visual information to obtain more accurate and stable pose estimates. Achieving a robust and accurate outdoor geo-registration system requires fusing multiple complementary sensors. By leveraging the complementary advantages of sensors and visual data, the limitations of individual methods can be overcome, particularly in demanding outdoor applications such as urban and highway environments [34].
The visual-inertial fusion method has become popular for positioning and navigation, especially in GPS-denied environments [35], owing to the complementary nature of vision and IMU sensors [36]. By integrating the two [37], it overcomes the limitations of using either alone [38][39]. Vision sensors are affected by lighting, occlusion, and feature-matching failures, and visual localization methods depend on large amounts of georeferenced or pre-registered data; the IMU, on the other hand, suffers from accumulated errors and zero drift. By comparing and calibrating the IMU's attitude estimate against visual data, more accurate pose estimation can be achieved. Common fusion methods include the extended Kalman filter (EKF) and tightly coupled filtering. Fusing the camera and the IMU addresses the low output frequency and accuracy problems of visual pose estimation and enhances the robustness of the positioning results.
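As a minimal illustration of the comparison-and-correction step described above: gyroscope integration predicts the heading at a high rate, and an occasional absolute heading derived from visual registration corrects the accumulated drift through a scalar Kalman update. The noise values, gyro bias, and measurement schedule below are assumed for illustration; real visual-inertial systems (EKF-based or tightly coupled) estimate the full 6DOF state and sensor biases.

```python
import numpy as np

dt = 0.005                      # 200 Hz IMU
q_gyro = (0.02 * dt) ** 2       # assumed gyro integration noise per step
r_vis = np.radians(1.0) ** 2    # assumed visual heading noise (1 degree)

yaw, P = 0.0, np.radians(5.0) ** 2      # heading state and its variance
bias = np.radians(0.5)                  # hypothetical constant gyro bias [rad/s]
rng = np.random.default_rng(1)

for k in range(2000):           # 10 seconds; the device is actually not rotating
    # Prediction: integrate the (biased, noisy) gyroscope reading.
    gyro = bias + rng.normal(0.0, 0.02)
    yaw += gyro * dt
    P += q_gyro

    # Correction: a visual heading measurement arrives at 2 Hz.
    if k % 100 == 99:
        z = rng.normal(0.0, np.sqrt(r_vis))     # true heading is 0
        K = P / (P + r_vis)                     # scalar Kalman gain
        yaw += K * (z - yaw)
        P *= (1.0 - K)

print(f"heading error after 10 s: {np.degrees(yaw):.2f} deg "
      f"(pure gyro integration would drift by {np.degrees(bias*10):.1f} deg)")
```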
However, current multi-sensor fusion methods have limitations. Multi-sensor fusion offers adaptability and stability, but its implementation is more complex, and existing methods fail to deliver high-precision initial and motion pose estimation at low cost: there is an inherent trade-off between low-cost solutions and high-precision requirements. Ren et al. [3] achieved geo-registration on low-cost UAVs by fusing RTK-GPS and IMU data, but the limited accuracy of the IMU led to imprecise attitude fusion, and the method relies heavily on the stability of the RTK-GPS data. Burkard and Fuchs-Kittowski [21] estimated the gravity vector and geographic north in visual-inertial fusion registration through user gesture calibration, but the registration accuracy depends on manually input information. Oskiper et al. [40] used road segmentation direction information and annotated remote sensing imagery in their visual-inertial method to achieve accurate global initial registration, but pose matching may degrade under continuous outdoor motion and spatial rotation. Hansen et al. [41] proposed a precise positioning and orientation method using LiDAR, a high-accuracy IMU, and a pressure sensor for altitude, but the high cost of these devices prevents widespread adoption. Overall, multi-sensor fusion is primarily hindered by a lack of precision in surface alignment.
 
 

References

  1. Cheng, Y.; Zhu, G.; Yang, C.; Miao, G.; Ge, W. Characteristics of augmented map research from a cartographic perspective. Cartogr. Geogr. Inf. Sci. 2022, 49, 426–442.
  2. Behzadan, A.H.; Kamat, V.R. Georeferenced Registration of Construction Graphics in Mobile Outdoor Augmented Reality. J. Comput. Civ. Eng. 2007, 21, 247–258.
  3. Ren, X.; Sun, M.; Jiang, C.; Liu, L.; Huang, W. An Augmented Reality Geo-Registration Method for Ground Target Localization from a Low-Cost UAV Platform. Sensors 2018, 18, 3739.
  4. Liu, D.; Chen, J.; Hu, D.; Zhang, Z. Dynamic BIM-augmented UAV safety inspection for water diversion project. Comput. Ind. 2019, 108, 163–177.
  5. Portalés, C.; Lerma, J.L.; Navarro, S. Augmented reality and photogrammetry: A synergy to visualize physical and virtual city environments. ISPRS J. Photogramm. Remote Sens. 2010, 65, 134–142.
  6. Xiao, W.; Mills, J.; Guidi, G.; Rodríguez-Gonzálvez, P.; Gonizzi Barsanti, S.; González-Aguilera, D. Geoinformatics for the conservation and promotion of cultural heritage in support of the UN Sustainable Development Goals. ISPRS J. Photogramm. Remote Sens. 2018, 142, 389–406.
  7. Ma, X.; Sun, J.; Zhang, G.; Ma, M.; Gong, J. Enhanced Expression and Interaction of Paper Tourism Maps Based on Augmented Reality for Emergency Response. In Proceedings of the 2018 2nd International Conference on Big Data and Internet of Things—BDIOT 2018, Beijing, China, 24–26 October 2018; ACM Press: New York, NY, USA, 2018; pp. 105–109.
  8. Gazcón, N.F.; Trippel Nagel, J.M.; Bjerg, E.A.; Castro, S.M. Fieldwork in Geosciences assisted by ARGeo: A mobile Augmented Reality system. Comput. Geosci. 2018, 121, 30–38.
  9. Li, W.; Han, Y.; Liu, Y.; Zhu, C.; Ren, Y.; Wang, Y.; Chen, G. Real-time location-based rendering of urban underground pipelines. ISPRS Int. J. Geo-Inf. 2018, 7, 32.
  10. Suh, J.; Lee, S.; Choi, Y. UMineAR: Mobile-tablet-based abandoned mine hazard site investigation support system using augmented reality. Minerals 2017, 7, 198.
  11. Li, P.; Qin, T.; Hu, B.; Zhu, F.; Shen, S. Monocular Visual-Inertial State Estimation for Mobile Augmented Reality. In Proceedings of the 2017 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Nantes, France, 9–13 October 2017; pp. 11–21.
  12. Von Stumberg, L.; Usenko, V.; Cremers, D. Direct Sparse Visual-Inertial Odometry Using Dynamic Marginalization. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; pp. 2510–2517.
  13. Trimpe, S.; D’Andrea, R. Accelerometer-based tilt estimation of a rigid body with only rotational degrees of freedom. In Proceedings of the 2010 IEEE International Conference on Robotics and Automation, Anchorage, AK, USA, 3–7 May 2010; pp. 2630–2636.
  14. Zhang, Z.-Q.; Yang, G.-Z. Calibration of Miniature Inertial and Magnetic Sensor Units for Robust Attitude Estimation. IEEE Trans. Instrum. Meas. 2014, 63, 711–718.
  15. Tedaldi, D.; Pretto, A.; Menegatti, E. A robust and easy to implement method for IMU calibration without external equipments. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014; pp. 3042–3049.
  16. Thong, Y.K.; Woolfson, M.S.; Crowe, J.A.; Hayes-Gill, B.R.; Challis, R.E. Dependence of inertial measurements of distance on accelerometer noise. Meas. Sci. Technol. 2002, 13, 1163–1172.
  17. Ryohei, H.; Michael, C. Outdoor Navigation System by AR. SHS Web Conf. 2021, 102, 04002.
  18. Wang, Y.J.; Gao, J.Q.; Li, M.H.; Shen, Y.; Hasanyan, D.; Li, J.F.; Viehland, D. A review on equivalent magnetic noise of magnetoelectric laminate sensors. Philos. Trans. R. Soc. Math. Phys. Eng. Sci. 2014, 372, 20120455.
  19. Morales, Y.; Tsubouchi, T. DGPS, RTK-GPS and StarFire DGPS Performance Under Tree Shading Environments. In Proceedings of the 2007 IEEE International Conference on Integration Technology, Shenzhen, China, 20–24 March 2007; pp. 519–524.
  20. Kim, M.G.; Park, J.K. Accuracy Evaluation of Internet RTK GPS by Satellite Signal Reception Environment. J. Korean Soc. Surv. Geod. Photogramm. Cartogr. 2013, 31, 277–283.
  21. Burkard, S.; Fuchs-Kittowski, F. User-Aided Global Registration Method using Geospatial 3D Data for Large-Scale Mobile Outdoor Augmented Reality. In Proceedings of the 2020 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), Recife, Brazil, 9–13 November 2020; pp. 104–109.
  22. Randeniya, D.I.B.; Sarkar, S.; Gunaratne, M. Vision–IMU Integration Using a Slow-Frame-Rate Monocular Vision System in an Actual Roadway Setting. IEEE Trans. Intell. Transp. Syst. 2010, 11, 256–266.
  23. Suwandi, B.; Kitasuka, T.; Aritsugi, M. Low-cost IMU and GPS fusion strategy for apron vehicle positioning. In Proceedings of the TENCON 2017—2017 IEEE Region 10 Conference, Penang, Malaysia, 5–8 November 2017; pp. 449–454.
  24. Wang, S.; Deng, Z.; Yin, G. An Accurate GPS-IMU/DR Data Fusion Method for Driverless Car Based on a Set of Predictive Models and Grid Constraints. Sensors 2016, 16, 280.
  25. Mahdi, A.E.; Azouz, A.; Abdalla, A.; Abosekeen, A. IMU-Error Estimation and Cancellation Using ANFIS for Improved UAV Navigation. In Proceedings of the 2022 13th International Conference on Electrical Engineering (ICEENG), Cairo, Egypt, 29–31 March 2022; pp. 120–124.
  26. Huang, W.; Sun, M.; Li, S. A 3D GIS-based interactive registration mechanism for outdoor augmented reality system. Expert Syst. Appl. 2016, 55, 48–58.
  27. Qimin, X.; Bin, C.; Xu, L.; Xixiang, L.; Yuan, T. Vision-IMU Integrated Vehicle Pose Estimation based on Hybrid Multi-Feature Deep Neural Network and Federated Filter. In Proceedings of the 2021 28th Saint Petersburg International Conference on Integrated Navigation Systems (ICINS), Saint Petersburg, Russia, 31 May–2 June 2021; pp. 1–5.
  28. Liu, R.; Zhang, J.; Chen, S.; Yang, T.; Arth, C. Accurate real-time visual SLAM combining building models and GPS for mobile robot. J. Real-Time Image Process. 2021, 18, 419–429.
  29. Toker, A.; Zhou, Q.; Maximov, M.; Leal-Taix’e, L. Coming Down to Earth: Satellite-to-Street View Synthesis for Geo-Localization. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 6484–6493.
  30. Mithun, N.C.; Minhas, K.S.; Chiu, H.-P.; Oskiper, T.; Sizintsev, M.; Samarasekera, S.; Kumar, R. Cross-View Visual Geo-Localization for Outdoor Augmented Reality. In Proceedings of the 2023 IEEE Conference Virtual Reality and 3D User Interfaces (VR), Shanghai, China, 25–29 March 2023; pp. 493–502.
  31. Ventura, J.; Höllerer, T. Wide-area scene mapping for mobile visual tracking. In Proceedings of the 2012 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Atlanta, GA, USA, 5–8 November 2012; pp. 3–12.
  32. Qin, T.; Cao, S.; Pan, J.; Shen, S. A General Optimization-based Framework for Global Pose Estimation with Multiple Sensors 2019. arXiv 2019, arXiv:1901.03642.
  33. Qu, X.; Soheilian, B.; Habets, E.; Paparoditis, N. Evaluation of sift and surf for vision based localization. ISPRS Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2016, XLI-B3-685, 685–692.
  34. Wan, G.; Yang, X.; Cai, R.; Li, H.; Zhou, Y.; Wang, H.; Song, S. Robust and Precise Vehicle Localization Based on Multi-Sensor Fusion in Diverse City Scenes. In Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA), Brisbane, QLD, Australia, 21–25 May 2018; pp. 4670–4677.
  35. Hesch, J.A.; Kottas, D.G.; Bowman, S.L.; Roumeliotis, S.I. Consistency Analysis and Improvement of Vision-aided Inertial Navigation. IEEE Trans. Robot. 2014, 30, 158–176.
  36. Corke, P.; Lobo, J.; Dias, J. An Introduction to Inertial and Visual Sensing. Int. J. Robot. Res. 2007, 26, 519–535.
  37. Foxlin, E.; Naimark, L. VIS-Tracker: A wearable vision-inertial self-tracker. In Proceedings of the IEEE Virtual Reality, 2003, Los Angeles, CA, USA, 22–26 March 2003; pp. 199–206.
  38. Schall, G.; Wagner, D.; Reitmayr, G.; Taichmann, E.; Wieser, M.; Schmalstieg, D.; Hofmann-Wellenhof, B. Global pose estimation using multi-sensor fusion for outdoor Augmented Reality. In Proceedings of the 2009 8th IEEE International Symposium on Mixed and Augmented Reality, Orlando, FL, USA, 19–22 October 2009; pp. 153–162.
  39. Waegel, K.; Brooks, F.P. Filling the gaps: Hybrid vision and inertial tracking. In Proceedings of the 2013 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), Adelaide, SA, Australia, 1–4 October 2013; pp. 1–4.
  40. Oskiper, T.; Samarasekera, S.; Kumar, R. Global Heading Estimation For Wide Area Augmented Reality Using Road Semantics For Geo-referencing. In Proceedings of the 2021 IEEE International Symposium on Mixed and Augmented Reality Adjunct (ISMAR-Adjunct), Bari, Italy, 4–8 October 2021; pp. 427–428.
  41. Hansen, L.H.; Fleck, P.; Stranner, M.; Schmalstieg, D.; Arth, C. Augmented Reality for Subsurface Utility Engineering, Revisited. IEEE Trans. Vis. Comput. Graph. 2021, 27, 4119–4128.