Environment Perception System of Quadruped Robots

Version	Summary	Created by	Modification	Content Size	Created at
1		Guangrong Chen	--	1788	2023-06-08 16:05:57
2	references update	Fanny Huang	-1 word(s)	1787	2023-06-12 02:59:50

This entry is adapted from the peer-reviewed paper 10.3390/drones7050329

Owing to their high stability and adaptability, quadruped robots are widely studied in robotics. To cope with complex indoor and outdoor environments, a quadruped robot should be equipped with an environment perception system, which typically comprises LiDAR or vision sensors and runs simultaneous localization and mapping (SLAM).

Keywords: quadruped robot; simultaneous localization and mapping; deep learning

1. Introduction

According to their type of motion, mobile robots can be classified into three categories: wheeled, crawler, and legged ^[1]. Wheeled robots are suitable for simple terrains; crawler robots can move on complex terrains, but their movement flexibility is poor. Compared with the former two, legged robots need only discrete footholds rather than a continuous contact path when planning their motion, which allows them to adapt to more complex terrains ^[2]. Legged robots can be further divided into monopods ^[3], bipeds ^[4], quadrupeds ^[5], hexapods ^[6], etc. Among them, quadruped robots offer both high stability and adaptability: they can navigate more complex terrains than biped robots without the complexity of hexapod robots. As a result, they have become a research hotspot in robotics. Within quadruped robot research, improving adaptability to the external environment, specifically the ability to autonomously perceive and interact with it, is a highly active topic. An autonomous legged robot requires an accurate simultaneous localization and mapping (SLAM) algorithm that runs in real time without human intervention ^[7].

Most outdoor navigation systems, such as those on surface ships, use Global Navigation Satellite Systems (GNSS) ^[8], for example the Global Positioning System (GPS), to measure their position. Xia et al. proposed an autonomous vehicle sideslip angle estimation algorithm based on consensus and vehicle kinematics/dynamics synthesis. Based on the velocity error measurements between the reduced inertial navigation system (R-INS) and the GNSS, a velocity-based Kalman filter is formulated to estimate the velocity errors, attitude errors, and gyro bias errors of the R-INS ^[9]. Gao et al. proposed a vehicle localization system based on vehicle chassis sensors that considers vehicle lateral velocity to improve the accuracy of stand-alone vehicle localization in highly dynamic driving conditions during GNSS outages ^[10]. However, GNSS signals are weak and vulnerable to intentional or unintentional interference. To address these problems, SLAM has emerged as a research hotspot in the field of robot autonomous navigation. The two mainstream SLAM technologies are laser SLAM and visual SLAM, which are based on LiDAR sensors and visual sensors, respectively. Each sensor has its advantages and disadvantages. Visual sensors can obtain relatively accurate detection results at close distances, but their detection range is limited and they are more sensitive to the external environment. They are usually used for semantic interpretation of the scene but perform poorly in harsh lighting conditions. LiDAR sensors, on the other hand, have a longer detection range and stronger anti-jamming capabilities, which makes them important for obstacle detection and tracking. However, they perform poorly at detecting color, texture, and other features. Therefore, fusing LiDAR and visual information can overcome the drawbacks of each sensor and improve the stability and accuracy of detection ^[11].
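
To make the idea of filtering INS/GNSS velocity differences concrete, the following is a minimal sketch of a scalar Kalman filter. It is not the filter from ^[9], which also estimates attitude and gyro bias errors; here the only state is the velocity error, modeled as a random walk. The function name `kalman_velocity_error` and the noise variances `q` and `r` are assumptions chosen for illustration.

```python
def kalman_velocity_error(z_seq, q=1e-3, r=0.25, x0=0.0, p0=1.0):
    """Scalar Kalman filter tracking an INS velocity error (m/s).

    z_seq : measurements z_k = v_ins_k - v_gnss_k
    q, r  : process and measurement noise variances (assumed values)
    x0, p0: initial estimate and its variance (assumed values)
    """
    x, p = x0, p0
    estimates = []
    for z in z_seq:
        # Predict: random-walk model keeps the state, uncertainty grows by q.
        p = p + q
        # Update: weight the innovation (z - x) by the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates


# A constant 0.5 m/s offset between INS and GNSS velocity is gradually recovered.
est = kalman_velocity_error([0.5] * 50)
print(est[0], est[-1])
```

The gain starts large while the initial uncertainty `p0` dominates and then settles to a small steady-state value, so later measurements nudge the estimate rather than overwrite it.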

In addition, machine learning and deep learning techniques are widely used for more complex scene perception tasks, including image classification and object detection. Commonly used algorithms include convolutional neural networks (CNNs) and the YOLO (You Only Look Once) network. Liang et al. presented a novel lightweight convolutional module (LCM), namely the computation efficient (CE) module (CEModule), focusing on the CE part. CEModule increases the number of key features to maintain a high level of classification accuracy. At the same time, it employs a group convolution strategy to reduce the floating-point operations (FLOPs) incurred in the training process ^[12]. Zhou et al. proposed a lightweight video object detection method for unmanned aerial vehicles (UAVs) based on spatial-temporal correlation, an efficient deep learning model designed to fit the low computational power and low power consumption of UAVs ^[13].
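
Why group convolution lowers the operation count can be seen with simple arithmetic. The sketch below counts multiply-accumulate operations (MACs) for one forward pass of a convolution layer; it illustrates the general group convolution idea and is not the CEModule itself. The function name `conv_macs` and the layer sizes in the example are assumptions.

```python
def conv_macs(h_out, w_out, c_in, c_out, k, groups=1):
    """Multiply-accumulates for one k x k convolution layer (bias ignored).

    With `groups` > 1, each output channel only sees c_in / groups input
    channels, so the cost drops by a factor of `groups`.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("c_in and c_out must be divisible by groups")
    return h_out * w_out * c_out * (c_in // groups) * k * k


# Hypothetical layer: 56 x 56 output, 64 -> 128 channels, 3 x 3 kernel.
standard = conv_macs(56, 56, 64, 128, 3)           # 231,211,008 MACs
grouped = conv_macs(56, 56, 64, 128, 3, groups=4)  # 57,802,752 MACs
print(standard // grouped)                         # 4
```

The saving comes at the price of less mixing between channel groups, which is why lightweight designs usually pair grouped layers with some mechanism that restores cross-group information.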

2. Single Sensor Detection

Research on perceiving the external environment with a single sensor is relatively mature. Ocando et al. proposed an algorithm that performs autonomous 3D reconstruction of an environment using a single 2D LiDAR sensor and implemented it on a mobile platform using the Robot Operating System (ROS) ^[14]. Jeong and Lee proposed a ceiling vision-based simultaneous localization and mapping (CV-SLAM) technique using a single ceiling vision sensor ^[15]. They addressed the rotation and affine transform problems of ceiling vision by using a 3D gradient orientation estimation method and a multi-view description of landmarks. On that basis, they reconstructed the 3D landmark map in real time within an extended Kalman filter (EKF)-based SLAM framework. Davison et al. presented the MonoSLAM algorithm, which can recover the 3D trajectory of a monocular camera ^[16]. The core of the work is the online creation of a sparse but persistent map of natural landmarks within a probabilistic framework. The work also extended the range of robotic systems to humanoid robots and to augmented reality with a hand-held camera. Belter et al. applied a simultaneous localization and mapping algorithm to localize a hexapod robot using data from compact RGB-D sensors. This approach employed a new concept that combines fast visual odometry to track sensor motion and visual features to track radar scans. Experiments showed that visual radar features can be used to accurately estimate ship trajectories across a wide range of datasets ^[17].
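
A basic building block of mapping with a single 2D LiDAR is converting each scan from polar sensor coordinates into world-frame points using the current pose estimate. The sketch below shows that step only; it is an illustration under stated assumptions, not the pipeline of ^[14]. The function name `scan_to_world` is hypothetical, and the parameter names `ranges`, `angle_min`, and `angle_increment` simply mirror the fields of the ROS `sensor_msgs/LaserScan` message.

```python
import math


def scan_to_world(ranges, angle_min, angle_increment, pose, max_range=float("inf")):
    """Convert one 2D LiDAR scan to world-frame (x, y) points.

    ranges          : beam distances in meters (inf/NaN mean "no return")
    angle_min       : angle of the first beam in the sensor frame (rad)
    angle_increment : angular step between beams (rad)
    pose            : (x, y, theta) of the sensor in the world frame
    """
    x, y, theta = pose
    points = []
    for i, r in enumerate(ranges):
        # Skip invalid returns; the chained comparison is False for NaN and inf.
        if not (0.0 < r < max_range):
            continue
        beam = theta + angle_min + i * angle_increment
        points.append((x + r * math.cos(beam), y + r * math.sin(beam)))
    return points


# Three beams at -90, 0, and +90 degrees from a sensor at (1, 2) facing +y.
pts = scan_to_world([1.0, float("inf"), 2.0], -math.pi / 2, math.pi / 2,
                    (1.0, 2.0, math.pi / 2))
print(pts)
```

Accumulating such points over many poses yields a 2D map; tilting or moving the sensor plane, as in 3D reconstruction setups, adds a third coordinate to the same transform.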

3. Multi-Sensor Fusion

Multi-sensor fusion is an effective method to improve a robot’s ability to perceive the external environment ^[18]. For example, one common fusion approach is to combine cameras and LiDARs. Cameras can capture complex external environment information at a high frame rate and high resolution, but they are easily affected by lighting conditions. LiDAR, on the other hand, is less affected by light and can provide more accurate position and depth information, but it cannot capture visual information. By fusing the data from these two sensors, the robustness of perception can be greatly improved ^[19]. Schlosser et al. fused LiDAR and color imagery for pedestrian detection using CNNs ^[20]. They incorporated LiDAR by up-sampling the point cloud to a dense depth map and extracting three features representing horizontal disparity, height above ground, and angle (HHA). These features were then used as extra image channels and fed into CNNs to learn a deep hierarchy of feature representations. Dhouioui and Frikha proposed an embedded system based on two types of data, radar signals and camera images, to identify and classify obstacles on the road. They used machine learning methods and signal processing techniques to optimize the overall computation performance and efficiency ^[21]. López et al. applied vision and laser fusion techniques to simultaneous localization and mapping of micro aerial vehicles (MAVs) in indoor rescue and/or identification navigation missions. The technique fused laser and visual information, as well as measurement data from inertial components, to obtain a reliable 6-DoF pose estimate of the MAV within a local map. Experimental results showed that sensor fusion can improve position estimation under different test conditions and yield accurate maps ^[22].

When considering robotic applications in complex scenarios, traditional geometric maps appear inaccurate because they lack interaction with the environment. Based on this, Li et al. proposed building a large-scale three-dimensional (3D) semantic map through precise integration of LiDAR and camera information to present real-time road scenes more accurately ^[23]. First, they performed SLAM through multi-sensor fusion of LiDAR and inertial measurement unit (IMU) data to locate the robot and build a map of the surrounding scene while the robot moves. Furthermore, they employed CNN-based image semantic segmentation to develop a semantic map of the environment. To address the incompleteness of environmental perception when using only a 2D LiDAR, Jin calibrated the point cloud information from a Kinect v2 RGB-D camera and a 2D LiDAR using intrinsic and extrinsic parameters, building on the Cartographer algorithm ^[24]. Precise calibration of the rigid body transform between the sensors is crucial for correct data fusion. To simplify the calibration process, Valente et al. presented the first framework that uses CNNs for odometry estimation by fusing data from 2D laser scanners and monocular cameras without requiring sensor calibration ^[25].

Alatise and Hancke fused a six-degrees-of-freedom (6-DoF) inertial sensor with monocular vision ^[26]. They integrated a monocular vision-based object detection algorithm using the Speeded-Up Robust Features (SURF) and Random Sample Consensus (RANSAC) algorithms to improve the accuracy of detection. By fusing data from the inertial sensors and the camera using an extended Kalman filter (EKF), they estimated the position and orientation of the mobile robot. Xia et al. proposed an automated driving systems data acquisition and analytics platform. It presents a holistic pipeline from raw advanced sensory data collection to data processing, which is capable of processing the sensor data from multiple connected automated vehicles (CAVs) and extracting the objects’ identity (ID) number, position, speed, and orientation information in the map and Frenet coordinates ^[27]. Liu et al. proposed a novel kinematic-model-based vehicle slip angle (VSA) estimation method by fusing information from a GNSS and an IMU ^[28]. Xia et al. proposed a method for fusing the IMU and automotive onboard sensors to estimate the yaw misalignment autonomously ^[29].
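
Camera and LiDAR fusion of the kind described above usually starts by projecting LiDAR points into the image with the calibrated rigid body transform (external, or extrinsic, parameters) and the camera's internal, or intrinsic, parameters. The sketch below builds a sparse depth map this way with a plain pinhole model and no lens distortion. It is an illustration only, not the method of any cited work; the function name `project_points` and all numeric values are assumptions.

```python
def project_points(points_lidar, R, t, fx, fy, cx, cy, width, height):
    """Project LiDAR points into an image; return {(u, v): depth in meters}.

    R, t           : extrinsics, so that p_cam = R * p_lidar + t (3x3 list, 3-list)
    fx, fy, cx, cy : pinhole intrinsics in pixels
    width, height  : image size in pixels
    """
    depth = {}
    for p in points_lidar:
        # Rigid body transform from the LiDAR frame to the camera frame.
        xc, yc, zc = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
        if zc <= 0.0:
            continue  # behind the camera
        # Pinhole projection to pixel coordinates.
        u = int(round(fx * xc / zc + cx))
        v = int(round(fy * yc / zc + cy))
        if 0 <= u < width and 0 <= v < height:
            # Keep the nearest point when several fall on the same pixel.
            if (u, v) not in depth or zc < depth[(u, v)]:
                depth[(u, v)] = zc
    return depth


# Hypothetical setup: aligned sensor frames and a 100 x 80 pixel image.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cloud = [(0.0, 0.0, 2.0), (1.0, 0.0, 4.0), (0.0, 0.0, -1.0), (0.0, 0.0, 5.0)]
print(project_points(cloud, identity, [0.0, 0.0, 0.0], 100.0, 100.0, 50.0, 40.0, 100, 80))
```

The resulting depth map is sparse because LiDAR returns are far fewer than pixels; dense variants interpolate or up-sample it before it is stacked with the color channels.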

4. Deep Learning Method

Assisted driving systems require a model that can accurately identify partially occluded targets in complex backgrounds, track fully occluded targets over the short term, and issue early warnings for them. Based on this, Wang et al. proposed a method based on YOLOv3 ^[30] that improves detection accuracy while supporting real-time operation and raises real-time alarms for completely occluded targets. They first obtained more appropriate prior (anchor) box settings through categorical K-means clustering. Then, they used DIoU-NMS instead of the traditional non-maximum suppression (NMS) technique. Additionally, to improve the system’s ability to identify occluded targets, they proposed a tracking algorithm based on a Kalman filter and Hungarian matching. Qiu et al. proposed an Adaptive Spatial Feature Fusion (ASFF) YOLOv5 network (ASFF-YOLOv5) to improve the accuracy of recognition and detection of multiple multiscale road traffic elements ^[31]. The first step was to use the K-means algorithm for clustering statistics on the range of multiscale road traffic elements. Then, they employed a spatial pyramid pooling fast (SPPF) structure to enhance the accuracy of information extraction. To address the problems of object detection in drone-captured scenarios caused by different altitudes and high drone speeds, Zhu et al. proposed TPH-YOLOv5 to handle different object scales and motion blur ^[32]. Based on YOLOv5, they added an additional prediction head to detect objects of different scales. They replaced the original prediction heads with Transformer Prediction Heads (TPH) and integrated the Convolutional Block Attention Module (CBAM) to identify attention regions in scenarios with dense objects. Experiments on the VisDrone2021 dataset demonstrated that TPH-YOLOv5 performed well, with impressive interpretability, in drone-captured scenarios. Liu et al. proposed a novel algorithm referred to as YOLOv5-tassel to detect tassels in RGB imagery captured by unmanned aerial vehicles (UAVs) ^[33].
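
The difference between plain NMS and the DIoU-based variant mentioned above is the overlap score: DIoU subtracts from the IoU the squared distance between box centers, normalized by the squared diagonal of the smallest box enclosing both. Two overlapping boxes with distant centers are therefore less likely to suppress each other, which helps with occlusion. The sketch below is a minimal pure-Python illustration, not the implementation from ^[30]; the function names, the `(x1, y1, x2, y2)` box format, and the 0.5 threshold are assumptions.

```python
def diou(a, b):
    """Distance-IoU of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / union if union > 0 else 0.0
    # Squared distance between the two box centers.
    d2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both.
    c2 = (max(ax2, bx2) - min(ax1, bx1)) ** 2 + (max(ay2, by2) - min(ay1, by1)) ** 2
    return iou - d2 / c2 if c2 > 0 else iou


def diou_nms(boxes, scores, threshold=0.5):
    """Greedy NMS using DIoU; returns indices of kept boxes, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(diou(boxes[i], boxes[k]) <= threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
print(diou_nms(boxes, [0.9, 0.8, 0.7]))  # [0, 2]
```

In the example, the second box nearly coincides with the first and is suppressed, while the third box is far away and survives.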

References

[1] Chen, G.; Wei, N.; Yan, L.; Lu, H.; Li, J. Perturbation-based approximate analytic solutions to an articulated SLIP model for legged robots. Commun. Nonlinear Sci. Numer. Simul. 2023, 117, 106943.
[2] Hui, Z. Research on Environmental Perception, Recognition and Leader Following Algorithm of the Quadruped Robot. Ph.D. Thesis, Shandong University, Jinan, China, 2016.
[3] Chen, G.; Wang, J.; Wang, S.; Zhao, J.; Shen, W. Compliance control for a hydraulic bouncing system. ISA Trans. 2018, 79, 232–238.
[4] Chen, G.; Wei, N.; Lu, H.; Yan, L.; Li, J. Optimization and evaluation of swing leg retraction for a hydraulic biped robot. J. Field Robot. 2023. early view.
[5] Chen, G.; Guo, S.; Hou, B.; Wang, J. Virtual model control for quadruped robots. IEEE Access 2020, 8, 140736–140751.
[6] Gao, Y.; Wang, D.; Wei, W.; Yu, Q.; Liu, X.; Wei, Y. Constrained Predictive Tracking Control for Unmanned Hexapod Robot with Tripod Gait. Drones 2022, 6, 246.
[7] Lee, J.W.; Lee, W.; Kim, K.D. An algorithm for local dynamic map generation for safe UAV navigation. Drones 2021, 5, 88.
[8] Lee, D.K.; Nedelkov, F.; Akos, D.M. Assessment of Android Network Positioning as an Alternative Source of Navigation for Drone Operations. Drones 2022, 6, 35.
[9] Xia, X.; Hashemi, E.; Xiong, L.; Khajepour, A. Autonomous Vehicle Kinematics and Dynamics Synthesis for Sideslip Angle Estimation Based on Consensus Kalman Filter. IEEE Trans. Control Syst. Technol. 2022, 31, 179–192.
[10] Gao, L.; Xiong, L.; Xia, X.; Lu, Y.; Yu, Z.; Khajepour, A. Improved vehicle localization using on-board sensors and vehicle lateral velocity. IEEE Sens. J. 2022, 22, 6818–6831.
[11] Ramachandran, A.; Sangaiah, A.K. A review on object detection in unmanned aerial vehicle surveillance. Int. J. Cogn. Comput. Eng. 2021, 2, 215–228.
[12] Liang, Y.; Li, M.; Jiang, C.; Liu, G. CEModule: A computation efficient module for lightweight convolutional neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2021. early access.
[13] Zhou, P.; Liu, G.; Wang, J.; Weng, Q.; Zhang, K.; Zhou, Z. Lightweight unmanned aerial vehicle video object detection based on spatial-temporal correlation. Int. J. Commun. Syst. 2022, 35, e5334.
[14] Ocando, M.G.; Certad, N.; Alvarado, S.; Terrones, Á. Autonomous 2D SLAM and 3D mapping of an environment using a single 2D LIDAR and ROS. In Proceedings of the 2017 Latin American Robotics Symposium (LARS) and 2017 Brazilian Symposium on Robotics (SBR), Curitiba, Brazil, 8–11 November 2017; pp. 1–6.
[15] Jeong, W.; Lee, K.M. CV-SLAM: A new ceiling vision-based SLAM technique. In Proceedings of the 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, Edmonton, AB, Canada, 2–6 August 2005; pp. 3195–3200.
[16] Davison, A.J.; Reid, I.D.; Molton, N.D.; Stasse, O. MonoSLAM: Real-time single camera SLAM. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 1052–1067.
[17] Belter, D.; Nowicki, M.; Skrzypczyński, P. Evaluating map-based RGB-D SLAM on an autonomous walking robot. In International Conference on Automation, 2–4 March 2016, Warsaw, Poland; Springer: Cham, Switzerland, 2016; pp. 469–481.
[18] Callmer, J.; Törnqvist, D.; Gustafsson, F.; Svensson, H.; Carlbom, P. Radar SLAM using visual features. EURASIP J. Adv. Signal Process. 2011, 2011, 71.
[19] Mittal, A.; Shivakumara, P.; Pal, U.; Lu, T.; Blumenstein, M. A new method for detection and prediction of occluded text in natural scene images. Signal Process. Image Commun. 2022, 100, 116512.
[20] Schlosser, J.; Chow, C.K.; Kira, Z. Fusing lidar and images for pedestrian detection using convolutional neural networks. In Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), Stockholm, Sweden, 16–21 May 2016; pp. 2198–2205.
[21] Dhouioui, M.; Frikha, T. Design and implementation of a radar and camera-based obstacle classification system using machine-learning techniques. J. Real-Time Image Process. 2021, 18, 2403–2415.
[22] López, E.; Barea, R.; Gómez, A.; Saltos, Á.; Bergasa, L.M.; Molinos, E.J.; Nemra, A. Indoor SLAM for micro aerial vehicles using visual and laser sensor fusion. In Robot 2015: Second Iberian Robotics Conference; Springer: Cham, Switzerland, 2016; pp. 531–542.
[23] Li, J.; Zhang, X.; Li, J.; Liu, Y.; Wang, J. Building and optimization of 3D semantic map based on Lidar and camera fusion. Neurocomputing 2020, 409, 394–407.
[24] Jin, D. Research on Laser Vision Fusion SLAM and Navigation for Mobile Robots in Complex Indoor Environments. Ph.D. Thesis, Harbin Institute of Technology, Harbin, China, 2020.
[25] Valente, M.; Joly, C.; de La Fortelle, A. Deep sensor fusion for real-time odometry estimation. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019; pp. 6679–6685.
[26] Alatise, M.B.; Hancke, G.P. Pose estimation of a mobile robot based on fusion of IMU data and vision data using an extended Kalman filter. Sensors 2017, 17, 2164.
[27] Xia, X.; Meng, Z.; Han, X.; Li, H.; Tsukiji, T.; Xu, R.; Zheng, Z.; Ma, J. An automated driving systems data acquisition and analytics platform. Transp. Res. Part C Emerg. Technol. 2023, 151, 104120.
[28] Liu, W.; Xia, X.; Xiong, L.; Lu, Y.; Gao, L.; Yu, Z. Automated vehicle sideslip angle estimation considering signal measurement characteristic. IEEE Sens. J. 2021, 21, 21675–21687.
[29] Xia, X.; Xiong, L.; Huang, Y.; Lu, Y.; Gao, L.; Xu, N.; Yu, Z. Estimation on IMU yaw misalignment by fusing information of automotive onboard sensors. Mech. Syst. Signal Process. 2022, 162, 107993.
[30] Wang, K.; Liu, M.; Ye, Z. An advanced YOLOv3 method for small-scale road object detection. Appl. Soft Comput. 2021, 112, 107846.
[31] Qiu, M.; Huang, L.; Tang, B.H. ASFF-YOLOv5: Multielement detection method for road traffic in UAV images based on multiscale feature fusion. Remote Sens. 2022, 14, 3498.
[32] Zhu, X.; Lyu, S.; Wang, X.; Zhao, Q. TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 2778–2788.
[33] Liu, W.; Quijano, K.; Crawford, M.M. YOLOv5-Tassel: Detecting tassels in RGB UAV imagery with improved YOLOv5 based on transfer learning. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 8085–8094.

© Text is available under the terms and conditions of the Creative Commons Attribution (CC BY) license; additional terms may apply. By using this site, you agree to the Terms and Conditions and Privacy Policy.

Information

Subjects: Engineering, Mechanical

Contributors: Guangrong Chen; Liang Hong

Update Date: 12 Jun 2023
