Robot Target Recognition and Grasping Pose Estimation

Version 1 | Created by: Bin Zhao | Content size: 1102 | Created at: 2024-01-04 16:15:35

This entry is adapted from the peer-reviewed paper 10.3390/s24010195

Collaborative robots are being deployed at scale in intelligent factories, where human–robot collaboration conveniently replaces manual labor in 3C (Computer, Communication, and Consumer Electronics) manufacturing and intelligent handling. As a result, the intelligence, speed, and reliability of these robots have become core factors in the productivity and production quality of intelligent factories. Sorting objects is an essential, routine task on intelligent production lines that use collaborative robots.

Keywords: deep learning; collaborative robots; grasping detection

1. Introduction

In recent years, collaborative robots have been deployed at scale in intelligent factories, where human–robot collaboration conveniently replaces manual labor in 3C (Computer, Communication, and Consumer Electronics) manufacturing and intelligent handling. As a result, the intelligence, speed, and reliability of these robots have become core factors in the productivity and production quality of intelligent factories. Sorting objects is an essential, routine task on intelligent production lines that use collaborative robots. However, the conventional visual grasping technique works only on recognized, regularly shaped pieces lying on a plane. Its inability to accommodate the diverse, customized grasping needs of intelligent factory products is a significant limitation. Furthermore, manual sorting and grasping carried out by collaborative robots are prone to errors and omissions on intelligent production lines that involve many tasks and fast-paced operations. Such errors degrade product quality and disrupt the scheduling of upstream and downstream operations, ultimately increasing production costs. Vision-based collaborative robot grasping has therefore become a widely studied research topic in industry and academia.

To address these issues, deep vision technology is a viable technical solution ^[1]. It identifies and locates target objects on the intelligent production line and, at the same time, predicts the grasp width and angle for each object; together, these outputs allow the visual grasping task to be completed accurately and efficiently. Existing visual grasping techniques can accurately recognize and grasp objects of known types and regular shapes that are placed on a flat surface ^[2]. However, mature methods that identify object categories while simultaneously predicting grasp scale and pose are rare. Objective evaluations of grasping techniques are necessary to advance robotic grasping capabilities ^[3]^[4]^[5].
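To make the predicted quantities concrete, a planar grasp can be described by a center point, a gripper angle, and an opening width. The following is a minimal sketch under that assumption, not the representation used by any of the cited works; the `Grasp` class, the `fingertips` function, and the choice of units (pixels and radians) are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Grasp:
    """Planar grasp: center (x, y), gripper angle in radians, opening width."""
    x: float
    y: float
    angle: float
    width: float


def fingertips(g: Grasp) -> Tuple[Point, Point]:
    """Return the two fingertip positions, placed symmetrically about the
    grasp center along the gripper's opening axis."""
    dx = 0.5 * g.width * math.cos(g.angle)
    dy = 0.5 * g.width * math.sin(g.angle)
    return (g.x - dx, g.y - dy), (g.x + dx, g.y + dy)


if __name__ == "__main__":
    # A grasp rotated by 90 degrees opens along the y axis.
    left, right = fingertips(Grasp(x=100.0, y=50.0, angle=math.pi / 2, width=40.0))
    print(left, right)
```

Whatever the angle, the distance between the two returned points equals the predicted opening width, which is the quantity the gripper must be commanded to before closing.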

In intelligent production lines, sorting objects is a composite task comprising two subtasks. First, targets must be classified by category, because each category requires its own sorting strategy, such as a different speed and destination. Then, once the category of the target has been determined, the robot grasps the target and moves it to a specific location. In this process, the orientation of the object and the opening width of the gripper are the main concerns.
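The two subtasks can be combined as in the sketch below. It is only an illustration of the idea: the category names, speeds, bin names, and the `plan_sort` function are hypothetical, and a real production line would define its own strategy table.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SortingStrategy:
    """Per-category handling parameters (all values are illustrative)."""
    speed: float        # fraction of the maximum arm speed, between 0 and 1
    destination: str    # name of the drop-off location


# Hypothetical category table, assumed for this example only.
STRATEGIES: Dict[str, SortingStrategy] = {
    "phone_case": SortingStrategy(speed=0.8, destination="bin_A"),
    "circuit_board": SortingStrategy(speed=0.3, destination="bin_B"),
}
# Fallback for categories the table does not know.
DEFAULT = SortingStrategy(speed=0.2, destination="reject_bin")


def plan_sort(category: str, angle: float, width: float) -> dict:
    """Combine subtask 1 (category -> sorting strategy) with subtask 2
    (grasp orientation and gripper opening) into a single command."""
    strategy = STRATEGIES.get(category, DEFAULT)
    return {
        "grasp_angle": angle,
        "gripper_width": width,
        "speed": strategy.speed,
        "destination": strategy.destination,
    }


if __name__ == "__main__":
    print(plan_sort("circuit_board", angle=0.5, width=30.0))
```

Keeping the strategy lookup separate from the grasp parameters mirrors the split described above: the classifier decides where and how fast the object travels, while the grasp estimate decides how it is picked up.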

2. Target Classification and Detection

The objective of target detection is to identify the class and location of a target in a specified scene. Growth in computing capability combined with advances in deep learning has produced many fruitful results in this field ^[6]. In 2D target detection, two categories of algorithmic approaches exist: the two-stage method, exemplified by R-CNN and Faster R-CNN, and the single-stage method, represented by YOLO and SSD. The two have different priorities: the two-stage method focuses on accuracy at the cost of slower detection, whereas the single-stage method prioritizes detection speed and accepts some loss of accuracy.

In recent years, researchers have focused more on one-stage methods than on two-stage methods because of their more concise designs and competitive performance ^[7]. Building on R-CNN and YOLO, Lin introduced RetinaNet, whose Focal Loss function concentrates training on hard-to-classify targets. This yields detection accuracy comparable to that of Faster R-CNN. Meanwhile, Li proposed RepPoints, which uses a point set to predict targets and introduces spatial constraints that penalize outliers during adaptive learning; the algorithm outperforms Faster R-CNN. Law and Deng departed from the previous assumption and recast target detection as a keypoint detection problem. They proposed CornerNet, which achieved strong results at the time. Zhou proposed CenterNet, which places an anchor only at the target location rather than across the entire image. Additionally, CenterNet does not require non-maximum suppression (NMS) for further filtering and achieves higher accuracy than previous methods.

Faster R-CNN is a prevalent region-based, two-stage target detection method in computer vision. Its predecessors are R-CNN and Fast R-CNN. The basic idea of Fast R-CNN is to pass the input image through a series of convolutional layers to obtain a feature map. A selective search algorithm then generates region proposals, which are pooled to a fixed size. Finally, the pooled regions are fed to fully connected layers for classification and bounding-box regression.
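The way Focal Loss shifts training toward hard-to-classify targets can be seen in a few lines. The sketch below implements the binary form FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t) for a single prediction; the function name `focal_loss` is hypothetical, and the defaults gamma = 2 and alpha = 0.25 are the values commonly reported for RetinaNet, used here as an assumption rather than as a setting taken from this entry.

```python
import math


def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Binary focal loss for one prediction.

    p: predicted probability of the positive class.
    y: ground-truth label, 1 (positive) or 0 (negative).
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)          # keep log() finite
    p_t = p if y == 1 else 1.0 - p           # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t)**gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    easy = focal_loss(0.9, 1)   # confident and correct
    hard = focal_loss(0.1, 1)   # confident and wrong
    print(easy, hard)
```

With gamma set to 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, so gamma alone controls how strongly easy examples are down-weighted.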

3. Target Grasping Pose Estimation

Obtaining visual information from various sensors is a more convenient alternative to obtaining a precise 3D model of an object. The literature reports excellent results in target classification ^[8]^[9]^[10] and in target detection ^[11]^[12], where the ability of deep learning to learn image features automatically has been investigated extensively. Robot grasping detection has likewise been studied by many researchers. Y. Domae proposed a technique that finds the best grasping pose from a depth map by modeling both the gripper and the target to be grasped. The technique suits scenes in which multiple targets are placed in varying ways, and its gripper model uses two mask images ^[13]^[14]^[15]. The first image describes the contact region, where the target object should lie for a stable grasp. The second image describes the collision region, which no other object may occupy during grasping if collisions are to be avoided. Graspability is calculated by convolving each mask image with a binarized depth map. The binarization threshold differs for each region and depends on the minimum height of the 3D points in the region and on the length of the gripper ^[16]^[17]^[18]. Because it does not presume any 3D model of the object, the approach is appropriate for general objects ^[19]^[20].

Jeffrey Mahler developed Dex-Net 2.0, which uses probabilistic models to generate point clouds and robust grasp plans with precise labels. Dex-Net 2.0 designs a deep grasp quality convolutional neural network (GQ-CNN) and trains it to evaluate the robustness of a candidate grasp plan given a point cloud. Candidate grasp plans are obtained by applying edge detection to the input point cloud. These candidates are then sampled, and the one that GQ-CNN estimates to be most robust is selected, which enables grasp planning on the actual robot.
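The mask-based graspability measure described above can be sketched in a simplified form. The code below is an illustration, not Domae's implementation: it uses a single height threshold for both regions (the original uses a different threshold per region), it scores a position by the contact overlap and sets the score to zero wherever the collision region touches an occupied pixel, and all function names and the toy masks are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[int]]


def binarize(depth: List[List[float]], threshold: float) -> Grid:
    """1 where the surface is higher than `threshold`, else 0."""
    return [[1 if h > threshold else 0 for h in row] for row in depth]


def correlate(image: Grid, mask: Grid) -> Grid:
    """'Valid' sliding-window sum of image * mask. This is cross-correlation,
    which equals convolution for the symmetric masks used in this example."""
    mh, mw = len(mask), len(mask[0])
    out_h, out_w = len(image) - mh + 1, len(image[0]) - mw + 1
    return [
        [
            sum(image[r + i][c + j] * mask[i][j]
                for i in range(mh) for j in range(mw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def graspability(depth: List[List[float]], threshold: float,
                 contact_mask: Grid, collision_mask: Grid) -> Grid:
    """Contact overlap at each gripper position, zeroed wherever the
    collision region overlaps any occupied pixel."""
    occupied = binarize(depth, threshold)
    contact = correlate(occupied, contact_mask)
    collision = correlate(occupied, collision_mask)
    return [
        [c if x == 0 else 0 for c, x in zip(c_row, x_row)]
        for c_row, x_row in zip(contact, collision)
    ]


def best_position(score: Grid) -> Tuple[int, int]:
    """Top-left (row, col) of the mask window with the highest score."""
    return max(
        ((r, c) for r in range(len(score)) for c in range(len(score[0]))),
        key=lambda rc: score[rc[0]][rc[1]],
    )


if __name__ == "__main__":
    # One isolated object at column 1 and two touching objects at columns 4-5.
    depth = [[0.0, 5.0, 0.0, 0.0, 5.0, 5.0, 0.0]]
    contact = [[0, 1, 0]]      # area between the fingers
    collision = [[1, 0, 1]]    # finger footprints
    score = graspability(depth, 1.0, contact, collision)
    print(score, best_position(score))
```

In this toy scene only the isolated object receives a non-zero score, because a finger would land on the neighboring object for either of the two touching ones. No 3D model of the objects is needed, only the depth map and the two gripper masks.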

References

[1] Mohammed, M.Q.; Kwek, L.C.; Chua, S.C.; Aljaloud, A.S.; Al-Dhaqm, A.; Al-Mekhlafi, Z.G.; Mohammed, B.A. Deep reinforcement learning-based robotic grasping in clutter and occlusion. Sustainability 2021, 13, 13686.
[2] Levine, S.; Pastor, P.; Krizhevsky, A.; Ibarz, J.; Quillen, D. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. Int. J. Robot. Res. 2018, 37, 421–436.
[3] Ku, Y.; Yang, J.; Fang, H.; Xiao, W.; Zhuang, J. Deep learning of grasping detection for a robot used in sorting construction and demolition waste. J. Mater. Cycles Waste Manag. 2020, 23, 84–95.
[4] Zhang, H.; Peeters, J.; Demeester, E.; Kellens, K. Deep Learning Reactive Robotic Grasping with a Versatile Vacuum Gripper. IEEE Trans. Robot. 2022, 39, 1244–1259.
[5] Shang, W.; Song, F.; Zhao, Z.; Gao, H.; Cong, S.; Li, Z. Deep learning method for grasping novel objects using dexterous hands. IEEE Trans. Cybern. 2020, 52, 2750–2762.
[6] Chen, P.; Lu, W. Deep reinforcement learning based moving object grasping. Inf. Sci. 2021, 565, 62–76.
[7] Hu, J.; Li, Q.; Bai, Q. Research on Robot Grasping Based on Deep Learning for Real-Life Scenarios. Micromachines 2023, 14, 1392.
[8] Kleeberger, K.; Bormann, R.; Kraus, W.; Huber, M.F. A survey on learning-based robotic grasping. Curr. Robot. Rep. 2020, 1, 239–249.
[9] Wei, B.; Ye, X.; Long, C.; Du, Z.; Li, B.; Yin, B.; Yang, X. Discriminative Active Learning for Robotic Grasping in Cluttered Scene. IEEE Robot. Autom. Lett. 2023, 8, 1858–1865.
[10] Bergamini, L.; Sposato, M.; Pellicciari, M.; Peruzzini, M.; Calderara, S.; Schmidt, J. Deep learning-based method for vision-guided robotic grasping of unknown objects. Adv. Eng. Inform. 2020, 44, 101052.
[11] Sekkat, H.; Tigani, S.; Saadane, R.; Chehri, A. Vision-based robotic arm control algorithm using deep reinforcement learning for autonomous objects grasping. Appl. Sci. 2021, 11, 7917.
[12] Zhong, B.; Huang, H.; Lobaton, E. Reliable vision-based grasping target recognition for upper limb prostheses. IEEE Trans. Cybern. 2022, 52, 1750–1762.
[13] Zhao, B.; Wu, C.; Zou, F.; Zhang, X.; Sun, R.; Jiang, Y. Research on Small Sample Multi-Target Grasping Technology Based on Transfer Learning. Sensors 2023, 23, 5826.
[14] Zhao, B.; Wu, C.; Zhang, X.; Sun, R.; Jiang, Y. Target grasping network technology of robot manipulator based on attention mechanism. J. Jilin Univ. (Eng. Technol. Ed.) 2023, 1–9.
[15] Kumra, S.; Joshi, S.; Sahin, F. Gr-convnet v2: A real-time multi-grasp detection network for robotic grasping. Sensors 2022, 22, 6208.
[16] Yun, J.; Jiang, D.; Sun, Y.; Huang, L.; Tao, B.; Jiang, G.; Kong, J.; Weng, Y.; Li, G.; Fang, Z. Grasping pose detection for loose stacked object based on convolutional neural network with multiple self-powered sensors information. IEEE Sens. J. 2022, 23, 20619–20632.
[17] Newbury, R.; Gu, M.; Chumbley, L.; Mousavian, A.; Eppner, C.; Leitner, J.; Bohg, J.; Morales, A.; Asfour, T.; Kragic, D.; et al. Deep learning approaches to grasp synthesis: A review. IEEE Trans. Robot. 2023, 39, 3994–4015.
[18] Wong, C.C.; Chien, M.Y.; Chen, R.J.; Aoyama, H.; Wong, K.Y. Moving object prediction and grasping system of robot manipulator. IEEE Access 2022, 10, 20159–20172.
[19] Santhakumar, K.; Kasaei, H. Lifelong 3D object recognition and grasp synthesis using dual memory recurrent self-organization networks. Neural Netw. 2022, 150, 167–180.
[20] Yin, Z.; Li, Y. Overview of robotic grasp detection from 2D to 3D. Cogn. Robot. 2022, 2, 73–82.

© Text is available under the terms and conditions of the Creative Commons Attribution (CC BY) license; additional terms may apply.

Subjects: Engineering, Industrial

Contributors: Ruohuai Sun, Chengdong Wu, Xue Zhao, Bin Zhao, Yang Jiang

Update Date: 04 Jan 2024
