Vision-Based Robotic Applications: Comparison
Please note this is a comparison between Version 1 by Md Tanzil Shahria and Version 2 by Sirius Huang.

As an emerging technology, robotic manipulation has advanced tremendously alongside technological developments ranging from sensing to artificial intelligence. Over the decades, robotic manipulation has gained versatility and flexibility on mobile robot platforms, and robots are now capable of interacting with the world around them. To interact with the real world, robots require various sensory inputs from their surroundings, and the use of vision is growing rapidly, as vision is unquestionably a rich source of information for a robotic system.

  • computer vision
  • robot manipulation
  • sensors
  • vision-based control

1. Introduction

Robotic manipulation refers to the manner in which robots directly and indirectly interact with surrounding objects. Such interaction includes picking and grasping objects [1,2,3], moving objects from place to place [4,5], folding laundry [6], packing boxes [7], operating as per user requirements, etc. Object manipulation is considered a pivotal capability in robotics. Over time, robot manipulation has undergone considerable changes that have driven technological development in both industry and academia.
Manual robot manipulation was one of the initial steps of automation [8,9]. A manual robot refers to a manipulation system that requires continuous human involvement to operate [10]. In the beginning, researchers explored spatial algebra [11], forward kinematics [12,13,14], differential kinematics [15,16,17], inverse kinematics [18,19,20,21,22], etc. for pick-and-place tasks, which is not the only application of robotic manipulation systems but a stepping-stone to a wide range of possibilities [23]. The capability of gripping, holding, and manipulating objects requires dexterity, perception of touch, and coordination between eyes and muscles; mimicking all these attributes is a complex and tedious task [24]. Thus, researchers have explored a wide range of algorithms to design more efficient and appropriate models for this task. Over time, manual manipulators became more advanced and acquired individual control systems tailored to their specifications and applications [25,26].
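The forward and inverse kinematics mentioned above can be illustrated with a planar two-link arm. The following is a generic textbook sketch, not a model from any cited work; the link lengths and joint angles are arbitrary illustrative values:

```python
import math

# Forward kinematics of a planar 2-link arm: joint angles -> end-effector (x, y).
# Link lengths l1, l2 are illustrative values, not taken from the text.
def forward_kinematics(q1, q2, l1=1.0, l2=0.8):
    x = l1 * math.cos(q1) + l2 * math.cos(q1 + q2)
    y = l1 * math.sin(q1) + l2 * math.sin(q1 + q2)
    return x, y

# Analytic inverse kinematics (elbow-down solution) for the same arm.
def inverse_kinematics(x, y, l1=1.0, l2=0.8):
    d2 = x * x + y * y
    # Law of cosines gives the elbow angle.
    c2 = (d2 - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    c2 = max(-1.0, min(1.0, c2))  # clamp for numerical safety
    q2 = math.acos(c2)
    q1 = math.atan2(y, x) - math.atan2(l2 * math.sin(q2), l1 + l2 * math.cos(q2))
    return q1, q2

# Round-trip check: IK of an FK pose recovers the same end-effector position.
x, y = forward_kinematics(0.4, 0.7)
q1, q2 = inverse_kinematics(x, y)
x2, y2 = forward_kinematics(q1, q2)
print(abs(x - x2) < 1e-9 and abs(y - y2) < 1e-9)  # True
```

Differential kinematics would extend this sketch with the arm's Jacobian, relating joint velocities to end-effector velocities.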
Beyond their individual use, robotic manipulation systems now have a wide range of industrial applications, as they can be applied to complex and diverse tasks [27]. Hence, conventional manipulative devices have become less suitable [28]. New technologies, such as wireless communication and augmented reality [29], are being adopted in manipulation systems to uncover the most suitable and user-friendly human–robot collaboration model for specific tasks [30]. To make the process more efficient and productive and to obtain successful execution, researchers have introduced automation in this field [31].
On the path to fully automated systems, researchers first introduced automation in motion planning [3,32], which eventually contributed to the automated robotic manipulation system. Automated and semi-automated manipulation systems not only boost the performance of industrial robots but also contribute to other fields of robotics, such as mobile robots [33], assistive robots [34], swarm robots [35], etc. In designing automated systems, the use of vision is increasing rapidly, as vision is undoubtedly a rich source of information [36,37,38]. By properly utilizing vision-based data, a robot can identify, map, localize, and measure objects and respond accordingly to complete its tasks [39,40,41,42]. Various studies confirm that vision-based approaches are well suited to different fields of robotics, such as swarm robotics [35], fruit-picking robots [1], robotic grasping [43], mobile robots [33,44,45], aerial robotics [46], surgical robots [47], etc. To process vision-based data, researchers have introduced various approaches. Learning-based approaches are at the center of these autonomous systems because the real world presents too many variations to model explicitly, and learning algorithms help the robot gain knowledge from its experience with the environment [48,49,50]. Among the different learning methods, neural network-based models [51,52,53,54], deep learning-based models [49,50,54,55,56], and transfer learning models [57,58,59,60] are most often exercised by experts in manipulation systems, while filter-based approaches are also popular among researchers [61,62,63].
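A minimal sketch of how vision-based data can drive manipulation is to locate an object's centroid in a binary segmentation mask and map that pixel to robot workspace coordinates. The mask values, pixel pitch, and workspace origin below are illustrative assumptions, and the calibrated top-down camera is a strong simplification of a real perception pipeline:

```python
def mask_centroid(mask):
    """Centroid (row, col) of the nonzero pixels in a 2D binary mask."""
    rows = cols = count = 0
    for r, row in enumerate(mask):
        for c, v in enumerate(row):
            if v:
                rows += r
                cols += c
                count += 1
    if count == 0:
        return None  # no object detected
    return rows / count, cols / count

def pixel_to_workspace(pixel, metres_per_pixel=0.002, origin=(0.1, -0.05)):
    """Map a (row, col) pixel to (x, y) in the robot workspace.

    Assumes a calibrated top-down camera; the scale and origin are
    hypothetical calibration values for illustration only.
    """
    r, c = pixel
    return origin[0] + c * metres_per_pixel, origin[1] + r * metres_per_pixel

# Toy 5x5 mask with a 2x2 "object" in the lower-right corner.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
centroid = mask_centroid(mask)  # (3.5, 3.5)
print(pixel_to_workspace(centroid))
```

In practice the mask would come from a learned segmentation or detection model, and the pixel-to-workspace mapping from full camera calibration rather than a fixed scale.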

2. Current State

A common structural assumption for manipulative tasks is that the robot is trying to manipulate an object or a set of objects in its environment. Because of this, generalization via objects, both across different objects and between similar (or identical) objects in different task instances, is an important aspect of learning to manipulate. Commonly used object-centric manipulation skills and task model representations are often sufficient to generalize across tasks and objects, but they must still adapt to differences in shape, properties, and appearance. A wide range of robotic manipulation problems can be solved using vision-based approaches, as vision serves as a richer sensory source for the system. For this reason, and because of the availability of fast processing power, vision-based approaches have become very popular among researchers working on robotic manipulation problems. A chronological overview of the researchers' contributions, organized by the problems addressed and their outcomes, is compiled in Table 1.
Table 1.
Chronological progression of the vision-based approach.
Figure 2 represents the basic categorization of the problems addressed by the researchers. The problems are primarily divided into two categories: control-based problems and application-based problems. Each of these is further divided into several sub-categories. For control-based problems such as human demonstration-based control [78,93,97], vision (raw image)-based control [74,82,83,85,86], multi-agent system control [84,92,100,105], etc., researchers have succeeded in solving them by adopting vision-based approaches. The addressed control-based problems include designing a vision-based real-time mobile robot controller [74], multi-task learning from demonstration [78,102,106], nonlinear approximation in the control and monitoring of mobile robots [82], control of cable-driven robots [83], leader–follower formation control [84], motion control for a free-floating robot [85], control of soft robots [86], controllers for decentralized robot swarms [92], robot manipulation via human demonstrations [93], and imitation learning for robotic manipulation [97].
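The vision-based control schemes listed above often reduce, at their simplest, to driving an image-plane error to zero, as in image-based visual servoing. The following is a deliberately simplified sketch: the proportional gain, step count, and the assumption that the feature moves directly with the command are illustrative, not taken from any cited controller:

```python
def visual_servo(feature, target, gain=0.5, steps=50):
    """Drive a 2D image feature toward its desired image-plane position
    with a proportional control law (a toy stand-in for the interaction-
    matrix-based update used in real visual servoing)."""
    x, y = feature
    tx, ty = target
    for _ in range(steps):
        # Proportional control: the commanded motion opposes the feature error.
        x += gain * (tx - x)
        y += gain * (ty - y)
    return x, y

# Feature observed at pixel (120, 80); desired position is the image centre (64, 64).
x, y = visual_servo((120.0, 80.0), (64.0, 64.0))
print(abs(x - 64.0) < 1e-3 and abs(y - 64.0) < 1e-3)  # True
```

A real controller would map the image-plane error through the camera's interaction matrix and the robot Jacobian to obtain joint velocities; the fixed-point convergence shown here captures only the feedback structure.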
Figure 2.
Categorization of problems addressed by the researchers.
Similarly, in solving application-based problems such as object recognition and manipulation [67,69,70,71,77,79,80,91,101], robot navigation [68,72,73,75,99,104,108,109], robotic grasping [76,90,107], human–robot interaction [96], etc., researchers successfully applied vision-based approaches and obtained very promising results. The addressed application-based problems include the manipulation of deformable objects such as ropes [67], a vision-based tracking system for aerial vehicles [68], object detection without graphics processing unit (GPU) support for robotic applications [69], detecting and following a human user with a robotic blimp [70], object detection and recognition for autonomous assistive robots [71], path-finding for a humanoid robot [72] or robotic arms [108], navigation of an unmanned surface vehicle [73], vision-based target detection for the safe landing of a UAV on both fixed [75] and moving [81] platforms, vision-based grasping for robots [76], vision-based dynamic manipulation [77], a vision-based object-sorting robot manipulator [79], learning complex robotic skills from raw sensory inputs [80], grasping under occlusion for manipulating a robotic system [90], recognition and manipulation of objects [91], human–robot handover applications [96], targeted drug delivery in biological research [103], uncertainty in DNN-based robotic grasping [107], and object tracking via a robotic arm in a real-time 3D environment [109].

3. Applications

Vision-based autonomous robot manipulation for various applications has received a great deal of attention in the recent decade. Vision-based manipulation occurs when a robot manipulates an item using computer vision, with feedback from one or more camera sensors. Advances in computer vision and artificial intelligence have enabled fully autonomous robots to perform increasingly complex jobs. A great deal of research is ongoing in computer vision, and it may provide more natural, non-contact solutions in the future. Human intelligence is still required for robot decision-making and control in situations where the environment is largely unstructured, the objects are unfamiliar, and the motions are unknown. A human–robot interface is a fundamental component of teleoperation solutions because it serves as the link between human intellect and the actual motions of the remote robot. Current vision-based tracking approaches to robot manipulator teleoperation allow tasks to be communicated to the manipulator in a natural way, often using the same hand gestures that would ordinarily be used for the task. Direct position control of the robot end-effector in vision-based robot manipulation allows for greater precision. Well-known applications of vision-based robot manipulation include the manipulation of deformable objects, autonomous vision-based tracking systems, tracking moving objects of interest, vision-based real-time robot control, vision-based target detection and object recognition, leader–follower formation control of multi-agent systems using a vision-based tracking scheme, and vision-based grasping of target objects for manipulation.
Researchers have classified vision-based works into six application categories: object manipulation, vision-based tracking, object detection, pathfinding/navigation, real-time remote control, and robotic arm/grasping. Recent vision-based applications are summarized in Table 2.
Table 2.
Application of vision-based works.