Imitation Learning (IL) is an approach through which an expert teaches a robot to perform a task. An expert can be a human, a trained policy for a task, a synthetic agent, a demonstration from another robot or even a video demonstration of a desired task.
1. Introduction
The flexibility, deformability, and adaptability of soft robots offer significant potential to contribute to society by emulating the performance of biological systems in complex environments
[1]. However, these advantageous features also pose challenges for accurate modeling and dynamic control of such platforms, especially in complex, unstructured, and unpredictable environments
[2]. Although modeling soft robots is widely regarded as a challenging task, numerous researchers have proposed a variety of solutions
[3]. These solutions range from geometrical
[4][5], analytical
[6][7], numerical
[8][9], to learning-based approaches in both the kinematics and dynamics domains
[10][11].
In the kinematics domain, several control strategies have been proposed for task-space trajectory following
[12] including Jacobian estimation inspired by rigid robot controllers
[13], fuzzy model-based controllers
[14], Probabilistic Movement Primitives (ProMP)-based controllers
[15], and inverse kinematics-assisted control strategies
[16]. In the dynamics domain, control approaches such as dynamics predictive control
[17][18], open-loop
[19] or closed-loop
[20][21][22], learning-based control with a model
[23][24], or in a model-free setting
[10][25][26][27] have been proposed for a variety of tasks, including trajectory following, shape control, object/point tracking, and more.
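To make the Jacobian-estimation idea behind controllers such as those inspired by rigid-robot schemes concrete, the sketch below servos a toy two-input, two-output kinematic map toward a task-space target using only a Broyden-style rank-one Jacobian update. The plant, gains, and dimensions here are illustrative assumptions, not taken from any of the cited works:

```python
import numpy as np

def broyden_update(J, dx, dq, eps=1e-9):
    """Rank-one Broyden update of the estimated task-space Jacobian."""
    denom = float(dq @ dq)
    if denom < eps:  # skip negligible actuation steps
        return J
    return J + np.outer(dx - J @ dq, dq) / denom

def plant(q):
    """Stand-in for an unknown soft-robot kinematic map (2 inputs -> 2-D tip)."""
    return np.array([np.sin(q[0]) + 0.5 * q[1],
                     0.3 * q[0] * q[1] + q[1]])

# Model-free resolved-rate loop: no analytical model is used, only a
# Jacobian estimate refined online from observed input/output changes.
q = np.zeros(2)
x = plant(q)
J = np.eye(2)                        # crude initial Jacobian guess
target = np.array([0.4, 0.2])
for _ in range(200):
    dq = 0.1 * np.linalg.pinv(J) @ (target - x)   # damped step toward target
    x_new = plant(q + dq)
    J = broyden_update(J, x_new - x, dq)          # refine estimate online
    q, x = q + dq, x_new
```

The rank-one update requires no analytical model of the robot, which is what makes this family of controllers attractive for soft platforms whose kinematics are hard to derive in closed form.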
However, practical real-life applications of soft robotics are yet to be fully realized. Reflecting on the current literature, one may observe that control solutions depend heavily on the underlying modeling approach (analytical, numerical, or data-driven), and this dependency may be among the reasons the field has not yet reached that point. While the literature acknowledges that the constrained behavior captured by these modeling schemes limits the capabilities of the resulting controllers, little work has been devoted to either deriving a control solution directly on the physical robot or optimizing a derived control solution online on the soft robot. Doing so would also limit the sim-to-real gap, which is recognized as one of the main sources of error in soft robot control
[10].
2. Imitation Learning (IL) and Generalization Capability
The tasks successfully achieved with IL algorithms extend from laboratory environments
[28][29] to industrial applications
[30][31]. Algorithms that learn by exploring their workspace, such as Reinforcement Learning (RL), tend to require extensive resources (in terms of computational power and time) to produce a policy. The policies thus trained have been found to be robust in dynamic environments, as they are able to generalize over previously unseen states
On the other hand, the salient attributes of IL include its ability to train a policy faster and without a handcrafted reward function
[33]. Since IL does not involve exploration, the policy is restricted to the knowledge imparted by the expert, which causes the approach to fail in unfamiliar scenarios.
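At its core, the non-exploratory nature of IL reduces policy learning to supervised regression on expert state-action pairs. The toy sketch below, with a linear "expert" and a least-squares fit (both illustrative assumptions), makes that structure, and the distribution-shift weakness, concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

def expert_policy(s):
    """Stand-in expert: a fixed linear feedback law with hypothetical gains."""
    K = np.array([[1.0, 0.5], [-0.3, 0.8]])
    return K @ s

# 1) Collect demonstrations: states visited by the expert and its actions.
states = rng.normal(size=(500, 2))
actions = np.array([expert_policy(s) for s in states])

# 2) "Train" the policy by supervised regression. A neural network would
#    replace the least-squares fit in practice; the structure is the same.
W, *_ = np.linalg.lstsq(states, actions, rcond=None)

# 3) The clone matches the expert on demonstration-like states, but nothing
#    constrains its behaviour far outside the demonstrated region -- the
#    unfamiliar-scenario failure discussed above.
cloned_action = rng.normal(size=2) @ W
```

In this linear toy case the fit is exact on the demonstrated distribution, which is precisely why the failure only appears once the robot drifts away from the states the expert visited.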
Zhang et al.
[34] presented a novel approach to overcome this issue by introducing progressive learning, inspired by the way humans learn to perform a task from a few demonstrations. Their solution was tested on a 4-DoF experimental setup pouring a granular substance from one container to another, and generalized to backgrounds different from the one used in the expert demonstrations. Sasagawa et al.
[30] presented a bilateral control strategy based on position and force. The demonstration setting included a human kinesthetically demonstrating the task on the master side while the robot followed the demonstrations on the slave side for a task of serving food on a plate using a spoon. The generalization capability of the proposed solution was tested against varying sizes of the served objects, the serving spoon length, and the height of the plate.
Among the approaches proposed to overcome these failures of IL, the literature recommends combining IL with RL: preliminary results can be obtained with IL, and RL can then be used to intelligently explore the environment and generalize over unfamiliar states, as in Zhu et al.
[35], Perico et al.
[36], Sasaki et al.
[37] and Stadie et al.
[38]. In addition, IL complements RL in situations where RL cannot achieve pragmatic solutions, such as performing tasks in environments where the reward is either sparse or delayed
[38]. While this combination has successfully learned policies that generalize well over various scenarios, the solutions presented by Zhu et al. and Perico et al. require more than 1 million and 120,000 (a 10 min dataset at 200 Hz) time samples, respectively, for policy training (approximately 99.75% and 98% more data samples than Soft DAgger, respectively), which makes these solutions impractical for direct policy learning on soft robots.
On the other hand, the solution by Sasaki et al. can learn a policy from scratch using approximately 20,000 time samples, which is 87% more samples than Soft DAgger. Conversely, the solution presented by Stadie et al. demonstrated the ability to generalize a previously trained policy to a new environment using approximately 2500 to 3000 time samples, exhibiting sample efficiency similar to that of Soft DAgger. However, their solution was tested in simulation on fixed Degrees of Freedom (DoF) setups (three DoF or less), whereas soft robots exhibit extremely non-linear behavior due to their virtually infinite degrees of freedom; thus, while their approach may be practical for adapting an existing policy even on soft robots (with an increased number of iterations), it may still require an impractical number of samples to train the first policy.
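For context, the dataset-aggregation loop that DAgger-style methods (of which Soft DAgger is a variant) build on can be sketched as follows. The toy dynamics, linear policy, and stand-in expert below are illustrative assumptions, not the published algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)

def expert(s):
    """Stand-in queryable expert: a stabilising feedback law."""
    return -0.5 * s

def rollout(W, steps=50):
    """Run the current learner and record the states *it* visits."""
    s, visited = rng.normal(size=2), []
    for _ in range(steps):
        visited.append(s.copy())
        s = s + W @ s + 0.01 * rng.normal(size=2)   # toy dynamics
    return visited

# DAgger loop: (1) roll out the learner, (2) have the expert relabel the
# visited states, (3) aggregate everything into one growing dataset,
# (4) refit the policy on the aggregate.
X, Y = [], []
W = np.zeros((2, 2))                 # untrained initial linear policy
for _ in range(5):
    for s in rollout(W):
        X.append(s)
        Y.append(expert(s))          # expert labels the learner's own states
    W = np.linalg.lstsq(np.array(X), np.array(Y), rcond=None)[0].T
```

Because the expert relabels the states the learner itself visits, the aggregated dataset covers the learner's own state distribution, which mitigates the compounding-error problem of plain behavioral cloning.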
Most IL algorithms assume that the expert is a fixed, final supervisor. However, there may be scenarios in which the expert itself evolves over time, for example, a human who is still learning the task while conducting the demonstrations, or a controller that learns from a synthetic RL agent while also being responsible for providing actions
[40]. The training data may then contain conflicting samples for policy training, i.e., the same input may map to different actions depending on the state of the supervisor at that instant. Balakrishna et al.
[41] presented an approach to address this issue, which they claim outperforms deep RL baselines in continuous control tasks and drastically accelerates policy evaluation. Nevertheless, the issue of sample inefficiency persists due to the supervising RL agent. In addition to RL, the concept of meta-IL (a class of algorithms combining meta-learning with IL) has also been presented by Duan et al.
[42] and Finn et al.
[43], which enables a robot to learn more efficiently and generalize across multiple tasks in fewer iterations. It is acknowledged here that the solutions in
[42][43] exhibit remarkable sample efficiency for task generalization. However, these solutions have been evaluated on considerably smaller state-action spaces than those of soft robotic platforms. Nevertheless, this class of algorithms remains a promising direction for soft robot control.
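To make the meta-learning intuition concrete, the sketch below runs a first-order (Reptile-style) meta-update over a family of toy one-parameter regression tasks. It illustrates the "learn a good initialization, then adapt in a few gradient steps" idea in miniature; it is not the specific algorithms of [42][43]:

```python
import numpy as np

rng = np.random.default_rng(2)

def inner_adapt(w, a, steps=20, lr=0.1):
    """A few gradient steps on one task y = a*x, starting from meta-init w."""
    x = rng.normal(size=32)
    y = a * x
    for _ in range(steps):
        grad = 2.0 * np.mean((w * x - y) * x)   # d(MSE)/dw
        w = w - lr * grad
    return w

# Outer loop: nudge the meta-initialisation toward each task's adapted
# solution (the first-order Reptile update).
meta_w, meta_lr = 0.0, 0.5
for _ in range(100):
    a = rng.uniform(1.0, 3.0)                # sample a task from the family
    adapted = inner_adapt(meta_w, a)
    meta_w += meta_lr * (adapted - meta_w)
```

After meta-training, `inner_adapt` reaches a new task's solution in a handful of gradient steps from `meta_w`; this rapid-adaptation property is what the cited meta-IL works exploit at much larger scale.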