Imitation Learning (IL) is an approach through which an expert teaches a robot to perform a task. An expert can be a human, a trained policy for a task, a synthetic agent, a demonstration from another robot or even a video demonstration of a desired task.
1. Introduction
The flexibility, deformability, and adaptability of soft robots offer significant potential to contribute to society by emulating the performance of biological systems in complex environments
[1]. However, these advantageous features also pose challenges for accurate modeling and dynamic control of such platforms, especially in complex, unstructured, and unpredictable environments
[2]. Although modeling soft robots is widely regarded as a challenging task, numerous researchers have proposed a variety of solutions
[3]. These solutions range from geometrical
[4][5], analytical
[6][7], numerical
[8][9], to learning-based approaches in both the kinematics and dynamics domains
[10][11].
In the kinematics domain, several control strategies have been proposed for task-space trajectory following
[12] including Jacobian estimation inspired by rigid robot controllers
[13], fuzzy model-based controllers
[14], Probabilistic Movement Primitives (ProMP)-based controllers
[15], and inverse kinematics-assisted control strategies
[16]. In the dynamics domain, control approaches such as dynamics predictive control
[17][18], open-loop
[19] or closed-loop
[20][21][22], learning-based control with a model
[23][24], or in a model-free setting
[10][25][26][27] have been proposed for a variety of tasks, including trajectory following, shape control, object/point tracking, and more.
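To make the Jacobian-estimation idea behind controllers such as those inspired by rigid-robot schemes concrete, the sketch below servos a toy two-input, two-output kinematic map toward a task-space target using only a Broyden-style rank-one Jacobian update. The plant, gains, and dimensions here are illustrative assumptions, not taken from any of the cited works:

```python
import numpy as np

def broyden_update(J, dx, dq, eps=1e-9):
    """Rank-one Broyden update of the estimated task-space Jacobian."""
    denom = float(dq @ dq)
    if denom < eps:  # skip negligible actuation steps
        return J
    return J + np.outer(dx - J @ dq, dq) / denom

def plant(q):
    """Stand-in for an unknown soft-robot kinematic map (2 inputs -> 2-D tip)."""
    return np.array([np.sin(q[0]) + 0.5 * q[1],
                     0.3 * q[0] * q[1] + q[1]])

# Model-free resolved-rate loop: no analytical model is used, only a
# Jacobian estimate refined online from observed input/output changes.
q = np.zeros(2)
x = plant(q)
J = np.eye(2)                        # crude initial Jacobian guess
target = np.array([0.4, 0.2])
for _ in range(200):
    dq = 0.1 * np.linalg.pinv(J) @ (target - x)   # damped step toward target
    x_new = plant(q + dq)
    J = broyden_update(J, x_new - x, dq)          # refine estimate online
    q, x = q + dq, x_new
```

The rank-one update requires no analytical model of the robot, which is what makes this family of controllers attractive for soft platforms whose kinematics are hard to derive in closed form.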
However, practical real-life applications of soft robotics are yet to be fully realized. Reflecting on the current literature, one may observe that control solutions depend heavily on the underlying modeling approach (analytical, numerical, or data-driven), and this dependency may be among the reasons the field has not yet reached that point. While the literature acknowledges that the constrained behavior captured by these modeling schemes limits the capabilities of the resulting controllers, little work has been devoted to either deriving a control solution directly on the physical robot or optimizing a derived control solution online on the soft robot. Doing so would also limit the sim-to-real gap, which is recognized as one of the main sources of error in soft robot control
[10].
2. Imitation Learning (IL) and Generalization Capability
The tasks successfully achieved with IL algorithms extend from laboratory environments
[28][29] to industrial applications
[30][31]. Algorithms that learn by exploring their workspace, such as Reinforcement Learning (RL), tend to require extensive resources (in terms of computational power and time) to produce a policy. The policies thus trained have been found to be robust in dynamic environments, as they are able to generalize over previously unseen states
On the other hand, the salient attributes of IL include its ability to train a policy faster and without a handcrafted reward function
[33]. Since IL does not involve exploration, the policy is restricted to the knowledge imparted by the expert, which causes the approach to fail in unfamiliar scenarios.
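At its core, the non-exploratory nature of IL reduces policy learning to supervised regression on expert state-action pairs. The toy sketch below, with a linear "expert" and a least-squares fit (both illustrative assumptions), makes that structure, and the distribution-shift weakness, concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

def expert_policy(s):
    """Stand-in expert: a fixed linear feedback law with hypothetical gains."""
    K = np.array([[1.0, 0.5], [-0.3, 0.8]])
    return K @ s

# 1) Collect demonstrations: states visited by the expert and its actions.
states = rng.normal(size=(500, 2))
actions = np.array([expert_policy(s) for s in states])

# 2) "Train" the policy by supervised regression. A neural network would
#    replace the least-squares fit in practice; the structure is the same.
W, *_ = np.linalg.lstsq(states, actions, rcond=None)

# 3) The clone matches the expert on demonstration-like states, but nothing
#    constrains its behaviour far outside the demonstrated region -- the
#    unfamiliar-scenario failure discussed above.
cloned_action = rng.normal(size=2) @ W
```

In this linear toy case the fit is exact on the demonstrated distribution, which is precisely why the failure only appears once the robot drifts away from the states the expert visited.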
Zhang et al.
[34] presented a novel approach to overcome this issue by introducing progressive learning, inspired by the way humans learn to perform a task from a few demonstrations. Their solution was tested on a 4-DoF experimental setup pouring a granular substance from one container to another, and generalized to backgrounds different from the one used in the expert demonstrations. Sasagawa et al.
[30] presented a bilateral control strategy based on position and force. The demonstration setting included a human kinesthetically demonstrating the task on the master side while the robot followed the demonstrations on the slave side for a task of serving food on a plate using a spoon. The generalization capability of the proposed solution was tested against varying sizes of the served objects, the serving spoon length, and the height of the plate.
Among the approaches proposed to overcome these failures of IL, the literature recommends combining IL with RL: preliminary results can be obtained with IL, and RL can then be used to intelligently explore the environment and generalize over unfamiliar states, as in Zhu et al.
[35], Perico et al.
[36], Sasaki et al.
[37] and Stadie et al.
[38]. In addition, IL complements RL in situations where RL cannot achieve pragmatic solutions, such as performing tasks in environments where the reward is either sparse or delayed
[38]. While this combination has successfully learned policies that generalize well over various scenarios, the solutions presented by Zhu et al. and Perico et al. require more than 1 million and 120,000 (a 10 min dataset at 200 Hz) time samples, respectively, for policy training (approximately 99.75% and 98% more data samples than Soft DAgger, respectively), which makes these solutions impractical for direct policy learning on soft robots.
On the other hand, the solution by Sasaki et al. can learn a policy from scratch using approximately 20,000 time samples, which is 87% more samples than Soft DAgger. Conversely, the solution presented by Stadie et al. demonstrated the ability to generalize a previously trained policy to a new environment using approximately 2500 to 3000 time samples, exhibiting sample efficiency similar to that of Soft DAgger. However, their solution was tested in simulation on fixed Degrees of Freedom (DoF) setups (three DoF or less), whereas soft robots exhibit extremely non-linear behavior due to their virtually infinite degrees of freedom; thus, while their approach may be practical for adapting an existing policy even on soft robots (with an increased number of iterations), it may still require an impractical number of samples to train the first policy.
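For context, the dataset-aggregation loop that DAgger-style methods (of which Soft DAgger is a variant) build on can be sketched as follows. The toy dynamics, linear policy, and stand-in expert below are illustrative assumptions, not the published algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)

def expert(s):
    """Stand-in queryable expert: a stabilising feedback law."""
    return -0.5 * s

def rollout(W, steps=50):
    """Run the current learner and record the states *it* visits."""
    s, visited = rng.normal(size=2), []
    for _ in range(steps):
        visited.append(s.copy())
        s = s + W @ s + 0.01 * rng.normal(size=2)   # toy dynamics
    return visited

# DAgger loop: (1) roll out the learner, (2) have the expert relabel the
# visited states, (3) aggregate everything into one growing dataset,
# (4) refit the policy on the aggregate.
X, Y = [], []
W = np.zeros((2, 2))                 # untrained initial linear policy
for _ in range(5):
    for s in rollout(W):
        X.append(s)
        Y.append(expert(s))          # expert labels the learner's own states
    W = np.linalg.lstsq(np.array(X), np.array(Y), rcond=None)[0].T
```

Because the expert relabels the states the learner itself visits, the aggregated dataset covers the learner's own state distribution, which mitigates the compounding-error problem of plain behavioral cloning.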
Most IL algorithms assume that the expert is a fixed, final supervisor. However, there may be scenarios in which the expert itself evolves over time, for example, a human who is still learning the task while conducting the demonstrations, or a controller that learns from a synthetic RL agent while also being responsible for providing actions
[40]. The training data may then contain conflicting samples for policy training, i.e., the same input may map to different actions depending on the state of the supervisor at that instant. Balakrishna et al.
[41] presented an approach to address this issue, which they claim outperforms deep RL baselines in continuous control tasks and drastically accelerates policy evaluation. Nevertheless, the issue of sample inefficiency persists due to the supervising RL agent. In addition to RL, the concept of meta-IL (a class of algorithms combining meta-learning with IL) has also been presented by Duan et al.
[42] and Finn et al.
[43], which enables a robot to learn more efficiently and generalize across multiple tasks in fewer iterations. It is acknowledged here that the solutions in
[42][43] exhibit remarkable sample efficiency for task generalization. However, these solutions have been evaluated on considerably smaller state-action spaces than those of soft robotic platforms. Nevertheless, this class of algorithms remains a promising direction for soft robot control.
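To make the meta-learning intuition concrete, the sketch below runs a first-order (Reptile-style) meta-update over a family of toy one-parameter regression tasks. It illustrates the "learn a good initialization, then adapt in a few gradient steps" idea in miniature; it is not the specific algorithms of [42][43]:

```python
import numpy as np

rng = np.random.default_rng(2)

def inner_adapt(w, a, steps=20, lr=0.1):
    """A few gradient steps on one task y = a*x, starting from meta-init w."""
    x = rng.normal(size=32)
    y = a * x
    for _ in range(steps):
        grad = 2.0 * np.mean((w * x - y) * x)   # d(MSE)/dw
        w = w - lr * grad
    return w

# Outer loop: nudge the meta-initialisation toward each task's adapted
# solution (the first-order Reptile update).
meta_w, meta_lr = 0.0, 0.5
for _ in range(100):
    a = rng.uniform(1.0, 3.0)                # sample a task from the family
    adapted = inner_adapt(meta_w, a)
    meta_w += meta_lr * (adapted - meta_w)
```

After meta-training, `inner_adapt` reaches a new task's solution in a handful of gradient steps from `meta_w`; this rapid-adaptation property is what the cited meta-IL works exploit at much larger scale.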