Reinforcement learning control of an underground loader was investigated in a simulated environment using a multi-agent deep neural network approach. At the start of each loading cycle, one agent selects the dig position from a depth camera image of the fragmented rock pile. A second agent performs continuous control of the vehicle, aiming to fill the bucket at the selected loading point while avoiding collisions, getting stuck, or losing ground traction; it relies on motion and force sensors as well as a camera and lidar. Using the soft actor-critic algorithm, the agents learn policies for efficient bucket filling over many consecutive loading cycles, with a clear ability to adapt to the changing environment. The best results (on average, 75% of maximum capacity) were obtained when a penalty for energy usage was included in the reward.
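The two-agent split described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the heuristic dig-position rule, the placeholder control policy, and all function names (`select_dig_position`, `control_step`) are hypothetical stand-ins for the learned neural network policies.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_dig_position(depth_image):
    """Dig-position agent (hypothetical stand-in): pick the image column
    where the pile is deepest as the loading point. In the paper this
    decision is made by a learned policy over the depth camera image."""
    column_depth = depth_image.mean(axis=0)  # average depth per column
    return int(np.argmax(column_depth))

def control_step(obs):
    """Continuous-control agent (hypothetical stand-in): map motion/force,
    camera and lidar features to bounded actuation commands in [-1, 1],
    e.g. throttle, steering, and bucket lift/tilt."""
    return np.tanh(obs[:3])  # placeholder for the learned SAC policy

# One loading cycle: choose a dig point, then run closed-loop control.
depth_image = rng.random((64, 64))            # stand-in depth camera frame
dig_col = select_dig_position(depth_image)
obs = np.concatenate(([dig_col / 63.0], rng.random(5)))
action = control_step(obs)
```

In the paper both policies are trained with soft actor-critic; the sketch only shows how the discrete dig-point decision hands off to the continuous controller within a cycle.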
The authors found that it is possible to train a DRL controller that uses high-dimensional sensor data from different domains, without any pre-processing, as input to neural network policies, and thereby solve the complex control task of repeated mucking with high performance. Although the bucket is seldom filled to its maximum, it is consistently filled to a substantial level. Human operators commonly make more than one attempt to fill the bucket when the pile's shape is unfavourable, a tactic that was not considered here. Including an energy consumption penalty in the reward function distinctly reduces the agent's overall energy usage: the agent learns to avoid states that inevitably lead to high energy consumption and to the risk of failing to complete the task.
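The energy-penalty idea can be written as a simple shaped reward. This is a hedged sketch, not the paper's exact reward function: the linear form, the `energy_weight` coefficient, and the function name are assumptions chosen for illustration.

```python
def shaped_reward(bucket_fill, energy_used, energy_weight=0.01):
    """Hypothetical shaped reward: bucket fill fraction (0..1) minus a
    weighted energy penalty. A penalty of this kind is what lets the agent
    trade a little fill performance for much lower energy consumption."""
    return bucket_fill - energy_weight * energy_used

# At equal fill, the higher-energy trajectory scores strictly lower,
# steering the policy away from wasteful, high-traction-loss behaviour.
r_low  = shaped_reward(bucket_fill=0.75, energy_used=10.0)
r_high = shaped_reward(bucket_fill=0.75, energy_used=40.0)
```

Tuning `energy_weight` controls the trade-off: too large and the agent under-fills the bucket to save energy, too small and the penalty has no effect on behaviour.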
The full paper is available here.