Knowledge-Based Robot Manipulation and Knowledge-Graph Embedding: History

Autonomous indoor service robots engaged in everyday manipulation tasks are affected by multiple factors, such as scenes, objects, and actions. Properly parsing these factors and interpreting intentions according to human cognition and semantics is of self-evident importance.

  • robot manipulation
  • knowledge graph
  • representation learning
  • graph neural network

1. Introduction

Equipping service robots with a strong understanding of relevant skills and with task-planning capabilities is crucial, whether in daily life or in industrial-assembly scenarios. Classic task and motion planning (TAMP) [1] relies heavily, often entirely, on predefined planning domains, symbolic rules, and complex strategy searches. This leads to high costs and an inability to process tasks in a way that reflects human cognition and semantics. In general, all the limitations and constraints must be known before a task starts; otherwise, the robot may fail to transfer efficiently or to adapt to changing task scenarios.
In response to this problem, the past decade has witnessed rapid developments in the field of robot manipulation, especially in knowledge-based methods of representation and task planning. These paradigms enable robots to acquire task-related manipulation knowledge from human knowledge. However, knowledge is a higher-level form of organization than raw data, and its discrete, structured nature makes it difficult to directly describe continuous manipulation data. Consequently, most existing knowledge-based robot-manipulation-representation methods focus on static-object information and usually fail to decouple the different factors cleanly: the descriptions of tasks, actions, and skills are flat and entangled. Furthermore, only rule-based symbolic calculations are considered during querying and reasoning.
Efficient and reasonable representations of complex manipulation knowledge require the consideration of both continuous and discrete data, as well as static and dynamic factors. Real-time responses and continuous interactive updates are also necessary for knowledge systems to handle new tasks. Additionally, the reasonable modeling of manipulation processes and the extraction of semantic information are prerequisites for precise inferences and planning.

2. Knowledge-Based Robot Manipulation

The application of service robots to manipulation tasks is usually divided into three levels: the planning level, which covers task analysis and task planning; the strategy level, which covers the motion planning of action primitives; and the control level, which covers the execution process on the robot’s hardware. Knowledge-based methods mainly address the planning level, and fruitful results have been achieved with knowledge-graph-based robot-knowledge-representation systems. RoboEarth [2,3] was the first attempt to explore a knowledge-sharing system among robots; it mainly stored point clouds, CAD models, and object images used in robot manipulation, enabling robots to build the semantic maps required for daily tasks. On this basis, a robot can fully utilize cyberspace in its living space [4]. KnowRob [5] acquires object and environmental information from the web and stores it in a knowledge-processing system that can be queried using Prolog, providing the knowledge required for service robots to perform daily manipulation tasks. KnowRob2 [6] extends KnowRob with improved knowledge acquisition and reasoning. RoboBrain [7] focuses on data collection: the knowledge engine stores data of different modalities, including symbols, natural language, tactile sensing, robot trajectories, and visual features, and connects them to produce rich, heterogeneous graph representations. Perception and Manipulation Knowledge (PMK) [8] formalizes and implements a standardized ontology framework to extend robots’ abilities in manipulation tasks requiring task and motion planning (TAMP). Centering on scalability and responsiveness, TRPO [9] designs task-planning algorithms that start from three categories of ontology knowledge: task, environment, and robot. Furthermore, AKB-48 [10] constructs a large-scale knowledge graph of articulated objects, comprising 2037 3D articulated object models from 48 real-world categories. RoboKG [11] builds a knowledge base specifically for grasping-based manipulation tasks, helping the robot predict grasp-related factors such as which component of the object to grasp, which end effector to use, and how much force to apply.
These knowledge-representation systems each have a large-scale static-knowledge graph at their core, taking the static objects in robot manipulation as the main objects of description, while the description of manipulation behavior remains at the level of simply recording high-level tasks or low-level motion parameters. In addition, the architectures of these systems are either excessively flat and monolithic or chaotic and disorderly, resulting in high query complexity.

3. Knowledge-Graph Embedding

The knowledge graph originated in semantic networks [13] and is a typical form of structured knowledge representation. It represents facts as structures consisting of entities, relations, and semantic descriptions [14]. Entities can be real-world objects or abstract concepts; relations capture the connections between entities. The semantic descriptions of entities and their relations contain categories and properties with clear meanings. Mature knowledge-graph schemes include the language-knowledge base WordNet [15], the world-knowledge base Freebase [16], Wikidata [17], DBpedia [18], and ConceptNet [19,20].
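As a minimal illustration of this triple structure, the following Python sketch stores a few facts as (head, relation, tail) triples; the entities and relations are invented for illustration and are not taken from the knowledge bases cited above.

    # A knowledge graph as (head, relation, tail) triples.
    # All entries are hypothetical examples, not drawn from
    # WordNet, Freebase, or any other cited knowledge base.
    triples = [
        ("cup", "is_a", "container"),        # taxonomic fact
        ("cup", "located_in", "kitchen"),    # spatial fact
        ("grasp", "applied_to", "cup"),      # action-object fact
    ]

    # Index outgoing edges per entity for simple traversal.
    outgoing = {}
    for head, relation, tail in triples:
        outgoing.setdefault(head, []).append((relation, tail))

    print(outgoing["cup"])  # [('is_a', 'container'), ('located_in', 'kitchen')]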
The goal of knowledge-graph embedding is to use machine learning to represent the semantic knowledge of the research object as dense, low-dimensional, real-valued vectors; the resulting low-dimensional vectors form a distributed representation [21]. Knowledge-graph embedding focuses on the entities and relations in the knowledge base, in contrast to mapping approaches that consider spatial, temporal, and logical dimensions in the Internet of Things [22]. By mapping entities and relations into a low-dimensional vector space, their semantic information can be represented, and the complex semantic associations between them can be computed efficiently. This is critical for both knowledge updating and reasoning.
The current mainstream knowledge-graph-embedding methods are translation models. TransE [23] regards a relation as a translation operation from the head entity to the tail entity in a low-dimensional vector space, so a valid triple should satisfy h + r ≈ t. TransH [24] addresses TransE’s inability to handle 1–n, n–1, and n–n relations well by projecting entities onto a relation-specific hyperplane before translation. TransR [25] replaces the projection vector in TransH with a projection matrix, mapping entities into a relation-specific space. A minimal sketch of these scoring functions is given below.
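The following NumPy sketch illustrates the TransE and TransH scoring functions described above; the embeddings are randomly initialized stand-ins for learned parameters, and the dimension is an arbitrary choice.

    import numpy as np

    rng = np.random.default_rng(0)
    dim = 50  # embedding dimension (arbitrary illustrative choice)

    # Randomly initialized vectors stand in for learned embeddings.
    h = rng.normal(size=dim)  # head-entity embedding
    r = rng.normal(size=dim)  # relation embedding
    t = rng.normal(size=dim)  # tail-entity embedding

    def transe_score(h, r, t):
        # TransE: a true triple should satisfy h + r ≈ t, so the
        # score is the translation error ||h + r - t||.
        return np.linalg.norm(h + r - t)

    def transh_score(h, r, t, w):
        # TransH: project head and tail onto the relation-specific
        # hyperplane (unit normal w) before applying the translation,
        # which lets one entity play different roles per relation.
        w = w / np.linalg.norm(w)
        h_perp = h - np.dot(w, h) * w
        t_perp = t - np.dot(w, t) * w
        return np.linalg.norm(h_perp + r - t_perp)

    w = rng.normal(size=dim)  # hyperplane normal for one relation
    print(transe_score(h, r, t), transh_score(h, r, t, w))

TransR follows the same translation-error scoring but replaces the hyperplane projection with a relation-specific projection matrix applied to the head and tail embeddings.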
Graph neural networks mainly address learning in non-Euclidean spaces and are therefore naturally suited to the graph structure of knowledge graphs. Nevertheless, classical graph convolutional networks [26] assume homogeneous graphs, in which all the edges in the network share the same weight, and thus cannot adapt to the different types of relations in knowledge graphs. Therefore, heterogeneous graph-embedding methods [27] compute the weights of different relations separately in order to handle multi-relational graph data; a sketch of such relation-specific aggregation is given below. In our work, the knowledge base that describes robot manipulation tasks has a hierarchical structure that differs from those of other knowledge bases.
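As a minimal sketch of relation-specific aggregation (assuming an R-GCN-style layer with one weight matrix per relation type; all shapes and parameters are illustrative, randomly initialized stand-ins):

    import numpy as np

    rng = np.random.default_rng(0)
    n_entities, n_relations, dim = 4, 2, 8

    # Hypothetical multi-relational edges: (head, relation, tail) indices.
    triples = [(0, 0, 1), (1, 1, 2), (0, 1, 3)]

    X = rng.normal(size=(n_entities, dim))            # input node features
    W_rel = rng.normal(size=(n_relations, dim, dim))  # one weight matrix per relation type
    W_self = rng.normal(size=(dim, dim))              # self-loop weight matrix

    def hetero_gcn_layer(X, triples, W_rel, W_self):
        # Each node aggregates messages from its in-neighbors, with each
        # message transformed by the weight matrix of the connecting
        # relation, so different relation types contribute differently.
        out = X @ W_self
        counts = np.ones(len(X))  # per-node normalizer (self-loop included)
        for head, rel, tail in triples:
            out[tail] += X[head] @ W_rel[rel]
            counts[tail] += 1
        return np.maximum(out / counts[:, None], 0.0)  # ReLU activation

    H = hetero_gcn_layer(X, triples, W_rel, W_self)
    print(H.shape)  # (4, 8)

In contrast to a classical GCN, which would use a single shared weight matrix for every edge, the per-relation matrices let the layer distinguish the heterogeneous relation types found in knowledge graphs.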

This entry is adapted from the peer-reviewed paper 10.3390/e25040657
