SeAttE—Embedding Model Based on Knowledge Graph Completion

SeAttE—Embedding Model Based on Knowledge Graph Completion: Comparison

Please note this is a comparison between Version 2 by Lindsay Dong and Version 6 by Lindsay Dong.

SeAttE is a novel tensor ecomposition model based on Separating Attribute space for knowledge graph completion. SeAttE is the first model among the tensor decomposition family to consider the attribute space separation task. Furthermore, SeAttE transforms the learning of too many parameters for the attribute space separation task into the structure’s design. This operation allows the model to focus on learning the semantic equivalence between relations, causing the performance to approach the theoretical limit.

NLP
knowledge graphs
knowledge representation
link prediction
attribute space

1. Introduction

Knowledge Graphs (KGs) are collections of large-scale triples, such as Freebase ^[1], YAGO ^[2] and DBpedia ^[3]. KGs play a crucial role in applications such as question answering services, search engines, and smart medical care. Although there are billions of triples in KGs, they are still incomplete. These incomplete knowledge bases will bring limitations to practical applications ^[4].

Researchers recently tried to solve the task of link prediction through knowledge graph embedding. Knowledge graph embedding models map entities and relations into low-dimensional vectors (or matrices, tensors), measure the rationality of triples through specific score functions between entities and relations, and rank the triples with scores. TransE ^[1] first proposes to utilize relation vectors as the geometric distance between entities. Then many variants emerge.

The tensor decomposition models ^{[5][6][7][8][9][10][11]} are a family of which the inference performance is relatively good among these variants. RESCAL ^[5] is the basic tensor decomposition model, which is the first tensor decomposition model. Since RESCAL ^[5] represents the relations as a matrix, the large number of parameters makes it difficult for the model to learn effectively. So DisMult ^[6] directly diagonalizes the matrix, which takes the relations as vectors. This operation significantly reduces the number of parameters. There are a large number of complex relation types in the knowledge graphs. However, DisMult is an over-simplified model, which cannot describe complex relations. Then subsequent variants are invented to describe more types of relations, such as asymmetric and hierarchical relations, which are equivalent to designing unique structures for description of specific types of relations. For example, ComplEx ^[7], similarly to DistMult ^[6], forces each relation embedding to be a diagonal matrix but extends such formulation in the complex space. Analogy ^[12] aims at modeling analogical reasoning, which is crucial for any knowledge induction. It employs the general bilinear scoring function but adds two main constraints inspired by analogical structures. TuckER ^[8] relies on the Tucker decomposition ^[13], which factorizes a tensor into a set of vectors and a smaller shared core. SimplE ^[9] forces relation embeddings to be diagonal matrices, similarly to DistMult ^[6], but extends it by associating two separate embeddings with each entity and associating two separate diagonal matrices with each relation. These models mainly explore particular regularization to improve performance. No matter how sophisticated the design of such tensor decomposition models is, they find it difficult to surpass the basic tensor decomposition model theoretically. In addition, the previous tensor decomposition models do not consider the problem of attribute separation. The unnoticed task of attribute separation in the traditional models is just handed over to the training. However, the amount of parameters for this task is tremendous, and the model is prone to overfitting.

In practice, entities are collections of attributes, and different entities can contain various semantic attributes. Comparing triples with different relations should only select specific attributes for comparison. Figure 1 shows the comparison of boxes with the same shape and different colors. A novel model—a tensor decomposition model based on separating attribute space for knowledge graph completion (SeAttE) was proposed. SeAttE transfers the large-parameter learning for the attribute space separation task in traditional tensor decomposition models to the model structure design.

/media/item_content/202204/625f597f92b04electronics-11-01058-g001.png

Figure 1.

Comparison of boxes with the same shape and different colors.

2. Related Work and Critical Differences

Tensor Decomposition Models. These models implicitly consider triples as tensor decomposition. DistMult ^[6] constrains all relation embeddings to be diagonal matrices, which reduces the space of parameters to access a more accessible model to train. RESCAL ^[5] represents each relationship with a total rank matrix. ComplEx ^[7] extends the KG embeddings to the complex space to better model asymmetric and inverse relations. Analogy ^[12] employs the general bilinear scoring function but adds two main constraints inspired by analogical structures. Based on the Tucker decomposition, TuckER ^[8] factorizes a tensor into a set of vectors and a smaller shared core matrix. SimplE ^[9] is a simple enhancement of CP to allow the two embeddings of each entity to be learned dependently. HolE ^[11] is a multiplicative model that is isomorphic to ComplEx ^[7]. Inspired by the recent success of automated machine learning (AutoML), AutoSF ^[10] proposes to automatically design scoring functions for distinct KGs by the AutoML techniques. QuatDE ^[14] captures the variety of relational patterns and separates different semantic information of the entity, using transition vectors to adjust the point position of the entity embedding vectors in the quaternion space via Hamilton product, enhancing the feature interaction capability between elements of the triplet. DensE ^[15] develops a novel knowledge graph embedding method to provide an improved modeling scheme for the complex composition patterns of relations. Geometric Models. Geometric Models interpret relations as geometric transformations in the latent space. TransE ^[1] is the first translation-based method, which treats relations as translation operations from the head entities to the tail entities. Along with TransE ^[1], multiple variants, including TransH ^[16], TransR ^[17] and TransD ^[18], are proposed to improve the embedding performance of KGs. Recently, RotatE ^[19] defines each relation as a rotation from head entities to tail entities. Inspired by the fact that concentric circles in the polar coordinate system can naturally reflect the hierarchy, HAKE ^[20] maps entities into the polar coordinate system. HAKE ^[20] can effectively model the semantic hierarchies in knowledge graphs. OTE ^[21] proposes a distance-based knowledge graph embedding. First, OTE extends the modeling of RotatE from 2D complex domain to high dimensional space with orthogonal relation transforms. Second, graph context is proposed to integrate graph structure information into the distance scoring function to measure the plausibility of the triples during training and inference. Deep Learning Models. Deep Learning Models use deep neural networks to perform knowledge graph completion. ConvE ^[22] and ConvKB ^[23] employ convolutional neural networks to define score functions. CapsE ^[24] embeds entities and relations into one-dimensional vectors under the basic assumption that different embeddings encode homologous aspects in the same positions. CompGCN ^[25] utilizes graph convolutional networks to update the knowledge graph embedding. Neural Tensor Network (NTN) combines E-MLP with several bilinear parts. Nathani ^[26] proposes a novel attention-based feature embedding that captures both entity and relation features in any given entity’s neighborhood. RLH ^[27] is inspired by the hierarchical structure through which a human being handles cognitionally ambiguous cases. The whole reasoning process is decomposed into a hierarchy of two-level Reinforcement Learning policies for encoding historical information and learning structured action space. R2D2 ^[28] is a novel method for automatic reasoning on knowledge graphs based on debate dynamics. R2D2 is to frame the task of triple classification as a debate game between two reinforcement learning agents which extract arguments-paths in the knowledge graph—with the goal to promote the fact being true (thesis) or the fact being false (antithesis), respectively. RNNLogic ^[29] is a probabilistic model. RNNLogic treats logic rules as a latent variable, and simultaneously trains a rule generator as well as a reasoning predictor with logic rules. MADLINK ^[30] introduces an attentive encoder–decoder-based link prediction approach considering both structural information of the KG and the textual entity descriptions. There are also other models, such as DURA ^[31], which are proposed to solve overfitting. RuleGuider ^[32] leverages high-quality rules generated by symbolic-based methods to provide reward supervision for walk-based agents. SFBR ^[33] provides a relation-based semantic filter to extract the attributes that need to be compared and suppress the irrelevant attributes of entities. Together, most of the above studies intend to find a more robust representing approach. Measuring the effectiveness of certain triples is to compare the matching degree of specific attributes based on relations. Only a few models, such as TransH ^[16], TransR ^[17], and TransD ^[18], consider that entities in different triples should have different representation. However, these variants require many resources occupations and are limited to particular models.

3. Background

3.1. KG Completion and Notations

KGs are collections of factual triples $K = \{(h, r, t (), h, t \in E, r \in R \{\})\}$ , where $(h, r, t)$ represents a triple in the knowledge graph, $h, t, r$ are head, tail entities and relations, respectively. ScholarsWe associates the entities $h, t$ and relations r with vectors $h, t, r \in R^{d}$ in knowledge graph embedding. Then scholarswe design an appropriate scoring function $d_{r} (h, t)$ , to map the embedding of the triple to a certain score. For a particular question $(h, r, ?)$ , the task of KG completion is ranking all possible answers and obtain the preference of prediction.

Using $W_{r} \in R^{d \times d}$ and $r \in R^{d}$ to distinguish matrix representation and vector representation of the relations, respectively. T, ⟨⋅⟩ and ∘ denote the operation of transpose, the generalized dot product and the Hadamard product, respectively. Especially, scholarswe utilize $r_{S e A t t E}$ to represent the matrix of relation in SeAttE. Let ∥∥, $diag$ () and $Re$ () denote the $L_{2}$ norm, matrix diagonalization and the real part of complex vectors.

3.2. Basic Models

Tensor Factorization Models. Models in this family interpret link prediction as a task of tensor decomposition, where triples are decomposed into a combination (e.g., a multi-linear product) of low-dimensional vectors for entities and relations. CP ^[34] represents triples with canonical decomposition. Note that the same entity has different representations at the head and tail of the triplet. The score function can be expressed as:

$\begin{matrix} d_{r} (h, t) = ∥h^{T} rt∥ \end{matrix}$

where $h, r, t \in R^{k}$ .

RESCAL ^[5] represents a relation as a matrix $W_{r} \in R^{d \times d}$ that describes the interactions between latent representations of entities. The score function is defined as:

$\begin{matrix} d_{r} (h, t) = ∥h^{T} W_{r} t∥ \end{matrix}$

DistMult ^[6] forces all relations to be diagonal matrices, which consistently reduces the space of parameters to be learned, resulting in a much easier model to train. On the other hand, this makes the scoring function commutative, which amounts to treating all relations as symmetric.

$\begin{matrix} d_{r} (h, t) = ∥h^{T} W_{r} t∥ \end{matrix}$

where $W_{r} = diag (w_{1}, w_{2}, \dots, w_{n})$ .

ComplEx ^[7] extends the real space to complex spaces and constrains the embeddings for relation to be a diagonal matrix. The bilinear product becomes a Hermitian product in complex spaces. The score function can be expressed as:

$\begin{matrix} d_{r} (h, t) = Re (h^{T} diag (r) t) \end{matrix}$

where $h, r, t \in C^{k}$ .

4. SeAttE Model

4.1. Motivation and Design of SeAttE

4.1.1. Motivation

As shown in Figure 2, RESCAL is the basic tensor decomposition model. Since RESCAL represents the relations as a matrix, the large number of parameters makes it difficult for the model to learn effectively. So DisMult directly diagonalizes the matrix, significantly reducing the number of parameters. However, over-simplified models limit the performance. Subsequently, variants are invented for describing specific types of relations, such as asymmetric and hierarchical relations, which are equivalent to designing unique structures for describing specific types of relationships. Such models need to look for special functions to precisely fit different relations categories. Some relations can be well characterized in models, while some are not. This design from a specific relationship type is challenging to cover all relations. No matter how sophisticated the design of such models is, it is difficult to surpass the RESCAL model theoretically. Moreover, the previous tensor decomposition model did not consider the problem of attribute separation. The unnoticed task of attribute separation in the traditional models is just handed over to the training. However, the amount of parameters for this task is tremendous, and the model is prone to overfitting.

/media/item_content/202204/625f5e4c17e72electronics-11-01058-g002.png

Figure 2.

The research routes of current tensor decomposition models.

It is widely accepted that each entity contains different attributes, and the relations describe the association of entities on specific attributes. When comparing the plausibility of triples, the first step is to pick out the semantic dimension that the relation compares and filter out irrelevant dimensions. In the second step, scholarswe compare the correlation of the attributes of heads and tails under specific attributes, whether it satisfies the triples. It is essential to separate the dimensions that need to be compared from those unrelated dimensions. However, existing tensor decomposition models ignore the isolation of attribute dimensions, and these models combine these two steps for training. These models simultaneously complete the separation of attributes and the learning of semantic equivalence. This combination will result in too many parameters for learning. Therefore, scholars make a unique design for the relation matrix based on the subspace theory so that the different semantic spaces will not overlap. The model implements the isolation of different attributes in the structural design. As shown in Figure 3, the left is the traditional entity vector and relation matrix; the right is the entity vector and the relation matrix with the separation of attribute spaces. ScholarsWe perform vector subspace separation on the relation matrix of tensor decomposition models. As shown in Equation (5), the task of attribute isolation is transferred to the model structure design. This operation allows the model to focus on learning the semantic equivalence between relations, resulting in better performance. Since the model is a new embedding model that separates attribute space for knowledge graph completion, scholarswe name the model SeAttE in this paper.

\begin{matrix} d_{r} (h, t () = ∥h \times r \times t ∥∥∥) \\ \Rightarrow d_{r} (h, t () = ∥h \times r_{S e A t t E \times t ∥∥}∥) \end{matrix}

/media/item_content/202204/625f5e798890delectronics-11-01058-g003.png

Figure 3.

The left is the traditional entity vector and relation matrix, the right is the entity vector and the relation matrix with the separation of attribute spaces.

4.1.2. Design

In theory, the subspace separation should be related to the actual relations, which cannot be designed in advance. ScholarsWe design the structure of attribute subspace segmentation to reduce the model’s workload in learning segmentation tasks of different semantic dimensions. In order to facilitate the design and implementation of the model, SeAttE adopts the exact size of attribute subspace design. Assuming that the dimension of each entity vector is d and the dimension of each attribute subspace is k, each entity contains d/k attribute spaces.

r_{S e A t t E = |\begin{matrix} W_{1} & 0 & 0 & 0 \\ 0 & W_{2} & 0 & 0 \\ 0 & 0 & \dots & 0 \\ 0 & 0 & 0 & W_{k} \end{matrix}|}

where

r_{S e A t t E \in R^{d}}

and

h = d / k

As shown in the left part of Figure 4, the dimension of each entity vector d is eight, and the dimension of each attribute subspace k is two, then the entity contains four attributes subspaces. As shown in the right part of Figure 4, when the dimension of each attribute subspace d is two and the dimension of each subspace k is four, the entity contains two attributes subspaces.

/media/item_content/202204/625f5ee97ba54electronics-11-01058-g004.png

Figure 4.

Different attribute subspace sizes under the same entity dimension. The dimension of each attribute subspace is set to 2 in the left and 3 in the right.

SeAttE realizes the division of knowledge graph attribute space by setting the max dimension of the attribute subspace. The model avoids a large number of parameter learning for attribute separations by setting the parameter of the maximum semantic space dimension.

4.2. Relation to Previous Tensor Factorization Models

RESCAL is the basic tensor decomposition model. Due to the tremendous amount of parameters of this model, the dimension of the entity cannot be well expanded. When the dimension of the attribute subspace of SeAttE satisfies $k = d$ , SeAttE is equivalent to RESCAL.

$r_{S e A t t E = |W_{1}|}$

where $k = d$ and $h = 1$ .

DisMult is the simplest tensor decomposition model, which diagonalizes all relation matrices. When the max dimension of the attribute subspace of SeAttE k is set to 1, then $W_{k}$ is a 1-dimensional matrix, that is, a numerical value. The relationship matrix is equivalent to the diagonal. Under these circumstances, SeAttE is equivalent to DisMult.

$\begin{matrix} r_{S e A t t E} & = |\begin{matrix} W_{1} & 0 & 0 & 0 & |\begin{matrix} 0 & W_{2} & 0 & 0 & |\begin{matrix} 0 & 0 & \dots & 0 & |\begin{matrix} 0 & 0 & 0 & W_{h} & |\begin{matrix} \end{matrix}| \end{matrix}| \end{matrix}| \end{matrix}| \end{matrix}| \\ = d i a g (W_{1}, W_{2}, \dots, W_{h} ()) \end{matrix}$

where $W_{k} \in R$ .

ComplEx imports complex representations to characterize symmetric and antisymmetric relations.

$\begin{matrix} d_{r} (s, o ()) & = Re (⟨w_{r}, e_{s}, e_{o} (⟨⟩)⟩) \\ = Re (\sum k = 1 K w_{r k e_{s k {\bar{e}}_{o k ()}}}) \\ = Re (w_{r}) Re (e_{s}) Re T (e_{o}) + Re (w_{r}) Im (e_{s}) Im T (e_{o}) \\ + Im (w_{r}) Re (e_{s}) Im T (e_{o}) - Im (w_{r}) Im (e_{s}) Re T (e_{o}) \\ = [Re (e_{s}) | | Im (e_{s}) [] W_{r} [Re (e_{o}) | | Im (e_{o}) {[]}^{T}]] \\ = e^{'} s W_{r} (e^{' ({(o)}^{T})}) \end{matrix}$

$W_{r} = [\begin{matrix} d i a g (Re (w_{r} () [\begin{matrix} d i a g (Im (w_{r} () [\begin{matrix} d i a g (- Im (w_{r} () [\begin{matrix} d i a g (Re (w_{r} () [\begin{matrix} \end{matrix}])) \end{matrix}])) \end{matrix}])) \end{matrix}])) \end{matrix}]$

where $e_{s}^{'}, e_{o}' \in R^{2 K}$ and $W_{r} \in R 2 K \times 2 K$ .

From the above formula, itwe can find that ComplEx is equivalent to RESCAL with $d = 2 k$

. The model performs a particular regularization for each relation matrix, which only retains the diagonal elements of the four sub-matrices of the matrix, and the remaining elements are set to 0. When the dimension of the attribute subspace of the SeAttE model k is set to 2, the relation matrix can also be expressed as the following.

\begin{matrix} r_{S e A t t E} & = |\begin{matrix} W_{1} & O & O & O & |\begin{matrix} O & W_{2} & O & O & |\begin{matrix} O & O & \dots & O & |\begin{matrix} O & O & O & W_{h} & |\begin{matrix}  \end{matrix}| \end{matrix}| \end{matrix}| \end{matrix}| \end{matrix}| \\ = |\begin{matrix} \begin{matrix} W_{11} \begin{matrix} W_{12} \begin{matrix} 0 \begin{matrix} 0 \begin{matrix} \begin{matrix} W_{13} \begin{matrix} W_{14} \begin{matrix} 0 \begin{matrix} 0 \begin{matrix} \begin{matrix} 0 \begin{matrix} 0 \begin{matrix} W_{21} \begin{matrix}  \end{matrix} \end{matrix} \end{matrix} \end{matrix} \end{matrix} \end{matrix} \end{matrix} \end{matrix} \end{matrix} \end{matrix} \end{matrix} \end{matrix} \end{matrix} \end{matrix} \end{matrix}| \\ = H * |\begin{matrix} d i a g (W_{11}, W_{21}, \dots, W_{h 1 () |\begin{matrix} d i a g (W_{12}, W_{22}, \dots, W_{h 2 () |\begin{matrix} |\begin{matrix} d i a g (W_{13}, W_{23}, \dots, W_{h 3 () |\begin{matrix} d i a g (W_{14}, W_{24}, \dots, W_{h 4 () |\begin{matrix}  \end{matrix}|}) \end{matrix}|}) \end{matrix}| \end{matrix}|}) \end{matrix}|}) \end{matrix}| \end{matrix}

where

H = h_{2_n + 1 \times h_{3_n + 2 \times \dots \times h_{n_2 n - 1}}}

is obtained by exchanging the i-th row and the k-th row of the identity matrix, that is, performing elementary row transformation on the matrix W. Where

G = g_{2_n + 1 \times g_{3_n + 2 \times \dots \times g_{n_2 n - 1}}}

is obtained by exchanging the i-th column and the k-th column of the identity matrix, that is, performing elementary column transformation on the matrix W.