Prerequisite chains are crucial for acquiring new knowledge efficiently. Many studies have been devoted to automatically identifying prerequisite relationships between concepts from educational data. Though effective to some extent, these methods neglect two key factors: most works fail to use domain-related knowledge to enhance pre-trained language models, which weakens the textual representation of concepts; and they ignore the fusion of semantic information with the structural information formed by existing prerequisites.
1. Introduction
With the popularity of online education platforms, accessing learning resources has become increasingly convenient; however, effectively and systematically learning from these vast resources has become a pressing concern. Concepts are the smallest units of learning for learners, and establishing the order in which concepts are learned and organized is crucial for acquiring new knowledge. The prerequisite relationships between concepts can be used to help learners generate reliable learning paths [1] and to support downstream tasks in the education field, such as knowledge tracing [2] and cognitive diagnosis [3].
‘Concept prerequisite’ refers to the idea that some basic concepts must be understood before tackling more complex or advanced topics: for instance, to comprehend the concept BERT in natural language processing, one should first master the concept Transformer; similarly, understanding multi-head attention and the feed-forward network is necessary before mastering Transformer. Concept prerequisite learning aims to establish a coherent learning sequence between concepts drawn from resources of various sources: specifically, this involves identifying whether two concepts have a prerequisite relationship.
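Because prerequisite relationships are directed and compose transitively, a convenient way to picture them is as a small directed graph over concepts. The sketch below (Python with the networkx library, for illustration only) encodes the example pairs above and lists every concept that must transitively precede BERT.

```python
# Illustrative only: encode the example prerequisite pairs above as a directed
# graph and query all concepts that must (transitively) be learned before "BERT".
import networkx as nx

G = nx.DiGraph()
# an edge (a, b) means "a is a prerequisite of b"
G.add_edge("multi-head attention", "Transformer")
G.add_edge("feed-forward network", "Transformer")
G.add_edge("Transformer", "BERT")

print(nx.ancestors(G, "BERT"))
# -> the three concepts that transitively precede BERT
```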
This task has attracted the interest of many researchers. Previous works [4,5,6] have proposed handcrafted rules and features for learning concept prerequisites from knowledge graphs, scientific corpora, and learner behavioral data, respectively, and the work in [7] mines concept prerequisites from Wikipedia using link information. By contrast, recent works have applied machine learning methods to predict concept prerequisites. These approaches can be divided into two categories: classification-based methods and link-prediction-based methods. Classification-based methods, such as [8,9], mostly follow the text-matching framework, constructing feature vectors that represent the matching relationship between two sentences and using Siamese networks for prediction. Li et al. [10] found that the performance of the BERT [11] model in identifying concept prerequisite relationships was inferior to that of traditional pre-trained language models, and thus few works have employed BERT to obtain concept embeddings. Link-prediction-based methods, such as [10,12], typically construct resource–concept heterogeneous graphs and apply a variational graph autoencoder (VGAE) [13] for prediction.
However, existing research has neglected two crucial factors, and learning concept prerequisites therefore remains challenging. Firstly, textual representations are obtained from traditional pre-trained language models, which rely heavily on the training corpus: a concept must appear a sufficient number of times in the corpus to obtain an effective representation, and that representation is fixed according to the statistics of the training corpus. Secondly, the complementary effects of textual and structural information should be explored further: most existing approaches either use textual representations to initialize the inputs of graph-based models or feed structural representations into classifiers, neither of which effectively fuses the two types of information.
2. Concept Prerequisite Prediction as Text Matching
Concept prerequisite prediction from the classification perspective treats determining the relationship between two concepts as a classification problem, similar to the text-matching task. Early research mainly relied on designing features and rules. Liang et al. [7] proposed the reference distance model, which measures the likelihood of a prerequisite relationship between concepts by their link density. Liu et al. [14] proposed classification, learning-to-rank, and nearest-neighbor search methods to infer prerequisite relationships over a directed graph. Pan et al. [15] first used representation learning to obtain hidden representations of concepts and proposed seven features based on these representations to infer relationships. Roy et al. [9] proposed a supervised neural method, PREREQ, which uses topic modeling with the pairwise latent Dirichlet allocation model to obtain latent concept representations and a Siamese network for prediction. Jia et al. [8] built on PREREQ by considering the relationships between concepts and resources as well as auxiliary tasks, extending the method to the weakly supervised setting. Li et al. [16] applied a pre-trained language model to encode concept text and used a graph model to exploit the link information between the web pages of concepts.
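For illustration, the following minimal PyTorch sketch shows the Siamese text-matching setup that these classification-based methods share: a weight-shared encoder embeds both concepts, and an order-sensitive feature combination feeds a binary classifier. The class name, dimensions, and feature combination are illustrative assumptions, not the architecture of any specific cited model.

```python
# A minimal Siamese-style sketch for concept prerequisite classification.
# All names and dimensions are illustrative; inputs are pre-computed concept
# text embeddings (e.g., averaged word vectors).
import torch
import torch.nn as nn

class SiamesePrereqClassifier(nn.Module):
    def __init__(self, emb_dim=300, hid_dim=128):
        super().__init__()
        # the same encoder is applied to both concepts (weight sharing)
        self.encoder = nn.Sequential(nn.Linear(emb_dim, hid_dim), nn.ReLU())
        # [h_a, h_b, |h_a - h_b|, h_a * h_b] is order-sensitive, so the
        # direction of the relation (A -> B vs. B -> A) is preserved
        self.classifier = nn.Sequential(
            nn.Linear(hid_dim * 4, hid_dim), nn.ReLU(), nn.Linear(hid_dim, 1))

    def forward(self, a, b):
        h_a, h_b = self.encoder(a), self.encoder(b)
        feats = torch.cat([h_a, h_b, (h_a - h_b).abs(), h_a * h_b], dim=-1)
        return self.classifier(feats).squeeze(-1)  # logit for "A is a prerequisite of B"

# usage with random stand-in embeddings and dummy labels
model = SiamesePrereqClassifier()
a, b = torch.randn(8, 300), torch.randn(8, 300)
loss = nn.BCEWithLogitsLoss()(model(a, b), torch.ones(8))
```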
Previous works based on the classification perspective have mainly treated the problem as a text-matching task, emphasizing the semantic information of concept text; however, the concept prerequisite relationship is directed and transitive, and these works did not exploit the structural information formed by already-established prerequisite relationships.
3. Concept Prerequisite Prediction as Link Prediction
Works that treat concept prerequisite prediction as link prediction focus on predicting implicit relationships by constructing a graph from existing prerequisite relationships. Li et al. [17] constructed a dataset called LectureBank and were the first to propose building a concept map with each concept in the dataset as a node; they then applied a VGAE to learn concept prerequisites from the link prediction perspective. However, inferring solely from existing prerequisites is very limited, since in most works the graph contains only prerequisite edges between concepts. Li et al. [10] expanded the concept map into a resource–concept heterogeneous graph and proposed the R-VGAE model to account for the multiple relationships between the two types of nodes, resource and concept. Li et al. [18] further explored cross-domain concept prerequisite chain learning using an optimized variational graph autoencoder. However, these models did not distinguish the importance of different nodes when aggregating neighbor information. Building on the resource–concept heterogeneous graph, Zhang et al. [12] employed a multi-head attention mechanism and a gated fusion mechanism to enhance concept representations, and finally used a variational graph autoencoder to predict the prerequisite relationships between concepts.
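To make the link prediction formulation concrete, the sketch below applies a VGAE to a concept graph using PyTorch Geometric: a GCN encoder maps node features (e.g., concept text embeddings) to a latent space, and the inner-product decoder scores candidate prerequisite links. The dimensions and training details are placeholder assumptions; this is not a reimplementation of any of the cited models.

```python
# Minimal VGAE link-prediction sketch with PyTorch Geometric; dimensions,
# learning rate, and feature sources are placeholders.
import torch
from torch_geometric.nn import GCNConv, VGAE

class Encoder(torch.nn.Module):
    """Two-layer GCN encoder producing the mean and log-std of latent node vectors."""
    def __init__(self, in_dim, hid_dim, lat_dim):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hid_dim)
        self.conv_mu = GCNConv(hid_dim, lat_dim)
        self.conv_logstd = GCNConv(hid_dim, lat_dim)

    def forward(self, x, edge_index):
        h = self.conv1(x, edge_index).relu()
        return self.conv_mu(h, edge_index), self.conv_logstd(h, edge_index)

# x: node features (e.g., concept text embeddings); edge_index: known prerequisite edges
model = VGAE(Encoder(in_dim=768, hid_dim=64, lat_dim=32))
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

def train_step(x, pos_edge_index):
    model.train()
    optimizer.zero_grad()
    z = model.encode(x, pos_edge_index)
    # reconstruction loss over observed edges plus the KL regularizer
    loss = model.recon_loss(z, pos_edge_index) + model.kl_loss() / x.size(0)
    loss.backward()
    optimizer.step()
    return float(loss)

@torch.no_grad()
def score_pair(x, edge_index, i, j):
    """Probability that node i links to node j under the inner-product decoder."""
    model.eval()
    z = model.encode(x, edge_index)
    return torch.sigmoid((z[i] * z[j]).sum()).item()
```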
Research based on the link prediction perspective has mainly focused on modeling structure, thereby neglecting the textual semantic information of concepts. Although [10,12] used pre-trained language models to obtain textual representations of concepts as the initial input of the graph model, these were traditional pre-trained models, in which the representation of each concept is fixed according to the training corpus.
4. Continual Pre-Training of Language Models
Most publicly available pre-trained language models are trained on general-domain corpora (such as Wikipedia), resulting in poor performance when they are applied to specific domains or tasks. Recently, some studies have proposed pre-training language models on professional corpora. MathBERT [19] builds a mathematical vocabulary and is continually pre-trained on a large amount of mathematical text. OAG-BERT [20] is continually pre-trained on the Open Academic Graph and integrates heterogeneous entity knowledge. COMUS [21] continually pre-trains language models for math problem understanding with a syntax-aware memory network.
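As a concrete illustration of continual (domain-adaptive) pre-training, the sketch below continues masked language modeling of a general BERT checkpoint on a domain corpus using the Hugging Face Transformers and Datasets libraries. The corpus file domain_corpus.txt, the checkpoint, and all hyperparameters are placeholder assumptions, not the setup of any cited model.

```python
# Minimal sketch of continual MLM pre-training on a domain corpus;
# the corpus path, checkpoint, and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# domain_corpus.txt: one domain/educational sentence per line (placeholder path)
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

# dynamic token masking for the MLM objective
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-domain-continued",
                           num_train_epochs=1, per_device_train_batch_size=16),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()
```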
In addition to pre-training language models for specific fields, some works have attempted to design task-oriented pre-training tasks for target applications, such as SentiLR [22] for sentiment analysis, CALM [23] for commonsense reasoning, and DAPO [24] for dialog adaptation. To tackle challenges such as the inability of pre-trained language models to connect with real-world situations, some works have proposed implicitly introducing knowledge by designing pre-training tasks with knowledge constraints. ERNIE 1.0 [25] extended the basic unit of masked language modeling (MLM) from characters to word segments and proposed two masking strategies: phrase-level and entity-level. SenseBERT [26] introduced a semantic-level language model, which requires the model to predict the hypernym corresponding to the masked word. ERICA [27] designed two contrastive learning tasks to improve the model's understanding of document-level entities and relationships.
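To illustrate the idea behind phrase- and entity-level masking, the generic sketch below masks whole multi-token spans rather than independent subword tokens before the MLM loss is computed; the span boundaries are assumed to be supplied by an upstream phrase/entity detector, and this is not ERNIE's actual implementation.

```python
# Generic span-level masking sketch (illustrative, not ERNIE's implementation):
# mask whole phrases/entities instead of independent subword tokens.
import random

MASK = "[MASK]"

def mask_spans(tokens, spans, mask_prob=0.15):
    """tokens: list of subword tokens; spans: (start, end) index pairs marking
    phrases/entities supplied by a hypothetical upstream detector."""
    tokens = list(tokens)
    labels = [None] * len(tokens)           # None = position not predicted by the MLM loss
    for start, end in spans:
        if random.random() < mask_prob:
            for i in range(start, end):     # mask the whole span as one unit
                labels[i] = tokens[i]
                tokens[i] = MASK
    return tokens, labels

# usage: the span (1, 3) covers the two-piece phrase "feed ##forward"
toks = ["the", "feed", "##forward", "network", "in", "transformer"]
masked, labels = mask_spans(toks, spans=[(1, 3), (5, 6)], mask_prob=1.0)
print(masked)   # ['the', '[MASK]', '[MASK]', 'network', 'in', '[MASK]']
```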
This entry is adapted from the peer-reviewed paper 10.3390/math11122780