Concept Prerequisite Learning with PTM and GNN

Prerequisite chains are crucial to acquiring new knowledge efficiently, and many studies have been devoted to automatically identifying prerequisite relationships between concepts from educational data. Though effective to some extent, these methods neglect two key factors: most fail to utilize domain-related knowledge to enhance pre-trained language models, making the textual representation of concepts less effective, and they ignore the fusion of semantic information with the structural information formed by existing prerequisites.

  • concept prerequisite relationships
  • pre-trained language model
  • relational graph convolutional networks

1. Introduction

With the popularity of online education platforms, accessing learning resources has become increasingly convenient; however, learning effectively and systematically from such vast resources has become a pressing concern. Concepts are the smallest units of learning, and establishing the order in which they should be learned and organized is crucial for acquiring new knowledge. The prerequisite relationships between concepts can be used to help learners generate reliable learning paths [1], and can also serve downstream tasks in the education field, such as knowledge tracing [2] and cognitive diagnosis [3].
‘Concept prerequisite’ refers to the idea that some basic concepts must be understood before tackling more complex or advanced topics: for instance, to comprehend the concept BERT in natural language processing, one should first master the concept Transformer; similarly, understanding multi-head attention and feed-forward networks is necessary before mastering the concept Transformer. Concept prerequisite learning aims to establish a coherent learning sequence between concepts drawn from resources of various sources: specifically, it involves identifying whether two concepts have a prerequisite relationship.
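As a concrete illustration of this formulation (only a sketch, not the method of any work discussed here), a concept pair can be fed to a BERT-style sentence-pair classifier; the model name, label convention, and concept strings below are illustrative assumptions.

```python
# Minimal sketch: concept prerequisite identification as sentence-pair
# classification with BERT (illustrative; not the setup of any cited work).
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Assumed label convention: 1 = concept_a is a prerequisite of concept_b, 0 = no prerequisite.
concept_a, concept_b = "Transformer", "BERT"
inputs = tokenizer(concept_a, concept_b, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
prediction = logits.argmax(dim=-1).item()  # untrained classification head: for illustration only
```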
This task has attracted the interest of many researchers. Previous works [4,5,6] have proposed handcrafted rules and features for learning concept prerequisites from knowledge graphs, the scientific corpus, and learner behavioral data, respectively, and the work in [7] utilizes link information to mine concept prerequisites from Wikipedia. By contrast, recent works have utilized machine learning methods to predict concept prerequisites. These approaches can be divided into two categories: classification-based methods and link-prediction-based methods. Classification-based methods, such as [8,9], mostly follow the text-matching framework, focusing on constructing feature vectors that represent the matching relationship between two sentences and using Siamese networks for prediction. Li et al. [10] found that the BERT [11] model’s performance in identifying concept prerequisite relationships was inferior to that of traditional pre-trained language models, and thus, few works have employed the BERT model to obtain concept embeddings. Link-prediction-based methods, such as [10,12], typically construct resource–concept heterogeneous graphs and apply a variational graph autoencoder (VGAE) [13] for prediction.
However, existing research has neglected two crucial factors, and thus, learning concept prerequisites remains challenging. Firstly, textual representation is obtained by a traditional pre-trained language model, which is highly reliant on the training corpus: a concept must appear a certain number of times in the training corpus for its representation to be effective, and that representation is fixed by the statistics of the training corpus. Secondly, the complementary effects of textual and structural information should be further explored: most existing approaches either use textual representations to initialize the inputs of graph-based models, or use structural representations as inputs for classifiers, neither of which fuses the two types of information effectively.
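To make the fusion of textual and structural information concrete, the following is a minimal sketch, under illustrative assumptions, of initializing concept nodes with pre-trained language-model embeddings and propagating them over existing prerequisite edges with a relational graph convolutional layer from PyTorch Geometric; the edge list, single relation type, and dimensions are hypothetical.

```python
# Minimal sketch: fuse PLM text embeddings (semantic) with prerequisite edges
# (structural) via an R-GCN layer. All graph details here are illustrative.
import torch
from transformers import BertTokenizer, BertModel
from torch_geometric.nn import RGCNConv

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")

concepts = ["multi-head attention", "Transformer", "BERT"]
with torch.no_grad():
    enc = tokenizer(concepts, padding=True, return_tensors="pt")
    text_feats = encoder(**enc).last_hidden_state[:, 0]  # [CLS] vectors, shape (3, 768)

# Known prerequisite edges (source -> target); a single relation type is used for
# brevity, although heterogeneous graphs in the literature use several relations.
edge_index = torch.tensor([[0, 1], [1, 2]])  # multi-head attention -> Transformer -> BERT
edge_type = torch.zeros(edge_index.size(1), dtype=torch.long)

rgcn = RGCNConv(in_channels=768, out_channels=128, num_relations=1)
node_embeddings = rgcn(text_feats, edge_index, edge_type)  # fused representations, shape (3, 128)
```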

2. Concept Prerequisite Prediction as Text Matching

Concept prerequisite prediction from the classification perspective refers to classifying the relationship between two concepts, similar to the text-matching task. Early research mainly relied on designing features and rules. Liang et al. [7] proposed the reference distance model, which measures the likelihood of a prerequisite relationship between concepts by their link density. Liu et al. [14] proposed classification, learning-to-rank, and nearest-neighbor search methods to infer prerequisite relationships with a directed graph. Pan et al. [15] first used representation learning to obtain hidden representations of concepts, and proposed seven features based on these representations to infer relationships. Roy et al. [9] proposed a supervised neural method, PREREQ, which obtains latent representations of concepts through topic modeling with the paired latent Dirichlet allocation model and makes predictions with a Siamese network. Jia et al. [8] built on PREREQ by considering the relationships between concepts and resources as well as auxiliary tasks, extending the method to the weakly supervised setting. Li et al. [16] applied a pre-trained language model to encode text, and utilized link information from web pages between concepts using a graph model. Previous works based on the classification perspective have mainly treated the problem as text matching, emphasizing the semantic information of concept text; however, the concept prerequisite relationship is directional and transitive, and these works did not utilize the structural information formed by already-established prerequisite relationships.
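A Siamese-style pair classifier of the kind used in this line of work is sketched below; the architecture and dimensions are assumptions made for illustration and do not reproduce PREREQ [9] or its extensions.

```python
# Minimal sketch of a Siamese matching model for concept pairs (illustrative).
import torch
import torch.nn as nn

class SiamesePrereqClassifier(nn.Module):
    def __init__(self, emb_dim=128, hidden_dim=64):
        super().__init__()
        # Shared encoder applied to both concept representations.
        self.encoder = nn.Sequential(nn.Linear(emb_dim, hidden_dim), nn.ReLU())
        # Concatenation preserves pair order, since prerequisites are directional.
        self.classifier = nn.Linear(2 * hidden_dim, 2)

    def forward(self, concept_a, concept_b):
        ha, hb = self.encoder(concept_a), self.encoder(concept_b)
        return self.classifier(torch.cat([ha, hb], dim=-1))

model = SiamesePrereqClassifier()
a, b = torch.randn(4, 128), torch.randn(4, 128)  # e.g., latent topic vectors for a batch of pairs
logits = model(a, b)                              # shape (4, 2): prerequisite vs. not
```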

3. Concept Prerequisite Prediction as Link Prediction

Works that treat concept prerequisite prediction as link prediction focus on predicting implicit relationships by constructing a graph from existing prerequisite relationships. Li et al. [17] constructed a dataset called LectureBank and were the first to propose constructing a concept map with each concept in the dataset as a node, then applying a VGAE to learn concept prerequisites from a link prediction perspective; however, inferring solely from existing prerequisites is very limited, since in most works the graph contains only prerequisites between concepts. Li et al. [10] expanded the concept map into a resource–concept heterogeneous graph and proposed the R-VGAE model to consider the multiple relationships between the two types of nodes: resource and concept. Li et al. [18] further explored cross-domain concept prerequisite chain learning using an optimized variational graph autoencoder. However, these models did not distinguish the importance of different nodes when aggregating neighbor node information. Based on the resource–concept heterogeneous graph, Zhang et al. [12] employed a multi-head attention mechanism and a gated fusion mechanism to enhance the representation of concepts, and finally used a variational graph autoencoder to predict the prerequisite relationships between concepts. Research based on the link prediction perspective has mainly focused on modeling structures, thereby neglecting the textual semantic information of concepts. While [10,12] used pre-trained models to obtain textual representations of concepts as the initial input of the graph model, they were all based on traditional pre-trained models, where the representation of each concept was fixed according to the training corpus.
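The following is a minimal sketch, using PyTorch Geometric, of casting prerequisite prediction as link prediction with a variational graph autoencoder; the graph, feature dimensions, and encoder are illustrative assumptions rather than the setup of any cited model.

```python
# Minimal sketch: prerequisite prediction as VGAE-based link prediction (illustrative).
import torch
from torch_geometric.nn import GCNConv, VGAE

class GCNEncoder(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, latent_channels):
        super().__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv_mu = GCNConv(hidden_channels, latent_channels)
        self.conv_logstd = GCNConv(hidden_channels, latent_channels)

    def forward(self, x, edge_index):
        h = self.conv1(x, edge_index).relu()
        return self.conv_mu(h, edge_index), self.conv_logstd(h, edge_index)

x = torch.randn(5, 768)                            # concept features, e.g., text embeddings
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])  # known prerequisite edges

model = VGAE(GCNEncoder(768, 128, 32))
z = model.encode(x, edge_index)
loss = model.recon_loss(z, edge_index) + model.kl_loss()  # training objective
score = model.decoder(z, torch.tensor([[0], [4]]))        # likelihood of candidate edge 0 -> 4
```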

4. Continual Pre-Training of Language Models

Most publicly available pre-trained language models are trained on general-domain corpora (such as Wikipedia), resulting in poor performance when applied to specific domains or tasks. Recently, some studies have proposed pre-training language models on professional corpora. MathBERT [19] built a mathematical vocabulary and continued pre-training on a large amount of mathematical text. OAG-BERT [20] is continually pre-trained on the Open Academic Graph and integrates heterogeneous entity knowledge. COMUS [21] continually pre-trains language models for math problem understanding with a syntax-aware memory network. In addition to pre-training language models for specific fields, some works have designed task-oriented pre-training objectives for target applications, such as SentiLR [22] for sentiment analysis, CALM [23] for commonsense reasoning, and DAPO [24] for dialogue adaptation. To tackle challenges such as the inability of pre-trained language models to connect with real-world knowledge, some works have proposed implicitly introducing knowledge by designing pre-training tasks with knowledge constraints. ERNIE 1.0 [25] extended the basic unit of masked language modeling (MLM) from characters to word segments, and proposed two masking strategies: phrase-level and entity-level. SenseBERT [26] introduced a semantic-level language model, which requires the model to predict the hypernym corresponding to the masked word. ERICA [27] designed two contrastive learning tasks to improve the model’s understanding of document-level entities and relations.
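As a small illustration of the continual pre-training recipe discussed above, the following is a minimal sketch of domain-adaptive masked language modeling with the Hugging Face transformers and datasets libraries; the corpus file, model choice, and hyperparameters are hypothetical and do not correspond to any of the cited models.

```python
# Minimal sketch: continual pre-training of BERT with MLM on an in-domain corpus.
# "domain_corpus.txt" is a hypothetical file with one document per line.
from datasets import load_dataset
from transformers import (BertForMaskedLM, BertTokenizerFast,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]
dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
                      batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)
args = TrainingArguments(output_dir="continual-mlm", num_train_epochs=1,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, data_collator=collator, train_dataset=dataset).train()
```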

References

  1. Changuel, S.; Labroche, N.; Bouchon-Meunier, B. Resources Sequencing Using Automatic Prerequisite-Outcome Annotation. ACM Trans. Intell. Syst. Technol. 2015, 6, 1–30.
  2. Lu, Y.; Chen, P.; Pian, Y.; Zheng, V.W. CMKT: Concept Map Driven Knowledge Tracing. IEEE Trans. Learn. Technol. 2022, 15, 467–480.
  3. Gao, W.; Liu, Q.; Huang, Z.; Yin, Y.; Bi, H.; Wang, M.; Ma, J.; Wang, S.; Su, Y. RCD: Relation Map Driven Cognitive Diagnosis for Intelligent Education Systems. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual, 11–15 July 2021; pp. 501–510.
  4. Manrique, R.; Nunes, B.P.; Mariño, O.; Cardozo, N.; Siqueira, S.W.M. Towards the Identification of Concept Prerequisites Via Knowledge Graphs. In Proceedings of the 2019 IEEE 19th International Conference on Advanced Learning Technologies (ICALT), Maceio, Brazil, 15–18 July 2019; pp. 332–336.
  5. Gordon, J.; Zhu, L.; Galstyan, A.; Natarajan, P.; Burns, G. Modeling Concept Dependencies in a Scientific Corpus. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016.
  6. Chen, W.; Lan, A.S.; Cao, D.; Brinton, C.G.; Chiang, M. Behavioral Analysis at Scale: Learning Course Prerequisite Structures from Learner Clickstreams. In Proceedings of the International Conference on Educational Data Mining, Raleigh, NC, USA, 16–20 July 2018.
  7. Liang, C.; Wu, Z.; Huang, W.; Giles, C.L. Measuring Prerequisite Relations Among Concepts. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, the Association for Computational Linguistics, Lisbon, Portugal, 17–21 September 2015; pp. 1668–1674.
  8. Jia, C.; Shen, Y.; Tang, Y.; Sun, L.; Lu, W. Heterogeneous Graph Neural Networks for Concept Prerequisite Relation Learning in Educational Data. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; pp. 2036–2047.
  9. Roy, S.; Madhyastha, M.; Lawrence, S.; Rajan, V. Inferring Concept Prerequisite Relations from Online Educational Resources. In Proceedings of the AAAI Conference on Artificial Intelligence, Waikiki, HI, USA, 27 January–1 February 2019; AAAI Press: Washington, DC, USA, 2019; pp. 9589–9594.
  10. Li, I.; Fabbri, A.R.; Hingmire, S.; Radev, D.R. R-VGAE: Relational-variational Graph Autoencoder for Unsupervised Prerequisite Chain Learning. In Proceedings of the COLING, International Committee on Computational Linguistics, Barcelona, Spain, 8–13 December 2020; pp. 1147–1157.
  11. Devlin, J.; Chang, M.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186.
  12. Zhang, J.; Lin, N.; Zhang, X.; Song, W.; Yang, X.; Peng, Z. Learning Concept Prerequisite Relations from Educational Data via Multi-Head Attention Variational Graph Auto-Encoders. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, Virtual, 21–25 February 2022; pp. 1377–1385.
  13. Kipf, T.N.; Welling, M. Variational graph auto-encoders. arXiv 2016, arXiv:1611.07308.
  14. Liu, H.; Ma, W.; Yang, Y.; Carbonell, J.G. Learning Concept Graphs from Online Educational Data. J. Artif. Intell. Res. 2016, 55, 1059–1090.
  15. Pan, L.; Li, C.; Li, J.; Tang, J. Prerequisite Relation Learning for Concepts in MOOCs. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada, 30 July–4 August 2017; pp. 1447–1456.
  16. Li, B.; Peng, B.; Shao, Y.; Wang, Z. Prerequisite Learning with Pre-trained Language and Graph Embedding Models. In Natural Language Processing and Chinese Computing, Proceedings of the 10th CCF International Conference, NLPCC 2021, Qingdao, China, 13–17 October 2021, Proceedings, Part II 10; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2021; Volume 13029, pp. 98–108.
  17. Li, I.; Fabbri, A.R.; Tung, R.R.; Radev, D.R. What Should I Learn First: Introducing LectureBank for NLP Education and Prerequisite Chain Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Waikiki, HI, USA, 27 January–1 February 2019; AAAI Press: Washington, DC, USA, 2019; pp. 6674–6681.
  18. Li, I.; Yan, V.; Li, T.; Qu, R.; Radev, D.R. Unsupervised Cross-Domain Prerequisite Chain Learning using Variational Graph Autoencoders. arXiv 2021, arXiv:2105.03505.
  19. Shen, J.T.; Yamashita, M.; Prihar, E.; Heffernan, N.T.; Wu, X.; Lee, D. MathBERT: A Pre-trained Language Model for General NLP Tasks in Mathematics Education. arXiv 2021, arXiv:2106.07340.
  20. Liu, X.; Yin, D.; Zheng, J.; Zhang, X.; Zhang, P.; Yang, H.; Dong, Y.; Tang, J. OAG-BERT: Towards a Unified Backbone Language Model for Academic Knowledge Services. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 14–18 August 2022; pp. 3418–3428.
  21. Gong, Z.; Zhou, K.; Zhao, X.; Sha, J.; Wang, S.; Wen, J. Continual Pre-training of Language Models for Math Problem Understanding with Syntax-Aware Memory Network. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 22–27 May 2022; pp. 5923–5933.
  22. Ke, P.; Ji, H.; Liu, S.; Zhu, X.; Huang, M. SentiLR: Linguistic Knowledge Enhanced Language Representation for Sentiment Analysis. arXiv 2019, arXiv:1911.02493.
  23. Zhou, W.; Lee, D.; Selvam, R.K.; Lee, S.; Ren, X. Pre-training Text-to-Text Transformers for Concept-centric Common Sense. arXiv 2020, arXiv:2011.07956.
  24. Li, J.; Zhang, Z.; Zhao, H.; Zhou, X.; Zhou, X. Task-specific Objectives of Pre-trained Language Models for Dialogue Adaptation. arXiv 2020, arXiv:2009.04984.
  25. Sun, Y.; Wang, S.; Li, Y.; Feng, S.; Chen, X.; Zhang, H.; Tian, X.; Zhu, D.; Tian, H.; Wu, H. ERNIE: Enhanced Representation through Knowledge Integration. arXiv 2019, arXiv:1904.09223.
  26. Levine, Y.; Lenz, B.; Dagan, O.; Ram, O.; Padnos, D.; Sharir, O.; Shalev-Shwartz, S.; Shashua, A.; Shoham, Y. SenseBERT: Driving Some Sense into BERT. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 4656–4667.
  27. Qin, Y.; Lin, Y.; Takanobu, R.; Liu, Z.; Li, P.; Ji, H.; Huang, M.; Sun, M.; Zhou, J. ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Online, 1–6 August 2021; pp. 3350–3363.