Logical Reasoning Machine Reading Comprehension: Comparison
Please note this is a comparison between Version 1 by Tianyu Sun and Version 2 by Lindsay Dong.

Logical reasoning requires correct understanding of the logical relationships between different sentences, pointing out a positive example that enhances the reliability of a conclusion or a negative example that weakens the reliability of a conclusion. The need for this capability places higher demands on the performance of existing reading comprehension models since the inference capability of a large number of models relies heavily on entities and their numerical weights. 

  • machine reading comprehension
  • graph attention network
  • logical reasoning

1. Introduction

Artificial intelligence has deeply influenced people’s work and daily lives today. For instance, voice assistants like Siri and Cortana can now help users operate their devices, and ChatGPT, with its impressive capabilities, provides users with inspiration, references, and aids in decision-making. But the foundation of all these is that AI can correctly understand the requirements you express in natural language. This is closely related to the goal of machine reading comprehension tasks.
Machine reading comprehension (MRC) is a fundamental task in the field of natural language processing that requires models to respond to a given passage of text and related questions. Just as we assess human comprehension of a passage of text through reading comprehension tests, MRC can be used to assess a computer system’s ability to understand human language.
Because MRC is one of the important research tasks in natural language processing, a large number of reading comprehension datasets have been proposed, such as SQuAD [1], an extractive reading comprehension dataset based on Wikipedia; HotpotQA [2], a multi-hop reading comprehension dataset that requires extracting information from multiple distinct text passages; and DROP [3], a generative reading comprehension dataset that assesses discrete reasoning abilities. As the datasets continue to evolve, their difficulty is gradually increasing. Logical reasoning ability has long been considered a key thinking ability of the human brain [4], and its importance has also been recognized by many researchers in the field. Several challenging multiple-choice logical reasoning reading comprehension datasets have been built, such as LogiQA [5] and ReClor [6].
Following Google’s proposal of the BERT [7] model, methods based on pre-trained language models, which can exploit the prior knowledge obtained from massive training data, have achieved substantial performance gains on 11 downstream tasks in the natural language processing domain, including machine reading comprehension tasks. The Transformer architecture was originally proposed to address sequence transduction tasks such as machine translation [8]. Its encoding layer employs a self-attention mechanism and significantly improves performance compared with RNN-based methods. Subsequently, an increasing number of NLP tasks have adopted methods based on pre-trained models, including named entity recognition [9,10,11], machine translation [12,13], and machine reading comprehension [14,15,16,17,18].
Logical reasoning requires correct understanding of the logical relationships between different sentences, pointing out a positive example that enhances the reliability of a conclusion or a negative example that weakens the reliability of a conclusion. The need for this capability places higher demands on the performance of existing reading comprehension models since the inference capability of a large number of models relies heavily on entities and their numerical weights. However, due to the complexity of logical reasoning machine reading comprehension problems, pre-trained language models still do not perform sufficiently well on such tasks and struggle to reach the average human level.
In recent years, researchers have worked on designing specific model architectures for integrating logical structures. Jiao et al. [4] proposed introducing symbolic logic into neural network models as data expansion, using self-supervised and contrastive learning. Wang et al. [17] proposed LReasoner, a context and data enhancement framework based on parsing logical expressions.
The introduction of DAGN [18] presented a novel approach to this task: it uses graph structures to model the abstract logical relationships in logical reasoning tasks and employs computational methods such as GNNs or Graph Transformers to simulate the reasoning process. After that, Li et al. [16] and Ouyang et al. [19] also proposed implicit inference over the logical information of passages using graph structures, and these approaches have made progress on various datasets. Previous research suggests that an intuitive way to identify logical relationships between text units is to use discourse relations [20]: words such as “because” and “therefore” signal cause-effect relationships, “if” signals hypothetical relations, and punctuation carries implicit logical relations. Modeling logical structure has so far proven to be one of the effective methods for enhancing the logical reasoning of widely used pre-trained models.
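The idea of reading logical relations off discourse connectives and punctuation can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the three-word lexicon `CONNECTIVES` and the helper names `tag_relations` and `split_units` are hypothetical and are not taken from any of the cited systems, which rely on much larger connective inventories.

```python
import re

# Hypothetical, illustrative lexicon built from the examples in the text;
# real systems use far larger inventories of discourse connectives.
CONNECTIVES = {
    "because": "cause-effect",
    "therefore": "cause-effect",
    "if": "hypothetical",
}


def tag_relations(sentence):
    """Return (connective, relation) pairs in the order they occur."""
    found = []
    for match in re.finditer(r"[A-Za-z]+", sentence):
        word = match.group(0).lower()
        if word in CONNECTIVES:
            found.append((word, CONNECTIVES[word]))
    return found


def split_units(sentence):
    """Split a sentence into text units at connectives and at commas/semicolons."""
    pattern = r"\b(?:" + "|".join(CONNECTIVES) + r")\b|[,;]"
    parts = re.split(pattern, sentence, flags=re.IGNORECASE)
    # Drop empty pieces and trim spaces and sentence-final periods.
    return [p.strip(" .") for p in parts if p.strip(" .")]


example = "If it rains, the match is cancelled because the pitch floods."
print(tag_relations(example))
print(split_units(example))
```

The resulting text units and relation labels are the kind of raw material that graph-based models can turn into nodes and edges.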

2. Logical Reasoning Machine Reading Comprehension

In recent years, with the success of pre-trained language models in NLP, many pre-trained language models (e.g., BERT [7], RoBERTa [21][25], XLNet [22][26], and GPT-3 [23][27]) have met or exceeded human performance on popular MRC datasets.
However, those MRC datasets contain little or no data that examines logical reasoning abilities. For example, according to Sugawara and Aizawa [24][28], there is no logical reasoning content in the MCTest [25][29] dataset, while only 1.2% of the SQuAD dataset requires logical reasoning to answer questions. Therefore, Yu et al. [6] proposed the ReClor dataset, which focuses on examining logical reasoning ability. A task related to logical reasoning MRC is Natural Language Inference (NLI), which requires the model to classify the logical relationship of a given sentence pair. However, the NLI task only considers three simple logical relations (entailment, contradiction, and neutrality) at the sentence level, whereas logical reasoning MRC is more challenging because it needs to predict multiple complex logical relations at the passage level to determine the answer.
As shown in Table 1, the approaches for logical reasoning machine reading comprehension in recent years can be divided into the following categories:
Table 1. Summary of related work.
Rule-Based: NatLog [26][30]; Stanford RTE [27][31]
Pre-Training Based: MERIt [4]; LogiGAN [29][33]
Data Enhancement: LReasoner [17]; L-Datt [28][32]
GNN Based: DAGN [18]; AdaLoGN [16]; Logiformer [30][34]; LoCSGN [31][35]
The first category approaches the task from the pre-training perspective: heuristic rules are used to capture logical relations in large corpora, and corresponding training tasks are designed for these relations to further train existing pre-trained language models, as in MERIt [4] and LogiGAN [29][33]. MERIt applies rules to a large amount of unlabeled textual data to construct examples modeled after the form of the logical reasoning MRC task, which are then used for self-supervised pre-training with contrastive learning. LogiGAN first uses pre-specified logical indicators (e.g., “therefore”, “due to”, “we may infer that”) to identify logical inference phenomena in large-scale unlabeled text, then masks the expressions that follow the logical indicators and trains the generative model to recover the masked expressions.
The second category approaches the task from the data enhancement perspective: implicit expressions are inferred symbolically based on logical equivalence laws, and the given text is expanded to match the answers, as in LReasoner [17]. LReasoner proposes a logic-driven context extension framework that integrates three steps: logical identification to parse out logical expressions from the context, logical extension to derive implicit expressions, and logical verbalization to predict the answer.
The third category uses predefined rules to construct a graph structure based on the content of the text and the options. The nodes of the graph correspond to logical units in the text, i.e., meaningful sentences or text fragments, and the edges of the graph represent the relationships between the logical units. Methods such as Graph Neural Networks (GNNs) and Graph Transformers [32][36] are then employed to model the logical reasoning process, thereby enhancing logical reasoning performance.
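For the data enhancement category, the logical extension step (deriving implicit expressions from logical equivalence laws) can be illustrated with a small sketch. It assumes a deliberately simplified representation in which each expression is an implication `(a, b)` read as "a implies b", literals are plain strings, and negation is a `not ` prefix. The function names `negate`, `extend`, and `verbalize`, and the two laws chosen here (contraposition and transitivity), are illustrative assumptions and not the actual implementation of LReasoner.

```python
def negate(literal):
    """Negate a literal written as 'x' or 'not x'."""
    return literal[4:] if literal.startswith("not ") else "not " + literal


def extend(implications):
    """Close a set of implications (a, b), read 'a -> b', under
    contraposition and transitivity until nothing new is derived."""
    closed = set(implications)
    changed = True
    while changed:
        changed = False
        new = set()
        for a, b in closed:
            # Contraposition: a -> b gives not b -> not a.
            new.add((negate(b), negate(a)))
            for c, d in closed:
                # Transitivity: a -> b and b -> d give a -> d.
                if b == c and a != d:
                    new.add((a, d))
        if not new <= closed:
            closed |= new
            changed = True
    return closed


def verbalize(implication):
    """Render an implication as a plain sentence (hypothetical template)."""
    a, b = implication
    return f"If {a}, then {b}."


explicit = {("rain", "wet"), ("wet", "slippery")}
for expr in sorted(extend(explicit) - explicit):
    print(verbalize(expr))
```

The sentences printed at the end are the implicit expressions that were not stated in the input, which is the kind of text that would be appended to the context before answer prediction.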
As the difficulty of the logical reasoning machine reading comprehension task continues to increase, merely focusing on the interaction between tokens at the sentence level is far from sufficient. Models need to establish relationships between sentences at a holistic level that covers the context, the question, and the answers. However, logical relationships are difficult to extract because they are implicit structures hidden in the context, and the existing datasets are not labeled with logical structures. Therefore, DAGN proposed by Huang et al. [18] and Logiformer proposed by Xu et al. [30][34] both use graph structures to represent logical information in the context. DAGN uses discourse relations in PDTB2.0 [33][24] as separators to divide passages into multiple elementary discourse units (EDUs). The graph structure is obtained by using EDUs as nodes and discourse relations as edges, and a graph network is used to learn logical features of the text from the EDUs to improve reasoning ability.
Graph neural networks (GNNs) are now used successfully in logical reasoning tasks, but node-to-node message passing in these models is still inadequate, so interaction between passages and options remains limited. To address these challenges, AdaLoGN proposed by Li et al. [16] employs directed text logic graphs with predefined logical relations and lets these predefined relations reason with each other based on certain rules, adaptively extending the already constructed discourse graph so as to enhance symbolic reasoning capabilities. Logiformer proposed by Xu et al. [30][34] uses a graph transformer to model the dependency relations in a logical graph and a syntactic graph, respectively, and injects the structural information of each graph by introducing the corresponding adjacency matrix into the attention computation.
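One simple way to bring a graph's adjacency matrix into attention is to let each node attend only to its neighbors. The sketch below shows this masking variant in pure Python over a small graph of text units. It is an assumption-laden illustration, not the exact formulation used by Logiformer or DAGN: the helper names `build_adjacency` and `masked_attention` are hypothetical, the node vectors serve as queries, keys, and values without learned projections, and edges are treated as undirected.

```python
import math


def build_adjacency(num_nodes, edges):
    """Symmetric 0/1 adjacency matrix with self-loops for a graph of text units."""
    adj = [[1 if i == j else 0 for j in range(num_nodes)] for i in range(num_nodes)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1
    return adj


def masked_attention(queries, keys, values, adj):
    """Scaled dot-product attention where node i attends only to nodes j
    with adj[i][j] == 1 (self-loops keep every row non-empty)."""
    dim = len(queries[0])
    output = []
    for i, q in enumerate(queries):
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) if adj[i][j] else -math.inf
            for j, k in enumerate(keys)
        ]
        top = max(scores)  # subtract the maximum for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        output.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return output


# Three text units; only units 0 and 1 are linked by a relation.
nodes = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
adj = build_adjacency(3, [(0, 1)])
for row in masked_attention(nodes, nodes, nodes, adj):
    print(row)
```

In this toy graph the third unit has no neighbors other than itself, so its representation is unchanged, while the first two units mix information with each other. Adding the adjacency information as a soft bias on the scores instead of a hard mask is an alternative design.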