Commonsense Causal Reasoning

Commonsense causal reasoning is the process of understanding the causal dependency between common events or actions. Traditionally, it was framed as a selection problem. However, in many scenarios, such as causality-based question answering, a selection-based formulation cannot obtain enough candidates and calls for more flexible causes (or effects). Thus, the ability to generate causes (or effects) is an important capability.

  • commonsense causal reasoning
  • seq2seq
  • generation
  • causality
  • feature
  • co-occurrence
  • neural

1. Introduction

Commonsense causal reasoning entails inferring causal relations between everyday events or actions. For example, the event “Amanda feels hot” can reasonably cause the effect “Amanda turns on the fan”, indicating a causal link between the two events. Previous works [1][2] have proposed various causal strength metrics to rank candidate alternatives and identify the most plausible one for a given premise. However, these ranking models for commonsense causality rely on human-labeled candidates, which is impractical for many natural language processing (NLP) generation scenarios, such as question answering and dialog completion. To address this limitation, the commonsense causal reasoning task is reframed as a generation problem. In this formulation, the model is presented with a cause (or effect) sentence as the premise and is tasked with generating reasonable effect (or cause) sentences as targets. The cause-to-effect inference process is referred to as “forward reasoning”. For instance, given the cause sentence “Amanda feels hot”, potential effect sentences could be “Amanda turns on the fan” or “Amanda takes off her coat”, among others. In contrast, “backward reasoning” treats the input sentence as an effect and attempts to infer the cause. An exemplary cause output for the effect “Amanda feels hot” might be “The air-conditioner stops working”.
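As a minimal illustration of this framing, the sketch below turns a single cause-effect pair into seq2seq training examples for both reasoning directions. The direction tokens (<fwd>, <bwd>) and the make_pairs helper are hypothetical conventions for illustration, not part of the model described here.

```python
# Hypothetical sketch: constructing (source, target) seq2seq pairs
# for forward (cause -> effect) and backward (effect -> cause) reasoning.
# The <fwd>/<bwd> direction tokens are an illustrative convention only.

def make_pairs(cause: str, effect: str):
    """Return (source, target) pairs for both reasoning directions."""
    return [
        ("<fwd> " + cause, effect),   # forward reasoning: generate the effect
        ("<bwd> " + effect, cause),   # backward reasoning: generate the cause
    ]

if __name__ == "__main__":
    for src, tgt in make_pairs("Amanda feels hot", "Amanda turns on the fan"):
        print(f"{src!r} -> {tgt!r}")
```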
Existing approaches to causal reasoning [1][2][3][4][5] have predominantly relied on a selection-based style. Although these methods can be adapted to support generation-based causal reasoning, the adaptation requires automatically generating a candidate set for each premise and then employing the selection-based methods to reason about causal relationships. However, the adapted methods suffer from two limitations. First, the additional online step of constructing an appropriate candidate set for each premise imposes a considerable computational burden and makes the reasoning process inefficient. Second, the adapted model is restricted to selecting targets exclusively from the predefined candidates, which diminishes its flexibility in handling diverse causal reasoning tasks. In recent years, sequence-to-sequence (seq2seq) learning has achieved tremendous success in various text generation applications, such as machine translation [6][7][8][9][10], summarization [11][12][13][14][15][16], language modeling [17][18][19], story generation [20][21], and dialogue [22][23][24]. Building upon this progress, this research introduces a novel model based on the convolutional sequence-to-sequence framework [25], equipped with a causal attention fusion mechanism to empower generation-based causal reasoning tasks.
Within the encoder-decoder sequence-to-sequence architecture, the attention mechanism is intentionally designed to capture the semantic dependencies between the current decoding hidden state and each of the encoding hidden states. These semantic dependencies, commonly interpreted as alignments in machine translation and summarization, also play an important role in causal reasoning by serving as causal dependencies. However, the sparsity and ambiguity of commonsense causalities embedded in texts raise concerns about whether the seq2seq model could learn a robust causal dependency model for causality generation. 
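For concreteness, a standard generic form of these attention weights, written here for reference rather than quoted from the model in question, is:

```latex
% Standard encoder-decoder attention (generic dot-product form).
% s_i: current decoder hidden state, h_j: j-th encoder hidden state.
\alpha_{ij} = \frac{\exp\left(s_i^{\top} h_j\right)}
                   {\sum_{k=1}^{n} \exp\left(s_i^{\top} h_k\right)},
\qquad
c_i = \sum_{j=1}^{n} \alpha_{ij}\, h_j
```

The context vector \(c_i\) aggregates the encoder states according to how strongly each source position aligns with the current decoding step; in the causal reasoning setting, these weights are read as causal dependencies between source and target tokens.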

2. Methods for Commonsense Causal Reasoning

Commonsense causal reasoning involves capturing and understanding the causal dependencies among events and actions. The most commonly used dataset for this task is the Choice of Plausible Alternatives (COPA), in which each instance consists of a premise and two alternatives, along with a prompt specifying the relation between them. Previous studies can be broadly categorized into three lines: feature-based methods, co-occurrence-based methods, and neural-based methods. All of these approaches are applied to COPA to determine which alternative conveys a more plausible cause or effect of the premise sentence, depending on the prompt.

Among feature-based methods, Goodwin [3] proposes the COPACETIC system developed by the University of Texas at Dallas (UTD) for COPA. They treat COPA as a classification problem and use features derived from varied datasets.

For co-occurrence-based methods, Roemmele [1] and Gordon [26] focus on lexical co-occurrence statistics gathered from story corpora. They use the Pointwise Mutual Information (PMI) statistic [27] to measure how often two words co-occur within the same context relative to their overall frequencies (see the formula below). It is essential to note that this co-occurrence measure is order-sensitive. In contrast, Luo [2] proposes a framework that automatically harvests a network of cause-effect terms, referred to as CausalNet, from a large web corpus. This framework identifies sequences matching lexical templates indicative of causality.

Among neural-based methods, Roemmele [28] introduces a neural encoder-decoder model capable of learning to predict relations between adjacent sequences in stories as a means of modeling causality. Dasgupta [29] adopts meta-reinforcement learning to solve a range of problems, each containing causal structure, using an RNN-based agent trained with model-free reinforcement learning (RL). Additionally, Yeo [30] addresses the problem of multilingual causal reasoning in resource-poor languages, constructing a new causality network (PSG) of cause-effect terms that targets machine-translated English without any language-specific knowledge of resource-poor languages.

Recently, neural sequence-to-sequence models have achieved great success in machine translation and summarization [31][32][33][34][35], especially CNN seq2seq models [12][25][36][37], which are much faster than the alternatives. These results show that CNN seq2seq models can capture the latent relation between encoder input and decoder output. Inspired by this, the causality generation problem is identified. The proposed model employs a CNN seq2seq architecture combined with a causality attention mechanism. To facilitate training, a causality dataset consisting of cause-effect sequence pairs is introduced. The causality attention mechanism is a hybrid approach, incorporating both traditional attention and causal strength computed from CausalNet [2]. An explicit switch probability 𝜆 is further introduced to balance the traditional attention distribution against the causal strength distribution (see the sketch below). This fusion attention mechanism enables the model to capture causality between texts, thereby allowing the generation of causal sentences from input sentences. To facilitate a comprehensive comparison among the aforementioned methods, the main objectives and the pros and cons of each method are summarized in Table 1.
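For reference, the standard definition of PMI between two words x and y is given below; this is the textbook form, not a quotation from the cited works, whose causality-oriented variants reweight or order the terms.

```latex
% Pointwise Mutual Information between words x and y.
% P(x, y): probability of x and y co-occurring in the same context;
% P(x), P(y): their marginal probabilities in the corpus.
\mathrm{PMI}(x, y) = \log \frac{P(x, y)}{P(x)\,P(y)}
```

The order-sensitive variants mentioned above count only co-occurrences in which x precedes y when estimating P(x, y), which is what lets the statistic distinguish a cause term from an effect term.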
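One plausible reading of the causal attention fusion described above is a convex combination of the two distributions controlled by the switch probability 𝜆. The sketch below illustrates that reading; the function name, the lookup of causal strengths, and lam=0.5 as a default are hypothetical, not the exact implementation of the model.

```python
import numpy as np

def fused_attention(scores, causal_strength, lam=0.5):
    """Blend standard attention with a CausalNet-style causal strength
    distribution. A hypothetical sketch, not the paper's exact method.

    scores          : (n,) raw attention scores over encoder positions
    causal_strength : (n,) nonnegative causal strengths between each source
                      token and the token currently being generated
    lam             : switch probability; lam = 1 recovers standard attention
    """
    attn = np.exp(scores - scores.max())        # numerically stable softmax
    attn /= attn.sum()

    cs = np.asarray(causal_strength, dtype=float)
    # Normalize causal strengths into a distribution; fall back to uniform
    # when no causal evidence is available for any source token.
    cs = cs / cs.sum() if cs.sum() > 0 else np.full_like(cs, 1.0 / len(cs))

    return lam * attn + (1.0 - lam) * cs        # convex mix: still sums to 1

# Example: three source tokens; causal strength concentrates on token 2.
weights = fused_attention(np.array([0.1, 0.4, 0.2]),
                          np.array([0.0, 0.1, 0.9]), lam=0.3)
print(weights, weights.sum())
```

Under this reading, small 𝜆 pushes the decoder toward source tokens with strong causal links to the word being generated, while large 𝜆 defers to the learned semantic alignment.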
Table 1. Comparison of methods for commonsense causal reasoning.

| Method | Main Objective | Pros | Cons |
|---|---|---|---|
| Feature-based methods | Causal/non-causal classification | Utilizes diverse feature datasets | Limited to predefined features |
| Co-occurrence-based methods | Causal strength computation | Captures statistical dependencies | Limited to lexical templates |
| Neural-based methods (previous) | Causality prediction | High accuracy for causal/non-causal classification | Limited to acquiring causal pairs; struggles with capturing complex causalities |
| Neural-based methods (ours) | Causality generation | Enables generation of causal sentences; captures complex causalities by leveraging external knowledge sources | Limited to the word-based attention fusion mechanism |