Commonsense Causal Reasoning

Commonsense causal reasoning is the process of understanding the causal dependencies between common events or actions. Traditionally, it has been framed as a selection problem: choosing the most plausible alternative from a given candidate set. In many scenarios, however, such as causal question-answering problems, enough candidates cannot be obtained and more flexible causes (or effects) are needed. The ability to generate causes (or effects), rather than merely select them, is therefore an important problem.

Keywords: commonsense causal reasoning; seq2seq generation; causality; feature; co-occurrence; neural

1. Introduction

Commonsense causal reasoning entails inferring causal relations between everyday events or actions. For example, the statement “Amanda feels hot” can reasonably cause the effect “Amanda turns on the fan”, indicating a causal link between the two events. Previous works [1][2] framed the task as selection: given a premise and a set of human-labeled candidates, they proposed various causal strength metrics to rank the candidates and identify the most plausible alternative for the premise. However, relying on human-labeled candidates is impractical for many natural language processing (NLP) generation scenarios, such as question answering and dialog completion. To address this limitation, the commonsense causal reasoning task is reframed as a generation problem. In this formulation, the model is presented with a cause (or effect) sentence as the premise and is tasked with generating reasonable effect (or cause) sentences as targets. The cause-to-effect inference process is referred to as “forward reasoning”. For instance, given the cause sentence “Amanda feels hot”, potential effect sentences could be “Amanda turns on the fan” or “Amanda takes off her coat”, among others. In contrast, “backward reasoning” treats the input sentence as an effect and attempts to infer the cause. An exemplary cause output for the effect “Amanda feels hot” might be “The air-conditioner stops working”.
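As a minimal sketch, this generation formulation amounts to building directed (source, target) training pairs for a seq2seq model. The direction markers and helper function below are illustrative assumptions, not tokens or code from the original paper:

```python
# Hypothetical construction of seq2seq pairs for causal generation.
# "<forward>"/"<backward>" are assumed direction markers, not tokens
# defined in the original paper.

def make_pair(cause: str, effect: str, direction: str):
    """Build one (source, target) pair for a causal generation model."""
    if direction == "forward":      # cause -> effect
        return f"<forward> {cause}", effect
    if direction == "backward":     # effect -> cause
        return f"<backward> {effect}", cause
    raise ValueError(f"unknown direction: {direction}")

print(make_pair("Amanda feels hot", "Amanda turns on the fan", "forward"))
# ('<forward> Amanda feels hot', 'Amanda turns on the fan')
```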
Existing approaches for causal reasoning [1][2][3][4][5] have predominantly relied on a selection-based style. Although these methods can be adapted to support generation-based causal reasoning, the adaptation process involves automatically generating a candidate set for each premise and subsequently employing the selection-based methods to reason about causal relationships. However, these adapted methods suffer from two limitations. Firstly, incorporating an additional online step to construct an appropriate candidate set for each premise results in a considerable computational burden. The computational cost of this step can lead to inefficiencies in the reasoning process. Secondly, the adapted model is restricted to selecting targets exclusively from the predefined candidates, thereby diminishing its flexibility in handling diverse causal reasoning tasks. In recent years, sequence-to-sequence (seq2seq) learning has achieved tremendous success in various text generation applications, such as machine translation [6][7][8][9][10], summarization [11][12][13][14][15][16], language modeling [17][18][19], story generation [20][21], and dialogue [22][23][24]. Building upon this progress, this research introduces a novel model based on the convolutional sequence-to-sequence framework [25], equipped with the causal attention fusion mechanism to empower generation-based causal reasoning tasks.
Within the encoder-decoder sequence-to-sequence architecture, the attention mechanism is intentionally designed to capture the semantic dependencies between the current decoding hidden state and each of the encoding hidden states. These semantic dependencies, commonly interpreted as alignments in machine translation and summarization, also play an important role in causal reasoning by serving as causal dependencies. However, the sparsity and ambiguity of commonsense causalities embedded in texts raise concerns about whether the seq2seq model could learn a robust causal dependency model for causality generation. 
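For concreteness, the alignment computation referred to above can be sketched as standard dot-product attention; this is a generic illustration, not the paper's exact implementation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention(decoder_state, encoder_states):
    """Score each encoder state against the current decoder state and
    normalize the scores into a distribution; in causal generation these
    weights play the role of soft causal dependencies."""
    scores = encoder_states @ decoder_state   # (src_len,)
    weights = softmax(scores)                 # alignment distribution
    context = weights @ encoder_states        # (hidden,) weighted summary
    return context, weights

# Toy example: 4 source positions, hidden size 8.
rng = np.random.default_rng(0)
ctx, w = attention(rng.normal(size=(8,)), rng.normal(size=(4, 8)))
print(w.round(3))  # one weight per source position, summing to 1
```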

2. Methods for Commonsense Causal Reasoning

Commonsense causal reasoning involves capturing and understanding the causal dependencies among events and actions. The most commonly used dataset for this task is the Choice of Plausible Alternatives (COPA), in which each instance includes a premise and two alternatives, along with a prompt specifying the relation (cause or effect) being asked for. Previous studies can be broadly categorized into three lines: feature-based methods, co-occurrence-based methods, and neural-based methods. All of these approaches are applied to COPA to determine which alternative conveys the more plausible cause or effect of the premise sentence, given the prompt.
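A COPA instance can be pictured as follows. The field names are illustrative assumptions (the official release uses a similar schema), and the item itself is the well-known example from Roemmele et al. [1]:

```python
# Illustrative representation of one COPA item; field names are assumptions
# that mirror, rather than copy, the official distribution format.
copa_item = {
    "premise": "The man broke his toe.",
    "asks_for": "cause",                          # the prompt
    "alternative_1": "He got a hole in his sock.",
    "alternative_2": "He dropped a hammer on his foot.",
    "most_plausible": 2,                          # gold label
}
```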
As for feature-based methods, Goodwin et al. [3] propose the COPACETIC system, developed at the University of Texas at Dallas (UTD), for COPA. They cast COPA as a classification problem and use features derived from varied datasets. For co-occurrence-based methods, Roemmele et al. [1] and Gordon et al. [26] focus on lexical co-occurrence statistics gathered from story corpora. They use the Pointwise Mutual Information (PMI) statistic [27] to compute the frequency of two words co-occurring within the same context relative to their overall frequency. It is essential to note that this co-occurrence measure is order-sensitive. In contrast, Luo et al. [2] propose a framework that automatically harvests a network of cause-effect terms, referred to as CausalNet, from a large web corpus. This framework identifies sequences matching lexical templates indicative of causality.
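A minimal sketch of such an order-sensitive PMI statistic is shown below; the windowing scheme and corpus handling are simplified assumptions (in practice the counts come from millions of story sentences):

```python
import math
from collections import Counter

def ordered_pmi(tokens, window=5):
    """Order-sensitive PMI: a pair (x, y) is counted only when x occurs
    before y within `window` tokens, so pmi(x, y) != pmi(y, x)."""
    unigrams = Counter(tokens)
    pairs = Counter()
    for i, x in enumerate(tokens):
        for y in tokens[i + 1 : i + 1 + window]:
            pairs[(x, y)] += 1
    n, n_pairs = len(tokens), sum(pairs.values())

    def pmi(x, y):
        if pairs[(x, y)] == 0:
            return float("-inf")
        p_xy = pairs[(x, y)] / n_pairs
        return math.log(p_xy / ((unigrams[x] / n) * (unigrams[y] / n)))

    return pmi

pmi = ordered_pmi("rain makes the road wet so the car slips on the wet road".split())
print(pmi("rain", "wet"), pmi("wet", "rain"))  # asymmetric by construction
```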
In neural-based methods, Roemmele and Gordon [28] introduce a neural encoder-decoder model that learns to predict relations between adjacent sequences in stories as a means of modeling causality. Dasgupta et al. [29] adopt meta-reinforcement learning to solve a range of problems, each containing causal structure, using an RNN-based agent trained with model-free reinforcement learning (RL). Additionally, Yeo et al. [30] address the problem of multilingual causal reasoning in resource-poor languages, constructing a new causality network (PSG) of cause-effect terms over machine-translated English, without any language-specific knowledge of the resource-poor languages.
Recently, neural sequence-to-sequence models have achieved great success in machine translation and summarization [31][32][33][34][35], especially CNN seq2seq models [12][25][36][37], which are much faster than their recurrent counterparts. These results indicate that CNN seq2seq models can capture the latent relation between encoder input and decoder output. Inspired by this, the task is cast as a causality generation problem. The proposed model employs a CNN seq2seq architecture combined with a causality attention mechanism. To facilitate training, a causality dataset consisting of cause-effect sequence pairs is introduced. The causality attention mechanism is a hybrid approach, incorporating both traditional attention and causal strength computed from CausalNet [2]. An explicit switch probability 𝜆 is further introduced to balance the traditional attention distribution against the causal strength distribution. This fusion attention mechanism enables the model to capture causality between texts, thereby allowing the generation of causal sentences from input sentences.
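The fusion step can be sketched as a convex combination of the two distributions. This is a simplified reading: it assumes the causal strength scores are normalized over the same source positions as the attention weights, and treats 𝜆 as a fixed scalar, whereas the model may compute it dynamically:

```python
import numpy as np

def fused_attention(attn, causal_strength, lam=0.7):
    """Mix the learned attention distribution with an external
    causal-strength distribution (e.g., normalized CausalNet scores)
    via a switch probability `lam`."""
    causal = causal_strength / causal_strength.sum()  # normalize to a distribution
    return lam * attn + (1.0 - lam) * causal          # still sums to 1

attn = np.array([0.1, 0.6, 0.3])   # learned attention over three source words
cs = np.array([2.0, 0.5, 0.5])     # hypothetical CausalNet strengths
print(fused_attention(attn, cs))   # [0.27 0.47 0.26]
```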
To facilitate a comprehensive comparison among the aforementioned methods, the main objectives and pros and cons of each method are summarized in Table 1.
Table 1. Comparison of methods for commonsense causal reasoning.
| Method | Main Objective | Pros | Cons |
|---|---|---|---|
| Feature-based methods | Causal/non-causal classification | Utilizes diverse feature datasets | Limited to predefined features |
| Co-occurrence-based methods | Causal strength computation | Captures statistical dependencies | Limited to lexical templates |
| Neural-based methods (previous) | Causality prediction | High accuracy for causal/non-causal classification | Limited to acquiring causal pairs; struggles with capturing complex causalities |
| Neural-based methods (ours) | Causality generation | Generates causal sentences; captures complex causalities by leveraging external knowledge sources | Limited to the word-based attention fusion mechanism |

References

  1. Roemmele, M.; Bejan, C.A.; Gordon, A.S. Choice of plausible alternatives: An evaluation of commonsense causal reasoning. In Proceedings of the AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning, Stanford, CA, USA, 21–23 March 2011; AAAI Press: Washington, DC, USA, 2011.
  2. Luo, Z.; Sha, Y.; Zhu, K.Q.; Hwang, S.; Wang, Z. Commonsense causal reasoning between short texts. In Principles of Knowledge Representation and Reasoning: Proceedings of the 15th International Conference (KR-16), Cape Town, South Africa, 25–29 April 2016; AAAI Press: Washington, DC, USA, 2016; pp. 421–431.
  3. Goodwin, T.; Rink, B.; Roberts, K.; Harabagiu, S.M. UTDHLT: COPACETIC system for choosing plausible alternatives. In Proceedings of the 1st Joint Conference on Lexical and Computational Semantics, Stroudsburg, PA, USA, 7–8 June 2012; pp. 461–466.
  4. Jabeen, S.; Gao, X.; Andreae, P. Using asymmetric associations for commonsense causality detection. In Proceedings of the Pacific Rim International Conference on Artificial Intelligence, Gold Coast, Australia, 1–5 December 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 877–883.
  5. Lester, B.; Al-Rfou, R.; Constant, N. The power of scale for parameter-efficient prompt tuning. arXiv 2021, arXiv:2104.08691.
  6. Wang, J.; Hou, Y.; Liu, J.; Cao, Y.; Lin, C.Y. A statistical framework for product description generation. In Proceedings of the 8th International Joint Conference on Natural Language Processing, Taipei, Taiwan, 8 November 2017; Short Papers. Asian Federation of Natural Language Processing: Taipei, Taiwan, 2017; Volume 2, pp. 187–192.
  7. Chen, Y.; Li, V.O.; Cho, K.; Bowman, S.R. A Stable and Effective Learning Strategy for Trainable Greedy Decoding. arXiv 2018, arXiv:1804.07915.
  8. Song, K.; Tan, X.; He, D.; Lu, J.; Qin, T.; Liu, T.Y. Double path networks for sequence to sequence learning. arXiv 2018, arXiv:1806.04856.
  9. Wu, F.; Fan, A.; Baevski, A.; Dauphin, Y.N.; Auli, M. Pay Less Attention with Lightweight and Dynamic Convolutions. arXiv 2019, arXiv:1901.10430.
  10. Wang, W.; Jiao, W.; Hao, Y.; Wang, X.; Shi, S.; Tu, Z.; Lyu, M. Understanding and improving sequence-to-sequence pretraining for neural machine translation. arXiv 2022, arXiv:2203.08442.
  11. Fan, A.; Grangier, D.; Auli, M. Controllable abstractive summarization. arXiv 2017, arXiv:1711.05217.
  12. Liu, Y.; Luo, Z.; Zhu, K. Controlling length in abstractive summarization using a convolutional neural network. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 4110–4119.
  13. Narayan, S.; Cohen, S.B.; Lapata, M. Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization. arXiv 2018, arXiv:1808.08745.
  14. Guo, J.; Xu, L.; Chen, E. Jointly masked sequence-to-sequence model for non-autoregressive neural machine translation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Virtual, 5–10 July 2020; pp. 376–385.
  15. Kouris, P.; Alexandridis, G.; Stafylopatis, A. Abstractive text summarization: Enhancing sequence-to-sequence models using word sense disambiguation and semantic content generalization. Comput. Linguist. 2021, 47, 813–859.
  16. Joshi, A.; Fidalgo, E.; Alegre, E.; Fernández-Robles, L. DeepSumm: Exploiting topic models and sequence to sequence networks for extractive text summarization. Expert Syst. Appl. 2023, 211, 118442.
  17. Baevski, A.; Auli, M. Adaptive input representations for neural language modeling. arXiv 2018, arXiv:1809.10853.
  18. Song, K.; Tan, X.; Qin, T.; Lu, J.; Liu, T.Y. Mass: Masked sequence to sequence pre-training for language generation. arXiv 2019, arXiv:1905.02450.
  19. Lin, S.C.; Yang, J.H.; Nogueira, R.; Tsai, M.F.; Wang, C.J.; Lin, J. Conversational question reformulation via sequence-to-sequence architectures and pretrained language models. arXiv 2020, arXiv:2004.01909.
  20. Fan, A.; Lewis, M.; Dauphin, Y. Hierarchical neural story generation. arXiv 2018, arXiv:1805.04833.
  21. Fan, A.; Lewis, M.; Dauphin, Y. Strategies for Structuring Story Generation. arXiv 2019, arXiv:1902.01109.
  22. Miller, A.H.; Feng, W.; Fisch, A.; Lu, J.; Batra, D.; Bordes, A.; Parikh, D.; Weston, J. Parlai: A dialog research software platform. arXiv 2017, arXiv:1705.06476.
  23. Dinan, E.; Roller, S.; Shuster, K.; Fan, A.; Auli, M.; Weston, J. Wizard of wikipedia: Knowledge-powered conversational agents. arXiv 2018, arXiv:1811.01241.
  24. Zhao, J.; Mahdieh, M.; Zhang, Y.; Cao, Y.; Wu, Y. Effective sequence-to-sequence dialogue state tracking. arXiv 2021, arXiv:2108.13990.
  25. Ott, M.; Edunov, S.; Baevski, A.; Fan, A.; Gross, S.; Ng, N.; Grangier, D.; Auli, M. fairseq: A Fast, Extensible Toolkit for Sequence Modeling. In Proceedings of the NAACL-HLT 2019: Demonstrations, Minneapolis, MN, USA, 2–7 June 2019.
  26. Gordon, A.S.; Bejan, C.A.; Sagae, K. Commonsense causal reasoning using millions of personal stories. In Proceedings of the Association for the Advancement of Artificial Intelligence, San Francisco, CA, USA, 7–11 August 2011; AAAI Press: Washington, DC, USA, 2011.
  27. Church, K.W.; Hanks, P. Word association norms, mutual information, and lexicography. Comput. Linguist. 1990, 16, 22–29.
  28. Roemmele, M.; Gordon, A. An encoder-decoder approach to predicting causal relations in stories. In Proceedings of the 1st Workshop on Storytelling, New Orleans, LA, USA, 7 June 2018; Association for Computational Linguistics: New Orleans, LA, USA, 2018; pp. 50–59.
  29. Dasgupta, I.; Wang, J.X.; Chiappa, S.; Mitrovic, J.; Ortega, P.A.; Raposo, D.; Hughes, E.; Battaglia, P.; Botvinick, M.; Kurth-Nelson, Z. Causal Reasoning from Meta-reinforcement Learning. arXiv 2019, arXiv:1901.08162v1.
  30. Yeo, J.; Wang, G.; Cho, H.; Choi, S.; Hwang, S. Machine-Translated Knowledge Transfer for Commonsense Causal Reasoning. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, LA, USA, 2–7 February 2018; pp. 2021–2028.
  31. Bahdanau, D.; Cho, K.; Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv 2014, arXiv:1409.0473.
  32. Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to sequence learning with neural networks. In Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, Montreal, QC, Canada, 8–13 December 2014; pp. 3104–3112.
  33. Paulus, R.; Xiong, C.; Socher, R. A Deep Reinforced Model for Abstractive Summarization. arXiv 2017, arXiv:1705.04304.
  34. See, A.; Liu, P.J.; Manning, C.D. Get to the point: Summarization with pointer-generator networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017, Vancouver, BC, Canada, 30 July–4 August 2017; Volume 1: Long Papers, pp. 1073–1083.
  35. Nallapati, R.; Zhou, B.; dos Santos, C.N.; Gülçehre, Ç.; Xiang, B. Abstractive text summarization using sequence-to-sequence RNNs and beyond. In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, CoNLL 2016, Berlin, Germany, 11–12 August 2016; pp. 280–290.
  36. Gehring, J.; Auli, M.; Grangier, D.; Yarats, D.; Dauphin, Y.N. Convolutional sequence to sequence learning. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, Australia, 6–11 August 2017; pp. 1243–1252.
  37. Fan, A.; Grangier, D.; Auli, M. Controllable abstractive summarization. In Proceedings of the 2nd Workshop on Neural Machine Translation and Generation, NMT@ACL 2018, Melbourne, Australia, 20 July 2018; pp. 45–54.