Explainable AI (XAI) Explanation Techniques

Please note this is a comparison between Version 1 by Saša Brdnik and Version 2 by Dean Liu.

Interest in artificial intelligence (AI) has been increasing rapidly over the past decade and has expanded to essentially all domains. Along with it grew the need to understand the predictions and suggestions provided by machine learning. Explanation techniques have been researched intensively in the context of explainable AI (XAI), with the goal of boosting confidence, trust, user satisfaction, and transparency.

Explainable Artificial Intelligence
learning analytics
XAI
XAI techniques

1. Explainable Artificial Intelligence

The term XAI is best described as “AI systems that can explain their rationale to a human user, characterize their strengths and weaknesses, and convey an understanding of how they will behave in the future” ^[1][6]. Research on XAI shows that introducing explanations to AI systems to illustrate their reasoning to end users can improve transparency, interpretability, understanding, satisfaction, and trust ^[2][3][4][5][7,8,9,10]. Considering explainability techniques in relation to machine learning models, Barredo et al. ^[6][11] presented a taxonomy that separates transparent models (such as decision trees, logistic regression, linear regression, and k-nearest neighbors), which are de facto explainable, from models for which post-hoc explainability has to be used to generate explanations (e.g., support vector machines and convolutional neural networks). Post-hoc explanations can be model-agnostic or model-specific. The former can be applied to any machine learning model regardless of its inner process or representation, while the latter are tied to the interpretation and understanding of a specific machine learning model. Various classifications exist for explanations in AI. They can be categorized mainly as global approaches, which explain the entire model, versus local approaches, which explain an individual prediction; or as self-explainable models with a single structure versus post-hoc approaches, which explain how a model produces its predictions without clarifying the structure of the model ^[6][7][11,12].
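As a rough illustration of what "model-agnostic" means in practice, the sketch below estimates feature relevance by permutation: it only calls the model's prediction function and never inspects its internals. The toy model, the data, and all names (`toy_model`, `permutation_relevance`, `n_repeats`) are assumptions made for this example and do not come from the cited works.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]
Model = Callable[[Row], int]


def accuracy(predict: Model, X: Sequence[Row], y: Sequence[int]) -> float:
    """Share of rows for which the black-box prediction matches the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_relevance(predict: Model, X: Sequence[Row], y: Sequence[int],
                          n_repeats: int = 20, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when one feature column is shuffled.

    The model is treated as a black box: only predict() is called.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    relevance = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the data set with only column j replaced by shuffled values.
            shuffled = [list(row[:j]) + [value] + list(row[j + 1:])
                        for row, value in zip(X, column)]
            drops.append(baseline - accuracy(predict, shuffled, y))
        relevance.append(sum(drops) / n_repeats)
    return relevance


def toy_model(row: Row) -> int:
    """Hypothetical black box: predicts 1 when feature 0 is at least 5."""
    return int(row[0] >= 5)


# Hypothetical data: feature 0 drives the prediction, feature 1 is ignored.
X = [[1, 7], [2, 3], [3, 9], [4, 1], [6, 8], [7, 2], [8, 6], [9, 4]]
y = [toy_model(row) for row in X]
relevance = permutation_relevance(toy_model, X, y)
print(relevance)  # feature 1 gets relevance 0.0 because the model never reads it
```

Because the procedure needs nothing but a prediction function, the same code could be pointed at a transparent model or at an opaque one without modification.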

Common explainability approaches ^[6][7][11,12] include global explanations, which explain how different features/variables affect predictions within the model in question; feature relevance, which presents the computed relevance of each feature in the prediction process (simplified displays with a selection of the most important features are often used); and example-based explanations, which select a particular instance to explain the model, offering a more model-agnostic approach that can be local or global. Additionally, local explanations are often used in systems for students and focus on a particular instance, independent of the higher-level general model. Comparison uses a selection of instances to explain the outcome of other instances on a local level. Counterfactual explanations describe a causal situation (formulated as “If X had not occurred, Y would not have occurred”) and explain and demonstrate the effects of small changes in feature values on the predicted output. Explanations by simplification use the techniques mentioned above to build a new, similar yet simplified system (with reduced complexity but similar performance) based on the trained model to be explained. The aforementioned techniques for post-hoc explanations can include visualizations and text explanations. Their selection is conditioned by the type of machine learning model used for prediction.
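The counterfactual idea can be sketched with a simple search: change one feature at a time in small steps until the predicted output flips, and report the smallest such change. Everything below is an assumption for illustration only: the toy scoring rule, the feature meanings (assessments submitted, logins per week), the step sizes, and the use of "number of steps" as the distance measure.

```python
from typing import Callable, Optional, Sequence, Tuple

Model = Callable[[Sequence[float]], str]


def smallest_flip(predict: Model, x: Sequence[float], j: int, step: float,
                  max_steps: int) -> Optional[Tuple[int, float]]:
    """Fewest steps (and the resulting value) for feature j that change the output."""
    original = predict(x)
    for k in range(1, max_steps + 1):
        for direction in (1, -1):
            candidate = list(x)
            candidate[j] = x[j] + direction * k * step
            if predict(candidate) != original:
                return k, candidate[j]
    return None


def counterfactual(predict: Model, x: Sequence[float], steps: Sequence[float],
                   max_steps: int = 50) -> Optional[Tuple[int, float, int]]:
    """Return (feature index, new value, steps) for the smallest single-feature change."""
    best = None
    for j, step in enumerate(steps):
        flip = smallest_flip(predict, x, j, step, max_steps)
        if flip is not None and (best is None or flip[0] < best[2]):
            best = (j, flip[1], flip[0])
    return best


def toy_model(x: Sequence[float]) -> str:
    """Hypothetical rule: 2 * assessments + logins per week must reach 20."""
    return "Pass" if 2 * x[0] + x[1] >= 20 else "Fail"


student = [6, 5]  # 6 assessments, 5 logins per week: predicted "Fail"
result = counterfactual(toy_model, student, steps=[1, 1])
print(result)  # (0, 8, 2)
```

Read as a sentence, the result says: "If the number of assessments had been 8 instead of 6, the prediction would have been Pass." A practical system would also need to restrict the search to changes that are plausible and actionable for the user.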

Lim ^[8][13] presented a slightly different classification of ten explanation types, dividing them into model-independent and model-dependent explanation types. Model-independent explanations include input explanations, which inform users about the used input sensors and data sources, to ensure understanding of the explanation scope; output explanations inform users about all the possible outputs a system can produce; what explanations inform users of the system state in terms of output value; and what if explanations allow users to speculate about different outcomes by changing the set of user-set inputs. Model-dependent explanations, on the other hand, include why explanations, informing users why the output is derived from input values, possibly returning used conditions (rules); why not explanations, presenting users with information about why the alternative output was not produced based on the input; how to explanations, which provide explanation as to how the desired outcome is generally produced; and certainty explanations, which inform users about the certainty of the produced outcome.
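To make a few of these explanation types concrete, the following sketch generates "why", "why not", and "what if" explanations for a toy single-rule predictor. The rule, the threshold, the wording of the messages, and all function names are hypothetical; certainty explanations are left out because this toy rule produces no probability.

```python
from typing import Dict

THRESHOLD = 5  # hypothetical cut-off on the number of submitted assessments


def predict(inputs: Dict[str, float]) -> str:
    """Toy model: a single rule on the 'assessments' input."""
    return "Pass" if inputs["assessments"] >= THRESHOLD else "Fail"


def why(inputs: Dict[str, float]) -> str:
    """'Why' explanation: report the rule that produced the output."""
    outcome = predict(inputs)
    relation = "at least" if outcome == "Pass" else "below"
    return (f"Evaluation is {outcome} because the number of assessments "
            f"({inputs['assessments']}) is {relation} {THRESHOLD}.")


def why_not(inputs: Dict[str, float], alternative: str) -> str:
    """'Why not' explanation: state the condition the alternative output requires."""
    if predict(inputs) == alternative:
        return f"Evaluation already is {alternative}."
    relation = "at least" if alternative == "Pass" else "fewer than"
    return (f"Evaluation is not {alternative} because that would require "
            f"{relation} {THRESHOLD} assessments.")


def what_if(inputs: Dict[str, float], **changes: float) -> str:
    """'What if' explanation: re-run the model with user-chosen input changes."""
    return predict({**inputs, **changes})


student = {"assessments": 3}
print(why(student))
print(why_not(student, "Pass"))
print(what_if(student, assessments=6))  # Pass
```

The "why" and "why not" functions read the rule itself, which is what makes them model-dependent in Lim's sense, whereas "what if" only re-runs the prediction.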

Explanations within XAI lack standardization for their design, as well as their evaluation, as confirmed by literature reviews of the field ^[6][9][1,11]. Haque et al. ^[9][1] conducted a literature review of the XAI field and extracted major research themes as future research directions: XAI standardization (which includes developing comprehensive guidelines or standards for developing an XAI system), XAI visualization (focus on empirically measuring the explanation quality dimensions), and XAI effects (measuring user perceptions of the transparency, understandability, and usability of XAI systems). Additionally, Mohseni ^[10][14] recognized that the XAI design and evaluation methods should be adjusted based on the set goals of XAI research.

2. XAI in Education

AI systems are complex and, by default, suffer from bias and fairness issues. Explanations of AI were introduced in the field of human–computer interaction as a way to allow users to interact with systems that might be faulty in unexpected ways ^[11][15]. Explanations allow users to engage with AI systems in an informed manner and adapt their reliance based on the provided explanations ^[1][6]. Multiple studies have shown that introducing explanations in tutoring and e-learning systems increases students’ trust. Ooge et al. ^[5][10] observed changes in trust after introducing explanations in an e-learning platform for mathematics exercise recommendations. Explanations increased initial trust significantly when measured as a multidimensional construct (consisting of competence, benevolence, integrity, intention to return, and perceived transparency), while no changes were observed with one-dimensional measures. Conati et al. ^[12][16] presented students with personalized XAI hints within an intelligent tutoring system, evaluating their usefulness, intrusiveness, understanding, and trust. Providing students with explanations led to higher reported trust, while personalization improved their effectiveness further. The improvement in understanding of the explanations was related to students’ reading proficiency; students with high levels of reading proficiency benefited from explanations, while students with low levels did not. A study of XAI in education ^[7][12] analyzed the concepts of fairness, accountability, transparency, and ethics and proposed a framework for studying educational AI tools, including analysis of stakeholders, benefits, approaches, models, designs, and pitfalls.

Displays that aggregate different indicators about learners, learning processes, and/or learning context into visualizations can be categorized as learning analytics (LA) ^[13][17]. A systematic review of LA dashboard creation ^[14][18] showed that most dashboards (75%) are developed for teachers and that less focus is put on solutions targeted at learners. Additionally, only two of the observed proposals provided feedback or warnings to users, and only four papers used multiple data sources, indicating an opportunity for future research. It is important to note that LA does not necessarily include AI. In the core literature ^[15][19], LA is defined as the “analysis and representation of data about learners in order to improve learning”. It can be conducted using traditional statistical methods or other data analysis approaches without the involvement of AI. Predictive modeling, the base functionality of many LA systems, is not that different from a traditional teacher recognizing which students are struggling in their class and providing them extra help or direction during the semester. The cost of LA utilization is derived from its functionalities. Firstly, the predictions and analyses displayed in LA systems are based on estimations and probabilities, which many users fail to understand correctly ^[5][14][15][10,18,19]. Making decisions based on wrongly understood probabilities is problematic, especially if the output triggers other actions, or self-regulated learning, without the teacher’s involvement ^[15][19]. Additionally, there are challenges with privacy, data quality, availability, and fitness of data used in LA solutions in education ^[16][20]. On the other hand, there are many benefits of utilizing LA, mainly the improvement of the learning process based on the data available. Furthermore, students can improve their perceptions of the activity and have their personalized analyses available in more depth than a teacher could provide to each student during their limited time ^[15][19].

An overview of the trends in education systems ^[17][3] has shown that AI has been recognized as a trend in the educational setting, as more and more AI systems are used in LA, learning management systems, and educational data mining ^[16][20]. Some of the most common uses of AI ^[18][21] include use cases for profiling and prediction, assessment and evaluation, adaptive systems and personalization, and intelligent tutoring systems. Along with AI models, interpretable machine learning and XAI have been gaining interest in LA systems, as they offer a better understanding of the predictive modeling ^[16][20]. The trend of including AI in education has resulted in the development of the term artificial intelligence in education (AIEd). This field overlaps with LA. The main benefits of introducing AI in education and in the LA field ^[19][22] can be summarized as the development of intelligent agents, personalized learning systems or environments, and visualizations that offer deeper understanding than the classic non-AI analyses.

Related work on predicting students’ course achievement used logs from virtual learning environments ^[20][23] along with demographic data ^[21][24] and grades ^[22][25] in the prediction models. The need for interpretability of the complex models used in educational data mining has been highlighted ^[23][26], and explanations of the models’ predictions have been introduced slowly, with ^[24][27] offering verbal explanations (e.g., “Evaluation is Pass because the number of assessments is high”) and ^[5][10] offering verbal and visual explanations to students. In a related study, Conijn et al. ^[25][28] analyzed the effects of explanations of an automated essay scoring system on students’ trust and motivation in the context of higher education. The results indicated there is no one-size-fits-all explanation for different stakeholders and in different contexts.

3. Measuring Trust and Satisfaction

Various elements can be observed for measuring the effectiveness of an explanation; namely, user satisfaction, trust assessment, mental models, task performance, correctability ^[1][6], and fairness ^[26][29]. This study is focused on the first two measures. Researchers followed the definition of trust as provided by Lee ^[27][30], defining it as “an attitude that an agent will achieve an individual’s goal in a situation characterised by uncertainty and vulnerability”. Many scales for assessing trust are presented in the scientific literature, and many of them were created with interpersonal (human-to-human) trust in mind. A considerable research gap is still reported in studies focusing on human–AI trust ^[4][28][9,31]. Vereschak et al. ^[28][31] surveyed existing methods to empirically investigate trust in AI-assisted decision-making systems. This overview of 83 papers shows a lack of standardization in measuring trust and considerable variability in the study designs and the measures used for their assessment. Most of the observed studies used questionnaires designed to assess trust in automation (e.g., ^[29][30][31][32][32,33,34,35]). Numerous factors have been shown to increase users’ trust ^[33][36]. Transparency has gained much attention, highlighting the need for explanations that make the systems’ reasoning clear to humans. However, trust has been found to increase when the reasoning for the AI system’s decision is provided and to decrease when information on sources of uncertainty is shared with the user ^[4][9].

Explanations cannot be evaluated without measuring the user’s satisfaction with the provided explanation, which Hoffman ^[34][5] defines as “the degree to which users feel that they understand the AI system or process being explained to them. It is a contextualised, a posteriori judgment of explanations”. A similar study measuring trust, explanation satisfaction, and mental models with different types of explanations has been conducted in the case of self-driving cars ^[35][37]. The study reported the lowest user satisfaction with causal explanations and the highest levels of trust with intentional explanations, while mixed explanations led to the best functional understanding of the system. A related evaluation of the understandability, usefulness, trustworthiness, informativeness, and satisfaction with explanations generated with popular XAI methods (LIME ^[36][38], SHAP ^[37][39], and Partial Dependence Plots, or PDP ^[38][40]) was conducted by ^[39][41], reporting higher satisfaction with global explanations than with local feature explanations among novice users. Comparing the popular methods, PDP performed best on all evaluated criteria.
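For readers unfamiliar with PDP, the quantity it plots can be sketched in a few lines: for each grid value of the chosen feature, that feature is overwritten in every row of the data set and the model's predictions are averaged. The toy model, the data, and the function name `partial_dependence` below are assumptions for illustration and are not taken from the cited evaluation.

```python
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def partial_dependence(predict: Model, X: Sequence[Sequence[float]],
                       feature: int, grid: Sequence[float]) -> List[float]:
    """Average prediction over the data set with one feature fixed to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in X:
            modified = list(row)
            modified[feature] = value  # overwrite only the feature of interest
            total += predict(modified)
        curve.append(total / len(X))
    return curve


def toy_model(row: Sequence[float]) -> float:
    """Hypothetical model output: 2 * feature 0 + feature 1."""
    return 2 * row[0] + row[1]


X = [[1, 10], [2, 20], [3, 30]]
curve = partial_dependence(toy_model, X, feature=0, grid=[0, 1, 2])
print(curve)  # [20.0, 22.0, 24.0]
```

Plotting `curve` against the grid values gives the partial dependence plot for feature 0. It is a global explanation because every point averages over the whole data set rather than describing one prediction.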

Comparing levels of explanation satisfaction and trust between different groups of users can be conducted based on various user characteristics. Level of experience and age are (along with personality traits) two of the major user characteristics recognized to affect user performance and preferences in general human–computer interaction. Although the scale from novice to expert is continuous, there is no universally accepted classification and definition of users’ level of experience and/or knowledge ^[40][42]. Level of experience is recognized as “the relative amount of experience of user segments of the user population” ^[41][43]. In higher education, groups of students can be distinguished based on the amount of ECTS (European Credit Transfer and Accumulation System) points they acquired during their studies. ECTS credits express the volume of learning based on the defined learning outcomes and their associated workload ^[42][44].