Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

AI tools for healthcare have sparked debate around adoption of the technology.
Explainable AI (XAI) is seen as a way to make AI devices more transparent and trustworthy.
Some have expressed concerns about the reliability of XAI techniques, particularly feature attribution methods.
Feature importance can be used reliably when low-level features come with clear semantics, as in tabular data such as Electronic Health Records (EHRs).

The growing complexity of Artificial Intelligence (AI) models has led to a surge of interest in explainable AI (XAI).
XAI is particularly important in safety-critical domains such as healthcare.
XAI has already been used to improve diagnosis and prognosis of diseases.
XAI techniques can be grouped along two axes: local vs. global, and model-specific vs. model-agnostic.
Feature attribution methods are popular XAI techniques that assign each feature a measure of how much it contributes to the model output.
Despite enthusiasm for XAI, there is no consensus on its reliability.
Feature attribution methods can be unreliable due to a lack of semantic match between explanations and human understanding.
Semantic match can be obtained reliably for data types with clear semantics, such as tabular data.
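
To make the idea of per-feature contributions concrete, here is a minimal sketch of feature attribution on tabular data. It is not the paper's method: the linear model, the feature names, the weights, and the baseline values are all hypothetical, chosen only for illustration. For a linear model, weight times deviation from a baseline gives an exact decomposition of the score, and because each column has a predefined meaning, each attribution can be read directly.

```python
# Hypothetical linear risk model over EHR-style tabular features.
# Feature names, weights, bias, and baseline values are illustrative assumptions.
WEIGHTS = {"age_years": 0.03, "systolic_bp_mmhg": 0.02, "hba1c_percent": 0.40}
BIAS = -6.0
BASELINE = {"age_years": 50.0, "systolic_bp_mmhg": 120.0, "hba1c_percent": 5.5}


def predict(record):
    """Return the model's raw risk score (a logit) for one patient record."""
    return BIAS + sum(WEIGHTS[name] * record[name] for name in WEIGHTS)


def attribute(record, baseline=BASELINE):
    """Attribute the score difference from the baseline to each feature.

    For a linear model, weight * (value - baseline value) is an exact
    decomposition: the attributions sum to predict(record) - predict(baseline).
    """
    return {name: WEIGHTS[name] * (record[name] - baseline[name]) for name in WEIGHTS}


patient = {"age_years": 70.0, "systolic_bp_mmhg": 150.0, "hba1c_percent": 8.0}
attributions = attribute(patient)

# Print features from largest to smallest absolute contribution.
for name, value in sorted(attributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.2f}")
```

A clinician can check a line such as "hba1c_percent: +1.00" against their own knowledge and decide whether it is agreeable, because the column name already carries a meaning both the human and the model share.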

Feature attribution methods present their results as heat maps or colored overlays.
Intuitively, the highlighted regions comprise the pixels that the model considered ‘important’.
What look like plausible explanations at first may turn out to be ungrounded or spurious.
Humans are unable to attribute meaning to a sub-symbolic encoding of information.
A systematic way is needed to translate sub-symbolic representations into human-understandable ones.
Overlaying the heat map on an image encourages us to use our visual intuition as that translation, but this is ill-advised.
Feature attribution methods may therefore be misleading and bring no clear added value.
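
As a rough illustration of how such a heat map can arise, the sketch below computes occlusion-based attributions on a tiny grayscale image. It is an assumption-laden toy, not the paper's experiment: the "model" is a hypothetical fixed template scored by dot product, and occlusion is only one of several attribution techniques.

```python
# Toy stand-in for an image classifier: the "model" is an assumed fixed
# template, and the score is the pixel-wise dot product with the image.
TEMPLATE = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]


def score(image):
    """Return the toy model's score for a 4x4 image given as nested lists."""
    return sum(
        t * p
        for t_row, p_row in zip(TEMPLATE, image)
        for t, p in zip(t_row, p_row)
    )


def occlusion_map(image, fill=0.0):
    """Heat map: drop in score when each pixel is replaced by `fill`."""
    base = score(image)
    heat = []
    for r, row in enumerate(image):
        heat_row = []
        for c in range(len(row)):
            occluded = [list(x) for x in image]  # copy so the input is unchanged
            occluded[r][c] = fill
            heat_row.append(base - score(occluded))
        heat.append(heat_row)
    return heat


image = [
    [0.2, 0.1, 0.0, 0.3],
    [0.1, 0.9, 0.8, 0.0],
    [0.0, 0.7, 0.6, 0.2],
    [0.3, 0.0, 0.1, 0.1],
]
heat = occlusion_map(image)
for row in heat:
    print(" ".join(f"{v:.1f}" for v in row))
```

The result is only a grid of numbers, one per pixel. Nothing in the procedure states which concept the highlighted central region stands for, so any reading of it comes from the viewer's visual intuition rather than from a predefined translation.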

Distinguishing low-level from high-level features helps explain when semantic match works and when it fails.
Post-hoc local feature attribution can be used on low-level features when they have a predefined translation.
Semantic match allows users to engage with explanations and decide if they are agreeable.
High-level features can be highlighted in image data, but without semantic match, users cannot trust the machine’s internal representation.

The paper reviews the reliability problem of feature attribution methods.
It proposes diagnosing the issue with a semantic match diagram.
Without a clear meaning and translation, semantic match cannot be obtained for high-level feature importance.
Current feature attribution methods may not be appropriate for unstructured data.
Structured data may still benefit from feature attribution.
Humans need to exercise oversight to spot failure modes of ML applications.
Explanations can still fail to deliver on their promise.
Explanations need to be built in the clinician’s language.