Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • AI tools for healthcare have sparked debate around adoption of the technology.
  • Explainable AI (XAI) is seen as a way to make AI devices more transparent and trustworthy.
  • Some have expressed concerns about the reliability of XAI techniques, particularly feature attribution methods.
  • Feature importance can be used reliably when low-level features come with a clear semantics, such as tabular data like Electronic Health Records (EHRs).

Paper Content

Introduction

  • Artificial Intelligence (AI) and model complexity have increased, leading to a surge of interest in explainable AI (XAI).
  • XAI is particularly important in safety-critical domains such as healthcare.
  • XAI has already been used to improve diagnosis and prognosis of diseases.
  • There are a variety of techniques for XAI, which can be grouped into local vs. global, and model-specific vs. model-agnostic approaches.
  • Feature attribution methods are popular XAI techniques, which assign a measure of how much each feature contributes to the model output.
  • Despite enthusiasm for XAI, there is no consensus on its reliability.
  • Feature attribution methods can be unreliable due to a lack of semantic match between explanations and human understanding.
  • Semantic match can be obtained reliably for data types with clear semantics, such as tabular data.

The criticism on local feature attribution methods

  • Feature attribution methods present themselves as heat maps or colored overlays
  • Intuitively, highlighted regions comprise pixels which were considered ‘important’ by the model
  • What look like plausible explanations at first may turn out to be ungrounded or spurious
  • Humans are unable to attribute meaning to a sub-symbolic encoding of information
  • Need a systematic way to translate sub-symbolic representations to human-understandable ones
  • Overlaying the heatmap to an image encourages us to use our visual intuition as translation, but this is an ill-advised one
  • Feature attribution methods may be potentially misleading and bring no clear added value

Distinguishing low-and high-level features

  • Images are unstructured data
  • High-level features in images can be attributed meaning
  • Structured data has clear meaning for low-level features
  • High-level features in structured data behave similarly to images
  • Heatmaps need a semantic match diagram to extract information

Saving feature importance for low-level features

  • Low-level and high-level features can be distinguished to understand when semantic match works and when it fails.
  • Post-hoc local feature attribution can be used on low-level features when they have a predefined translation.
  • Semantic match allows users to engage with explanations and decide if they are agreeable.
  • High-level features can be highlighted in image data, but without semantic match, users cannot trust the machine’s internal representation.

Discussion

  • Reviewed reliability problem of feature attribution methods
  • Proposed to diagnose issue with semantic match diagram
  • Without clear meaning and translation, semantic match cannot be obtained for high-level feature importance
  • Current methods for feature attribution may not be appropriate for unstructured data
  • Structured data may still benefit from feature attribution
  • Humans need to exercise oversight to spot failure modes of ML applications
  • Explanations can still fail to deliver on their promise
  • Need to build explanations in the clinician’s language