Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Generative transformer models have become complex and can process multiple input modalities.
Current methods for explaining their predictions require a lot of extra memory and are difficult to use in production.
AtMan provides explanations of generative transformer models with almost no extra cost.
AtMan manipulates the attention mechanisms of transformers to produce relevance maps.
AtMan uses a parallelizable token-based search method based on cosine similarity.
AtMan outperforms current state-of-the-art methods and is suitable for large model inference deployments.

Generalizing beyond single-task solutions using large-scale transformer-based language models is gaining attention
Transformers are the state-of-the-art method in NLP and CV
Transformers have demonstrated remarkable performance on multimodal tasks
Necessity to better understand the reasons behind transformer predictions
“Scale is all you need” assumption of transformers results in large and complex architectures
Most existing XAI methods for transformers work by propagating gradients back through the model
Perturbation-based XAI is more memory-efficient than gradient-based approaches
Proposed explainability method ATMAN visualizes important aspects of given image
ATMAN bridges relevance propagation and perturbations
ATMAN reduces number of required perturbations and does not require additional memory
ATMAN outperforms current state-of-the-art based on gradients
ATMAN allows one to study generative model predictions
ATMAN nullifies memory overhead and outperforms competitors on several benchmarks

Explainability of AI systems is still an ambiguously defined term
XAI methods are expected to show some level of relevance on the input with respect to the computed result of an algorithm
Explainability in CV is usually evaluated by mapping the relevance maps to a pixel level
NLP explanations are usually mixed with more complex philosophical interpretations
XAI methods can be divided into the classes of perturbation and gradient analysis
ATMAN is a multi-modal XAI method that extends the perturbation concept to the cosine neighborhood in the embedding space

ATMAN shifts the perturbation space from the raw input space to the embedded token space.
ATMAN reduces the dimensionality of possible perturbations to a single scaling factor per token.
ATMAN does not manipulate the value matrix of attention blocks.
ATMAN manipulates the attention entries at the positions of the corresponding input sequence tokens.
ATMAN can amplify or suppress concepts of the prompt.
ATMAN can suppress or amplify the influence of a token on the model’s output.
ATMAN measures and visualizes the distribution shift as explainability.
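The attention manipulation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `attention_with_token_scaling`, the single-head causal setup, and the choice to multiply pre-softmax scores by `1 - factor` are assumptions made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; exp(-inf) evaluates to 0 for masked entries.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_token_scaling(q, k, v, token_index=None, factor=0.0):
    """Single-head causal attention with an optional per-token scaling factor.

    Hypothetical helper: the pre-softmax scores in the column of
    `token_index` are multiplied by (1 - factor). A positive factor
    suppresses the token, a negative factor amplifies it.
    The value matrix `v` is never modified.
    """
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    if token_index is not None:
        scale = np.ones(n)
        scale[token_index] = 1.0 - factor
        scores = scores * scale  # broadcasts over rows, i.e. scales one column
    # Causal mask: position t may only attend to positions <= t.
    future = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights
```

Note that with this multiplicative form, suppression only lowers a token's attention weight where its raw score is positive; how negative scores are treated is a design choice this sketch leaves open.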

Suppressing single tokens works well when the information responsible for producing the target token occurs only once in the input
Redundant information is prominent in the field of CV
Cosine similarity in the embedding space gives a good correlation distance estimator
Correlated token suppression suppresses all redundant information corresponding to a particular input token at once
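As a rough sketch of correlated token suppression, the snippet below derives one attention-score multiplier per token from cosine similarity in the embedding space. The names `correlated_suppression_scales`, `factor`, and `threshold`, as well as the rule of weighting the suppression by the similarity value, are assumptions for illustration and may differ from the paper's exact formulation.

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    # Normalize each row to unit length, then take pairwise dot products.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    return unit @ unit.T

def correlated_suppression_scales(embeddings, token_index, factor=0.9, threshold=0.7):
    """Per-token multipliers for pre-softmax attention scores (hypothetical helper).

    Tokens whose cosine similarity to `token_index` is at least `threshold`
    are suppressed together, each in proportion to its similarity.
    All other tokens keep a multiplier of 1.
    """
    sim = cosine_similarity_matrix(embeddings)[token_index]
    scales = np.ones(len(embeddings))
    correlated = sim >= threshold
    scales[correlated] = 1.0 - factor * sim[correlated]
    return scales
```

The target token has similarity 1 with itself, so it receives the full suppression `1 - factor`, while near-duplicates (for example, neighboring image patches of the same object) are suppressed slightly less.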

ATMAN achieves competitive results compared to previous XAI for transformers in language and vision domain
ATMAN scales efficiently and can be applied to large-scale AR models
Common metrics used are mean average precision (mAP) and mean average recall (mAR)
XAI on generative tasks formulated using the Stanford Question Answering Dataset (SQuAD)
ATMAN outperforms all previous approaches in terms of mean average precision and interquartile recall
ATMAN can be lifted to explanation of paragraphs
ATMAN produces text explanations that read as more human-like
ATMAN produces reasonable output, similar to that of Chefer et al.
ATMAN computes explanations at almost no extra memory cost
ATMAN can be applied to large-scale transformer-based models
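For orientation, the snippet below shows one common ranking-based way to compute average precision for a single relevance map against a binary ground-truth mask; averaging it over samples gives a mean average precision. The exact metric definitions used in the paper are not stated in this summary, so the function `average_precision` and its inputs are assumptions.

```python
import numpy as np

def average_precision(relevance, is_relevant):
    """Average precision of a relevance ranking against a binary mask.

    relevance:   per-token (or per-patch) relevance scores, higher = more relevant
    is_relevant: boolean array marking the ground-truth explanation tokens
    """
    order = np.argsort(-relevance)           # rank tokens by descending relevance
    hits = is_relevant[order].astype(float)  # 1.0 where a ranked token is ground truth
    if hits.sum() == 0:
        return 0.0
    precision_at_k = np.cumsum(hits) / (np.arange(len(hits)) + 1)
    # Average the precision values at the ranks where a ground-truth token appears.
    return float((precision_at_k * hits).sum() / hits.sum())
```

A perfect ranking, where all ground-truth tokens come first, yields a score of 1.0.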

Proposed ATMAN, a modality-agnostic perturbation-based XAI method for transformer networks
Reduces complex issue of finding proper perturbations to single scaling factor per token
Outperforms current approaches relying on gradient computation
Memory-efficient, enabling utilization for large models
Reduces overall noise in the generated explanations, but undesirable artifacts remain
Scaling explainability with model size should be further studied
Evaluated on classification tasks, open-vocabulary tasks, and multimodal transformer architectures
Collect original cross-entropy score of target tokens, then suppress one token at a time and track changes in cross-entropy score
Manipulating attention scores of single token steers model’s prediction into different contextual direction
Correlated token suppression of ATMAN enhances explainability in image domain
Evaluated on SQuAD dataset and OpenImages VQA benchmark
Generated open-vocabulary prediction with autoregressive model
Evaluated on 27,871 samples with average context sequence length of 144 tokens and average label coverage of 56%
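The procedure of collecting the original cross-entropy and then suppressing one token at a time can be sketched as a simple loop. This is an illustrative outline under assumptions: `target_logits_fn` is a hypothetical callable standing in for a model forward pass that returns the logits at the target positions, optionally with one input token suppressed.

```python
import numpy as np

def cross_entropy(logits, target_ids):
    """Mean negative log-likelihood of target_ids under a row-wise softmax."""
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(target_ids)), target_ids].mean()

def relevance_by_suppression(target_logits_fn, num_input_tokens, target_ids):
    """Relevance of each input token as the change in target cross-entropy.

    target_logits_fn(suppressed) must return logits of shape
    (len(target_ids), vocab_size); `suppressed` is None for the
    unmodified run or the index of the input token to suppress.
    """
    base = cross_entropy(target_logits_fn(None), target_ids)
    relevance = np.zeros(num_input_tokens)
    for i in range(num_input_tokens):
        # Each perturbed run is independent, so the runs could be batched.
        relevance[i] = cross_entropy(target_logits_fn(i), target_ids) - base
    return relevance
```

A large positive value means that suppressing the token made the target much less likely, which marks the token as important for the prediction.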