Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.


  • Generative transformer models have become complex and can process multiple input modalities.
  • Current methods for explaining their predictions require a lot of extra memory and are difficult to use in production.
  • AtMan provides explanations of generative transformer models with almost no extra cost.
  • AtMan manipulates the attention mechanisms of transformers to produce relevance maps.
  • AtMan uses a parallelizable token-based search method based on cosine similarity.
  • AtMan outperforms current state-of-the-art methods and is suitable for large model inference deployments.

Paper Content

Explainability through attention maps

  • Generalizing beyond single-task solutions using large-scale transformer-based language models is gaining attention
  • Transformers are the state-of-the-art method in NLP and CV
  • Transformers have demonstrated remarkable performance on multimodal tasks
  • Necessity to better understand the reasons behind transformer predictions
  • “Scale is all you need” assumption of transformers results in large and complex architectures
  • Most existing XAI methods for transformers work by propagating gradients back through the model, which requires extra memory
  • Perturbation-based XAI is more memory-efficient
  • The proposed explainability method, ATMAN, visualizes the important aspects of a given image
  • ATMAN bridges relevance propagation and perturbations
  • ATMAN reduces number of required perturbations and does not require additional memory
  • ATMAN outperforms current state-of-the-art based on gradients
  • ATMAN allows one to study generative model predictions
  • ATMAN nullifies memory overhead and outperforms competitors on several benchmarks
  • Explainability of AI systems is still an ambiguously defined term
  • XAI methods are expected to show some level of relevance on the input with respect to the computed result of an algorithm
  • Explainability in CV is usually evaluated by mapping the relevance maps to a pixel level
  • NLP explanations are usually mixed with more complex philosophical interpretations
  • XAI methods can be divided into the classes of perturbation and gradient analysis
  • ATMAN is a multi-modal XAI method that extends the concept to the cosine neighborhood in the embedding space

Single token attention manipulation

  • ATMAN shifts the perturbation space from the raw input space to the embedded token space.
  • ATMAN reduces the dimensionality of possible perturbations to a single scaling factor per token.
  • ATMAN does not manipulate the value matrix of attention blocks.
  • ATMAN manipulates the attention entries at the positions of the corresponding input sequence tokens.
  • ATMAN can amplify or suppress concepts of the prompt.
  • ATMAN can suppress or amplify the influence of a token on the model’s output.
  • ATMAN measures and visualizes the distribution shift as explainability.
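
The manipulation described above can be illustrated with a small sketch. It assumes the per-token scaling is applied to the pre-softmax attention scores as a multiplication by (1 - factor); the function names and the toy scores are hypothetical and not taken from the paper's code.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def manipulated_attention(scores, token_index, factor):
    """Scale the pre-softmax scores in column `token_index` by (1 - factor).

    Assumption for this sketch: factor > 0 suppresses the token and
    factor < 0 amplifies it. This only holds for positive scores; a
    negative score would move toward zero and gain weight instead.
    The value matrix is never touched.
    """
    weights = []
    for row in scores:
        scaled = list(row)
        scaled[token_index] = row[token_index] * (1.0 - factor)
        weights.append(softmax(scaled))
    return weights


# Toy attention scores: 2 query positions attending to 3 input tokens.
scores = [[2.0, 1.0, 0.5],
          [1.0, 3.0, 0.2]]

baseline = manipulated_attention(scores, token_index=1, factor=0.0)
suppressed = manipulated_attention(scores, token_index=1, factor=0.9)
amplified = manipulated_attention(scores, token_index=1, factor=-0.5)
```

Comparing `suppressed` and `amplified` against `baseline` shows the attention weight on token 1 shrinking or growing in every row, while each row still sums to 1.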

Correlated token attention manipulation

  • Suppressing single tokens works well when all the information responsible for producing the target token occurs only once in the input
  • Redundant information is prominent in the field of CV
  • Cosine similarity in the embedding space gives a good correlation distance estimator
  • Correlated token suppression suppresses all redundant information corresponding to a particular input token at once
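
A minimal sketch of how correlated tokens might be selected is shown here. It assumes a fixed similarity threshold and a single shared suppression factor for all selected tokens; both are simplifications made for illustration, and `correlated_factors`, `threshold`, and the toy embeddings are hypothetical names and values.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def correlated_factors(embeddings, token_index, factor, threshold):
    """Return one suppression factor per input token.

    Every token whose embedding is at least `threshold`-similar to the
    target token gets the same `factor`; all other tokens get 0.0
    (left unchanged). The target token always selects itself.
    """
    target = embeddings[token_index]
    return [
        factor if cosine_similarity(target, emb) >= threshold else 0.0
        for emb in embeddings
    ]


# Toy embeddings: tokens 0 and 1 are near-duplicates, token 2 is unrelated.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
factors = correlated_factors(embeddings, token_index=0, factor=0.9, threshold=0.7)
```

In this toy case the near-duplicate token 1 is suppressed together with token 0, which is the behavior that matters for images, where neighboring patches often carry the same information.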

Empirical evaluation

  • ATMAN achieves competitive results compared to previous XAI for transformers in language and vision domain
  • ATMAN scales efficiently and can be applied to large-scale AR models
  • Common metrics used are mean average precision (mAP) and mean average recall (mAR)
  • XAI on generative tasks is formulated using the Stanford Question Answering Dataset (SQuAD)
  • ATMAN outperforms all previous approaches in terms of mean average precision and interquartile recall
  • ATMAN can be lifted to explanation of paragraphs
  • ATMAN produces more human-like text explanations
  • ATMAN produces reasonable output, similar to that of the method by Chefer et al.
  • ATMAN computes explanations at almost no extra memory cost
  • ATMAN can be applied to large-scale transformer-based models


  • Proposed ATMAN, a modality-agnostic perturbation-based XAI method for transformer networks
  • Reduces complex issue of finding proper perturbations to single scaling factor per token
  • Outperforms current approaches relying on gradient computation
  • Memory-efficient, enabling utilization for large models
  • Reduces overall noise in the generated explanations, but undesirable artifacts remain
  • Scaling explainability with model size should be further studied
  • Evaluated on classification tasks, open-vocabulary tasks, and multimodal transformer architectures
  • Collect the original cross-entropy score of the target tokens, then suppress one input token at a time and track the change in the cross-entropy score
  • Manipulating the attention scores of a single token steers the model’s prediction in a different contextual direction
  • Correlated token suppression of ATMAN enhances explainability in image domain
  • Evaluated on SQuAD dataset and OpenImages VQA benchmark
  • Generated open-vocabulary prediction with autoregressive model
  • Evaluated on 27,871 samples with an average context sequence length of 144 tokens and an average label coverage of 56%
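
The perturbation loop mentioned in the list above (collect the baseline cross-entropy, suppress one token at a time, track the change) can be sketched as follows. The `toy_predict` function is a hypothetical stand-in for a transformer forward pass with attention suppression, and its per-token evidence values are made up for illustration.

```python
import math


def cross_entropy(probs, target):
    """Cross-entropy of a single target index under a distribution."""
    return -math.log(probs[target])


def relevance_scores(predict, num_tokens, target):
    """Loss increase caused by suppressing each input token in turn.

    `predict(suppressed)` must return a probability distribution over
    the vocabulary; `suppressed` is None for the unmodified input or
    the index of the token to suppress.
    """
    baseline = cross_entropy(predict(None), target)
    return [
        cross_entropy(predict(i), target) - baseline
        for i in range(num_tokens)
    ]


def toy_predict(suppressed):
    """Hypothetical model: each token adds evidence for vocabulary entry 0."""
    evidence = [0.1, 2.0, 0.3]
    support = sum(e for i, e in enumerate(evidence) if i != suppressed)
    p = 1.0 / (1.0 + math.exp(-support))
    return [p, 1.0 - p]


relevance = relevance_scores(toy_predict, num_tokens=3, target=0)
most_relevant = max(range(3), key=lambda i: relevance[i])
```

Here the token with the most evidence produces the largest loss increase when suppressed, so it receives the highest relevance. Because each suppression is an independent forward pass, the loop could in principle be run as one batch.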