Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Generative transformer models have become complex and can process multiple input modalities.
- Current methods for explaining their predictions require a lot of extra memory and are difficult to use in production.
- AtMan provides explanations of generative transformer models with almost no extra cost.
- AtMan manipulates the attention mechanisms of transformers to produce relevance maps.
- AtMan uses a parallelizable token-based search method based on cosine similarity.
- AtMan outperforms current state-of-the-art methods and is suitable for large model inference deployments.
Paper Content
Explainability through attention maps
- Generalizing beyond single-task solutions using large-scale transformer-based language models is gaining attention
- Transformers are the state-of-the-art method in NLP and CV
- Transformers have demonstrated remarkable performance on multimodal modes
- Necessity to better understand the reasons behind transformer predictions
- “Scale is all you need” assumption of transformers results in large and complex architectures
- XAI methods for transformers work by propagating gradients back through the model
- XAI idea of perturbation is more memory-efficient
- Proposed explainability method ATMAN visualizes important aspects of given image
- ATMAN bridges relevance propagation and perturbations
- ATMAN reduces number of required perturbations and does not require additional memory
- ATMAN outperforms current state-of-the-art based on gradients
- ATMAN allows one to study generative model predictions
- ATMAN nullifies memory overhead and outperforms competitors on several benchmarks
Related work
- Explainability of AI systems is a still ambiguously defined term
- XAI methods are expected to show some level of relevance on the input with respect to the computed result of an algorithm
- Explainability in CV is usually evaluated by mapping the relevance maps to a pixel level
- NLP explanations are usually mixed with more complex philosophical interpretations
- XAI methods can be divided into the classes of perturbation and gradient analysis
- ATMAN is a multi-modal XAI method that extends the concept to the cosine neighborhood in the embedding space
Single token attention manipulation
- ATMAN shifts the perturbation space from the raw input space to the embedded token space.
- ATMAN reduces the dimensionality of possible perturbations to a single scaling factor per token.
- ATMAN does not manipulate the value matrix of attention blocks.
- ATMAN manipulates the attention entries at the positions of the corresponding input sequence tokens.
- ATMAN can amplify or suppress concepts of the prompt.
- ATMAN can suppress or amplify the influence of a token on the model’s output.
- ATMAN measures and visualizes the distribution shift as explainability.
Correlated token attention manipulation
- Suppressing single tokens works well when the entire entropy responsible to produce the target token occurs only once
- Redundant information is prominent in the field of CV
- Cosine similarity in the embedding space gives a good correlation distance estimator
- Correlated token suppression suppresses all redundant information corresponding to a particular input token at once
Empirical evaluation
- ATMAN achieves competitive results compared to previous XAI for transformers in language and vision domain
- ATMAN scales efficiently and can be applied to large-scale AR models
- Common metrics used are mean average precision (mAP) and recall (mAR)
- XAI on generative tasks formulated using Stanford Question Answering (QA) Dataset
- ATMAN outperforms all previous approaches in terms of mean average precision and interquartile recall
- ATMAN can be lifted to explanation of paragraphs
- ATMAN produces more human text explanations
- ATMAN produces reasonable and similar output to Chefer
- ATMAN computes explanations at almost no extra memory cost
- ATMAN can be applied to large-scale transformer-based models
Conclusion
- Proposed ATMAN, a modality-agnostic perturbation-based XAI method for transformer networks
- Reduces complex issue of finding proper perturbations to single scaling factor per token
- Outperforms current approaches relying on gradient computation
- Memory-efficient, enabling utilization for large models
- Reduces overall noise on generated explanation, but undesirable artifacts remain
- Scaling explainability with model size should be further studied
- Evaluated on classification tasks, open-vocabulary tasks, and multimodal transformer architectures
- Collect original cross-entropy score of target tokens, then suppress one token at a time and track changes in cross-entropy score
- Manipulating attention scores of single token steers model’s prediction into different contextual direction
- Correlated token suppression of ATMAN enhances explainability in image domain
- Evaluated on SQuAD dataset and OpenImages VQA benchmark
- Generated open-vocabulary prediction with autoregressive model
- Evaluated on 27.871 samples with average context sequence length of 144 tokens and average label coverage of 56%