Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Recent advancements in text-to-image diffusion have motivated the study of erasing specific concepts from model weights.
  • We propose a fine-tuning method that can erase a visual concept from a pre-trained diffusion model, given only the name of the style and using negative guidance as a teacher.
  • We benchmark our method against previous approaches that remove sexually explicit content and demonstrate its effectiveness.
  • We conduct experiments erasing five modern artists from the network and conduct a user study to assess the human perception of the removed styles.
  • Our approach can remove concepts from a diffusion model permanently rather than modifying the output at the inference time.

Paper Content

Introduction

  • Recent text-to-image generative models have high image quality and seemingly infinite generation capabilities
  • Some concepts learned by the model are undesirable, including copyrighted content and pornography
  • We propose an approach for selectively removing a single concept from a text-conditional model’s weights after pretraining
  • Our method does not require retraining, which is prohibitive for large models
  • Our approach directly removes the concept from the model’s parameters, making it safe to distribute its weights
  • We conduct a user study to test the impact of erasure on user perception of the removed artist’s style in output images
  • We also test our method on erasure of complete object classes
  • Previous work to avoid undesirable image output in generative models has taken two main approaches: dataset removal and post-hoc modification
  • Image cloaking is another approach to protecting images from imitation by large models
  • Model editing is a lightweight method to alter the behavior of large-scale generative models
  • Memorization and unlearning aim to modify a model to behave as if particular training data had not been present
  • Energy-based composition can be used to reduce undesirable output of language models and vision generators
  • Our work introduces score composition as a source of unsupervised training data to teach a fine-tuned model to erase an undesired concept

Background

Denoising diffusion models

  • Diffusion models are a type of generative model that learn the distribution space by gradually removing noise.
  • The model predicts noise at each time step which is used to generate an intermediate denoised image.

Latent diffusion models

  • Latent diffusion models (LDM) operate in a lower dimensional latent space of a pretrained variational autoencoder.
  • Noise is added to the encoded latent during training.
  • Classifier-free guidance is used to regulate image generation.

Method

  • Goal of method is to erase concepts from text-to-image diffusion models using its own knowledge and no additional data
  • Approach involves editing pre-trained diffusion U-Net model weights to remove a specific style or concept
  • Draws inspiration from classifier-free guidance method and score-based composition
  • Objective is to learn the score of conditional model
  • Exploits model’s knowledge of concept to synthesize training samples, eliminating need for data collection
  • Training uses several instances of diffusion model, with one set of parameters frozen while training the other set of parameters to erase the concept

Importance of parameter choice

  • Applying the erasure objective depends on the subset of parameters that are fine-tuned.
  • Cross-attention parameters serve as a gateway to the prompt and directly depend on the text of the prompt.
  • Non-cross-attention parameters contribute to a visual concept even if the concept is not mentioned in the prompt.

Experiments

  • Trained models for 1000 gradient update steps with batch size of 1 and learning rate 1e-5 using Adam optimizer
  • ESD-x fine-tunes cross-attention, ESD-u fine-tunes unconditional weights of U-Net module
  • Baseline methods: SD, SLD, SD-Neg-Prompt

Artistic style removal

  • Conducted a user study to measure human perception of effectiveness of removed style
  • Collected 40 images of art from each artist using Google Image Search
  • Composed 40 generic text prompts to invoke artist’s style
  • Evaluated images from edited diffusion models, baseline models, and similar human artist
  • Dataset of 1000 images
  • Participants asked to estimate confidence level of experimental image being created by same artist
  • 13 total participants, average of 170 responses per participant
  • Evaluated effectiveness of ESD-x method for removing style of 5 modern artists
  • Assessed amount of interference introduced by ESD-x compared to other baseline methods
  • Findings indicate AI-duplicates rated higher than similar genuine artwork
  • ESD-x, SLD, and SD-Neg-Prompt all decrease perceived artistic style
  • Users most likely to consider images generated using ESD-x to be genuine artwork

Explicit content removal

  • Recent works have addressed the challenge of NSFW content restriction.
  • Retraining the models on filtered data can be expensive and still capable of generating nudity.
  • ESD-u is used to erase “nudity” and has a more significant effect in erasing nudity.
  • Image fidelity and CLIP score are used to measure the quality and specificity of the model.

Object removal

  • Investigated extent to which method can erase object classes from model
  • Prepared 10 ESD-u models, each removing one class from subset of ImageNet classes
  • Measured effect of removing targeted and untargeted classes
  • Generated 500 images of each class using base Stable Diffusion and fine-tuned models
  • Evaluated results by examining top-1 predictions of pretrained Resnet-50 Imagenet classifier
  • Results show approach effectively removes targeted classes, but some classes are more difficult to remove
  • Accuracy of untargeted classes remains high, but some interference

Limitations

  • Our method is more effective than baseline approaches for erasing targeted visual concepts
  • Erasing large concepts can create a trade-off between complete erasure and interference with other visual concepts
  • Erasing entire object classes can fail, erasing only particular distinctive attributes
  • Erasing entire object classes can create interference with other classes

Conclusion

  • Proposes an approach for eliminating concepts from text-to-image generation models
  • Does not require manipulating large datasets or expensive training
  • Removes concept directly from model weights
  • Efficacy demonstrated in 3 applications
  • Successfully removes explicit content
  • Can be used to remove artistic styles
  • Human study conducted to measure perception of artistic removal effect
  • Versatile, can be applied to concrete object classes
  • Good image fidelity performance compared to other methods
  • Can cleanly erase many object concepts from a model