Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Recent advancements in text-to-image diffusion have motivated the study of erasing specific concepts from model weights.
We propose a fine-tuning method that can erase a visual concept from a pre-trained diffusion model, given only the name of the style and using negative guidance as a teacher.
We benchmark our method against previous approaches that remove sexually explicit content and demonstrate its effectiveness.
We conduct experiments erasing five modern artists from the network and conduct a user study to assess the human perception of the removed styles.
Our approach can remove concepts from a diffusion model permanently rather than modifying the output at inference time.

Paper Content

Introduction

Recent text-to-image generative models have high image quality and seemingly infinite generation capabilities
Some concepts learned by the model are undesirable, including copyrighted content and pornography
We propose an approach for selectively removing a single concept from a text-conditional model’s weights after pretraining
Our method does not require retraining from scratch, which is prohibitively expensive for large models
Our approach directly removes the concept from the model’s parameters, making it safe to distribute its weights
We conduct a user study to test the impact of erasure on user perception of the removed artist’s style in output images
We also test our method on erasure of complete object classes

Related works

Previous work to avoid undesirable image output in generative models has taken two main approaches: dataset removal and post-hoc modification
Image cloaking is another approach to protecting images from imitation by large models
Model editing is a lightweight method to alter the behavior of large-scale generative models
Memorization and unlearning aim to modify a model to behave as if particular training data had not been present
Energy-based composition can be used to reduce undesirable output of language models and vision generators
Our work introduces score composition as a source of unsupervised training data to teach a fine-tuned model to erase an undesired concept

Background

Denoising diffusion models

Diffusion models are a class of generative models that learn a data distribution through a gradual denoising process.
At each time step the model predicts the noise, which is then used to produce an intermediate denoised image.
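
The noise-prediction step can be made concrete with the standard DDPM parameterization, which this summary does not spell out and which is assumed here: a noisy sample is x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * noise, so a noise estimate yields an intermediate denoised image by inverting that relation. The sketch below uses plain Python lists in place of image tensors; `add_noise`, `estimate_x0`, and the value of `alpha_bar` are illustrative names and numbers, not taken from the paper.

```python
import math
import random


def add_noise(x0, noise, alpha_bar):
    """Forward process: mix a clean sample with Gaussian noise.

    alpha_bar is the cumulative signal fraction at the chosen time step
    (close to 1 means little noise, close to 0 means almost pure noise).
    """
    a = math.sqrt(alpha_bar)
    b = math.sqrt(1.0 - alpha_bar)
    return [a * x + b * n for x, n in zip(x0, noise)]


def estimate_x0(x_t, predicted_noise, alpha_bar):
    """Invert the forward process using a noise estimate."""
    a = math.sqrt(alpha_bar)
    b = math.sqrt(1.0 - alpha_bar)
    return [(x - b * n) / a for x, n in zip(x_t, predicted_noise)]


rng = random.Random(0)
x0 = [0.5, -1.0, 0.25, 2.0]                 # toy "image" with four pixels
noise = [rng.gauss(0.0, 1.0) for _ in x0]   # the noise a model would have to predict
x_t = add_noise(x0, noise, alpha_bar=0.3)

# With a perfect noise prediction the clean sample is recovered
# (up to floating-point error); a trained model only approximates this.
x0_hat = estimate_x0(x_t, noise, alpha_bar=0.3)
```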

Latent diffusion models

Latent diffusion models (LDMs) operate in the lower-dimensional latent space of a pretrained variational autoencoder.
Noise is added to the encoded latent during training.
Classifier-free guidance is used to regulate image generation.
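
Classifier-free guidance is usually written as an extrapolation from the unconditional noise prediction toward the text-conditioned one. A minimal sketch of that combination, assuming the common form eps_uncond + scale * (eps_cond - eps_uncond); the function name, the toy predictions, and the scale value are illustrative and not taken from the paper:

```python
def classifier_free_guidance(eps_uncond, eps_cond, scale):
    """Combine unconditional and text-conditioned noise predictions.

    scale = 0 ignores the prompt, scale = 1 returns the conditional
    prediction, and scale > 1 pushes further toward the prompt.
    """
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


eps_uncond = [0.0, 1.0, -1.0]   # toy prediction with an empty prompt
eps_cond = [1.0, 1.0, 0.0]      # toy prediction with the text prompt

guided = classifier_free_guidance(eps_uncond, eps_cond, scale=2.0)
```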

Method

Goal of the method is to erase concepts from a text-to-image diffusion model using the model’s own knowledge and no additional data
Approach involves editing the weights of the pre-trained diffusion U-Net to remove a specific style or concept
Draws inspiration from classifier-free guidance and score-based composition
Objective is to train the edited model’s conditional score (its noise prediction for the concept prompt) toward a negatively guided target
Exploits the model’s knowledge of the concept to synthesize training samples, eliminating the need for data collection
Training uses several instances of the diffusion model: one set of parameters is kept frozen while the other set is trained to erase the concept
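
The negative-guidance idea can be sketched as follows. The frozen model supplies an unconditional and a concept-conditioned noise prediction; the training target starts from the unconditional prediction and moves away from the concept, and the trainable model's concept-conditioned prediction is regressed onto that target. This is a toy sketch under stated assumptions: the update mirrors classifier-free guidance with the sign flipped, `eta` is an assumed guidance-strength parameter, and plain lists stand in for U-Net outputs.

```python
def negative_guidance_target(eps_uncond_frozen, eps_cond_frozen, eta):
    """Target that steers away from the concept.

    Classifier-free guidance adds eta * (cond - uncond); here the same
    difference is subtracted, so the target points away from the concept.
    """
    return [u - eta * (c - u) for u, c in zip(eps_uncond_frozen, eps_cond_frozen)]


def mse(prediction, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(target)


# Toy outputs of the frozen model for one noisy latent and time step.
eps_uncond_frozen = [0.0, 1.0, -1.0]
eps_cond_frozen = [1.0, 1.0, 0.0]

target = negative_guidance_target(eps_uncond_frozen, eps_cond_frozen, eta=1.0)

# Toy output of the trainable model when conditioned on the concept.
# Before fine-tuning it matches the frozen conditional prediction.
eps_cond_trainable = list(eps_cond_frozen)
loss = mse(eps_cond_trainable, target)
```

In an actual fine-tuning loop the loss would be backpropagated only through the trainable copy; the frozen copy provides targets and receives no gradient.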

Importance of parameter choice

The effect of the erasure objective depends on which subset of parameters is fine-tuned.
Cross-attention parameters serve as a gateway to the prompt and directly depend on the text of the prompt.
Non-cross-attention parameters contribute to a visual concept even if the concept is not mentioned in the prompt.
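
One way to picture the two parameter subsets is as a filter over parameter names. The sketch below assumes cross-attention layers can be recognized by the substring `attn2` in their names, a convention used by common Stable Diffusion U-Net implementations; the example names and the `select_parameters` helper are hypothetical, and a real non-cross-attention subset may exclude further layers.

```python
def select_parameters(names, mode):
    """Pick which parameters to fine-tune.

    mode "xattn":   keep only cross-attention parameters.
    mode "noxattn": keep everything except cross-attention parameters.
    """
    if mode == "xattn":
        return [n for n in names if "attn2" in n]
    if mode == "noxattn":
        return [n for n in names if "attn2" not in n]
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical parameter names, loosely modelled on a diffusion U-Net.
names = [
    "down.0.resnet.conv1.weight",
    "down.0.attn1.to_q.weight",   # self-attention
    "down.0.attn2.to_k.weight",   # cross-attention (reads the text embedding)
    "down.0.attn2.to_v.weight",   # cross-attention
    "mid.resnet.conv2.weight",
]

cross_attention = select_parameters(names, "xattn")
everything_else = select_parameters(names, "noxattn")
```

Fine-tuning only the first subset limits the edit to how the prompt is read; fine-tuning the second subset changes the visual prior regardless of the prompt.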

Experiments

Trained models for 1000 gradient update steps with batch size of 1 and learning rate 1e-5 using Adam optimizer
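ESD-x fine-tunes only the cross-attention parameters; ESD-u fine-tunes the unconditional (non-cross-attention) parameters of the U-Net module
Baseline methods: SD (Stable Diffusion), SLD (Safe Latent Diffusion), SD-Neg-Prompt (Stable Diffusion with a negative prompt)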

Artistic style removal

Conducted a user study to measure human perception of effectiveness of removed style
Collected 40 images of art from each artist using Google Image Search
Composed 40 generic text prompts to invoke artist’s style
Evaluated images from edited diffusion models, baseline models, and similar human artists
Dataset of 1000 images
Participants asked to rate their confidence that an experimental image was created by the same artist
13 total participants, average of 170 responses per participant
Evaluated effectiveness of ESD-x method for removing style of 5 modern artists
Assessed amount of interference introduced by ESD-x compared to other baseline methods
Findings indicate AI-duplicates rated higher than similar genuine artwork
ESD-x, SLD, and SD-Neg-Prompt all decrease perceived artistic style
Users most likely to consider images generated using ESD-x to be genuine artwork, compared with images from the other removal methods

Explicit content removal

Recent works have addressed the challenge of NSFW content restriction.
Retraining models on filtered data is expensive, and the retrained models can still generate nudity.
ESD-u is used to erase “nudity” and has a more significant effect than the previous approaches it is benchmarked against.
Image fidelity and CLIP score are used to measure the quality and specificity of the model.

Object removal

Investigated extent to which method can erase object classes from model
Prepared 10 ESD-u models, each removing one class from subset of ImageNet classes
Measured effect of removing targeted and untargeted classes
Generated 500 images of each class using base Stable Diffusion and fine-tuned models
Evaluated results by examining top-1 predictions of a pretrained ResNet-50 ImageNet classifier
Results show the approach effectively removes targeted classes, but some classes are more difficult to remove
Accuracy on untargeted classes remains high, though some interference is observed
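
The evaluation described above reduces to per-class top-1 accuracy of the classifier on generated images: low accuracy on the erased class indicates successful erasure, and high accuracy on the remaining classes indicates little interference. A minimal sketch with made-up predictions; the class names and counts are illustrative and are not results from the paper:

```python
from collections import defaultdict


def per_class_accuracy(prompt_classes, predicted_classes):
    """Fraction of images per prompted class whose top-1 prediction matches."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for prompted, predicted in zip(prompt_classes, predicted_classes):
        total[prompted] += 1
        if prompted == predicted:
            correct[prompted] += 1
    return {cls: correct[cls] / total[cls] for cls in total}


# Made-up classifier outputs for images from a model with "church" erased.
prompt_classes = ["church"] * 4 + ["parachute"] * 4
predicted_classes = [
    "castle", "church", "monastery", "castle",          # erased class
    "parachute", "parachute", "parachute", "balloon",   # untargeted class
]

accuracy = per_class_accuracy(prompt_classes, predicted_classes)
erased_accuracy = accuracy["church"]
other_accuracy = accuracy["parachute"]
```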

Limitations

Our method is more effective than baseline approaches for erasing targeted visual concepts
Erasing large concepts can create a trade-off between complete erasure and interference with other visual concepts
Erasing an entire object class can fail, removing only particular distinctive attributes of the class
Erasing entire object classes can create interference with other classes

Conclusion

Proposes an approach for eliminating concepts from text-to-image generation models
Does not require manipulating large datasets or expensive training
Removes concept directly from model weights
Efficacy demonstrated in 3 applications
Successfully removes explicit content
Can be used to remove artistic styles
Human study conducted to measure perception of artistic removal effect
Versatile, can be applied to concrete object classes
Good image fidelity performance compared to other methods
Can cleanly erase many object concepts from a model
