Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Proposes a method to personalize a CLIP-conditioned diffusion model
  • Guides the generative process towards custom aesthetics
  • Validated with qualitative and quantitative experiments
  • Uses the recent Stable Diffusion model and aesthetically filtered datasets
  • Code released on GitHub

Paper Content

arXiv:2209.12330v1 [cs.CV] 25 Sep 2022

  • Recent work aims to provide user personalization to diffusion models
  • That work focuses on learning custom objects from a few images
  • This paper proposes an alternative approach to personalizing text-to-image diffusion models
  • Goal is to guide generative process towards custom aesthetics defined by user
  • User chooses textual prompt to guide generation
  • Represents the user's aesthetic preferences as the average of the CLIP visual embeddings of a set of images
  • Measures agreement between the CLIP representation of the prompt and the user's aesthetic embedding
  • Performs gradient descent with respect to the CLIP text encoder weights to increase that agreement
  • Only modify weights of CLIP text encoder
  • Benefits: the method is agnostic to the diffusion model, computationally cheap, and requires the user to store only one aesthetic embedding per set of images
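
The steps listed above can be illustrated with a small, dependency-free sketch. It is not the paper's released code: a toy linear map stands in for the CLIP text encoder, the embeddings are made-up 3-dimensional vectors, and every name (`aesthetic_embedding`, `encode`, `personalize`, `steps`, `lr`) is hypothetical. Agreement is assumed here to be the dot product of unit-normalized embeddings, and the update is written as gradient ascent on that similarity, which is equivalent to gradient descent on its negative.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def aesthetic_embedding(image_embeddings):
    """Average the unit-normalized image embeddings, then re-normalize."""
    unit = [normalize(v) for v in image_embeddings]
    dim = len(unit[0])
    mean = [sum(v[i] for v in unit) / len(unit) for i in range(dim)]
    return normalize(mean)

def encode(weights, features):
    """Toy stand-in for a text encoder: unit-normalized W @ x."""
    return normalize([dot(row, features) for row in weights])

def personalize(weights, features, e, steps=20, lr=0.1):
    """Gradient ascent on the similarity c . e, changing only the encoder weights."""
    w = [row[:] for row in weights]  # copy so the original weights stay untouched
    for _ in range(steps):
        u = [dot(row, features) for row in w]
        norm = math.sqrt(dot(u, u))
        c = [x / norm for x in u]
        sim = dot(c, e)
        # Gradient of (u / |u|) . e with respect to u.
        grad_u = [(e_i - sim * c_i) / norm for e_i, c_i in zip(e, c)]
        # Chain rule through u = W x: dsim/dW[i][j] = grad_u[i] * x[j].
        for i, g in enumerate(grad_u):
            for j, x_j in enumerate(features):
                w[i][j] += lr * g * x_j
    return w

# Made-up embeddings of the user's image set (assumption, for illustration only).
images = [[1.0, 0.2, 0.0], [0.8, 0.0, 0.3]]
e = aesthetic_embedding(images)

weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
prompt_features = [0.1, 1.0, 0.2]  # made-up prompt representation

before = dot(encode(weights, prompt_features), e)
tuned = personalize(weights, prompt_features, e)
after = dot(encode(tuned, prompt_features), e)
print(before, after)
```

In this sketch only `e` has to be kept per image set, and the diffusion model never appears: it would simply be conditioned on the output of the tuned encoder, which mirrors the benefits listed above.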

Conclusion