Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Proposes a method to personalize a CLIP-conditioned diffusion model
- Guides the generative process towards custom aesthetics
- Validated with qualitative and quantitative experiments
- Uses the recent Stable Diffusion model and aesthetically filtered datasets
- Code released on GitHub
Paper Content
arXiv:2209.12330v1 [cs.CV] 25 Sep 2022
- Aims to provide user personalization for diffusion models
- Existing personalization work focuses on learning custom objects from a few images
- Proposes an alternative approach to personalizing text-to-image diffusion models
- The goal is to guide the generative process towards custom aesthetics defined by the user
- The user chooses a textual prompt to guide generation
- The user's aesthetic preferences are represented by the average of the visual embeddings of a set of images
- Agreement is measured between the CLIP representation of the prompt and the user's preferences
- Gradient descent is performed with respect to the CLIP text encoder weights to increase that agreement
- Only the weights of the CLIP text encoder are modified
- Benefits: the method is agnostic to the diffusion model, computationally cheap, and requires the user to store only one aesthetic embedding per set of images
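
The steps listed above can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the real CLIP text encoder is replaced by a toy linear map, agreement is assumed to be cosine similarity, and every name (`aesthetic_embedding`, `encode_text`, `personalize`), the step count, and the learning rate are hypothetical choices made for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def aesthetic_embedding(image_embeddings):
    """Average the unit-normalized visual embeddings, then re-normalize."""
    unit = [normalize(emb) for emb in image_embeddings]
    dim = len(unit[0])
    mean = [sum(emb[i] for emb in unit) / len(unit) for i in range(dim)]
    return normalize(mean)

def encode_text(weights, prompt_features):
    """Toy stand-in for the CLIP text encoder (assumption): c = normalize(W x)."""
    return normalize([dot(row, prompt_features) for row in weights])

def personalize(weights, prompt_features, e, steps=20, lr=0.1):
    """Update only the text-encoder weights W to raise the similarity <c, e>."""
    W = [row[:] for row in weights]  # leave the original weights untouched
    for _ in range(steps):
        z = [dot(row, prompt_features) for row in W]
        norm = math.sqrt(dot(z, z))
        c = [zi / norm for zi in z]
        sim = dot(c, e)
        # Gradient of the cosine similarity with respect to z = W x.
        grad_z = [(ei - sim * ci) / norm for ei, ci in zip(e, c)]
        # Chain rule: d sim / d W[i][j] = grad_z[i] * x[j].
        # Stepping along +gradient equals gradient descent on the loss -sim.
        for i, row in enumerate(W):
            for j in range(len(row)):
                row[j] += lr * grad_z[i] * prompt_features[j]
    return W

# Made-up inputs: two "image embeddings" and one "prompt feature vector".
images = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
e = aesthetic_embedding(images)          # the single vector the user stores
W0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x = [1.0, 0.0]

before = dot(encode_text(W0, x), e)
W1 = personalize(W0, x, e)
after = dot(encode_text(W1, x), e)
print(f"similarity before: {before:.3f}, after: {after:.3f}")
```

In this sketch only `W` changes, which mirrors the point that the diffusion model itself is left untouched; the personalized prompt embedding would then be passed to whichever CLIP-conditioned generator is in use.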