Abstract

While machine learning models for image generation can produce images responsive to text and/or image prompts, it is difficult to precisely control local features of the generated images. High-level features such as color and shape can be controlled via input prompts, but a deeper, more precise text description often does not yield output images that capture its details. In domain-specific use cases, e.g., garment try-on, this limitation means that output images do not match the detail requested in a query. This disclosure describes techniques to generate images that match an input query, such as a text description or an input image, with high accuracy by using a domain-specific text and/or image embedding space matched to the specific application, e.g., image generation for garment try-on. Query embeddings are computed for input queries, and close matches to the query embeddings within the embedding space are identified. The identified embeddings are used to fine-tune the image generation model; the same or additional embeddings may be computed during fine-tuning. The fine-tuned model is then used to generate output images for the application.
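
The retrieval step described above can be sketched in a few lines of Python. This is a minimal illustration, assuming cosine-similarity lookup over a precomputed domain-specific embedding index; the encoder (embed_query), the index contents, and the example query are hypothetical stand-ins, as the disclosure does not name concrete models or datasets.

    import numpy as np

    def embed_query(query: str) -> np.ndarray:
        # Hypothetical placeholder: map a text (or image) query into the
        # domain-specific embedding space, e.g., one trained on garment
        # try-on data. A real implementation would call a trained encoder.
        rng = np.random.default_rng(abs(hash(query)) % (2**32))
        v = rng.normal(size=512)
        return v / np.linalg.norm(v)

    def nearest_embeddings(query_vec: np.ndarray,
                           index_vecs: np.ndarray,
                           k: int = 5):
        # Cosine-similarity retrieval of the k closest domain embeddings.
        norms = np.linalg.norm(index_vecs, axis=1, keepdims=True)
        sims = (index_vecs / norms) @ query_vec
        top = np.argsort(-sims)[:k]
        return top, sims[top]

    # Illustrative random stand-in for the domain-specific embedding index.
    index = np.random.default_rng(0).normal(size=(10_000, 512))

    q = embed_query("red floral midi dress, puff sleeves")
    ids, scores = nearest_embeddings(q, index, k=5)
    # The retrieved embeddings (index[ids]) would then be used to fine-tune
    # the image generation model before producing the try-on output.

In this sketch, the retrieved neighbors serve as the fine-tuning signal; how they condition the generator (e.g., as training examples or conditioning vectors) is left open, matching the level of detail in the abstract.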

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
