Inventor(s)

D Shin

Abstract

Some photo library applications provide a conversational interface that lets users retrieve photos matching a query. However, current interfaces are single-turn and limited to retrieval; they cannot automatically perform generative tasks. This disclosure describes an interactive, conversational interface for a digital photo library that lets users explore their photos using natural language and receive generative, insightful answers. The exploration is multi-turn and multimodal, and is powered by a multimodal large language model (LLM). With user permission, the LLM interprets both the user's query (e.g., entered as free-form text) and the content of their photos in sequence.
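
The multi-turn, permission-gated flow described in the abstract might be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: `Photo`, `Turn`, `PhotoChat`, and `toy_model` are hypothetical names, photo content is represented by a text caption, and a keyword-matching stub stands in for the multimodal LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Photo:
    photo_id: str
    caption: str  # assumption: text stand-in for the image content a model would see


@dataclass
class Turn:
    role: str  # "user" or "assistant"
    text: str


# Hypothetical model signature: (conversation history, photos) -> answer text.
Model = Callable[[List[Turn], List[Photo]], str]


@dataclass
class PhotoChat:
    photos: List[Photo]
    model: Model
    permission_granted: bool = False
    history: List[Turn] = field(default_factory=list)

    def ask(self, query: str) -> str:
        # The model only sees photos once the user has granted permission.
        if not self.permission_granted:
            raise PermissionError("user has not allowed the model to read photos")
        self.history.append(Turn("user", query))
        # The full history is passed on every call, which makes the chat multi-turn.
        answer = self.model(self.history, self.photos)
        self.history.append(Turn("assistant", answer))
        return answer


def toy_model(history: List[Turn], photos: List[Photo]) -> str:
    """Stub standing in for a multimodal LLM: keyword match over all user turns."""
    words = {w for t in history if t.role == "user" for w in t.text.lower().split()}
    hits = [p.photo_id for p in photos if words & set(p.caption.lower().split())]
    if not hits:
        return "No matching photos."
    return f"{len(hits)} matching photo(s): {', '.join(hits)}"
```

Because `ask` forwards the accumulated history, a follow-up query is interpreted in the context of earlier turns; in a real system the stub would be replaced by a call to a multimodal model that receives the images themselves.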

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
