Abstract

Some photo library applications provide a conversational interface that lets users retrieve photos matching a query. However, current interfaces are not multi-turn and are limited to retrieval; they cannot automatically perform generation tasks. This disclosure describes an interactive, conversational interface for a digital photo library that lets users explore their photos in natural language and receive generative, insightful answers. The exploration is multi-turn and multimodal, and is powered by a multimodal large language model (LLM). With user permission, the LLM is employed, in sequence, to interpret both the user's query (e.g., entered as free-form text) and the content of their photos.
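The chained, multi-turn flow described in the abstract can be illustrated with a minimal sketch. It is not the disclosure's implementation: the names `Photo`, `PhotoAgent`, and `fake_model`, the prompt strings, the keyword-based retrieval step, and the use of a text caption as a stand-in for image content are all assumptions made for illustration. A real system would pass actual image data to a multimodal model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Photo:
    photo_id: str
    caption: str  # assumption: text stand-in for the image content a multimodal LLM would see


# Hypothetical model interface: takes a prompt string, returns text.
Model = Callable[[str], str]


@dataclass
class PhotoAgent:
    model: Model
    library: List[Photo]
    user_consented: bool = False  # the abstract conditions LLM use on user permission
    history: List[str] = field(default_factory=list)  # earlier turns, for multi-turn context

    def ask(self, query: str) -> str:
        if not self.user_consented:
            raise PermissionError("user has not granted permission to analyze photos")
        # Call 1: interpret the query, with conversation history, into search terms.
        context = " | ".join(self.history)
        terms = self.model(f"INTERPRET history=[{context}] query={query}").lower().split()
        # Retrieval (assumed): keep photos whose caption mentions any search term.
        matches = [p for p in self.library if any(t in p.caption.lower() for t in terms)]
        # Call 2: generate an answer from the query plus the matched photo content.
        described = "; ".join(f"{p.photo_id}: {p.caption}" for p in matches)
        answer = self.model(f"ANSWER query={query} photos=[{described}]")
        # Record the turn so the next query can build on it.
        self.history.append(f"user: {query}")
        self.history.append(f"agent: {answer}")
        return answer


def fake_model(prompt: str) -> str:
    """Toy stand-in for a multimodal LLM so the sketch runs offline."""
    if prompt.startswith("INTERPRET"):
        # Pretend interpretation: use the last word of the query as the search term.
        return prompt.rsplit("query=", 1)[1].split()[-1]
    return "Based on " + prompt.split("photos=", 1)[1]


library = [Photo("p1", "beach sunset"), Photo("p2", "birthday cake")]
agent = PhotoAgent(model=fake_model, library=library, user_consented=True)
first_answer = agent.ask("show me the beach")
second_answer = agent.ask("what about the cake")
```

Each call to `ask` makes two model calls, with the output of the first feeding the input of the second, and appends the turn to `history` so a follow-up query is interpreted in context. Swapping `fake_model` for a real multimodal model client would be the main change needed to move beyond the toy setting.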

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Shin, D, "Interactive Photo Exploration Agent with Chained Language Model Calls", Technical Disclosure Commons, (December 08, 2025)
https://www.tdcommons.org/dpubs_series/8995

Technical Disclosure Commons

Defensive Publications Series

Interactive Photo Exploration Agent with Chained Language Model Calls
