Abstract
Organizing photos into folders, albums, or other collections is tedious and requires manual effort. This disclosure describes the use of a large visual model (LVM) and a large language model (LLM) to automatically identify and recommend appropriate folders (or photo buckets) for photographs. With user permission, a photograph in the user’s library is provided as input to the LVM, which generates a text summary of the photograph. The text summary is provided to the LLM, which performs a semantic match of the summary against the user-specified folder names and recommends organizing the image under one or more existing folders or creating a new folder. Although the foregoing description refers to a separate LVM and LLM, folder identification can also be performed by a single multimodal model that takes a photo and the existing folder names as input and outputs a suggested folder for the photo.
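The two-stage flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the disclosed implementation: the function names (`recommend_folders`, `fake_summarize`, `word_overlap_score`), the score threshold, and the result layout are all assumptions made for this example. The visual model is replaced by a canned caption lookup, and the LLM's semantic match is replaced by a simple word-overlap score, so that the sketch runs without any model.

```python
from typing import Callable, Dict, List


def recommend_folders(
    photo_path: str,
    folder_names: List[str],
    summarize: Callable[[str], str],
    match_score: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cutoff, not from the disclosure
) -> Dict[str, object]:
    """Summarize a photo, then match the summary against folder names."""
    # Stage 1: the visual model turns the photo into a text summary.
    summary = summarize(photo_path)
    # Stage 2: the language model scores the summary against each folder name.
    scored = [(name, match_score(summary, name)) for name in folder_names]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    matches = [name for name, score in scored if score >= threshold]
    # With no adequate match, the caller can propose creating a new folder.
    return {
        "summary": summary,
        "folders": matches,
        "suggest_new_folder": not matches,
    }


def fake_summarize(photo_path: str) -> str:
    """Hypothetical stand-in for a visual model: returns a canned caption."""
    captions = {"beach.jpg": "a dog running on a sandy beach at sunset"}
    return captions.get(photo_path, "")


def word_overlap_score(summary: str, folder_name: str) -> float:
    """Hypothetical stand-in for LLM matching: share of folder words in summary."""
    summary_words = set(summary.lower().split())
    folder_words = set(folder_name.lower().split())
    if not folder_words:
        return 0.0
    return len(summary_words & folder_words) / len(folder_words)


if __name__ == "__main__":
    result = recommend_folders(
        "beach.jpg",
        ["Beach", "Dog", "Work documents"],
        fake_summarize,
        word_overlap_score,
    )
    print(result["folders"], result["suggest_new_folder"])
```

Passing the summarizer and the matcher in as callables keeps the orchestration the same whether two separate models or a single multimodal model sits behind them. In a real system the word-overlap stand-in would give way to an actual semantic comparison, since a folder named "Pets" shares no words with a caption about a dog.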
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Shin, D, "Automatically Photo Library Folder Organization Using Photo Visual Semantics", Technical Disclosure Commons, (July 25, 2024)
https://www.tdcommons.org/dpubs_series/7233