Abstract
Photo search technologies can identify the content of an image but cannot answer queries about the spatial relationships between entities in the image. This disclosure describes computational architectures and techniques for spatial, compositional photo search and retrieval. Subjective, spatially rich natural-language queries are translated into strict geometric and relational constraints, which are evaluated efficiently against a large corpus of pre-computed spatial indices. Upon ingestion, an image is processed to generate a highly compressed, pre-computed spatial scene graph G = (V, E, C), where V is a set of multimodal entities and bounding boxes, E is a set of directed spatial relational edges, and C is a set of semantic background and compositional global features. When a user submits a natural-language query, a large language model (LLM) parser translates the unstructured text into a structured, machine-readable spatial layout specification. A retrieval engine then evaluates candidate spatial scene graphs against the generated layout specification.
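The pipeline above can be sketched in miniature: a scene graph holds labeled entities with bounding boxes, spatial edges are derived from box geometry, and a retrieval check tests one (subject, relation, object) constraint from a parsed layout specification. This is a minimal, assumed illustration; the class and function names, the box format, and the simple horizontal-center "left_of" rule are hypothetical, not the published design.

```python
from dataclasses import dataclass

@dataclass
class Entity:
    """A node in V: an entity label plus a bounding box (x0, y0, x1, y1)."""
    label: str
    box: tuple

def relation(a: Entity, b: Entity) -> str:
    """Derive a coarse directed spatial edge (an element of E) from two
    bounding boxes, using horizontal centers (an illustrative assumption)."""
    a_cx = (a.box[0] + a.box[2]) / 2
    b_cx = (b.box[0] + b.box[2]) / 2
    return "left_of" if a_cx < b_cx else "right_of"

def matches(graph: list, constraint: tuple) -> bool:
    """Check whether a scene graph satisfies one (subject, relation, object)
    constraint from the spatial layout specification."""
    subj, rel, obj = constraint
    return any(
        relation(a, b) == rel
        for a in graph if a.label == subj
        for b in graph if b.label == obj
    )

# A toy scene: a dog to the left of a person.
scene = [Entity("dog", (10, 40, 60, 90)), Entity("person", (70, 20, 120, 90))]
print(matches(scene, ("dog", "left_of", "person")))   # → True
print(matches(scene, ("person", "left_of", "dog")))   # → False
```

In a full system, an LLM parser would emit a set of such constraints from the user's query, and the retrieval engine would evaluate them against the pre-computed scene graphs of the corpus rather than a single in-memory scene.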
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Yakar, Tamar and Labzovsky, Ilia, "Semantic Photo Search with Compositional Spatial Reasoning", Technical Disclosure Commons, ()
https://www.tdcommons.org/dpubs_series/10133