Abstract

Many online platforms allow entities such as merchants to post multimedia content, e.g., text accompanied by images. For example, merchants on online stores, digital maps, etc. can add posts to their account to highlight their merchandise and special offers. Matching user text queries against only the text portion of such multimedia content ignores the visual portion (image) of the merchant post. This disclosure describes the use of dual encoders to match a user query embedding against an embedding of the textual content and an embedding of the visual content of the same multimedia post, producing respective text and image relevance scores. Incorporating information from both modalities improves search quality. A multimodal ranker then ranks merchant posts based on the text and image relevance scores together with post metadata such as freshness, user reviews for the post, etc. The dual encoders can be trained using human-labeled as well as LLM-generated data.
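
The following Python sketch illustrates how dual-encoder relevance scores and metadata signals could be combined by a multimodal ranker. The encoder functions (encode_text, encode_image) and the ranker weights are hypothetical placeholders introduced for illustration only; an actual system would use pretrained text and image encoders that map into a shared embedding space and learned ranking weights, neither of which is specified here.

```python
# Minimal sketch of dual-encoder scoring for multimedia merchant posts.
# The encoders below are hypothetical stand-ins for pretrained text and image
# encoders that embed inputs into a shared space (e.g., a CLIP-style model).

import numpy as np

EMBED_DIM = 64  # assumed embedding dimensionality for this sketch


def encode_text(text: str) -> np.ndarray:
    """Hypothetical text encoder: returns a unit-norm embedding."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(EMBED_DIM)
    return v / np.linalg.norm(v)


def encode_image(image_id: str) -> np.ndarray:
    """Hypothetical image encoder: returns a unit-norm embedding."""
    rng = np.random.default_rng(abs(hash(image_id)) % (2**32))
    v = rng.standard_normal(EMBED_DIM)
    return v / np.linalg.norm(v)


def score_post(query: str, post: dict) -> float:
    """Combine text relevance, image relevance, and post metadata into one score."""
    q = encode_text(query)
    text_relevance = float(q @ encode_text(post["text"]))        # cosine similarity
    image_relevance = float(q @ encode_image(post["image_id"]))  # cosine similarity

    # Illustrative multimodal ranker: a weighted sum of the two relevance
    # scores plus metadata signals (freshness, review rating). The weights
    # are placeholders, not values from the disclosure.
    return (0.5 * text_relevance
            + 0.3 * image_relevance
            + 0.1 * post["freshness"]
            + 0.1 * post["review_score"])


if __name__ == "__main__":
    posts = [
        {"text": "Weekend sale on running shoes", "image_id": "img_001",
         "freshness": 0.9, "review_score": 0.8},
        {"text": "New espresso machines in stock", "image_id": "img_002",
         "freshness": 0.4, "review_score": 0.95},
    ]
    query = "running shoes discount"
    for post in sorted(posts, key=lambda p: score_post(query, p), reverse=True):
        print(round(score_post(query, post), 3), post["text"])
```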

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
