Abstract

This disclosure describes techniques for information retrieval from text-based corpora that combine dense and sparse embeddings into a single composite dense embedding. The documents in the corpus most relevant to a user query can then be found by running nearest-neighbor searches over the composite dense embedding natively with existing tools. The techniques retain information from both the sparse and the dense embeddings and provide a straightforward, mathematically sound way of combining them. When used to retrieve information in response to a user query, the composite dense embedding performs well even when the query includes a proper name or a rare term, and it captures the subtleties of natural language while executing natively on existing embedding matching tools.
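The abstract does not spell out how the two embeddings are combined, so the following is only a minimal sketch of one plausible construction, not the method of the disclosure. It assumes that the sparse embedding is a term-to-weight mapping, that it is folded into a fixed number of slots with signed feature hashing (a swapped-in, well-known technique), and that the result is concatenated with the dense embedding. The names `hash_sparse`, `composite_embedding`, `num_buckets`, and `dense_weight` are hypothetical. With this layout, the inner product of two composite vectors is a weighted sum of the dense inner product and a hashed approximation of the sparse inner product, so an ordinary dense nearest-neighbor index could search it directly.

```python
import hashlib
import math


def hash_sparse(term_weights, num_buckets):
    """Fold a {term: weight} sparse vector into num_buckets dense slots.

    Assumption: signed feature hashing stands in for whatever projection
    the disclosure actually uses.
    """
    out = [0.0] * num_buckets
    for term, weight in term_weights.items():
        digest = hashlib.md5(term.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % num_buckets
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        out[bucket] += sign * weight
    return out


def composite_embedding(dense, term_weights, num_buckets=64, dense_weight=0.5):
    """Concatenate the scaled dense vector with the hashed sparse vector.

    Scaling each part by the square root of its weight makes the inner
    product of two composites equal to
    dense_weight * dense_dot + (1 - dense_weight) * hashed_sparse_dot.
    """
    a = math.sqrt(dense_weight)
    b = math.sqrt(1.0 - dense_weight)
    hashed = hash_sparse(term_weights, num_buckets)
    return [a * x for x in dense] + [b * x for x in hashed]


def dot(u, v):
    """Plain inner product, the score a dense nearest-neighbor index would rank by."""
    return sum(x * y for x, y in zip(u, v))


# Toy data (made up for illustration): both documents have the same dense
# vector, but only doc_a contains the rare query term "zyxt".
query = composite_embedding([0.6, 0.8], {"zyxt": 1.0})
doc_a = composite_embedding([0.6, 0.8], {"zyxt": 2.0, "search": 0.5})
doc_b = composite_embedding([0.6, 0.8], {"search": 0.5})

score_a = dot(query, doc_a)
score_b = dot(query, doc_b)
```

In this toy setup the dense parts tie, so any difference between `score_a` and `score_b` comes from the hashed sparse part, which is how a rare term or proper name can still influence ranking inside a purely dense search.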

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Stavarache, Eric and Weissenberger, Felix, "Hybrid Sparse-Dense Embedding Search", Technical Disclosure Commons, (March 20, 2024)
https://www.tdcommons.org/dpubs_series/6804

Technical Disclosure Commons

Defensive Publications Series

Hybrid Sparse-Dense Embedding Search
