Abstract
Traditional dual encoder models trained with a contrastive loss do not necessarily produce calibrated distances between a query and its relevant documents in the embedding space. As a result, the distance cutoff that separates relevant from non-relevant documents can vary from one query to another. The disclosed technology addresses these and other limitations with a hybrid dual encoder architecture, referred to herein as LinHyDE, which adds a scoring head built from a linear Multi-Layer Perceptron (MLP). The MLP generates calibrated scoring embeddings from the retrieval embeddings, and the model is trained with a combined loss function that weights the retrieval loss and the scoring loss with adjustable coefficients. The resulting calibrated scores can be used to re-rank retrieved candidates and to apply a relevance threshold to them.
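The abstract does not specify the exact losses or the shape of the scoring head, so the following is only a minimal sketch under stated assumptions of how a retrieval loss and a scoring loss might be combined. Assumptions not taken from the disclosure: the scoring head is a single shared linear layer with no activation (`scoring_head`), the retrieval loss is an in-batch softmax contrastive loss, the scoring loss is a pointwise sigmoid cross-entropy over all in-batch query/document pairs, and every function name and parameter (`w_retrieval`, `w_scoring`, `temperature`, `bias`) is hypothetical. Only the forward loss computation is shown; gradients and the optimizer are omitted.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def scoring_head(weights: Matrix, retrieval_emb: Vector) -> Vector:
    """Hypothetical linear scoring head: one matrix multiply, no activation,
    shared by queries and documents (an assumption)."""
    return [dot(row, retrieval_emb) for row in weights]


def retrieval_loss(queries: List[Vector], docs: List[Vector],
                   temperature: float = 0.05) -> float:
    """In-batch softmax contrastive loss: docs[i] is the positive for
    queries[i]; every other document in the batch acts as a negative."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, d) / temperature for d in docs]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_z - logits[i]
    return total / len(queries)


def scoring_logit(weights: Matrix, q: Vector, d: Vector,
                  bias: float = 0.0) -> float:
    """Relevance logit: dot product of the two scoring embeddings plus a bias."""
    return dot(scoring_head(weights, q), scoring_head(weights, d)) + bias


def scoring_loss(weights: Matrix, queries: List[Vector], docs: List[Vector],
                 bias: float = 0.0) -> float:
    """Assumed pointwise sigmoid cross-entropy over all in-batch pairs:
    (queries[i], docs[i]) is labelled relevant, every other pair is not."""
    total, n = 0.0, 0
    for i, q in enumerate(queries):
        for j, d in enumerate(docs):
            s = scoring_logit(weights, q, d, bias)
            y = 1.0 if i == j else 0.0
            # Numerically stable binary cross-entropy with logits.
            total += max(s, 0.0) - s * y + math.log1p(math.exp(-abs(s)))
            n += 1
    return total / n


def combined_loss(weights: Matrix, queries: List[Vector], docs: List[Vector],
                  w_retrieval: float = 1.0, w_scoring: float = 1.0) -> float:
    """Weighted sum of the retrieval loss and the scoring loss."""
    return (w_retrieval * retrieval_loss(queries, docs)
            + w_scoring * scoring_loss(weights, queries, docs))


def relevance_probability(weights: Matrix, q: Vector, d: Vector,
                          bias: float = 0.0) -> float:
    """Calibrated relevance score in (0, 1): sigmoid of the scoring logit."""
    s = scoring_logit(weights, q, d, bias)
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.0, 1.0]]
    docs = [[0.9, 0.1], [0.1, 0.9]]
    W = [[1.0, 0.0], [0.0, 1.0]]  # identity head, for illustration only
    print(combined_loss(W, queries, docs, w_retrieval=1.0, w_scoring=0.5))
    print(relevance_probability(W, queries[0], docs[0]))
```

The design choice this sketch is meant to illustrate: the contrastive term only cares about the relative order of documents for each query, while the pointwise scoring term pushes each logit toward an absolute target. Under that assumption, a single cutoff such as `relevance_probability(...) >= 0.5` could be applied uniformly across queries for thresholding; the 0.5 value is illustrative, not taken from the disclosure.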
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Dua, Sahil; Moiseev, Fedor; Van Cleeff, Pascal; and Dong, Zhe, "Enhanced Dual Encoder with Retrieval and Scoring Loss (LinHyDE)", Technical Disclosure Commons, (October 03, 2025)
https://www.tdcommons.org/dpubs_series/8675