Abstract

Traditional dual encoder models trained with a contrastive loss do not necessarily calibrate the distance between a query and its relevant documents in the embedding space. As a result, the distance cutoff that separates relevant from irrelevant documents can vary from query to query. The disclosed technology addresses these and other limitations with a hybrid dual encoder architecture, referred to herein as LinHyDE, which adds a scoring head implemented as a linear Multi-Layer Perceptron (MLP). The MLP generates calibrated scoring embeddings from the retrieval embeddings, and the model is trained with a combined loss function that incorporates both a retrieval loss and a scoring loss with adjustable weights. The resulting calibrated scores can be used to re-rank retrieved candidates and to apply a relevance threshold to them.
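The following is a minimal sketch of the training objective and the re-ranking/thresholding step. The abstract does not specify the loss functions, head dimensions, or whether query and document embeddings share one head. This sketch therefore assumes three things: an in-batch softmax contrastive loss for retrieval, a pointwise sigmoid cross-entropy for scoring, and a single linear head shared by both towers. All function names are hypothetical, and plain Python lists stand in for tensors.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def linear_head(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    # Hypothetical linear scoring head: retrieval embedding -> scoring embedding.
    return [dot(row, x) + b for row, b in zip(weights, bias)]


def retrieval_loss(queries: List[Vector], docs: List[Vector]) -> float:
    # Assumed in-batch softmax contrastive loss; docs[i] is the positive for queries[i].
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, d) for d in docs]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_z - logits[i]
    return total / len(queries)


def scoring_loss(q_scores: List[Vector], d_scores: List[Vector],
                 labels: List[List[float]]) -> float:
    # Assumed pointwise sigmoid cross-entropy: each (query, doc) score is pushed
    # toward an absolute 0/1 target, not only ranked against other docs.
    total = 0.0
    for i, q in enumerate(q_scores):
        for j, d in enumerate(d_scores):
            s = dot(q, d)
            total += softplus(s) - labels[i][j] * s  # equals BCE(sigmoid(s), y)
    return total / (len(q_scores) * len(d_scores))


def combined_loss(queries: List[Vector], docs: List[Vector], labels: List[List[float]],
                  weights: Matrix, bias: Vector,
                  w_retrieval: float = 1.0, w_scoring: float = 1.0) -> float:
    # Weighted combination of the retrieval and scoring losses; both weights are tunable.
    q_s = [linear_head(weights, bias, q) for q in queries]
    d_s = [linear_head(weights, bias, d) for d in docs]
    return (w_retrieval * retrieval_loss(queries, docs)
            + w_scoring * scoring_loss(q_s, d_s, labels))


def rerank_and_threshold(query: Vector, candidates: List[Vector], weights: Matrix,
                         bias: Vector, threshold: float = 0.5) -> List[Tuple[int, float]]:
    # Score each retrieved candidate, keep those at or above one global cutoff,
    # and return (candidate_index, score) pairs sorted by descending score.
    q_s = linear_head(weights, bias, query)
    scored = [(j, sigmoid(dot(q_s, linear_head(weights, bias, c))))
              for j, c in enumerate(candidates)]
    kept = [(j, p) for j, p in scored if p >= threshold]
    return sorted(kept, key=lambda t: t[1], reverse=True)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    zero_bias = [0.0, 0.0]
    candidates = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
    print(rerank_and_threshold([1.0, 0.0], candidates, identity, zero_bias, threshold=0.6))
```

With an identity head, the usage example keeps candidates 2 and 0, in that order. Candidate 1 scores exactly 0.5 and falls below the 0.6 cutoff. The sketch uses a pointwise scoring loss because it ties each score to an absolute target, which is what allows one threshold to apply across all queries. The disclosed method may use a different scoring loss.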

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
