Abstract
The present disclosure provides an automated method for calculating and evaluating a quality factor for the metadata supplied to Retrieval Augmented Generation (RAG) pipelines, in order to ensure the data quality of RAG pipelines used in large language model (LLM) applications for Generative Artificial Intelligence (AI). The method includes capturing the metadata of vectors/embeddings as the vectors/embeddings are received; applying anomaly detection models to generate thresholds for the expected metadata; applying metadata-based, quality-score filtering to the Approximate Nearest Neighbour (ANN) search when generating an output response to a user query; and providing the output response based on the filtered vectors.
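As a rough illustration of these steps, the following Python sketch fits per-field thresholds on a reference set of metadata, scores incoming records against them, and filters search candidates by that score. Everything here is an assumption for illustration, not the disclosed implementation: the numeric metadata fields (`token_count`, `age_days`), the mean ± k·σ (z-score) rule standing in for the anomaly detection model, the "fraction of fields within bounds" quality score, and the exhaustive cosine-similarity search standing in for a real ANN index. The names `VectorRecord`, `fit_thresholds`, `quality_score`, and `filtered_search` are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class VectorRecord:
    """An embedding plus the metadata captured at ingestion (hypothetical schema)."""
    doc_id: str
    embedding: list[float]
    metadata: dict[str, float]  # e.g. {"token_count": 420, "age_days": 3}


def fit_thresholds(reference: list[VectorRecord], k: float = 3.0) -> dict[str, tuple[float, float]]:
    """Learn an expected [low, high] range per metadata field as mean +/- k * stdev.

    This simple z-score rule is an assumed stand-in for an anomaly detection model.
    """
    bounds = {}
    for name in reference[0].metadata:
        values = [r.metadata[name] for r in reference]
        mu = mean(values)
        sigma = stdev(values) if len(values) > 1 else 0.0
        bounds[name] = (mu - k * sigma, mu + k * sigma)
    return bounds


def quality_score(metadata: dict[str, float], bounds: dict[str, tuple[float, float]]) -> float:
    """Fraction of fields whose value lies inside its expected range; missing fields count as outside."""
    inside = sum(1 for name, (lo, hi) in bounds.items()
                 if name in metadata and lo <= metadata[name] <= hi)
    return inside / len(bounds)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(query: list[float], records: list[VectorRecord],
                    bounds: dict[str, tuple[float, float]],
                    top_k: int = 3, min_score: float = 1.0) -> list[VectorRecord]:
    """Drop records scoring below min_score, then rank the rest by cosine similarity.

    Exhaustive search is used here in place of a real ANN index.
    """
    eligible = [r for r in records if quality_score(r.metadata, bounds) >= min_score]
    eligible.sort(key=lambda r: cosine(query, r.embedding), reverse=True)
    return eligible[:top_k]


if __name__ == "__main__":
    # Reference records whose metadata defines what is "expected".
    reference = [VectorRecord(f"ref{i}", [1.0, 0.0], {"token_count": 400 + 10 * i, "age_days": 5 + i})
                 for i in range(10)]
    bounds = fit_thresholds(reference)
    good = VectorRecord("good", [0.9, 0.1], {"token_count": 450, "age_days": 7})
    stale = VectorRecord("stale", [1.0, 0.0], {"token_count": 430, "age_days": 400})
    hits = filtered_search([1.0, 0.0], [good, stale], bounds, top_k=2)
    print([r.doc_id for r in hits])  # prints ['good']: "stale" is closer to the query but fails the age check
```

In this sketch the quality score is applied as a pre-filter before ranking, so anomalous vectors never compete for the top-k slots. With an actual ANN library, a similar effect could instead be obtained by over-fetching candidates and filtering them afterwards.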
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
PANDEY, KAUTUK, "AN ADVANCED WAY TO ENSURE DATA QUALITY OF RETRIEVAL AUGMENTED GENERATION (RAG) PIPELINES", Technical Disclosure Commons, (September 11, 2024)
https://www.tdcommons.org/dpubs_series/7339