Abstract
Proposed herein is a Large Language Model (LLM) fine-tuning methodology, referred to as Quantized Low-Rank Adaptation-Blend (QLoRA-Blend), that enables small LLMs to outperform larger state-of-the-art models with minimal financial investment. By merging multiple domain-specific QLoRA adapters using Spherical Linear Interpolation (SLERP), the QLoRA-Blend fine-tuning technique achieves superior accuracy and efficiency in Retrieval-Augmented Generation (RAG) systems.
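The abstract does not spell out the merging step itself, so the following is a minimal sketch of how SLERP could blend one pair of corresponding adapter weight tensors. The function name, the NumPy representation, and the blend factor `t` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def slerp(w_a: np.ndarray, w_b: np.ndarray, t: float, eps: float = 1e-8) -> np.ndarray:
    """Spherically interpolate between two adapter weight tensors.

    w_a, w_b: corresponding weight tensors from two domain-specific QLoRA
              adapters (hypothetical inputs for illustration).
    t:        blend factor in [0, 1]; 0 returns w_a, 1 returns w_b.
    """
    a, b = w_a.flatten(), w_b.flatten()
    # Angle between the two weight vectors on the hypersphere.
    cos_omega = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if omega < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        blended = (1.0 - t) * a + t * b
    else:
        blended = (np.sin((1.0 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)
    return blended.reshape(w_a.shape)
```

Applied per-tensor across the low-rank matrices of two QLoRA adapters, this would yield a single blended adapter in the spirit of the QLoRA-Blend technique described above.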
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Davidson, Nicholas Robert; Sattiraju, Siva Kanth; and Gangadharaiah, Umesh, "MULTI-STAGE FINE-TUNING PROCESS FOR OPTIMIZING SMALL LLMS IN RAG APPLICATIONS", Technical Disclosure Commons, (June 06, 2024)
https://www.tdcommons.org/dpubs_series/7085