MULTI-STAGE FINE-TUNING PROCESS FOR OPTIMIZING SMALL LLMS IN RAG APPLICATIONS
Abstract
Proposed herein is a Large Language Model (LLM) fine-tuning methodology, referred to as Quantized Low-Rank Adaptation-Blend (QLoRA-Blend), that enables small LLMs to outperform larger state-of-the-art models with minimal financial investment. By integrating multiple domain-specific QLoRA adapters using Spherical Linear Interpolation (SLERP), the QLoRA-Blend fine-tuning technique achieves superior accuracy and efficiency in Retrieval-Augmented Generation (RAG) systems.
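To make the blending step concrete, the following is a minimal sketch of SLERP applied layer-wise to two adapter weight sets. The adapter names, the toy tensor shapes, and the 50/50 blend ratio are illustrative assumptions, not details drawn from the paper itself.

```python
# Hedged sketch: SLERP blending of two hypothetical QLoRA adapter weight sets.
# Names, shapes, and the blend ratio are assumptions for illustration only.
import numpy as np

def slerp(w0: np.ndarray, w1: np.ndarray, t: float, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight tensors, treated as flat vectors."""
    v0, v1 = w0.ravel(), w1.ravel()
    # Cosine of the angle between the two weight vectors.
    cos_omega = np.dot(v0, v1) / max(np.linalg.norm(v0) * np.linalg.norm(v1), eps)
    cos_omega = np.clip(cos_omega, -1.0, 1.0)
    omega = np.arccos(cos_omega)
    if omega < eps:
        # Nearly collinear vectors: fall back to plain linear interpolation.
        blended = (1.0 - t) * v0 + t * v1
    else:
        sin_omega = np.sin(omega)
        blended = (np.sin((1.0 - t) * omega) / sin_omega) * v0 \
                  + (np.sin(t * omega) / sin_omega) * v1
    return blended.reshape(w0.shape)

def blend_adapters(adapter_a: dict, adapter_b: dict, t: float = 0.5) -> dict:
    """Blend two adapters (dicts of parameter name -> tensor) layer by layer via SLERP."""
    return {name: slerp(adapter_a[name], adapter_b[name], t) for name in adapter_a}

# Illustrative usage with toy low-rank update matrices for two hypothetical domains.
rng = np.random.default_rng(0)
adapter_legal = {"layer0.lora_A": rng.normal(size=(8, 64))}
adapter_medical = {"layer0.lora_A": rng.normal(size=(8, 64))}
merged = blend_adapters(adapter_legal, adapter_medical, t=0.5)
print(merged["layer0.lora_A"].shape)  # (8, 64)
```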
This paper has been withdrawn.