Abstract

Proposed herein is a Large Language Model (LLM) fine-tuning methodology, Quantized Low-Rank Adaptation-Blend (QLoRA-Blend), which enables small LLMs to outperform larger state-of-the-art models with minimal financial investment. By merging multiple domain-specific QLoRA adaptations using Spherical Linear Interpolation (SLERP), the QLoRA-Blend fine-tuning technique achieves superior accuracy and efficiency in Retrieval-Augmented Generation (RAG) systems.
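
To make the blending step concrete, the sketch below shows one plausible way to SLERP-merge two LoRA-style adapter state dicts, tensor by tensor. The function names (slerp, blend_adapters), the two-adapter pairwise setup, the per-tensor flattening, and the interpolation weight t are illustrative assumptions, not the paper's exact implementation.

    import torch

    def slerp(w_a: torch.Tensor, w_b: torch.Tensor, t: float = 0.5, eps: float = 1e-8) -> torch.Tensor:
        # Spherical linear interpolation between two same-shaped weight tensors.
        a = w_a.flatten().float()
        b = w_b.flatten().float()
        a_unit = a / (a.norm() + eps)
        b_unit = b / (b.norm() + eps)
        # Angle between the flattened, normalized weight vectors.
        cos_theta = torch.clamp(torch.dot(a_unit, b_unit), -1.0, 1.0)
        theta = torch.acos(cos_theta)
        sin_theta = torch.sin(theta)
        if sin_theta.abs() < eps:
            # Nearly parallel vectors: fall back to plain linear interpolation.
            blended = (1.0 - t) * a + t * b
        else:
            blended = (torch.sin((1.0 - t) * theta) / sin_theta) * a + (torch.sin(t * theta) / sin_theta) * b
        return blended.reshape(w_a.shape).to(w_a.dtype)

    def blend_adapters(adapter_a: dict, adapter_b: dict, t: float = 0.5) -> dict:
        # Blend two adapter state dicts (e.g. lora_A / lora_B matrices) key by key.
        return {k: slerp(adapter_a[k], adapter_b[k], t) for k in adapter_a}

In this sketch, blend_adapters would be applied to the saved low-rank adapter weights of two domain-specific QLoRA fine-tunes, and the merged adapter would then be loaded onto the quantized base model used by the RAG pipeline.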

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
