Large Language Model (LLM) deployment approaches are proposed herein that may facilitate deploying LLMs on mobile, edge, and low-resource hardware by enabling high accuracy and speed in real-time applications. By leveraging advanced techniques, such as selective pruning, a fine-tuning methodology referred to herein as Quantized Low-Rank Adaptation-Blend (QLoRA-Blend) fine-tuning, and efficient quantization, the deployment approaches proposed herein may deliver top-tier artificial intelligence (AI) performance, at a level unmatched by existing deployment strategies, directly on consumer-grade hardware.
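The quantization and low-rank-adaptation ideas named above can be illustrated with a minimal NumPy sketch: frozen base weights are quantized to int8 (shrinking memory for low-resource hardware), while a small full-precision low-rank adapter carries the fine-tuned update. This is a generic QLoRA-style sketch under stated assumptions, not the proposed QLoRA-Blend method itself, whose blend-specific details are not described in this passage; all names and shapes here are illustrative.

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor quantization: map floats into [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximate float32 weight from the int8 codes.
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)

# Frozen base weight, stored quantized (4x smaller than float32).
W = rng.standard_normal((64, 64)).astype(np.float32)
Wq, s = quantize_int8(W)

# Low-rank adapter of rank r; B starts at zero so the adapter
# initially contributes nothing, the usual LoRA initialization.
r = 4
A = rng.standard_normal((64, r)).astype(np.float32) * 0.01
B = np.zeros((r, 64), dtype=np.float32)

def forward(x):
    # Effective weight is dequantized base plus low-rank update:
    # y = x @ (dequant(Wq) + A @ B), computed without forming A @ B.
    return x @ dequantize(Wq, s) + (x @ A) @ B
```

During fine-tuning only `A` and `B` would receive gradients, so the memory cost of adaptation scales with the rank `r` rather than with the full weight matrix.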

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.