Abstract
Aligning a large language model (LLM) to individual preferences is difficult to perform at scale. This disclosure describes techniques that leverage direct preference optimization (DPO) and low-rank adaptation (LoRA) to enable scalable alignment of artificial intelligence (AI) models. With user permission, the user's edits to suggestions from the model are obtained, along with the context of the user's written data or interaction with the LLM. The edits serve as training data to contextually fine-tune the model using LoRA and DPO. This training data is created as a by-product of the user's LLM-assisted tasks and is uncontrived, since it reflects real usage by human users. User feedback is obtained naturally and imperceptibly, without requiring the user to explicitly upvote or downvote an LLM response. The techniques result in nuanced, bespoke AI agents that can represent a user in a manner that hews more closely to their unique style.
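A minimal sketch of the idea is shown below, assuming a Hugging Face-style stack (transformers and peft); it is illustrative and not the disclosure's implementation. The model's original suggestion is treated as the rejected response and the user's edit as the chosen response in the DPO loss, and only the LoRA adapter weights are trained. The model name, hyperparameters (rank, beta), and helper names (response_logprob, dpo_loss_from_edit) are placeholder assumptions.

```python
# Sketch only: a user edit becomes an implicit DPO preference pair, and the
# model is adapted by training LoRA weights against a frozen reference model.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

MODEL = "gpt2"  # placeholder; any causal LM could be substituted
tokenizer = AutoTokenizer.from_pretrained(MODEL)

# Frozen reference model used as the DPO regularization anchor.
ref_model = AutoModelForCausalLM.from_pretrained(MODEL).eval()
for p in ref_model.parameters():
    p.requires_grad_(False)

# Policy model: the same base weights plus trainable low-rank adapters.
policy = get_peft_model(
    AutoModelForCausalLM.from_pretrained(MODEL),
    LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"),  # illustrative values
)

def response_logprob(model, prompt_ids, response_ids):
    """Sum of log-probabilities the model assigns to the response tokens."""
    input_ids = torch.cat([prompt_ids, response_ids], dim=-1)
    logits = model(input_ids=input_ids).logits[:, :-1, :]
    logps = torch.log_softmax(logits, dim=-1)
    resp = logps[:, prompt_ids.size(-1) - 1 :, :]  # positions predicting the response
    return resp.gather(-1, response_ids.unsqueeze(-1)).squeeze(-1).sum(-1)

def dpo_loss_from_edit(context, model_suggestion, user_edit, beta=0.1):
    """DPO loss for one preference pair harvested from a user's edit."""
    prompt = tokenizer(context, return_tensors="pt").input_ids
    chosen = tokenizer(user_edit, return_tensors="pt").input_ids        # preferred
    rejected = tokenizer(model_suggestion, return_tensors="pt").input_ids  # dispreferred

    pi_w = response_logprob(policy, prompt, chosen)
    pi_l = response_logprob(policy, prompt, rejected)
    with torch.no_grad():
        ref_w = response_logprob(ref_model, prompt, chosen)
        ref_l = response_logprob(ref_model, prompt, rejected)

    # -log sigmoid(beta * [(pi_w - ref_w) - (pi_l - ref_l)])
    return -F.logsigmoid(beta * ((pi_w - ref_w) - (pi_l - ref_l))).mean()

# Hypothetical usage: the user rewrote a suggested reply in their own style.
loss = dpo_loss_from_edit(
    context="Draft a reply to the meeting invite:",
    model_suggestion="Sure, I will attend.",
    user_edit="Thanks! I'll be there at 10am sharp.",
)
loss.backward()  # gradients flow only into the LoRA adapter weights
```

Because only the low-rank adapter weights are updated, each user's personalization can be stored and served as a small per-user adapter rather than a full copy of the model, which is what makes this form of alignment feasible at scale.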
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Ganesh, Saravanan; Li, Yi; Hu, Yunfei; Kallarackal, Krystal; Nguyen, Kelvin; and Lin, Chu-Cheng, "Personalizing AI Models Using Low-rank Adaptation and Direct Preference Optimization", Technical Disclosure Commons, (May 04, 2025)
https://www.tdcommons.org/dpubs_series/8088