Abstract

Personalized large language models often exhibit defects such as the overuse of irrelevant user information, unnatural data formatting, and intrusive or repetitive name usage. These issues typically arise from a failure to prioritize relevant context or to integrate user data naturally into responses. A data distillation pipeline is disclosed to address these limitations by generating high-quality supervised fine-tuning examples. The method utilizes a “persona bank” to generate user prompts and profiles, followed by the generation of multiple responses guided by specific personalization principles. These guided responses are evaluated against a baseline unguided output using a side-by-side comparison. Training data is retained only when the principle-guided output demonstrates a clear improvement over the baseline. This process ensures the model learns to handle personal data judiciously, leading to more natural, contextually relevant, and less intrusive interactions without the need for extensive manual prompt engineering.
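The filtering loop described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function names (`make_prompt`, `generate`, `baseline`, `judge`), the numeric judging scale, and the `margin` threshold are all hypothetical stand-ins for the persona-bank sampling, principle-guided generation, and side-by-side evaluation steps.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Example:
    prompt: str
    profile: str
    response: str

def distill(
    personas: List[str],
    make_prompt: Callable[[str], str],         # hypothetical: derives a user prompt from a persona
    generate: Callable[[str, str, str], str],  # hypothetical: (prompt, profile, principle) -> guided response
    baseline: Callable[[str, str], str],       # hypothetical: unguided generation for the same prompt
    judge: Callable[[str, str, str], float],   # hypothetical: side-by-side score; higher favors the guided output
    principles: List[str],
    margin: float = 0.5,                       # assumed threshold: retain only clear wins over the baseline
) -> List[Example]:
    """Keep a (prompt, profile, response) triple as SFT data only when the
    principle-guided response clearly beats the unguided baseline."""
    kept: List[Example] = []
    for profile in personas:
        prompt = make_prompt(profile)
        base = baseline(prompt, profile)
        for principle in principles:
            guided = generate(prompt, profile, principle)
            # Side-by-side comparison against the baseline; discard otherwise.
            if judge(prompt, guided, base) >= margin:
                kept.append(Example(prompt, profile, guided))
    return kept

# Toy usage with stub models: only the "relevance"-guided response
# passes the side-by-side filter in this contrived judge.
data = distill(
    personas=["avid weekend hiker"],
    make_prompt=lambda p: f"Suggest a weekend activity for: {p}",
    generate=lambda q, p, pr: f"[{pr}] tailored answer",
    baseline=lambda q, p: "generic answer",
    judge=lambda q, guided, base: 1.0 if "relevance" in guided else 0.0,
    principles=["relevance", "verbosity"],
)
```

The key design point is that the principle-guided response is never trusted on its own merits; it enters the training set only after winning a direct comparison with the unguided baseline, which is what screens out intrusive or unnatural uses of the profile.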

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.