Defensive Publications Series

METHOD, SYSTEM, AND COMPUTER PROGRAM PRODUCT FOR DIFFICULTY-DRIVEN TEMPORAL ADAPTATION OF DIRECT PREFERENCE OPTIMIZATION

Abstract

Methods, systems, and computer program products are provided for difficulty-driven, temporal adaptation of direct preference optimization of a large language machine learning model. An example method includes receiving pairs of responses associated with text prompts, determining a probability margin score for a pair of responses, computing a measure of semantic similarity associated with the pair of responses, computing a difficulty score for the pair of responses based on the probability margin score, computing a time-dependent temperature parameter for the pair of responses based on a minimum time-dependent temperature parameter, a maximum time-dependent temperature parameter, a time parameter associated with training the large language machine learning model, and the difficulty score for the pair of responses, calculating a loss measure for the pair of responses based on the time-dependent temperature parameter for the pair of responses, and updating the large language machine learning model based on the loss measure.
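The per-pair steps in the abstract can be illustrated with a minimal sketch. The abstract does not give formulas, so everything concrete here is an assumption: the probability margin is taken to be the standard DPO implicit-reward gap between the chosen and rejected responses, the difficulty score is sigmoid(-margin) (a small or negative margin means a hard pair), the temperature is a hypothetical schedule that interpolates between beta_min and beta_max using training progress times difficulty, and the loss is the usual DPO logistic loss with that per-pair temperature. The names PairLogProbs, probability_margin, difficulty_score, temperature, and pair_loss, and the default beta values, are illustrative only. The abstract also computes a semantic similarity measure but does not say how it enters the difficulty score, so the sketch leaves it out.

```python
import math
from dataclasses import dataclass


@dataclass
class PairLogProbs:
    """Summed token log-probabilities for one (chosen, rejected) response pair.

    Hypothetical layout; the disclosure does not specify one.
    """
    policy_chosen: float
    policy_rejected: float
    ref_chosen: float
    ref_rejected: float


def probability_margin(p: PairLogProbs) -> float:
    # Assumption: DPO implicit-reward gap,
    # (log pi - log pi_ref)(chosen) - (log pi - log pi_ref)(rejected).
    return (p.policy_chosen - p.ref_chosen) - (p.policy_rejected - p.ref_rejected)


def difficulty_score(margin: float) -> float:
    # Assumption: difficulty = sigmoid(-margin), in (0, 1), 0.5 at margin == 0.
    # Branching keeps math.exp from overflowing for large |margin|.
    if margin >= 0:
        e = math.exp(-margin)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(margin))


def temperature(step: int, total_steps: int, difficulty: float,
                beta_min: float = 0.05, beta_max: float = 0.5) -> float:
    # Hypothetical schedule: progress and difficulty both lie in [0, 1],
    # so the result stays within [beta_min, beta_max].
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)
    return beta_min + (beta_max - beta_min) * progress * difficulty


def neg_log_sigmoid(z: float) -> float:
    # Numerically stable -log(sigmoid(z)), i.e. softplus(-z).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def pair_loss(p: PairLogProbs, step: int, total_steps: int) -> tuple[float, float]:
    # Returns (loss, beta) for one pair using the time-dependent temperature.
    margin = probability_margin(p)
    beta = temperature(step, total_steps, difficulty_score(margin))
    return neg_log_sigmoid(beta * margin), beta


pair = PairLogProbs(policy_chosen=-12.0, policy_rejected=-11.5,
                    ref_chosen=-12.2, ref_rejected=-11.0)
loss, beta = pair_loss(pair, step=500, total_steps=1000)
```

In an actual training loop the log-probabilities would be autodiff tensors and the model update would be an optimizer step on the mean loss over a batch; a common choice (again an assumption, not stated in the abstract) is to compute beta without gradients so that only the margin term is optimized.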

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Wu, Ziwei; Islam, Rashidul; and Cai, Yiwei, "METHOD, SYSTEM, AND COMPUTER PROGRAM PRODUCT FOR DIFFICULTY-DRIVEN TEMPORAL ADAPTATION OF DIRECT PREFERENCE OPTIMIZATION", Technical Disclosure Commons, (May 29, 2026)
https://www.tdcommons.org/dpubs_series/10277

