Defensive Publications Series

Attractor Surgery for Post-Training Repair of Collusive Pricing Policies via Targeted Q-Entry Editing

Anonymous

Abstract

Disclosed techniques perform post-training repair of reinforcement-learning pricing agents by targeted editing of action-value information to eliminate high-price attractor cycles. A frozen policy graph is constructed from a greedy policy derived from Q-values, and the graph is decomposed to identify attractor cycles. Each cycle is scored using an elevation metric relative to benchmark prices and compared to a threshold to classify high-price attractors. For states on a high-price cycle, the greedy collusive action is replaced with a one-shot best-response action by demoting Q(s,a_collusive) and promoting Q(s,a_BR) by an editing margin that flips the argmax, while leaving all other Q-entries unchanged. The policy graph is globally re-enumerated and re-verified after edits, and iterations continue until no high-price attractor remains. Outputs may include edited Q-values and a repair report documenting edits and verification results.
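The repair loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the names `find_cycles`, `attractor_surgery`, `step`, `best_response`, `p_bench`, and `p_high` are hypothetical, the environment is assumed to be deterministic with a tabular Q-array, and the elevation metric is assumed to be the mean cycle price normalized between a competitive benchmark and a high-price benchmark (the abstract says only that the metric is relative to benchmark prices).

```python
import numpy as np

def find_cycles(succ):
    """Return the attractor cycles of a functional graph (one successor per state)."""
    color = [0] * len(succ)          # 0 = unvisited, 1 = on current path, 2 = finished
    cycles = []
    for start in range(len(succ)):
        path, s = [], start
        while color[s] == 0:
            color[s] = 1
            path.append(s)
            s = succ[s]
        if color[s] == 1:            # walked back into the current path: new cycle
            cycles.append(path[path.index(s):])
        for v in path:
            color[v] = 2
    return cycles

def attractor_surgery(Q, step, prices, best_response, p_bench, p_high,
                      threshold=0.25, margin=1e-3, max_rounds=100):
    """Edit Q until no high-price attractor remains (hypothetical sketch)."""
    Q = Q.copy()                     # leave the caller's Q-values untouched
    report = []

    def high_cycles():
        # Freeze the greedy policy and enumerate the whole policy graph.
        greedy = Q.argmax(axis=1)
        succ = [step(s, int(greedy[s])) for s in range(Q.shape[0])]
        found = []
        for cyc in find_cycles(succ):
            mean_price = np.mean([prices[greedy[s]] for s in cyc])
            # Assumed elevation metric: 0 at p_bench, 1 at p_high.
            if (mean_price - p_bench) / (p_high - p_bench) > threshold:
                found.append(cyc)
        return greedy, found

    for _ in range(max_rounds):
        greedy, high = high_cycles()
        if not high:
            return Q, report, True
        edited = False
        for cyc in high:
            for s in cyc:
                a_col, a_br = int(greedy[s]), best_response(s)
                if a_br == a_col:
                    continue         # already the best response; nothing to flip
                top = Q[s, a_col]    # current row maximum
                Q[s, a_br] = top + margin    # promote the best response
                Q[s, a_col] = top - margin   # demote the collusive action
                report.append({"state": s, "from": a_col, "to": a_br})
                edited = True
        if not edited:
            break                    # no admissible edit left; stop without a guarantee
    return Q, report, not high_cycles()[1]

# Toy example (all values hypothetical): the state is the last price index chosen.
prices = [1.0, 1.5, 2.0]
Q0 = np.zeros((3, 3))
Q0[:, 2] = 1.0                       # greedy policy always picks the highest price
Q_fixed, report, verified = attractor_surgery(
    Q0, step=lambda s, a: a, prices=prices,
    best_response=lambda s: 0, p_bench=1.0, p_high=2.0)
```

In this toy run the first edit, at state 2, creates a new two-state cycle between states 0 and 2 that is still elevated, which is why the sketch re-enumerates the whole graph after each round of edits rather than checking only the edited states. The row for state 1, which never lies on a high-price cycle, is left unchanged.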

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Anonymous, "Attractor Surgery for Post-Training Repair of Collusive Pricing Policies via Targeted Q-Entry Editing", Technical Disclosure Commons, (June 29, 2026)
https://www.tdcommons.org/dpubs_series/10615
