Abstract
The techniques introduced here provide a reinforcement-learning-based dynamic token injection system for guiding Large Language Model (LLM) reasoning during inference. A Small Language Model (SLM), trained with rewards for output quality and penalties for excessive length, may monitor the LLM's reasoning output and selectively inject tokens that either prompt further reasoning or conclude the reasoning early.
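The monitoring-and-injection loop described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the disclosure: the `guided_reasoning` and `reward` functions, the action names (`"none"`, `"continue"`, `"conclude"`), the two token strings, and the linear length penalty. The LLM and the SLM are represented by plain callables so the sketch runs without any model library.

```python
from typing import Callable, List

# Hypothetical injected tokens; the disclosure does not name specific tokens.
CONTINUE_TOKEN = "Wait,"     # nudges the LLM to keep reasoning
CONCLUDE_TOKEN = "</think>"  # closes the reasoning phase early


def reward(quality: float, num_tokens: int, length_penalty: float = 0.01) -> float:
    """Assumed reward shape: output quality minus a linear length penalty."""
    return quality - length_penalty * num_tokens


def guided_reasoning(
    llm_step: Callable[[List[str]], str],
    slm_policy: Callable[[List[str]], str],
    max_steps: int = 64,
) -> List[str]:
    """Run the LLM step by step while the SLM policy watches the trace.

    Before each LLM step, slm_policy inspects the reasoning so far and
    returns "none", "continue", or "conclude".
    """
    trace: List[str] = []
    for _ in range(max_steps):
        action = slm_policy(trace)
        if action == "conclude":
            # Inject the end-of-reasoning token and stop generating.
            trace.append(CONCLUDE_TOKEN)
            break
        if action == "continue":
            # Inject a token that prompts further reasoning.
            trace.append(CONTINUE_TOKEN)
        # The LLM produces its next token given the (possibly modified) trace.
        trace.append(llm_step(trace))
    return trace
```

In a reinforcement-learning setup of this kind, `slm_policy` would be the trained SLM and a `reward`-like signal computed on the finished trace would drive its updates; here both are stand-ins that show only the control flow.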
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Payani, Ali; Lee, Myungjin; and Kompella, Ramana, "DYNAMIC TOKEN INJECTION FOR ENHANCED LANGUAGE MODEL REASONING USING REINFORCEMENT LEARNING", Technical Disclosure Commons, (June 16, 2026)
https://www.tdcommons.org/dpubs_series/10466