Abstract

The techniques introduced here provide a reinforcement-based dynamic token injection system for guiding Large Language Model (LLM) reasoning during inference. A Small Language Model (SLM), trained using rewards for output quality and penalties for excessive length, may monitor the LLM's reasoning output and selectively inject tokens that either prompt further reasoning or conclude the reasoning early.
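The control loop described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the token strings `CONTINUE_TOKEN` and `STOP_TOKEN`, the linear form of the length penalty in `reward`, and the stub functions `toy_llm` and `toy_slm` are all hypothetical stand-ins, since the abstract does not specify them.

```python
from typing import Callable, List, Optional

# Hypothetical injection tokens; the source does not name specific tokens.
CONTINUE_TOKEN = "Wait,"
STOP_TOKEN = "</think>"


def reward(quality: float, num_tokens: int,
           budget: int = 50, penalty: float = 0.01) -> float:
    """Quality reward minus a linear penalty for tokens beyond a budget.

    The linear penalty form and the default values are assumptions.
    """
    return quality - penalty * max(0, num_tokens - budget)


def guided_reasoning(
    llm_step: Callable[[List[str]], str],
    slm_policy: Callable[[List[str]], Optional[str]],
    max_tokens: int = 100,
) -> List[str]:
    """Interleave LLM decoding with optional SLM token injection."""
    trace: List[str] = []
    while len(trace) < max_tokens:
        trace.append(llm_step(trace))   # LLM emits its next reasoning token
        injected = slm_policy(trace)    # SLM inspects the trace so far
        if injected is not None:
            trace.append(injected)      # injected token joins the LLM context
            if injected == STOP_TOKEN:
                break                   # reasoning is concluded early
    return trace


# Toy stand-ins for the two models (hypothetical, for illustration only).
def toy_llm(trace: List[str]) -> str:
    return f"t{len(trace)}"


def toy_slm(trace: List[str]) -> Optional[str]:
    # Stop once five tokens exist; a learned policy would decide this instead.
    return STOP_TOKEN if len(trace) >= 5 else None


trace = guided_reasoning(toy_llm, toy_slm)
```

In a learned setting, `slm_policy` would be the SLM itself, and `reward` would score each finished trace during training so that the policy learns when injecting `CONTINUE_TOKEN` or `STOP_TOKEN` pays off.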

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
