Abstract

Interactions between large language models and external tools in multi-turn conversations often face limitations due to the black-box nature of many tools. Because tool behaviors can change or may not be fully captured during model training, the model may struggle to accurately track tool states or effectively refine outputs across multiple exchanges.

An inference-time technique is disclosed to model tool behavior and track state transitions. The method monitors tool inputs and outputs during a conversation to build a representation of the tool's state. Reflection steps are integrated into the reasoning process to evaluate how a tool is being invoked relative to user instructions. Based on this evaluation, tool modifier programs (such as adjusted prompts or code wrappers) are generated to steer tool behavior. This allows the system to adapt to tool faults by deciding whether to edit existing outputs or reset the tool state. The primary purpose is to improve the coherence and accuracy of tool-assisted responses in extended interactions without requiring model retraining.
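The loop described above (observe tool I/O, reflect against the user instruction, then either steer via a modifier program or reset the tool state) can be sketched as follows. This is a minimal illustrative sketch, not the disclosed implementation; all names (`ToolStateTracker`, `reflect`, `make_modifier`, `step`) and the toy keyword-based reflection check are assumptions introduced here.

```python
from dataclasses import dataclass, field

@dataclass
class ToolStateTracker:
    """Builds a running representation of a black-box tool's state
    from observed input/output pairs (hypothetical helper)."""
    history: list = field(default_factory=list)

    def observe(self, tool_input, tool_output):
        self.history.append((tool_input, tool_output))

    def reset(self):
        # Reset path: discard the accumulated state representation.
        self.history.clear()

def reflect(tracker, instruction):
    """Reflection step: judge whether the latest tool call satisfied
    the user instruction. Toy stand-in: a keyword containment check."""
    if not tracker.history:
        return "ok"
    _, last_output = tracker.history[-1]
    return "ok" if instruction.lower() in str(last_output).lower() else "fault"

def make_modifier(instruction):
    """Tool modifier program: here, a code wrapper that steers the next
    invocation by appending the unmet instruction to the tool input."""
    def wrapper(tool_fn, tool_input):
        return tool_fn(f"{tool_input} [must satisfy: {instruction}]")
    return wrapper

def step(tracker, tool_fn, tool_input, instruction):
    """One conversational turn: invoke the tool, record I/O, reflect,
    and either keep the output, re-invoke through a modifier (edit path),
    or fall back to resetting tool state."""
    output = tool_fn(tool_input)
    tracker.observe(tool_input, output)
    if reflect(tracker, instruction) == "ok":
        return output
    modified = make_modifier(instruction)(tool_fn, tool_input)
    tracker.observe(tool_input, modified)
    if reflect(tracker, instruction) == "ok":
        return modified
    tracker.reset()
    return modified
```

For example, with an echo-like tool that simply returns its input, a first call that misses the instruction "cite sources" triggers the reflection fault, and the generated wrapper steers the retry so the instruction is satisfied; everything here remains at inference time, with no model retraining.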

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
