Abstract

Per-agent and per-tenant large-language-model (LLM) spend is hard to predict, attribute, and govern. The prevailing industry practice is post-hoc observation: token meters, billing exports, and span-level cost-attribution dashboards report spend only after it has been incurred. The cost figure, however, never re-enters the agent's own decision loop. As a consequence, the agent keeps selecting the same expensive model and the same long context until a hard spend cap trips, at which point it stops abruptly: a cost cliff rather than graceful degradation.

This publication describes a per-persona token budget tracker that converts cost into a closed-loop control input. The system maintains a provider-agnostic pricing table spanning multiple language-model and speech-model providers, each with a per-model input/output unit cost. On every model invocation it records and prices an outcome record (token counts, model identifier, source channel), aggregates cost over per-agent hourly/daily windows, computes a rolling forecast of window-end and month-end spend at the current rate, and emits an early-warning alert when a configurable threshold (e.g. 80% of budget) is crossed. The defining feature is a normalized remaining-budget signal supplied to the agent's strategy-selection subsystem: when the signal falls below a configurable level, the selector biases its choice toward strategies labeled as low-cost — a smaller model, a shorter retrieved context, fewer tool calls, or a cheaper voice provider. This degrades quality smoothly as budget is exhausted rather than producing a hard stop.
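The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the reference implementation from the accompanying repository: the model names, prices, strategy table, quality scores, 0.25 low-budget level, and every identifier (`PRICING`, `Outcome`, `BudgetTracker`, `select_strategy`) are hypothetical. It also tracks a single window and uses a simple linear projection as the forecast.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per 1,000 tokens: (input, output). Not real provider rates.
PRICING = {
    "large-model": (0.010, 0.030),
    "small-model": (0.001, 0.002),
}

@dataclass
class Outcome:
    """One priced model invocation (field names are assumptions)."""
    model: str
    input_tokens: int
    output_tokens: int
    channel: str

def price(outcome: Outcome) -> float:
    """Cost of one invocation, looked up in the pricing table."""
    in_rate, out_rate = PRICING[outcome.model]
    return (outcome.input_tokens * in_rate + outcome.output_tokens * out_rate) / 1000.0

class BudgetTracker:
    """Tracks spend for one agent over a single budget window."""

    def __init__(self, budget: float, window_seconds: float, alert_fraction: float = 0.8):
        self.budget = budget
        self.window_seconds = window_seconds
        self.alert_fraction = alert_fraction
        self.spent = 0.0

    def record(self, outcome: Outcome) -> bool:
        """Add the invocation's cost; return True once the alert threshold is reached."""
        self.spent += price(outcome)
        return self.spent >= self.alert_fraction * self.budget

    def forecast(self, elapsed_seconds: float) -> float:
        """Linear projection of window-end spend at the current rate."""
        if elapsed_seconds <= 0:
            return self.spent
        return self.spent * self.window_seconds / elapsed_seconds

    def remaining_signal(self) -> float:
        """Normalized remaining budget in [0, 1]; 0 means exhausted."""
        return max(0.0, 1.0 - self.spent / self.budget)

# Hypothetical strategies with an assumed quality score and a low-cost label.
STRATEGIES = [
    {"name": "full", "model": "large-model", "quality": 1.0, "low_cost": False},
    {"name": "lean", "model": "small-model", "quality": 0.7, "low_cost": True},
]

def select_strategy(signal: float, low_budget_level: float = 0.25) -> dict:
    """Pick the best-scoring strategy; low-cost ones gain a bonus that grows
    as the remaining-budget signal falls below low_budget_level."""
    bias = max(0.0, (low_budget_level - signal) / low_budget_level)

    def score(strategy: dict) -> float:
        return strategy["quality"] + (bias if strategy["low_cost"] else 0.0)

    return max(STRATEGIES, key=score)

tracker = BudgetTracker(budget=1.0, window_seconds=3600)
alert = tracker.record(Outcome("large-model", 10_000, 10_000, "chat"))
choice = select_strategy(tracker.remaining_signal())
print(alert, round(tracker.remaining_signal(), 2), choice["name"])
```

Because the bonus scales with how far the signal has fallen below the level, the selector shifts toward cheaper strategies gradually rather than switching at a single hard cutoff, which is the graceful-degradation behavior the text describes.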

The characterizing property, and the contribution this document places into the public domain, is that cost is a control input to strategy selection, not merely an observation. The repository accompanying this whitepaper includes a clean-room runnable reference implementation, a prior-art landscape, a data model, worked examples, and a plan for an open-source reference application. It is published defensively to keep the technique freely practicable and to bar later patenting of the same subject matter by others.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
