Abstract

An autonomous agent backed by a large language model (LLM) frequently receives inputs in bursts: two hundred emails delivered at once, a flood of webhook events, a queue of chat messages that arrived during a network partition. The dominant cost in processing such a burst is triage, the act of classifying each input into a type and a priority so the agent knows what to do with it. When triage itself invokes the LLM, a burst of N inputs produces N LLM calls before any useful work begins, a thundering herd that spikes token spend and latency in lockstep with the arrival rate. This disclosure describes a two-tier attention queue that decouples burst absorption from burst cost. A bounded in-memory hot tier (default capacity 50) holds items awaiting cognitive focus; when it overflows, items spill to an unbounded persistent cold tier backed by a durable store. The defining invariant is amortized, exactly-once triage: each item is triaged once, at intake, and never again, and its assigned type and priority are persisted with the item so they survive every subsequent memory↔disk transition. Overflow to the cold tier carries the triage result with the payload and never re-triages; promotion back to the hot tier reads the stored classification rather than recomputing it. Dequeue is memory-first: the hot tier is drained before the cold tier, exploiting cache and context warmth. The combination yields burst absorption without proportional LLM spend, a steady-state processing rate bounded by worker throughput rather than arrival rate, and a hard ceiling on triage cost equal to the number of distinct items rather than the number of queue transitions. We present the architecture, the exactly-once triage state machine, the data model, an original clean-room reference implementation, a worked 200-email scenario, failure-mode analysis, framework alignment, an evaluation methodology, and a set of inventive claims. This document is published defensively to keep the technique freely practicable.
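The queue described above can be sketched in a few lines of Python. This is a minimal illustration, not the disclosure's reference implementation: the names `Item`, `TwoTierQueue`, and `triage_fn` are hypothetical, the cold tier is modeled as an in-process list of JSON records standing in for a durable store, ordering within each tier is assumed to be FIFO, and promotion is assumed to happen only when the hot tier runs empty.

```python
import json
from collections import deque
from dataclasses import asdict, dataclass
from typing import Callable, Deque, Optional, Tuple


@dataclass
class Item:
    payload: str
    kind: str       # type assigned by triage at intake
    priority: int   # priority assigned by triage at intake


class TwoTierQueue:
    """Hypothetical sketch: bounded hot tier, unbounded cold tier."""

    def __init__(self, triage_fn: Callable[[str], Tuple[str, int]],
                 hot_capacity: int = 50) -> None:
        self.triage_fn = triage_fn          # the only place the LLM would be called
        self.hot_capacity = hot_capacity
        self.hot: Deque[Item] = deque()     # in-memory tier
        self.cold: Deque[str] = deque()     # stand-in for a durable store (assumption)

    def enqueue(self, payload: str) -> None:
        # Triage happens here, once per item, and nowhere else.
        kind, priority = self.triage_fn(payload)
        item = Item(payload, kind, priority)
        if len(self.hot) < self.hot_capacity:
            self.hot.append(item)
        else:
            # Overflow: the classification is serialized with the payload.
            self.cold.append(json.dumps(asdict(item)))

    def dequeue(self) -> Optional[Item]:
        # Memory-first: touch the cold tier only when the hot tier is empty.
        if not self.hot:
            self._promote()
        return self.hot.popleft() if self.hot else None

    def _promote(self) -> None:
        # Promotion reads the stored classification; triage_fn is not called.
        while self.cold and len(self.hot) < self.hot_capacity:
            self.hot.append(Item(**json.loads(self.cold.popleft())))
```

With a capacity of 50 and a burst of 200 payloads, this sketch leaves 50 items in the hot tier and 150 in the cold tier after intake, and `triage_fn` runs 200 times in total no matter how many times items cross between tiers while the queue drains.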

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
