Abstract
Retrieval-augmented generation (RAG) pipelines increasingly insert a large language model (LLM) re-ranking stage between candidate retrieval and answer synthesis, because LLM re-rankers materially improve the ordering of retrieved passages. But LLM re-rankers are slow, costly, rate-limited, and intermittently unavailable. The naive integration (retrieve, await the LLM re-ranker, then return) puts the slow, failure-prone stage on the critical path. End-to-end retrieval latency, availability, and cost all become hostage to the LLM provider: a single provider timeout or a tripped spend cap turns a sub-100 ms retrieval into a multi-second failure or an outage.
This publication describes a hybrid ranking architecture that inverts this dependency. A deterministic lexical stage (TF-IDF-style scoring with title and recency boosts) forms the critical path and always returns a fully ordered result within a tight latency budget. The LLM re-ranking stage is structurally an enhancement that sits off the critical path. It is invoked only on a bounded top-K subset of the already-ordered deterministic output, and each call is governed by a time bound and a budget bound. The central novelty is that on any failure (bridge unavailability, timeout, malformed re-ranker output, or budget breach) the system fails open: it returns the deterministic ranking unchanged and does not surface the failure to the caller. The caller's contract is therefore a valid, ordered result on every invocation. The LLM can only reorder the top-K window, with the aim of improving it; it can never degrade availability, exceed the latency budget, or propagate an error.
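A minimal Python sketch of the two stages described above may help make the fail-open contract concrete. It is not the publication's reference implementation. The whitespace tokenizer, the smoothed IDF, the title-boost weight, the exponential recency half-life, the `Budget` object, the `Reranker` callable (query and candidate IDs in, reordered IDs out), and all parameter defaults are hypothetical illustrations. The sketch shows the deterministic lexical ranking, the top-K window, a per-call time bound and budget bound, and a fail-open return of the lexical order on any exception, timeout, malformed output, or budget breach.

```python
# Hypothetical sketch of fail-open hybrid ranking; names and weights are illustrative.
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Doc:
    doc_id: str
    title: str
    body: str
    age_days: float  # hypothetical recency signal: days since publication


@dataclass
class Budget:
    remaining: float  # hypothetical spend units left for LLM calls

    def try_spend(self, cost: float) -> bool:
        """Reserve `cost` units; refuse (spending nothing) if it would breach the cap."""
        if cost > self.remaining:
            return False
        self.remaining -= cost
        return True


def _tokens(text: str) -> List[str]:
    return text.lower().split()


def lexical_rank(query: str, docs: Sequence[Doc], title_boost: float = 2.0,
                 half_life_days: float = 30.0) -> List[Doc]:
    """Deterministic critical path: TF-IDF-style score with title and recency boosts."""
    n = len(docs)
    df = Counter()
    for d in docs:
        df.update(set(_tokens(d.title + " " + d.body)))

    def score(d: Doc) -> float:
        tf = Counter(_tokens(d.body))
        title_terms = set(_tokens(d.title))
        s = 0.0
        for t in set(_tokens(query)):
            idf = math.log((n + 1) / (df[t] + 1)) + 1.0  # smoothed IDF, always > 0
            s += tf[t] * idf
            if t in title_terms:
                s += title_boost * idf
        recency = 0.5 ** (d.age_days / half_life_days)  # 1.0 when new, halves per half-life
        return s * (1.0 + recency)

    # Tie-break on doc_id so identical inputs always yield an identical order.
    return sorted(docs, key=lambda d: (-score(d), d.doc_id))


# Shared pool: a timed-out call is abandoned by the caller, never awaited.
_POOL = ThreadPoolExecutor(max_workers=4)

Reranker = Callable[[str, List[str]], List[str]]  # hypothetical: (query, ids) -> reordered ids


def hybrid_rank(query: str, docs: Sequence[Doc], rerank: Reranker, budget: Budget,
                top_k: int = 5, timeout_s: float = 0.5,
                cost_per_doc: float = 1.0) -> List[Doc]:
    """Fail-open hybrid ranking: the LLM may only reorder the top-K window."""
    base = lexical_rank(query, docs)
    window, tail = base[:top_k], base[top_k:]
    window_ids = [d.doc_id for d in window]
    if len(window) < 2 or not budget.try_spend(cost_per_doc * len(window)):
        return base  # nothing to reorder, or budget breach: fail open
    try:
        proposed = _POOL.submit(rerank, query, window_ids).result(timeout=timeout_s)
    except Exception:
        return base  # bridge unavailable, provider error, or timeout: fail open
    # Accept the re-ranker output only if it is an exact permutation of the window.
    if (not isinstance(proposed, list)
            or not all(isinstance(i, str) for i in proposed)
            or len(proposed) != len(window_ids)
            or set(proposed) != set(window_ids)):
        return base  # malformed output: fail open
    by_id = {d.doc_id: d for d in window}
    return [by_id[i] for i in proposed] + list(tail)
```

In this sketch the budget is reserved before the call, so a breach is detected without contacting the provider. The timed-out worker is abandoned rather than awaited, so caller-visible latency stays bounded by the lexical stage plus `timeout_s`. Abandoned calls still occupy pool workers, and a production design would need to bound that as well.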
This document provides the system architecture, the fail-open state machine, the top-K and budget cost-bounding mechanics, a data model, a clean-room reference implementation that reduces the invention to practice, a worked end-to-end example, a security and failure-mode analysis, framework mappings, an evaluation methodology, and an enumerated set of inventive claims (one independent and fourteen dependent). It is published defensively to keep the technique freely practicable and to bar later patenting by others.
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Assuncao, Gustavo Matthew, "Hybrid RAG Ranking with Graceful LLM Fallback", Technical Disclosure Commons, (June 29, 2026)
https://www.tdcommons.org/dpubs_series/10583