Abstract
This document discloses a mechanism, at a security gateway that mediates an AI agent's use of tools served over the Model Context Protocol (MCP), for governing the *tool surface* an imported MCP server advertises — as opposed to only inspecting the traffic that passes through it. At the time an MCP server is imported and approved, the gateway, through its own gated broker path, fetches the server's complete advertised surface (its tool list and prompt list, including each tool's name, human-readable description, parameter schema, and declared return shape), computes a content hash over that complete surface, and records it in an append-only ledger **bound to (a) a per-server cryptographic workload identity (a SPIFFE identity assigned to the server at import) and (b) the identity of the operator who approved it.** The recorded hash is then *pinned*: every later advertisement of the surface and every tool invocation is checked against the pin. The disclosed contribution is not the bare act of hashing a manifest and re-checking it (which is known background); it is the combination of four co-operating properties. First, **provenance binding**: the pin is bound to the server's workload identity, and tool names are projected into the agent's capability catalogue qualified by that identity (`server::tool`), so a second server cannot shadow a first server's tool by re-advertising the same bare name. Second, **block-and-re-approve enforcement** rather than detect-and-log: a surface change does not merely raise an alert — it transitions the server to a *pending-re-approval* state that **blocks invocation** until an operator re-approves, converting post-approval mutation ("rug-pull") from something observed into something stopped. 
Third, **structural-diff-as-authority**: the re-approval trigger is computed from a *structural* diff of the advertised surface (a validation-rule-aware comparison of schemas and declared shapes, plus a comparison of description text), so that a *semantically meaningful* change — a new tool, a widened or re-typed parameter, a rewritten description, a changed return shape — forces re-approval, while a change with no authority-relevant effect can be classified as non-mutating, making the enforcement precise rather than brittle. Fourth, **import-time semantic-intent screening**: before an operator is asked to approve a surface, the advertised descriptions and schemas are passed through a semantic-intent classifier that flags injection-shaped or instruction-bearing intent embedded in the metadata, so that *day-one* poison (present at first import, which hashing alone cannot catch) is surfaced to the operator at approval time. The mechanism makes a malicious or compromised imported MCP server non-catastrophic with respect to its tool surface, complementing — and distinct from — runtime inspection of the tool's *results*.
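The pin, the identity-qualified naming, and the block-and-re-approve transition described above can be illustrated with a minimal sketch. Everything named here is hypothetical and chosen for illustration only: the function and class names (`surface_hash`, `qualify`, `structural_diff`, `PinnedServer`, `check_advertisement`, `invoke_allowed`), the state strings, the use of SHA-256 over key-sorted JSON as the content hash, and the assumption that the surface is a dictionary with `tools` and `prompts` lists whose entries carry a `name`. The append-only ledger, the operator workflow, the validation-rule-aware schema comparison, and the semantic-intent classifier are not modelled; the diff below is a plain field-by-field comparison standing in for the structural diff.

```python
import hashlib
import json
from dataclasses import dataclass


def surface_hash(surface: dict) -> str:
    # Assumption: SHA-256 over key-sorted, compact JSON stands in for the content hash.
    canonical = json.dumps(surface, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def qualify(server_id: str, tool_name: str) -> str:
    # Project a tool name into the catalogue as `server::tool`.
    return f"{server_id}::{tool_name}"


def _by_name(items: list) -> dict:
    return {item["name"]: item for item in items}


def structural_diff(old: dict, new: dict) -> list:
    # Simplified stand-in for the structural diff: reports added, removed,
    # and changed entries per section, ignoring the order of entries.
    changes = []
    for section in ("tools", "prompts"):
        before = _by_name(old.get(section, []))
        after = _by_name(new.get(section, []))
        for name in sorted(after.keys() - before.keys()):
            changes.append(f"{section}/{name}: added")
        for name in sorted(before.keys() - after.keys()):
            changes.append(f"{section}/{name}: removed")
        for name in sorted(before.keys() & after.keys()):
            for key in sorted(before[name].keys() | after[name].keys()):
                if before[name].get(key) != after[name].get(key):
                    changes.append(f"{section}/{name}: {key} changed")
    return changes


@dataclass
class PinnedServer:
    spiffe_id: str       # per-server workload identity assigned at import
    approver: str        # identity of the approving operator
    pinned_hash: str     # hash of the approved surface
    surface: dict        # the approved surface itself, kept for diffing
    state: str = "approved"


def check_advertisement(pin: PinnedServer, advertised: dict) -> list:
    # A matching hash is the fast path; on mismatch the diff decides.
    if surface_hash(advertised) == pin.pinned_hash:
        return []
    changes = structural_diff(pin.surface, advertised)
    if changes:
        pin.state = "pending-re-approval"  # blocks invocation, not just an alert
    return changes


def invoke_allowed(pin: PinnedServer, qualified_tool: str) -> bool:
    # Only identity-qualified names of approved tools on an approved server pass.
    if pin.state != "approved":
        return False
    allowed = {qualify(pin.spiffe_id, t["name"]) for t in pin.surface.get("tools", [])}
    return qualified_tool in allowed
```

In this sketch a hash mismatch alone does not block the server: a re-advertisement that only reorders the tool list changes the hash but yields an empty diff, so the server stays approved, whereas a rewritten description yields a non-empty diff and moves the server to the pending state. A bare tool name is never accepted, so a second server advertising the same bare name cannot stand in for the first.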
**Keywords:** Model Context Protocol; MCP security; tool poisoning; rug-pull; tool-surface mutation; manifest pinning; capability envelope; tool-surface hash; block-and-re-approve; structural diff; schema diff; semantically-meaningful change; SPIFFE identity; workload identity; provenance binding; tool-name shadowing; namespace qualification; append-only ledger; approval provenance; semantic-intent classifier; prompt injection in tool metadata; day-one poison; agentic AI security; security gateway; Open Policy Agent; OPA; re-approval workflow; capability projection.
Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Rosado, Tiago, "Provenance-Bound Capability-Envelope Pinning of an MCP Tool Surface with Block-and-Re-Approve Enforcement on Semantically-Meaningful Structural Change and Import-Time Semantic-Intent Screening", Technical Disclosure Commons, ()
https://www.tdcommons.org/dpubs_series/10604