LLM-in-the-loop scope generation: prompt caching, structured outputs, and the deterministic fallback ladder
John C. Thomas
Founder, BlueWave Projects
The AI scope generator in BlueWave Projects is the single most-used feature in the product. A contractor opens an iPhone, walks a room, drops the scan into the portal with a couple of photos and a note, and gets back a phase-by-phase scope of work — labor, materials, contingency, tax gross-up — in about 60 seconds.
Under the hood it's the most over-engineered feature in the whole system. Most of that work is *not* the LLM call. It's the deterministic rails around the LLM call. Here's the full pipeline, told without marketing.
The input: parametric RoomPlan, not pixels
iOS' RoomPlan returns a parametric JSON of the captured room — walls, openings, fixtures, objects, with positions and dimensions. Not a point cloud. Not a mesh. A structured tree of named primitives. That structure is the only reason scope generation is possible at AI-affordable cost.
The iOS app uploads the RoomPlan JSON export, a couple of photos, and the contractor's free-text note.
The server flattens RoomPlan into a text fixture: "Room: 12' 4" × 9' 8", 92 sqft. Wall A: window opening 4' 2" × 3' 0" centered. Door: 2' 8" interior, north wall." Photos get described by Claude with vision in a single pre-pass and the descriptions are cached.
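The flattening step can be sketched in a few lines. The input field names below (`width_m`, `floor_area_m2`, `openings`) are illustrative stand-ins, not the actual RoomPlan export schema; the point is the shape of the transform, parametric tree in, compact text fixture out.

```python
# Minimal sketch of the RoomPlan -> text-fixture flattening step.
# Field names are hypothetical, not the real RoomPlan export schema.

def meters_to_ft_in(m: float) -> str:
    """Render a metric length as feet and inches, e.g. 3.76 m -> 12' 4"."""
    total_in = round(m / 0.0254)
    ft, inch = divmod(total_in, 12)
    return f"{ft}' {inch}\""

def flatten_room(room: dict) -> str:
    """Turn a parametric room dict into the compact fixture the prompt uses."""
    sqft = round(room["floor_area_m2"] * 10.7639)
    lines = [
        f"Room: {meters_to_ft_in(room['width_m'])} × "
        f"{meters_to_ft_in(room['depth_m'])}, {sqft} sqft."
    ]
    for o in room.get("openings", []):
        lines.append(
            f"{o['wall']}: {o['kind']} opening "
            f"{meters_to_ft_in(o['width_m'])} × {meters_to_ft_in(o['height_m'])}."
        )
    return "\n".join(lines)

room = {
    "width_m": 3.76, "depth_m": 2.95, "floor_area_m2": 8.55,
    "openings": [
        {"wall": "Wall A", "kind": "window", "width_m": 1.27, "height_m": 0.91},
    ],
}
print(flatten_room(room))
```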
Prompt design: pinned context, fluid tail
The full system prompt is around 8K tokens and pinned with cache_control: ephemeral. It contains:
That structure means the stable prefix (≈ 7.8K tokens) hits the cache on every call; only the last ~300-500 tokens of per-job tail bill at full input rates. At today's Claude API pricing that brings the per-call input cost down by roughly 90%.
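The pinned/fluid split maps onto Anthropic's prompt-caching API roughly as below. The pinned context string, model name, and token budget are placeholders, not production values; what matters is the `cache_control` marker on the stable system block.

```python
# Sketch of the pinned/fluid split using Anthropic's prompt-caching API.
# PINNED_CONTEXT, the model name, and max_tokens are placeholders; the
# key point is the cache_control marker on the stable system block.

PINNED_CONTEXT = "<rubrics, pricing tables, worked examples -- the stable ~7.8K tokens>"

def build_request(room_fixture: str, contractor_note: str) -> dict:
    return {
        "model": "claude-sonnet-4-5",   # placeholder model name
        "max_tokens": 4096,
        "system": [
            {
                "type": "text",
                "text": PINNED_CONTEXT,
                # Everything up to and including this block is cached
                # across calls; only the per-job tail below bills at
                # full input rates.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [
            {
                "role": "user",
                "content": f"{room_fixture}\n\nContractor note: {contractor_note}",
            }
        ],
    }

req = build_request("Room: 12' 4\" × 9' 8\", 92 sqft.", "Full gut, mid-grade finish.")
```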
Structured outputs: Pydantic schemas, not regex
Every scope call uses Anthropic's tool-use / structured-output mode with a Pydantic v2 schema. The model gets a tool definition that exactly matches the schema:
```python
from enum import Enum
from pydantic import BaseModel

class Phase(str, Enum):
    demo = "demo"
    framing = "framing"
    electrical = "electrical"
    plumbing = "plumbing"
    finish = "finish"

class LineItem(BaseModel):
    phase: Phase
    description: str
    labor_low: float
    labor_high: float
    material_low: float
    material_high: float
    contingency_pct: float

class Scope(BaseModel):
    summary: str
    phases: list[Phase]
    items: list[LineItem]
    tax_gross_up_pct: float
```

The model can't return malformed JSON. It can't omit required fields. It can't return a phase that isn't in the enum. Pydantic validates on receipt, raises on mismatch, and the retry loop knows what to send back to the model.
This is the single biggest cost saving in the pipeline. Before structured outputs we lost ~8% of calls to bad JSON. Now it's effectively zero.
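Wiring a Pydantic model into Anthropic's tool-use mode looks roughly like the sketch below. The model classes are a trimmed copy of the ones above, and `emit_scope` is an illustrative tool name; `tool_choice` is what forces the model to answer through the tool rather than free-form text.

```python
# Sketch: a Pydantic model as an Anthropic tool definition. Trimmed
# model classes; "emit_scope" is a hypothetical tool name.
from enum import Enum
from pydantic import BaseModel, ValidationError

class Phase(str, Enum):
    demo = "demo"
    framing = "framing"
    electrical = "electrical"
    plumbing = "plumbing"
    finish = "finish"

class LineItem(BaseModel):
    phase: Phase
    description: str
    labor_low: float
    labor_high: float

class Scope(BaseModel):
    summary: str
    items: list[LineItem]

# The tool's input_schema is generated straight from the Pydantic model,
# so the schema the model sees and the validator can never drift apart.
tool = {
    "name": "emit_scope",
    "description": "Return the structured scope of work.",
    "input_schema": Scope.model_json_schema(),
}
tool_choice = {"type": "tool", "name": "emit_scope"}

def parse_scope(tool_use_input: dict) -> Scope:
    # Raises ValidationError on mismatch -- the signal the retry loop uses.
    return Scope.model_validate(tool_use_input)
```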
The deterministic fallback ladder
The first version of the scope generator was "call Claude, return the result." It was 95% magic and 5% disaster. The disasters were the part that got the bug reports.
Now there's a four-rung ladder:
Rung 1: Schema validation. Output passes Pydantic? Ship.
Rung 2: Retry with the validation error. Pydantic raised on field X? Send the error back to Claude with "fix this field, leave the rest alone" and try again. Maximum two retries.
Rung 3: Retry with a smaller context. The model can flake when the prompt is at the edge of its attention. Truncate the worked examples from 4 to 2, retry once.
Rung 4: Template + narrative inserts. If three model attempts have failed, fall back to a deterministic phase-skeleton (demo, framing, electrical, plumbing, finish) with template ranges *for this tenant's pricing context* and ask Claude only for the human-readable summary and per-phase rationale. Ranges come from data; prose comes from the model. Worst-case scope still ships.
In production we hit rung 1 on ~94% of calls, rung 2 on ~5%, rung 3 on <1%, and rung 4 on virtually nothing. The ladder exists so the failure mode is "less personalized scope," never "no scope, sorry, try again."
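As control flow, the four rungs reduce to a short function. `generate`, `validate`, and `template_scope` below are stand-ins for the Claude call, the Pydantic validation, and the deterministic skeleton; only the retry structure is the point.

```python
# Sketch of the four-rung fallback ladder. generate(), validate(), and
# template_scope() are stand-ins; validate() raises on schema mismatch
# the way Pydantic does.

def run_ladder(generate, validate, template_scope, prompt):
    last_error = None
    # Rung 1, then rung 2: up to two error-guided retries.
    for attempt in range(3):
        raw = generate(prompt, error=last_error)
        try:
            return validate(raw)
        except ValueError as e:
            last_error = str(e)   # feed the validation error back to the model
    # Rung 3: shrink the worked examples from 4 to 2 and retry once.
    try:
        return validate(generate(prompt, error=last_error, examples=2))
    except ValueError:
        pass
    # Rung 4: deterministic phase skeleton; worst-case scope still ships.
    return template_scope(prompt)
```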
Multi-tenant prompt isolation
Every call is scoped to a tenant. The tenant's pricing context is fetched fresh per call, the tenant_id is in the system prompt, and the output is written to a tenant-scoped table. There is no single shared cache key that could leak one tenant's pricing into another tenant's output. The cache key is (system_prompt_version, tenant_id), not just system_prompt_version.
This sounds paranoid. It's necessary. Hawaii has fewer than 5,000 active GCs. If a shared cache entry ever leaked one competitor's pricing context into another's scope, the number of people hurt would be small, but in a market that size it's a failure you'd never recover from.
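The isolation rule reduces to a one-line key scheme (the key format below is illustrative):

```python
# Sketch of the tenant-scoped cache key. Including tenant_id means a
# cache hit can only ever serve that tenant's own pricing context;
# bumping the prompt version invalidates stale entries.

def scope_cache_key(system_prompt_version: str, tenant_id: str) -> str:
    return f"scope:{system_prompt_version}:{tenant_id}"

# Two tenants on the same prompt version never share an entry.
print(scope_cache_key("v14", "tenant_a"))
print(scope_cache_key("v14", "tenant_b"))
```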
Cost + latency numbers in production
Today's averages over the last 1,000 calls:
A contractor's time billed at $120/hr would value an 8-minute saving at $16. The scope generator runs at ~$0.04. The economics are obscene in our favor and we sell the feature on time-saved, not cost-of-token.
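Spelled out, the arithmetic behind that claim:

```python
# Unit economics from the paragraph above: value of contractor time
# saved per scope vs. the ~$0.04 cost of generating it.
hourly_rate = 120.0      # $/hr billed
minutes_saved = 8
cost_per_call = 0.04     # $ per scope generation

value_of_time = hourly_rate * minutes_saved / 60   # dollars saved per scope
ratio = value_of_time / cost_per_call              # value-to-cost multiple
print(value_of_time, ratio)
```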
What I'd tell another engineer building this
The scope generator looks like magic on a 60-second demo. Most of what makes it work is the unmagical machinery on either side of the Claude call.