
Build It Yourself

Clone the repo, run the pipeline, pass the ship gate.

The reading is over. Now you build.

The build environment for this track lives in github.com/portofcams/bluewave-school — a separate repository with the FastAPI backend, the baseline RAG primitives, the seed corpus, and the automated ship gate. You clone it, set up a local environment, and run it against the fixtures.

Key Concept

Why a separate repo? Because the build environment is a real Python app with a real RAG pipeline — Chroma, embeddings, Claude calls. It does not belong inside a Next.js marketing/school site. The repo is the thing you actually run; these lessons are the thing you actually read.

What You Will Do

Five commands. If your environment is ready, this takes twenty minutes end to end.

1. Clone

```bash
git clone git@github.com:portofcams/bluewave-school.git
cd bluewave-school
```

2. Add Your Keys

```bash
cp .env.example .env
```

Edit .env and fill in:

  • ANTHROPIC_API_KEY — your Anthropic key for Claude Sonnet 4.6
  • VOYAGE_API_KEY or OPENAI_API_KEY — at least one embedding provider
  • ADMIN_PASSWORD — any string; this gates the /ask and /corpus pages
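A quick sanity check before running anything can save a confusing failure later. The following is a minimal sketch of validating those three keys; the function name and the exact required-key list are illustrative, not part of the repo:

```python
import os

REQUIRED = ["ANTHROPIC_API_KEY", "ADMIN_PASSWORD"]
EMBEDDING_PROVIDERS = ["VOYAGE_API_KEY", "OPENAI_API_KEY"]

def missing_keys(env: dict) -> list[str]:
    """Return the names of required keys that are absent or empty."""
    missing = [k for k in REQUIRED if not env.get(k)]
    # At least one embedding provider must be set, not both.
    if not any(env.get(k) for k in EMBEDDING_PROVIDERS):
        missing.append("VOYAGE_API_KEY or OPENAI_API_KEY")
    return missing
```

Running `missing_keys(dict(os.environ))` after loading `.env` tells you exactly which variable to go fill in.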

3. Set Up

```bash
scripts/setup.sh
```

This creates a Python 3.12 virtualenv, installs all dependencies (chromadb, anthropic, voyageai, fastapi, etc.), and runs npm install for the Astro frontend.

4. Ingest the Seed Corpus

```bash
scripts/seed-ingest.sh
```

This ingests the bundled demo corpus — a handful of markdown files covering Anthropic docs, stack docs, and Hawaii prevailing wage rules. You will query against this first, before switching to your own documents.
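Under the hood, ingestion amounts to splitting each file into chunks, embedding each chunk, and writing the vectors to Chroma. Here is a minimal sketch of the chunking step only; the real pipeline's chunk size, overlap, and splitting strategy may differ, and the function name is illustrative:

```python
def chunk_text(text: str, size: int = 800, overlap: int = 200) -> list[str]:
    """Split text into overlapping windows of roughly `size` characters."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # step forward, keeping `overlap` chars of context
    return chunks
```

The overlap matters: without it, a fact that straddles a chunk boundary is invisible to retrieval.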

5. Run the Ship Gate

```bash
scripts/ship-gate.sh
```

This runs the Module 1 ship gate against five fixture questions. You should see five PASS lines. If any fail, read the error carefully — the gate is loud on purpose.
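Conceptually, the gate is just a loop over fixtures: each fixture pairs a question with expected evidence, and the fixture passes if the answer contains it. A hedged sketch, assuming a JSONL schema with `question` and `expect_substring` fields (the real gate's schema and pass criteria may be richer):

```python
import json

def run_gate(fixtures_jsonl: str, ask) -> list[tuple[str, bool]]:
    """Run each fixture question through `ask` and check the expected substring."""
    results = []
    for line in fixtures_jsonl.splitlines():
        if not line.strip():
            continue
        fixture = json.loads(line)
        answer = ask(fixture["question"])
        passed = fixture["expect_substring"].lower() in answer.lower()
        results.append((fixture["question"], passed))
    return results
```

Substring checks are crude but deterministic, which is exactly what you want from a gate: no flaky LLM-as-judge in the critical path.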

Pro Tip

If the ship gate fails with "No embedding provider configured," your .env is not loaded. Check that source .env or set -a; source .env; set +a runs before the script — the provided scripts handle this automatically.

Browse It

Start the dev server:

```bash
scripts/dev.sh
```

The FastAPI app comes up on port 8010, the Astro frontend on port 4321.

  • Open http://localhost:4321/login and enter the ADMIN_PASSWORD you set
  • Navigate to /ask and ask: "What is the default TTL for prompt caching in the Anthropic API?"
  • Inspect the answer, the cited sources, and the metrics (retrieval_ms, generation_ms, cost_usd)

You just ran a full RAG pipeline end to end. Congratulations — this is the worst version of it you will ever ship.

Ingest Your Own Corpus

The seed corpus is a demo. Your own documents are the real thing.

```bash
python -m app.rag.cli ingest /path/to/your/docs --corpus mine
```

Then query against corpus=mine in the /ask page or via CLI:

```bash
python -m app.rag.cli ask "your question here" --corpus mine
```
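Baseline "ask" is embed the query, rank chunks by cosine similarity, and stuff the top-k into the prompt. The ranking step in pure Python, for clarity — the repo delegates this to Chroma, so treat this as a sketch of the idea rather than the actual code path:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k(query_vec, chunk_vecs, k=3):
    """Return indices of the k chunks most similar to the query."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

Everything that goes wrong in the boss challenge goes wrong inside these ten lines: if the right chunk is not in `top_k`, no amount of prompting fixes it.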

The Boss Challenge

Once the ship gate passes and you have ingested your own corpus, run the boss challenge:

```bash
python content/module-1-baseline/ship-gate.py \
  --learner-module reference.module_1 \
  --fixtures content/module-1-baseline/fixtures-boss.jsonl
```

The boss has ten adversarial questions across five categories: cross-chunk synthesis, definitional drift, multi-hop, needle-in-haystack, and negation. Baseline RAG typically scores between 30% and 50%. Write down your number. It is the scoreboard.
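A single percentage hides which failure modes dominate, so it is worth breaking the score down by category. A sketch of that breakdown, assuming each result records its category and pass/fail (the boss's actual output format may differ):

```python
from collections import defaultdict

def score_by_category(results):
    """results: list of {"category": str, "passed": bool}. Returns pass rate per category."""
    totals, passes = defaultdict(int), defaultdict(int)
    for r in results:
        totals[r["category"]] += 1
        passes[r["category"]] += int(r["passed"])
    return {cat: passes[cat] / totals[cat] for cat in totals}
```

A baseline that scores 0/2 on negation and 2/2 on needle-in-haystack tells you something a flat 40% never will.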

Watch Out

Do not skip the boss challenge. The score is the only thing that makes future modules honest. When Module 3 claims reranking improves retrieval, you check it against the exact same boss on the exact same corpus. No baseline number, no comparison, no learning.

If You Get Stuck

  • Ship gate output is the first thing to read — errors are structured
  • The data/costs.jsonl file shows every API call and its cost
  • Open a GitHub issue on the bluewave-school repo with the ship-gate output pasted in
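Because the costs log is line-delimited JSON, totaling your spend is a few lines of stdlib. This assumes each record carries a `cost_usd` field, as the filename suggests; the real record shape may include more fields:

```python
import json

def total_cost(jsonl_text: str) -> float:
    """Sum the cost_usd field across every JSONL record."""
    return sum(json.loads(line)["cost_usd"]
               for line in jsonl_text.splitlines() if line.strip())
```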

You Are Done With Module 1

Once your ship gate passes 5/5 and you have recorded a boss-challenge baseline, Module 1 is complete. Module 2 starts with: "your chunking is wrong in three specific ways."

Exercises

Quiz (+5 XP)

Why is the build environment in a separate repository instead of embedded in the school?

Quiz (+5 XP)

What does the Module 1 ship gate check?

Reflection (+15 XP)

After running the boss challenge against your own corpus, record your score and pick one of the ten questions the baseline got wrong. What category of failure was it (cross-chunk / drift / multi-hop / needle / negation) and why did the baseline fail?

Hint: Be specific. "Retrieval was bad" is not enough — was the right chunk missing from top-k, was it present but the LLM ignored it, did the query embed to the wrong region?