Why We Start Dumb
Why your first RAG should be the simplest thing that barely works, and why that is the whole point.
This track has one job. By the end of the six modules, you ship a Retrieval-Augmented Generation system over your own documents that is measurably better than the one you build in Module 1. The baseline has to exist before anything else, because it is the number you are going to beat.
So Module 1 is deliberately unimpressive.
RAG (Retrieval-Augmented Generation) is a pattern where an LLM answers questions by first retrieving relevant passages from your documents, then generating an answer grounded in what was retrieved. It is how you get accurate answers from a model about information the model was not trained on.
The Naive Version
Fixed-size chunks. One embedding model. Cosine top-k. Stuff everything into a Claude prompt and ask it to cite sources. No reranker. No query rewriter. No evaluation harness. No caching. Nothing that would make a good demo.
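The retrieval half of that naive pipeline fits in a few lines. This is a minimal sketch, not the track's reference implementation: the actual embedding calls (Voyage or OpenAI) and the Claude prompt are omitted, and the function names are illustrative.

```python
import math

def chunk(text, size=500, overlap=50):
    """Fixed-size character chunks with a small overlap. No structure awareness."""
    step = size - overlap
    pieces = []
    for start in range(0, len(text), step):
        piece = text[start:start + size]
        if piece.strip():
            pieces.append(piece)
    return pieces

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, chunk_vecs, k=5):
    """Indices of the k chunks most similar to the query. That's the whole retriever."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

Everything a later module adds (hybrid search, reranking, query rewriting) replaces or wraps one of these three functions.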
You will watch this RAG give a confidently wrong answer before the hour is out. That is the lesson. Every later module is motivated by one of the failures you see here. Skip the baseline and you will spend the rest of the track adding techniques without knowing which problem each one solves.
The baseline will be wrong a lot. That is by design. If your baseline is already "good enough," you will never feel the forces that motivate chunking strategies, rerankers, query rewriting, and proper evals. The track falls apart without a crappy starting point.
What You Will Ship By the End
By Module 6, you will have:
- A working RAG pipeline over your own corpus
- Documented retrieval metrics (Recall@5, MRR, judge score) for five different configurations
- An evaluation harness that regression-tests future changes
- A cost log showing every dollar spent, per call
- A deployment running behind a reverse proxy with observability
Module 1 will have: a pipeline that answers some questions correctly. That is it.
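Recall@5 and MRR, mentioned in the list above, are standard retrieval metrics. A sketch of how they are computed over a small eval set, assuming each question has one labeled relevant chunk id (the track's harness may label more than one):

```python
def recall_at_k(ranked_ids, relevant_id, k=5):
    """1.0 if the relevant chunk appears in the top k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0

def mrr(all_ranked, all_relevant):
    """Mean reciprocal rank of the first relevant hit, averaged over queries.
    A miss contributes 0 to the mean."""
    total = 0.0
    for ranked_ids, relevant_id in zip(all_ranked, all_relevant):
        if relevant_id in ranked_ids:
            total += 1.0 / (ranked_ids.index(relevant_id) + 1)
    return total / len(all_ranked)
```

These two numbers, plus the judge score, are what you log for each of the five configurations so later modules have something concrete to beat.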
What Makes This Track Different
Most AI courses teach RAG as a recipe. You read about chunking, embeddings, and rerankers, then you are supposed to believe they matter.
This track teaches RAG as a failure investigation. You build the simplest thing, run it against a boss challenge designed to break it, and feel the failures. Then you add each technique — chunking strategy, hybrid search, reranking, query transformation, evals — because a specific failure forced you to.
Reading about a technique and building the technique are not the same learning event. The reading tells you what exists. Building tells you when to reach for it. This track skips the first and insists on the second.
What You Need Before You Start
- Comfort reading Python and running it locally
- A corpus of your own documents you actually want to query (anything .md or .txt to start)
- API keys for Anthropic (generation) and either Voyage or OpenAI (embeddings)
- An hour. Just one, to get the baseline running.
The build environment for this track lives in a separate repository: github.com/portofcams/bluewave-school. You clone it locally, run the pipeline against your own documents, and progress through modules by passing automated ship gates. Each lesson here walks you through what to build; the repo is where you actually build it.
The Boss Challenge
At the end of Module 1, your baseline will face a boss challenge: ten adversarial questions designed to break the naive pipeline. Cross-chunk synthesis. Definitional drift. Multi-hop reasoning. Needle-in-haystack. Negation.
Your baseline should score below fifty percent on the boss. That number becomes the scoreboard every future module has to beat.
Ready to build the worst RAG you will ever be proud of?
Exercises
1. What does RAG stand for, and what is the core idea?
2. Why is the Module 1 RAG deliberately unimpressive?
3. Pick a topic you know well — your job, a hobby, a field you have read deeply in. If you built a RAG over documents in that topic, what are two questions where the baseline would probably get the right answer, and two where it would probably fail?

Hint: Think about whether the answer lives in one obvious place (baseline-friendly) or requires synthesizing across multiple sources or tracking jargon drift (baseline-hostile).