
How Large Language Models Work

Peek under the hood of ChatGPT and Claude — no PhD required.


You don't need to be a data scientist to understand LLMs. Here's the simplified version.

Training: Reading the Internet

An LLM is trained by reading billions of pages of text — books, websites, articles, code, conversations. During training, it learns patterns: what words tend to follow other words, how sentences are structured, what facts are commonly stated.

It doesn't memorize pages verbatim. Instead, it builds a statistical model of language — a compressed understanding of how humans communicate.
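A toy illustration of "learning what tends to follow what": the bigram counter below is nothing like a real transformer, but it shows the basic idea of building statistics from text instead of memorizing it. The corpus and words here are invented for the example.

```python
from collections import Counter, defaultdict

# Toy "training": count which word follows which in a tiny corpus.
corpus = "the cat sat on the mat the cat slept on the rug".split()

following = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    following[word][nxt] += 1

def predict_next(word):
    # Return the most common continuation seen during "training".
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # → cat  ("cat" follows "the" most often)
```

A real LLM does something far richer, with neural networks and context spanning thousands of tokens, but the underlying currency is the same: statistical patterns extracted from text.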

The Transformer Architecture

The key innovation is called attention. When processing your prompt, the model pays "attention" to which words relate to other words. In the sentence "The cat sat on the mat because it was tired," the model learns that "it" refers to "the cat," not "the mat."

This ability to track relationships across long passages is what makes modern AI so capable.
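The core attention computation can be sketched in a few lines of NumPy. The token vectors below are random stand-ins, but the mechanics are real: each token scores its relevance to every other token, the scores are normalized into weights, and the output blends information according to those weights.

```python
import numpy as np

# Minimal sketch of scaled dot-product attention, the heart of a transformer.
def attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # how much each token relates to each other
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V, weights                     # blend token values by relevance

rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(3, 4))  # 3 tokens, each a 4-dimensional vector
out, weights = attention(Q, K, V)
print(weights.round(2))  # row i: how much token i "attends" to every token
```

In the sentence from the example above, the row for "it" would put high weight on "the cat" and low weight on "the mat".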

How It Generates Responses

When you send a message, the AI:

  1. Tokenizes your input (breaks it into pieces called tokens)
  2. Processes the tokens through dozens of layers of the neural network
  3. Predicts the most likely next token
  4. Repeats — generating one token at a time until the response is complete

Each token generation considers the entire conversation so far. That's why AI can stay on topic across long chats.
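The four-step loop above can be sketched directly. The tokenizer and model here are toy stand-ins invented for the example (real systems use neural networks and learned vocabularies of tens of thousands of tokens), but the shape of the loop is faithful: encode, predict, append, repeat.

```python
# Toy stand-ins for a tokenizer and model, purely for illustration.
class ToyTokenizer:
    vocab = ["<eos>", "hello", "world"]
    eos_id = 0
    def encode(self, text):
        return [self.vocab.index(w) for w in text.split()]
    def decode(self, tokens):
        return " ".join(self.vocab[t] for t in tokens)

def toy_model(tokens):
    # Fake "next-token probabilities": after "hello", predict "world";
    # otherwise predict end-of-sequence.
    return [0.0, 0.0, 1.0] if tokens[-1] == 1 else [1.0, 0.0, 0.0]

def generate(model, tokenizer, prompt, max_tokens=10):
    tokens = tokenizer.encode(prompt)                 # 1. tokenize the input
    for _ in range(max_tokens):
        probs = model(tokens)                         # 2. run the network over the whole context
        next_token = max(range(len(probs)), key=probs.__getitem__)  # 3. most likely token
        tokens.append(next_token)                     # 4. append and repeat
        if next_token == tokenizer.eos_id:            # stop at end-of-sequence
            break
    return tokenizer.decode(tokens)

print(generate(toy_model, ToyTokenizer(), "hello"))  # → hello world <eos>
```

Note that step 2 reruns over the entire token list each time — that is why the full conversation influences every new token.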

Important Mental Model

The AI doesn't "know" things the way you do. It has learned statistical associations. When it says "Paris is the capital of France," it's not recalling a fact from memory — it's producing the tokens that most naturally follow the pattern of the conversation.

This is why AI can be confidently wrong (hallucination) — the statistically likely response isn't always the correct one.
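A small numerical sketch of why this happens. The scores below are invented for illustration, but they show the mechanism: the model converts scores into probabilities (a softmax) and favors the highest one, and a fluent-sounding wrong answer can outscore the right one if it appeared more often in training text.

```python
import math

# Invented scores for candidate next tokens after
# "The capital of Australia is". Sydney is wrong but common in text.
logits = {"Sydney": 2.1, "Canberra": 1.9, "Melbourne": 0.5}

# Softmax: turn scores into probabilities that sum to 1.
total = sum(math.exp(v) for v in logits.values())
probs = {tok: math.exp(v) / total for tok, v in logits.items()}

print(max(probs, key=probs.get))  # → Sydney: the most *likely* token, not the most *correct*
```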

Parameters and Model Size

  • GPT-4: Estimated ~1.7 trillion parameters
  • Claude: Undisclosed but similar scale
  • Parameters are the "knobs" the model adjusts during training

More parameters generally mean greater capability, but also higher costs to train and run.
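To make "parameters" concrete: they are just the learned weights, and you can count them. The layer sizes below are made up for the example, but the arithmetic is how real counts add up — and why trillion-parameter models are so expensive.

```python
# Parameters of one fully connected layer: a weight per input-output
# pair, plus one bias per output.
def dense_params(inputs, outputs):
    return inputs * outputs + outputs

# A tiny two-layer block: 512 -> 2048 -> 512 (sizes chosen for illustration).
total = dense_params(512, 2048) + dense_params(2048, 512)
print(f"{total:,}")  # → 2,099,712 — about 2.1 million weights for two small layers
```

Stack hundreds of such blocks at much larger widths and the totals climb into the billions and beyond.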

Exercises

Quiz (+5 XP)

What is a "token" in the context of LLMs?

Quiz (+5 XP)

Why do AI models sometimes "hallucinate" (state incorrect things confidently)?

Prompt Challenge (+15 XP)

Ask an AI to explain how it generates text, then compare its answer to what you learned in this lesson. Did it get anything wrong or oversimplified?

Hint: Pay attention to whether it claims to "understand" or "think" — those are simplifications.