How Large Language Models Work
Peek under the hood of ChatGPT and Claude — no PhD required.
You don't need to be a data scientist to understand LLMs. Here's the simplified version.
Training: Reading the Internet
An LLM is trained by reading billions of pages of text — books, websites, articles, code, conversations. During training, it learns patterns: what words tend to follow other words, how sentences are structured, what facts are commonly stated.
It doesn't memorize pages verbatim. Instead, it builds a statistical model of language — a compressed understanding of how humans communicate.
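The idea of "learning what words tend to follow other words" can be made concrete with a toy sketch. This is a hypothetical miniature — real LLMs learn from billions of pages and use neural networks, not lookup tables — but the principle of counting patterns rather than storing pages is the same:

```python
from collections import Counter, defaultdict

# A hypothetical twelve-word "corpus" standing in for billions of pages.
corpus = "the cat sat on the mat the cat ran on the grass".split()

# Count which word follows which: a crude statistical model of language.
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

# The model doesn't store the corpus verbatim; it stores the patterns.
print(following["the"].most_common(1))  # [('cat', 2)] — "cat" follows "the" most often
```

Notice that once the counts exist, the original text is gone: only the compressed pattern of "what follows what" remains, which is the sense in which training builds a statistical model rather than a copy.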
The Transformer Architecture
The key innovation is called attention. When processing your prompt, the model pays "attention" to how each word relates to the others. In the sentence "The cat sat on the mat because it was tired," the model learns that "it" refers to "the cat," not "the mat."
This ability to track relationships across long passages is what makes modern AI so capable.
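A heavily simplified sketch of attention, under stated assumptions: suppose each word has a vector of numbers capturing its "meaning" (real models learn vectors with hundreds of dimensions; the 2-d vectors here are hand-picked for illustration). A word attends to others by scoring them with dot products, then turning the scores into weights:

```python
import math

def softmax(scores):
    # Convert raw scores into weights that sum to 1.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 2-d "meaning" vectors (real models learn these).
vectors = {"cat": [1.0, 0.1], "mat": [0.2, 1.0], "it": [0.9, 0.2]}

# "it" scores each candidate word by dot product with its own vector.
query = vectors["it"]
candidates = ("cat", "mat")
scores = [sum(q * k for q, k in zip(query, vectors[w])) for w in candidates]
weights = softmax(scores)
print(dict(zip(candidates, weights)))  # "cat" gets the larger weight
```

Because the vector for "it" is closer to "cat" than to "mat," most of the attention weight lands on "cat" — a miniature version of the pronoun resolution described above.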
How It Generates Responses
When you send a message, the AI:
1. Tokenizes your input (breaks it into pieces called tokens)
2. Processes the tokens through dozens of layers of the neural network
3. Predicts the most likely next token
4. Repeats — generating one token at a time until the response is complete
Each token generation considers the entire conversation so far. That's why AI can stay on topic across long chats.
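The loop above can be sketched in a few lines. The probability table here is hypothetical — a real model computes next-token probabilities with a neural network, not a lookup — but the "predict, append, repeat" structure is the same:

```python
# Hypothetical next-token probabilities (a real model computes these
# with a neural network conditioned on the whole conversation so far).
next_token_probs = {
    "Paris": {"is": 1.0},
    "is": {"the": 1.0},
    "the": {"capital": 0.7, "city": 0.3},
    "capital": {"of": 1.0},
    "of": {"France": 1.0},
    "city": {"of": 1.0},
}

def generate(prompt_tokens, max_tokens=10):
    tokens = list(prompt_tokens)
    for _ in range(max_tokens):
        options = next_token_probs.get(tokens[-1])
        if options is None:
            break  # no prediction available: stop, like an end-of-text token
        # Greedy decoding: always pick the most likely next token.
        tokens.append(max(options, key=options.get))
    return tokens

print(" ".join(generate(["Paris"])))  # Paris is the capital of France
```

Real systems usually sample from the probabilities instead of always taking the top choice (greedy decoding), which is why the same prompt can produce different responses on different runs.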
Important Mental Model
The AI doesn't "know" things the way you do. It has learned statistical associations. When it says "Paris is the capital of France," it's not recalling a fact from memory — it's producing the tokens that most naturally follow the pattern of the conversation.
This is why AI can be confidently wrong (hallucination) — the statistically likely response isn't always the correct one.
Parameters and Model Size
- GPT-4: Estimated ~1.7 trillion parameters
- Claude: Undisclosed, but a similar scale
- Parameters are the "knobs" the model adjusts during training
More parameters generally mean more capability, but also more cost to run.
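A back-of-envelope calculation makes the cost concrete. Assuming 16-bit precision (2 bytes per parameter, a common inference format), the estimated ~1.7 trillion parameters above would need:

```python
# Rough memory needed just to store the weights of a ~1.7T-parameter model.
params = 1.7e12          # estimated parameter count (from the list above)
bytes_per_param = 2      # 16-bit precision, a common inference format
terabytes = params * bytes_per_param / 1e12
print(f"~{terabytes:.1f} TB of weights")  # ~3.4 TB
```

That is why frontier models run across many GPUs in data centers rather than on a laptop.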
Exercises
1. What is a "token" in the context of LLMs?
2. Why do AI models sometimes "hallucinate" (state incorrect things confidently)?
3. Ask an AI to explain how it generates text, then compare its answer to what you learned in this lesson. Did it get anything wrong or oversimplified?
Hint: Pay attention to whether it claims to "understand" or "think" — those are simplifications.