Tokens, Context Windows & Temperature
The three numbers that control everything about AI behavior.
These three concepts govern cost, memory, and randomness in every AI interaction. Understanding them makes you a dramatically better AI user.
Tokens: The Currency of AI
A token is a piece of text — roughly 3/4 of a word in English. The sentence "I love artificial intelligence" is about 5 tokens.
Why tokens matter:
- Cost: API pricing is per token (input + output)
- Speed: More tokens = slower responses
- Limits: Every model has a maximum token limit
Quick estimates:
- 1 token ≈ 4 characters in English
- 100 tokens ≈ 75 words
- 1,000 tokens ≈ 750 words (about 1.5 pages)
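The rules of thumb above are easy to turn into quick back-of-the-envelope helpers. This is a minimal sketch using only the approximations from this section (4 characters per token, 75 words per 100 tokens); the function names are illustrative, and real tokenizers will give somewhat different counts:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token in English."""
    return max(1, round(len(text) / 4))

def estimate_tokens_from_words(word_count: int) -> int:
    """Rough token estimate: ~75 words per 100 tokens."""
    return round(word_count * 100 / 75)
```

For precise counts you would use the model's actual tokenizer, but these estimates are close enough for budgeting prompts against a context window.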
Context Window: AI's Working Memory
The context window is the total number of tokens the AI can "see" at once — your prompt plus its response combined.
| Model | Context Window |
|---|---|
| GPT-3.5 | 16K tokens (~12K words) |
| GPT-4o | 128K tokens (~96K words) |
| Claude 3.5 | 200K tokens (~150K words) |
| Gemini 1.5 | Up to 1M tokens |
When a conversation exceeds the context window, the earliest messages are typically dropped or truncated, so the AI can no longer see the beginning of your conversation. It's like talking to someone who forgets what you said 20 minutes ago.
Pro tip: Start new conversations for new topics. Don't let context get polluted.
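The "dropping the oldest messages" behavior can be sketched in a few lines. This is a simplified illustration, not how any particular chat product implements it; the function names and the token-counting callback are assumptions for the example:

```python
def trim_history(messages, max_tokens, count_tokens):
    """Keep only the most recent messages that fit in the token budget.

    messages: list of message strings, oldest first.
    count_tokens: callable returning the token count of one message.
    """
    kept = []
    total = 0
    # Walk from newest to oldest, stopping once the budget is full.
    for msg in reversed(messages):
        tokens = count_tokens(msg)
        if total + tokens > max_tokens:
            break
        kept.append(msg)
        total += tokens
    return list(reversed(kept))  # restore oldest-first order
```

Notice that trimming always sacrifices the oldest messages first, which is exactly why long conversations "forget" their beginning.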
Temperature: Creativity vs Precision
Temperature controls randomness in responses (0.0 to 2.0):
- 0.0: Near-deterministic — the same input gives (almost always) the same output. Best for facts, code, data.
- 0.7: Balanced — the default for most tools. Good for general use.
- 1.0+: Creative — more varied and unexpected. Good for brainstorming, fiction.
- 2.0: Maximum chaos — often incoherent.
Most chat interfaces don't let you set temperature directly, but API users and developers should know this.
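Under the hood, temperature divides the model's raw scores (logits) before they are turned into probabilities: low temperature sharpens the distribution toward the top choice, high temperature flattens it. A minimal sketch of that math, with the zero-temperature case handled as greedy argmax (function name and logit values are illustrative):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities, scaled by temperature.

    Lower temperature concentrates probability on the highest logit;
    higher temperature spreads it out more evenly.
    """
    if temperature <= 0:
        # Temperature 0: greedy — all probability on the argmax.
        probs = [0.0] * len(logits)
        probs[max(range(len(logits)), key=lambda i: logits[i])] = 1.0
        return probs
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

Running this on the same logits at temperatures 0.5 and 1.5 shows the effect: the top token's probability is much higher at 0.5, which is why low temperatures feel precise and high temperatures feel random.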
Exercises
1. Approximately how many words can fit in Claude's 200K token context window?
2. What temperature setting would you use for generating a creative short story?
3. When you exceed the context window, the AI loses the ability to see the _______ of your conversation.