Tokens, Context Windows & Temperature
The three numbers that control everything about AI behavior.
These three concepts govern cost, memory, and randomness in every AI interaction. Understanding them makes you a dramatically better AI user.
Tokens: The Currency of AI
A token is a piece of text — roughly 3/4 of a word in English. The sentence "I love artificial intelligence" is about 5 tokens.
Why tokens matter:
- Cost: API pricing is per token (input + output)
- Speed: More tokens = slower responses
- Limits: Every model has a maximum token limit
Quick estimates:
- 1 token ≈ 4 characters in English
- 100 tokens ≈ 75 words
- 1,000 tokens ≈ 750 words (about 1.5 pages)
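The rules of thumb above are easy to turn into quick back-of-the-envelope helpers. This is a minimal sketch using only the approximations from this section (4 characters per token, 75 words per 100 tokens); the function names are illustrative, and real tokenizers will give somewhat different counts:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token in English."""
    return max(1, round(len(text) / 4))

def estimate_tokens_from_words(word_count: int) -> int:
    """Rough token estimate: ~75 words per 100 tokens."""
    return round(word_count * 100 / 75)
```

For precise counts you would use the model's actual tokenizer, but these estimates are close enough for budgeting prompts against a context window.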
Context Window: AI's Working Memory
The context window is the total number of tokens the AI can "see" at once — your prompt plus its response combined.
| Model | Context Window |
|---|---|
| GPT-3.5 | 16K tokens (~12K words) |
| GPT-4o | 128K tokens (~96K words) |
| Claude 3.5 | 200K tokens (~150K words) |
| Gemini 1.5 | Up to 1M tokens |
When a conversation exceeds the context window, the earliest messages are typically dropped or truncated, so the AI can no longer see the beginning of your conversation. It's like talking to someone who forgets what you said 20 minutes ago.
Pro tip: Start new conversations for new topics. Don't let context get polluted.
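The "dropping the oldest messages" behavior can be sketched in a few lines. This is a simplified illustration, not how any particular chat product implements it; the function names and the token-counting callback are assumptions for the example:

```python
def trim_history(messages, max_tokens, count_tokens):
    """Keep only the most recent messages that fit in the token budget.

    messages: list of message strings, oldest first.
    count_tokens: callable returning the token count of one message.
    """
    kept = []
    total = 0
    # Walk from newest to oldest, stopping once the budget is full.
    for msg in reversed(messages):
        tokens = count_tokens(msg)
        if total + tokens > max_tokens:
            break
        kept.append(msg)
        total += tokens
    return list(reversed(kept))  # restore oldest-first order
```

Notice that trimming always sacrifices the oldest messages first, which is exactly why long conversations "forget" their beginning.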
Temperature: Creativity vs Precision
Temperature controls randomness in responses (0.0 to 2.0):
- 0.0: Near-deterministic — the same input gives (almost always) the same output. Best for facts, code, data.
- 0.7: Balanced — the default for most tools. Good for general use.
- 1.0+: Creative — more varied and unexpected. Good for brainstorming, fiction.
- 2.0: Maximum chaos — often incoherent.
Most chat interfaces don't let you set temperature directly, but API users and developers should know this.
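Under the hood, temperature divides the model's raw scores (logits) before they are turned into probabilities: low temperature sharpens the distribution toward the top choice, high temperature flattens it. A minimal sketch of that math, with the zero-temperature case handled as greedy argmax (function name and logit values are illustrative):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities, scaled by temperature.

    Lower temperature concentrates probability on the highest logit;
    higher temperature spreads it out more evenly.
    """
    if temperature <= 0:
        # Temperature 0: greedy — all probability on the argmax.
        probs = [0.0] * len(logits)
        probs[max(range(len(logits)), key=lambda i: logits[i])] = 1.0
        return probs
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

Running this on the same logits at temperatures 0.5 and 1.5 shows the effect: the top token's probability is much higher at 0.5, which is why low temperatures feel precise and high temperatures feel random.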
Exercises
1. Approximately how many words can fit in Claude's 200K token context window?
2. What temperature setting would you use for generating a creative short story?
3. When you exceed the context window, the AI loses the ability to see the _______ of your conversation.