What is a large language model, really?
Strip the hype. Learn what an LLM actually does, token by token.
When people say "AI" in 2026, they almost always mean a large language model. It is not magic. It is not a brain. It is a function that takes a sequence of tokens and produces a probability distribution over the next token. That's it. Everything else — the sparkling UX, the persuasive answers, the spooky moments of competence — is built on top of that single operation.
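That single operation can be sketched in a few lines. This is a toy illustration, not a real model: the vocabulary, the logits, and the prompt are all made up. The only faithful part is the last step, where raw scores (one per vocabulary token) become a probability distribution via softmax.

```python
import math

def softmax(logits):
    """Turn raw per-token scores into a probability distribution."""
    m = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up 3-token vocabulary and made-up scores for "The cat sat on the"
vocab = ["mat", "dog", "moon"]
logits = [3.1, 0.2, -1.0]

probs = softmax(logits)
for token, p in sorted(zip(vocab, probs), key=lambda pair: -pair[1]):
    print(f"{token}: {p:.3f}")
```

A real LLM does exactly this, just with a vocabulary of tens of thousands of tokens and logits produced by billions of parameters. Sampling from that distribution, appending the chosen token, and running the function again is all "generation" is.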
The three things worth internalizing
- Tokens, not words. Language models see text as pieces. "Scholarus" might be three tokens. Emoji and code get tokenized differently. Your bill and your latency are measured in tokens.
- Next-token prediction scales surprisingly far. Train a big enough network on enough text to minimize next-token loss and you get behaviors that look like reasoning, summarization, translation — without anyone programming them in.
- Context is everything. The model has no memory between calls. Anything it "knows" about your task has to be in the prompt, the system message, or a tool result you feed back in.
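The last point is worth seeing concretely. Because the model is stateless, "memory" across a conversation is just the caller replaying prior turns in every request. The sketch below uses a hypothetical message format loosely modeled on common chat APIs; the function and field names are illustrative, not any specific vendor's API.

```python
# Hypothetical sketch of context assembly. The model call itself is omitted;
# the point is that everything the model "remembers" is rebuilt per request.
def build_context(system_message, history, user_message):
    messages = [{"role": "system", "content": system_message}]
    messages.extend(history)  # prior turns, replayed verbatim on every call
    messages.append({"role": "user", "content": user_message})
    return messages

history = [
    {"role": "user", "content": "My name is Ada."},
    {"role": "assistant", "content": "Nice to meet you, Ada!"},
]

context = build_context(
    "You are a helpful assistant.",
    history,
    "What's my name?",
)
# The model can answer "Ada" only because the earlier turn is inside this request.
```

Drop the `history` list from the call and the model has no way to answer; nothing persists server-side between invocations.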
What this means for you
If you build with LLMs, most of your craft is about shaping the input so the model's statistical best guess lines up with what you actually want. That's why prompt engineering isn't going away, even as models get better — the interface between your intent and the model's behavior is always there.
The next lesson zooms out: how does a model like this get trained in the first place, and why does that matter when you're choosing between GPT, Claude, Gemini, or an open-source checkpoint?