How LLMs Actually Work

A Visual Guide for Beginners

1. It's Not a Brain, It's a Predictor

The Concept

(Figure: the sentence "The bird flew in the ___" with predicted next words — sky 85%, soup 10%, car 5%)
"Imagine you’re texting a friend and you type 'I’m going to the...' Your phone suggests 'store,' 'movies,' or 'gym.' That is exactly what a Large Language Model does. It doesn't 'know' facts; it calculates the probability of what word comes next based on billions of examples."
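The idea above can be sketched in a few lines. This is a toy illustration, not a real model: the counts are made up to mirror the "bird flew in the ___" example, standing in for patterns a model would learn from billions of examples.

```python
# Toy next-word predictor: turn made-up counts into probabilities
# and pick the most likely word (hypothetical numbers, not real data).
counts = {"sky": 85, "soup": 10, "car": 5}

total = sum(counts.values())
probabilities = {word: n / total for word, n in counts.items()}

best = max(probabilities, key=probabilities.get)
print(best, probabilities[best])  # sky 0.85
```

A real LLM does essentially this, but over tens of thousands of possible tokens, with probabilities computed by a neural network instead of simple counts.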

2. Words Become Numbers

Tokenization & Training

Tokenization
(Figure: tokenization maps words to numbers — "Chat" → 8921, "GPT" → 330, "learns" → 1045 — then training plays fill-in-the-blank, e.g. "Ocean is ___", over internet data: books, websites, code, articles)
"Computers can't read. First, they chop text into 'Tokens' and turn them into numbers. Then, they play a game of 'Fill in the Blank' billions of times on text from the internet (books, Wikipedia, code) until they master human grammar and logic."
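A minimal sketch of the first step, turning text into numbers. The vocabulary and IDs here are made up to match the figure; real tokenizers (such as BPE) learn sub-word pieces from data rather than splitting on spaces.

```python
# Toy tokenizer: map each word to a number via a lookup table.
# Vocabulary and IDs are hypothetical, echoing the figure above.
vocab = {"Chat": 8921, "GPT": 330, "learns": 1045}

def tokenize(text):
    """Split on spaces and look up each piece's ID."""
    return [vocab[piece] for piece in text.split()]

print(tokenize("Chat GPT learns"))  # [8921, 330, 1045]
```

The model never sees letters, only these ID sequences; training then means predicting the missing ID in billions of such sequences.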

3. The Secret Sauce: Attention

Context Awareness

(Figure: the sentence "The animal didn't cross the street because it was tired," with "it" linked back to "animal")
"If I say 'The animal didn't cross the street because **it** was too tired'... what is 'it'? The street? No. The model uses 'Attention' to look back at the whole sentence, spot the word 'tired,' and realize 'it' must be the animal. This is how it stays on topic."
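The core of attention is a softmax over similarity scores: each candidate word gets a weight, and high weights mean "pay attention here." The scores below are invented for illustration; in a real model they come from learned vector comparisons, not hand-picked numbers.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for what "it" refers to, nudged by the word "tired":
words = ["animal", "street", "tired"]
scores = [4.0, 1.0, 3.5]

weights = softmax(scores)
best_word, _ = max(zip(words, weights), key=lambda pair: pair[1])
print(best_word)  # animal
```

With these made-up scores, "animal" gets the largest attention weight, which is how the model resolves what "it" refers to.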

4. Human Tuning (RLHF)

Making it Helpful

(Figure: for the prompt "Write a reply," Option A "DO IT YOURSELF!" gets a thumbs down 👎 and Option B "Here is the info." gets a thumbs up 👍)
"Why isn't the model rude or crazy like the raw internet? Humans graded its homework. We gave thumbs up to helpful answers and thumbs down to toxic ones. This feedback loop (RLHF) teaches the model to be polite and helpful."
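The feedback loop can be sketched as a scoreboard. This is a heavy simplification of RLHF: real systems train a reward model on human comparisons and then optimize the LLM against it, but the thumbs-up/thumbs-down intuition is the same. All names and numbers below are illustrative.

```python
# Toy preference loop: human graders compare two answers,
# and the preferred one gets its score nudged up (hypothetical data).
scores = {"DO IT YOURSELF!": 0.0, "Here is the info.": 0.0}

# Each pair is (preferred answer, rejected answer):
feedback = [("Here is the info.", "DO IT YOURSELF!")] * 3

for preferred, rejected in feedback:
    scores[preferred] += 1.0   # thumbs up 👍
    scores[rejected] -= 1.0    # thumbs down 👎

best = max(scores, key=scores.get)
print(best)  # Here is the info.
```

Repeated over many prompts and graders, this kind of signal is what steers the model toward polite, helpful replies.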