How LLMs Actually Work

A Visual Guide for Beginners

1. It's Not a Brain, It's a Predictor

The Concept

(Figure: the sentence "The bird flew in the ___" with predicted next words — sky 85%, soup 10%, car 5%)
"Imagine you’re texting a friend and you type 'I’m going to the...' Your phone suggests 'store,' 'movies,' or 'gym.' That is exactly what a Large Language Model does. It doesn't 'know' facts; it calculates the probability of what word comes next based on billions of examples."
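The idea above can be sketched in a few lines. This is a toy illustration, not a real model: the counts are made up to mirror the "bird flew in the ___" example, standing in for patterns a model would learn from billions of examples.

```python
# Toy next-word predictor: turn made-up counts into probabilities
# and pick the most likely word (hypothetical numbers, not real data).
counts = {"sky": 85, "soup": 10, "car": 5}

total = sum(counts.values())
probabilities = {word: n / total for word, n in counts.items()}

best = max(probabilities, key=probabilities.get)
print(best, probabilities[best])  # sky 0.85
```

A real LLM does essentially this, but over tens of thousands of possible tokens, with probabilities computed by a neural network instead of simple counts.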

2. Words Become Numbers

Tokenization & Training

Tokenization
(Figure: tokenization maps words to numbers — "Chat" → 8921, "GPT" → 330, "learns" → 1045 — then training plays fill-in-the-blank, e.g. "Ocean is ___", over internet data: books, websites, code, articles)
"Computers can't read. First, they chop text into 'Tokens' and turn them into numbers. Then, they play a game of 'Fill in the Blank' billions of times on text from the internet (books, Wikipedia, code) until they master human grammar and logic."
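A minimal sketch of the first step, turning text into numbers. The vocabulary and IDs here are made up to match the figure; real tokenizers (such as BPE) learn sub-word pieces from data rather than splitting on spaces.

```python
# Toy tokenizer: map each word to a number via a lookup table.
# Vocabulary and IDs are hypothetical, echoing the figure above.
vocab = {"Chat": 8921, "GPT": 330, "learns": 1045}

def tokenize(text):
    """Split on spaces and look up each piece's ID."""
    return [vocab[piece] for piece in text.split()]

print(tokenize("Chat GPT learns"))  # [8921, 330, 1045]
```

The model never sees letters, only these ID sequences; training then means predicting the missing ID in billions of such sequences.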

3. The Secret Sauce: Attention

Context Awareness

(Figure: the sentence "The animal didn't cross the street because it was tired," with "it" linked back to "animal")
"If I say 'The animal didn't cross the street because **it** was too tired'... what is 'it'? The street? No. The model uses 'Attention' to look back at the whole sentence, spot the word 'tired,' and realize 'it' must be the animal. This is how it stays on topic."
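The core of attention is a softmax over similarity scores: each candidate word gets a weight, and high weights mean "pay attention here." The scores below are invented for illustration; in a real model they come from learned vector comparisons, not hand-picked numbers.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for what "it" refers to, nudged by the word "tired":
words = ["animal", "street", "tired"]
scores = [4.0, 1.0, 3.5]

weights = softmax(scores)
best_word, _ = max(zip(words, weights), key=lambda pair: pair[1])
print(best_word)  # animal
```

With these made-up scores, "animal" gets the largest attention weight, which is how the model resolves what "it" refers to.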

4. Human Tuning (RLHF)

Making it Helpful

(Figure: for the prompt "Write a reply," Option A "DO IT YOURSELF!" gets a thumbs down 👎 and Option B "Here is the info." gets a thumbs up 👍)
"Why isn't the model rude or crazy like the raw internet? Humans graded its homework. We gave thumbs up to helpful answers and thumbs down to toxic ones. This feedback loop (RLHF) teaches the model to be polite and helpful."
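The feedback loop can be sketched as a scoreboard. This is a heavy simplification of RLHF: real systems train a reward model on human comparisons and then optimize the LLM against it, but the thumbs-up/thumbs-down intuition is the same. All names and numbers below are illustrative.

```python
# Toy preference loop: human graders compare two answers,
# and the preferred one gets its score nudged up (hypothetical data).
scores = {"DO IT YOURSELF!": 0.0, "Here is the info.": 0.0}

# Each pair is (preferred answer, rejected answer):
feedback = [("Here is the info.", "DO IT YOURSELF!")] * 3

for preferred, rejected in feedback:
    scores[preferred] += 1.0   # thumbs up 👍
    scores[rejected] -= 1.0    # thumbs down 👎

best = max(scores, key=scores.get)
print(best)  # Here is the info.
```

Repeated over many prompts and graders, this kind of signal is what steers the model toward polite, helpful replies.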