How Do Large Language Models Actually Work?
A clear, visual explanation of how AI systems like ChatGPT and Claude generate text. No PhD required - just analogies, diagrams, and honest explanations of what these systems can and cannot do.
What is a Large Language Model?
A Large Language Model (LLM) is a type of artificial intelligence trained to understand and generate human language. Think of it as a very sophisticated autocomplete system.
When you type on your phone and it suggests the next word, that is a tiny language model. LLMs like GPT-4 and Claude are the same idea, but trained on billions of text examples and with billions of adjustable parameters.
The key insight: LLMs do not truly understand language. They are extremely good at predicting what text should come next based on patterns they learned during training.
The Prediction Machine Analogy
Imagine you read millions of books and conversations. After a while, you would get very good at predicting what words typically come next in any sentence.
Example:
"The cat sat on the..."
You would probably guess "mat" because that pattern appears frequently in English text.
LLMs do this at massive scale, considering all the context to make remarkably good predictions about what text should come next.
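This pattern-counting intuition can be made concrete. The sketch below is a toy illustration only (real LLMs learn far richer patterns with neural networks, and the tiny corpus here is invented): it counts which word follows a phrase in some text and predicts the most frequent continuation.

```python
from collections import Counter

def predict_next_word(corpus, prefix):
    """Toy next-word predictor: count what follows `prefix` in the corpus
    and return the most frequent continuation."""
    words = corpus.lower().split()
    prefix_words = prefix.lower().split()
    n = len(prefix_words)
    counts = Counter(
        words[i + n]
        for i in range(len(words) - n)
        if words[i:i + n] == prefix_words
    )
    return counts.most_common(1)[0][0] if counts else None

# Invented mini-corpus for illustration
corpus = (
    "the cat sat on the mat . "
    "the dog sat on the mat . "
    "the cat sat on the sofa ."
)
print(predict_next_word(corpus, "sat on the"))  # "mat" (2 of 3 continuations)
```

An LLM does essentially this, except that instead of exact phrase matches it uses a neural network that generalises across billions of examples.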
How Text Generation Works
When you ask an LLM a question, here is what actually happens behind the scenes, step by step.
User Input
You type a question or prompt
The LLM receives your text as a string of characters. This could be a question, an instruction, or any text you want it to respond to.
Tokenization
Text is broken into tokens
The model cannot read letters directly. It splits text into "tokens" - pieces that might be words, parts of words, or single characters.
Neural Processing
Tokens flow through the neural network
Each token becomes a list of numbers (an embedding) that travels through billions of mathematical operations across the many layers of the network.
Next Token Prediction
Model predicts the most likely next token
Based on everything it learned during training, the model calculates a probability score for every possible next token.
Token Selection
A token is chosen and added to the response
The model selects a token (influenced by the temperature setting), and it becomes part of the response. The process then repeats, one token at a time, until the response is complete.
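The five steps above form a loop that can be sketched in a few lines. Everything here is a stand-in for illustration: the `fake_model` probabilities and its tiny vocabulary are invented, and a real model scores tens of thousands of candidate tokens at every step.

```python
import random

def fake_model(tokens):
    """Stand-in for the neural network: returns made-up probability
    scores for a tiny vocabulary, given the tokens so far."""
    return {" mat": 0.6, " sofa": 0.25, " moon": 0.1, "<end>": 0.05}

def generate(prompt_tokens, max_new_tokens=5, seed=0):
    random.seed(seed)
    tokens = list(prompt_tokens)          # steps 1-2: input, already tokenized
    for _ in range(max_new_tokens):
        probs = fake_model(tokens)        # steps 3-4: score candidate tokens
        choices, weights = zip(*probs.items())
        next_token = random.choices(choices, weights=weights)[0]  # step 5
        if next_token == "<end>":
            break
        tokens.append(next_token)         # append and repeat
    return "".join(tokens)

print(generate(["The", " cat", " sat", " on", " the"]))
```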
Understanding Tokens
Tokens are the building blocks of LLM processing. Here is how the sentence "Hello, how are you?" might be tokenized:
```
// Example tokenization
Input: "Hello, how are you?"

Tokens: ["Hello", ",", " how", " are", " you", "?"]

Token IDs: [9906, 11, 703, 527, 499, 30]

// Each token maps to a number the model can process
// Notice: spaces often attach to the following word
```

Different models use different tokenization schemes. Some split more aggressively (more tokens), others keep larger chunks together. This affects both cost and capability.
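As a toy illustration of the space-attachment behaviour, here is a small regex-based splitter. It is not how production tokenizers work - real schemes such as byte-pair encoding learn their vocabulary from data and also split rare words into sub-word pieces - but it reproduces the pattern shown above:

```python
import re

def toy_tokenize(text):
    """Toy tokenizer: a word keeps its leading space, punctuation
    splits off on its own. Real tokenizers (e.g. BPE) learn their
    vocabulary from data instead of using fixed rules like this."""
    return re.findall(r" ?\w+|[^\w\s]", text)

tokens = toy_tokenize("Hello, how are you?")
print(tokens)  # ['Hello', ',', ' how', ' are', ' you', '?']

# Map each distinct token to an ID, as a model's vocabulary would
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
print([vocab[t] for t in tokens])
```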
Temperature: Controlling Creativity
When the LLM predicts the next token, it does not always pick the most likely option. The temperature setting controls how much randomness to allow.
Low temperature means picking the most probable tokens almost every time (safe, predictable). High temperature means sometimes choosing less likely tokens (creative, unpredictable).
Real world impact: If you ask the same question twice with high temperature, you will get different answers. With low temperature, answers will be nearly identical.
Low temperature: The model almost always picks the highest probability token. Responses are consistent and predictable.
Medium temperature: A balance between creativity and consistency. Some variety while staying mostly on topic.
High temperature: Lower probability tokens have a better chance of being selected. More creative but less predictable.
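The effect of temperature can be shown numerically. In the sketch below (the raw scores for the four candidate tokens are made up for illustration), the model's scores are divided by the temperature before being turned into probabilities. Note how low temperature sharpens the distribution towards one token, while high temperature flattens it:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores (logits) into probabilities.
    Lower temperature -> sharper, more deterministic distribution;
    higher temperature -> flatter, more random distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                           # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for candidate next tokens after "The cat sat on the..."
logits = {" mat": 4.0, " sofa": 3.0, " roof": 2.0, " moon": 1.0}

for temp in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(list(logits.values()), temp)
    print(f"T={temp}:", {tok: round(p, 3) for tok, p in zip(logits, probs)})
```

At T=0.2 the top token gets over 99% of the probability; at T=2.0 the distribution is much flatter, so lower-ranked tokens are sampled far more often.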
The Context Window
LLMs have a limited "memory" for each conversation. This is the context window - how many tokens the model can consider at once.
Small Context
~6,000 words. Good for simple Q&A but struggles with long documents or conversations.
Standard (GPT-4, Llama 3)
~100,000 words. Can handle long documents, extended conversations, and complex context.
Large (Claude, Gemini)
~150,000+ words. Can process entire books, large codebases, or very long research papers.
Why Context Window Matters
- Longer context = more information the model can reference
- When exceeded, oldest messages are "forgotten"
- Larger context windows typically cost more per query
- Your input + the response both count toward the limit
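A chat application has to keep each conversation inside the context window. The sketch below shows one simple strategy under stated assumptions: a hypothetical budget of 20 tokens and a crude one-token-per-word count (real systems use the model's actual tokenizer). The oldest messages are dropped first, which is exactly the "forgetting" described above:

```python
def count_tokens(text):
    """Crude stand-in for a real tokenizer: roughly one token per word."""
    return len(text.split())

def fit_to_context(messages, max_tokens):
    """Keep the most recent messages that fit within the token budget;
    older messages are 'forgotten', as in a real chat session."""
    kept, used = [], 0
    for msg in reversed(messages):            # newest first
        cost = count_tokens(msg)
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))               # restore chronological order

history = [
    "Hi, I need help with my order",
    "Sure, what is the order number?",
    "It is 12345, placed last Tuesday",
    "Thanks, I can see it now. It shipped yesterday.",
]
print(fit_to_context(history, max_tokens=20))  # oldest two messages dropped
```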
Why LLMs Make Mistakes
LLMs are powerful but not perfect. Understanding their limitations helps you use them effectively.
Training Data Gaps
The model may have incomplete or outdated information. It was trained on data up to a certain date and may not know recent events.
Pattern Matching Gone Wrong
LLMs work by recognising patterns. Sometimes they generate plausible-sounding text that follows learned patterns but is factually incorrect.
No Real Understanding
LLMs do not truly "understand" information like humans do. They predict likely text sequences without verifying factual accuracy.
What is a "Hallucination"?
When an LLM generates confident-sounding but factually incorrect information, we call it a "hallucination". This is not the AI lying - it is generating text that follows learned patterns without verifying facts.
Example hallucination:
"The Sydney Opera House was designed by Frank Lloyd Wright and completed in 1959."
(Actually designed by Jørn Utzon and completed in 1973)
How to Reduce Hallucinations
- Use RAG to ground responses in verified data
- Ask for sources and verify them independently
- Use lower temperature for factual queries
- Provide context within your prompt when possible
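The last point, providing context in the prompt, is straightforward to implement. The sketch below (the snippet text and prompt wording are invented for illustration) pastes verified facts into the prompt so the model can ground its answer in them rather than relying on learned patterns alone:

```python
def build_grounded_prompt(question, snippets):
    """Assemble a prompt that instructs the model to answer only from
    the supplied context - a simple way to reduce hallucinations."""
    context = "\n".join(f"- {s}" for s in snippets)
    return (
        "Answer the question using ONLY the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Hypothetical verified snippets, e.g. retrieved by a RAG pipeline
snippets = [
    "The Sydney Opera House was designed by Jørn Utzon.",
    "It was formally opened on 20 October 1973.",
]
print(build_grounded_prompt("Who designed the Sydney Opera House?", snippets))
```

This is the core idea behind RAG: retrieval supplies the snippets, and the prompt constrains the model to them.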
Different LLMs Compared
Not all LLMs are the same. Here is a neutral comparison of leading models as of 2025.
| Feature | GPT-4 | Claude | Llama 3 | Gemini |
|---|---|---|---|---|
| Open Source | No | No | Yes | No |
| Context Window | 128K tokens | 200K tokens | 128K tokens | 1M+ tokens |
| Best For | General tasks | Analysis, writing | Self-hosting | Multimodal |
| Provider | OpenAI | Anthropic | Meta | Google |
| API Available | Yes | Yes | Yes (via hosts) | Yes |
Open Source Models
Models like Llama can be downloaded and run on your own hardware. You control the data, there are no API costs per query, but you need technical expertise and hardware.
Closed Source Models
Models like GPT-4 and Claude are accessed through APIs. Easy to use, always up-to-date, but you pay per query and data leaves your systems.
Key Takeaways
- LLMs predict text one token at a time based on learned patterns
- They do not truly "understand" - they recognise statistical patterns in language
- Temperature controls how creative vs deterministic the output is
- Context window limits how much text the model can consider at once
- Hallucinations happen because LLMs generate plausible text, not verified facts
- Different LLMs have different strengths - there is no single "best" model
Frequently Asked Questions
What does LLM stand for?
LLM stands for Large Language Model. "Large" refers to the billions of parameters (adjustable values) in the neural network. "Language Model" describes its function: predicting and generating human language.
How do LLMs learn?
LLMs learn through a process called training, where they read billions of text examples from the internet, books, and other sources. During training, they adjust billions of internal parameters to get better at predicting what text comes next. This process requires massive computing power and can take weeks or months.
What is the difference between GPT and an LLM?
GPT (Generative Pre-trained Transformer) is a specific type of LLM created by OpenAI. LLM is the general category that includes GPT, Claude, Llama, Gemini, and many others. It is like how "car" is the category and "Tesla" is a specific brand.
Why do LLMs sometimes make things up?
LLMs do not actually "know" facts - they predict likely text based on patterns learned during training. When they encounter questions about topics not well-covered in training data, or when patterns are ambiguous, they generate plausible-sounding but incorrect text. This is called "hallucination".
What is a context window?
The context window is how much text an LLM can "see" at once. It includes your input plus the response being generated. A 128K token context window means the model can process roughly 100,000 words at once. Larger context windows allow for longer conversations and documents.
Can LLMs learn from conversations?
Standard LLMs do not learn from individual conversations - they only use the training data they were initially trained on. However, they do remember the current conversation within the context window. Some systems use techniques like fine-tuning or RAG to give LLMs access to updated information.
What is the difference between open source and closed source LLMs?
Open source LLMs (like Llama) make their model weights publicly available, allowing anyone to download, modify, and run them. Closed source LLMs (like GPT-4 and Claude) only offer access through APIs - you cannot see or modify the underlying model.
Continue Learning
Now that you understand how LLMs work, explore these related topics.
What is RAG?
Learn how Retrieval-Augmented Generation gives LLMs access to your business data.
Read Guide
Chatbot Architecture
Understand how modern AI chatbots are built and how to choose the right approach.
Explore
Ready to Put LLMs to Work for Your Business?
Now that you understand how LLMs work, let us show you how they can transform your customer service. Free consultation with zero jargon.