Chapter 8

Generative AI & Large Language Models

The technology behind ChatGPT, image generators, and the new wave of creative AI — how it works and how to use it responsibly.


Generative AI refers to systems that create new content — text, images, music, code, and more. Large Language Models (LLMs) like GPT-4, Claude, and Gemini are the most visible examples, but the field also includes image generators (DALL-E, Stable Diffusion) and multimodal models that handle text and images together.

How LLMs work

At their core, LLMs are next-token predictors trained on massive text datasets. Given a sequence of tokens (words or word fragments), the model outputs a probability distribution over what comes next; text is generated by repeatedly sampling a token from that distribution and appending it to the sequence. Much of the capability comes from scale: billions of parameters, trained on trillions of tokens, produce emergent capabilities like reasoning, coding, and creative writing.

Pretraining — the model reads vast amounts of internet text and learns language patterns.
Fine-tuning — the model is refined on curated data for specific tasks or safety.
RLHF (Reinforcement Learning from Human Feedback) — human preferences guide the model toward helpful, harmless responses.
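
The next-token step described above can be illustrated with a small sketch. The vocabulary and scores below are made up for illustration (a real model scores tens of thousands of tokens), and `softmax` and `sample_next_token` are hypothetical helper names, not part of any library.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, logits, temperature=1.0, rng=random):
    """Sample one token from the distribution a (hypothetical) model produced."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Made-up scores a model might assign after the prompt "The cat sat on the"
vocab = ["mat", "sofa", "moon", "equation"]
logits = [3.2, 2.1, 0.3, -1.0]

for t in (1.0, 0.3):
    probs = softmax(logits, temperature=t)
    print(f"temperature={t}:", [round(p, 3) for p in probs])

print(sample_next_token(vocab, logits, temperature=0.3))
```

At the lower temperature, more of the probability mass moves to the top-scoring token, which is why low temperatures give more focused, repeatable output.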

Scale drives capability

GPT-3 has 175 billion parameters. OpenAI has not disclosed the size of GPT-4, but it is widely believed to be much larger. Each parameter is a learned number that helps the model predict the next token better.
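
To get a feel for what 175 billion parameters means in practice, here is a rough back-of-the-envelope calculation of the memory needed just to hold the weights. It assumes every parameter is stored at the same precision and ignores activations and other runtime overhead; `weight_memory_gb` is a hypothetical helper name.

```python
def weight_memory_gb(num_parameters, bytes_per_parameter):
    """Memory needed just to store the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9

GPT3_PARAMETERS = 175e9  # figure quoted in the text

# Common storage precisions and their size in bytes per parameter
for label, size in [("float32", 4), ("float16", 2), ("int8", 1)]:
    print(f"{label}: {weight_memory_gb(GPT3_PARAMETERS, size):.0f} GB")
```

Even at 16-bit precision the weights alone come to about 350 GB, far more than a single consumer GPU can hold.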

Prompt engineering

The way you phrase your input (prompt) dramatically affects the output quality. A few proven techniques:

Be specific — "Explain gradient descent in 3 sentences for a beginner" beats "explain gradient descent".
Provide examples — show the model the format you want (few-shot prompting).
Assign a role — "You are an expert Python tutor" sets the tone and expertise level.
Chain of thought — ask the model to "think step by step" for complex reasoning tasks.

python

from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a concise AI tutor. Respond in bullet points."},
        {"role": "user", "content": "What are the 3 most common activation functions and when to use each?"},
    ],
    temperature=0.3,  # lower = more focused and deterministic
)
print(response.choices[0].message.content)

Structured prompting with the OpenAI API.
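
The request above uses role assignment. Few-shot prompting and chain of thought can be combined in the same message format, as in the sketch below. `build_messages` is a hypothetical helper and the example terms are made up; the function only assembles the list and makes no API call.

```python
def build_messages(role, examples, question, think_step_by_step=False):
    """Assemble a chat message list: role, few-shot examples, then the real question."""
    messages = [{"role": "system", "content": role}]
    for example_input, example_output in examples:
        # Each example is presented as a prior user/assistant exchange.
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    if think_step_by_step:
        question += "\nThink step by step before giving the final answer."
    messages.append({"role": "user", "content": question})
    return messages

messages = build_messages(
    role="You are an expert Python tutor.",
    examples=[
        ("Term: list", "A list is an ordered, mutable sequence. Example: [1, 2, 3]"),
        ("Term: tuple", "A tuple is an ordered, immutable sequence. Example: (1, 2)"),
    ],
    question="Term: dictionary",
)
for message in messages:
    print(message["role"], "->", message["content"])
```

The resulting list can be passed as the `messages` argument of the chat completion call shown earlier. The two worked examples show the model the expected answer format without describing it in words.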

Image generation

Models like Stable Diffusion and DALL-E generate images from text descriptions. They are diffusion models: during training they learn to reverse a process that gradually adds noise to images. To generate an image, the model starts from pure noise and iteratively denoises it, guided by the text prompt, until a coherent image emerges.
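
The noise-and-denoise idea can be shown on a toy one-dimensional signal. This is only an illustration under strong simplifying assumptions: it performs a single step rather than many, and it reuses the true noise where a trained network would supply a prediction. The names `add_noise`, `remove_noise`, and `alpha_bar` are illustrative choices, not taken from any particular library.

```python
import math
import random

def add_noise(x0, alpha_bar, rng):
    """Forward step: blend the clean signal with Gaussian noise.

    alpha_bar near 1 keeps most of the signal; near 0 leaves almost pure noise.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
          for x, e in zip(x0, noise)]
    return xt, noise

def remove_noise(xt, predicted_noise, alpha_bar):
    """Invert the forward step, given an estimate of the noise that was added."""
    return [(x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
            for x, e in zip(xt, predicted_noise)]

rng = random.Random(0)
clean = [0.0, 0.5, 1.0, 0.5, 0.0]  # stand-in for pixel values
noisy, true_noise = add_noise(clean, alpha_bar=0.1, rng=rng)

# A trained network would predict the noise from the noisy input, the step,
# and the text prompt. Here the true noise is reused to show what a perfect
# prediction would recover.
recovered = remove_noise(noisy, true_noise, alpha_bar=0.1)
print("noisy:    ", [round(v, 3) for v in noisy])
print("recovered:", [round(v, 3) for v in recovered])
```

In a real model the noise estimate is imperfect, which is one reason generation proceeds over many small denoising steps instead of one.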

python

from openai import OpenAI

# The client reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()

response = client.images.generate(
    model="dall-e-3",
    prompt="A friendly robot teaching a classroom of students about neural networks, digital art",
    size="1024x1024",
    n=1,
)
print(response.data[0].url)

Generate an image with the OpenAI API.

RAG — Retrieval Augmented Generation

LLMs have a knowledge cutoff and can hallucinate. RAG mitigates both problems by retrieving relevant documents from your own data before generating a response. The model receives the retrieved text as context, which tends to produce more accurate and up-to-date answers, although it can still misread or ignore that context.

1. Index your documents — split them into chunks and store their embeddings in a vector database.
2. Retrieve — when a user asks a question, find the most relevant chunks by similarity search.
3. Generate — pass the retrieved chunks to the LLM as context along with the question.
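
The three steps can be sketched end to end in plain Python. This is a toy under stated assumptions: word counts stand in for a real embedding model, a Python list stands in for the vector database, and the final LLM call is left out, so the sketch stops at the prompt that would be sent. The document chunks and all function names are made up for illustration.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# 1. Index: each chunk is stored next to its embedding.
chunks = [
    "Refunds are accepted within 30 days of purchase.",
    "Support is available by email on weekdays.",
    "Shipping takes 3 to 5 business days.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(question, k=1):
    """2. Retrieve: rank chunks by similarity to the question, keep the top k."""
    query = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(question, k=1):
    """3. Generate: build the prompt that would be sent to the LLM."""
    context = "\n".join(retrieve(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("How many days does shipping take?"))
```

Instructing the model to answer only from the supplied context is a common way to discourage it from falling back on stale or invented knowledge.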

RAG vs. fine-tuning

RAG is best when your data changes frequently (docs, knowledge bases). Fine-tuning is better for teaching the model a specific style or domain expertise that rarely changes.

AI agents and tool use

The latest frontier is AI agents — LLMs that can take actions in the real world. Instead of just generating text, they can call APIs, search the web, run code, and chain multiple steps together to accomplish complex tasks autonomously.

Function calling — the model decides when to invoke an external tool (calculator, database, API).
Planning — the model breaks a complex goal into sub-tasks and executes them in sequence.
Memory — agents can store and recall information across conversations.
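
A minimal sketch of the function-calling loop follows. To keep it runnable without an API key, `fake_model` is a hard-coded stand-in for an LLM, and the JSON tool-call format is an assumption made for this example, not the format of any particular provider. The running `messages` list plays the role of short-term memory; planning is not shown.

```python
import json

def calculator(expression):
    """Tool: evaluate a simple 'a op b' arithmetic expression."""
    left, op, right = expression.split()
    a, b = float(left), float(right)
    return {"+": a + b, "-": a - b, "*": a * b, "/": a / b}[op]

TOOLS = {"calculator": calculator}

def fake_model(messages):
    """Stand-in for an LLM. A real model would decide this from the conversation."""
    last = messages[-1]
    if last["role"] == "user":
        # Request a tool call, encoded as JSON.
        return json.dumps({"tool": "calculator", "input": "12 * 7"})
    # After seeing the tool result, answer in plain text.
    return f"The answer is {last['content']}."

def run_agent(question, max_steps=3):
    """Loop: ask the model, run any tool it requests, feed the result back."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = fake_model(messages)
        try:
            call = json.loads(reply)
        except json.JSONDecodeError:
            return reply  # plain text means the model has finished
        result = TOOLS[call["tool"]](call["input"])
        messages.append({"role": "tool", "content": str(result)})
    return "Stopped: step limit reached."

print(run_agent("What is 12 times 7?"))
```

The step limit is a deliberate safety choice: an agent that keeps requesting tools should be stopped instead of looping indefinitely.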

Responsible AI

Generative AI raises important ethical and practical concerns:

Bias — models can reflect and amplify biases present in their training data.
Misinformation — generated content can be convincing but factually wrong.
Privacy — models may memorize and reproduce private information from training data.
Copyright — the legal status of AI-generated content is still evolving.
Environmental cost — training large models requires significant compute and energy.

Always verify

Never deploy AI-generated content in high-stakes scenarios without human review. Use AI as a copilot, not an autopilot.
