Chapter 4

Neural Networks Fundamentals

How artificial neurons work, how layers stack into networks, and how training with backpropagation actually adjusts the weights.


Neural networks are the models behind deep learning. They are loosely inspired by the brain: many simple units (neurons) connected together can approximate remarkably complex functions.

The artificial neuron

A single neuron takes several inputs, multiplies each by a weight, adds a bias, and passes the result through an activation function. That is the entire idea:

python

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

inputs  = np.array([0.5, 0.9, 0.2])
weights = np.array([0.4, 0.7, 0.1])
bias    = -0.3

z = np.dot(inputs, weights) + bias
output = sigmoid(z)
print(round(output, 3))  # -> 0.634

A neuron from scratch with NumPy.

Why activation functions?

Without a non-linear activation (ReLU, sigmoid, tanh), a stack of layers is mathematically equivalent to a single linear layer, no matter how many layers you add. Non-linearity is what lets networks learn complex patterns.
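To make the collapse concrete, here is a minimal NumPy sketch. The layer sizes and random weights are arbitrary choices for illustration, not values from this chapter. It checks that two linear layers with no activation between them give the same outputs as one combined linear layer, and that inserting a ReLU breaks that equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two linear layers with no activation in between (sizes are arbitrary).
W1, b1 = rng.normal(size=(3, 8)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 1)), rng.normal(size=1)

x = rng.normal(size=(5, 3))  # 5 samples, 3 features

# Stacked: layer 1 feeds directly into layer 2.
stacked = (x @ W1 + b1) @ W2 + b2

# One equivalent layer: W = W1 W2, b = b1 W2 + b2.
W = W1 @ W2
b = b1 @ W2 + b2
single = x @ W + b

collapses = np.allclose(stacked, single)

# With a ReLU between the layers, the single layer no longer matches.
with_relu = np.maximum(0, x @ W1 + b1) @ W2 + b2
relu_differs = not np.allclose(with_relu, single)

print(collapses, relu_differs)
```

The combined weights follow from expanding (x W1 + b1) W2 + b2, which is why adding more purely linear layers adds no expressive power.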

From neurons to layers

Neurons are organized into layers. Data flows from the input layer, through one or more hidden layers, to the output layer. The "deep" in deep learning refers to networks with many hidden layers.

Input layer — one node per feature.
Hidden layers — where representations are learned.
Output layer — produces the prediction (a class or a number).
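The three kinds of layer can be sketched as a pair of matrix multiplications. This is an illustrative example with assumed sizes (3 features, 8 hidden units, 1 output) and random, untrained weights, so the prediction itself is meaningless. The point is how the array shapes change from layer to layer.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)

# Assumed sizes for illustration: 3 input features, 8 hidden units, 1 output.
W_hidden, b_hidden = rng.normal(size=(3, 8)) * 0.5, np.zeros(8)
W_out, b_out = rng.normal(size=(8, 1)) * 0.5, np.zeros(1)

x = np.array([[0.5, 0.9, 0.2]])            # input layer: one sample, one value per feature

hidden = relu(x @ W_hidden + b_hidden)     # hidden layer: 8 learned features
output = sigmoid(hidden @ W_out + b_out)   # output layer: one value between 0 and 1

print(x.shape, hidden.shape, output.shape)  # (1, 3) (1, 8) (1, 1)
```

Each column of a weight matrix plays the role of one neuron's weights, so a whole layer is computed in a single matrix product.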

Training: how the network learns

Training repeats a simple loop. A loss function measures how wrong the prediction is; backpropagation computes how each weight contributed to that error; and gradient descent nudges every weight in the direction that reduces the loss.

1. Forward pass: feed inputs through the network to get a prediction.
2. Compute loss: compare the prediction to the true label.
3. Backward pass: use calculus (the chain rule) to find each weight's gradient.
4. Update: adjust weights slightly against the gradient. Repeat.
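The four steps can be written out by hand for the smallest possible network, a single sigmoid neuron. The following is a sketch under stated assumptions: the toy data, the learning rate of 0.5, and the 200 steps are made up for illustration, and the gradient formula is the one that results from combining a sigmoid output with binary cross-entropy loss.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Toy data (invented for illustration): label is 1 when the first feature is large.
X = np.array([[0.9, 0.1], [0.8, 0.4], [0.2, 0.7], [0.1, 0.3]])
y = np.array([1.0, 1.0, 0.0, 0.0])

w = np.zeros(2)
b = 0.0
lr = 0.5          # assumed learning rate
losses = []

for step in range(200):
    # 1. Forward pass
    p = sigmoid(X @ w + b)
    # 2. Compute loss: binary cross-entropy averaged over the samples
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    losses.append(loss)
    # 3. Backward pass: the chain rule gives dL/dz = (p - y) / n
    dz = (p - y) / len(y)
    grad_w = X.T @ dz
    grad_b = dz.sum()
    # 4. Update: step against the gradient
    w -= lr * grad_w
    b -= lr * grad_b

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Frameworks such as PyTorch automate step 3 for networks of any depth, but the loop has the same shape.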

python

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(3, 8),   # 3 inputs -> 8 hidden units
    nn.ReLU(),
    nn.Linear(8, 1),   # 8 hidden -> 1 output
    nn.Sigmoid(),
)

loss_fn = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

x = torch.rand(4, 3)          # 4 samples, 3 features
y = torch.tensor([[1.], [0.], [1.], [0.]])

for epoch in range(100):
    optimizer.zero_grad()
    pred = model(x)
    loss = loss_fn(pred, y)
    loss.backward()           # backpropagation
    optimizer.step()          # parameter update (Adam, a gradient-descent variant)

A small network in PyTorch.

The learning rate matters

Too high and training diverges; too low and it crawls. Values like 0.01 or 0.001 are common starting points — always experiment.
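A toy problem shows all three behaviours. The sketch below minimizes f(w) = w**2 with plain gradient descent. The function and the specific rates are assumptions chosen to make the effect visible, and the thresholds do not transfer to real networks.

```python
def gradient_descent(lr, steps=20, w0=1.0):
    """Minimize f(w) = w**2 (gradient 2*w) and return the final w."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w   # step against the gradient
    return w

# Too high, reasonable, and too low for this particular function.
for lr in (1.1, 0.1, 0.001):
    print(lr, abs(gradient_descent(lr)))
```

With lr=1.1 every step overshoots the minimum and the distance from it grows. With lr=0.1 the value shrinks quickly toward 0. With lr=0.001 it has barely moved after 20 steps.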

Common architectures

CNNs (Convolutional Neural Networks) — excel at images.
RNNs / LSTMs — built for sequences like text and time series.
Transformers — the architecture behind modern large language models.

You now have the mental model for deep learning: neurons compute weighted sums, layers stack them, and training gradually tunes millions of weights to minimize error. In the final chapter we put these ideas to work on real tasks.
