Neural Networks Fundamentals
How artificial neurons work, how layers stack into networks, and how training with backpropagation actually adjusts the weights.
Neural networks are the models behind deep learning. They are loosely inspired by the brain: many simple units (neurons) connected together can approximate remarkably complex functions.
The artificial neuron
A single neuron takes several inputs, multiplies each by a weight, adds a bias, and passes the result through an activation function. That is the entire idea:
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

inputs = np.array([0.5, 0.9, 0.2])
weights = np.array([0.4, 0.7, 0.1])
bias = -0.3

z = np.dot(inputs, weights) + bias  # weighted sum plus bias, about 0.55
output = sigmoid(z)
print(round(output, 3)) # -> 0.634
Why activation functions?
Without a non-linear activation (ReLU, sigmoid, tanh), stacked layers collapse into a single linear function, because composing linear maps only yields another linear map. Non-linearity is what lets networks learn complex patterns.
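The collapse can be checked numerically. The following is a minimal NumPy sketch with made-up weights; the shapes, the seed, and the names `W1`, `b1`, `W2`, `b2` are assumptions for illustration only. Two linear layers stacked without an activation match a single linear layer exactly, while putting a ReLU between them breaks that equivalence.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Two linear layers with arbitrary illustrative weights: 3 -> 4 -> 2.
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)
x = rng.normal(size=(5, 3))     # 5 samples, 3 features

# Stacked layers with no activation in between.
stacked = (x @ W1 + b1) @ W2 + b2

# The same mapping rewritten as one linear layer.
W, b = W1 @ W2, b1 @ W2 + b2
single = x @ W + b

# The same two layers with a ReLU in between.
nonlinear = relu(x @ W1 + b1) @ W2 + b2

collapses = np.allclose(stacked, single)           # identical up to rounding
relu_differs = not np.allclose(nonlinear, single)  # no longer one linear map
print(collapses, relu_differs)
```

The merged weights `W1 @ W2` and bias `b1 @ W2 + b2` come straight from expanding the stacked expression, which is why depth alone adds no expressive power.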
From neurons to layers
Neurons are organized into layers. Data flows from the input layer, through one or more hidden layers, to the output layer. "Deep" learning simply means there are many hidden layers.
- Input layer — one node per feature.
- Hidden layers — where representations are learned.
- Output layer — produces the prediction (a class or a number).
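To make the layer structure concrete, here is a small NumPy sketch of a forward pass through one hidden layer. The layer sizes (3 features, 4 hidden units, 1 output), the random weights, and the helper name `forward` are assumptions chosen for illustration, not values from a trained model.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward(x, params):
    """Run a batch through input -> hidden -> output."""
    W1, b1, W2, b2 = params
    hidden = relu(x @ W1 + b1)        # hidden layer: intermediate representation
    return sigmoid(hidden @ W2 + b2)  # output layer: one value between 0 and 1

rng = np.random.default_rng(0)
params = (
    rng.normal(size=(3, 4)), np.zeros(4),  # input (3 features) -> hidden (4 units)
    rng.normal(size=(4, 1)), np.zeros(1),  # hidden (4 units) -> output (1 unit)
)

x = rng.normal(size=(2, 3))  # 2 samples, one column per feature
y_hat = forward(x, params)
print(y_hat.shape)           # (2, 1): one prediction per sample
```

Each layer is just a weight matrix whose shape is (units in, units out), so adding a hidden layer means adding one more matrix, bias, and activation to `forward`.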
Training: how the network learns
Training repeats a simple loop. A loss function measures how wrong the prediction is; backpropagation computes how each weight contributed to that error; and gradient descent nudges every weight in the direction that reduces the loss.
1. Forward pass: feed inputs through the network to get a prediction.
2. Compute loss: compare the prediction to the true label.
3. Backward pass: use calculus (the chain rule) to find each weight’s gradient.
4. Update: adjust weights slightly against the gradient. Repeat.
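The four steps can be written out by hand for a single sigmoid neuron. The sketch below is an illustration under assumptions: it reuses the inputs and weights from the neuron example, and it picks a target of 1.0, a squared-error loss, and a learning rate of 0.5, none of which are prescribed by the text.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([0.5, 0.9, 0.2])  # inputs from the neuron example
w = np.array([0.4, 0.7, 0.1])  # starting weights
b = -0.3                       # starting bias
y = 1.0                        # assumed target label
lr = 0.5                       # assumed learning rate

losses = []
for step in range(50):
    # 1. Forward pass
    z = np.dot(x, w) + b
    a = sigmoid(z)

    # 2. Compute loss (squared error, chosen here for simplicity)
    loss = 0.5 * (a - y) ** 2
    losses.append(loss)

    # 3. Backward pass: chain rule, dL/dw = dL/da * da/dz * dz/dw
    dL_da = a - y
    da_dz = a * (1 - a)        # derivative of the sigmoid
    grad_w = dL_da * da_dz * x # dz/dw = x
    grad_b = dL_da * da_dz     # dz/db = 1

    # 4. Update: step against the gradient
    w = w - lr * grad_w
    b = b - lr * grad_b

print(losses[0] > losses[-1])  # True: the loss shrinks over the 50 steps
```

Frameworks such as PyTorch compute step 3 automatically through automatic differentiation, so gradients are rarely derived by hand in practice.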
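import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(3, 8),   # 3 inputs -> 8 hidden units
    nn.ReLU(),
    nn.Linear(8, 1),   # 8 hidden -> 1 output
    nn.Sigmoid(),      # squash to (0, 1) so the output reads as a probability
)
loss_fn = nn.BCELoss()  # binary cross-entropy; expects probabilities, not raw scores
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

x = torch.rand(4, 3)                        # 4 samples, 3 features
y = torch.tensor([[1.], [0.], [1.], [0.]])  # one binary label per sample

for epoch in range(100):
    optimizer.zero_grad()    # clear gradients left over from the previous step
    pred = model(x)          # forward pass
    loss = loss_fn(pred, y)  # compute loss
    loss.backward()          # backward pass (backpropagation)
    optimizer.step()         # update weights (Adam, a gradient descent variant)
The learning rate matters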
Too high and training diverges; too low and it crawls. Values like 0.01 or 0.001 are common starting points — always experiment.
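The effect is easy to see on a toy problem. The sketch below runs plain gradient descent on f(w) = w**2, whose minimum is at w = 0. The function, the starting point, the step count, and the three learning rates are assumptions picked to make the behaviour visible, not recommended settings.

```python
def gradient_descent(lr, w=1.0, steps=20):
    """Minimize f(w) = w**2 with plain gradient descent; return the final w."""
    for _ in range(steps):
        grad = 2 * w       # derivative of w**2
        w = w - lr * grad  # step against the gradient
    return w

too_high = gradient_descent(lr=1.1)    # every step overshoots, so |w| grows
too_low = gradient_descent(lr=0.001)   # barely moves away from the start
reasonable = gradient_descent(lr=0.1)  # shrinks steadily toward 0

print(abs(too_high) > 1.0, too_low > 0.9, abs(reasonable) < 0.05)
# True True True
```

On this function each update multiplies w by (1 - 2 * lr), so any rate above 1.0 makes the factor larger than 1 in magnitude and the iterates diverge. Real loss surfaces are less tidy, but the same trade-off applies.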
Common architectures
- CNNs (Convolutional Neural Networks) — excel at images.
- RNNs / LSTMs — built for sequences like text and time series.
- Transformers — the architecture behind modern large language models.
You now have the mental model for deep learning: neurons compute weighted sums, layers stack them, and training gradually tunes millions of weights to minimize error. In the final chapter we put these ideas to work on real tasks.