
Activation functions

What it is

Inside a neuron, two things happen. First it combines its input into a single number: z = weight × input + bias (with several inputs, each gets its own weight and the products are added up). Then it runs that number through an activation function, a little switch that decides how strongly the neuron “fires.” Common switches are ReLU (off until the signal turns positive), sigmoid (a soft slide from 0 to 1), and tanh (a soft slide from −1 to +1).
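Here is a minimal Python sketch of those two steps for a single-input neuron. The function names are our own illustration (they are not part of the Studio), and the sigmoid is written in a slightly longer form so that very large or very small z cannot overflow.

```python
import math


def pre_activation(x, weight, bias):
    """Step 1: scale the input and shift it, z = weight * x + bias."""
    return weight * x + bias


def relu(z):
    """Off (0) until z turns positive, then passes z straight through."""
    return max(0.0, z)


def sigmoid(z):
    """Soft slide from 0 to 1 (written so large |z| cannot overflow exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def tanh(z):
    """Soft slide from -1 to +1."""
    return math.tanh(z)


def neuron(x, weight, bias, activation):
    """Step 2: run z through the chosen activation."""
    return activation(pre_activation(x, weight, bias))


if __name__ == "__main__":
    for act in (relu, sigmoid, tanh):
        # weight 2, bias -1 and input 1 give z = 1 for every activation
        print(act.__name__, round(neuron(1.0, 2.0, -1.0, act), 3))
```

With input 1, weight 2 and bias −1, z comes out to 1, and the loop prints 1.0 for relu, 0.731 for sigmoid and 0.762 for tanh.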

Go deeper: the activation is what makes a network non-linear. Without it, every layer just scales and adds, and stacking scale-and-add steps only gives you another straight line. The bend in the activation is what lets layers combine into curves, bumps, and the rich shapes a real model needs. The weight controls how steep the bend is (for ReLU, the slope of the ramp); the bias slides it left or right. The switch-on point sits where weight × input + bias = 0, that is, at input = −bias ÷ weight.
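You can check the “just another straight line” claim with arithmetic. In this sketch (plain Python; all names are hypothetical), two activation-free layers reduce to one weight and one bias, while a ReLU placed between them breaks the straight-line property. The last helper finds the switch-on point: z = weight × input + bias crosses zero at input = −bias ÷ weight.

```python
def linear(x, weight, bias):
    """One layer with no activation: scale, then shift."""
    return weight * x + bias


def two_linear_layers(x, w1, b1, w2, b2):
    """Stack two activation-free layers."""
    return linear(linear(x, w1, b1), w2, b2)


def collapsed(w1, b1, w2, b2):
    """Weight and bias of the single line the stack is equal to.

    w2 * (w1 * x + b1) + b2 = (w2 * w1) * x + (w2 * b1 + b2)
    """
    return w2 * w1, w2 * b1 + b2


def two_layers_with_relu(x, w1, b1, w2, b2):
    """Same stack, but with a ReLU bend between the layers."""
    return linear(max(0.0, linear(x, w1, b1)), w2, b2)


def switch_point(weight, bias):
    """Input where z = weight * x + bias crosses zero (where ReLU switches on)."""
    return -bias / weight


def bent(x):
    """Example stack with ReLU in the middle: weights 1, biases 0."""
    return two_layers_with_relu(x, 1.0, 0.0, 1.0, 0.0)


if __name__ == "__main__":
    w, b = collapsed(2.0, 1.0, -3.0, 0.5)
    print("two linear layers == one line with weight", w, "and bias", b)
    # A straight line always passes through the midpoint of any two of its points.
    print("with ReLU: f(0) =", bent(0.0), "but midpoint of f(-1), f(1) =", (bent(-1.0) + bent(1.0)) / 2)
    print("switch-on point for weight 2, bias -1:", switch_point(2.0, -1.0))
```

The switch-on point depends on both numbers: with bias −1, weight 2 puts it at 0.5 and weight 4 at 0.25.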

Why care

The activation is the small ingredient that turns a pile of linear math into something that can learn curvy, real-world patterns. Choosing it well also helps training: ReLU is cheap to compute, and its slope stays at exactly 1 for every positive input, so the training signal doesn’t fade as it passes back through many layers the way it can with sigmoid or tanh. That is a big reason deep networks took off. Every layer you saw bend the boundary in the last sim did it through an activation exactly like these.
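A rough way to see the training argument is to compare slopes, since training sends its signal back through each layer’s activation slope. This sketch makes a deliberate simplification (our assumption, not how a real network is analysed): every layer’s weight is 1 and every neuron sits at the same z, so the surviving signal is just the slope raised to the number of layers.

```python
import math


def sigmoid(z):
    """Soft 0-to-1 switch, written so large |z| cannot overflow exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_slope(z):
    """Derivative of sigmoid: s * (1 - s), never more than 0.25."""
    s = sigmoid(z)
    return s * (1.0 - s)


def tanh_slope(z):
    """Derivative of tanh: 1 - tanh(z)^2, equal to 1 only at z = 0."""
    return 1.0 - math.tanh(z) ** 2


def relu_slope(z):
    """Derivative of ReLU: 1 for positive z, 0 otherwise (0 at z = 0 by convention)."""
    return 1.0 if z > 0 else 0.0


def signal_after(depth, slope):
    """Simplified: one activation slope per layer, all weights taken as 1."""
    return slope ** depth


if __name__ == "__main__":
    z = 2.0  # hypothetical: every neuron sits at the same positive z
    for name, slope in (("sigmoid", sigmoid_slope(z)), ("tanh", tanh_slope(z)), ("relu", relu_slope(z))):
        print(f"{name:8s} slope {slope:.3f}  after 10 layers {signal_after(10, slope):.2e}")
```

Even at the sigmoid’s best point, z = 0, its slope is 0.25, and 0.25 to the tenth power is below one in a million; ReLU’s slope of 1 leaves the signal intact. The flip side is that ReLU’s slope is 0 for negative inputs, so a unit whose z stays negative passes nothing back (the “dying ReLU” problem), which is one motivation for cousins like leaky ReLU.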

The idea, intuitively

Think of a light switch on a dimmer. A plain wire (no switch) just passes whatever comes in. A switch can stay off until there’s enough signal, then turn on, either gently (sigmoid, tanh) or sharply (ReLU). Once you have a switch that bends, you can line up two of them to make a bump, and enough bumps added together can trace almost any smooth shape as closely as you like. A plain wire can never do that: two wires in a row are still just a wire.
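The “bumps can trace a shape” idea can be tried numerically. This is a minimal sketch under assumptions of our own: the target is the curve y = x² between 0 and 1, each bump is one sigmoid switch minus a later one, the bumps are evenly spaced, and each bump is as tall as the curve at its centre.

```python
import math


def sigmoid(z):
    """Soft 0-to-1 switch, written so large |z| cannot overflow exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def bump(x, left, right, steepness):
    """Two switches: one turns on at `left`, the subtracted one at `right`."""
    return sigmoid(steepness * (x - left)) - sigmoid(steepness * (x - right))


def approximate(target, n_bumps, x):
    """Sum n_bumps equal-width bumps on [0, 1], each as tall as target at its centre."""
    width = 1.0 / n_bumps
    steepness = 20.0 / width  # keep every switch sharp compared with its bump
    total = 0.0
    for i in range(n_bumps):
        left, right = i * width, (i + 1) * width
        total += target((left + right) / 2) * bump(x, left, right, steepness)
    return total


def max_error(target, n_bumps):
    """Worst mismatch over interior points; the two outer edges get only half a switch."""
    xs = [0.1 + 0.8 * k / 200 for k in range(201)]
    return max(abs(approximate(target, n_bumps, x) - target(x)) for x in xs)


def parabola(x):
    """Hypothetical target shape: y = x squared."""
    return x * x


if __name__ == "__main__":
    for n in (2, 5, 20, 50):
        print(n, "bumps -> worst error", round(max_error(parabola, n), 4))
```

The worst-case error shrinks as you add bumps. The steepness rule (20 ÷ width) is just a choice that keeps each bump crisp, and checking only the interior avoids the outer edges, where a bump that starts exactly at 0 or ends exactly at 1 contributes only half a switch.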

Peek at the data first

There’s no dataset here; the “data” is the shape of each function. Here is the same signal z sent through each activation, so you can see how differently each one responds before you start bending them.
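If you want the same comparison as numbers instead of a picture, this short Python sketch sends a handful of z values through each function, treating “Line (none)” as the identity. The names and the chosen z values are ours, for illustration.

```python
import math


def relu(z):
    """Off (0) until z turns positive."""
    return max(0.0, z)


def sigmoid(z):
    """Soft 0-to-1 switch, written so large |z| cannot overflow exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def line(z):
    """'Line (none)': no switch at all, z passes through unchanged."""
    return z


ACTIVATIONS = {"line": line, "relu": relu, "sigmoid": sigmoid, "tanh": math.tanh}


def response_table(zs):
    """Send the same z values through every activation: one (z, results) row per z."""
    return [(z, {name: f(z) for name, f in ACTIVATIONS.items()}) for z in zs]


if __name__ == "__main__":
    print("    z " + "".join(f"{name:>9}" for name in ACTIVATIONS))
    for z, row in response_table([-3.0, -1.0, 0.0, 1.0, 3.0]):
        print(f"{z:5.1f} " + "".join(f"{value:9.3f}" for value in row.values()))
```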

Try it

Pick an activation, then drag Weight to make the bend sharper and Bias to slide where it switches on (the dashed line marks that point). Now tick Stack two switches → a bump: sigmoid and tanh make a clean bump, ReLU a plateau — but pick Line (none) and the “bump” collapses to a flat line. That’s the whole reason networks need an activation.
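Here is a minimal sketch of the two-switch experiment in Python. We assume the bump is built by subtracting a second copy of the switch that turns on later (a common construction; the sim’s exact wiring isn’t shown on this page), with switch-on points at −1 and +1 chosen for illustration. Writing weight × (x − on_at) is the same as weight × x + bias with bias = −weight × on_at.

```python
import math


def sigmoid(z):
    """Soft 0-to-1 switch, written so large |z| cannot overflow exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def relu(z):
    """Off (0) until z turns positive."""
    return max(0.0, z)


def line(z):
    """'Line (none)': no activation."""
    return z


def two_switches(x, activation, on_at=-1.0, off_at=1.0, weight=1.0):
    """First switch turns on at `on_at`; a second, subtracted copy turns on at `off_at`."""
    first = activation(weight * (x - on_at))
    second = activation(weight * (x - off_at))
    return first - second


if __name__ == "__main__":
    xs = [-4.0, -2.0, 0.0, 2.0, 4.0]
    for name, act in (("sigmoid", sigmoid), ("tanh", math.tanh), ("relu", relu), ("line", line)):
        values = [round(two_switches(x, act, weight=5.0), 3) for x in xs]
        print(f"{name:8s}", values)
```

With weight 5, sigmoid and tanh rise and then fall back to about 0, ReLU climbs from 0 and then holds at 10, and the line version prints 10 at every x: the two copies cancel except for a constant gap.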

Where it shows up

Every hidden layer. ReLU and its cousins (leaky ReLU, GELU and others) sit inside almost every deep network you interact with, from image models to language models.
Output choices. Sigmoid turns a score into a yes/no probability; its big sibling softmax spreads a choice across many options.
Avoiding dead ends. Picking the right activation keeps the loss’s slope flowing back through deep stacks so training doesn’t stall.
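The “Output choices” item can be made concrete. In this Python sketch (function names are ours), sigmoid turns one score into a probability, and softmax turns several scores into probabilities that add up to 1.

```python
import math


def sigmoid(score):
    """One score -> probability of 'yes'."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)


def softmax(scores):
    """Many scores -> probabilities that are all positive and sum to 1."""
    biggest = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - biggest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    print("sigmoid(1.5) =", round(sigmoid(1.5), 4))
    print("softmax([1.5, 0.0]) =", [round(p, 4) for p in softmax([1.5, 0.0])])
    print("softmax of four scores:", [round(p, 3) for p in softmax([2.0, 1.0, 0.1, -1.0])])
```

The first two printed values agree because e^s ÷ (e^s + e^0) is the same number as 1 ÷ (1 + e^−s): sigmoid is a two-option softmax with the second score fixed at 0.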

Where it came from

The earliest artificial neuron, McCulloch–Pitts (1943), used a hard step: on or off. Smooth switches like the sigmoid and tanh became standard with backpropagation in the 1980s, because backpropagation learns by following slopes, and a hard step is flat everywhere except at its jump. The ReLU (rectified linear unit), used in neuroscience models earlier, was shown by Nair and Hinton (2010) and others to make deep networks train far better, and it has powered most of deep learning ever since.

Try it in code

In the Studio you choose each layer’s activation with kind. Try swapping relu for tanh and watch training change:

data = load "fruits"

net = make_network
  layer input  from data
  layer hidden size 4 kind relu
  layer output size 4 kind softmax
end

train_network net, on: data, rounds: 30, speed: 0.6
plot_training net
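For readers outside the Studio, here is a rough plain-Python sketch of what the network above computes on one example: a relu hidden layer of 4, then a 4-way softmax output. It is a hypothetical stand-in, not the Studio’s implementation. The helper names, the 3-feature input and the random starting weights are all our assumptions, and it covers only the forward pass.

```python
import math
import random


def relu(z):
    """Off (0) until z turns positive."""
    return max(0.0, z)


def softmax(scores):
    """Scores -> probabilities that sum to 1."""
    biggest = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - biggest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_layer(n_in, n_out, rng):
    """Hypothetical initialiser: small random weights (one row per neuron), zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(inputs, layer):
    """Step 1 for every neuron: z = sum of weight * input, plus bias."""
    weights, biases = layer
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def forward(inputs, hidden, output, hidden_activation=relu):
    """Hidden layer (activation per neuron), then output layer turned into probabilities."""
    h = [hidden_activation(z) for z in dense(inputs, hidden)]
    return softmax(dense(h, output))


if __name__ == "__main__":
    rng = random.Random(0)
    n_features = 3  # hypothetical: the real count comes from the "fruits" data
    hidden = make_layer(n_features, 4, rng)  # like: layer hidden size 4 kind relu
    output = make_layer(4, 4, rng)           # like: layer output size 4 kind softmax
    x = [0.2, -1.0, 0.7]                     # hypothetical feature values
    print("relu hidden:", [round(p, 3) for p in forward(x, hidden, output)])
    print("tanh hidden:", [round(p, 3) for p in forward(x, hidden, output, math.tanh)])
```

Passing math.tanh as hidden_activation mirrors changing the hidden layer’s kind. Training, which train_network handles in the Studio, would adjust the weights by following the loss’s slope; it is left out to keep the sketch short.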


Check your understanding

What two steps happen inside a neuron, in order?
Why can’t a network made of only “Line (none)” layers learn a curvy pattern?
What does the bias slider change about ReLU’s switch?