
CNN filters on images

What it is

A picture is just a grid of numbers, one per pixel in a grayscale image. A convolutional filter is a tiny grid of weights (here 3×3) that you slide across the image. At every spot it multiplies each pixel underneath by the matching weight and adds the products up, producing one number. Do that at every position and you get a new little map, called a feature map, that lights up wherever the filter’s pattern appears.
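To make the multiply-and-add concrete, here is a minimal Python sketch of a single landing of the filter. The pixel patch, the weights, and the function name patch_response are made up for illustration; they are not the values used on this page.

```python
# One landing of a 3x3 filter on one 3x3 patch of pixels.
# Both grids are hypothetical illustration values.
patch = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 1, 1],
]
weights = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

def patch_response(patch, weights):
    """Multiply each pixel by the weight sitting over it, then add everything up."""
    total = 0
    for pixel_row, weight_row in zip(patch, weights):
        for pixel, weight in zip(pixel_row, weight_row):
            total += pixel * weight
    return total

response = patch_response(patch, weights)
print(response)  # 3: each row contributes 0*(-1) + 1*0 + 1*1 = 1
```

The nine products collapse into one number, and that number becomes one cell of the feature map.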

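Go deeper: this slide-multiply-add is called a convolution. (Strictly, mathematical convolution flips the filter first; deep-learning libraries skip the flip, so what they compute is technically a cross-correlation, but the name stuck.) Because the same small filter is reused at every position, it has only a handful of weights (nine for a 3×3 filter) yet can find its pattern anywhere in the image. A real convolutional neural network (CNN) stacks many such filters; early layers find edges, later layers combine those into corners, shapes, and eventually whole objects.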
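The full slide across an image can be sketched in plain Python as below. This is one simple way to write it, assuming the filter moves one pixel at a time and never hangs over the border; the 5×5 image and the name feature_map are hypothetical.

```python
def feature_map(image, weights):
    """Slide `weights` over `image` (one pixel per step, no padding)."""
    k = len(weights)                   # the filter is k x k
    out_rows = len(image) - k + 1      # landings that fit vertically
    out_cols = len(image[0]) - k + 1   # landings that fit horizontally
    result = []
    for top in range(out_rows):
        row = []
        for left in range(out_cols):
            total = 0
            # The same weights are reused at every (top, left) landing.
            for i in range(k):
                for j in range(k):
                    total += image[top + i][left + j] * weights[i][j]
            row.append(total)
        result.append(row)
    return result

# Hypothetical 5x5 image: dark on the left, bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
vertical_edge = [[-1, 0, 1]] * 3

fmap = feature_map(image, vertical_edge)
for row in fmap:
    print(row)  # each row prints [3, 3, 0]
```

The response is high where the window straddles the dark-to-bright boundary and drops to zero once the window sits entirely inside the bright region.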

Why care

Convolution is a big part of why computers got good at seeing. Instead of treating every pixel as an unrelated input, a CNN looks for small local patterns and reuses the same detectors across the whole picture. That takes far fewer weights, and the pattern is found no matter where it sits. This one idea powers handwriting readers, medical-image tools, photo tagging, and self-driving perception.
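To put rough numbers on “far fewer weights”, here is a small Python tally. The 28×28 image size is an assumption borrowed from MNIST digits, and the dense layer is a hypothetical comparison (one output per filter landing, biases ignored), not something this page builds.

```python
height, width = 28, 28   # assumed image size (MNIST digits are 28x28)
k = 3                    # the filter is k x k, as in the text

# One output per position where the filter fits entirely inside the image.
out_positions = (height - k + 1) * (width - k + 1)

# Dense layer: every output gets its own weight for every input pixel.
dense_weights = (height * width) * out_positions

# Convolution: the same k*k weights are reused at every position.
conv_weights = k * k

print(out_positions, dense_weights, conv_weights)  # 676 529984 9
```

Under these assumptions the dense layer needs over half a million weights to produce the same number of outputs that one shared 3×3 filter produces with nine.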

The idea, intuitively

Imagine a tiny stencil cut to match “a vertical edge.” You glide it over the picture; it glows brightly only where the picture really has a vertical edge under it. Swap the stencil for “a horizontal edge” and a different part of the image glows. Each filter is a question (“is my pattern here?”), and the feature map is the answer drawn out across the whole image.

Peek at the data first

The image is a small grid of 0s and 1s — a hand-drawn 7. The filter is an even smaller grid of weights. Here are the raw pixels so you can see the picture is really just numbers before the filter slides over it.
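As a stand-in for the pixel view, the Python sketch below prints a small image both as raw numbers and as a picture. The 7×7 grid is a hypothetical blocky 7 invented for this example, not the page’s actual pixels, and the helper names are illustrative.

```python
# A hypothetical 7x7 stand-in for the hand-drawn 7: 1 = ink, 0 = background.
seven = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

def as_numbers(image):
    """Show the image the way the filter sees it: a grid of numbers."""
    return "\n".join(" ".join(str(p) for p in row) for row in image)

def as_picture(image):
    """Show the same grid the way a person sees it: ink versus background."""
    return "\n".join("".join("#" if p else "." for p in row) for row in image)

print(as_numbers(seven))
print()
print(as_picture(seven))
```

Both printouts come from the same list of lists; only the rendering differs.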

Try it

Pick a filter, then drag “Slide the filter across” to move the 3×3 window (the outline) over the 7. The right-hand map fills in as you go; bright cells are strong matches. Try “Vertical edges”, then “Horizontal edges”, and watch the bright spots jump. Tick “Highlight the strongest hits” to outline where the edge really lives.
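The same experiment can be approximated offline. The Python sketch below is only a guess at what the widget does: the blocky 7, the two Prewitt-style filters, and the rule “strongest hit = highest response” are all assumptions made for illustration.

```python
def feature_map(image, weights):
    """Slide a k x k filter over the image (one pixel per step, no padding)."""
    k = len(weights)
    return [
        [
            sum(image[top + i][left + j] * weights[i][j]
                for i in range(k) for j in range(k))
            for left in range(len(image[0]) - k + 1)
        ]
        for top in range(len(image) - k + 1)
    ]

def strongest_hits(fmap):
    """Return (row, col) of every feature-map cell tied for the highest response."""
    peak = max(max(row) for row in fmap)
    return [(r, c) for r, row in enumerate(fmap)
            for c, value in enumerate(row) if value == peak]

# Hypothetical blocky 7: a top bar on row 1 and a stem down column 5.
seven = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

# Assumed filters: bright on the right of dark, and bright above dark.
vertical_edge = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
horizontal_edge = [[1, 1, 1], [0, 0, 0], [-1, -1, -1]]

vertical_hits = strongest_hits(feature_map(seven, vertical_edge))
horizontal_hits = strongest_hits(feature_map(seven, horizontal_edge))
print(vertical_hits)    # [(2, 3), (3, 3)]: windows with the stem on their right
print(horizontal_hits)  # [(1, 1), (1, 2)]: windows with the top bar on their top row
```

Each coordinate is the top-left corner of a 3×3 window. With these made-up pixels the vertical filter peaks along the stem and the horizontal filter peaks along the top bar, which is the “jump” described above.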

Where it shows up

Reading handwriting. CNNs on digit images (like the famous MNIST set) were one of deep learning’s first big wins.
Medical imaging. Filters that find edges and textures help flag tumors or fractures in scans.
Everyday vision. Photo search, face grouping, and self-driving cameras all lean on stacks of learned convolutional filters.

Where it came from

The idea traces to Hubel and Wiesel’s 1960s discovery that cells in the visual cortex respond to small edges at specific orientations. Kunihiko Fukushima’s Neocognitron (1980) built that into a layered model, and Yann LeCun and colleagues made it trainable with backpropagation in LeNet (1989–1998) to read handwritten digits. The approach exploded in 2012 when AlexNet won the ImageNet contest, kicking off the modern deep-learning era.

Try it in code

Spectra keeps images out of the language for safety, so the example below trains a small network on the “fruits” data rather than on pictures. It is still the same family of stacked, trainable layers; a convolution is just a special layer that shares its weights across positions:

data = load "fruits"

net = make_network
  layer input  from data
  layer hidden size 8 kind relu
  layer output size 4 kind softmax
end

train_network net, on: data, rounds: 30, speed: 0.6
plot_training net

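The claim that convolution is a special, weight-sharing layer can be checked outside Spectra. The Python sketch below is an illustration under stated assumptions, not Spectra code: it builds an ordinary dense weight matrix in which every row is a shifted copy of one 3×3 filter, then confirms that the dense layer and the sliding filter give the same outputs. All names and the 4×4 image are hypothetical, and biases and activation functions are left out.

```python
def feature_map(image, weights):
    """Slide a k x k filter over the image (one pixel per step, no padding)."""
    k = len(weights)
    return [
        [
            sum(image[top + i][left + j] * weights[i][j]
                for i in range(k) for j in range(k))
            for left in range(len(image[0]) - k + 1)
        ]
        for top in range(len(image) - k + 1)
    ]

def conv_as_dense_matrix(height, width, weights):
    """One matrix row per filter landing: the filter's weights placed over the
    pixels that landing covers, zeros everywhere else."""
    k = len(weights)
    rows = []
    for top in range(height - k + 1):
        for left in range(width - k + 1):
            row = [0] * (height * width)
            for i in range(k):
                for j in range(k):
                    row[(top + i) * width + (left + j)] = weights[i][j]
            rows.append(row)
    return rows

def dense_layer(matrix, inputs):
    """Plain dense layer without bias: each output is a weighted sum of all inputs."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in matrix]

# Hypothetical 4x4 image and filter.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
]
weights = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

flat_pixels = [p for row in image for p in row]
matrix = conv_as_dense_matrix(4, 4, weights)
dense_out = dense_layer(matrix, flat_pixels)
conv_out = [v for row in feature_map(image, weights) for v in row]

print(dense_out == conv_out)            # True
print(len(matrix) * len(matrix[0]))     # 64 slots in the dense matrix
print(len(weights) * len(weights[0]))   # 9 shared weights fill all of them
```

The dense matrix has 64 slots, but every nonzero slot is a copy of one of the nine filter weights. Training a convolution adjusts those nine numbers, and every copy moves together.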

Check your understanding

What three steps happen each time the filter lands on a new patch?
Why does the same filter use so few weights yet work anywhere in the image?
Why do the bright spots move when you switch from a vertical to a horizontal edge filter?