
Histograms

What it is

A histogram is the picture of a single column of numbers. You chop the range into equal-width slots called bins and count how many values fall into each, then draw a bar for every bin. Tall bars mark the common values, short bars the rare ones — so the bumps and gaps reveal the shape of the data at a glance.

Go deeper: a histogram is not a bar chart of categories — the bars touch because they cover a continuous range, and the bin width is a choice you make. Pick too few bins and neighbouring values get lumped together until the shape flattens into a blob; pick too many and each bar holds only a value or two, so real structure drowns in jitter. The right width lets the true distribution — a single peak, two peaks, a long tail — stand out.
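To make the pour-into-bins step concrete, here is a minimal sketch in plain Python rather than in the Studio. The function name histogram_counts and the reading times are made up for illustration; they are not the data behind the simulation. It counts values per equal-width bin, then prints rough text bars for a few different bin counts.

```python
def histogram_counts(values, num_bins):
    """Count how many values land in each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    if width == 0:
        # Every value is identical, so they all share the first bin.
        counts[0] = len(values)
        return counts
    for v in values:
        # How many bin widths v sits above the minimum. The maximum itself
        # is folded into the last bin instead of opening a new one.
        index = min(int((v - lo) / width), num_bins - 1)
        counts[index] += 1
    return counts


# Made-up reading times in minutes (an assumption, not the simulation's data).
minutes = [5, 8, 10, 12, 12, 15, 15, 18, 20, 22, 25, 30, 45, 60, 90]

for num_bins in (2, 5, 20):
    counts = histogram_counts(minutes, num_bins)
    print(f"{num_bins} bins:")
    for i, count in enumerate(counts):
        print(f"  bin {i:2d} | {'#' * count}")
```

With this made-up data, 2 bins put 13 of the 15 values into a single bar, while 20 bins leave more than half the bins empty. Somewhere in between, the pile on the left and the thin tail on the right both become visible.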

Why care

Before any model runs, you look at the shape of your data — and the histogram is how. It exposes skew, gaps, outliers and whether there are really two groups hiding in one column. It also shows what a lone average can hide: a long tail can drag the mean far from the typical value, a trap behind many misleading statistics.
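A small sketch of that tail effect, using Python's standard statistics module. The reading times are invented for illustration, and the cut-off of 40 minutes for "the tail" is an arbitrary assumption.

```python
from statistics import mean, median

# Made-up reading times in minutes: mostly light readers plus a few bookworms.
minutes = [5, 8, 10, 12, 12, 15, 15, 18, 20, 22, 25, 30, 45, 60, 90]

print("mean with tail:     ", mean(minutes))
print("median with tail:   ", median(minutes))

# Drop the three longest readers (an arbitrary cut at 40 minutes) and compare.
without_tail = [m for m in minutes if m < 40]
print("mean without tail:  ", mean(without_tail))
print("median without tail:", median(without_tail))
```

Here the three long readers lift the mean well above the median, while removing them moves the median only a little. That is the gap a lone average hides and a histogram makes visible.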

The idea, intuitively

Here are 40 kids and how many minutes each read today. Slide the bin count and watch the same numbers re-pour into the bars: too few and it’s a shapeless mound, too many and it’s a spiky mess. In between, a clear picture appears — a big pile of light readers on the left and a thin tail of bookworms stretching right. Mark the average and the middle to see that tail at work.

Peek at the data first

It’s just one column of numbers — minutes read — the same single-column shape Spectra’s describe_data would summarise before you chart it.

Try it

Drag Number of bins from a few wide bars up to many narrow ones and watch the shape appear, smear, and shatter. Tick Mark the average and the middle to drop in the mean (red) and median (green) lines and see the long tail pull the average to the right of the middle value.

Where it shows up

Exploring data. The first chart most analysts draw, to spot skew, gaps and outliers.
Spotting two groups. Two humps in one column often mean two hidden populations mixed together.
Images & signals. Brightness histograms drive photo auto-contrast; the same idea tunes audio and sensor data.

Where it came from

The word and modern form were popularised by the statistician Karl Pearson around 1891, though binned tallies of data go back much further, at least to John Graunt’s 1662 tables of London mortality. How wide to make the bins is a question still studied today.

Try it in code

In the Studio, plot_distribution bins a column into a histogram for you, the same pour-into-bins move you just made with the slider:

data = load "weather_town"
describe_data data
plot_distribution data, x: "temperature", bins: 8


Check your understanding

What is a “bin,” and why do a histogram’s bars touch each other?
What goes wrong with too few bins? With too many?
Why does a long right tail pull the average above the middle value?