
Residuals: the misses a line leaves behind

What it is

No line passes through every dot. A residual is one dot’s miss: how far it sits above or below the line, measured straight up or down (the actual value minus the value the line predicts). Residuals are how we measure a fit, and the method of least squares is named after what it does to them: it makes the sum of the squared residuals as small as possible.

Go deeper: turn each miss into a square whose side is the miss, so its area is the miss times itself. Add up every square and you get the one number least-squares minimises. Squaring is why a single far-off dot pulls the line so hard — its square is huge.
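To make the arithmetic concrete, here is a minimal Python sketch (plain Python, not Studio code). The data points and the line `y = 2x` are invented for illustration; only the recipe (miss, square, sum) comes from the text above.

```python
# Hypothetical (water, height) pairs, invented for illustration.
# The last dot sits far from the line on purpose.
points = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 15.0)]

def predict(x, slope=2.0, intercept=0.0):
    """Value the line guesses for a given x (line chosen by hand, not fitted)."""
    return slope * x + intercept

# A residual is the actual value minus the line's guess.
residuals = [y - predict(x) for x, y in points]

# Each square's area is the miss times itself.
squares = [r * r for r in residuals]

# The single number that least squares tries to make as small as possible.
sum_of_squares = sum(squares)

for (x, y), r, sq in zip(points, residuals, squares):
    print(f"x={x}  miss={r:+.1f}  square={sq:.2f}")
print(f"sum of squares = {sum_of_squares:.2f}")
```

In this made-up data the far-off dot misses by 5.0 while the next-largest miss is 0.2. That is 25 times the miss but 625 times the square, so the one dot accounts for nearly all of the total.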

Why care

Residuals are the honest report card of any prediction. Scientists and engineers rarely stop at the line itself; they study the residuals, because that is where a model’s mistakes hide. A pattern left in the residuals is a warning: the model missed something real.

The idea, intuitively

Think of the line as a guess and each residual as the leftover error. If the guess truly caught the trend, the leftovers should look like plain random scatter — sometimes a bit high, sometimes a bit low, with no shape. If the leftovers still slope or curve, the line missed part of the story. The little residual plot below makes that leftover pattern easy to see.
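One way to check whether the leftovers "still slope" is to fit a second line to the residuals themselves. The Python sketch below does that with invented data; `fit_line` and `residuals` are hypothetical helper names, and the closed-form slope formula is the standard one for a straight-line least-squares fit, not necessarily what the Studio uses internally.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for a straight line (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def residuals(xs, ys, slope, intercept):
    """Actual value minus the line's guess, for every dot."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Hypothetical data, roughly following y = 2x + 1.
xs = [1, 2, 3, 4, 5, 6]
ys = [3.2, 4.8, 7.1, 9.0, 10.9, 13.2]

# A hand-picked guess that is too flat: the leftovers still climb with x.
flat = residuals(xs, ys, slope=1.0, intercept=4.5)
leftover_slope_flat, _ = fit_line(xs, flat)

# The least-squares line: no slope is left in the leftovers.
slope, intercept = fit_line(xs, ys)
best = residuals(xs, ys, slope, intercept)
leftover_slope_best, _ = fit_line(xs, best)

print(f"slope left behind by the flat guess: {leftover_slope_flat:.3f}")
print(f"slope left behind by the best line:  {leftover_slope_best:.3f}")
```

With this data the flat guess leaves a slope of roughly 1 in its residuals, which is exactly the part of the trend it failed to capture. The least-squares line leaves a slope of zero, apart from floating-point rounding.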

Peek at the data first

As always, look at the data first. Here are the records — how much water a plant got each day and how tall it grew — with a quick summary of each column, just like Spectra’s describe_data.

Try it

Tilt the line and watch each orange stick — the miss — grow or shrink. Tick Show each miss as a square to see the squares least-squares cares about. Then tick Snap to the best line: the squares reach their smallest, and in the residual plot the misses settle into an even, patternless scatter.
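The tilting experiment can be imitated in a few lines of Python. This is only a sketch with invented data. It assumes the line pivots around the average point of the data, which is a choice made here for simplicity and may not match how the simulation tilts its line. (A least-squares line always passes through that average point, so the best tilt is not missed by pivoting there.)

```python
# Hypothetical data, invented for illustration.
xs = [1, 2, 3, 4, 5, 6]
ys = [3.2, 4.8, 7.1, 9.0, 10.9, 13.2]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

def sum_of_squares(slope):
    """Total area of the squares for a line tilted around the data's average point."""
    intercept = mean_y - slope * mean_x  # keeps the line through (mean_x, mean_y)
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Try tilts from 0.00 to 4.00 in steps of 0.01 and keep the one with the least area.
tilts = [i / 100 for i in range(401)]
best_tilt = min(tilts, key=sum_of_squares)

for slope in (1.0, 1.5, best_tilt, 2.5, 3.0):
    print(f"slope={slope:.2f}  sum of squares={sum_of_squares(slope):.3f}")
```

The total grows on either side of the best tilt, and it grows faster the further the line is tilted away, which is the same behaviour the squares show on screen.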

Where it shows up

Checking any model. Whether the prediction is a price, a temperature, or a test score, people plot the residuals to see if the model can be trusted.
Spotting the wrong shape. A U-shaped residual plot says “a straight line wasn’t enough — the real trend curves.”
Catching outliers. One giant residual flags a surprising data point worth a closer look.
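The last two checks can be sketched in Python as well. Both datasets are invented, the helper names are hypothetical, and the "more than 3 times the typical miss" rule is an arbitrary threshold picked for this example rather than a standard cut-off.

```python
from statistics import median

def fit_line(xs, ys):
    """Least-squares slope and intercept for a straight line (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def line_residuals(xs, ys):
    """Fit the best straight line, then return each dot's miss."""
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

xs = [0, 1, 2, 3, 4, 5, 6]

# Spotting the wrong shape: the real trend curves (y = x squared).
curved_ys = [x * x for x in xs]
curved_res = line_residuals(xs, curved_ys)
print("curved trend:", [round(r, 2) for r in curved_res])

# Catching outliers: a roughly straight trend with one surprising dot at position 4.
outlier_ys = [2.0, 4.1, 5.9, 8.0, 16.0, 12.1, 13.9]
outlier_res = line_residuals(xs, outlier_ys)
typical = median([abs(r) for r in outlier_res])  # size of a typical miss
flagged = [i for i, r in enumerate(outlier_res) if abs(r) > 3 * typical]
print("flagged positions:", flagged)
```

For the curved data the residuals are positive at both ends and negative in the middle, which is the U shape described above. For the second dataset only the surprising dot is flagged.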

Where it came from

Squaring the misses is the core of least squares, published by Adrien-Marie Legendre in 1805 and by Carl Friedrich Gauss in 1809. Reading the leftover misses to judge a fit grew into a whole field — regression diagnostics — in the twentieth century, as statisticians realised the residuals often say more than the line.

Try it in code

The Studio fits the best line; plotting the fit over the data lets you eyeball the residuals as the vertical gaps between each dot and the line:

data = load "houses"
model = make_model "regressor"
train_model model, on: data, predict: "price", using: ["size"]

say "Each dot's gap from this line is its residual."
plot_data data, x: "size", y: "price", line: model


Check your understanding

Why does squaring a miss punish a far-off dot so much more than a close one?
What should a residual plot look like when the line has caught the trend?
If the residual plot is U-shaped, what does that tell you to try instead of a straight line?