Linear regression: let the computer fit it

What it is

In the "Line of best fit" simulation, you dragged a line by hand until it sat snugly among the dots. That is slow, and your eye can only get close. Real machine learning lets the computer find the best line, and the surprise is how it does it. It doesn’t nudge and check, nudge and check. It calculates the exact answer in one step.

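Go deeper: a line is y = m·x + b. For each dot, the miss is the vertical gap between the dot and the line. Square each miss and add them all up. As long as the x-values are not all identical, exactly one pair (m, b) makes that total as small as possible, and a short pair of formulas computes it straight from the data. That method is least-squares linear regression.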
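Written out, that short pair of formulas is the standard least-squares result. In this sketch, n is the number of dots, x̄ and ȳ are the averages of the x-values and y-values, and SSE (sum of squared errors) is a label introduced here for the total squared miss:

```latex
\mathrm{SSE}(m, b) = \sum_{i=1}^{n} \bigl(y_i - (m\,x_i + b)\bigr)^2

m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - m\,\bar{x}
```

The bottom of the slope formula is zero only when every x-value is the same. That is exactly the case where no single best line exists. The formula for b also shows that the best line always passes through the point (x̄, ȳ).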

Why care

This is one of the most-used calculations in all of science and business: the computer can fit a trend line to thousands of points instantly and identically every time. Once you trust the line, you can predict a value you have never seen — a price, a score, a measurement.

The idea, intuitively

Think of it like averaging, but for a relationship instead of a single number. To find the average of some numbers, you add them up and divide. That is one calculation with no searching. The best-fit line works the same way. The computer adds up a few totals from the data: the totals of x and of y, how spread out x is, and how x and y move together. A formula then turns those totals directly into the slope and the height. No hunting required.
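Here is a minimal Python sketch of that idea. It assumes the data arrives as two plain lists. It makes one pass of totals (n, Σx, Σy, Σxy, Σx²) and plugs them into the textbook closed-form formulas. The function name fit_line is hypothetical and is not part of the Studio.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept b for y = m*x + b, computed from running totals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs, with equal-length lists")
    sum_x = sum(xs)                               # how big x is, in total
    sum_y = sum(ys)                               # how big y is, in total
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # how x and y move together
    sum_xx = sum(x * x for x in xs)               # how spread out x is
    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("all x-values are equal, so no single best line exists")
    m = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y - m * sum_x) / n
    return m, b


# Dots that sit exactly on y = 2x + 1, so the answer is easy to check.
m, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(m, b)  # 2.0 1.0
```

This totals form is algebraically the same as the mean-based formulas. With very large numbers, subtracting the averages first is more numerically stable, but the answer is the same line.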

Peek at the data first

As always, look at the data before modelling. Here are the records — hours a student studied and the exam score they got — with a quick summary of each column, just like Spectra’s describe_data.
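Here is a rough stand-in for that summary step, written as a Python sketch. The hours and scores below are made-up numbers for illustration, not the records shown in the simulation. describe is a hypothetical helper. We are assuming that describe_data reports something like count, mean, minimum, maximum and spread, which may not match its exact output.

```python
import statistics


def describe(column):
    """Quick numeric summary of one column (hypothetical stand-in for describe_data)."""
    return {
        "count": len(column),
        "mean": statistics.mean(column),
        "min": min(column),
        "max": max(column),
        "stdev": statistics.stdev(column),  # sample standard deviation; needs at least 2 values
    }


# Made-up records for illustration: hours studied and the exam score for each student.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 64, 70, 72]

for name, column in [("hours", hours), ("score", scores)]:
    print(name, describe(column))
```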

Try it

Set a rough guess with the slope and height sliders — get it as close as you can by eye. Then press Auto-fit ▶ and watch your line glide to the exact best line the computer calculated. Try a wild guess and fit again: it always lands in the very same place.
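You can mimic the sliders in Python to see why Auto-fit always lands in the same place. This sketch uses hypothetical helper names and made-up study data. It computes the best line from the averages without any starting guess. It then checks that wild guesses, and small nudges of the fitted line, all give a bigger total squared miss.

```python
def total_squared_miss(xs, ys, m, b):
    """Sum of squared vertical gaps between each dot and the line y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


def auto_fit(xs, ys):
    """Closed-form least squares from the averages; it takes no guess as input."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)  # zero only if all x-values are equal
    m = sxy / sxx
    b = y_bar - m * x_bar                    # the best line passes through (x_bar, y_bar)
    return m, b


# Made-up study data: hours studied and exam score.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 64, 70, 72]

m, b = auto_fit(hours, scores)
best = total_squared_miss(hours, scores, m, b)
print(f"best line: score = {m:.3f} * hours + {b:.3f}, total squared miss {best:.3f}")

# Wild slider guesses: every one misses by more in total.
for guess_m, guess_b in [(0, 60), (10, 30), (-3, 80)]:
    miss = total_squared_miss(hours, scores, guess_m, guess_b)
    print(f"guess ({guess_m}, {guess_b}): total squared miss {miss:.3f}")

# Nudging the fitted line in any direction also makes the total worse.
for dm, db in [(0.1, 0), (-0.1, 0), (0, 0.5), (0, -0.5)]:
    assert total_squared_miss(hours, scores, m + dm, b + db) > best
```

Because auto_fit never sees the slider position, a wild guess cannot change where the fitted line ends up.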

Where it came from

The exact-answer method is least squares, first published by Adrien-Marie Legendre in 1805. Carl Friedrich Gauss said he had used it earlier and published his own version in 1809, sparking a famous priority dispute; historians usually credit both. Because the answer comes from a formula rather than a search, the same data always gives the same line — one reason the method has lasted more than two centuries.

Try it in code

In the Studio, train_model on a "regressor" runs this same calculation for you — no sliders, just the best line:

data = load "houses"
model = make_model "regressor"
train_model model, on: data, predict: "price", using: ["size"]

say "A 1500 sqft house: about", predict(model, size: 1500)
plot_data data, x: "size", y: "price", line: model
