Multiple features
What it is
One clue can take a prediction only so far. Multiple features means giving the model several clues at once — here, a house’s size and its number of rooms — and letting it weigh each one. The model is still a straight-line predictor, just in more dimensions: it adds up each clue times its own weight, plus a starting amount, to guess the price.
Go deeper: with two clues the formula is price ≈ start + a·size + b·rooms. Training picks the weights a and b that make the guesses miss by the least overall. We can’t draw a line through 3-D data on a flat page, so instead we plot predicted vs. actual: each dot’s across-position is the true price and its up-position is the model’s guess. A perfect prediction lands exactly on the diagonal; the closer the cloud hugs that line, the better the model.
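To make the formula concrete, here is a minimal Python sketch using NumPy. It is an illustration only, not how the Studio works internally: the houses are randomly generated, and the "true" rule used to make the prices (50,000 + 1,500 per unit of size + 10,000 per room) is a made-up assumption. Least squares then recovers start, a and b from the data, and the gap between each guess and the true price is exactly how far that dot would sit from the diagonal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up houses (an assumption for illustration, not real data).
n = 200
size = rng.uniform(40, 200, n)               # e.g. square metres
rooms = rng.integers(1, 7, n).astype(float)  # 1 to 6 rooms

# Hypothetical rule used only to generate prices, plus random noise.
price = 50_000 + 1_500 * size + 10_000 * rooms + rng.normal(0, 5_000, n)

# A column of ones stands for the "start" amount, next to the two clues.
X = np.column_stack([np.ones(n), size, rooms])

# Least squares picks start, a, b to make the total squared miss smallest.
(start, a, b), *_ = np.linalg.lstsq(X, price, rcond=None)

# The model's guess: price ≈ start + a·size + b·rooms
predicted = start + a * size + b * rooms

# In a predicted-vs-actual plot, this miss is the dot's offset from the diagonal.
miss = predicted - price
rmse = np.sqrt(np.mean(miss ** 2))
print(f"start={start:,.0f}  a={a:,.0f}  b={b:,.0f}  typical miss={rmse:,.0f}")
```

The printed weights should land near the values used to generate the data, and the typical miss should be close to the size of the noise that was added, since that part cannot be explained by any clue.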
Why care
Almost nothing in the real world is decided by a single number. A price, a grade, a risk — each depends on many things at once. Knowing how to add clues (and how to tell a helpful clue from a useless one) is the everyday craft of building a good predictor. It’s also where a crucial lesson hides: more clues only help if they actually carry signal.
The idea, intuitively
Start with only “Size” and the predicted-vs-actual dots scatter off the diagonal: size alone leaves a lot unexplained. Switch on “Rooms” and the dots snap onto the line as the error plunges, because two real clues together pin down the price. Then add the “lucky number” clue, which is pure noise, and watch the test error tick back up. The model still hands the lucky number a small weight, fitted to coincidences among the training houses, and those coincidences do not repeat on new houses. A clue with no real link to the answer can’t help there, and may even hurt.
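The same experiment can be sketched in Python with NumPy. This is a rough stand-in for the simulation, not its actual code: the data is randomly generated, the price rule is a made-up assumption, and `fit_and_score` is a hypothetical helper written for this example. It fits three models (size only, size and rooms, then all three clues) on the first 80% of the houses and measures the typical miss on both the training houses and the held-out ones.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up houses (illustrative assumption, not real data).
n = 100
size = rng.uniform(40, 200, n)
rooms = rng.integers(1, 7, n).astype(float)
lucky = rng.integers(1, 100, n).astype(float)  # pure noise: never used in price
price = 50_000 + 1_500 * size + 10_000 * rooms + rng.normal(0, 5_000, n)

cut = int(0.8 * n)  # first 80% for training, last 20% held out for testing


def fit_and_score(columns):
    """Fit least squares on the training rows; return (train, test) typical miss."""
    X = np.column_stack([np.ones(n)] + columns)
    weights, *_ = np.linalg.lstsq(X[:cut], price[:cut], rcond=None)
    miss = X @ weights - price
    train_rmse = np.sqrt(np.mean(miss[:cut] ** 2))
    test_rmse = np.sqrt(np.mean(miss[cut:] ** 2))
    return train_rmse, test_rmse


results = {
    "size": fit_and_score([size]),
    "size + rooms": fit_and_score([size, rooms]),
    "size + rooms + lucky": fit_and_score([size, rooms, lucky]),
}

for name, (train_rmse, test_rmse) in results.items():
    print(f"{name:<22} train miss={train_rmse:>9,.0f}   test miss={test_rmse:>9,.0f}")
```

Two things are worth looking for in the output. Adding a clue can never raise the training miss in least squares, so the lucky number always looks harmless, or even slightly helpful, on houses the model has already seen. The test miss carries no such guarantee, and how much the noise clue hurts (if at all) depends on the random draw, which is why the held-out number is the one to trust.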
Peek at the data first
Each row is a house with a few clues and its price. Size and rooms really drive the price; the “lucky number” is a made-up decoy with no real link to it. This is the same labelled view of the data that Spectra’s describe_data would show before training.
Try it
Tick clues on and off and watch the predicted-vs-actual cloud. Start with Size only, then add Rooms and see the dots snap onto the diagonal as the test error drops. Finally add the lucky-number clue and watch the test error climb back up — a clue with no signal can’t help, and often hurts.
Where it shows up
- Pricing & valuation. Homes, cars, insurance — many measured features combine into one predicted number.
- Feature selection. Real projects spend much of their time deciding which clues to keep; useless ones add cost and can degrade the model.
- Every bigger model. Trees, forests and neural networks all take many features in at once — this is the simplest version of that idea.
Where it came from
Fitting a line to data by minimizing squared error, the method of least squares, was published by Adrien-Marie Legendre in 1805, with Carl Friedrich Gauss claiming earlier use. Extending it to several clues at once gives multiple linear regression, formalized in the early 1900s by statisticians such as Ronald Fisher, and it remains the bedrock of predicting numbers from many inputs.
Try it in code
In the Studio, just list more columns in using: to add clues — and ask
importance which ones the model actually leans on:
    data = load "houses"
    train, test = split data, hold_out: 20%
    model = make_model "regressor"
    train_model model, on: train, predict: "price", using: ["size", "rooms"]
    check model, with: test
    importance model
    show_model model
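For readers outside the Studio, one common way to ask which clues a straight-line model leans on is to compare standardized weights. The Python sketch below (NumPy only) shows that approach. It is an assumption for illustration and not necessarily how the Studio's importance is computed; the houses and the price rule are made up, and the lucky-number column is included on purpose so its score can be compared with the real clues.

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up houses (illustrative assumption, not real data).
n = 200
clues = {
    "size": rng.uniform(40, 200, n),
    "rooms": rng.integers(1, 7, n).astype(float),
    "lucky": rng.integers(1, 100, n).astype(float),  # decoy, not used in price
}
price = (50_000 + 1_500 * clues["size"] + 10_000 * clues["rooms"]
         + rng.normal(0, 5_000, n))

# Rescale every clue to mean 0 and spread 1, so weights are comparable
# even though size, rooms and lucky are measured in different units.
Z = np.column_stack([(v - v.mean()) / v.std() for v in clues.values()])
X = np.column_stack([np.ones(n), Z])

weights, *_ = np.linalg.lstsq(X, price, rcond=None)

# Importance here = size of each standardized weight (skipping the start term).
importance = dict(zip(clues, np.abs(weights[1:])))

for name, score in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name:>6}: {score:,.0f}")
```

Each score reads as "how much the predicted price moves when this clue shifts by one typical step". A clue with no real link to the price should score far below the genuine ones, though rarely exactly zero, because the fit always picks up a little chance pattern.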
Check your understanding
- Why does adding “Rooms” to “Size” pull the dots onto the diagonal?
- Why does the “lucky number” clue make the test error worse, not better?
- What does it mean for a dot to sit exactly on the diagonal line?