
Over-fitting

What it is

Over-fitting is when a model learns the practice data too well — including its random noise — and so does worse on new data it has never seen. Give a model enough flexibility and it can twist itself to pass through every single training dot. That looks perfect on the dots it studied, but it has memorized the answer key instead of learning the real pattern.

Go deeper: the opposite mistake is under-fitting, where a model is too stiff to capture the real shape (a straight line through a curved trend). Between too-stiff and too-wiggly sits a sweet spot. The only honest way to find it is to hide some data, train on the rest, and check the error on what was hidden. Training error almost always keeps falling as you add flexibility, so it can’t be trusted on its own; the held-out test error tells the truth.

Why care

Over-fitting is one of the most common ways machine-learning projects go wrong. A model that scores 99% on its own practice data and then fails in the real world has very often over-fit. Spotting it, and choosing the right amount of flexibility, is a core skill behind every trustworthy model, and it’s exactly why we hold out a test set in the first place.

The idea, intuitively

Slide the flexibility dial. At the low end the curve is a stiff line that misses the bend — both errors are high. Raise it and the curve eases into the real shape; the test error drops to its lowest. Keep going and the curve starts wiggling to touch every studied dot: training error shrinks toward zero, but test error climbs back up. Same data, same model family — only the flexibility changed.
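
The dial can be imitated in ordinary Python. The sketch below is only an illustration, not the simulation's own code: it assumes NumPy is available, uses polynomial degree as a stand-in for flexibility, and invents a sine wave as the hidden true pattern (the names true_pattern, make_dots and sweep are hypothetical).

```python
import numpy as np

def true_pattern(x):
    # Hypothetical smooth pattern standing in for the hidden curve.
    return np.sin(2.0 * np.pi * x)

def make_dots(n, rng, noise=0.3):
    # Noisy dots: the pattern plus random scatter.
    x = rng.uniform(0.0, 1.0, size=n)
    y = true_pattern(x) + rng.normal(0.0, noise, size=n)
    return x, y

def mse(coeffs, x, y):
    # Mean squared error of a fitted polynomial on the given dots.
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

def sweep(degrees, x_train, y_train, x_test, y_test):
    # Fit one polynomial per degree on the training dots only,
    # then score it on both the training dots and the held-out dots.
    rows = []
    for d in degrees:
        coeffs = np.polyfit(x_train, y_train, deg=d)
        rows.append((d, mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test)))
    return rows

rng = np.random.default_rng(0)
x_train, y_train = make_dots(15, rng)
x_test, y_test = make_dots(200, rng)

results = sweep(range(1, 10), x_train, y_train, x_test, y_test)
for d, train_err, test_err in results:
    print(f"degree {d}  train MSE {train_err:.3f}  test MSE {test_err:.3f}")
```

Because each higher degree can reproduce every lower-degree fit, the training column cannot go up as the degree rises. The test column is the one to watch for a dip followed by a climb; where it turns depends on the random dots drawn.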

Peek at the data first

Each dot is just an input and an output, with some held back for testing, the same kind of hold-out that Spectra’s split makes before training. There’s a smooth true pattern behind the dots, but it’s hidden: the model only ever sees the noisy dots.
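
To make the setup concrete, here is a small Python sketch of how such dots could be produced and split. It is an assumption-laden illustration, not the page's actual data: the sine-shaped true_pattern, the noise level, and the helper names make_dots and hold_out are all made up for this example.

```python
import math
import random

def true_pattern(x):
    # Hypothetical hidden pattern; the model never gets to call this.
    return math.sin(2.0 * math.pi * x)

def make_dots(n, noise, rng):
    # Each dot is an (input, output) pair: the pattern plus random noise.
    dots = []
    for _ in range(n):
        x = rng.random()
        y = true_pattern(x) + rng.gauss(0.0, noise)
        dots.append((x, y))
    return dots

def hold_out(dots, fraction, rng):
    # Shuffle a copy, then set the first `fraction` of it aside for testing.
    shuffled = dots[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * fraction)
    return shuffled[n_test:], shuffled[:n_test]

rng = random.Random(42)
dots = make_dots(50, noise=0.3, rng=rng)
train, test = hold_out(dots, fraction=0.2, rng=rng)
print(len(train), "training dots,", len(test), "held back for testing")
```

With 50 dots and 20% held back, 40 dots are left for training and 10 for testing. Shuffling before the split keeps the held-out dots from all coming from one end of the data.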

Try it

Drag Flexibility from a stiff line up to a wild wiggle. Watch the two error bars: training error (blue) only ever falls, while test error (red) dips to a sweet spot and then rises — that rise is over-fitting. Turn on Show the hidden true pattern to see the green curve the dots really came from, and watch a too-wiggly fit veer away from it.

Where it shows up

Every kind of model. Decision trees grown too deep, neural networks trained too long, curves with too many bends — all can memorize instead of generalize.
Choosing complexity. Picking a model size, a tree depth, or how long to train is really picking where to sit on this flexibility dial.
Trusting a score. A great score on training data means little; the held-out score is what predicts real-world performance.
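
The same effect appears outside curve fitting. As a hedged illustration, the Python sketch below swaps in a different model family, nearest-neighbour averaging, which is not the model used in the simulation. Here a smaller k means more flexibility, and k = 1 simply memorizes the training dots. The data and the helper names are invented for the example.

```python
import math
import random

def make_dots(n, rng, noise=0.3):
    # Noisy dots around a hypothetical sine-shaped pattern.
    dots = []
    for _ in range(n):
        x = rng.random()
        dots.append((x, math.sin(2.0 * math.pi * x) + rng.gauss(0.0, noise)))
    return dots

def knn_predict(train, x, k):
    # Average the outputs of the k training dots whose inputs are closest to x.
    nearest = sorted(train, key=lambda dot: abs(dot[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, dots, k):
    # Mean squared error of the k-nearest-neighbour prediction on `dots`.
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in dots) / len(dots)

rng = random.Random(7)
train = make_dots(40, rng)
test = make_dots(200, rng)

for k in (1, 5, 40):
    print(f"k={k:2d}  train MSE {mse(train, train, k):.3f}  test MSE {mse(train, test, k):.3f}")
```

With k = 1 every training dot is its own nearest neighbour, so the training error is exactly zero while the test error is not: a perfect training score that says nothing about new data. With k = 40 the model averages all 40 training dots and predicts one flat value everywhere, which is the under-fitting end of the dial.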

Where it came from

The danger of over-complex models is old — it echoes Occam’s razor, the centuries-old preference for the simplest explanation that fits. As statistics and machine learning grew, this became the formal bias–variance trade-off, and holding out a test set (and cross-validation) became the standard defense. It’s why “train on some, test on the rest” is the first rule of honest modeling.
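
Cross-validation, mentioned above, repeats the hold-out idea so that every dot takes a turn being hidden. A minimal Python sketch follows, assuming NumPy, polynomial degree as the flexibility setting, and made-up sine-plus-noise data; cross_val_mse is a hypothetical helper, not a library function.

```python
import numpy as np

def cross_val_mse(x, y, degree, n_folds=5):
    # Split the dot indices into n_folds groups; each group is hidden once
    # while a polynomial is fitted to all the other dots.
    folds = np.array_split(np.arange(len(x)), n_folds)
    errors = []
    for held_out in folds:
        keep = np.ones(len(x), dtype=bool)
        keep[held_out] = False
        coeffs = np.polyfit(x[keep], y[keep], deg=degree)
        pred = np.polyval(coeffs, x[held_out])
        errors.append(np.mean((pred - y[held_out]) ** 2))
    # Average the held-out error over all folds.
    return float(np.mean(errors))

rng = np.random.default_rng(1)
# Inputs are drawn in random order, so contiguous folds act as random folds.
x = rng.uniform(0.0, 1.0, size=40)
y = np.sin(2.0 * np.pi * x) + rng.normal(0.0, 0.3, size=40)

scores = {d: cross_val_mse(x, y, d) for d in range(1, 9)}
best = min(scores, key=scores.get)
for d, err in scores.items():
    print(f"degree {d}  cross-validated MSE {err:.3f}")
print("lowest held-out error at degree", best)
```

Averaging over folds makes the choice of flexibility less dependent on which dots happened to be hidden in a single split, at the cost of fitting the model several times.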

Try it in code

In the Studio, the cure for over-fitting is to hold data out and check on it — a high training score with a low test score is the tell:

data  = load "fruits"
train, test = split data, hold_out: 20%

model = make_model "tree"
train_model model, on: train, predict: "type", using: ["sweetness", "size"]

check model, with: train   # looks great...
check model, with: test    # ...the honest score


Check your understanding

Why does training error keep falling even when the model is getting worse?
What does the lowest point of the test-error curve tell you?
Why is a wiggly curve that hits every training dot usually a bad sign, not a good one?