
Ensemble voting

What it is

An ensemble combines many so-so models into one strong one by letting them vote. A single voter here is right about 60% of the time — barely better than a coin flip. But gather several different voters and take the majority answer, and the crowd is right far more often than any one of them. It’s the “wisdom of the crowd.”
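
To put numbers on that claim, here is a minimal Python sketch that computes the crowd's accuracy exactly. It assumes an odd number of voters, that every voter is right with the same probability, and that they vote independently of one another. The function name `majority_accuracy` is made up for this illustration.

```python
from math import comb


def majority_accuracy(n_voters: int, p_correct: float) -> float:
    """Chance that a strict majority of independent voters is right.

    Assumes an odd number of voters (so there are no ties), each right
    with the same probability and independent of all the others.
    """
    if n_voters % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    needed = n_voters // 2 + 1  # smallest winning majority
    # Add up the binomial probabilities of every outcome where
    # at least `needed` voters are right.
    return sum(
        comb(n_voters, k) * p_correct**k * (1 - p_correct) ** (n_voters - k)
        for k in range(needed, n_voters + 1)
    )


for n in (1, 3, 5, 11, 25, 101):
    print(f"{n:>3} voters -> {majority_accuracy(n, 0.6):.3f}")
```

With three voters at 60% each the result is 0.648, and with five it is about 0.683. The gain keeps growing as voters are added, but only under the independence assumption stated above.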

Go deeper: the magic only works when the voters are diverse. They make different mistakes, so their errors cancel out when you add them up. If every voter thinks the same way (clones), they all make the same mistakes and the crowd is no better than one. That’s why a random forest deliberately trains each tree on a different random sample of the data, and lets each split look at only a random subset of the features: to keep the trees disagreeing in useful ways.
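
The difference between diverse voters and clones can be checked with a small simulation. This is a sketch under simplifying assumptions: "diverse" is modelled as fully independent votes, "clones" as one shared opinion copied to every voter, and the function name `crowd_accuracy` is invented for the example.

```python
import random


def crowd_accuracy(n_voters, p_correct, clones, trials=20_000, seed=0):
    """Estimate how often a majority vote is right, by simulation."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        if clones:
            # Clones share one opinion: a single draw decides every vote.
            shared = rng.random() < p_correct
            votes = [shared] * n_voters
        else:
            # Diverse voters: each one is right or wrong independently.
            votes = [rng.random() < p_correct for _ in range(n_voters)]
        if sum(votes) > n_voters / 2:  # strict majority voted correctly
            wins += 1
    return wins / trials


diverse = crowd_accuracy(11, 0.6, clones=False)
alike = crowd_accuracy(11, 0.6, clones=True)
print(f"11 diverse voters:   {diverse:.3f}")
print(f"11 identical voters: {alike:.3f}")
```

The identical voters should land near 0.6, the same as a single voter, while the diverse crowd should score clearly higher. Real models sit somewhere between these two extremes, because their errors are usually partly correlated.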

Why care

Many of the most reliable models in everyday use — the ones predicting prices, spam, or risk — are ensembles. The idea that a crowd of weak guessers can beat a single expert is both surprising and incredibly practical, and it shows up far beyond AI (juries, elections, prediction markets).

The idea, intuitively

Think of guessing the number of jellybeans in a jar. Any one person is usually way off — but the average of a whole class is often eerily close. Each wrong guess leans a different way, so they cancel out. Ensemble voting is that same trick applied to predictions.
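
The jellybean effect is easy to reproduce. The sketch below invents a jar of 500 beans and a class of 30, and assumes each guess is the true count plus independent, unbiased noise. All of the numbers are made up for illustration.

```python
import random
from statistics import mean

rng = random.Random(42)
true_count = 500  # hypothetical number of jellybeans in the jar

# Assumption: each guess is the truth plus independent, unbiased noise.
guesses = [rng.gauss(true_count, 150) for _ in range(30)]

# How far off is a typical student, and how far off is the class average?
typical_error = mean(abs(g - true_count) for g in guesses)
crowd_error = abs(mean(guesses) - true_count)

print(f"typical individual error:   {typical_error:.0f} beans")
print(f"error of the class average: {crowd_error:.0f} beans")
```

The averaging only helps because the errors point in different directions. If the whole class shared a bias, for example everyone underestimating, the average would inherit that bias.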

Peek at the data first

There’s nothing to type — the numbers come straight from probability (safety by design). What matters is how many voters there are, how good each one is, and whether they’re diverse.

Try it

Slide “How many voters” up and watch the crowd’s accuracy climb above a single voter’s. Change “Each voter’s skill” to see how even weak voters add up. Then tick “Make the voters all think alike” and watch the boost disappear.

Where it shows up

Random forests. A crowd of different decision trees voting — a workhorse of machine learning.
Boosting (XGBoost). Many small models added up, each fixing the last one’s mistakes.
Competitions. Winning solutions almost always blend several models together.
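
For the random forest entry, one common way to give each tree a different view of the data is bootstrap sampling: drawing rows at random with replacement. The sketch below shows how different two such samples turn out. The dataset size and the helper name `bootstrap_sample` are assumptions made for the example.

```python
import random


def bootstrap_sample(n_rows, rng):
    """Draw n_rows row indices with replacement (a bootstrap sample)."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


rng = random.Random(1)
n_rows = 1000  # hypothetical dataset size

# Two trees, two independent bootstrap samples; keep the distinct rows.
sample_a = set(bootstrap_sample(n_rows, rng))
sample_b = set(bootstrap_sample(n_rows, rng))

print(f"tree A sees {len(sample_a) / n_rows:.0%} of the distinct rows")
print(f"tree B sees {len(sample_b) / n_rows:.0%} of the distinct rows")
print(f"rows seen by both: {len(sample_a & sample_b) / n_rows:.0%}")
```

Each sample should contain roughly 63% of the distinct rows, so no two trees learn from quite the same evidence.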

Where it came from

The maths dates to Condorcet’s Jury Theorem (1785): if each juror is more likely right than wrong and they decide independently, bigger juries are more reliable. Centuries later, Leo Breiman turned the idea into machine learning with bagging (1996) and the random forest (2001), and ensembles have been a mainstay of practical ML ever since.
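
In symbols, the theorem can be written as follows. This is one standard statement, and it assumes an odd number n of jurors, each independently right with the same probability p. The symbols P_n, n, p and k are notation introduced here, not taken from the page.

```latex
P_n \;=\; \sum_{k=\frac{n+1}{2}}^{n} \binom{n}{k}\, p^{k}\,(1-p)^{\,n-k},
\qquad
p > \tfrac{1}{2} \;\Longrightarrow\; P_n \to 1 \quad \text{as } n \to \infty .
```

The theorem cuts both ways: if p is below 1/2, the same sum shrinks towards 0, so a bigger jury of worse-than-chance jurors is less reliable than a small one.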

Try it in code

A random forest is an ensemble of trees. In Spectra you just ask for one — the crowd-voting happens inside:

data = load "students"
train, test = split data, hold_out: 30%

model = make_model "forest"
train_model model, on: train, predict: "result", using: ["hours_studied", "sleep_hours", "attendance"], trees: 12
check model, with: test
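
To see roughly what "happens inside", here is a heavily simplified Python sketch of a forest. It is not how Spectra implements it. The data is synthetic (invented to stand in for "students"), every tree is a one-split "stump" on a single randomly chosen column, and the helper names `make_student`, `fit_stump`, `fit_forest` and `predict` are made up for the example. Real random forests grow deeper trees and pick a random subset of columns at every split.

```python
import random

FEATURES = ["hours_studied", "sleep_hours", "attendance"]


def make_student(rng):
    """Synthetic stand-in for one row of the "students" data (invented)."""
    row = {
        "hours_studied": rng.uniform(0, 10),
        "sleep_hours": rng.uniform(4, 10),
        "attendance": rng.uniform(0.5, 1.0),
    }
    score = row["hours_studied"] + row["sleep_hours"] + 10 * row["attendance"]
    row["result"] = 1 if score + rng.gauss(0, 1.5) > 19.5 else 0
    return row


def fit_stump(rows, feature):
    """One-split tree: pick the threshold with the fewest training errors."""
    best = None
    for candidate in rows:
        t = candidate[feature]
        above = [r["result"] for r in rows if r[feature] > t]
        below = [r["result"] for r in rows if r[feature] <= t]
        # Each side of the split predicts its own majority label.
        label_above = int(sum(above) * 2 >= len(above)) if above else 1
        label_below = int(sum(below) * 2 >= len(below)) if below else 0
        errors = sum(y != label_above for y in above)
        errors += sum(y != label_below for y in below)
        if best is None or errors < best[0]:
            best = (errors, t, label_below, label_above)
    _, t, label_below, label_above = best
    return lambda row: label_above if row[feature] > t else label_below


def fit_forest(rows, n_trees, rng):
    trees = []
    for _ in range(n_trees):
        sample = rng.choices(rows, k=len(rows))  # bootstrap: rows drawn with replacement
        feature = rng.choice(FEATURES)           # each stump looks at one random column
        trees.append(fit_stump(sample, feature))
    return trees


def predict(trees, row):
    votes = sum(tree(row) for tree in trees)
    return 1 if votes * 2 > len(trees) else 0    # strict majority; a tie counts as 0


rng = random.Random(0)
data = [make_student(rng) for _ in range(300)]
train, test = data[:210], data[210:]             # hold out 30%, as in the snippet above

forest = fit_forest(train, n_trees=12, rng=rng)
accuracy = sum(predict(forest, r) == r["result"] for r in test) / len(test)
print(f"forest accuracy on held-out rows: {accuracy:.2f}")
```

With an even number of trees a 6-6 tie is possible; this sketch resolves it as 0, which is an arbitrary choice made here for simplicity.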


Check your understanding

How can a crowd of 60%-right voters be much more than 60% right together?
Why does the boost vanish when the voters all think alike?
How does a random forest keep its trees diverse?