
Naive Bayes

What it is

Is this pet a cat or a dog? You have two clues: how much it weighs and how long it is. Naive Bayes treats each clue as a little vote. It already knows what cats and dogs usually weigh and measure, so each clue says “this looks more cat” or “more dog” — and the method combines those votes into one belief.

Go deeper: for every clue, the model learned a bell curve per class, described by the typical value and how much it varies. Reading the height of each curve at your value tells you how likely that value is for a cat vs a dog. Multiply those likelihoods together, multiply by the prior (how common each animal is to begin with), and rescale so the two results add to 100%. That final number is the belief.
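The recipe above can be sketched in a few lines of plain Python. This is a minimal illustration, not the simulation's actual code: the means, spreads, and priors in PROFILES are made-up numbers, and the names gaussian_pdf and belief are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Height of a bell curve with the given mean and spread at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

# Hypothetical learned profiles: a prior per class and a (mean, std) per clue.
PROFILES = {
    "cat": {"prior": 0.5, "weight_kg": (4.0, 1.0), "length_cm": (46.0, 5.0)},
    "dog": {"prior": 0.5, "weight_kg": (10.0, 4.0), "length_cm": (70.0, 15.0)},
}

def belief(pet, profiles=PROFILES):
    """Return {class: belief} for a dict of clue values."""
    scores = {}
    for label, profile in profiles.items():
        score = profile["prior"]              # start from how common the class is
        for clue, value in pet.items():
            mean, std = profile[clue]
            score *= gaussian_pdf(value, mean, std)  # each clue multiplies in
        scores[label] = score
    total = sum(scores.values())
    # Rescale so the beliefs add to 1 (that is, 100%).
    return {label: score / total for label, score in scores.items()}

result = belief({"weight_kg": 5.0, "length_cm": 50.0})
print({label: round(p, 3) for label, p in result.items()})
```

Passing a dict with only one clue, such as {"weight_kg": 5.0}, decides from that clue alone, because the loop only multiplies in the clues it is given.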

Why care

Naive Bayes is fast, tiny, and famously good for its size: it was the workhorse behind early spam filters and still powers a lot of text sorting today. It is also a beautiful first look at probability in machine learning. Instead of one hard yes/no, it gives a belief you can read, like “82% cat.”

The idea, intuitively

Each clue has two overlapping bell curves, one for cats and one for dogs. Whichever curve is taller at your value, that animal gets the louder vote. One clue on its own is often a toss-up in the overlap zone. But stack a second clue and the two votes multiply: a faint “maybe cat” times another faint “maybe cat” becomes a more confident “cat.”
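To see the votes multiply, here is a small sketch with made-up numbers. Each clue is assumed to favor "cat" only slightly (a likelihood of 0.6 for cat against 0.4 for dog), and the helper name combine is hypothetical.

```python
def combine(likelihoods_cat, likelihoods_dog, prior_cat=0.5):
    """Multiply per-clue likelihoods and the prior, then rescale to P(cat)."""
    score_cat = prior_cat
    score_dog = 1.0 - prior_cat
    for value in likelihoods_cat:
        score_cat *= value
    for value in likelihoods_dog:
        score_dog *= value
    return score_cat / (score_cat + score_dog)

one_clue = combine([0.6], [0.4])
two_clues = combine([0.6, 0.6], [0.4, 0.4])
three_clues = combine([0.6] * 3, [0.4] * 3)

print(round(one_clue, 3), round(two_clues, 3), round(three_clues, 3))
```

With these numbers a single 60/40 clue gives a 60% belief in cat, two such clues give about 69%, and three give about 77%. Every extra clue that leans the same way pushes the belief further from a toss-up.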

Peek at the data first

Before believing anything, look at the pets we already measured — weight, length, and which animal each one was. The bell curves below are just a summary of columns like these, much like Spectra’s describe_data.
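As a rough idea of what "a summary of columns" means, the sketch below turns a small table of pets into one bell curve per animal per clue. The measurements are invented for illustration, summarize is a hypothetical helper, and this is plain Python rather than Spectra's describe_data.

```python
from statistics import mean, stdev

# Hypothetical measurements: (weight_kg, length_cm, animal).
PETS = [
    (3.8, 44.0, "cat"),
    (4.5, 47.0, "cat"),
    (4.1, 49.0, "cat"),
    (5.0, 46.0, "cat"),
    (8.0, 60.0, "dog"),
    (12.5, 75.0, "dog"),
    (20.0, 90.0, "dog"),
    (9.5, 66.0, "dog"),
]

def summarize(rows):
    """Group rows by animal, then record the prior and a (mean, std) per clue."""
    by_class = {}
    for weight, length, animal in rows:
        by_class.setdefault(animal, []).append((weight, length))

    summary = {}
    for animal, values in by_class.items():
        weights = [w for w, _ in values]
        lengths = [l for _, l in values]
        summary[animal] = {
            "prior": len(values) / len(rows),           # share of all pets
            "weight_kg": (mean(weights), stdev(weights)),
            "length_cm": (mean(lengths), stdev(lengths)),
        }
    return summary

summary = summarize(PETS)
for animal, profile in summary.items():
    print(animal, profile)
```

The mean sets where each bell curve is centered and the standard deviation sets how wide it is, so the dogs in this toy table end up with much wider curves than the cats.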

Try it

Drag the dashed marker in each panel (or use the sliders) to set the pet’s weight and length. The dots show how high each animal’s bell curve is at your value, and the bar shows the combined belief. Uncheck “Use the length clue too” to decide with a single clue.

Where it shows up

Spam filters. The classic use: each word is a clue voting “spam” or “not,” and the votes multiply into a decision.
Sorting text. Tagging articles by topic, or reviews as positive or negative, from the words they contain.
Quick first guesses. Because it is so cheap, it is a great baseline to compare fancier models against.
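The word-voting idea behind the spam and text-sorting uses can be sketched as follows. The training messages are invented, the names train and classify are hypothetical, and two details go beyond the description above: add-one smoothing, so a word never seen in one class does not zero out the whole product, and working with logarithms, so multiplying many small numbers becomes adding.

```python
import math
from collections import Counter

# Hypothetical toy training messages.
TRAIN = [
    ("win cash now", "spam"),
    ("free prize win", "spam"),
    ("lunch at noon", "ham"),
    ("see you at lunch", "ham"),
]

def train(examples):
    """Count words per class, messages per class, and the overall vocabulary."""
    word_counts, doc_counts, vocab = {}, Counter(), set()
    for text, label in examples:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.split():
            counts[word] += 1
            vocab.add(word)
    return word_counts, doc_counts, vocab

def classify(text, word_counts, doc_counts, vocab):
    """Return {class: belief} for a message, ignoring words never seen in training."""
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        log_score = math.log(doc_counts[label] / total_docs)  # the prior
        for word in text.split():
            if word not in vocab:
                continue
            # Add-one smoothing keeps every word's chance above zero.
            p = (word_counts[label][word] + 1) / (total_words + len(vocab))
            log_score += math.log(p)  # adding logs = multiplying votes
        log_scores[label] = log_score
    # Convert log scores back into beliefs that add to 1.
    top = max(log_scores.values())
    scaled = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(scaled.values())
    return {label: value / total for label, value in scaled.items()}

model = train(TRAIN)
spam_belief = classify("win free cash", *model)
ham_belief = classify("lunch at noon", *model)
print(spam_belief)
print(ham_belief)
```

Because training is just counting, a model like this is cheap to build, which is what makes it a convenient baseline.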

Where it came from

The rule for updating a belief with evidence is Bayes’ theorem, named for the Reverend Thomas Bayes, whose work was published in 1763 after his death (and developed further by Pierre-Simon Laplace). Using it as a simple classifier — with the “naive” assumption that clues are independent — became a standard, surprisingly strong method in the 20th century.

Try it in code

The Studio’s bayes model is a Naive Bayes classifier, and show_model prints the profile (prior + typical value) it learned for each answer:

data  = load "flowers"
train, test = split data, hold_out: 20%

model = make_model "bayes"
train_model model, on: train, predict: "species", using: ["petal_length", "petal_width"]

check model, with: test
show_model model


Check your understanding

Why can two “maybe” clues add up to a confident guess?
What does the prior do, and when would it matter most?
Why is it called “naive” — and why does it still work so well anyway?