EP04 · 8 min

Generalization: overfitting, underfitting, train/val/test, leakage

Learn how to evaluate models honestly and detect leakage before deployment.

Simple definition

A good model performs well on new data, not only on data it already saw.

Precise definition

Generalization is the model's expected performance on unseen samples from the same underlying distribution.
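
Written out (a standard textbook formulation, not specific to this lesson), with f the model, ℓ a loss function, and D the data distribution:

    R(f) = \mathbb{E}_{(x, y) \sim \mathcal{D}}\left[\, \ell(f(x), y) \,\right]

The training loss only approximates this expectation; the splits below exist to measure how far off that approximation is.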

Objective

This lesson trains your "skeptic mode": you should be able to look at any reported result and ask whether it will survive real traffic.

Core ideas

  • Overfitting: model memorizes noise, fails on new data.
  • Underfitting: model too simple, misses signal.
  • Train split: fit parameters.
  • Validation split: tune choices.
  • Test split: final unbiased estimate (see the split sketch after this list).
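
A minimal Python sketch of the three-way split, assuming scikit-learn and a 60/20/20 ratio (the library choice and the ratio are mine, not prescribed by the lesson):

    import numpy as np
    from sklearn.model_selection import train_test_split

    # Toy stand-ins for a real feature matrix and target.
    X, y = np.arange(100).reshape(50, 2), np.arange(50.0)

    # First cut: hold out 20% as the test set. It gets touched exactly once.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=0.20, random_state=42
    )

    # Second cut: carve validation out of the remainder (0.25 * 0.80 = 20%).
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=0.25, random_state=42
    )

    print(len(X_train), len(X_val), len(X_test))  # 30 10 10

The fixed random_state keeps the cut reproducible, so "lock choices" later means locking against the same validation set every time.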

Worked example (online store)

Imagine predicting delivery times with a feature set that mistakenly includes the "actual delivered timestamp". The model looks brilliant because it indirectly sees the answer. In production that column does not exist, so performance collapses.

This is leakage: hidden answer clues during training.
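
One cheap smoke test for this class of bug is to check how strongly each candidate feature correlates with the target before training. A toy Python sketch (the dataset and column names are invented for illustration):

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    n = 1_000

    # Hypothetical delivery data: distance drives delivery time, plus noise.
    df = pd.DataFrame({
        "distance_km": rng.uniform(1, 50, n),
        "warehouse_backlog": rng.integers(0, 100, n),
    })
    df["delivery_hours"] = 2 + 0.5 * df["distance_km"] + rng.normal(0, 5, n)

    # The leak: a column derived from the outcome itself.
    df["delivered_ts_hours"] = df["delivery_hours"] + rng.normal(0, 0.01, n)

    # Features almost perfectly correlated with the target deserve scrutiny.
    corr = df.corr()["delivery_hours"].drop("delivery_hours")
    print(corr[corr.abs() > 0.95])  # flags only the leaked column

A near-perfect correlation does not prove leakage, but it is exactly the kind of "too good" signal that should trigger the checklist below.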

Spot-the-leak mindset

Ask for each feature:

  • Is it available at prediction time?
  • Is it derived from the target?
  • Could it include future information?

If yes to any, investigate.
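
The checklist can be made mechanical. One possible convention, sketched in Python, is a hand-maintained registry recording when each feature becomes available (the registry and its names are hypothetical, not a real tool):

    # When each column becomes available, relative to prediction time.
    FEATURE_AVAILABLE_AT = {
        "order_created_at": "prediction_time",   # known when the order is placed
        "distance_km": "prediction_time",
        "actual_delivered_at": "after_outcome",  # exists only once the answer is known
    }

    def usable_features(registry: dict[str, str]) -> list[str]:
        """Keep only the columns we would actually have at prediction time."""
        return [name for name, when in registry.items() if when == "prediction_time"]

    print(usable_features(FEATURE_AVAILABLE_AT))  # ['order_created_at', 'distance_km']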

Practical loop

  1. Build a baseline.
  2. Evaluate on the validation split.
  3. Lock your choices.
  4. Report once on the test split (sketched below).
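
In Python, continuing from the split sketch above (DummyRegressor and Ridge are stand-ins of my choosing; any baseline and candidate model fit the loop):

    from sklearn.dummy import DummyRegressor
    from sklearn.linear_model import Ridge
    from sklearn.metrics import mean_absolute_error

    # 1. Baseline: predicting the training mean sets the bar to beat.
    baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
    print("baseline val MAE:", mean_absolute_error(y_val, baseline.predict(X_val)))

    # 2. Candidate model, tuned only against the validation split.
    model = Ridge(alpha=1.0).fit(X_train, y_train)
    print("model val MAE:", mean_absolute_error(y_val, model.predict(X_val)))

    # 3. Lock choices, then 4. report on the test set exactly once.
    print("final test MAE:", mean_absolute_error(y_test, model.predict(X_test)))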

Never tune after seeing test metrics unless you intentionally reset the experimental protocol.

Three takeaways

  • High scores are meaningless without split discipline.
  • Leakage is common and expensive.
  • Honest evaluation beats impressive dashboards.

Visual walkthrough: underfit vs overfit

Underfit insight: the model is too simple, so it misses meaningful patterns in both train and validation data.
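
To make the contrast concrete, here is a toy Python sketch (the synthetic sine data and scikit-learn are my own choices, not part of the walkthrough). Watch the gap between train and validation error grow as the polynomial degree rises:

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, (40, 1))
    y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 40)
    X_train, X_val, y_train, y_val = X[:30], X[30:], y[:30], y[30:]

    for degree in (1, 4, 15):  # too simple, about right, too flexible
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(X_train, y_train)
        print(f"degree {degree:2d}  "
              f"train MSE {mean_squared_error(y_train, model.predict(X_train)):.3f}  "
              f"val MSE {mean_squared_error(y_val, model.predict(X_val)):.3f}")

The degree-1 fit misses the curve everywhere (underfit); the degree-15 fit chases the noise, so its train error is tiny while its validation error balloons (overfit).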

Common traps

  • Tuning repeatedly on the test set.
  • Accidentally using future information during training.
  • Interpreting high training accuracy as production readiness.

Three takeaways

  • Use train/validation/test with clear boundaries.
  • Leakage can make bad models look excellent.
  • Generalization quality is the real KPI.