EP04 · 8 min

Generalization: overfitting, underfitting, train/val/test, leakage

Learn how to evaluate models honestly and detect leakage before deployment.

Simple definition

A good model performs well on new data, not only on data it already saw.

Precise definition

Generalization is the model's expected performance on unseen samples from the same underlying distribution.
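
Written out (a standard textbook formulation, not specific to this lesson), with f the model, ℓ a loss function, and D the data distribution:

    R(f) = \mathbb{E}_{(x, y) \sim \mathcal{D}}\left[\, \ell(f(x), y) \,\right]

The training loss only approximates this expectation; the splits below exist to measure how far off that approximation is.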

Objective

This lesson trains your "skeptic mode": you should be able to look at any reported result and ask whether it will survive real traffic.

Core ideas

  • Overfitting: model memorizes noise, fails on new data.
  • Underfitting: model too simple, misses signal.
  • Train split: fit parameters.
  • Validation split: tune choices.
  • Test split: final unbiased estimate (see the split sketch after this list).
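
A minimal Python sketch of the three-way split, assuming scikit-learn and a 60/20/20 ratio (the library choice and the ratio are mine, not prescribed by the lesson):

    import numpy as np
    from sklearn.model_selection import train_test_split

    # Toy stand-ins for a real feature matrix and target.
    X, y = np.arange(100).reshape(50, 2), np.arange(50.0)

    # First cut: hold out 20% as the test set. It gets touched exactly once.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=0.20, random_state=42
    )

    # Second cut: carve validation out of the remainder (0.25 * 0.80 = 20%).
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=0.25, random_state=42
    )

    print(len(X_train), len(X_val), len(X_test))  # 30 10 10

The fixed random_state keeps the cut reproducible, so "lock choices" later means locking against the same validation set every time.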

Worked example (online store)

Imagine predicting delivery times with a feature set that mistakenly includes the "actual delivered timestamp". The model looks brilliant because it indirectly sees the answer. In production that column does not exist, so performance collapses.

This is leakage: hidden answer clues during training.
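
One cheap smoke test for this class of bug is to check how strongly each candidate feature correlates with the target before training. A toy Python sketch (the dataset and column names are invented for illustration):

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    n = 1_000

    # Hypothetical delivery data: distance drives delivery time, plus noise.
    df = pd.DataFrame({
        "distance_km": rng.uniform(1, 50, n),
        "warehouse_backlog": rng.integers(0, 100, n),
    })
    df["delivery_hours"] = 2 + 0.5 * df["distance_km"] + rng.normal(0, 5, n)

    # The leak: a column derived from the outcome itself.
    df["delivered_ts_hours"] = df["delivery_hours"] + rng.normal(0, 0.01, n)

    # Features almost perfectly correlated with the target deserve scrutiny.
    corr = df.corr()["delivery_hours"].drop("delivery_hours")
    print(corr[corr.abs() > 0.95])  # flags only the leaked column

A near-perfect correlation does not prove leakage, but it is exactly the kind of "too good" signal that should trigger the checklist below.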

Spot-the-leak mindset

Ask for each feature:

  • Is it available at prediction time?
  • Is it derived from the target?
  • Could it include future information?

If yes to any, investigate.
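
The checklist can be made mechanical. One possible convention, sketched in Python, is a hand-maintained registry recording when each feature becomes available (the registry and its names are hypothetical, not a real tool):

    # When each column becomes available, relative to prediction time.
    FEATURE_AVAILABLE_AT = {
        "order_created_at": "prediction_time",   # known when the order is placed
        "distance_km": "prediction_time",
        "actual_delivered_at": "after_outcome",  # exists only once the answer is known
    }

    def usable_features(registry: dict[str, str]) -> list[str]:
        """Keep only the columns we would actually have at prediction time."""
        return [name for name, when in registry.items() if when == "prediction_time"]

    print(usable_features(FEATURE_AVAILABLE_AT))  # ['order_created_at', 'distance_km']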

Practical loop

  1. Build a baseline.
  2. Evaluate on the validation split.
  3. Lock your choices.
  4. Report once on the test split (sketched below).
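
In Python, continuing from the split sketch above (DummyRegressor and Ridge are stand-ins of my choosing; any baseline and candidate model fit the loop):

    from sklearn.dummy import DummyRegressor
    from sklearn.linear_model import Ridge
    from sklearn.metrics import mean_absolute_error

    # 1. Baseline: predicting the training mean sets the bar to beat.
    baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
    print("baseline val MAE:", mean_absolute_error(y_val, baseline.predict(X_val)))

    # 2. Candidate model, tuned only against the validation split.
    model = Ridge(alpha=1.0).fit(X_train, y_train)
    print("model val MAE:", mean_absolute_error(y_val, model.predict(X_val)))

    # 3. Lock choices, then 4. report on the test set exactly once.
    print("final test MAE:", mean_absolute_error(y_test, model.predict(X_test)))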

Never tune after seeing test metrics unless you intentionally reset the experimental protocol.

Three takeaways

  • High scores are meaningless without split discipline.
  • Leakage is common and expensive.
  • Honest evaluation beats impressive dashboards.

Visual walkthrough: underfit vs overfit

Underfit insight: the model is too simple, so it misses meaningful patterns in both train and validation data.
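
To make the contrast concrete, here is a toy Python sketch (the synthetic sine data and scikit-learn are my own choices, not part of the walkthrough). Watch the gap between train and validation error grow as the polynomial degree rises:

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 1, (40, 1))
    y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 40)
    X_train, X_val, y_train, y_val = X[:30], X[30:], y[:30], y[30:]

    for degree in (1, 4, 15):  # too simple, about right, too flexible
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(X_train, y_train)
        print(f"degree {degree:2d}  "
              f"train MSE {mean_squared_error(y_train, model.predict(X_train)):.3f}  "
              f"val MSE {mean_squared_error(y_val, model.predict(X_val)):.3f}")

The degree-1 fit misses the curve everywhere (underfit); the degree-15 fit chases the noise, so its train error is tiny while its validation error balloons (overfit).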

Common traps

  • Tuning repeatedly on the test set.
  • Accidentally using future information during training.
  • Interpreting high training accuracy as production readiness.

Three takeaways

  • Use train/validation/test with clear boundaries.
  • Leakage can make bad models look excellent.
  • Generalization quality is the real KPI.