Episodes

  • When Clean Data Is Actually Dirty
    Feb 16 2026

    “Cleaning” data is often treated as a harmless preprocessing step.

    Delete missing rows.

    Fill gaps with the mean.

    Move forward.

    But cleaning is not neutral.

    It is a modeling decision that can change:

    • The estimand
    • The sampling mechanism
    • The bias–variance trade-off

    In this episode, we examine the statistical dangers of deletion and simple imputation — and why naïve cleaning can quietly corrupt inference.
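
    As a rough illustration of the point above, the sketch below simulates one outcome with values missing more often for some units than others, then compares the full data with complete-case deletion and mean imputation. Everything in it is an assumption made for illustration and is not taken from the episode: the function name `simulate_cleaning`, the data-generating process, and the missingness probabilities (0.8 and 0.1).

```python
import random
import statistics


def simulate_cleaning(n=20000, seed=0):
    """Compare full data, complete-case deletion, and mean imputation.

    Hypothetical setup: y = x + noise, and y goes missing far more often
    when x > 0 (missingness depends on an observed covariate).
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [xi + rng.gauss(0, 1) for xi in x]

    # Assumed missingness mechanism: drop y with prob. 0.8 if x > 0, else 0.1.
    observed = [
        yi if rng.random() > (0.8 if xi > 0 else 0.1) else None
        for xi, yi in zip(x, y)
    ]

    # "Delete missing rows": keep only the complete cases.
    complete = [v for v in observed if v is not None]
    mean_deleted = statistics.fmean(complete)

    # "Fill gaps with the mean": replace each gap with the observed mean.
    imputed = [v if v is not None else mean_deleted for v in observed]

    return {
        "mean_full": statistics.fmean(y),
        "mean_deleted": mean_deleted,
        "mean_imputed": statistics.fmean(imputed),
        "var_full": statistics.pvariance(y),
        "var_imputed": statistics.pvariance(imputed),
    }


if __name__ == "__main__":
    for name, value in simulate_cleaning().items():
        print(f"{name}: {value:.3f}")
```

    Under these assumptions, deletion shifts the estimated mean because the retained rows over-represent units with low x, and mean imputation inherits that shifted mean while also shrinking the variance, since every filled-in value sits exactly at the center.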

    6 min