When Clean Data Is Actually Dirty

About this audio

“Cleaning” data is often treated as a harmless preprocessing step.

Delete rows with missing values.

Fill gaps with the mean.

Move forward.

But cleaning is not neutral.

It is a modeling decision that can change:

  • The estimand
  • The sampling mechanism
  • The bias–variance trade-off

In this episode, we examine the statistical dangers of deletion and simple imputation, and why naïve cleaning can quietly corrupt inference.
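
To make the point concrete, here is a minimal simulation sketch. It is not taken from the episode. It assumes a hypothetical setup in which an outcome `y` depends on a fully observed covariate `x`, and `y` is more likely to be missing when `x` is positive (missing at random given `x`). The function names, sample size, and missingness probabilities are illustrative choices. Under these assumptions, deleting incomplete rows shifts the estimated mean of `y`, and filling gaps with the observed mean shrinks its variance.

```python
import random
import statistics


def simulate(n=10_000, seed=0):
    """Generate (x, y) pairs and a mask saying which y values are observed."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [x + rng.gauss(0, 1) for x in xs]  # y depends on x
    # Assumed missingness mechanism: y is missing with probability 0.8
    # when x > 0 and 0.1 otherwise, so missingness depends only on x.
    observed = [rng.random() > (0.8 if x > 0 else 0.1) for x in xs]
    return xs, ys, observed


def compare(ys, observed):
    """Compare full-data summaries with deletion and mean imputation."""
    kept = [y for y, seen in zip(ys, observed) if seen]
    deleted_mean = statistics.fmean(kept)  # mean after dropping rows
    # Mean imputation: replace every missing y with the observed mean.
    imputed = [y if seen else deleted_mean for y, seen in zip(ys, observed)]
    return {
        "full_mean": statistics.fmean(ys),
        "full_var": statistics.pvariance(ys),
        "deleted_mean": deleted_mean,
        "imputed_var": statistics.pvariance(imputed),
    }


if __name__ == "__main__":
    _, ys, observed = simulate()
    for name, value in compare(ys, observed).items():
        print(f"{name}: {value:.3f}")
```

In this sketch the complete-case mean is pulled toward the low-`x` rows that survive deletion, and the imputed series looks less variable than the real one because every filled-in value sits exactly at the mean.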