Friday, May 23, 2014

Normality is Overrated

People are often overly concerned about having normally distributed data. There are a few cases where normality is important, and many more where the shape of the distribution matters very little. It is usually far more important that the data come from a stable and predictable process (also described as homogeneous, or in control), yet that rarely gets much attention.

Choosing the Normal Distribution to represent your data is an assumption.  Your choice of assumptions does not add information to the data.

If your data are stable and predictable, you can pretty well depend on the following rules holding true, regardless of how the data are distributed:

60-75% of the data will be within plus or minus one standard deviation of the mean.

90-98% of the data will be within plus or minus two standard deviations of the mean.

99-100% of the data will be within plus or minus three standard deviations of the mean.

In view of this, the conventional 68%, 95%, and 99.7% numbers for the Normal Distribution are a modest refinement of the general case.
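
The rules above are easy to check against simulated data. The following is a minimal sketch, not a definitive test. The choice of distributions, the sample size, the seed, and the helper names `coverage` and `simulate` are all assumptions made for illustration. It draws samples from several differently shaped distributions and reports the fraction of each sample that falls within one, two, and three standard deviations of the mean. Strongly skewed or very flat shapes (the exponential and the uniform here) can be expected to land near, or somewhat past, the edges of the quoted ranges, which is worth keeping in mind when reading "regardless of how the data are distributed".

```python
import random
import statistics


def coverage(data, k):
    """Fraction of data within k sample standard deviations of the sample mean."""
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)
    inside = sum(1 for x in data if abs(x - mean) <= k * sd)
    return inside / len(data)


def simulate(n=20000, seed=1):
    """Return {name: (cov_1sd, cov_2sd, cov_3sd)} for a few illustrative shapes."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same numbers
    samples = {
        "normal": [rng.gauss(0, 1) for _ in range(n)],
        "uniform": [rng.uniform(0, 1) for _ in range(n)],
        "triangular": [rng.triangular(0, 1, 0.5) for _ in range(n)],
        "exponential": [rng.expovariate(1.0) for _ in range(n)],
    }
    return {
        name: tuple(coverage(data, k) for k in (1, 2, 3))
        for name, data in samples.items()
    }


if __name__ == "__main__":
    print(f"{'shape':12s} {'1 sd':>7s} {'2 sd':>7s} {'3 sd':>7s}")
    for name, (c1, c2, c3) in simulate().items():
        print(f"{name:12s} {c1:7.1%} {c2:7.1%} {c3:7.1%}")
```

Note that simulated draws like these are stable and predictable by construction. With real data, that property has to be established first, which is the point of the post.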
