Friday, September 27, 2019

The Cauchy Distribution

In an appendix of Intermediate Physics for Medicine and Biology, Russ Hobbie and I analyze the Gaussian probability distribution
$$ p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[ -\frac{(x-\bar{x})^2}{2\sigma^2} \right] . $$
It has the classic bell shape, centered at the mean x̄ with a width determined by the standard deviation σ.

Other distributions have a similar shape. One example is the Cauchy distribution
$$ p(x) = \frac{1}{\pi} \frac{\gamma}{(x-\bar{x})^2 + \gamma^2} , $$
where the distribution is centered at x̄ and has a half-width at half-maximum γ. I initially thought the Cauchy distribution would be as well behaved as any other probability distribution, but it’s not. It has no mean and no standard deviation!
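To see why, here is a short worked calculation, assuming a Cauchy distribution centered at zero with half-width at half-maximum γ. The cutoff L is my own device for illustration: integrate only from −L to L and then ask what happens as L grows.

```latex
% Cauchy density centered at zero: p(x) = (1/\pi) \gamma / (x^2 + \gamma^2)

% First absolute moment, cut off at |x| = L: grows like ln L
\int_{-L}^{L} |x| \, p(x) \, dx
  = \frac{2\gamma}{\pi} \int_{0}^{L} \frac{x}{x^2 + \gamma^2} \, dx
  = \frac{\gamma}{\pi} \ln\!\left( 1 + \frac{L^2}{\gamma^2} \right)
  \;\longrightarrow\; \infty \quad (L \to \infty)

% Second moment, cut off at |x| = L: grows like L
\int_{-L}^{L} x^2 \, p(x) \, dx
  = \frac{2\gamma}{\pi} \left[ L - \gamma \arctan\!\left( \frac{L}{\gamma} \right) \right]
  \;\longrightarrow\; \infty \quad (L \to \infty)
```

The tails fall off only as 1/x², so the integral for the mean diverges logarithmically and the integral for the variance diverges linearly. Neither settles down to a finite value.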

Rather than thinking abstractly about this issue, I prefer to calculate and watch how things fall apart. So, I wrote a simple computer program to generate N random samples using either the Gaussian or the Cauchy distribution. Below is a histogram for each case (N = 1000; Gaussian, x̄ = 0, σ = 1; Cauchy, x̄ = 0, γ = 1).
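For readers who want to try this themselves, here is a minimal sketch of such a program in Python with NumPy. It is not the program I used for the figures; the function name draw_samples, its parameters, and the seed are assumptions made for illustration, so your particular samples will differ from mine.

```python
import numpy as np


def draw_samples(n, center=0.0, sigma=1.0, gamma=1.0, seed=0):
    """Return (gaussian, cauchy): two arrays of n random samples each.

    Hypothetical helper: center, sigma, and gamma play the roles of the
    center, standard deviation, and half-width at half-maximum.
    """
    rng = np.random.default_rng(seed)
    # Shift and scale the standard (center 0, width 1) variates.
    gaussian = center + sigma * rng.standard_normal(n)
    cauchy = center + gamma * rng.standard_cauchy(n)
    return gaussian, cauchy


gaussian, cauchy = draw_samples(1000)
print("largest |Gaussian| sample:", np.abs(gaussian).max())
print("largest |Cauchy| sample:  ", np.abs(cauchy).max())
print("Cauchy samples with |x| > 100:", int((np.abs(cauchy) > 100).sum()))
```

Passing the samples to a histogram routine restricted to the range −20 to 20 should give plots similar to the ones shown here.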

Histograms for 1000 random samples obtained using the Cauchy (left) and Gaussian (right) probability distributions.

Those samples out on the wings of the Cauchy distribution are what screw things up. The probability falls off so slowly that there is a significant chance of having a random sample that is huge. The histograms shown above are plotted from −20 to 20, but one of the thousand Cauchy samples was about −2400. I’d need to plot the histogram over a range more than one hundred times wider to capture that bin in the histogram. Seven of the samples had a magnitude over one hundred. By contrast, the largest sample from the Gaussian was about 4.6.
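You can check that these counts are about what you should expect. For a Cauchy distribution centered at zero, the probability that a sample has a magnitude larger than a is 1 − (2/π) arctan(a/γ). The sketch below (the function names are mine, chosen for illustration) multiplies that tail probability by N = 1000 and compares it with the Gaussian tail.

```python
import math


def cauchy_tail(a, gamma=1.0):
    """Probability that |x| > a for a Cauchy distribution centered at zero."""
    return 1.0 - (2.0 / math.pi) * math.atan(a / gamma)


def gaussian_tail(a, sigma=1.0):
    """Probability that |x| > a for a Gaussian distribution centered at zero."""
    return math.erfc(a / (sigma * math.sqrt(2.0)))


n = 1000
print("expected Cauchy samples with |x| > 100:  ", n * cauchy_tail(100.0))
print("expected Cauchy samples with |x| > 2400: ", n * cauchy_tail(2400.0))
print("expected Gaussian samples with |x| > 100:", n * gaussian_tail(100.0))
```

Out of a thousand Cauchy samples, roughly six should have a magnitude over one hundred, so finding seven is not a fluke. For the Gaussian, the corresponding number is zero for all practical purposes.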

What do these few giant samples do to the mean? The average of the thousand samples shown above obtained from the Cauchy distribution is −1.28, whose magnitude is bigger than the half-width at half-max. The average of the thousand samples obtained from the Gaussian distribution is −0.021, whose magnitude is much smaller than the standard deviation.

Even more interesting is how the mean varies with N. I tried a bunch of cases, summarized in the figure below.
A plot of the mean versus sample size, for data drawn from the Gaussian and Cauchy probability distributions.

There’s a lot of scatter, but the magnitudes of the means for the Gaussian data appear to get smaller (closer to the expected value of zero) as N gets larger. The red line is not a fit, but merely drawn by eye. I included it to show how the means fall off with N. On the log-log plot it has a slope of −½, implying that the means decay roughly as 1/√N. In contrast, the means for the Cauchy data are large (on the order of one) and don’t fall off with N. No matter how many samples you collect, your mean doesn’t settle down to the center of the distribution at zero. Some oddball sample comes along and skews the average.
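Here is a sketch of how one might repeat this experiment, again assuming Python with NumPy; the function name abs_means, the list of sample sizes, and the seed are my own choices for illustration. The Gaussian column should track σ/√N, the standard error of the mean. The Cauchy column should not shrink, because the average of N independent Cauchy samples is itself Cauchy distributed with the same half-width γ.

```python
import numpy as np


def abs_means(sizes, seed=0):
    """Return (gaussian, cauchy): lists of |sample mean| for each size in sizes."""
    rng = np.random.default_rng(seed)
    gaussian = [abs(rng.standard_normal(n).mean()) for n in sizes]
    cauchy = [abs(rng.standard_cauchy(n).mean()) for n in sizes]
    return gaussian, cauchy


sizes = [10**k for k in range(1, 7)]  # N = 10, 100, ..., 1,000,000
gauss, cauchy = abs_means(sizes)
for n, gm, cm in zip(sizes, gauss, cauchy):
    print(f"N = {n:>8d}   |Gaussian mean| = {gm:.5f}   "
          f"1/sqrt(N) = {n ** -0.5:.5f}   |Cauchy mean| = {cm:.3f}")
```

Changing the seed changes the individual numbers, so expect scatter like that in the figure rather than a smooth curve.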

If you calculate the standard deviations for these cases, the problem is even worse. For data generated using the Cauchy distribution, the standard deviation grows with N. For N over a million, the standard deviation is usually over a thousand (remember, the half-width at half-max is one), and for my N = 5,000,000 case the standard deviation was over 600,000. Oddballs dominate the standard deviation.
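The same kind of sketch works for the standard deviation (the function name sample_stds and the seed are again assumptions for illustration). For the Gaussian data the sample standard deviation should hover near one. For the Cauchy data it tends to climb as N grows, because the largest sample you are likely to see grows roughly in proportion to N.

```python
import numpy as np


def sample_stds(sizes, seed=0):
    """Return a list of (N, Gaussian std, Cauchy std) tuples, one per size."""
    rng = np.random.default_rng(seed)
    results = []
    for n in sizes:
        gaussian_std = rng.standard_normal(n).std()
        cauchy_std = rng.standard_cauchy(n).std()
        results.append((n, gaussian_std, cauchy_std))
    return results


for n, gs, cs in sample_stds([10**k for k in range(2, 7)]):
    print(f"N = {n:>8d}   Gaussian std = {gs:.4f}   Cauchy std = {cs:.1f}")
```

The Cauchy values jump around wildly from one seed to the next, which is itself a symptom of the problem.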

I’m sorry if my seat-of-the-pants experimental approach to analyzing the Cauchy distribution seems simplistic, but for me it provides insight. The Cauchy distribution is weird, and I’m glad Russ and I didn’t include an appendix about it in Intermediate Physics for Medicine and Biology.
