Significance and Effect Size: The Bonferroni Correction


A test of significance is used to determine whether an effect is likely to have occurred by chance. There has been much criticism of using p<.05 as a “cliff,” over which an effect is magically deemed to be real, and short of which it is not. Using this criterion, when no effect exists in the larger population, there is still a 1 in 20 chance of finding one in your sample. So, if you run 100 tests and none of the effects is real, you would expect about 5 to come out as “significant” by chance.
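The arithmetic behind the “100 tests” example can be sketched in a few lines of Python. This is a minimal illustration that assumes every test is independent and every null hypothesis is true; real correlations computed on one data set are usually not independent, so treat the numbers as a rough guide.

```python
# Chance findings across m tests, assuming the tests are independent
# and no real effect exists for any of them (every null is true).

def expected_false_positives(m: int, alpha: float = 0.05) -> float:
    """Expected number of tests that come out 'significant' by chance."""
    return m * alpha

def prob_at_least_one(m: int, alpha: float = 0.05) -> float:
    """Probability that at least one of the m tests is a false positive."""
    # Each test independently avoids a false positive with probability 1 - alpha.
    return 1 - (1 - alpha) ** m

print(expected_false_positives(100))     # 5.0
print(round(prob_at_least_one(100), 3))  # 0.994
```

Under these assumptions, running 100 tests makes at least one chance “finding” close to certain, which is the problem the corrections below try to address.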

The Bonferroni correction is a simple, but very conservative, way of accounting for this problem and avoiding conclusions based on chance findings. If you run 100 correlations, you divide the significance cutoff by 100. So, you would only consider a result to be significant if p<.05/100, or p<.0005. The problem is that setting the cutoff so low sharply reduces statistical power: it greatly increases the chance of a Type II error, that is, failing to detect a relationship that really holds in the larger population. Type II errors are common even at p<.05 when a study has modest power, but we rarely pay much attention to them. They become important in medical research, where failure to recognize a carcinogen, for example, can have fatal consequences.
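Here is a minimal Python sketch of the Bonferroni rule, together with a rough look at how the stricter cutoff costs power. The p values are hypothetical, and the power calculation assumes a simple two-sided z test with a hypothetical true effect of 3 standard errors; it is an illustration of the trade-off, not a power analysis for any particular design.

```python
from statistics import NormalDist

def bonferroni(pvals, alpha=0.05):
    """Flag each p value as significant if it falls below alpha / m."""
    cutoff = alpha / len(pvals)
    return [p < cutoff for p in pvals]

def two_sided_z_power(effect, alpha):
    """Power of a two-sided z test when the true effect is `effect` standard errors."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    crit = std_normal.inv_cdf(1 - alpha / 2)
    # Probability that the test statistic lands beyond either critical value.
    return (1 - std_normal.cdf(crit - effect)) + std_normal.cdf(-crit - effect)

# Hypothetical p values: three small ones among 100 tests.
pvals = [0.0001, 0.003, 0.04] + [0.5] * 97
flagged = [i for i, sig in enumerate(bonferroni(pvals)) if sig]
print(flagged)  # [0]

# Hypothetical true effect of 3 standard errors, tested at each cutoff.
print(round(two_sided_z_power(3.0, 0.05), 2))
print(round(two_sided_z_power(3.0, 0.0005), 2))
```

Under these assumptions, power drops from about 0.85 at p<.05 to about 0.3 at p<.0005, so the same real effect is missed far more often once the Bonferroni cutoff is applied.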

Benjamini-Hochberg Procedure

A less conservative alternative to the Bonferroni correction is the Benjamini-Hochberg procedure. To use it, you first decide on Q, your tolerance for false discoveries: the proportion of your “significant” findings that you are willing to have turn out not to hold in the larger population (the false discovery rate). You then rank the findings from lowest to highest p value. The critical value for the finding at rank i is (i/m)Q, where m is the total number of tests. So, for 100 correlation coefficients with Q=.05, the finding with the lowest p value has to be less than (1/100)(.05)=.0005 to be considered significant, the same as the Bonferroni cutoff. However, with this procedure, the critical value becomes less stringent as the rank increases. The second lowest p value of the 100 only has to be less than .001, and the tenth less than .005, and the critical value keeps rising as you move down the ranking until, at i=100, it is (100/100)(.05)=.05. Strictly, you find the largest rank whose p value is at or below its critical value, and treat that finding and every finding ranked above it as significant.
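The ranking procedure can be sketched in Python as follows. This is a minimal, hand-rolled version of the standard Benjamini-Hochberg step-up rule, written for illustration; the p values in the usage lines are hypothetical.

```python
def bh_critical_values(m, q=0.05):
    """Critical value (i/m)Q for each rank i = 1..m."""
    return [i / m * q for i in range(1, m + 1)]

def benjamini_hochberg(pvals, q=0.05):
    """Return a list of booleans: True where the finding is significant."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    crit = bh_critical_values(m, q)
    # Find the largest rank k whose p value is at or below its critical value.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= crit[rank - 1]:
            k = rank
    # Every finding ranked 1..k is declared significant.
    significant = [False] * m
    for idx in order[:k]:
        significant[idx] = True
    return significant

# Critical values at ranks 1, 2, 10 and 100 when m = 100 and Q = .05.
crit = bh_critical_values(100)
print([round(crit[i - 1], 4) for i in (1, 2, 10, 100)])  # [0.0005, 0.001, 0.005, 0.05]

# Hypothetical p values: Bonferroni (cutoff .0005) would keep only the first.
pvals = [0.0004, 0.0009, 0.02] + [0.5] * 97
bh_flagged = [i for i, sig in enumerate(benjamini_hochberg(pvals)) if sig]
print(bh_flagged)  # [0, 1]
```

In this example the second finding (p=.0009) fails the Bonferroni cutoff but passes its rank-2 critical value of .001, which is where the procedure's extra power comes from.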

Practical Considerations

It seems clear that the Bonferroni correction can put an undue burden on the researcher when a large number of tests are being performed. That problem was discussed further in a previous post on Power Analysis. The remaining question for this discussion is “At what point should one of these correction procedures be used?” Strictly speaking, the p<.05 criterion is only valid when a single significance test is performed on a data set. A 2-way ANOVA tests three effects (two main effects and their interaction), so the Bonferroni cutoff for each effect would be p<.05/3, or about p<.017; with Q=.05, the Benjamini-Hochberg cutoff for the effect with the lowest p value would also be about p<.017. Most published articles report at least a dozen tests of significance, which puts both the Bonferroni cutoff and the Benjamini-Hochberg cutoff for the lowest p value at about p<.004 (.05/12). But, in 30 years as editor of a peer-reviewed journal, I never saw anyone apply either of these corrections unless they were running at least 25 significance tests on a single data set.
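The cutoffs quoted above can be checked with a short Python sketch. It only does the arithmetic: m=3 stands for the three effects of a 2-way ANOVA and m=12 for a dozen tests, with alpha and Q both set to .05.

```python
def bonferroni_cutoff(m, alpha=0.05):
    """Per-test cutoff when m tests share an overall alpha."""
    return alpha / m

def bh_first_rank_cutoff(m, q=0.05):
    """Benjamini-Hochberg critical value (i/m)Q for the lowest p value (i = 1)."""
    return (1 / m) * q

# One test, a 2-way ANOVA (three effects), and a dozen tests.
for m in (1, 3, 12):
    print(m, round(bonferroni_cutoff(m), 4), round(bh_first_rank_cutoff(m), 4))
```

For the lowest-ranked p value the two procedures always agree; they only diverge from rank 2 onward, which is why the difference matters little when just a handful of tests are run.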

Ultimately, the cutoff point for significance is always a subjective decision. The purpose of this post is to offer some alternatives for reaching a reasonable conclusion about whether to accept a finding as reflective of reality. A reasonable approach is probably to use one of these correction procedures in exploratory studies to screen possible findings, and then to run follow-up studies on new data sets that test only the findings that survived the initial screen, which reduces the number of tests performed in those later studies.
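A minimal sketch of that two-stage workflow in Python is shown below. It assumes the exploratory screen uses Benjamini-Hochberg with Q=.05 and the follow-up study uses a Bonferroni correction over only the findings carried forward; the exploratory p values are hypothetical, and the choice of procedure at each stage is one reasonable option rather than a fixed rule.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Return True for each finding that passes the Benjamini-Hochberg step-up rule."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * q:
            k = rank  # largest rank meeting its critical value so far
    significant = [False] * m
    for idx in order[:k]:
        significant[idx] = True
    return significant

# Stage 1: screen 100 hypothetical exploratory p values.
exploratory = [0.0004, 0.0009, 0.02] + [0.5] * 97
survivors = [i for i, sig in enumerate(benjamini_hochberg(exploratory)) if sig]

# Stage 2: only the survivors are retested on a new data set, so the
# Bonferroni cutoff is based on len(survivors) rather than 100 tests.
confirm_cutoff = 0.05 / len(survivors)
print(survivors, confirm_cutoff)  # [0, 1] 0.025
```

The payoff is in stage 2: with two findings carried forward, each only has to reach p<.025 on the new data instead of p<.0005.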

Dr. Richard Pollard