# Significance and Effect Size: The Bonferroni Correction

A test of significance is used to determine whether an effect is likely to have occurred by chance. There has been much criticism of using p<.05 as a “cliff,” over which an effect is magically deemed to be real, and before which it is not. Using this criterion, when no effect exists in the larger population, there is still a 1 in 20 chance of finding one in your sample. So, if you run 100 tests on effects that do not really exist, you would expect 5 to come out as “significant” by chance alone.
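The arithmetic behind “5 out of 100” can be checked with a short Python sketch. It assumes 100 independent tests in which no true effect exists. Under that assumption a valid test's p value is uniformly distributed between 0 and 1, so each test has a 5% chance of falling below .05. The sketch also computes the chance of getting at least one false positive, which depends on the independence assumption. The variable and function names are illustrative only.

```python
import random

ALPHA = 0.05  # conventional significance cutoff
M = 100       # number of tests, all assumed to have no true effect

# Expected number of "significant" results when every null hypothesis is true.
expected_false_positives = M * ALPHA

# Chance of at least one false positive, assuming the tests are independent.
prob_at_least_one = 1 - (1 - ALPHA) ** M


def simulate_false_positives(m, alpha, rng):
    """Count how many of m null tests give a p value below alpha.

    Under a true null hypothesis, a valid test's p value is uniform on [0, 1],
    so each simulated p value is just a uniform random number.
    """
    return sum(rng.random() < alpha for _ in range(m))


rng = random.Random(1)  # fixed seed so the simulation is repeatable
counts = [simulate_false_positives(M, ALPHA, rng) for _ in range(10_000)]
average = sum(counts) / len(counts)

print(f"expected false positives: {expected_false_positives:.1f}")
print(f"chance of at least one:   {prob_at_least_one:.3f}")
print(f"simulated average:        {average:.2f}")
```

With many tests, at least one chance “finding” becomes almost certain, which is the problem the corrections below try to address.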

The Bonferroni correction is a simple, but very conservative, way of accounting for this problem and avoiding conclusions based on chance findings. If you run 100 correlations, you divide the significance cutoff (not the p value itself) by 100. So, you would only consider a result to be significant if p<.05/100, or p<.0005. The problem is that setting the cutoff so low sharply reduces statistical power. This raises the risk of a type II error: dismissing a finding that really holds in the larger population. The size of that risk is not simply 1 minus the cutoff. It depends on the size of the effect and of the sample as well as on the cutoff. But for any given effect and sample, a stricter cutoff means more real findings are missed. Even at p<.05, type II errors can be common, yet we rarely pay much attention to them. They become important in medical research, where failure to recognize a carcinogen, for example, can have fatal consequences.
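The following Python sketch illustrates both halves of this trade-off. It shows the Bonferroni cutoff, and it shows how the chance of detecting a real effect falls when the cutoff drops from .05 to .0005. The power calculation is a standard Fisher z approximation for a correlation, introduced here only for illustration; it is not a method from this post. The true correlation of .30 and the sample of 100 cases are hypothetical values.

```python
import math
from statistics import NormalDist


def bonferroni_cutoff(alpha: float, m: int) -> float:
    """Per-test cutoff: divide the overall significance level by the number of tests."""
    return alpha / m


def approx_correlation_power(r: float, n: int, alpha: float) -> float:
    """Approximate power of a two-sided test of a correlation (Fisher z approximation).

    Assumes atanh(sample r) is roughly normal with mean atanh(r) and standard
    deviation 1 / sqrt(n - 3). Only the tail on the same side as the true
    effect is counted, which is usually a negligible omission.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)    # two-sided critical value
    shift = math.atanh(abs(r)) * math.sqrt(n - 3)  # expected test statistic given r
    return 1 - std_normal.cdf(z_crit - shift)


# Hypothetical example: a true correlation of .30 measured on 100 cases.
R, N = 0.30, 100
power_uncorrected = approx_correlation_power(R, N, 0.05)
power_bonferroni = approx_correlation_power(R, N, bonferroni_cutoff(0.05, 100))

print(f"Bonferroni cutoff for 100 tests: {bonferroni_cutoff(0.05, 100):.4f}")
print(f"Power at p<.05:   {power_uncorrected:.2f} (type II rate {1 - power_uncorrected:.2f})")
print(f"Power at p<.0005: {power_bonferroni:.2f} (type II rate {1 - power_bonferroni:.2f})")
```

Changing `R` or `N` shows that the type II rate depends on the effect and the sample size, not only on the cutoff.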

## Benjamini-Hochberg Procedure

A less conservative alternative to the Bonferroni correction is the Benjamini-Hochberg procedure. To use it, you need to decide how much tolerance you have for false discoveries, that is, for the proportion of your accepted findings that reflect effects that do not exist in the larger population. In this procedure, you rank the findings from lowest to highest p value. The formula for the cutoff is (i/m)Q, where i is the rank, m is the total number of tests, and Q is your tolerance for accepting findings that are not borne out in the larger population. So, for 100 correlation coefficients, with a 5% tolerance, the finding with the lowest p value has to be below (1/100)(.05) = .0005 (the same as the Bonferroni cutoff) to be considered significant. However, with this procedure, the cutoff becomes less stringent as the rank increases. So the finding with the second lowest p value of the 100 only has to be significant at p<.001, and the tenth at p<.005. The cutoff continues to rise as you move down the ranking until, at i=100, it is (100/100)(.05) = .05. Finally, you find the largest rank i at which the p value is at or below (i/m)Q; that finding and every finding with a lower p value are considered significant, even if some of them missed their own cutoffs.
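Here is a minimal Python sketch of the procedure, assuming the standard step-up form: rank the p values, compare each to (i/m)Q, find the largest rank i whose p value is at or below its cutoff, and accept that finding along with every finding ranked ahead of it. The function names and the example p values are hypothetical, chosen only to illustrate the calculation.

```python
from typing import List, Sequence


def bh_cutoff(rank: int, m: int, q: float = 0.05) -> float:
    """Cutoff (i/m)Q for the finding ranked i out of m tests."""
    return rank / m * q


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[bool]:
    """Flag which p values are significant under the Benjamini-Hochberg procedure."""
    m = len(p_values)
    # Indices of the findings ordered from lowest to highest p value (rank 1 first).
    order = sorted(range(m), key=lambda idx: p_values[idx])
    # Largest rank whose p value is at or below its own cutoff (0 if none).
    largest_passing_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= bh_cutoff(rank, m, q):
            largest_passing_rank = rank
    # That finding and every finding ranked ahead of it are declared significant.
    significant = [False] * m
    for idx in order[:largest_passing_rank]:
        significant[idx] = True
    return significant


# Cutoffs quoted in the text for 100 tests with Q = .05.
for rank in (1, 2, 10, 100):
    print(rank, round(bh_cutoff(rank, 100), 4))

# Hypothetical p values from four tests, listed in no particular order.
example = [0.2, 0.0004, 0.03, 0.0009]
print(benjamini_hochberg(example))
```

With these hypothetical values, the Bonferroni cutoff of .05/4 = .0125 would accept only two findings, while this sketch accepts three, because .03 falls below its rank-3 cutoff of (3/4)(.05) = .0375.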

## Practical Considerations

It seems clear that the Bonferroni correction can put an undue burden on the researcher when a large number of tests are being performed. That problem was discussed further in a previous post on Power Analysis. The remaining question for this discussion is “At what point should one of these correction procedures be used?” Strictly speaking, the p<.05 criterion is valid only when a single significance test is performed on a data set. A 2-way ANOVA tests three effects (two main effects and their interaction), so the Bonferroni cutoff for each of the three effects would be p<.05/3, or about p<.017. Using the Benjamini-Hochberg procedure with Q=.05, the cutoff for the effect with the lowest p value would also be p<.017. Most published articles report at least a dozen tests of significance, which puts both the Bonferroni cutoff and the Benjamini-Hochberg cutoff for the effect with the lowest p value at about p<.004 (.05/12). But, in 30 years as editor of a peer-reviewed journal, I never saw anyone apply either of these corrections unless they were running at least 25 significance tests on a single data set.
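The cutoffs quoted for the ANOVA and the typical published article can be checked with a few lines of Python. This sketch uses hypothetical function names and computes only two numbers per scenario: the Bonferroni cutoff and the Benjamini-Hochberg cutoff for the finding with the lowest p value. These always coincide, because (1/m)Q equals Q/m.

```python
def bonferroni_cutoff(alpha: float, m: int) -> float:
    """Per-test Bonferroni cutoff: alpha divided by the number of tests."""
    return alpha / m


def bh_lowest_rank_cutoff(q: float, m: int) -> float:
    """Benjamini-Hochberg cutoff (i/m)Q for the finding ranked i = 1."""
    return (1 / m) * q


ALPHA = Q = 0.05
scenarios = {"2-way ANOVA (3 effects)": 3, "a dozen tests": 12, "100 correlations": 100}

cutoffs = {}
for label, m in scenarios.items():
    cutoffs[label] = (bonferroni_cutoff(ALPHA, m), bh_lowest_rank_cutoff(Q, m))
    bonf, bh = cutoffs[label]
    print(f"{label}: Bonferroni p<{bonf:.4f}, Benjamini-Hochberg (rank 1) p<{bh:.4f}")
```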

Ultimately, the cutoff point for a significance level is always a subjective decision. The purpose of this posting is to offer some alternatives for reaching a reasonable conclusion about whether to accept a finding as reflective of reality. A reasonable approach is probably to use one of these correction procedures in exploratory studies to screen for possible findings. Then run follow-up studies on new data sets that test only the findings from the initial study, which reduces the number of tests performed in those later studies.

By Dr. Richard Pollard