Significance and Effect Size: Power Analysis

By Dr. Richard Pollard

In the early 1970s, Robert Rosenthal published a meta-analysis examining the effects of psychotherapy.  About half of the studies found a significant improvement in mental health after short-term psychotherapy and half did not.  Looking more closely at the studies, he realized that the major difference between those that found a significant improvement and those that did not was sample size.  If the sample size was small, no significant improvement was found; if it was large, the improvement was significant.  The cynical interpretation is that those who believed in psychotherapy evaluated a large number of patients and those who wanted to debunk its value evaluated a small number.  The overall conclusion of the meta-analysis was that psychotherapy has a small positive effect on patients’ mental health.
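
To see how sample size alone can decide significance, here is a minimal sketch in Python.  It holds a small correlation fixed and only changes the number of subjects.  The function name two_tailed_p, the value r = .2, and the use of the Fisher z approximation are my own choices for illustration; they are not taken from the meta-analysis.

```python
from math import atanh, sqrt
from statistics import NormalDist


def two_tailed_p(r: float, n: int) -> float:
    """Approximate two-tailed p value for a correlation r observed in n subjects.

    Assumption: uses the Fisher z approximation, under which
    atanh(r) * sqrt(n - 3) is roughly standard normal when the true
    correlation is zero.
    """
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same small effect (r = .2), different sample sizes.
for n in (20, 50, 100, 400):
    p = two_tailed_p(0.2, n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n = {n:3d}  p = {p:.4f}  {verdict}")
```

The effect is identical in every row; only the sample size moves the p value across the .05 line.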

In the time since the publication of Rosenthal’s meta-analysis, power analysis has become a standard tool for social scientists.  By making a few assumptions about the effect you are looking for and plugging them into a simple program, you can find out how many subjects you need to include in your study in order to give you a good chance of finding a significant result.  

Free Power Analysis Software

A number of good power analysis tools are available online for free.  G*Power is one that covers a fairly complete list of statistical procedures and can be downloaded for PC or Mac.  Its tests are listed under the following general families: exact (includes correlation and multiple regression), F tests, t tests, chi-square, and z tests.  You pick the test family from a drop-down menu and then the specific statistical test from a second drop-down.  Power analysis is generally used before data collection to determine the minimum sample size necessary, but G*Power also has an option to look at your sample after the fact to determine whether you gave yourself a fair chance of finding a significant result.
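
The after-the-fact check can be sketched in a few lines of Python.  This is only a rough approximation for a correlation, based on the Fisher z transformation; it is not G*Power's own routine, which may use exact methods and give slightly different numbers.  The function name achieved_power and the example values are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def achieved_power(r: float, n: int, alpha: float = 0.05, tails: int = 2) -> float:
    """Approximate power to detect a population correlation r with n subjects.

    Assumption: Fisher z approximation. The small chance of a significant
    result in the wrong direction is ignored.
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / tails)   # critical value for the test
    shift = abs(atanh(r)) * sqrt(n - 3)        # expected z statistic if r is real
    return 1 - norm.cdf(z_crit - shift)


# A study of 40 subjects looking for r = .3 has less than an even chance.
print(round(achieved_power(0.3, 40), 2))
# With 85 subjects the chance rises to roughly 80%.
print(round(achieved_power(0.3, 85), 2))
```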

Required Parameters for Power Analysis

The first question you have to ask yourself is whether you are going to perform a one- or two-tailed test of significance.  In the past, I have argued for always performing two-tailed tests because you don’t want to throw out what would have been a highly significant result just because it came out in the opposite direction to what you predicted.  But there are exceptions to every rule.  If the opposite result would completely defy logic, a one-tailed test may be justified.  A two-tailed test will, of course, require more subjects than a one-tailed test, all other factors being equal.

Second, you need to specify an effect size.  The most common effect size is r (for correlation), or some parameter that is equivalent to r, such as Phi (for chi-square).  These parameters (r and Phi) run from -1 to 1 and, as a general guide, .3 or -.3 would be considered a weak effect, .5 or -.5 a moderate effect, and .7 or -.7 a strong effect.  Different disciplines, however, have different standards for weak, moderate, and strong effects.  This is the most difficult parameter to set in a power analysis.  In general, it is best to be conservative in picking your anticipated effect size.  If you set it too high, your sample will be too small and you may miss a real effect that is just smaller than you anticipated.

The third parameter is the alpha error level, the p value threshold for significance.  In the social sciences, this is generally .05 (p<.05).  But see previous blogs: “Significance and Effect Size: An Introduction,” “The p<.05 Cliff,” and “The Bonferroni Correction” for discussions of factors affecting the appropriate choice of the alpha error.

Finally, you need to specify your chosen power level.  This is the likelihood of finding a significant effect in your sample if an effect of the size you specified exists in the larger population.  Typically, you pick a power level between .80 and .95, indicating that you want an 80% to 95% chance of finding a significant effect in your sample if the effect exists in the larger population.  The choice of this parameter depends on how important it is for you not to miss the effect if it really exists.  For a dissertation, I would set the power to .95.
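
To show how the four parameters combine, here is a minimal sketch in Python for the case of a correlation.  It assumes the Fisher z approximation, so the numbers may differ by a subject or two from what an exact routine such as G*Power reports.  The function name required_n and its defaults are my own illustration.

```python
from math import atanh, ceil
from statistics import NormalDist


def required_n(r: float, alpha: float = 0.05, power: float = 0.80, tails: int = 2) -> int:
    """Approximate minimum sample size to detect a correlation of size r.

    Assumption: Fisher z approximation,
    n = ((z_alpha + z_power) / atanh(r))^2 + 3, rounded up.
    """
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / tails)  # critical value for the chosen alpha and tails
    z_power = norm.inv_cdf(power)              # quantile for the chosen power
    return ceil(((z_alpha + z_power) / atanh(abs(r))) ** 2 + 3)


print(required_n(0.3))                # two-tailed, alpha .05, power .80 -> 85
print(required_n(0.3, tails=1))       # one-tailed needs fewer subjects  -> 68
print(required_n(0.3, power=0.95))    # higher power needs more subjects -> 139
print(required_n(0.5))                # a larger effect needs fewer subjects
```

Changing one argument at a time makes the trade-offs visible: the one-tailed test and the larger effect size lower the required sample, while the higher power level raises it.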

Deciding on the Final Sample Size

These free power analysis tools are quick and easy to use.  It’s worth playing with several combinations of the parameters before deciding on your final sample size.  And it’s important to remember that, if you perform a statistical analysis on a small sample and get what looks like a real difference but is not significant, you can’t legitimately add observations and run the analysis again.  Testing repeatedly as the sample grows raises the chance of a false positive above your chosen alpha.  You must start over with a larger sample.
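
The cost of adding observations and testing again can be illustrated with a small simulation.  This sketch in Python assumes there is no real effect at all, uses a simple z test with a known standard deviation of 1, and allows one extra look at the data.  The function name, sample sizes, and seed are hypothetical choices of mine.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def false_positive_rate(trials: int = 20000, n_first: int = 20, n_added: int = 20,
                        alpha: float = 0.05, seed: int = 1) -> float:
    """Share of null-effect studies declared significant when a second look is allowed."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    hits = 0
    for _ in range(trials):
        # The true mean is 0, so any significant result is a false positive.
        data = [rng.gauss(0, 1) for _ in range(n_first)]
        z = fmean(data) * sqrt(len(data))
        if abs(z) < z_crit:
            # Not significant: add more subjects and test the pooled sample again.
            data += [rng.gauss(0, 1) for _ in range(n_added)]
            z = fmean(data) * sqrt(len(data))
        if abs(z) >= z_crit:
            hits += 1
    return hits / trials


print(false_positive_rate())
```

With a single planned test the false positive rate would sit near .05.  Allowing the second look pushes it noticeably higher, even though each individual test still uses p<.05.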

Latest posts by Dr. Richard Pollard (see all)