Usability Techniques
How Many Subjects Do I Need for a Statistically Valid Survey?

by Daryle Gardner-Bonneau, Ph.D.
Office of Research
Michigan State University/Kalamazoo Center for Medical Studies
Reprinted from Usability Interface, Vol. 5, No. 1, July 1998

Beware of people who give quick, pat answers to the question, "I’m doing a survey. How many subjects do I need?" They probably haven’t a clue what they’re talking about.

There aren’t any valid quick answers to this question. I work in the medical domain and regularly advise faculty, residents, and medical students on sample size determination for survey research studies because, in medicine, survey results are often discounted and are not publishable unless you can justify the decision you made regarding sample size. We do this through power analysis, and except for the simplest power analyses, it's good to have the advice and assistance of a statistician.

That said, I can tell you how we generally approach the problem for surveys and what information a statistician needs to do a power analysis to determine sample size.

Usually, surveys involve a number of hypotheses. You could do a power analysis and get a sample size estimate for every hypothesis, but I usually ask folks to give me the two or three most important survey questions or, more specifically, the hypotheses they want to explore. We do power analyses for those, get a sample size estimate for each one, and from there make a decision as to the sample size for the survey as a whole.

Here's an example to give you some idea of what your statistician needs to know to determine the sample size for a survey. Let's say you're looking for a difference in patient satisfaction between two departments in a hospital - obstetrics and cardiology - and in your survey patients are asked to rate their satisfaction on a scale from 1 to 100. To determine how many patients to sample, the statistician needs information/estimates with respect to the following questions:

1. What do you consider an "important" difference in satisfaction ratings that you'd like to be able to detect between the two departments (e.g., 10 points? 20 points?)?

2. What do you think the variability is in satisfaction ratings?
Note: This might be a tough question to answer, and in the absence of any data you may have to guess. But what you might use, for example, is the standard deviation of ratings in the last survey of patient satisfaction you did, unless there was something more specific available.

3. What is, in your mind, an acceptable probability of an alpha error - an alpha error meaning that you will see a statistically significant difference in the samples, when no difference actually exists in the populations? This is often set by convention at .05.

4. Similarly, what is an acceptable probability for a beta error - that you may NOT find a statistically significant difference between the samples when there actually is a difference in the populations? This is also often set by convention at .20, .15, or .10, the first of these being the most common. (Power is 1 minus beta, so a beta of .20 corresponds to a power of .80.)

If you can answer these four questions, the statistician can then estimate the number of obstetrics and cardiology patients you need to sample. Sometimes, when we're really "iffy" on the answer to a question, we'll run several power analyses, say, with different values for the alpha, beta, and/or the variability estimates just to see how these variables affect the final result (i.e., the sample size estimate). This can be an especially useful exercise when there are tradeoffs that must be considered (e.g., when the cost per survey administered is significant).
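To make the role of the four answers concrete, here is a minimal sketch of one common textbook approximation for comparing two group means. It is not necessarily what your statistician or a dedicated package would use: it assumes a two-sided test, equal group sizes, the same standard deviation in both departments, and a normal approximation. The function name n_per_group and the example values (a 10-point difference, standard deviations of 15 to 25) are hypothetical and chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, beta=0.20):
    """Approximate number of subjects needed in EACH group.

    Assumptions of this sketch: two-sided comparison of two means,
    equal group sizes, equal standard deviations, normal approximation.

    delta -- smallest difference in ratings worth detecting (question 1)
    sd    -- estimated standard deviation of ratings (question 2)
    alpha -- acceptable probability of an alpha error (question 3)
    beta  -- acceptable probability of a beta error (question 4)
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(1 - beta)        # quantile for power = 1 - beta
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


if __name__ == "__main__":
    # Rerun the estimate with different beta and variability guesses
    # to see how much each one moves the required sample size.
    for beta in (0.20, 0.15, 0.10):
        for sd in (15, 20, 25):
            n = n_per_group(delta=10, sd=sd, alpha=0.05, beta=beta)
            print(f"beta={beta:.2f}  sd={sd}  n per group={n}")
```

Because it relies on the normal distribution, this approximation tends to come out a subject or two lower than methods based on the t distribution, so treat the output as a rough guide to how the inputs trade off rather than a final answer.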

One word of caution: The estimate given to you by the statistician is the number of subjects from whom you need valid data. This number is going to be less than the number of people you actually approach with the survey, because some will fail to respond and some may respond inappropriately and their data will not be usable. Referring to the example above, if the statistician tells you that you’ll need 65 cardiology patients and 65 obstetrics patients, and you know, based on past experience, that the non-response rate is 25%, you want to send your survey to at least 87 cardiology patients and 87 obstetrics patients (65 divided by .75, rounded up) in order to expect 65 responses from each group. Hopefully, if your survey is well-designed, all of the responses you receive will be valid...but that’s another issue.
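The adjustment for non-response is simple arithmetic: divide the number of valid responses you need by the expected response rate and round up. A small sketch follows; the function name surveys_to_send is hypothetical, and it assumes every returned survey is usable. With 65 needed and a 25% non-response rate, the minimum works out to 87 per group, and sending a few more than that leaves a margin.

```python
from math import ceil


def surveys_to_send(n_valid, nonresponse_rate):
    """Minimum number of people to approach so that the EXPECTED number
    of responses is at least n_valid.

    Assumes every response received is usable; if some will not be,
    fold that loss into nonresponse_rate.
    """
    if not 0 <= nonresponse_rate < 1:
        raise ValueError("nonresponse_rate must be in the range [0, 1)")
    return ceil(n_valid / (1 - nonresponse_rate))


if __name__ == "__main__":
    # The example from the text: 65 valid responses per group,
    # 25% of those approached do not respond.
    print(surveys_to_send(65, 0.25))
```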

The rationale is pretty much the same for any power analysis, though I've given you a fairly straightforward and simple example. The calculations can get "hairy" once you have more than two comparison groups, for example, but there are computer programs to help with that, and statisticians generally know this area pretty well.

The best source of information about power analysis and sample size estimation is Jacob Cohen’s book, Statistical Power Analysis for the Behavioral Sciences (Erlbaum). First published in 1969, revised, and published again in a second edition in 1988, this book is still considered the "Bible" among those who do power analysis. A short, highly readable, basic treatment of the subject, which may suffice nicely for the simpler power analysis problems, is found in the book, How Many Subjects? by Helena C. Kraemer and Sue Thiemann (Sage Publications, 1987). Finally, for those who feel confident doing their own power analyses without the guidance of a statistician, there is some excellent software available. nQuery Advisor, from Statistical Solutions Ltd., does a power analysis for almost any research design situation. It costs several hundred dollars, but is certainly worth the price for those who must do these analyses quite often. For more information about nQuery Advisor, contact the company’s Boston office at 1-800-262-1171, or visit their web site (http://www.statsolusa.com).
