## How many participants do you need for your survey?

Many of my clients ask me about “statistical significance”. Mostly they want to know: “How many participants do I need in order to get statistical significance?” I usually counter with an annoying question: “How accurately do you need to know the answer?”

A common misconception is that, to achieve “statistical significance”, your sample size must grow in proportion to your population size. That is, if I have 1 million customers, I will need to survey many more people than if I had only 1 thousand customers. This is somewhat true, but the effect is dramatically smaller than most people think.

In fact, there are two primary factors that determine how big your sample should be: the *confidence level* and the *confidence interval* you are shooting for.
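To see where those two factors enter, here is the standard sample size formula that most online calculators use (Cochran's formula). The post doesn't say which formula its numbers come from, so treat this as an assumption. Here $z$ comes from the confidence level (1.96 for 95%), $e$ is the +/- confidence interval written as a fraction (0.05 for +/- 5%), and $p$ is the proportion you expect to answer “yes”, usually set to 0.5 as the worst case when you have no prior guess:

$$
n_0 = \frac{z^2 \, p(1-p)}{e^2} = \frac{1.96^2 \times 0.5 \times 0.5}{0.05^2} \approx 384.16
$$

Notice that the population size does not appear at all. It only enters through a finite population correction, which matters only when the sample is a sizable fraction of the population.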

Let’s say you are doing a survey asking if people like to dunk their cookies into milk before eating them (I know I do – or would, if I still ate sugar. Or dairy. But that’s another story).

In your survey, you find that 70% of the people surveyed say “Yes! I love dunking my cookies!” A common way to describe this would be to say “70% (with 95% confidence, +/- 5%)”. Roughly, this translates into: “The true answer is probably between 65% and 75%. If we repeated this survey lots of times, the range we calculated each time would contain the true answer about 95% of the time.”
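To make the “repeat the survey lots of times” idea concrete, here is a small simulation sketch. It assumes a hypothetical population in which exactly 70% of people dunk, surveys 381 of them at random (sampling with replacement, so no finite population correction), and builds the usual normal-approximation (Wald) 95% interval each time. The function name and its defaults are illustrative, not from any library.

```python
import random


def simulate_coverage(true_p=0.70, n=381, trials=2000, z=1.96, seed=1):
    """Hypothetical helper: repeat a survey `trials` times and return the
    fraction of 95% intervals (p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n))
    that contain the true proportion `true_p`."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    hits = 0
    for _ in range(trials):
        # Each respondent independently says "yes" with probability true_p.
        yes = sum(rng.random() < true_p for _ in range(n))
        p_hat = yes / n
        margin = z * (p_hat * (1 - p_hat) / n) ** 0.5
        if p_hat - margin <= true_p <= p_hat + margin:
            hits += 1
    return hits / trials


print(f"fraction of intervals containing the true 70%: {simulate_coverage():.3f}")
```

The printed fraction should land close to 0.95: any single interval either contains the true value or it doesn't, but the procedure captures it about 95% of the time.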

Obviously, if you wanted to know about the people in a metropolis of 50 *million*, you would need to talk to WAY more people than if you wanted to know about a town of only 50 *thousand*, right?

Wrong.

Let me illustrate with this riveting table:

| Population size | Confidence level | Confidence interval | # of people you need to survey |
|---|---|---|---|
| 50,000 | 95% | ±5% | 381 |
| 500,000 | 95% | ±5% | 384 |
| 5,000,000 | 95% | ±5% | 384 |
| 50,000,000 | 95% | ±5% | 384 |

Whoa! To get the same level of confidence for a population 1,000 times the size of 50 thousand (50 million), you need **only 3 additional people**!
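If you'd rather check these numbers yourself, here is a minimal sketch. It assumes Cochran's formula with a finite population correction, worst-case p = 0.5, and the rounded z-scores (1.96 for 95%) that many online calculators use. Under those assumptions it reproduces both tables in this post. `sample_size` and `Z_SCORES` are hypothetical names, not part of any library.

```python
# Rounded z-scores as many online calculators use them (an assumption;
# exact values differ slightly in later decimal places).
Z_SCORES = {0.90: 1.645, 0.95: 1.96, 0.99: 2.576}


def sample_size(population, confidence=0.95, margin=0.05, p=0.5):
    """Hypothetical helper: respondents needed, rounded to the nearest person.

    margin is the +/- "confidence interval" as a fraction (0.05 means +/- 5%).
    p = 0.5 is the worst case (largest sample) when you have no prior guess.
    """
    z = Z_SCORES[confidence]
    n0 = z**2 * p * (1 - p) / margin**2   # sample size for an infinite population
    n = n0 / (1 + (n0 - 1) / population)  # finite population correction
    return round(n)


# Same confidence and interval, growing population.
for pop in (50_000, 500_000, 5_000_000, 50_000_000):
    print(f"{pop:>12,}  {sample_size(pop)}")

# Same population, different confidence intervals.
for margin in (0.01, 0.05, 0.10, 0.15):
    print(f"+/- {margin:.0%}  {sample_size(50_000, margin=margin)}")
```

As the population grows, `(n0 - 1) / population` shrinks toward zero, so the answer levels off at about 384 no matter how many customers you have. Rounding to the nearest person matches the tables; rounding up with `math.ceil` is the more conservative choice and gives 382 for a population of 50,000.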

It turns out that the biggest variable is that +/- number, the “*confidence interval*” (often called the margin of error). So if you are OK with knowing something to within only +/- 10%, you need far fewer people than if you need to know it to within +/- 5%.

Let’s look at the effect of changing the confidence interval.

| Population size | Confidence level | Confidence interval | # of people you need to survey |
|---|---|---|---|
| 50,000 | 95% | ±1% | 8,057 |
| 50,000 | 95% | ±5% | 381 |
| 50,000 | 95% | ±10% | 96 |
| 50,000 | 95% | ±15% | 43 |

So you can see that the degree of precision you require is by far the dominant factor in determining the sample size. If you want to go from a 5% to a 1% confidence interval, you need roughly 20 TIMES more people (8,057 versus 381).
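Where does that factor of roughly 20 come from? Assuming the standard formula most calculators use, the required sample grows with $1/e^2$, where $e$ is the +/- interval. Cutting the interval by a factor of five therefore multiplies the uncorrected sample size by 25. For a population of 50,000, the finite population correction pulls that down a little, which gives the ratio in the table:

$$
\frac{n_0(\pm 1\%)}{n_0(\pm 5\%)} = \left(\frac{0.05}{0.01}\right)^2 = 25,
\qquad
\frac{8057}{381} \approx 21.1
$$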

I know you’re skeptical. That’s OK, I don’t take it personally. Go play with this sample size calculator and you’ll see what I mean. I’ll wait here…

OK you’re back.

Now that I’ve hopefully convinced you, you have a difficult question to answer next time you are thinking of doing some research.

**How accurately do you need to know the answer to your question?**

You will never know something with 100% certainty. So the question is, what is the lowest level of precision you need in order to make a business decision? How much uncertainty can you live with? Are you willing to spend 20 times as much on your research to go from a confidence interval of 5% to 1%? Something to discuss at your next meeting.
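One way to frame that discussion is to run the question backwards: given how many people you can afford to survey, what +/- do you get? The sketch below inverts the standard sample size formula (Cochran's formula with a finite population correction), which, with z = 1.96 for 95% confidence and worst-case p = 0.5, matches the tables above. `margin_of_error` is a hypothetical helper, not part of any library, and the population of 50,000 is just the example used in the tables.

```python
import math


def margin_of_error(n, population, z=1.96, p=0.5):
    """Hypothetical helper: the +/- margin (as a fraction) from n respondents
    out of `population`, at the confidence level implied by z (1.96 ~ 95%).
    Uses the normal approximation with a finite population correction;
    p = 0.5 is the worst case.
    """
    if population < 2 or not 0 < n <= population:
        raise ValueError("need population >= 2 and 0 < n <= population")
    fpc = (population - n) / (population - 1)  # finite population correction
    return z * math.sqrt(p * (1 - p) / n * fpc)


# What does each research budget buy you, for a population of 50,000?
for n in (43, 96, 381, 8057):
    print(f"{n:>5} respondents -> +/- {margin_of_error(n, 50_000):.1%}")
```

Each sample size from the second table maps back to roughly its own interval (about 15%, 10%, 5% and 1%), so you can start from your budget and see whether the resulting precision is good enough for the decision you need to make.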

More on confidence intervals:

Khan Academy has a good overview:

Wikipedia is of course very thorough, but frankly, if you can understand this entry, you didn’t need to read this post. The section on interpretation and misunderstandings is very good though.