
Do I Really Need a Sample Size of 30?

Through my side-gig of teaching online, I get the tremendous pleasure of discussing some of the finer points of quality engineering, business analysis and career management with professionals from around the world. I recently heard from Kristen with a question about the appropriate sample size when performing a capability study.

Following a lecture in my course, “An Introduction to Reliability Engineering,” on drawing samples of wire rope cables and testing them to failure to estimate the mean and dispersion of tensile strength across the population, she asked:

“You mentioned needing a sample size of at least 30. Could you expand on why, for example, a sample size of one or two parts tested to failure is not enough to estimate the mean and standard deviation of cable tensile strength for the population, assuming the population is normally distributed?”

What an excellent question! And one I hear regularly. I answered,

“Hello Kristen,

Just to lay some groundwork, the 30-piece minimum sample size is a rule of thumb used to keep the standard error of the parameter you’re estimating acceptably small.

Second, you can indeed calculate a mean and standard deviation from two samples. The math works. The question is, how useful are those estimates?
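As an illustration, here is a minimal Python sketch using only the standard library. It assumes a hypothetical normal population of tensile strengths with mean 100 and standard deviation 10 (invented numbers, not test data), and the helper names draw and sd_estimates are made up for this example. It shows that one part cannot yield a sample standard deviation at all, that two parts can, and then repeats the experiment many times to compare how widely two-part and 30-part estimates of the standard deviation scatter.

```python
import random
import statistics

# Hypothetical population parameters (illustrative only, not real cable data).
TRUE_MEAN = 100.0
TRUE_SD = 10.0


def draw(n, rng):
    """Simulate testing n parts to failure: n draws from a normal population."""
    return [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]


def sd_estimates(n, trials, rng):
    """Repeat the n-part experiment many times, keeping each sample stdev."""
    return [statistics.stdev(draw(n, rng)) for _ in range(trials)]


rng = random.Random(42)

# One part: the sample standard deviation is undefined (it would divide by
# n - 1 = 0), so statistics.stdev raises StatisticsError.
try:
    statistics.stdev(draw(1, rng))
except statistics.StatisticsError as exc:
    print("n = 1:", exc)

# Two parts: the arithmetic works and returns a number...
print("n = 2 estimate:", round(statistics.stdev(draw(2, rng)), 2))

# ...but that number is far less repeatable than a 30-part estimate.
est_2 = sd_estimates(2, 5000, rng)
est_30 = sd_estimates(30, 5000, rng)
print("scatter of sd estimates, n = 2: ", round(statistics.stdev(est_2), 2))
print("scatter of sd estimates, n = 30:", round(statistics.stdev(est_30), 2))
```

Under these assumptions, the two-part estimates scatter roughly four to five times as widely as the 30-part estimates.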

Quick lesson on the standard error of the mean (SEM): SEM is the standard deviation of the sampling distribution of the sample mean, and is calculated as SEM = stdev / sqrt(n), where n is the sample size.
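The definition can be checked by simulation. This sketch (Python standard library only, assuming a hypothetical normal population with mean 100 and standard deviation 10; simulated_sem is a made-up helper name) draws many samples of size n, takes each sample's mean, and compares the standard deviation of those means with stdev / sqrt(n).

```python
import math
import random
import statistics


def simulated_sem(mu, sigma, n, trials, rng):
    """Standard deviation of many sample means, i.e. of the sampling distribution."""
    means = [statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
             for _ in range(trials)]
    return statistics.stdev(means)


rng = random.Random(1)
sigma = 10.0  # hypothetical population standard deviation
for n in (2, 10, 30):
    formula = sigma / math.sqrt(n)  # SEM = stdev / sqrt(n)
    simulated = simulated_sem(100.0, sigma, n, 10000, rng)
    print(f"n = {n:2d}: formula {formula:.2f}, simulated {simulated:.2f}")
```

The simulated and formula values should agree closely for each n, which is what the SEM formula predicts.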

For example, say you have two data sets, each with a mean of 100 and a standard deviation of 10, but the first contains two samples and the second contains 50. Using the formula above, SEM1 = 10 / sqrt(2) = 7.07 and SEM2 = 10 / sqrt(50) = 1.41.
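The arithmetic can be reproduced in a few lines of Python. The function name sem is just a label chosen for this sketch.

```python
import math


def sem(stdev, n):
    """Standard error of the mean: stdev / sqrt(n)."""
    return stdev / math.sqrt(n)


# The two data sets from the example: same mean (100) and stdev (10),
# different sample sizes.
sem1 = sem(10.0, 2)
sem2 = sem(10.0, 50)
print(f"SEM1 = {sem1:.2f}")  # SEM1 = 7.07
print(f"SEM2 = {sem2:.2f}")  # SEM2 = 1.41
print(f"SEM1 / SEM2 = {sem1 / sem2:.2f}")  # sqrt(50 / 2) = 5.00
```

Because SEM shrinks with the square root of n, 25 times as many samples buys a standard error five times smaller.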

Both data sets have the same mean and standard deviation, but the range in which the true population mean is likely to lie is far wider for data set 1 than for data set 2.

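A related topic that may be worth further investigation is the confidence interval (CI). A CI multiplies the SEM by a Z score associated with a chosen confidence level, such as 90% or 95%, and adds and subtracts the result from the sample mean. (For small samples, a t value with n − 1 degrees of freedom is normally used in place of the Z score.)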

Using the CI approach, you can then say, for instance, that the population mean lies between x1 and x2 with 95% confidence.
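A minimal sketch of a Z-based interval, applied to the two example data sets (mean 100, standard deviation 10, n = 2 and n = 50). It uses the standard library's statistics.NormalDist to look up the Z value; the function name z_interval is hypothetical. For a sample as small as two, a t critical value would make the first interval wider still.

```python
import math
from statistics import NormalDist


def z_interval(mean, stdev, n, confidence=0.95):
    """Confidence interval for the mean: mean +/- Z * stdev / sqrt(n).

    Uses the normal (Z) critical value. For very small n, a Student's t
    critical value with n - 1 degrees of freedom is the usual choice.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * stdev / math.sqrt(n)
    return mean - half_width, mean + half_width


for n in (2, 50):
    lo, hi = z_interval(100.0, 10.0, n)
    print(f"n = {n:2d}: 95% CI for the mean is {lo:.1f} to {hi:.1f}")
```

With two samples the interval runs from roughly 86 to 114; with 50 samples it narrows to roughly 97 to 103.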

Roughly speaking, with a sample size of 30, the 95% confidence interval for the mean extends about 1/3 of a standard deviation on either side of the sample mean (1.96 / sqrt(30) ≈ 0.36) … a level generally regarded as small enough for the estimate to be useful.
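The rule of thumb can be checked by expressing the 95% half-width as a multiple of the standard deviation for several sample sizes. This Python sketch assumes the Z value from statistics.NormalDist, plus the tabulated t value of 2.045 for 29 degrees of freedom for comparison; half_width_in_sds is an illustrative name.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # two-sided 95% Z value, about 1.96
T95_DF29 = 2.045  # tabulated two-sided 95% t value for 29 degrees of freedom


def half_width_in_sds(n, critical=Z95):
    """95% CI half-width for the mean, as a multiple of the standard deviation."""
    return critical / math.sqrt(n)


for n in (2, 5, 10, 30, 50, 100):
    print(f"n = {n:3d}: half-width = {half_width_in_sds(n):.2f} standard deviations")

print(f"n =  30 using t: {half_width_in_sds(30, T95_DF29):.2f} standard deviations")
```

At n = 30 the Z-based half-width is about 0.36 standard deviations and the t-based one about 0.37, both close to the 1/3 figure.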

One additional note: Drawing 30 samples from a population does not guarantee you will get a useful estimate. The sampling plan (sub-grouping, frequency, randomness, etc.) can have a substantial impact on the quality of your parameter estimates even if you pull more than 30 samples.
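To illustrate why the sampling plan matters, the sketch below simulates a hypothetical population made of production lots whose means differ from lot to lot. All numbers (50 lots of 200 cables, lot-to-lot standard deviation 20, within-lot standard deviation 5) are invented for illustration. Thirty consecutive parts pulled from a single lot understate the population's spread, while thirty parts drawn at random across all lots estimate it much better.

```python
import random
import statistics

rng = random.Random(3)

# Hypothetical population: 50 lots of 200 cables each. Lot means vary
# (sd 20) around 1000, and parts vary within a lot (sd 5).
lots = []
for _ in range(50):
    lot_mean = rng.gauss(1000.0, 20.0)
    lots.append([rng.gauss(lot_mean, 5.0) for _ in range(200)])
population = [x for lot in lots for x in lot]

# Plan A: 30 consecutive parts from one lot (convenient, but not random).
plan_a = lots[0][:30]
# Plan B: 30 parts drawn at random from the whole population.
plan_b = rng.sample(population, 30)

print("population sd:      ", round(statistics.pstdev(population), 1))
print("plan A sd (one lot):", round(statistics.stdev(plan_a), 1))
print("plan B sd (random): ", round(statistics.stdev(plan_b), 1))
```

Both plans use 30 parts; only the way the parts are chosen differs.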

Given your interest, I would recommend investigating these topics in more depth … they will certainly broaden your understanding of inferential stats.

Hope this helps … let me know if I can be of any other assistance to you.”

To learn more about the fundamental tools and concepts of reliability engineering, sign up for my low-cost online course, “An Introduction to Reliability Engineering.”
