Six Sigma IQ

Explaining the Total Degrees of Freedom for Six Sigma Practitioners

Contributor: E. George Woodley
Posted: 06/08/2009 5:36:00 PM EDT

As statisticians, Six Sigma Belts and quality practitioners, we use the term degrees of freedom in hypothesis tests such as the t-test for comparing two means and ANOVA (Analysis of Variance), as well as in confidence intervals, to mention a few examples. In the many classes I have taught, from Six Sigma Green Belts through Six Sigma Master Black Belts, students have had trouble grasping the idea of degrees of freedom, especially when we describe the concept of the standard deviation: “...the average distance of the data from the MEAN...” [1] By now, Six Sigma practitioners should be comfortable with concepts like the MEAN, which is calculated by taking the sum of all the observations and dividing by the number of observations (n). The total degrees of freedom are then represented as (n-1).

Defining Degrees of Freedom

One method for describing the degrees of freedom, as per William Gosset, has been stated as: “The general idea: given that n pieces of data x1, x2, … xn, you use one ‘degree of freedom’ when you compute μ, leaving n-1 independent pieces of information.” [2]

This was reflected in the approach summarized by one of my former professors. He stated that the degrees of freedom represent the total number of observations minus the number of population parameters being estimated by a specific statistical test. Since we assume populations are infinite and cannot be easily accessed to generate parameters, we rely on samples to make statistical inferences that provide estimates for the original population, provided the sampling techniques are both random and representative (a discussion for another time).

This may seem very elementary, but in my experience degrees of freedom have not been the easiest of concepts to comprehend, especially for the novice Six Sigma belt. Another definition describes the degrees of freedom as “equal to the number of independent pieces of information concerning the variance.” [3] For a random sample from a population, the number of degrees of freedom is equal to the sample size minus 1.
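One way to see why only n-1 pieces of information remain is to look at the deviations from the sample mean, which always sum to zero. The following is a minimal Python sketch under that reading; the sample values are hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical sample, chosen only for illustration.
sample = [4.0, 7.0, 5.0, 9.0, 10.0]
n = len(sample)

x_bar = mean(sample)                       # one quantity estimated from the data
deviations = [x - x_bar for x in sample]   # n deviations from the sample mean

# The deviations always sum to zero, so any one of them can be
# recovered from the other n - 1.
recovered_last = -sum(deviations[:-1])

degrees_of_freedom = n - 1                 # observations minus quantities estimated

print(deviations)
print(recovered_last, deviations[-1])
print(degrees_of_freedom)
```

Because the last deviation can be rebuilt from the others, it carries no new information about the spread of the data.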

A Degrees of Freedom Numerical Example

A numerical example might illustrate this approach. The values stand for the actual observations of a given data set and are for illustration only. Given that we have eight data points that sum to 60, we can freely and independently assign values to seven of them. For instance, we may record them as 8, 9, 7, 6, 7, 10 and 7, which sum to 54. The seven values have the freedom to be any number, yet the eighth must be the one fixed value that brings the total to 60 (in this case, 60 - 54 = 6). Hence, the degrees of freedom are (n-1), or (8-1) = 7. Seven numbers can take on any value, but only one value for the eighth will make the equation (here, the sum of the values) hold true.
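The arithmetic of this example can be checked with a few lines of Python. This is only a sketch of the example as stated; the variable names are illustrative.

```python
total = 60                              # the fixed sum from the example
free_values = [8, 9, 7, 6, 7, 10, 7]    # the seven freely chosen values

# Once the total is fixed, the eighth value is forced.
eighth_value = total - sum(free_values)

n = len(free_values) + 1                # eight data points in all
degrees_of_freedom = n - 1

print(eighth_value)          # 6
print(degrees_of_freedom)    # 7
```

Changing any of the seven free values changes the eighth, but never the count of values that were free to vary.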

One may argue that, although this seems to be a simplistic illustration, the data collected for the original seven readings are not really independent, in that they are representative of an existing process and depend on what the observations are in reality. Furthermore, we would have to know from the beginning what the final total was (in this case, 60). Even though this illustration attempts to explain the theory behind the degrees of freedom, it can be more confusing than helpful.

My “Easy” Way of Describing Degrees of Freedom

What really inspired me to write this article about the degrees of freedom was a conversation I had with my wife. She was heading to her class, and she called me and asked if I had an “easy” way of explaining the degrees of freedom. I gave her the description of the degrees of freedom that I use in my classes:

Since statistics deals with making decisions in a world of uncertainty, we, as statisticians, need to give ourselves a cushion, or padding, to deal with this uncertainty. Put simply, the greater the sample size, the more confident we can be in our decisions. For example, when we estimate the variance of a normal distribution, we divide the sum of the squared deviations by (n-1). Hence, if we have a sample size of 5, we divide by 4 rather than 5. The divisor is 20 percent smaller, which gives us our cushion: the estimate is 25 percent larger than it would be if we divided by n. If, however, our sample size is 100, we divide by 99. Here the divisor is only 1 percent smaller, and the estimate is only about 1 percent larger.
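To make the size of this cushion concrete, the sketch below compares dividing by (n-1) with dividing by n. The helper name `divisor_effect` and the sample values are assumptions made for illustration; `variance` and `pvariance` are the Python standard library functions that divide by n-1 and n respectively.

```python
from statistics import pvariance, variance


def divisor_effect(n):
    """Return (percent the divisor shrinks, percent the estimate grows)
    when dividing by n - 1 instead of n."""
    divisor_shrink = 100 * (1 - (n - 1) / n)
    estimate_growth = 100 * (n / (n - 1) - 1)
    return divisor_shrink, estimate_growth


for n in (5, 100):
    shrink, growth = divisor_effect(n)
    print(f"n={n}: divisor shrinks {shrink:.1f}%, estimate grows {growth:.1f}%")

# Cross-check on a hypothetical sample of size 5.
sample = [8, 9, 7, 6, 10]
ratio = variance(sample) / pvariance(sample)   # (n-1)-divisor estimate over n-divisor estimate
print(ratio)
```

The cushion depends only on the sample size, not on the data, and it fades quickly as n grows.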

This explanation places the emphasis on a common statistical concept: the larger the sample size, the more confident we can be in our estimates and decisions. To summarize this idea in a slightly different way, as long as our sampling technique is random and representative, the likelihood that we have a good estimator of a parameter increases as the sample size increases.

I have attempted to address the various approaches to the degrees of freedom, and I hope my simple approach to the rationale behind them can shed some light on future explanations of such a vital part of statistical analysis.

References
[1] Gonick, L. and Smith, W. (1993), The Cartoon Guide to Statistics, Harper Collins Publishers, pg. 22
[2] Breyfogle, Forrest W. III (2003), Implementing Six Sigma, John Wiley & Sons, pg. 1105
[3] Upton, Graham and Cook, Ian (2002), Dictionary of Statistics, Oxford University Press, pg. 100

dhallowell 10/22/2009 10:46:06 PM EDT

I have found this way of getting across the idea of degrees of freedom when I teach, say, a class of 20 people. I ask them to imagine they each thought of a number and wrote it down secretly. At that moment, they have 20 interesting and independent 'pieces of information' -- i.e., degrees of freedom that I might be curious about -- so curious I'd be willing to buy each number for $5. BUT before buying, what if I convinced them to let me leave the room for a minute and have them compute and give me the average of their numbers for free? Now I start buying numbers -- do I need to buy them all? -- when can I stop? They get that I wouldn't need to buy the last number -- I'd already 'know it' -- as the mean gave me one of their 'degrees of freedom' -- it took something that was unknown and made it 'determinable' by a quantity computed. It's easy to expand from there to other related analogies, but I find this anchors the idea for many people.

jamesvergo 06/17/2009 12:22:09 PM EDT

I look at degrees of freedom as a fudge factor; one that allows you to estimate a worst case scenario.

Pval 06/10/2009 11:32:36 AM EDT

Good article. Is there a way to reach you for a private question? iyke14@yahoo.com

qi2@qualityi2.com 06/09/2009 1:01:57 PM EDT

Very helpful and well-written article, George. Thanks. One other way I try to explain d.o.f. in the context of an ANOVA, say, for an L8 DOE with 7 main effects, is that we have 8 trials and 8 (n) bits of information (temporarily ignoring any repetitions or replications here). We end up with 7 (n-1) comparisons (level 1 vs. level 2 for each factor), plus the overall mean, totaling 8. Then I say that one of the degrees of freedom is thus reserved for the mean (or the grand average line if I am showing the idea with graphs). Mike White
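The bookkeeping in this comment (8 trials, 7 comparisons plus the overall mean) can be sketched in Python. The comment does not give a specific array layout, so the code below uses an 8 x 8 matrix of +1/-1 entries built by Sylvester's doubling construction as a stand-in for an L8 array; that choice, and all names in the code, are assumptions for illustration.

```python
def sylvester_hadamard(k):
    """Build a 2**k x 2**k matrix of +1/-1 entries by Sylvester's doubling construction."""
    h = [[1]]
    for _ in range(k):
        # New matrix is [[H, H], [H, -H]].
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


h8 = sylvester_hadamard(3)        # 8 trials (rows) x 8 columns
columns = list(zip(*h8))

mean_column = columns[0]          # all +1: the degree of freedom reserved for the mean
contrast_columns = columns[1:]    # 7 columns, each comparing level 1 (+1) with level 2 (-1)

# Every pair of distinct columns is orthogonal, so the 8 columns carry
# 8 independent pieces of information: 1 for the mean + 7 for the comparisons.
all_orthogonal = all(
    dot(columns[i], columns[j]) == 0
    for i in range(8)
    for j in range(i + 1, 8)
)

# Each comparison column has four trials at each level.
balanced = all(sum(c) == 0 for c in contrast_columns)

print(len(contrast_columns), all_orthogonal, balanced)
```

With 8 trials there is room for exactly 8 mutually orthogonal columns, which is why one degree of freedom goes to the mean and 7 remain for the factors.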