Tool Time: Comparing Data Sets Using Box Plots and ANOVA Analysis
For a Six Sigma practitioner, one of the toughest jobs is deciding which tool to use and when to use it. How would you know whether you need box plots or analysis of variance (ANOVA)? Would you have to look these up in a quality handbook or an online forum? If these Six Sigma tools sound like foreign concepts to you, tune in to this Profit through Process Interactive webcast.
In this session, Genna Weiss of Six Sigma & Process Excellence IQ speaks with Gene Rogers, continuous improvement manager at Baker Hughes, Inc., who shows how to use box plots and ANOVA to evaluate data. Using data from the Division I College Baseball World Series, Gene explains how and when to use these important statistical tools.
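As a rough illustration of what a box plot summarizes, here is a minimal sketch that computes the five-number summary (minimum, first quartile, median, third quartile, maximum) for two groups. The team names and run totals are hypothetical and are not taken from the webcast. The `inclusive` quartile method is one of several conventions, and many plotting packages draw whiskers at 1.5 times the interquartile range rather than at the minimum and maximum.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # statistics.quantiles sorts the data and returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


# Hypothetical runs-per-game figures for two made-up teams (not real data).
runs = {
    "Team A": [3, 5, 6, 7, 9, 11, 4],
    "Team B": [2, 4, 4, 8, 10, 6, 5],
}

for team, scores in runs.items():
    low, q1, median, q3, high = five_number_summary(scores)
    # The box spans Q1 to Q3; its height is the interquartile range (IQR).
    print(f"{team}: min={low} Q1={q1} median={median} Q3={q3} max={high} IQR={q3 - q1}")
```

Placing the summaries side by side is the numeric counterpart of placing the box plots side by side: heavily overlapping boxes suggest the groups may not differ much, which is the question ANOVA then tests formally.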
[“In the country of the blind the one-eyed man is king.” Erasmus]
“Tool Time” – I like the title and concept. But I think there is much to be wary of here:
• Although the talk set out to explain and compare two methods of data analysis (or data comparison, if you prefer), the purpose of analysis is to gain understanding – usually to predict (if not control) the future. In that regard, context is important. If we want to have some basis for predicting future winners of baseball games by examining game scores, shouldn’t we use the runs scored in all games played, and not just those of the games won? Better yet, maybe we should examine the differences in runs scored between the teams listed and their opponents. After all, using only the runs scored in their winning games biases the analysis, and using only the winning teams’ runs in all games played doesn’t take into account the effect of the opposing teams – another source of bias. In addition, since the games are played over some period of time, and each team’s performance will vary over time, we should examine each team’s scores (or again, each game’s difference in runs scored) over time. So the proper tool to use first in this instance is some form of Statistical Process Control. If no significant effects over time are found (patterns inside the Control Limits), then we may consider box plots and ANOVA.
• Regarding ANOVA, the speaker is only half right. Besides Comparison of Means, ANOVA is also used for just what its name says: analyzing the overall variance into its separate components and determining which are significant. This is the case when the data represent elements of each factor (e.g., different lots of manufactured goods, or different operators) randomly chosen from a “universe” (of lots or operators) and we wish to determine which factors’ impacts are significant.
• Box plots and histograms both show the distribution of data – they just use different perspectives. And while box plots explicitly locate the median and mean, the histogram will show attributes of the data set the box plot won’t: e.g., bimodality, truncation, bifurcation. Use both together. And multiple histograms can be displayed just as easily as multiple box plots using Tufte’s Principle of Small Multiples.
• The box plot and histogram displayed side by side in the talk cannot show the same data set – unless the histogram is upside-down.
• Visual inspection of the five box plots shows their means and interquartile ranges to be so close as to render a guess as to which team is better (or will win) highly hazardous – as later confirmed numerically by the ANOVA. Stating otherwise misleads the viewers.
• Regarding significance: The risk level (alpha) of a Type I Error (“false positive”) should be set before data are gathered – certainly before any analysis is done – to prevent biasing the conclusion. If the p-value were, say, 0.07 and I then compared it to an alpha of 0.10 and declared significance, as opposed to an alpha of 0.05 and declared “not significant” – what has changed? Certainly not the data! [And alpha should be chosen with a clear understanding of the consequences if you do declare an effect significant when it really isn’t.]
• The speaker’s comment that a histogram shows the number of data in each bin, while correct, is misleading. The number of bins used is not fixed, but based on a “rule-of-thumb”…a guide. Good software lets the user adjust this parameter, so the number of data in each bin signifies nothing about the data set itself.
• Finally, since most improvement of processes occurs over time, and all processes vary over time, if you sample more than, say, 20 data each time, consider plotting the metric using a series of box plots…over time. After all, if you use an Xbar-R SPC chart, you are representing quite a lot of data with only two statistics each sampling interval and may be missing important attributes of the process. [I even use a dot plot over time if I have 10-30 data or so each sampling period.]
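The variance-partitioning use of ANOVA described in the comment above can be sketched as follows. This is a minimal illustration, not the method used in the webcast: the function name `one_way_anova` and the sample numbers are hypothetical, and the variance-component estimate assumes a balanced design (equal group sizes) with randomly chosen groups. Turning the F ratio into a p-value needs the F distribution, which the Python standard library does not provide, so that step is left out. The risk level alpha is fixed as a constant before any data are examined.

```python
from statistics import fmean

ALPHA = 0.05  # chosen before looking at the data, as the comment recommends


def one_way_anova(groups):
    """Split total variation into between-group and within-group parts.

    Assumes every group has the same size (balanced design).
    Returns a dict with the sums of squares, the F ratio, and an estimate
    of the between-group variance component.
    """
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")

    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (len(groups) - 1)
    ms_within = ss_within / (len(all_values) - len(groups))

    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "f_ratio": ms_between / ms_within,
        # Random-effects estimate; negative estimates are reported as zero.
        "var_between": max(0.0, (ms_between - ms_within) / n),
        "var_within": ms_within,
    }


# Hypothetical measurements from three randomly chosen lots (not real data).
lots = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
result = one_way_anova(lots)
print(result)
```

The two sums of squares add up to the total sum of squares around the grand mean, which is the sense in which ANOVA "analyzes" the variance. The F ratio compares the two mean squares, and the variance components estimate how much of the overall spread is attributable to differences between lots rather than to variation within a lot.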
Nice and useful talk. Thanks for the very informative video by Gene Rogers.
Thanks. This was helpful, especially using both tools to tell the full story.
Thanks a lot for the explanation on ANOVA and hypothesis testing.
Thanks a lot. Simple but very useful explanations.