Notes from Chapter 2 of Understanding Variation: The Key to Managing Chaos, by Donald J Wheeler
The chapter is titled "Knowledge is Orderly and Cumulative". Here Wheeler discusses three approaches to interpreting data, and the benefit of the control chart.
Two people may have access to the same set of data, yet arrive at different conclusions. The reason is that they may have different interpretation processes.
Not all interpretation processes are valid. Wheeler describes two problematic approaches, and then discusses Shewhart's approach.
The Specification Approach
The first approach is 'comparison to specifications', or the Specification Approach. In this approach, data is interpreted in comparison to specifications, and its 'goodness' or 'badness' is determined based on how well it conforms to specifications.
For example, management may demand that the factory produce 1,000 units per month (the specification). If the factory produces 800 units in one month, that figure is interpreted as 'bad'.
Comparison to specifications leads to binary results: either data conforms or it does not. It also leads to sudden changes in goodness or badness: a process could be in good graces one month and in bad graces the next.
But the biggest problem with this approach is that specifications are the 'voice of the customer': they describe what the 'customer' wants, and the process has nothing to do with it. The voice of the process is what the process is able to achieve. If the voice of the customer and the voice of the process are not aligned, the result is people distorting the system or distorting the data in order to meet the specification on paper.
Nevertheless, the voice of the customer is important because it tells you when you're in trouble.
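The binary verdicts of the Specification Approach can be sketched in a few lines of Python. This is only an illustration: the function name and the monthly figures are made up, and only the 1,000-unit specification comes from the example above.

```python
SPEC_UNITS_PER_MONTH = 1000  # the specification from the factory example


def judge_against_spec(units, spec=SPEC_UNITS_PER_MONTH):
    """Return 'good' if the month meets the specification, else 'bad'."""
    return "good" if units >= spec else "bad"


# Hypothetical monthly output figures, invented for this sketch.
monthly_units = [1010, 990, 1005, 800, 1000]
verdicts = [judge_against_spec(u) for u in monthly_units]
print(verdicts)
```

Note that 990 and 800 receive the same verdict even though the shortfalls are very different, and that the verdict flips from month to month. Nothing in the comparison says what the process is actually capable of.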
The Average Value Approach
The Average Value Approach judges each data point against the average value of the data, prompting questions like "Why are sales 10% below last year's average?" This approach to interpreting data leads to pathologies similar to the Specification Approach: people tend to distort the system or distort the data so they don't have to explain why the data is too far away from the average.
At least the Average Value Approach uses the voice of the process (after all, the average value is produced by the process, and is not some arbitrary number).
Shewhart's Approach to Interpreting Data
Walter Shewhart was the first to define the voice of the process. He called this a "control chart."
A control chart has time on the X-axis, a central line, and two control limits, one on either side of the central line and both the same distance from it. The distance of the control limits from the central line is determined from the data, that is, from the voice of the process. The values plotted can be raw data, or some value calculated from the raw data.
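To make "determined from the data" concrete, here is a minimal Python sketch of one common kind of control chart, the XmR (individuals) chart. These notes do not give a formula, so treat the calculation as an assumption: it uses the standard approach of placing the limits 2.66 times the average moving range either side of the mean. The function name and the monthly figures are hypothetical.

```python
from statistics import mean


def xmr_limits(values):
    """Return (lower_limit, central_line, upper_limit) for an XmR chart."""
    if len(values) < 2:
        raise ValueError("need at least two values to compute moving ranges")
    central_line = mean(values)
    # Moving range: absolute difference between each pair of consecutive values.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    # Assumption: 2.66 is the conventional scaling constant for individuals charts.
    spread = 2.66 * mean(moving_ranges)
    return central_line - spread, central_line, central_line + spread


# Hypothetical monthly output figures, invented for this sketch.
monthly_units = [940, 1010, 980, 1050, 800, 990, 1020, 960]
lower, centre, upper = xmr_limits(monthly_units)
print(f"central line: {centre:.2f}, limits: {lower:.2f} to {upper:.2f}")
```

With these made-up figures the month of 800 units sits inside the limits, so the chart would treat it as routine variation rather than as evidence that something changed.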
The control chart helps interpret the data by characterising the behaviour of the data and allowing us to predict the future behaviour of the data.
The control chart shows that there really is no distinction between good outcomes and bad outcomes; they both came from the same process.
A process that is in control is predictable. In other words, we can predict its behaviour in the future and make plans based on that prediction.
The Second Principle for Understanding Data
Shewhart's second principle for understanding data is:
While every data set contains noise, some data sets may contain signals. Therefore, before you can detect a signal within any given data set, you must first filter out the noise.
(His first principle is introduced in Chapter 1. It's: "No data have meaning apart from their context.")
Two mistakes can be made in analysing data. The first is treating random variation as a meaningful departure from the past. The second is failing to recognise when the data shows a sign of change.
Attempting to avoid the first mistake by not reacting to variation can cause you to miss true signals of change, and thus make the second mistake. Attempting to avoid the second mistake by reacting to every fluctuation causes you to make the first mistake.
The key benefit of a control chart is that it identifies for you which data is noise and which data is signal.
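As a rough sketch of how a chart separates the two, the Python below labels each point as signal or noise depending on whether it falls outside the control limits. Several things here are assumptions rather than anything stated in these notes: the limits use the standard XmR calculation (mean plus or minus 2.66 times the average moving range), only the simplest detection rule is applied, and the limits are computed from the same data being judged, where in practice a baseline period is often used. The function name and sales figures are hypothetical.

```python
from statistics import mean


def classify_points(values):
    """Label each value 'signal' or 'noise' using XmR-style control limits."""
    centre = mean(values)
    # Moving range: absolute difference between each pair of consecutive values.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    # Assumption: 2.66 is the conventional scaling constant for individuals charts.
    spread = 2.66 * mean(moving_ranges)
    lower, upper = centre - spread, centre + spread
    # Only the simplest rule: a point outside the limits counts as a signal.
    return ["signal" if v < lower or v > upper else "noise" for v in values]


# Hypothetical weekly sales figures, invented for this sketch.
weekly_sales = [52, 48, 50, 51, 49, 53, 47, 50, 75, 51]
labels = classify_points(weekly_sales)
for value, label in zip(weekly_sales, labels):
    print(value, label)
```

Here only the week of 75 is flagged. The smaller ups and downs stay inside the limits, so reacting to them would be the first mistake, while ignoring the 75 would be the second.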