I can’t tell you how many times I’ve heard, “We don’t have the data for this. I guess we’ll need to make an educated guess.” In the Lean Six Sigma engagements I work on, my response to this is, “Let’s create the data.” Without fail, I get the deer-in-the-headlights stare for a couple of seconds until I explain what I’m proposing.
Below are some of the tools you can use to deal with situations when there are no data.
Once you have created a value-stream map or just a regular process map, you should have identified the primary metrics to capture. Most metrics deal with time, quality, and cost across the value stream, and metrics such as queue time, rework, defects, and processing time often are not captured in any system. You have a few ways of collecting these data, such as asking the employees performing the work for their best estimates, but such estimates are likely to be biased. You can also do what I do: undertake a data collection exercise with employees for 10 to 15 days.
I work with employees to create simple data collection forms that they will use to keep track of when an activity starts, when it ends, the type of activity, when the work was received, and defects and types of defects. Two elements are critically important when you design these data collection forms: make them as simple and quick to complete as possible, and ensure that all the details are captured so you can separate the different types of activities. Where possible, I ask employees to fill in an electronic PDF or web form; however, you must balance that convenience against the time required to fill in the form.
I also select the employees from whom we will collect the data through a stratified random sample. The stratification includes factors such as experience, employee level, type of work performed (e.g., some employees may work only on dedicated files), language (if relevant), age, and shift (if applicable). I also randomize the days on which the data will be collected to ensure that variations in demand and work type are well represented.
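The sampling step above can be sketched in a few lines of Python. This is a minimal illustration, not the author's actual tooling: the roster, the stratification factors, and the 50 percent sampling fraction are all hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical roster: each record carries the stratification factors.
employees = [
    {"name": "A", "experience": "senior", "shift": "day"},
    {"name": "B", "experience": "senior", "shift": "night"},
    {"name": "C", "experience": "junior", "shift": "day"},
    {"name": "D", "experience": "junior", "shift": "day"},
    {"name": "E", "experience": "junior", "shift": "night"},
    {"name": "F", "experience": "senior", "shift": "day"},
]

def stratified_sample(records, keys, fraction, seed=42):
    """Randomly sample roughly `fraction` of records from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[tuple(rec[k] for k in keys)].append(rec)
    sample = []
    for group in strata.values():
        n = max(1, round(len(group) * fraction))  # at least one per stratum
        sample.extend(rng.sample(group, n))
    return sample

chosen = stratified_sample(employees, ["experience", "shift"], fraction=0.5)
```

Because every stratum contributes at least one employee, even small groups (e.g., a lone night-shift senior) are represented in the data collection.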
Typically, within 10 to 15 days during a one-month period, I am able to capture more than 5,000 data observations with consistent data across employees, divisions, and regions. As a verification check, I often input the data collected into a process simulation model and compare the model outputs to actual output numbers. I can be pretty confident about the data collected if the variation between the model and actual outputs is between 1 percent and 4 percent.
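The verification check is simple arithmetic: compare the simulation model's output to the actual output as a relative percentage. The throughput numbers below are invented for illustration.

```python
def variation_pct(model_output, actual_output):
    """Relative gap between simulated and actual output, in percent."""
    return abs(model_output - actual_output) / actual_output * 100

# Hypothetical example: the model predicts 480 files/day,
# while actual records show 500 files/day.
gap = variation_pct(480, 500)   # 4.0 percent
within_tolerance = 1 <= gap <= 4
```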
Reverse engineering, or backward induction, tends to take a few iterations and requires experience to obtain a good approximation of the numbers. Here is how my reverse-engineering method works.
1. I conduct a working session with employees during which we go through the amount of time they take to do various activities, the defect rates, wait time, etc. I ask employees to provide estimated times, numbers of defects, etc., for an average day, a low-workload day, and the highest-workload days.
2. I enter those data (usually as a triangular distribution, i.e., minimum, most likely, and maximum) into a value-stream simulation model and estimate key performance metrics, output, and production numbers. If you know the actual distribution type, such as an exponential or lognormal distribution (which are the most common), then you can enter the estimates required for those distributions. (Obtaining exponential and lognormal parameters from employees might be difficult, though.) The model numbers are then compared to the actual numbers for various days and months throughout the year.
3. I then assess the data to determine which estimates are likely off. This is where experience is required. I check the model analytics to see, for example, whether resources are idle, unproductive, or at capacity, or whether bottlenecks occur and work accumulates, to pinpoint where the estimates are most likely wrong.
4. I verify the model analysis with the employees, revise the estimates, and repeat steps two, three, and four.
These steps are repeated until the model outputs are close to the actual output values.
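The core of step two can be sketched as a small Monte Carlo simulation. Everything here is a hypothetical stand-in for a real value-stream model: the activity names, the triangular estimates, the 450-minute working day, and the "actual" throughput figure are all invented for illustration.

```python
import random

# Hypothetical employee estimates per activity:
# (minimum, most likely, maximum) minutes, as gathered in the working session.
estimates = {
    "intake":  (2.0, 4.0, 8.0),
    "review":  (5.0, 9.0, 20.0),
    "approve": (1.0, 2.0, 5.0),
}

def simulate_daily_output(estimates, minutes_per_day=450, runs=1000, seed=7):
    """Monte Carlo estimate of files one employee completes per day."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(runs):
        elapsed, done = 0.0, 0
        while True:
            # Draw each activity's duration from its triangular distribution.
            file_time = sum(rng.triangular(lo, hi, mode)
                            for lo, mode, hi in estimates.values())
            if elapsed + file_time > minutes_per_day:
                break
            elapsed += file_time
            done += 1
        outputs.append(done)
    return sum(outputs) / len(outputs)

model_output = simulate_daily_output(estimates)
actual_output = 24  # hypothetical observed average from production records
variation = abs(model_output - actual_output) / actual_output * 100
# If the variation is too large, revisit the estimates with employees
# (steps three and four) and rerun the simulation.
```

In practice a full value-stream model would also capture queues, handoffs, and multiple resources; this sketch only shows how rough min/most-likely/max estimates turn into a comparable output number.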
Keep in mind that many other data collection techniques are available, such as surveys, focus groups, and file reviews. Next time someone says, “We have no data,” try saying, “Let’s create it,” and use the methods I have proposed.
This article was first published July 11, 2012, on the Toppazzini and Lee website.