Monday, May 13, 2013

Safety Through Control

An Event
A true event from the Civil Aviation Daily Occurrence Reporting System (CADORS) in Canada.
Two separate companies fly between the same cities in Canada. Both companies compete to fly passengers from one city to the other with the slogan “We’ll have you home before dinner time.” One company, which we shall call Dart Air, has two crews to fly this route. Dart Air management pressures the crews to do all that they can to reduce the flying time. The management pressure on the crews appears to be working, since the time from wheels up to wheels down has been steadily falling. Dart Air records the flight time in the run chart below:

Things look good.
As evidenced by the run chart, the flight time is being reduced by the crews. Management’s pressure is paying off. The pilots and crews were praised for the progress. The company’s record against the competition also showed a steady reduction in flight times.
The Incident.
On the last Friday, as illustrated on the run chart, one of the flights ran off the end of the runway. The CADORS report revealed that there were no injuries, but the plane sustained moderate damage to the landing gear. An internal investigation was initiated by Dart Air. Interviews of both pilot teams revealed that the two crews were engaged in a competition to see who could fly the route the fastest! The pressure from Dart Air management fostered and encouraged the competition, which ended in a near disaster. Only the overrun and near accident revealed the unsafe condition that was present on this route. You might say we have an excellent reactive process.
Swiss cheese
The model.
The Swiss Cheese model of accident causation, proposed by James Reason, is a model used in the risk analysis and risk management of human systems, commonly in aviation, engineering, the food industry and healthcare. It likens human systems to multiple slices of Swiss cheese, stacked together, side by side. The Swiss Cheese model is sometimes called the cumulative act effect. Reason hypothesizes that most accidents can be traced to one or more of four levels of failure: organizational influences, unsafe supervision, preconditions for unsafe acts, and the unsafe acts themselves. In the Swiss Cheese model, an organization's defenses against failure are modeled as a series of barriers, represented as slices of Swiss cheese. The holes in the cheese slices represent individual weaknesses in individual parts of the system, and are continually varying in size and position in all slices. The system as a whole produces failures when all of the holes in each of the slices momentarily align, permitting (in Reason's words) "a trajectory of accident opportunity", so that a hazard passes through all of the holes in all of the defenses, leading to a failure.

The Swiss Cheese Model to Accidents

The Swiss Cheese model suggests that several latent failures lined up to make conditions right for an accident. The question can be posed: what if we can catch the latent failures before they align? In other words, close the hole in even just one of the Swiss cheese barriers. The conclusion would have to be drawn that the accident could then be prevented. The use of “control” methods in each of the sub-processes would accomplish this same prevention.

Traditional Safety Management Systems
What’s wrong with the traditional safety management system? The answer is simply nothing. However, the problem occurs with the events we choose to “analyze.” Most safety management systems analyze problems discovered by the quality control system. The quality control system “discovers” problems that are then analyzed for root cause in the quality assurance system. This relegates the safety management system to a system of “failure analysis” acting on incidents and problems that have already occurred. 

The flowchart is a graphic of the traditional safety process. Some would argue that the safety management system does act “proactively” by identifying “potential hazards” in the proactive reporting system. Of course, identifying hazards proactively is better. But what if we can have a system that identifies potential hazards before they become hazards? That is where the concept of “control” comes in.
Dr. Walter Shewhart stressed that bringing a production process into a state of Statistical control, where there is only chance cause variation, and keeping it in control, is necessary to predict future output and to manage a process economically. Dr. Shewhart created the basis for the control chart and the concept of a state of statistical control by carefully designed experiments. While Dr. Shewhart drew from pure mathematical statistical theories, he understood data from physical processes never produce a "normal distribution curve" (a Gaussian distribution, also commonly referred to as a "bell shaped curve"). He discovered that observed variation in manufacturing data did not always behave the same way as data in nature (Brownian motion of particles). Dr. Shewhart concluded that while every process displays variation, some processes display controlled variation that is natural to the process, while others display uncontrolled variation that is not present in the process causal system at all times. It is the uncontrolled variation that must be identified and mitigated. This can be accomplished statistically, visually or experientially. 

Probably the greatest contributor to the concept of “control” is my hero Dr. W. Edwards Deming. In 1927, Deming was introduced to Walter Shewhart of the Bell Telephone Laboratories by Dr. C.H. Kunsman of the United States Department of Agriculture (USDA). Deming found great inspiration in the work of Shewhart, the originator of the concepts of statistical control of processes and the related technical tool of the control chart, as Deming began to move toward the application of statistical methods to industrial production and management. Shewhart's idea of common and special causes of variation led directly to Deming's theory of management. Deming saw that these ideas could be applied not only to manufacturing processes but also to the processes by which enterprises are led and managed. This key insight made possible his enormous influence on the economics of the industrialized world after 1950.
Deming edited a series of lectures delivered by Shewhart at USDA, Statistical Method from the Viewpoint of Quality Control, into a book published in 1939. One reason he learned so much from Shewhart, Deming remarked in a videotaped interview, was that, while brilliant, Shewhart had an "uncanny ability to make things difficult." Deming thus spent a great deal of time both copying Shewhart's ideas and devising  ways to present them with his own twist.
Deming developed the sampling techniques that were used for the first time during the 1940 U.S. Census. In 1947, Deming was involved in early planning for the 1951 Japanese Census. The Allied Powers were occupying Japan, and he was asked by the United States Department of the Army to assist with the census. While in Japan, Deming's expertise in quality control techniques, combined with his involvement in Japanese society, led to his receiving an invitation from the Japanese Union of Scientists and Engineers (JUSE).
JUSE members had studied Shewhart's techniques, and as part of Japan's reconstruction efforts, they sought an expert to teach statistical control. During June–August 1950, Deming trained hundreds of engineers, managers, and scholars in Statistical Process Control (SPC) and concepts of quality. He also conducted at least one session for top management. Deming's message to Japan's chief executives: improving quality will reduce expenses while increasing productivity and market share. Perhaps the best known of these management lectures was delivered at the Mt. Hakone Conference Center in August 1950.
A number of Japanese manufacturers applied his techniques widely and experienced theretofore unheard of levels of quality and productivity. The improved quality combined with the lowered cost created new international demand for Japanese products.

Process Control: The Key to Safety
Dr. Deming states that everything is a process. All functions in our companies can be described as processes. The process model shows the continuous improvement cycle. You could easily substitute the safety management system for the quality management system (QMS). The measurement and analysis portion of the QMS is the engine that runs control of processes. Instead of analyzing incidents and failures, the system monitors “variation” in the processes that were used to realize the product.

Product realization can also mean service realization. All processes have outputs. These outputs can be measured numerically. Numerical outputs can then be monitored and improved by using control charts.
Control Charts
A control chart consists of:
  • Points representing a statistic (e.g., a mean, range, proportion) of measurements of a quality characteristic in samples taken from the process at different times [the data]
  • The mean of this statistic using all the samples is calculated (e.g., the mean of the means, mean of the ranges, mean of the proportions)
  • A center line is drawn at the value of the mean of the statistic
  • The standard error (e.g., standard deviation/sqrt(n) for the mean) of the statistic is also calculated using all the samples
  • Upper and lower control limits (sometimes called "natural process limits") that indicate the threshold at which the process output is considered statistically 'unlikely' and are drawn typically at 3 standard errors from the center line
The chart may have other optional features, including:
  • Upper and lower warning limits, drawn as separate lines, typically two standard errors above and below the center line
  • Division into zones, with the addition of rules governing frequencies of observations in each zone
  • Annotation with events of interest, as determined by the Quality Engineer in charge of the process's quality
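The calculation described in the list above can be sketched in a few lines of Python. This is only a minimal illustration, not a full SPC implementation: the function name xbar_limits is made up for this example, and the standard deviation is estimated by pooling the within-sample variances, which is an assumption (textbook charts often use tabulated constants based on the average range instead).

```python
import math
import statistics


def xbar_limits(samples, sigmas=3.0):
    """Return (center_line, lower_limit, upper_limit) for a chart of sample means.

    samples: list of equal-sized samples (each a list of measurements).
    Hypothetical helper for illustration only.
    """
    n = len(samples[0])
    if n < 2 or any(len(s) != n for s in samples):
        raise ValueError("all samples must share one size of at least 2")

    # The plotted statistic: the mean of each sample.
    means = [statistics.fmean(s) for s in samples]

    # Center line: the mean of the sample means.
    center = statistics.fmean(means)

    # Assumption: pool the within-sample variances to estimate the
    # process standard deviation (valid for equal sample sizes).
    variances = [statistics.variance(s) for s in samples]
    pooled_sd = math.sqrt(statistics.fmean(variances))

    # Standard error of a sample mean: standard deviation / sqrt(n).
    std_error = pooled_sd / math.sqrt(n)

    return center, center - sigmas * std_error, center + sigmas * std_error
```

Calling the function with sigmas=2.0 gives the optional warning limits mentioned above.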
Control Acts on Variation Rather than Event

The control chart is a powerful tool for monitoring variation in a process.  The chart allows you to determine when variation is simply due to random (common cause) variation or when the variation is due to special causes.  How does a control chart tell if only common cause variation is present? This is determined by the data itself.  Using the data, we compute a range of values we would expect if only common cause variation is present.  The largest number we would expect is called the upper control limit.  The smallest number we would expect is called the lower control limit.  In general, if all the results fall between the smallest and the largest number and there is no evidence of nonrandom patterns, the process is in statistical control, i.e., only common cause variation is present. 

A control chart tells you if your process is in statistical control (i.e., only random variation is present) or if your process is out of statistical control (i.e., special cause variation is also present).  To be able to determine which situation is present, you must be able to interpret the control chart.  The most important factor in finding the reason for a special cause is time.  The faster the out of control situation is detected, the better the odds of finding out what happened. This is why control charts are very powerful tools for all associates.  If associates are keeping the charts, they can immediately begin to look for the reasons for out of control points when they appear on the chart. Control charts are based on a general model.  The general model is that the control limits are ±3 standard deviations from the average. The control limits are sometimes called 3 sigma limits. 
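To make the interpretation step concrete, here is a small Python sketch that flags points suggesting special cause variation. It checks two things: points outside the control limits, and one example of a nonrandom pattern, a long run of points on the same side of the center line. The function name and the run length of 8 are assumptions chosen for illustration; published rule sets differ in the exact run length and include other patterns.

```python
def special_cause_signals(values, center, lcl, ucl, run_length=8):
    """Return sorted indices of points that suggest special cause variation.

    Hypothetical helper: flags points beyond the control limits and any
    run of `run_length` or more consecutive points on one side of center.
    """
    flagged = set()
    run_side = 0   # +1 above center, -1 below, 0 exactly on the center line
    run_start = 0  # index where the current same-side run began

    for i, v in enumerate(values):
        # Rule 1: a point outside the control limits.
        if v > ucl or v < lcl:
            flagged.add(i)

        # Rule 2: track consecutive points on the same side of center.
        side = (v > center) - (v < center)
        if side == 0 or side != run_side:
            run_side, run_start = side, i
        if side != 0 and i - run_start + 1 >= run_length:
            flagged.update(range(run_start, i + 1))

    return sorted(flagged)
```

An empty result means neither rule fired, which is consistent with, but does not prove, a process in statistical control.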

Dart Air Revisited
The Incident Again.
Dart Air management was praising the crews for reducing the flight time between cities. The run chart reveals a steady drop in flight times. It seemed obvious that the crews were coming up with ways to make the flight much more efficient. This trend continued until one of the flights ran off the runway. The CADORS report stated excess landing speed as a contributor to the incident. Upon analysis by the company’s own personnel, it was revealed that the two crews had a competition with each other over who could reduce the flight time the most. Unfortunately, it was only after the fact that we applied control limits to the data from the run chart.
By applying control limits to the run chart, we can see that the chart went “out of control” on the Monday before the incident occurred. If Dart Air had monitored the flight time with a control chart rather than a run chart, the chart would have alerted company management that something was “abnormal” with the flight time. Yes, the abnormal situation would have been flagged “before” the incident, and acting on it could have prevented the plane from running off the end of the runway.
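One way such an early warning could work is sketched below in Python. The flight times are invented for illustration and are not Dart Air's data, and the function names are hypothetical. The sketch assumes limits are set from an earlier stable period using an individuals chart, where the standard deviation is estimated as the average moving range divided by the standard constant 1.128, and later flights are then compared against those limits.

```python
import statistics

D2_FOR_PAIRS = 1.128  # standard constant linking the mean moving range to sigma


def individuals_limits(baseline, sigmas=3.0):
    """Return (center, lcl, ucl) for an individuals chart from baseline data."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline points")
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    center = statistics.fmean(baseline)
    sigma_hat = statistics.fmean(moving_ranges) / D2_FOR_PAIRS
    return center, center - sigmas * sigma_hat, center + sigmas * sigma_hat


def first_signal(values, lcl, ucl):
    """Return the index of the first point outside the limits, or None."""
    for i, v in enumerate(values):
        if v < lcl or v > ucl:
            return i
    return None


# Invented flight times in minutes (assumption, for illustration only).
baseline_minutes = [62, 60, 61, 63, 59, 61, 60, 62]  # earlier stable period
later_minutes = [60, 58, 57, 56, 54, 53]              # times under pressure

center, lcl, ucl = individuals_limits(baseline_minutes)
signal_index = first_signal(later_minutes, lcl, ucl)
```

With these made-up numbers the lower limit is about 55.7 minutes, so the fifth later flight (54 minutes) is the first signal. The point of the sketch is that a flight time that is "too good" falls outside the limits just as a slow one would, and that is the cue to ask what changed.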

Variation vs. Failures

The Dart Air case is a perfect illustration of the power of “control.” When a company uses control tools such as control charts, we see management acting on “variation” rather than failures. There is also an added benefit to the company’s safety management system. When a process is monitored using statistical methods, that process can be exempt from the audit program, freeing up audit resources to audit other areas.

The question arises: what if a process does not produce numerical outputs? Remember, the key is “variation.” Processes must be governed by procedures that define what is normal. Anybody can see that something is “abnormal” if and only if they know what normal is in the process. Some examples of “variation” or “abnormality” are a door that doesn’t close as usual, an unusual odor in an area, a discoloration that is not normal, a crew member who is acting differently, or an unusual reading on an instrument. It is important to note that these examples are not failures but simply “variations” from what is defined as normal. By acting on variation we prevent the incident or accident from occurring. It should be noted that no one can predict a truly catastrophic failure. But historical data shows that truly catastrophic failures are rare. Most accidents occur as a result of processes that went “out of control.”

Applying Controls and Swiss Cheese

Monitoring the variation from each process will effectively “close” the hole in that process, thus rendering the Swiss cheese model more like provolone! There are many presentations of the controls that can be applied to processes. We will examine the main forces affecting a process. These forces were defined by Dr. Deming. Each of these forces has forces that act upon it. It is up to the analyst to decide what controls should be placed on these forces.

These interactional forces can only be controlled by the management of an organization. Often considerable monetary investments must be made to change or enhance these factors. These factors should not be changed without “Profound” knowledge of the process. Management does not have the profound knowledge to change these factors. As stated by Dr. Deming, “Only the person or persons doing the job have the knowledge needed to change it. Management must consult the process stakeholder before making a change to a process. Doing otherwise is ‘Tampering’ with the system.” The team approach, using tools to direct the consensus, is a powerful process improvement activity.

When we say act on variation, what do we mean? Variation is any event, circumstance, action or routine that is “abnormal” as defined by the standard. The established process has already identified the inputs, e.g. methods, material, machines, environment, and people. In the day-to-day operation or running of the process, is there evidence of abnormality or variation that is beyond the normal variation of the process? Some illustrative examples: You get into your car in the morning and it usually starts right up. But today it took 3 tries to get it started. Note that. Does it happen again? This may be an indication of a latent problem. A worker is always on time, but today he or she was late. Is it a one-time occurrence, or do you note that they are coming in later on several days? The oil pressure reading on an engine is lower than normal but within tolerance. Record this and note it. Does it happen again? If so, analysis and corrective action may be needed. A door closes properly. Today you notice that it was harder to close. Is this an isolated event? Does it happen again? Note it as variation.
The establishment of a standard for all processes gives us limits of variation. These limits of variation enable us to define “control” within the process. Keep in mind that the establishment of limits, whether statistical or experiential, gives us the means to act on variation rather than analyze the failure(s) that have occurred to cause an accident.

Your Thoughts............


  1. I have to add to the discussion. Safety Through Control will be discussed at a great symposium in September at DISNEY WORLD Orlando FL. In fact, Dr. Bill McNeese, creator of the SPC for Excel program and Ms. Diane Ritter who worked with Dr. W. Edwards Deming and co-founder of GOAL/QPC will be presenting. goto: or click on Disney Symposium.

  2. See these strategies demonstrated in our workshops:

  3. If a control process isn't sufficiently secure from cyber threats then it can't be regarded as adequately safe, and there's a very clear implication here that the security lifecycle needs to be managed appropriately. You will also learn about security clearance approach

