What A Healthy SMS Looks Like
By OffRoadPilots
After several years of operating with a safety management system (SMS), an SMS enterprise should be operating with zero regulatory findings. The accountable executive (AE) should have full control over the path their SMS has taken in the past and should have established a vision in their SMS policy of what to expect in the future. There are three regulatory compliance principles for a successful safety management system: the accountable executive is responsible for compliance with all regulations; the certificate holder (CH) is responsible for the quality assurance program (QAP); and the person managing the safety management system (SMS manager) is responsible for monitoring concerns that the aviation industry has about your airport. A healthy SMS includes a risk management officer (RMO) position. Risk management is what keeps a safety management system healthy within a fluid environment with ever-changing priorities.
The duties of a risk management officer are often assigned to an SMS manager when the CH appoints a person to manage their SMS. The person managing the safety management system shall identify hazards and carry out risk management analyses of those hazards. Other duties assigned to an SMS manager are to maintain a reporting system; investigate, analyze and identify the cause or probable cause of all hazards, incidents and accidents; maintain a safety data system, by either electronic or other means, to monitor and analyze trends in hazards, incidents and accidents; monitor and evaluate the results of corrective actions with respect to hazards, incidents and accidents; monitor the concerns of the civil aviation industry in respect of safety and their perceived effect on your airport; and determine the adequacy of the training required. These responsibilities, which the regulations assign to an SMS manager, are extremely labor intensive, research intensive, data collection intensive and comprehension intensive. There are not enough hours in a 24-hour day for one person to comply with these requirements in addition to carrying out daily risk management analyses.
Anyone who thinks, even for a minute, that risk management analyses are not a daily and ongoing task has an SMS that is not only rolling downhill, but also rolling down a path to operational failure. SMS itself cannot fail, since all it does is paint a true picture of a failed operation, but operations can fail when SMS drift and trends are ignored. Just as investment professionals must assess risk daily, airline and airport operators must also assess their risks daily.
Conventional wisdom is that airlines and airports only need to assess the risks of accidents that have already happened. This is a misconception, but that does not imply that it is wrong or incorrect. When SMS was first introduced, there was little to no information or literature available on what an aviation safety management system actually is. Airlines and airports required to implement SMS continued on the path they were on, which was to react to incidents and accidents. SMS was not fully understood at that time. A common phrase was that safety is common sense, even though common sense had produced accidents since the beginning of time on December 17, 1903.
Some time ago, I received a practice SMS report, and this is what the report said:
“On 17 DEC 1903 two unlicensed pilots, Orville and Wilbur Wright, made 4 unauthorized flights in an unregistered aircraft. They departed and arrived without communicating with air traffic control or utilizing local CTAF. Their airplane, which had not received its annual inspection by a licensed Aircraft Mechanic, was damaged during their last flight. They failed to report the incident to the TC and TSB, neither of which had been invented yet. Corrective Action: Recommend TC to be invented immediately, and Wilbur and Orville Wright's pilot certificates to be issued then revoked.”
In the safety oversight component, the reactive reporting process was the first operational task for airlines and airports. This task was fully understood, since reactive reporting with corrective actions was how safety was managed before SMS was regulated and implemented. There were several other options available for initiating the regulated SMS process, and the consensus was to begin with the reactive reporting process.
When operating with a reactive process system, an incident or accident must first happen before it is reported and analyzed by applying statistical process control (SPC). The first step, reporting an accident, was familiar to operators, but the challenge came with the analytical process. In the pre-SMS days, the broken piece was fixed and forgotten about, and nobody conducted process analysis. Special cause variation for root cause analysis was unknown, and most operators could not tell the difference between common cause variations and special cause variations. SMS was implemented with several other new definitions and tasks in the reactive system, which immediately caused confrontations. Since the SMS regulations are performance based, the golden rule is that if the regulation does not specifically state what needs to be done, that is the exact reason why an airline or airport operator must do what it takes to meet the intent of the regulations. A common phrase during the SMS implementation was that “the regulations do not say that.”
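The distinction between common cause and special cause variation can be illustrated with a small control chart calculation. The sketch below is only an illustration, not a method prescribed by the regulations or by the author: the monthly report counts are invented, the function names are hypothetical, and it applies just one SPC rule (a point beyond three-sigma limits on an individuals chart, with sigma estimated from the average moving range).

```python
from statistics import mean

def control_limits(baseline):
    """Return (lower, center, upper) limits for an individuals (XmR) chart."""
    center = mean(baseline)
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    # 1.128 is the standard d2 constant for moving ranges of two points.
    sigma = mean(moving_ranges) / 1.128
    return center - 3 * sigma, center, center + 3 * sigma

def classify(points, baseline):
    """Label each new point as common cause or special cause variation."""
    lower, _, upper = control_limits(baseline)
    return [
        "special cause" if (x < lower or x > upper) else "common cause"
        for x in points
    ]

# Hypothetical monthly hazard report counts (illustrative data only).
baseline_counts = [4, 6, 5, 7, 5, 6, 4, 5, 6, 5, 7, 6]
new_counts = [5, 7, 15]

for count, label in zip(new_counts, classify(new_counts, baseline_counts)):
    print(count, label)
```

Points inside the limits reflect the normal variation of the process and call for process improvement rather than a one-off fix. A point outside the limits is a signal worth a root cause analysis.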
The next step of the safety oversight element was to phase in the proactive process. There was still confusion among airlines and airport operators, including the Regulator, about what defined an SMS process. Since the phase-in was a proactive task, the consensus became to identify hazards and do something about each hazard before it became a bigger problem or led to an incident. Operators dangled carrots, or bribes, for employees to report hazards. Whoever reported the most hazards in a month would receive a gift. Gifts, or bribes, are acceptable when initiating a process in order to learn the process itself, but within a fully operational SMS, bribes, or carrots, do not paint a true picture of the health of an SMS.
The Heinrich Pyramid, or Heinrich's Law, was used as justification for acting on minor hazards immediately, since they would, unquestionably, lead to accidents. Heinrich's Law is based on probability and assumes that the number of accidents is inversely proportional to the severity of those accidents. It leads to the conclusion that minimizing the number of minor incidents will lead to a reduction in major accidents, which is not necessarily the case. In a workplace, for every accident that causes a major injury, there are 29 accidents that cause minor injuries and 300 accidents that cause no injuries. Heinrich's Law is applicable to an overcontrolled environment with common cause variations only, where special cause variations are excluded. Eventually, several airline and airport operators put Heinrich's Law aside and referenced the principle as guidance and instruction material only, rather than as a law written in stone.
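The 1:29:300 ratio quoted above is simple arithmetic, and a short sketch makes the proportions visible. The names below are hypothetical and the code only restates the ratio. It says nothing about causation, which is exactly the limitation described above.

```python
# Heinrich's 1:29:300 ratio as quoted in the text.
HEINRICH_RATIO = {"major injury": 1, "minor injury": 29, "no injury": 300}

def expected_shares(ratio):
    """Convert ratio counts into the share of all accidents per severity."""
    total = sum(ratio.values())
    return {severity: count / total for severity, count in ratio.items()}

shares = expected_shares(HEINRICH_RATIO)
for severity, share in shares.items():
    print(f"{severity}: {share:.1%}")
```

Under this ratio, roughly nine out of ten accidents cause no injury. The ratio describes a mix of outcomes. It does not show that removing no-injury events will remove the major ones.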
After the reactive and proactive process systems were phased in, the next step in the SMS was to implement investigation and analysis. The first constraint for this phase-in period was to determine what to investigate, and the consensus was that it made sense to investigate accidents and incidents. After all, this is what TSB did, so operators assumed they were expected to do the same. Accidents and incidents investigated by operators were not limited by the severity of the outcome; anything that failed was placed in the investigation hat. Upon completion of an investigation, an operations bulletin was issued for personnel to read and accept, and after just a few months, the paper clipboard was overloaded with bulletins. An airport would conduct a root cause analysis and investigate a burnt-out runway edge light, and an airline would do the same for a burnt-out aircraft taxi light. During the phase-in period, SMS personnel had limited training to comprehend the safety management system. Investigations and analyses of incidents that were done at that time were not the wrong thing to do, since they were common sense based on the knowledge at the time. Investigating the outcome itself was the incorrect thing to do. The difference between doing the wrong thing and the incorrect thing is that doing the wrong thing is to do a task against better knowledge, while doing the incorrect thing comes from a lack of knowledge of what needs to be done. As the SMS learning level progressed, it became clear that the purpose of the investigation was not to investigate the outcome, but to investigate the hazard and how that hazard was carried forward in the operational process.
The final step in the 4-year phase-in period was to implement the quality assurance program and assess the effectiveness of SMS. The struggle with this phase-in period was to determine what makes an effective SMS. Conventional wisdom was that operating with zero accidents or incidents was the prime key performance indicator, and the SMS performance level was assessed against the number of incidents during an established time period. This is still an ongoing assessment process used to establish an effective SMS. Effectiveness is analyzed in graph charts and run charts, where downward trends are good and upward trends are bad. Applying this process provides some useful information, but the analysis is based on opinions and emotions. When opinions and emotions are the foundation for analyses, the trap to fall into is overcontrolling of processes. When there is overcontrolling of processes, the ops-bulletin clipboard gets filled up faster than the paper can be printed. An invaluable advantage of operating with a paper-format SMS is that process overcontrol can easily be identified by viewing the number of paper files. When operating with a flawed system, e.g. flying an airplane without required maintenance, the flight will most likely, by random chance, be successful and safe. Likewise, if a pilot on a precision approach misreads the approach chart minimums (e.g. because of a flawed training system) and lands in zero-zero conditions, the odds are that, by random chance, the flight will be successful. The moral of the story is that a lack of accidents is not a key performance indicator (KPI) of how effective an SMS is.
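The point about random chance can be put into numbers. The sketch below is a simplified illustration under loud assumptions: the per-flight accident probabilities are invented, flights are treated as independent, and the function name is hypothetical. It shows why a period with zero accidents does not distinguish a healthy system from a flawed one.

```python
def prob_no_accident(p_accident_per_flight, flights):
    """Probability of zero accidents over a number of independent flights."""
    return (1.0 - p_accident_per_flight) ** flights

FLIGHTS = 1000  # hypothetical number of flights in the review period

# Hypothetical per-flight accident probabilities (illustrative only).
healthy_system = prob_no_accident(1e-6, FLIGHTS)
flawed_system = prob_no_accident(1e-4, FLIGHTS)

print(f"healthy system, zero accidents: {healthy_system:.3f}")
print(f"flawed system, zero accidents:  {flawed_system:.3f}")
```

In this invented example the flawed system is one hundred times riskier per flight, yet it still has about a nine in ten chance of finishing the period with a clean record.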
The most critical and difficult task in assessing the effectiveness of a safety management system is to rate, or classify, processes into different risk levels, safety critical areas and safety critical functions within these areas. From a non-analytical point of view, all processes in flying must be assessed as high risk, since there is always a possibility that an element will cause an accident. Operating with possibilities is an emotional assessment of effectiveness. There is no evidence that missing one or all items on a landing checklist will cause an accident. The effectiveness of a safety management system cannot be determined without applying statistical process control, since it must be assessed by probabilities, as opposed to possibilities.
The quality assurance program is a component of the safety management system and is therefore an integrated part of an SMS in the same manner as the safety policy, processes for setting goals, measuring the attainment of goals, hazard identification, training, reporting system, process manual, communication to personnel, periodic review of the SMS and review for cause are integrated components of the SMS.
A regulatory requirement of a safety management system is that an audit of the entire quality assurance program be carried out every three years. During the fourth-year phase-in period, the struggle with this requirement was to identify what the quality assurance program actually was and what it should look like. Since the quality assurance program is a component of the SMS, it must be treated the same way as a safety policy, goal-setting processes, or reporting processes. Since none of these components include specific text on what an airline or airport must include to meet the performance requirement, an airline or airport must design its own quality assurance program tailored specifically to its operations. One vital component, and a prerequisite, of a healthy quality assurance program is an operational daily quality control system. This system is not included in the text of the regulations but is a component of the overarching quality assurance system. With the daily quality control program implemented, and just as any small or large grocery store counts the cash at the end of the day, an SMS enterprise must count its quality control processes daily. When the quality control system is counted, an audit of the quality assurance program is possible, and the checkboxes may be downgraded to be incidental to the daily quality control.
Over a period of four years, both airlines and airports had been operating with an SMS without knowing or comprehending its definite purpose. This also caused conflicts and struggles within the industry over how to define the SMS path and apply it to operations. The consensus solution was to ensure that all required checkboxes were completed, and the aviation SMS quality assurance program built its platform on this principle. The checkbox syndrome is still the basis of SMS performance and effectiveness and has become so powerful that it was also implemented in the initial pilot training programs. Checkboxes are necessary for a healthy SMS, but when checkboxes become the primary task, the accountable executive takes their SMS down the wrong path. As I learned from a groundbreaking woman in aviation, who also became one of the first female pilots hired by a major airline, completing all checkboxes has become a more important task than the actual individual flight training.
Operating with a healthy SMS is a simple task when all the groundwork is completed. A healthy SMS does not interfere with or affect the roles, responsibilities or tasks that an airline or airport has assigned to a consultant, director of operations, airside crew, airport manager, SMS manager, airfield maintainers, airside operations personnel, or cloud-based SMS resource systems. A healthy SMS is scaled to the size and complexity of operations by assigning multiple regulatory requirements to one task and by operating with a regulatory element of the SMS and an operational element of the SMS separately, but with both integrated in the SMS analysis.
The single most significant role for a healthy SMS is to accept that the accountable executive is the person responsible for operations and accountable on behalf of the certificate holder for meeting the requirements of the regulations. A healthy SMS looks like an organization where major factors affecting operations are monitored daily. A healthy SMS collects data from multiple sources, such as web cameras, internal and external reports, and publicly available flight critical observations and predictions. A healthy SMS operates with an Above the Fold system, where factors that the risk management officer has assessed as operational priority risk levels for that day are placed above the fold, communicated to the AE, and monitored by the SMS manager.
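An Above the Fold list can be pictured as a daily filter over assessed items. The sketch below is one hypothetical way to model it and is not the author's system: the `RiskItem` fields, the 1 to 5 risk scale, the threshold, and the sample items are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RiskItem:
    description: str
    source: str      # e.g. "web camera", "internal report", "forecast"
    risk_level: int  # hypothetical scale: 1 (low) to 5 (high), set by the RMO

def above_the_fold(items, priority_threshold=4):
    """Return the day's priority items, highest assessed risk level first."""
    priority = [item for item in items if item.risk_level >= priority_threshold]
    return sorted(priority, key=lambda item: item.risk_level, reverse=True)

# Invented items for a single day (illustrative data only).
todays_items = [
    RiskItem("Faded apron marking", "internal report", 2),
    RiskItem("Wildlife seen near runway threshold", "web camera", 4),
    RiskItem("Freezing rain forecast for afternoon arrivals", "forecast", 5),
]

for item in above_the_fold(todays_items):
    print(f"[{item.risk_level}] {item.description} ({item.source})")
```

Items below the threshold are not discarded. They simply stay below the fold for that day and remain in the safety data system for trend analysis.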
A healthy SMS is when an accountable executive accepts that a healthy SMS is a
OffRoadPilots