Saturday, November 25, 2023

What A Healthy SMS Looks Like

By OffRoadPilots

After several years of operating with a safety management system (SMS), an SMS enterprise should be operating with zero regulatory findings. The accountable executive (AE) should have full control over the path their SMS has taken in the past and should have established a vision in their SMS policy of what to expect in the future. There are three regulatory compliance principles for a successful safety management system: the accountable executive is responsible for compliance with all regulations, the certificate holder (CH) is responsible for the quality assurance program (QAP), and the person managing the safety management system (SMS manager) is responsible for monitoring concerns that the aviation industry has about your airport. A healthy SMS also includes a risk management officer (RMO) position. Risk management is what makes a safety management system a healthy SMS within a fluid environment with ever-changing priorities.

The duties of a risk management officer are often assigned to an SMS manager when the CH appoints a person to manage their SMS. The person managing the safety management system shall identify hazards and carry out risk management analyses of those hazards. Other duties assigned to an SMS manager are to maintain a reporting system; investigate, analyze and identify the cause or probable cause of all hazards, incidents and accidents; maintain a safety data system, by either electronic or other means, to monitor and analyze trends in hazards, incidents and accidents; monitor and evaluate the results of corrective actions with respect to hazards, incidents and accidents; monitor the concerns of the civil aviation industry in respect of safety and their perceived effect on your airport; and determine the adequacy of the training required. These responsibilities, which are assigned by the regulations to an SMS manager, are extremely labor intensive, research intensive, data collection intensive and comprehension intensive. There are not enough hours in a 24-hour day for one person to comply with these requirements in addition to carrying out daily risk management analyses.

If anyone for a minute thinks that risk management analyses are not daily and ongoing tasks, their SMS is not only rolling downhill, but it is also rolling down a path to operational failure. SMS itself cannot fail, since all it does is paint a true picture of a failed operation, but operations can fail by ignoring SMS drift and trends. Just as investment professionals must assess risk daily, an airline or airport operator must also assess their risks daily.

Conventional wisdom is that airlines and airports only need to assess the risks for accidents that already have happened. This is a misconception, but that does not imply that the operators who held it were doing the wrong thing. When SMS was first introduced, there was little to no information or literature available on what an aviation safety management system actually is. Airlines and airports required to implement SMS continued on the path they were on, which was to react to incidents and accidents. SMS was not fully understood at that time. A common phrase was that safety is common sense, even though common sense had produced accidents since the beginning of time on December 17, 1903.

Some time ago, I received a practice SMS report, and this is what the report said:

“On 17 DEC 1903 two unlicensed pilots, Orville and Wilbur Wright, made 4 unauthorized flights in an unregistered aircraft. They departed and arrived without communicating with air traffic control or utilizing local CTAF. Their airplane, which had not received its annual inspection by a licensed Aircraft Mechanic, was damaged during their last flight. They failed to report the incident to the TC and TSB, neither of which had been invented yet. Corrective Action: Recommend TC to be invented immediately, and Wilbur and Orville Wright's pilot certificates to be issued then revoked.”

In the safety oversight component, the reactive reporting process was the first operational task for airlines and airports. This task was fully understood, since reactive reporting with corrective actions was how safety was managed before a regulated SMS was implemented. There were several other options available for how to initiate the regulated SMS process, and the consensus was to begin with the reactive reporting process.

When operating with a reactive process system, an incident or accident must first happen before it is reported and analyzed by applying statistical process control (SPC). The first step, to report an accident, was familiar to operators, but the challenge came when the analytical process took place. In the pre-SMS days, the broken piece was fixed and forgotten about, and nobody conducted process analysis. Special cause variation for root cause analysis was unknown, and most operators could not identify the difference between common cause variations and special cause variations. SMS was implemented with several other new definitions and tasks in the reactive system, which immediately caused confrontations. Since the SMS regulations are performance based, the golden rule is that if the regulation does not specifically state what needs to be done, that is the exact reason why an airline or airport operator must do what it takes to meet the intent of the regulations. A common phrase during the SMS implementation was that “the regulations do not say that.”
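The difference between common cause and special cause variation can be sketched with an individuals (XmR) control chart, one common SPC tool. This is a minimal illustration only: the monthly counts are invented, the function names are hypothetical, and the lower limit is clamped at zero on the assumption that the data are counts. Points inside the limits are treated as common cause variation; points outside are candidates for special cause variation.

```python
from statistics import mean


def xmr_limits(values):
    """Return (center, lower, upper) limits for an individuals (XmR) chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    center = mean(values)
    # Average moving range between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = mean(moving_ranges)
    # 2.66 is the standard XmR constant (3 / d2, with d2 = 1.128 for n = 2).
    spread = 2.66 * mr_bar
    # Assumption: the data are counts, so the lower limit cannot go below zero.
    return center, max(0.0, center - spread), center + spread


def special_cause_points(baseline, new_values):
    """Return (index, value) pairs in new_values outside the baseline limits."""
    _, lower, upper = xmr_limits(baseline)
    return [(i, v) for i, v in enumerate(new_values) if v < lower or v > upper]


# Hypothetical monthly counts of reported hazards (not real data).
baseline = [4, 5, 3, 6, 4, 5, 4, 6, 5, 4]
new_months = [5, 3, 14, 4]

print(special_cause_points(baseline, new_months))  # [(2, 14)]
```

Only the month with 14 reports falls outside the limits. The other months move up and down within the normal noise of the process and would not, on this evidence alone, justify a root cause analysis.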

The next step of the safety oversight element was to phase in the proactive process. There was still confusion among airlines and airport operators, including the Regulator, about what defined an SMS process. Since the phase-in was a proactive task, the consensus became to identify hazards and do something about each hazard before it became a bigger problem or led to an incident. Operators dangled carrots, or bribes, for employees to report hazards. Whoever reported the most hazards in a month would receive a gift. Gifts, or bribes, are acceptable when initiating a process in order to learn the process itself, but within a fully operational SMS, bribes, or carrots, do not paint a true picture of the health of an SMS.

The Heinrich Pyramid, or Heinrich's Law, was used as justification for acting on minor hazards immediately, since they would, unquestionably, lead to accidents. Heinrich's Law is based on probability and assumes that the number of accidents is inversely proportional to the severity of those accidents. It states that in a workplace, for every accident that causes a major injury, there are 29 accidents that cause minor injuries and 300 accidents that cause no injuries. It leads to the conclusion that minimizing the number of minor incidents will lead to a reduction in major accidents, which is not necessarily the case. Heinrich's Law is applicable to an overcontrolled environment with common cause variations only, where special cause variations are excluded. Eventually, several airline and airport operators put Heinrich's Law aside and referenced this principle as guidance and instruction material only, rather than as a law written in stone.
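The arithmetic behind the 1:29:300 ratio is simple enough to write down. The sketch below only restates the ratio as shares and shows the kind of extrapolation it invites; the function names are hypothetical, and the extrapolated figure is exactly the sort of number that should be treated as guidance, not as a prediction.

```python
# Heinrich's ratio as stated above: 1 major injury, 29 minor injuries, 300 no-injury accidents.
HEINRICH_RATIO = {"major_injury": 1, "minor_injury": 29, "no_injury": 300}


def heinrich_shares(ratio=HEINRICH_RATIO):
    """Return each category's share of all accidents under the ratio."""
    total = sum(ratio.values())
    return {category: count / total for category, count in ratio.items()}


def implied_major_injuries(no_injury_count, ratio=HEINRICH_RATIO):
    """Major injuries the ratio would imply for a given no-injury accident count."""
    return no_injury_count * ratio["major_injury"] / ratio["no_injury"]


for category, share in heinrich_shares().items():
    print(f"{category}: {share:.1%}")

# 600 no-injury accidents imply 2.0 major injuries if, and only if, the ratio holds.
print(implied_major_injuries(600))
```

The calculation assumes a stable process with a fixed ratio. A single special cause event is outside what the ratio describes, which is why operators ended up treating it as guidance material.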

After the reactive and proactive process systems were phased in, the next step in the SMS was to implement investigation and analysis. The first constraint for this phase-in period was to determine what to investigate, and the consensus was that it made sense to investigate accidents and incidents. After all, this is what TSB did, so operators assumed they were expected to do the same. Accidents and incidents investigated by operators were not limited by the severity of the outcome; anything that failed was placed in the investigation hat. Upon completion of an investigation, an operations bulletin was issued for personnel to read and accept, and after just a few months, the paper clipboard was overloaded with bulletins. An airport would conduct a root cause analysis and investigate a burnt-out runway edge light, and an airline would do the same for a burnt-out aircraft taxi light. During the phase-in period, SMS personnel had limited training to comprehend the safety management system. Investigations and analyses of incidents that were done at that time were not the wrong thing to do, since they were common sense based on the knowledge available then. Investigating the outcome itself was the incorrect thing to do. The difference between doing the wrong thing and the incorrect thing is that doing the wrong thing is to do a task against better knowledge, while doing the incorrect thing comes from a lack of knowledge of what needs to be done. As the SMS learning level progressed, it became clear that the purpose of an investigation was not to investigate the outcome, but to investigate the hazard and how the hazard was carried forward in the operational process.

The final step in the 4-year phase-in period was to implement the quality assurance program and assess the effectiveness of the SMS. The struggle with this phase-in period was to determine what makes an effective SMS. Conventional wisdom was that operating with zero accidents or incidents was the prime key performance indicator, and the SMS performance level was assessed by the number of incidents during an established time period. This is still an ongoing assessment process used to establish an effective SMS. Effectiveness is analyzed in graph-charts and run-charts, where downward trends are good and upward trends are bad. Applying this process provides some useful information, but the analysis is based on opinions and emotions. When opinions and emotions are the foundation for analyses, the trap to fall into is overcontrolling of processes. When there is overcontrolling of processes, the ops-bulletin clipboard gets filled up faster than the paper can be printed. An invaluable benefit of operating with a paper-format SMS is that process overcontrol can easily be identified by viewing the number of paper files. When operating with a flawed system, e.g. flying an airplane without required maintenance, the odds are that, by random chance, the flight will be successful and safe. If a pilot on a precision approach misreads the approach chart minimums, e.g. because of a flawed training system, and lands in zero-zero, the odds are that, by random chance, the flight will be successful. The moral of the story is that a lack of accidents is not a key performance indicator (KPI) of how effective an SMS is.
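The random-chance argument can be put into numbers with one line of probability. The sketch below assumes, purely for illustration, that each flight in a flawed system has the same small and independent probability of ending in an accident; the function name and the figures are hypothetical, not measured data.

```python
def prob_no_accident(p_accident_per_flight, flights):
    """Probability of zero accidents over a number of independent flights."""
    if not 0.0 <= p_accident_per_flight <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if flights < 0:
        raise ValueError("flights must be non-negative")
    # Each flight is accident-free with probability (1 - p); flights are assumed independent.
    return (1.0 - p_accident_per_flight) ** flights


# Hypothetical: a flawed process with a 1-in-1000 chance of an accident per flight.
print(f"{prob_no_accident(0.001, 1):.3f}")    # a single flight
print(f"{prob_no_accident(0.001, 500):.3f}")  # 500 flights in a row
```

Under these assumptions, a single flight is almost certain to end safely, and even 500 flights in a row have roughly a 61 percent chance of producing a clean record. A clean record therefore says very little about whether the underlying process is flawed.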

The most critical and most difficult task in assessing the effectiveness of a safety management system is to rate, or classify, processes by risk level, and to identify safety critical areas and the safety critical functions within these areas. From a non-analytical point of view, all processes in flying must be assessed as high-risk, since there is always a possibility for an element to cause an accident. Operating with possibilities is an emotional assessment of effectiveness. There is no evidence that missing one or all items on a landing checklist will cause an accident. The effectiveness of a safety management system cannot be determined without applying statistical process control, since it must be assessed by probabilities, as opposed to possibilities.

The quality assurance program is a component of the safety management system and is therefore an integrated part of an SMS, in the same manner as the safety policy, processes for setting goals, measuring the attainment of goals, hazard identification, training, reporting system, process manual, communication to personnel, periodic review of the SMS and review for cause are integrated components of the SMS.

A regulatory requirement of a safety management system is to audit the entire quality assurance program every three years. During the fourth year of the phase-in period, the struggle with this requirement was to identify what the quality assurance program actually was and what it should look like. Since the quality assurance program is a component of the SMS, it must be treated the same way as a safety policy, goal-setting processes, or reporting processes. Since none of these components include specific text on what an airline or airport must include to meet the performance requirement, an airline or airport must design its own quality assurance program tailored specifically to its operations. One vital component, and a prerequisite of a healthy quality assurance program, is an operational daily quality control system. This system is not included in the text of the regulations but is a component of the overarching quality assurance system. With the daily quality control program implemented, and just as any small or large grocery store counts the cash at the end of the day, an SMS enterprise must count its quality control processes daily. When the quality control system is counted, an audit of the quality assurance program is possible, and the checkboxes may be downgraded to be incidental to the daily quality control.

Over a period of four years, both airlines and airports had been operating with an SMS without knowing or comprehending its definite purpose. This also caused conflicts and struggles within the industry to define the SMS path and how to apply it to operations. A consensus for a solution was to ensure that all required checkboxes were completed, and the aviation SMS quality assurance program built its platform on this principle. The checkbox syndrome is still the basis of SMS performance and effectiveness and has become so powerful that it was also implemented in initial pilot training programs. Checkboxes are necessary for a healthy SMS, but when checkboxes become the primary task, the accountable executive takes their SMS down the wrong path. As I learned from a groundbreaking woman in aviation, who also became one of the first female pilots hired by a major airline, completing all checkboxes has become a more important task than the actual individual flight training.


Operating with a healthy SMS is a simple task when all the groundwork is completed. A healthy SMS does not interfere with or affect the roles, responsibilities or assigned tasks that an airline or airport has assigned to a consultant, director of operations, airside crew, airport manager, SMS manager, airfield maintainers, airside operations personnel, or cloud-based SMS resource systems. A healthy SMS is scaled to the size and complexity of operations by assigning multiple regulatory requirements to one task and by operating a regulatory element of the SMS and an operational element of the SMS separately, but with both integrated in the SMS analysis.


The single most significant step toward a healthy SMS is to accept that the accountable executive is the person who is responsible for complying with the regulatory requirement to be responsible for operations, and to be accountable on behalf of the certificate holder for meeting the requirements of the regulations. A healthy SMS looks like an organization where major factors affecting operations are monitored daily. A healthy SMS collects data from multiple different sources, such as web cameras, internal and external reports, and publicly available flight critical observations and predictions. A healthy SMS operates with an Above the Fold system, where factors that the risk management officer has assessed as operational priority risk levels for that day are placed above the fold, communicated to the AE, and monitored by the SMS manager.
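One way to picture an Above the Fold list is as a daily filter and sort over the factors the risk management officer has assessed. The sketch below is an assumption about how such a list could be represented, not a description of any particular system: the `Factor` fields, the 1 to 5 risk scale, the threshold, and the example entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Factor:
    name: str
    source: str      # e.g. web camera, internal report, public forecast
    risk_score: int  # hypothetical scale: 1 (low) to 5 (high), set by the RMO


def above_the_fold(factors, threshold=4):
    """Return today's priority factors, highest assessed risk first."""
    prioritized = [f for f in factors if f.risk_score >= threshold]
    # sorted() is stable, so factors with equal scores keep their input order.
    return sorted(prioritized, key=lambda f: f.risk_score, reverse=True)


# Hypothetical daily assessment (not real data).
todays_factors = [
    Factor("Faded apron markings", "internal report", 2),
    Factor("Wildlife near runway threshold", "web camera", 4),
    Factor("Freezing rain forecast", "public forecast", 5),
]

for factor in above_the_fold(todays_factors):
    print(f"{factor.risk_score}: {factor.name} ({factor.source})")
```

Everything below the threshold stays in the SMS data and is still monitored; it is simply not what gets communicated to the AE as that day's operational priority.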

A healthy SMS is when an accountable executive accepts that a healthy SMS is a maturity system.

OffRoadPilots


