Saturday, November 25, 2023

What A Healthy SMS Looks Like


By OffRoadPilots

After several years of operating with a safety management system (SMS), an SMS enterprise should be operating with zero regulatory findings. The accountable executive (AE) should have full control over the path their SMS has taken in the past and should have established a vision in their SMS policy of what to expect in the future. There are three regulatory compliance principles for a successful safety management system: the accountable executive is responsible for compliance with all regulations; the certificate holder (CH) is responsible for the quality assurance program (QAP); and the person managing the safety management system (SMS manager) is responsible for monitoring concerns that the aviation industry has about your airport. A healthy SMS includes a risk management officer (RMO) position. Risk management is what makes a safety management system a healthy SMS within a fluid environment and ever-changing priorities.

The duties of a risk management officer are often assigned to an SMS manager when the CH appoints a person to manage their SMS. The person managing the safety management system shall identify hazards and carry out risk management analyses of those hazards. Other duties assigned to an SMS manager are to maintain a reporting system; investigate, analyze and identify the cause or probable cause of all hazards, incidents and accidents; maintain a safety data system, by either electronic or other means, to monitor and analyze trends in hazards, incidents and accidents; monitor and evaluate the results of corrective actions with respect to hazards, incidents and accidents; monitor the concerns of the civil aviation industry in respect of safety and their perceived effect on your airport; and determine the adequacy of the training required. These responsibilities, which the regulations assign to an SMS manager, are extremely labor intensive, research intensive, data collection intensive and comprehension intensive. There are not enough hours in a 24-hour day for one person to comply with these requirements in addition to carrying out daily risk management analyses.

If anyone for a minute thought that risk management analyses are not a daily and ongoing task, their SMS is not only rolling downhill, but also rolling down a path to operational failure. SMS itself cannot fail, since all it does is paint a true picture of a failed operation, but operations can fail by ignoring SMS drift and trends. Just as investment professionals must assess risk daily, an airline or airport operator must also assess their risks daily.

Conventional wisdom is that airlines and airports only need to assess the risks for accidents that have already happened. This is also a misconception, but it does not imply that it is wrong or incorrect. When SMS was first introduced, there was little to no information or literature available on what an aviation safety management system actually is. Airlines and airports required to implement SMS continued on the path they were on, which was to respond reactively to incidents and accidents. SMS was not fully understood at that time. A common phrase was that safety is common sense, even though common sense had produced accidents since the beginning of time on December 17, 1903.

Some time ago, I received a practice SMS report, and this is what the report said:

“On 17 DEC 1903 two unlicensed pilots, Orville and Wilbur Wright, made 4 unauthorized flights in an unregistered aircraft. They departed and arrived without communicating with air traffic control or utilizing local CTAF. Their airplane, which had not received its annual inspection by a licensed Aircraft Mechanic, was damaged during their last flight. They failed to report the incident to the TC and TSB, neither of which had been invented yet. Corrective Action: Recommend TC to be invented immediately, and Wilbur and Orville Wright's pilot certificates to be issued then revoked.”

In the safety oversight component, the reactive reporting process was the first operational task for airlines and airports. This task was fully understood, since reactive reporting with corrective actions was how safety was managed prior to the implementation of a regulated SMS. There were several other options available for how to initiate the regulated SMS process, and the consensus was to begin with the reactive reporting process.

When operating with a reactive process system, an incident or accident must first happen before it is reported and analyzed by applying statistical process control (SPC). The first step, reporting an accident, was familiar to operators, but the challenge came when the analytical process took place. In the pre-SMS days, the broken piece was fixed and forgotten about, and nobody conducted process analysis. Special cause variation for root cause analysis was unknown, and most operators could not identify the difference between common cause variations and special cause variations. SMS was implemented with several other new definitions and tasks in the reactive system, which immediately caused confrontations. Since the SMS regulations are performance based, the golden rule is that if the regulation does not specifically state what needs to be done, that is the exact reason why an airline or airport operator must do what it takes to meet the intent of the regulations. A common phrase during the SMS implementation was that “the regulations do not say that.”

The next step of the safety oversight element was to phase in the proactive process. There was still confusion among airlines and airport operators, including the Regulator, about what defined an SMS process. Since the phase-in was a proactive task, the consensus became to identify hazards and do something about each hazard before it became a bigger problem or led to an incident. Operators dangled carrots, or bribes, for employees to report hazards. Whoever reported the most hazards in a month would receive a gift. Gifts, or bribes, are acceptable when initiating a process in order to learn the process itself, but within a fully operational SMS, bribes, or carrots, do not paint a true picture of the health of an SMS.

The Heinrich Pyramid, or Heinrich's Law, was used as justification to act on minor hazards immediately, since they would, unquestionably, lead to accidents. Heinrich's Law is based on probability and assumes that the number of accidents is inversely proportional to the severity of those accidents. It leads to the conclusion that minimizing the number of minor incidents will lead to a reduction in major accidents, which is not necessarily the case. According to the law, in a workplace, for every accident that causes a major injury, there are 29 accidents that cause minor injuries and 300 accidents that cause no injuries. Heinrich's Law is applicable to an overcontrolled environment with common cause variations only, and where special cause variations are excluded. Eventually, several airline and airport operators put Heinrich's Law aside and referenced this principle as guidance and instruction material only, rather than as a law written in stone.
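
The 1:29:300 ratio quoted above is simple arithmetic, and a minimal sketch can make its assumption visible. The function names below are hypothetical and chosen only for illustration; the sketch shows what the ratio would imply if it held exactly, which, as noted above, is not necessarily the case in operations.

```python
from fractions import Fraction

# Heinrich's ratio as quoted in the text:
# 1 major-injury accident : 29 minor-injury accidents : 300 no-injury accidents.
HEINRICH_RATIO = {"major": 1, "minor": 29, "no_injury": 300}


def heinrich_shares(ratio=HEINRICH_RATIO):
    """Return each category's share of all accidents under the ratio."""
    total = sum(ratio.values())
    return {name: Fraction(count, total) for name, count in ratio.items()}


def implied_major(no_injury_count, ratio=HEINRICH_RATIO):
    """Major-injury accidents the ratio would imply for a no-injury count.

    This is a proportion only. It is an assumption of the ratio, not a
    demonstrated cause-and-effect link between minor and major events.
    """
    return no_injury_count * ratio["major"] / ratio["no_injury"]


if __name__ == "__main__":
    print(heinrich_shares()["major"])  # share of major-injury accidents
    print(implied_major(600))          # implied major accidents for 600 no-injury events
```

Under the ratio, a major injury is 1 of every 330 accidents. Halving the no-injury count halves the implied major count only because the ratio is assumed fixed; a special cause variation sits outside that assumption.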

After the reactive and proactive process systems were phased in, the next step in the SMS was to implement investigation and analysis. The first constraint for this phase-in period was to determine what to investigate, and the consensus was that it made sense to investigate accidents and incidents. After all, this is what the TSB did, so operators assumed they were expected to do the same. Accidents and incidents investigated by operators were not limited by the severity of the outcome; anything that failed was placed in the investigation hat. Upon completion of an investigation, an operations bulletin was issued for personnel to read and accept, and after just a few months, the paper clipboard was overloaded with bulletins. An airport would conduct a root cause analysis and investigate a burnt-out runway edge light, and an airline would do the same for a burnt-out aircraft taxi light. During the phase-in period, SMS personnel had limited training to comprehend the safety management system. Investigations and analyses of incidents that were done at that time were not the wrong thing to do, since they were common sense based on the knowledge of the day. Investigating the outcome itself was the incorrect thing to do. The difference between doing the wrong thing and the incorrect thing is that doing the wrong thing is to do a task against better knowledge, while doing the incorrect thing comes from a lack of knowledge of what needs to be done. As the SMS learning level progressed, it became clear that the purpose of the investigation was not to investigate the outcome, but to investigate the hazard and how a hazard was carried forward in the operational process.

The final step in the 4-year phase-in period was to implement the quality assurance program and assess the effectiveness of SMS. The struggle with this phase-in period was to determine what makes an effective SMS. Conventional wisdom was that operating with zero accidents or incidents was the prime key performance indicator, and the SMS performance level was assessed against the number of incidents during an established time period. This is still an ongoing assessment process used to establish an effective SMS. Effectiveness is analyzed in graph charts and run charts, where downward trends are good and upward trends are bad. Applying this process provides some useful information, but the analysis is based on opinions and emotions. When opinions and emotions are the foundation for analyses, the trap to fall into is overcontrolling of processes. When there is overcontrolling of processes, the ops-bulletin clipboard gets filled up faster than the paper can be printed. An invaluable benefit of operating with a paper-format SMS is that process overcontrol can easily be identified by viewing the number of paper files. When operating with a flawed system, e.g. flying an airplane without required maintenance, the odds are that the flight will be successful and safe by random chance. If a pilot on a precision approach misreads the approach chart minimums, e.g. because of a flawed training system, and lands in zero-zero, the odds are that the flight will be successful by random chance. The moral of the story is that a lack of accidents is not a key performance indicator (KPI) of how effective an SMS is.

The most critical and difficult task in assessing the effectiveness of a safety management system is to rate, or classify, processes into different risk levels, safety critical areas, and safety critical functions within these areas. From a non-analytical point of view, all processes in flying must be assessed as high-risk levels, since there is always a possibility that an element will cause an accident. Operating with possibilities is an emotional assessment of effectiveness. There is no evidence that missing one or all items on a landing checklist will cause an accident. The effectiveness of a safety management system cannot be determined without applying statistical process control, since it must be assessed by probabilities, as opposed to possibilities.
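
One common statistical process control tool for separating common cause variation from special cause variation is an individuals (XmR) control chart. The sketch below is a minimal illustration, not a prescribed SMS method: the function names and the daily report counts are hypothetical, and the constant 2.66 is the standard XmR factor applied to the average moving range.

```python
from statistics import mean


def xmr_limits(baseline):
    """Return (lower, center, upper) limits of an individuals (XmR) chart.

    Limits are the baseline mean plus or minus 2.66 times the average
    moving range between consecutive observations.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline observations")
    center = mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    spread = 2.66 * mean(moving_ranges)
    return center - spread, center, center + spread


def special_cause_points(values, baseline):
    """Indexes of values outside the limits computed from the baseline.

    Points inside the limits are treated as common cause variation;
    points outside are signals of possible special cause variation.
    """
    lower, _, upper = xmr_limits(baseline)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


if __name__ == "__main__":
    # Hypothetical daily counts of hazard reports, for illustration only.
    baseline = [4, 5, 3, 6, 4, 5, 4, 5, 3, 5]
    new_days = [4, 6, 12, 5]
    print(xmr_limits(baseline))
    print(special_cause_points(new_days, baseline))
```

With this kind of chart, a day-to-day rise from 4 to 6 reports is not acted on by itself, which avoids overcontrol, while a count outside the limits is a signal worth investigating.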

The quality assurance program is a component of the safety management system and is therefore an integrated part of an SMS in the same manner as the safety policy, processes for setting goals, measuring the attainment of goals, hazard identification, training, reporting system, process manual, communication to personnel, periodic review of the SMS and review for cause are integrated components of the SMS.

A regulatory requirement of a safety management system is that an audit of the entire quality assurance program be carried out every three years. During the fourth-year phase-in period, the struggle with this requirement was to identify what the quality assurance program actually was and what it should look like. Since the quality assurance program is a component of the SMS, it must be treated the same way as a safety policy, goal-setting processes, or reporting processes. Since none of these components include specific text on what an airline or airport must include to meet the performance requirement, an airline or airport must design its own quality assurance program tailored specifically to its operations. One vital component, and prerequisite, of a healthy quality assurance program is an operational daily quality control system. This system is not included in the text of the regulations but is a component of the overarching quality assurance system. With the daily quality control program implemented, and just as any small or large grocery store counts the cash at the end of the day, an SMS enterprise must count its quality control processes daily. When the quality control system is counted, an audit of the quality assurance program is possible, and the checkboxes may be downgraded to be incidental to the daily quality control.

Over a period of four years, both airlines and airports had been operating with an SMS without knowing or comprehending its definite purpose. This also caused conflicts and struggles within the industry to define the SMS path and how to apply it to operations. A consensus for a solution was to ensure that all required checkboxes were completed, and the aviation SMS quality assurance program built its platform on this principle. The checkbox syndrome is still the basis of SMS performance and effectiveness and has become so powerful that it was also implemented in the initial pilot training programs. Checkboxes are necessary for a healthy SMS, but when checkboxes become the primary task, the accountable executive takes their SMS down the wrong path. As I learned from a groundbreaking woman in aviation, who also became one of the first female pilots hired by a major airline, completing all checkboxes has become a more important task than the actual individual flight training.


Operating with a healthy SMS is a simple task when all the groundwork is completed. A healthy SMS does not interfere with or affect roles, responsibilities or assigned tasks that an airline or airport has assigned to a consultant, director of operations, airside crew, airport manager, SMS manager, airfield maintainers, airside operations personnel, or cloud-based SMS resource systems. A healthy SMS is scaled to the size and complexity of operations by assigning multiple regulatory requirements to one task and by operating with a regulatory element of the SMS and an operational element of the SMS separately, but with both integrated in the SMS analysis.


The single most significant role for a healthy SMS is to accept that the accountable executive is the person who is responsible for complying with the regulatory requirement to be responsible for operations, and to be accountable on behalf of the certificate holder for meeting the requirements of the regulations. A healthy SMS looks like an organization where major factors affecting operations are monitored daily. A healthy SMS collects data from multiple different sources, such as web cameras, internal and external reports, and publicly available flight critical observations and predictions. A healthy SMS operates with an Above the Fold system, where factors that the risk management officer has assessed as operational priority risk levels for that day are placed above the fold, communicated to the AE, and monitored by the SMS manager.
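
The Above the Fold idea can be sketched as a simple daily filter. Everything in the sketch below is an assumption made for illustration: the `RiskFactor` fields, the 1-to-5 risk scale, the threshold, and the sample factors are hypothetical and are not taken from any regulation or from a specific SMS product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskFactor:
    name: str
    level: int   # hypothetical scale: 1 (low) to 5 (high), assessed by the RMO today
    source: str  # e.g. web camera, internal report, public forecast


def above_the_fold(factors, threshold=4):
    """Return today's priority factors, highest assessed level first.

    Factors at or above the (assumed) threshold are the ones placed
    above the fold, communicated to the AE, and monitored by the SMS manager.
    """
    priority = [f for f in factors if f.level >= threshold]
    return sorted(priority, key=lambda f: f.level, reverse=True)


if __name__ == "__main__":
    today = [
        RiskFactor("Freezing rain forecast", 5, "public forecast"),
        RiskFactor("Taxiway edge light out", 2, "internal report"),
        RiskFactor("Wildlife near threshold", 4, "web camera"),
    ]
    for factor in above_the_fold(today):
        print(factor.level, factor.name, "-", factor.source)
```

The design choice is that the list is rebuilt every day from that day's assessment, so an item that was above the fold yesterday does not stay there by default.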

A healthy SMS is when an accountable executive accepts that a healthy SMS is a maturity system.

OffRoadPilots



Saturday, November 11, 2023

The Devil Is In The Details


By OffRoadPilots

The Titanic disaster was caused by a detail in the design of the watertight compartments: the bulkheads separating the compartments extended only a few feet above the water line, so water could pour from one compartment into another, especially if the ship began to list or pitch forward.

The Alexander Kielland disaster was caused by a fatigue crack in one of its six bracings, which connected the collapsed D-leg to the rest of the rig. This was traced to a small 6mm fillet weld which joined a non-load-bearing flange plate to this D-6 bracing.


The Sioux City, Iowa, air disaster was caused by a catastrophic failure of the aircraft's tail-mounted engine due to an unnoticed manufacturing defect in the engine's fan disk, which resulted in the loss of many flight controls. None of these details were identified as issues of any concern, but they caused some of the most horrific and catastrophic events in their own areas of history. The Titanic was built to be unsinkable; a deep-sea diver once told me that working conditions for underwater welders were terrible; and it was known ten years prior to the disk failure that these disks had flaws and could fail.

Details may be known by management, but they are often dismissed, brushed aside as unimportant, or seen as irrelevant to the issue. Details are important not only in operations, but also for regulatory and standard compliance.

Both airlines and airports have to maintain compliance with a comprehensive safety management system (SMS). In concept, an SMS is simple, but unless details are identified within a system analysis, the system becomes complex and often unmanageable. A manageable SMS is based on daily quality control and established processes, with each operational task linked to multiple compliance requirements. When processes are established, an SMS has been simplified and is manageable, with the primary task being to monitor for deviations from the assigned path. The more attention is paid to details in an SMS, the simpler and easier to use it becomes. When details are known, it is easy to see where the pieces fit into the whole picture, as opposed to fitting a large piece into a detailed issue. When SMS is forced, it becomes difficult and complex to apply in operations. A symptom of an SMS that is too complex or unmanageable for operations is therefore an SMS that is overloaded, or overcontrolled, and where safety information is a tool to justify its existence.

Paying attention to details is part of the regulatory requirement for a certificate holder to adapt their safety management system to the size, nature and complexity of the operations, activities, hazards, and risks associated with the operations. Adapting to size and complexity requires detailed knowledge of their operations. An operator with only high-level knowledge and an overview of their systems cannot apply operationally targeted processes that suit their size of operations. A certificate holder is required to appoint an accountable executive (AE) to be responsible for operations or activities authorized under the certificate and accountable on behalf of the certificate holder for meeting the requirements of the regulations. This requirement does not imply that an AE only needs to be familiar with, or only have partial knowledge of, the regulations; it is a requirement for the AE to have detailed knowledge of the regulations in order to detect deviations from established paths and non-conforming processes. Conventional wisdom is that an AE only needs to be responsible for financial and human resources, which is the job description of their position, while knowledge of the regulations is the requirement for accepting the role. SMS is a businesslike approach to safety, and no business owner, corporate director or airport authority would hire an accountant or lawyer who has limited knowledge of regulatory requirements and their areas of responsibility. However, they continue to hire accountable executives who do not have the knowledge base to fulfil their obligations to the regulations.

An airport operator is obligated to review each issue of each aeronautical information publication on receipt and, immediately after a review, notify the Regulator of any inaccurate information. This requires detailed knowledge of how to obtain a copy of the aeropub, how often a new revision is issued, and on what date it is published. The operator needs detailed knowledge of what information pertains to their operations, what action to take in addition to reporting any errors to the Regulator, and how their internal SMS processes capture these requirements. An operator must design, develop, and submit to the Regulator an operations plan for airside construction, and operate with airside operations plans for maintenance and repairs. Operations plans must include details of operations for processes to conform to regulatory requirements.

The person managing the safety management system, or SMS manager, is required to monitor the concerns of the civil aviation industry in respect of safety and their perceived effect on the certificate holder, and to determine the adequacy of the training for personnel. In-depth and detailed knowledge of their own operations is required for an SMS manager to monitor the aviation industry in respect of safety and how it views different independent operators. An airport operator who frequently closes their runways for maintenance and repairs may be viewed as unsafe, since this particular airport does not have project plans in place for airside management and for runways to remain open for business. An airport operator may choose to close a runway between 2 AM and 4 AM for daily maintenance and inspection, which is different from publishing NOTAMs for unexpected maintenance requirements during hours of operation. An SMS manager is required to determine training requirements, and without the details of the expected outcome of the training this function cannot be performed.

Comprehension of details in operations, the text of regulations, and the intent of performance-based regulations is required for an operator to design processes that conform to regulatory requirements.

Generally speaking, a regulation is applicable to any airline or airport, unless there are special provisions for size and complexity. One such regulation is the airport winter maintenance regulation, where one part is applicable to airports serving turbojet aircraft, and the other part is applicable to airports serving propeller aircraft and on-demand operations only. The winter operations requirement for airports serving propeller aircraft is to consult a representative sample of the air operators that use the airport about the intended level of winter maintenance and to remove sand from movement areas when it is no longer needed. Additional requirements for airports serving turbojet aircraft are that they have a winter operations plan, snow removal priority areas, pre-threshold maximum snow accumulation, use of ice control and chemicals, friction measurement and movement area inspection reports. The detail of this regulatory requirement is not in this regulation itself, but in the requirement for an airport certificate. An applicant for an airport certificate must maintain verification records that they can operate with a safety management system, which is a requirement for non-certified aerodrome operations prior to the issuance of an airport certificate. When a certified airport operator elects to operate as an airport serving propeller aircraft only, they voluntarily give up their SMS records for operations serving turbojet aircraft. Should an airline operator wish to operate turbojet aircraft out of this airport, they must delay their operations until the airport can verify its capability to operate with an SMS supporting turbojet aircraft. The detail of this requirement is to connect the link between two regulations in order to conduct a system analysis of future operational restrictions. With the implementation of the safety management system, any operational regulation must be linked to the SMS regulations. This is a detail that an AE must be aware of, and an AE must be able to distinguish between multiple regulations and how they are linked to the same SMS regulation.

An airport is required to maintain a runway strip, an area beyond the sides of the runway pavement and beyond both thresholds, that is free of aeronautical obstructions. This includes natural obstructions and other encroachments such as riverbanks. One airport decided, without consultation, to fill in a riverbend to widen their runway strip.

After the construction application was submitted, the community responded with opposition to this initiative. The airport boundary needed to be expanded by filling in the river and bird wetlands. In practice, this means that birds, wildlife, and plants are forced to leave their habitats. In the application, the airport manager wrote the following: “Regarding natural diversity: The airport does not have the professional expertise to assess any special impacts on natural diversity. Our experience from operating the airports over several years is that there is very limited animal and bird life in that area. We assume that this is due to the presence of the lake on the opposite side of the runway, which has a bustling wildlife and bird activities, and which therefore seems to be more attractive. Nor has any extensive movement of wildlife or birds been observed between this lake and the riverbend, which is probably due to the activity on the runway. In addition, the airport has limited data entries in their bird and wildlife register.” The airport manager states in the application that the airport does not have the professional expertise to assess the impact on birds and wildlife, and that because airport operations scare birds and wildlife away, their presence does not exist as a justification to stop the construction project. This application is in non-compliance with the SMS requirement to conduct a system analysis of projects and comprehend all details included. An accountable executive needs to be able to comprehend the details and the impact on the community by reading their own submission. There is also a regulatory requirement for airports planning extensions to consult with their neighbors, stakeholders, and other interested parties.

When the Regulator conducts an inspection, and since the regulations are performance based, they will inspect what is not written in the text of the regulations. An inspection includes the regulation itself, how it is linked to other regulations, and how an SMS enterprise maintains a path to monitor processes. An AE needs the knowledge to link an airport's publishing of a NOTAM (Notice to Airmen, now redefined as Notice to Air Missions) with the ability of the captain of an aircraft to assess an airport for suitability. The intent of an airport certificate, and a certificate requirement, is for an operator to operate an aerodrome as an airport. This requirement implies that the airport meets certification standards 24/7. A published NOTAM does not change that requirement but is a tool for an airport operator to fix or repair an unexpected issue within a short timeframe.

The devil is in these details and in other safety or regulatory details. For all practical purposes, what this means is that an SMS enterprise does not have any justifiable cause to operate outside of the intent of the regulations, or to exempt itself from standards or its own policies as it pleases. Most importantly, it is the responsibility of an accountable executive to know what this entails to ensure ongoing compliance.

OffRoadPilots
