Saturday, November 26, 2022

Accepting or Rejecting Risks


By OffRoadPilots

Accepting or rejecting risks is a fundamental principle in a successful safety management system (SMS). A person managing the safety management system is expected to maintain a process for identifying hazards to aviation safety and for evaluating and managing the associated risks and ensuring that personnel are trained and competent to perform their duties as they apply to the safety management system. This includes training for both the accountable executive and SMS manager, in addition to other airport and airline operations personnel.

A level of risk is an inherent element of aviation safety, and there are several types of risks to consider when accepting or rejecting risks. One type of risk may take precedence over another type even if it is not directly associated with operations. Risk control strategies go beyond accepting or rejecting a risk; their purpose is to justify control actions based on defined criteria. There are five categories of risks. The total risk is the sum of identified and unidentified risks. Identified risks are risks which have been determined through various analysis techniques. A task for the SMS manager is to identify all possible risks. Unidentified risks are risks not yet identified. Some unidentified risks are identified by occurrences, and some risks will never be known. Unacceptable risks are risks that are beyond the limit of what is acceptable to an SMS enterprise. Unacceptable risks may be controlled or eliminated. Acceptable risks are identified risks that are allowed by the SMS enterprise to persist without further engineering actions. Residual risks are the left-over risks after all other options have been fully explored. The residual risk is the sum of acceptable risks and unidentified risks and is integrated in airport or airline operations.


Conventional wisdom is that the safety management system is about safety, while the fact is that the SMS is about processes, and how things are done. The expected output of these processes is to eliminate harm and create prosperity. When decisions are based on emotional safety principles, rather than on data and facts, the end result may be that risk levels become unknown or unmanageable.

The accountable executive (AE) is the final decision-maker to accept or reject risks, system analyses or predictive SMS operations plans. Accepting or rejecting a risk is not an authority to deviate from any of the safety risk management (SRM) processes, or to base the acceptance or rejection on common sense and prior practices. Several practices which were acceptable for an airport operator in the past are unacceptable today within an SMS environment. Airport operators have a responsibility for their airport operations to be compatible with aircraft operations, which is the purpose of an airport. In the past, a NOTAM that a runway was covered with ice or snow contaminants was a sufficient action. However, today within an SMS-world, an airport operator must comply with the airport standards, which include a friction index requirement, or close the runway. An AE may be the final authority, but when risk acceptances are based on prior practices, both safety in operations and certificate compliance are jeopardized. Risk acceptance based on prior practices, with the justification that it was done before without incidents, doesn’t hold water. In addition, data from prior practices applied to hazard classifications and risks may be outdated.


An easy trap for an AE to fall into is to believe that they have the authority to change a risk level by the stroke of a pen. Nothing could be further from the truth. When an AE wishes to change a risk level, they must follow established processes for root cause analysis, risk assessment and system analysis, which include a signature page stating that they rejected the risk level advice from the SMS manager. In most organizations, an AE is the President of the company and the business management expert. An AE is not the data analysis expert but is still the person with final authority to change a risk level. Should an AE reject a recommended risk level, operations affected by the hazard in question are paused until an acceptable risk decision is made. On the other hand, an accountable executive has the prerogative to adjust risk decisions after reviewing other apparent risks, or identified residual risks, when these risks combined exceed the effect of the proposed risk control.

The role of an SMS manager is not to lower a risk level due to pressure, but to assess mitigation options for the assigned risk level, and options for processes that conform to regulatory requirements and are acceptable to the AE. A trap for an SMS manager to fall into is to change the risk level at the demand of an accountable executive. When an SMS manager is a non-employee at a remote location, temptations to manipulate risk levels are reduced. In a just culture there is no personal liability associated with the position of an AE, as this individual represents the certificate holder. The certificate holder retains all liability for non-compliance with the regulations. It is crucial to the success of an SMS that an AE works within the just-culture principles of trust, learning, accountability and information sharing when considering recommended risk controls.


A purpose of regulations is to establish operational limits acceptable to the interest of public safety as determined by the regulatory authority. Public safety may be a floating object and change with circumstances. In the aviation industry this became evident during the pandemic period, when regulatory aviation limits were changed on the grounds of a greater threat to public safety. This makes risk control measures applicable only under the regulatory jurisdiction. Unless there are international agreements, a just culture, or non-punitive policy, is not applicable beyond the regulatory jurisdiction. For airlines, a risk control may be acceptable within its own borders, while the same risk control internationally may be rejected, or in the worst case be treated as a criminal action. A recent event occurred when a charter flight crew discovered an indication in the cockpit that something was wrong in the avionics bay. During an inspection of the bay, duffel bags with illegal substances were discovered, and the flight crew reported this to the authorities. Since the crew was outside of the jurisdiction of their safety management system, they were detained for seven months.


Accepting or rejecting risks is therefore more than just an organizational matter; it is also related to areas of operations, wherever that might take you. A principle of a successful SMS is that hazards are locally identified.




Saturday, November 12, 2022

Predictive SMS


By OffRoadPilots 

Predictive SMS methods are applied research that entails the development of an expanded and well-organized safety database, as well as the use of predictive, or forecasting, methods to identify potential and emerging hazards, trends and behaviour patterns. Using data analysis and predictive methods to identify latent hazards is a tool to prevent future adverse events in the operations of any organization. SMS has generated wide support in the aviation community as an effective approach that can deliver real safety and financial benefits. SMS integrates safety concepts into repeatable, proactive processes in a single system. The structure of SMS provides organizations greater insight into their operational environment, including their reactive phase, proactive phase, and predictive phase. System analyses are a prerequisite for a fully operational predictive safety management system.


There are several purposes for operating with a predictive safety management system, and one of these is to move special cause variations into common cause variations for specific operators and locations. A predictive analysis consists of forecasted expectations, as opposed to special cause variations, where expectations are unknown. A predictive analysis is also different from a proactive approach, since the proactive approach is to assume potential hazards, and the predictive approach is to analyze known hazards as facts. It is impossible to predict when a hazard will affect operations and cause an occurrence, but it is possible to predict that a hazard will appear in operations within a pre-established time, location, and direction. A predictive SMS does not predict accidents, incidents, or events, since the effects of latent hazards are only available with reactive analyses. A predictive safety management system operates within a 3D system and in a virtual moment of the flight, taxi, vehicle operations or other movements. A 3D identification process is measured in time (speed), space (location), and compass (direction). When 3D thinking is applied in a safety management system, future scenarios can be designed with a defined exposure level to predictive hazards.



Root cause analyses of hazards for specific phases of operations and locations have already been conducted and accepted when operating with a predictive safety management system. There is a requirement for the person managing the SMS to analyze and identify the cause or probable cause of all hazards, but this requirement does not extend to identifying the cause of every individual hazard report, or of the same hazard multiple times. The cause of a hazard needs to be identified once, with subsequent hazards of the same classification number monitored in a control chart for pattern and frequency. Note that a predictive SMS is applicable to hazards of the same classification number, and not to hazards with similar classifications. A successful SMS operates with a hazard classification system of safety critical areas and safety critical functions within identified areas.


Analyzing birdstrike data in a predictive SMS generates control charts for reliability pattern and frequency. The outcome of this experiment unfolded as the post was written. Data applied in this scenario are from publicly available data for a specific airport between 2010 and 2022. Adding bird observations by airport personnel, tenants or users would enhance the analysis and improve predictive SMS operations. Data are reactive facts, since there are no expected, or assumed data applied in a predictive SMS analysis. 


“The X-mR control chart is used with variables data - data that can be "measured" like time, density, weight, conversion, etc. Like all control charts, the X-mR monitors variation over time. The X-mR chart will tell if your process is in control (only common causes of variation present) or if there are special causes of variation. You use the X-mR chart when you have only one data point to represent the situation at a given time. For example, suppose your company is tracking accounts receivable each month. You have limited data - one data point a month. You can use the X-mR in these situations. You plot the monthly result on the X chart. You plot the moving range between consecutive months on the mR (for moving range) chart.”
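The limits of an X-mR chart can be sketched in a few lines of Python. This is a minimal sketch and not the tool used for the charts in this post. It uses the conventional X-mR constants (2.66 for the X chart and 3.267 for the mR chart), and the function names and monthly counts are hypothetical, made up for illustration only.

```python
from statistics import mean

def xmr_limits(values):
    """Return centre lines and control limits for an X-mR (individuals) chart."""
    if len(values) < 2:
        raise ValueError("need at least two data points")
    # Moving range: absolute difference between consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    x_bar = mean(values)
    mr_bar = mean(moving_ranges)
    return {
        "x_bar": x_bar,
        "x_ucl": x_bar + 2.66 * mr_bar,   # upper control limit, X chart
        "x_lcl": x_bar - 2.66 * mr_bar,   # lower control limit, X chart
        "mr_bar": mr_bar,
        "mr_ucl": 3.267 * mr_bar,         # upper control limit, mR chart
    }

def special_cause_points(values):
    """Indexes of points outside the X chart control limits."""
    limits = xmr_limits(values)
    return [i for i, v in enumerate(values)
            if v > limits["x_ucl"] or v < limits["x_lcl"]]

# Hypothetical monthly counts, one data point per month (not real airport data).
monthly_counts = [2, 3, 2, 4, 3, 2, 3, 15, 3, 2, 4, 3]
limits = xmr_limits(monthly_counts)
flagged = special_cause_points(monthly_counts)
```

In this made-up series only the eighth month falls above the upper control limit, which is the kind of spike that would call for a root cause analysis.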


An X-mR variable chart detects special cause variations. The X-mR chart below shows five spikes of special cause variations, or an out-of-control process, between 2010 and 2022. When a special cause variation is identified, an SMS enterprise is required to conduct a full-scale root cause analysis.

When analyzing the out-of-control points, it is noticeable that they occurred during the summer seasons, with the last spike in 2017. What steps the airport took to eliminate special cause variations in 2018 is unknown. Since the main migratory bird routes through the area did not change overnight in 2018, it is assumed that the airport operator implemented changes. If operating with a proactive SMS, an operator would need to conduct a root cause analysis and a system analysis, and apply a predictive SMS approach to migratory bird behavior. With a predictive SMS approach to migratory bird travel, systems may be put in place to direct the birds locally away from airport approaches. This particular airport is previously known for changing local bird travel routes by applying the principles of land use in the vicinity of the airport to divert or eliminate bird activities. Such activities include diverting travel to and from landfills and water reservoirs, or removal of cereal crops in the area. Previous research has identified that bugs are attracted to the blacktop runway surfaces, which in turn attracts birds. Without any out-of-control points since 2018, it is assumed that a predictive SMS approach fulfilled its expectations.

“A Pareto chart is a data-based approach to determine what the major problem or cause is. All companies have lots and lots of problems on which to work. There is not enough time in our day to work on everything. The Pareto chart gives us a way to determine which problem to work on first – where we will get the most return for our investment. And the Pareto chart is also a great communication technique as we shall see.

Vilfredo Pareto, an Italian economist, developed the Pareto chart in the late 1800s.  He discovered that 80% of Italy’s wealth was held by 20% of the people.  This has become known as the 80/20 rule or the Pareto principle.  It is at the heart of the Pareto chart.  The 80/20 rule applies in many places – 20% of our customers are responsible for 80% of the customer complaints; 20% of the workforce account for 80% of employee issues.  The Pareto chart is one method of separating that 20% - the vital few – from the 80% - the trivial many.  This allows us to focus our time, energy, and resources where we will get the most return for our investment.”


A Pareto chart detects the frequency of hazard classifications. When frequencies are identified, an SMS enterprise may prioritize action plans for classifications with the highest frequencies. Under the Pareto principle, roughly 20% of the causes account for roughly 80% of all hazards.

This Pareto chart identifies July, August and September as the months when 73% of hazards occur, during 25% of the months. For the airport operator, these three months now become the target focus area to manage bird activities. For airlines operating out of the airport, these three months become the target focus area for their predictive SMS. However, before jumping to a conclusion and applying these analyses to their predictive SMS, airline operators should approach the airport operator for detailed information about actions applied in their bird and wildlife control program. If no actions were taken by the airport operator, then other factors would have affected the bird activity process to reduce birdstrikes.
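The ranking behind a Pareto chart can be sketched as a sorted table with a cumulative percentage column. This is a minimal sketch under stated assumptions: the function name is hypothetical and the monthly counts are invented for illustration, so they do not reproduce the 73% figure from the airport data.

```python
def pareto_table(counts):
    """Sort categories by count (largest first) and add cumulative percentages."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no observations to rank")
    rows, running = [], 0
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    for category, count in ranked:
        running += count
        rows.append((category, count, round(100 * running / total, 1)))
    return rows

# Hypothetical birdstrike counts per month (not the airport's data).
strikes_by_month = {
    "Jan": 1, "Feb": 1, "Mar": 2, "Apr": 3, "May": 4, "Jun": 5,
    "Jul": 18, "Aug": 22, "Sep": 14, "Oct": 4, "Nov": 2, "Dec": 1,
}
table = pareto_table(strikes_by_month)
top_three = [row[0] for row in table[:3]]
```

Reading the cumulative column from the top shows how few categories carry most of the counts, which is where an action plan would normally start.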


A search revealed that the airport had implemented corrective actions, and that in 2018 it introduced a new bird control system. Here is an excerpt of the news article (redacted): “The airport is pleased to welcome (company) to the airport. The company brings a specialty Falconry Bird Control Program to the airport which augments the airport’s existing wildlife management program. The company provides a service with trained falcons and other species of birds of prey to manage issues that are caused by wild birds in commercial and industrial environments. Bird control falconry is one of the only target specific methods of control which has the minimum impact of the environment and other non-evasive species within it.”


With the new bird control system, a new control chart analysis from 2018 was conducted that produced a similar special cause variation result. 

Migratory bird routes are common cause variations in the bird movement process. Their travel in the vicinity of airports or using airport lands as their feeding grounds is integrated into their process. The same birds come back year after year. For an airline or airport operator, this bird activity becomes a special cause variation when affecting the planned air travel or airport operations, since it is not an integrated part of their operations. When a common cause variation is manipulated, or controlled, the outcome may deviate from statistical expectations. As noted in this experiment, when the bird activity process is controlled by falconry, both the reliability pattern and frequency were slightly altered. 


The responsibility for improving a process in statistical control lies with management, while front-line personnel may have excellent suggestions on how to do this. Improving a process that is in control may mean changing the average or reducing variation. It is a never-ending process. The system must be changed to improve the process. From the birds’ point of view, their process may now be out-of-control, since a common cause variation was manipulated. The bird control system at this airport changed, and monitoring the effect of the implemented action verified that the birdstrike counts went down. This is a classic example of how simple, but effective, the concept of a safety management system is.

With this new information an airline or airport has an opportunity to apply a predictive SMS to their operations. It is not the birdstrike that is predictive, but the bird activity. A root cause analysis can only be conducted of a hazard that the operator has control over, both control over the data required for the analysis and control over the corrective action plan. In the bird experiment example, an airline operator has control over publicly available birdstrike data, and they have control over aircraft operations. A root cause analysis may have identified the migratory bird season as a root cause, and their control measure may have been to accept the risk, reduce flights to this airport to mitigate aircraft damage, or pause operations during the hours when birds are present. Since an airline is in the business of generating money, it is impractical to reduce or close down flights due to bird activities. Their own birdstrike data becomes their preferred tool to assess the likelihood and severity in their operations. At this particular airport, with approximately 1.3 million movements and 5.2 birdstrikes annually, or one birdstrike per 250,000 movements, any reasonable SMS manager would accept the risk. Zero birdstrikes is an unacceptable goal. Both the airport and airline conducted their root cause analyses and their system analyses and are now ready to operate with a predictive SMS. At this particular airport, they continue to track the counts of birds and birdstrikes, but a root cause analysis is not needed since it is already done for this hazard classification.
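The arithmetic behind the one-per-250,000 figure is simple enough to check in code. The sketch below uses only the two numbers quoted above (approximately 1.3 million movements and 5.2 birdstrikes annually); the function names and the per-100,000 normalization are assumptions added for illustration.

```python
def movements_per_strike(movements_per_year, strikes_per_year):
    """Average number of movements between birdstrikes."""
    return movements_per_year / strikes_per_year

def strikes_per_100k(movements_per_year, strikes_per_year):
    """Birdstrike rate normalized per 100,000 movements."""
    return strikes_per_year / movements_per_year * 100_000

movements = 1_300_000   # approximate annual movements quoted in the post
strikes = 5.2           # average annual birdstrikes quoted in the post

exposure = movements_per_strike(movements, strikes)   # about 250,000
rate = strikes_per_100k(movements, strikes)           # about 0.4
```

Normalizing to a rate per 100,000 movements makes it easier for an airline to compare this airport against its own birdstrike data at other locations.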


A trap that is easy to fall into when birdstrike numbers, or any hazard numbers, are low is to reduce or eliminate current mitigation processes. The return on investment (ROI) in an SMS is inverted, with relatively higher investment and fewer occurrences returned. Most often the justification for changing the mitigation process is cost and the low number of hazards. A Canadian airport voluntarily gave up their airport certificate a while ago, since that allowed them to change their mitigation processes and eliminate their safety management system. Just this month they experienced what this trap could cause, by operating without an SMS, which also includes a plan of construction operations. “A privately registered Cessna P210N from (airport) to (airport) was taxiing on the hangar line and fell into an unmarked 3-foot wide strip where the pavement was taken away. The front wheel fell 4 to 5 inches into the construction area. There was propeller damage and engine damage to the aircraft.”





Sunday, October 30, 2022

Remote SMS Manager


By OffRoadPilots

The person managing the SMS (SMS Manager) for an airline or airport has more opportunities to positively affect safety processes in an organization when there is a physical distance between the operator and SMS Manager. For the integrity of an SMS program, the person managing the SMS is expected to report directly to the Certificate Holder (CH) and remain independent and separate from both airline and airport operations. 


It is the CH who appoints a person to manage the safety management system, it is the CH who appoints the Accountable Executive (AE), and it is also the CH who maintains the safety management system. The CH is also the operator, or the operator may be any person in charge of operations, whether as employee, agent, or representative of the CH. The two executive positions of AE and SMS Manager play unique roles by their appointed positions to remain independent of the airline or airport operator and preserve the integrity of the SMS. The CH appoints two positions to be responsible for meeting the requirements of the regulations on behalf of the certificate holder. Since their roles are to ensure regulatory compliance, these positions are at an equal level in an organisation chart. That an SMS Manager is required to make progress reports to the accountable executive at intervals determined by the accountable executive is a component of the SMS and is not an organizational hierarchy position. However, the AE is the final authority for meeting the requirements of the regulations on behalf of the CH.

The Quality Assurance Program (QAP) is a component of the SMS and is maintained by the CH. The QAP Manager is not a position appointed by the CH but is an administrative position under the SMS Manager to manage and facilitate QAP responsibilities. By placing the QAP under the control of the person managing the safety management system, the program’s integrity is achieved by its independence from the operator. A quality assurance program includes an audit function that consists of an audit of the entire quality assurance program carried out every three years, or a series of audits conducted at intervals set out in a controlled manual to be fully completed triennially. This audit function is performed by an operationally independent source and by a person who is not responsible for carrying out operational tasks. An operator does not collect and assess data and perform an audit of its own performance unless the risk is accepted by the Regulator due to the size, complexity, and nature of its operations.

The role of an SMS Manager is to implement a reporting system for the timely collection of information related to observations, hazards, incidents, and accidents. Effective SMS Managers collect data in a timely manner and maintain safety compliance oversight by electronic means, rather than by unreliable paper documents. An SMS Manager identifies hazards and carries out risk management analyses of those hazards. They investigate, analyze, and identify the cause or probable cause of all hazards, and also identify the root cause of special cause variations. SMS Managers are required to implement a safety data system, by either electronic or other means, to monitor and analyze trends in hazards. The purpose of data collection and trend analysis in SMS is not to find errors, but to collect data to analyse how the system works compared to its expected outputs. As an example, checking the oil level or tire pressure, or adjusting the rear-view mirrors in a vehicle, is data collection to learn how a system functions, and is not data collection to find errors. In addition, SMS trend analyses must be done within an SPC (Statistical Process Control) system, which is not based on opinions or emotions caused by any graph charts. I often hear the phrase: “it is nice that the graph has a downward trend.” A downward trend could be a latent hazard ready to explode, or it could be a safety improvement. One does not know if it is a safety improvement or not just because the graph is trending downward. An invaluable practice is to apply p-control charts and X-mR control charts. These two control charts supplement each other with performance (80/20 rule) and timely delivery (UCL - LCL). A primary responsibility for an SMS Manager is to monitor. SMS Managers also monitor and evaluate the ongoing results of corrective actions, monitor the concerns of the civil aviation industry in respect of safety, and determine the adequacy of the training required.
Monitoring is achieved by collecting data daily, or more frequently due to size and complexity, and applying control charts to identify drift in operations. Every role and responsibility of an SMS Manager has already been established as a remote function, even if operations and safety share the same office. 
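As an illustration of the p-control chart mentioned above, here is a minimal sketch of how its limits are usually computed: the centre line is the overall proportion, and each subgroup gets three-sigma limits based on its own sample size. The function names, the weekly inspection counts and the findings are hypothetical and are not taken from any operator's data.

```python
from math import sqrt

def p_chart_limits(findings, sample_sizes):
    """Return the centre line and per-subgroup (LCL, UCL) for a p-chart."""
    if not sample_sizes or len(findings) != len(sample_sizes):
        raise ValueError("findings and sample_sizes must be non-empty and equal length")
    p_bar = sum(findings) / sum(sample_sizes)
    limits = []
    for n in sample_sizes:
        sigma = sqrt(p_bar * (1 - p_bar) / n)
        # A proportion cannot fall below 0 or above 1, so clip the limits.
        limits.append((max(0.0, p_bar - 3 * sigma), min(1.0, p_bar + 3 * sigma)))
    return p_bar, limits

def out_of_control(findings, sample_sizes):
    """Indexes of subgroups whose proportion falls outside its limits."""
    _, limits = p_chart_limits(findings, sample_sizes)
    return [i for i, (d, n, (lcl, ucl)) in enumerate(zip(findings, sample_sizes, limits))
            if not lcl <= d / n <= ucl]

# Hypothetical data: 50 inspections per week, with the number that produced a finding.
inspections = [50] * 8
findings = [2, 3, 1, 2, 4, 2, 14, 3]
p_bar, limits = p_chart_limits(findings, inspections)
flagged = out_of_control(findings, inspections)
```

With these made-up numbers only the seventh week is flagged, while the week with four findings stays within the limits and is treated as common cause variation.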

The safety management system in aviation is a product of a continuing evolution in aviation safety. Early aviation pioneers had little safety experience, or practical experience, to guide them. Over the years, each reactive approach to occurrences has led to significant gains in safety. However, even with these significant advances, the term "organizational accident" was developed to describe that accidents are related to organizational decisions and attitudes. SMS is an approach to improving safety at the organizational level. A superior SMS Enterprise applies this concept and includes system analyses to examine its operations, its impact on sub-systems, and the effect of decisions implemented. SMS allows an organization to adapt to change, increasing complexity, and limited resources. SMS is also about enhancing organizational policies and processes, the organizational culture of leadership management and forward-looking accountability.


The role of a person managing the safety management system is about processes, and to what level operational processes conform to regulatory requirements, standards and the safety policy. Since it’s all about processes, an SMS Manager located off-site has greater opportunities to analyze processes independently of operations. The only expectation of a pre-SMS process was that a safety officer had unlimited powers to fix all unsafe conditions and to make stern statements about the issues. The pre-SMS culture is still alive in SMS organizations, and with the SMS Manager in the office every day, there is a temptation to just “say hi” and ask for an immediate fix. With the SMS Manager at a remote location, this temptation is removed, and the SMS Manager has more time to focus on processes. In a successful and effective SMS Enterprise, the person managing the SMS is a confidential adviser to the AE, located in a physically remote location from the operator, independent of operations and without bias or ties to oversight and management by an SMS Enterprise. In other words, a successful SMS Enterprise uses the expert services of a contracted SMS Manager, just as it contracts other expert third-party services. This enables the SMS Manager to establish, freely and without interference, unbiased processes to be presented to the AE for acceptance or rejection. If rejected, the AE must alter identified processes to their own liking, and sign off in a risk assessment, or system analysis, that the recommendation by the SMS Manager was rejected.

One reason for a safety management system to go off the rails is that emotions are applied to safety, rather than data, facts and processes. A remotely located SMS Manager has a million more opportunities to successfully keep the SMS on track than an in-office employee has.


There are three tools that an SMS Enterprise cannot effectively function without: The SMS Memory Jogger for out-of-control tests, SPCforexcel to analyze trends in performance and delivery, and SiteDocs as an electronic data collection tool.




Sunday, October 16, 2022

System Analysis


By OffRoadPilots

A System Analysis is Safety Risk Management (SRM) and is the highest achievable level of a successful Safety Management System (SMS). System analysis is the process of studying a system and its interacting systems. System analysis projects are fundamental to define problems or issues, discover opportunities for incremental improvements, and to publish directives or operations plans. System analyses are what make the SMS a common-sense approach to incremental process changes.

When applying safety risk management, an SMS enterprise conducts system analyses for implementation of new systems, revision of existing systems, development of operational procedures, and for identification of hazards or ineffective risk controls. When conducting a system analysis, an SMS enterprise considers the function and purpose of the system, the system’s operating environment, an outline of the system’s processes and procedures, and the personnel, equipment, and facilities necessary for operation of the system, and maintains processes to identify hazards within the context of the system analysis.


The context of a system analysis is the circumstances that form the setting for an event or observation, in terms of which it can be comprehended and assessed. A system analysis is more than checkbox completion; it is a comprehensive task to analyze details of how each system interacts with other systems within the analysis. A system analysis includes analysis of common cause variations but excludes special cause variations from the analysis. A common cause variation is a variation required for the system to function as intended. Common cause variations are controlled and managed for the process to produce a desired output. The difference between an intended output and a desired output is that an intended output is a process where common cause variation is without control action, and a desired output is a process with a control action applied.

The vast majority of issues come from common causes of variation, due to the way processes are managed on a day-to-day basis. If special causes of variation are present, a root cause analysis must be conducted to identify the issue and for a process to change its course of action. The only effective way to separate common from special causes of variation is through the use of SPC control charts. A process is in statistical control when only common causes of variation are present, and this is determined by examining SPC control charts. When there are no points above the upper control limit or below the lower control limit, and no trends, then a process is said to be in statistical control.
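The two conditions described above, no points outside the control limits and no trends, can be sketched as a simple check. This is a minimal sketch under stated assumptions: the function names are hypothetical, the control limits are passed in rather than calculated, and a trend is taken to be seven consecutive rising or falling points, which is one commonly used rule of thumb and not a value taken from this post.

```python
def outside_limits(values, lcl, ucl):
    """Indexes of points below the lower or above the upper control limit."""
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]

def has_trend(values, run_length=7):
    """True if run_length consecutive points steadily rise or steadily fall."""
    rising = falling = 1
    for previous, current in zip(values, values[1:]):
        rising = rising + 1 if current > previous else 1
        falling = falling + 1 if current < previous else 1
        if rising >= run_length or falling >= run_length:
            return True
    return False

def in_statistical_control(values, lcl, ucl, run_length=7):
    """Only common cause variation: no points outside limits and no trend."""
    return not outside_limits(values, lcl, ucl) and not has_trend(values, run_length)

# Hypothetical series checked against assumed limits of 0 and 10.
stable = [5, 6, 4, 5, 7, 5, 6, 4, 5]
drifting = [1, 2, 3, 4, 5, 6, 7, 5]
spiking = [5, 6, 12, 5]
```

The drifting series never leaves the limits, yet it fails the check because of the steady climb, which is why a chart that only looks good or bad to the eye is not enough.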


For a system analysis to be effective and make a difference, an identified hazard must be within the context of the system analysis. The context of an analysis is the area, or segment of operations, affected by the event or observation. A new gate assignment at an international airport may affect flight operations, dispatch, and maintenance, while for a new parking location for a single-engine freight carrier, the pilot might be the only context of a system analysis.


Within a safety management system there are five generic features that characterize an SMS. There is a comprehensive systematic approach to the management of aviation safety within an organization, including the interfaces between the company and its suppliers, sub-contractors, and business partners. There is a principal focus on the hazards of the business and their effects upon those activities critical to air operations or airport safety. In addition to the safe operations of aircraft or airport, there is full integration of safety considerations into the business, via the application of management controls to all aspects of the business processes in safety critical areas. It is crucial for the success of an SMS that there are active monitoring and audit processes to validate that the necessary controls are in place, and to ensure a continued commitment to safety. The fifth characteristic of an SMS is the use of quality assurance principles, including improvement and feedback mechanisms or tools.

An SMS enterprise must operate with a process to identify hazards and associated risks, analyze risks, and develop new risk controls that affect multiple processes, or hazard owners, within its organization. A final risk acceptance may be made at a management level above the process owner, by a committee, or by the accountable executive. Processes may be decided by the process owner, while policies are decided on management level. A comprehensive system analysis requires technical knowledge of areas within the context of the analysis and how identified hazards affect those areas. 


A system analysis is an invaluable tool when maintaining a safety management system. At the time of the SMS phase-in implementation, operators were required to conduct a gap analysis, which is very different from a system analysis. System analyses are ongoing and applied at stages parallel to the process flow. The processes in an SMS are to operate pursuant to a safety management plan and to maintain documentation management, safety oversight, training, quality assurance, and emergency response preparedness. For each one of these SMS sub-systems, or components, a system analysis is conducted and applied to air operations or airport operations prior to a complete system analysis of the SMS.


Audits are prerequisites for a full SMS system analysis. Audit results are unbiased: they are based on facts and paint a true picture of operational processes. Each system, or sub-system, audited becomes an independent system analysis. At the conclusion, these analyses are combined and will either paint a picture of flaws in the operations, or paint a picture of an operation where common cause variations are managed and controlled.




Saturday, October 1, 2022

SMS Bulletins

By OffRoadPilots

Safety Management System (SMS) bulletins are published for current issues and areas of concern. Areas of concern may be based on data and facts, or just on an opinion of the SMS bulletin publisher. Opinions are forward looking, while data and facts are backward looking. One is just as important as the other. For SMS bulletins to be effective they should be published regularly. Just like a newsletter in a small or medium SMS Enterprise, an SMS bulletin should be expected to arrive in the inbox monthly.

An active safety culture can be considered the heart that is vital to the continuing success of an SMS, and it gives the dynamic energy needed for a system to provide a continuous cycle of incremental improvement. This can only be developed by leadership, commitment, and setting a good example. When an SMS bulletin is published at irregular intervals, or not published at all, it gives the appearance of a level of commitment to SMS below what is expected of the workers. SMS bulletins offer options to management to justify a safety management system that is versatile, flexible, and fluid. A rigid SMS is a hazard to aviation safety, while a versatile, flexible, and fluid SMS leaves room for incremental improvements. A grassroots SMS with left-out checkboxes, but with 100% buy-in, is infinitely better than a perfect high-level system with all checkboxes completed but without commitment. Publishing a bulletin about cognitive lockup is one example of an ongoing human factors campaign.

SMS bulletins are issued by the person managing the safety management system as a data point to conform to regulatory compliance by monitoring the concerns of the civil aviation industry in respect of safety and their perceived effect on the certificate holder. An SMS is generally defined as a formalized framework for integrating safety into an organization's daily operations, including the necessary organizational structures, accountabilities, policies, and procedures. SMS is a tool that becomes part of an organization's culture, and of the way people go about their work. While individual personnel routinely make decisions about risk, SMS focuses on organizational risk management, human factors, supervision factors, and environmental factors, and includes and supports the decision makers. An SMS is scalable and can be designed to meet the needs of a given operation in a way that respects the scope and nature of its work. An SMS bulletin fits in by addressing areas of interest within the scope and nature of an SMS enterprise. The scope and nature of an organization are best known by the operators themselves, so SMS bulletins are not one-size-fits-all.

An SMS bulletin is a communique with a link to the SMS Safety Policy to provide for a bulletin with accountability. Accountability is a place where there is trust, just culture, learning, and information sharing. An SMS bulletin builds trust in human interactions, human and hardware interactions, human and software interactions, and interactions between humans and the operating environment. An SMS bulletin builds faith in the internal SHELL model. Bulletins are often viewed as a tool to communicate immediate threats to aviation safety, approaching hazards (e.g., hurricanes or winter storms), or to communicate common errors by personnel. While these are valid events for publishing bulletins, within a safety management system a bulletin is a tool to instill operational awareness of regular daily tasks.


SMS bulletins are tools to instill awareness and competency in daily operations. When the reasoning for a task is that “we have always done it this way and had no accidents”, it is often forgotten that several years ago multiple fatal accidents happened. Human behavior is to suppress what is unpleasant. SMS bulletins are a path to accountability in a just culture. A common human behavior is to believe that we are quite skilled at multitasking. Nothing could be further from the truth. The concept of effective multitasking is simply a common misbelief, with no basis in science. Task sequencing is very different from multitasking. This misbelief has led to aircraft accidents. It is possible that the main component of this human limitation is cognitive lockup, which is the tendency of operators to deal with disturbances sequentially. Cognitive lockup can also be defined as holding on to a task or sticking to a problem, which yields a reluctance to switch to an alternative task or problem. An extremely high-profile event exemplifying this was the December 1972 Eastern Air Lines L-1011 Flight 401 accident, when the crew was troubleshooting a malfunctioning landing gear indicator light.

Cognitive lockup happens when the operator focuses on an immediate threat or fault and forgets the other interacting systems. Attention tunneling accompanies cognitive lockup: a narrowing of focus on the immediate threat to the exclusion of other simultaneous competing task demands, e.g., focus on the gear light vs. flying the aircraft. Cognitive lockup can also lead to emotional hijacking when the brain disagrees with the actual experience. The brain declares an emotional emergency, causing a reduction in the rational brain’s problem-solving capabilities. Mitigations for cognitive lockup include recognition of the separate external pressures upon pilots’ time and tasks, and recognition of the high cognitive lockup threat during the different phases of flight, especially during critical phases of flight such as arrivals and departures.


Other mitigations include training in the recognition and skill set of positive task switching, with a brief evaluation of task priority, and the development of decision support tools that are practiced and applied regularly in training. Cognitive lockup is not only applicable to pilots, but also to airside crew and air traffic services personnel. Human factors training is also a regulatory requirement before being assigned tasks airside, to optimize the human factors interface within the concept of SHELL.




Saturday, September 17, 2022

Root Cause

By OffRoadPilots

The origin of an occurrence travels through multiple stages until it is analysed as a root cause. When it comes to aviation safety, prevention of accidents, and the Safety Management System (SMS), conventional wisdom is that there could be multiple root causes causing an occurrence. There might be multiple root causes, but there is only one primary root cause breaking away, leading the way to define the scope of the root cause analysis. The first step in a root cause analysis is not to learn why an occurrence happened or why a latent hazard became an issue, but to assign the scope of the analysis to multiple root cause factors. One reason for assigning predetermined root cause factors is to work within a structured analysis system. SMS is also a businesslike approach to safety. The aviation industry put a safety management system in place as an extra layer of protection for incremental safety improvements. When conducting a root cause analysis outside of a structured system, the analysis is without directional control. When working outside of a structured system, opportunities and failures are allowed to be introduced in the process to follow the path of least resistance, with a guaranteed failure of the root cause analysis.

A lightning strike is a symptom and not a root cause

A root cause analysis needs to be analysed in a 3D system measured in time (speed), space (location), and compass (direction), and within the scope of human factors, organizational factors, supervision factors, and environmental factors. A 3D analysis system places the environment of events. However, assigning and implementing changes to operations based on a root cause analysis is not a guarantee that the same or similar occurrences are eliminated in the future.

This is a fundamental principle of an SMS, published by ICAO, that “Safety is not risk free.” An SMS regulation states that an SMS Enterprise needs a process for the internal reporting and analyzing of hazards, incidents, and accidents, and for taking corrective actions to prevent their recurrence. Conforming to this regulation does not guarantee elimination of future occurrences, but it provides a corrective action under the control of the enterprise that could have prevented the non-compliance. The purpose of a root cause analysis is to predict with a 95% confidence level the probability for a successful outcome without an unscheduled event. There are several more contributing factors beyond the control of an operator than there are factors under their control.

A root cause analysis is not filed in SMS but is traveling on the trip

It is crucial for the successful application of a root cause to know what a root cause is not. A root cause analysis is not perfect; it is not a magic wand of miracles for accidents never to happen again. A root cause is not a system where prescriptive expectations are applied as regulations. A root cause statement is not a one-size-fits-all model, and a root cause is not a model where everything is grouped. A root cause analysis is not about emotions, wishes, or dreams, but is an imperfect system applied to proactive processes. Working with an imperfect system opens millions of doors of opportunities for improvements, while a perfect system is rigid, without justifications to be changed. We all know the saying “If it ain’t broke don’t fix it.”

A safety management system is about human behaviors and how external events affect internal emotions and human behaviors. This makes a root cause analysis within an SMS different from a root cause analysis of mechanical or tangible items. A root cause analysis of material strength only needs one special cause variation, or one failure, to conduct a root cause analysis of its system. Material is reliable and, when produced the same way, will provide the same output. Human factors are different in that the same input, such as training and learning, does not provide the same operational output between different people.

A Non-Destructive Testing (NDT) system is a system to detect flaws within a material or on its surface, and to establish whether the production process produces flaws or failures. There are different independent systems within an NDT system, and none of these systems are compatible to interact with the other systems. Some frequently used NDT inspection processes are X-ray, ultrasound, magnetic particle, fluorescent penetrant, and acid inspections. X-ray inspection is applied to inspect for flaws within a material to relatively fine and defined resolutions. Ultrasound is also applied to inspect for flaws within a material, but to relatively coarse and undefined resolutions. Magnetic particle inspection is applied to the discovery of both internal and external material flaws. The NDT inspection system applied to external inspection of flaws is the fluorescent penetrant inspection. Acid inspection is a surface inspection of material temperature variations. Within an NDT system all these independent systems function to produce an outcome of an effective system that will function as it was designed to function. None of these methods of NDT inspection is inferior to the others; they are just parts of one total system to manage, or lead, processes to produce a flawless output.

In the same way as an NDT system defines the scope of its intended inspection, and the scope of a root cause analysis after a failure discovery, a root cause analysis within a safety management system must also define its scope and root cause analysis factor. In a material failure root cause analysis, the scope is predefined and could be the mixtures, the oven temperatures, the vacuum chamber, the manufacturing process, or the assembly process. Without defining the scope, a root cause is only an opinion of the 5-Ws and How. A root cause analysis within an SMS Enterprise establishes human factors, organizational factors, supervision factors, and environmental factors as its primary scope of analysis. Several other factors could be added, such as mechanical factors, electronic factors, material factors, economic factors, ergonomic factors, and more.

Assume for a moment that there was a flaw in a compressor disk built for extremely high RPM. An undetected microscopic flaw could cause major destruction to the compressor itself and the equipment it was powering. When a flaw or material failure is discovered, the scope of the root cause must first be decided on. The root cause could be of human factors, inspection processing factors, material composite factors, or manufacturing factors. Each factor may have contributed to the flaw, but only one factor would be the primary root cause for a corrective action plan.

Jumping to conclusion could end up in a crash

A root cause analysis within an SMS Enterprise is prone to pre-analysis conclusions, or jumping to conclusions without first determining the scope of analysis. When a root cause analysis is assigned to a responsible person, the first step is often to ask the 5-Why questions. When the root cause is predetermined, the first Why-question demands a trail that leads to a predetermined answer. A root cause analysis outcome may be affected by intimidation, or by high-level management demanding that the root cause be identified as human error. Should an SMS manager oppose their demand to jump to the human error conclusion, senior managers may become verbally abusive, feel ignored, feel that their opinions are not important, and find it shocking that their SMS manager is running a program that nobody has control over. This is a hypothetical scenario, but with a probable likelihood to occur. A root cause analysis needs to first establish the scope to remain neutral.

The first purpose of a root cause analysis is to identify system-level findings: non-compliances that show a system-wide deficiency of an enterprise system. Examples of systems subject to such findings are the safety management system, quality assurance program, operational control system, maintenance control system, or a training program system.

The second purpose of a root cause analysis is to identify process-level findings of an enterprise process which did not function and resulted in a non-scheduled output. Examples of processes applicable in various aviation industry sectors include, but are not limited to, the documentation control process, safety risk management process, internal audit process, or emergency response testing processes.

When a root cause analysis has established its scope and purpose, the corrective action assigned has an opportunity to successfully prevent further occurrences.
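As a rough illustration only, the idea that scope and purpose come before a corrective action could be sketched in Python as below. The class names, field names, and the example occurrence are hypothetical and are not taken from any regulation or SMS software; the four scope factors and the two finding levels are the ones named in this article.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ScopeFactor(Enum):
    # The four primary scope factors named in the article.
    HUMAN = "human factors"
    ORGANIZATIONAL = "organizational factors"
    SUPERVISION = "supervision factors"
    ENVIRONMENTAL = "environmental factors"


class FindingLevel(Enum):
    # The two purposes of a root cause analysis named in the article.
    SYSTEM = "system level"
    PROCESS = "process level"


@dataclass
class RootCauseAnalysis:
    # Hypothetical record type, for illustration only.
    occurrence: str
    scope: Optional[ScopeFactor] = None
    level: Optional[FindingLevel] = None
    corrective_actions: List[str] = field(default_factory=list)

    def assign_corrective_action(self, action: str) -> None:
        # Refuse to jump to a conclusion before scope and purpose are set.
        if self.scope is None or self.level is None:
            raise ValueError(
                "Define the scope and purpose before assigning a corrective action."
            )
        self.corrective_actions.append(action)


# Hypothetical occurrence used only to exercise the sketch.
rca = RootCauseAnalysis("Vehicle entered an active taxiway without clearance")
try:
    rca.assign_corrective_action("Retrain the driver")
except ValueError as err:
    print(err)

rca.scope = ScopeFactor.SUPERVISION
rca.level = FindingLevel.PROCESS
rca.assign_corrective_action("Revise the airside escort procedure")
print(rca.corrective_actions)
```

The design choice here is simply that the record rejects a corrective action until both the scope factor and the finding level are defined, which mirrors the point that an unscoped root cause is only an opinion.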

