Saturday, September 17, 2022

Root Cause

By OffRoadPilots

The origin of an occurrence travels through multiple stages until it is analysed as a root cause. When it comes to aviation safety, prevention of accidents and the Safety Management System (SMS), conventional wisdom is that there could be multiple root causes causing an occurrence. There might be multiple root causes, but there is only one primary root cause breaking away, leading the way to define the scope of the root cause analysis. The first step in a root cause analysis is not to learn why an occurrence happened or why a latent hazard became an issue, but to assign the scope of the analysis to multiple root cause factors. One reason for assigning predetermined root cause factors is to work within a structured analysis system. SMS is also a businesslike approach to safety. The aviation industry put a safety management system in place as an extra layer of protection for incremental safety improvements. When a root cause analysis is conducted outside of a structured system, the analysis is without directional control. Outside of a structured system, opportunities and failures are allowed to enter the process and follow the path of least resistance, with a guaranteed failure of the root cause analysis.

A lightning strike is a symptom and not a root cause

A root cause analysis needs to be analysed in a 3D system measured in time (speed), space (location), and compass (direction), and within the scope of human factors, organizational factors, supervision factors and environmental factors. A 3D analysis system places events in their environment. However, assigning and implementing changes to operations based on a root cause analysis is not a guarantee that the same or similar occurrences are eliminated in the future.

This is a fundamental principle of an SMS, published by ICAO as “Safety is not risk free.” An SMS regulation states that an SMS Enterprise needs a process for the internal reporting and analyzing of hazards, incidents, and accidents and for taking corrective actions to prevent their recurrence. Conforming to this regulation does not guarantee elimination of future occurrences; what it requires is a corrective action under the control of the enterprise that could have prevented the non-compliance. The purpose of a root cause analysis is to predict with a 95% confidence level the probability for a successful outcome without an unscheduled event. There are more contributing factors beyond the control of an operator than there are factors under its control.

A root cause analysis is not filed in SMS but is traveling on the trip

It is crucial for the successful application of a root cause to know what a root cause is not. A root cause analysis is not perfect; it is not the magic wand of miracles for accidents never to happen again. A root cause is not a system where prescriptive expectations are applied as regulations. A root cause statement is not a one-size-fits-all model, and a root cause is not a model where everything is grouped. A root cause analysis is not about emotions, wishes or dreams, but is an imperfect system applied to proactive processes. Working with an imperfect system opens millions of doors of opportunities for improvements, while a perfect system is rigid, without justification to be changed. We all know the saying “If it ain’t broke don’t fix it.”

A safety management system is about human behaviors and how external events affect internal emotions and human behaviors. This makes a root cause analysis within an SMS different from a root cause analysis of mechanical or tangible items. A root cause analysis of material strength only needs one special cause variation, or one failure, to conduct a root cause analysis of its system. Material is reliable and, when produced the same way, will provide the same output. Human factors are different in that the same input, such as training and learning, does not provide the same operational output between different people.

A Non-Destructive Testing (NDT) system is a system to detect flaws within a material or on its surface, and to establish whether the production process produces flaws or failures. There are different independent systems within an NDT system, and none of these systems are compatible to interact with the other systems. Some frequently used NDT inspection processes are X-ray, ultrasound, magnetic particle, fluorescent penetrant, and acid inspections. X-ray inspection is applied to inspect for flaws within a material to relatively fine and defined resolutions. Ultrasound is also applied to inspect for flaws within a material, but to relatively coarse and undefined resolutions. Magnetic particle inspection is applied to discover both internal and external material flaws. The NDT inspection system applied to external inspection of flaws is fluorescent penetrant inspection. Acid inspection is a surface inspection of material temperature variations. Within an NDT system, all these independent systems function to produce an outcome of an effective system that will function as it was designed to function. None of these methods of NDT inspection is inferior to the others; they are just a part of one total system to manage, or lead, processes to produce a flawless output.

In the same way as an NDT system defines the scope of its intended inspection, and the scope of a root cause analysis after a failure discovery, a root cause analysis within a safety management system must also define its scope and root cause analysis factor. In a material failure root cause analysis, the scope is predefined and could be the mixtures, the oven temperatures, the vacuum chamber, the manufacturing process or the assembly process. Without defining the scope, a root cause is only an opinion of the 5-Ws and How. A root cause analysis within an SMS Enterprise establishes human factors, organizational factors, supervision factors and environmental factors as its primary scope of analysis. Several other factors could be added, such as mechanical factors, electronic factors, material factors, economical factors, ergonomic factors and more.

Assume for a moment that there was a flaw in a compressor disk built for extremely high RPM. An undetected microscopic flaw could cause major destruction to the compressor itself and the equipment it was powering. When a flaw or material failure is discovered, the scope of the root cause must first be decided on. The root cause could be of human factors, inspection processing factors, material composite factors or manufacturing factors. Each factor may have contributed to the flaw, but only one factor would be the primary root cause for a corrective action plan.

Jumping to conclusions could end up in a crash

A root cause analysis within an SMS Enterprise is prone to pre-analysis conclusions, or jumping to conclusions, without first determining the scope of analysis. When a root cause analysis is assigned to a responsible person, the first step is to ask the 5-why analysis questions. When the root cause is predetermined, the first Why-question demands a trail that leads to a predetermined answer. A root cause analysis outcome may be affected by intimidation, or by high-level management demanding that the root cause be identified as human error. Should an SMS manager oppose their demand to jump to the human error conclusion, senior managers may become verbally abusive, feel ignored, feel that their opinions are not important, and find it shocking that their SMS manager is running a program that nobody has control over. This is a virtual scenario, but one with a probable likelihood to occur. A root cause analysis needs to first establish the scope to remain neutral.

The first purpose of a root cause analysis is to identify system-level findings, that is, non-compliances that show a system-wide deficiency of an enterprise system. Examples of systems where such findings occur are the safety management system, quality assurance program, operational control system, maintenance control system, or a training program system.

The second purpose of a root cause analysis is to identify process-level findings of an enterprise process which did not function and resulted in non-scheduled output. Examples of processes applicable in various aviation industry sectors include, but are not limited to, the documentation control process, safety risk management process, internal audit process, or emergency response testing processes.

When a root cause analysis has established its scope and purpose, the corrective action assigned has an opportunity to successfully prevent further occurrences.


Friday, September 2, 2022

SMS Focus Group

By OffRoadPilots

A Safety Management System (SMS) Focus Group is a place with ongoing learning activities, assessments or events, virtual event discussion, or anything else in safety one can think of. Focus group discussions tend to capture deep and more personal responses from consumers rather than purely quantifiable data. Focus groups can assist with identifying and analyzing hazards identified in the risk assessment process. An SMS focus group works well for consulting with workers and enables the collection of meaningful data on perceptions of their work environment. Some of the reasons for a focus group are to obtain more detailed information and insights into the importance of hazards, to better understand opinions and issues regarding the work environment, to establish a safe and open environment, or just culture, in which to express views, to provide a broad representation of diverse ideas and experiences on safety assurance topics, and to generate strategies and solutions for addressing hazards. A focus group may be closed and only available to members, or it could be an open focus group with transparency within a just culture.

There are several responsibilities for the person managing the safety management system (SMS Manager). Two of the responsibilities of an SMS manager are to determine the adequacy of the training for personnel, and to monitor the concerns of the civil aviation industry in respect of safety and their perceived effect on the holder. Both responsibilities are achieved through SMS focus groups and acceptance of accountability. An SMS manager is accountable to the Accountable Executive (AE), and the manager's conformance to regulatory requirements paints a picture of how well the AE performs their duties to be responsible for operations, and of their accountability level on behalf of the certificate holder for meeting the requirements of the regulations.

Pre-SMS accountability was to hold someone accountable, or to find someone to point the finger at, and then punish that person. This old-fashioned accountability principle was based on a concept that all systems were fail-free, that a system could not fail, and that the reason for failure was simply that personnel deliberately ignored their tasks. A person causing an incident was considered to be a bad apple within the organization. What was forgotten was that bad apples are bad because of their treatment, or lack thereof. The simplest way for a manager to identify bad apples is to walk up to a mirror and take a close look at the person responsible. Imagine there is a box of apples in an uncontrolled environment, where the temperature varies with the day and night temperature. When you open the box, there are one or two bad apples on top. This is your root cause. In your opinion the root cause was these two bad apples, and the apples are removed. A week later you open the box again and there are several more bad apples in the box. This goes on for weeks and apple after apple is fired, or removed, until the box is empty. When all bad apples are gone, the goal not to have any bad apples within the organization is reached. What was forgotten was that these apples became bad because of their treatment, or lack of treatment, and were not given proper treatment prior to being placed in the box. Apples are sensitive creatures, just as the human mind is a delicate operations system. When apples are picked, they must carefully be placed into the box, and the box must be placed in a temperature-controlled environment. Bad apples happen when an organization does not comprehend the system.

Accountability is to bring solutions to problems, as opposed to complaining and doing nothing about them. As an example, if a copier is not working, call the mechanic to repair it instead of complaining that it’s not working. Action demonstrates leadership and shows accountability. Accountability is to be proactive and do the tasks required to meet a goal. Accountability is demonstrated by taking charge and being proactive. This does not imply that accountability is to do someone else’s job, or take on tasks beyond knowledge or capability, but it is to accept responsibility to initiate an action to solve a problem or remove a hazard. Submitting an SMS report could be the only action needed. However, if the hazard was Foreign Object Debris (FOD) on the ramp, then remove the FOD prior to submitting the report. Accountability is forward-looking accountability. Imagine if, in the year 1857, you told the transportation experts that carriages would be travelling at 80 miles per hour in opposite directions, separated by only a painted line. Nobody would believe it could happen, because horses could not travel at 80 MPH, even if 300 horses were placed in front of the carriage. At a minimum, transportation experts would expect the carriages to be separated by a stone wall or wooden fence. But 150 years later vehicles are travelling at 80 MPH separated only by a painted yellow line, and the system works because of forward-looking accountability. When backwards-looking accountability is applied, harm has already occurred, and the past cannot be changed.


Speaking up about problem areas is accountability and helps bring teams together to find solutions. A person who brings constructive criticism and solutions to the table shows character, personality, and leadership skills. One of the most difficult tasks of accountability is to admit personal mistakes and misjudgments, or to have been included in an action which was a contributing cause to an incident. Accountability is taking ownership of your own actions and the actions of the team supervised. If you made a mistake, admit to it and learn from it. Moreover, when mistakes are accepted, several doors open up with multiple paths for solutions. Accountability is to accept criticism. As a manager, if a team member tells you that goals don’t make sense, listen, and implement the goal-setting process. Another great example of accountability is to stay focused on achieving goals and tasks. Communication is another key to accountability in the workplace. Communication helps to establish goals and accomplish them efficiently. Communicating calmly, clearly, and patiently, despite disagreements, demonstrates maturity and is a great example of accountability. Showing up is one of the greatest examples of accountability in the workplace. Suppose you don’t have any task to do as a leader, but your team has several tasks to do to ensure the goal is met. If you’re not present on the floor to lead, you are not being accountable.


There are several benefits of accountability: it increases collaboration, promotes performance, brings higher returns for a business, and fosters trustworthiness, cooperation, and responsibility. It ensures effective communication. Accountability makes achieving goals easier, ensures cohesiveness and enables the team to take on more responsibility. Accountability is also an ingredient of a just culture, where there is trust, learning, accountability and information sharing.


An SMS focus group fosters accountability, generates trust, instills learning, and is an information sharing tool. Activities in an SMS focus group are ongoing, at a minimum with a new task or problem presented monthly. In a large organization, SMS focus groups should be assigned into functional areas of operations. In a smaller organization, everyone in the group is assigned the same task.


An SMS focus group task might be to analyze comprehensive statements such as the statement below: 


“An SMS is an explicit, comprehensive, and proactive process for managing risks that integrates operations and technical systems with financial and human resource management, for all activities related to operations, maintenance and flight following.


Practically speaking, a SMS is a business-like approach to safety. In keeping with all management systems, a SMS provides for goal setting, planning, and measuring performance. It concerns itself with organizational safety rather than the conventional health and safety at work concerns. An organization's SMS defines how it intends the management of air safety to be conducted as an integral part of their business management activities. A SMS is woven into the fabric of an organization. It becomes part of the culture; the way people do their jobs.


The organizational structures and activities that make up a SMS are found throughout an organization. Every employee in every department contributes to the safety health of the organization. In some departments safety management activity will be more visible than in others, but the system must be integrated into «the way things are done» throughout the establishment. This will be achieved by the implementation and continuing support of a safety program based on a coherent policy, that leads to well designed procedures.” 


An SMS focus group is an approach to conform to regulatory compliance. The approach might be to analyze a comprehensive statement, or a simple analysis of an identified hazard. An SMS focus group is a process to think outside the box and to comprehend the safety management system of an SMS enterprise, and it opens the doors for unlimited opportunities in aviation safety.




Saturday, August 20, 2022

The Practical Applications of SMS

By OffRoadPilots

The Safety Management System (SMS) for aviation is a practical system to lead personnel, manage equipment and validate operational design for improved performance above the safety risk level bar. The bar is always set at an acceptable risk level. This level may vary with size and complexity of an organization, prior experience and accepted practices, risk analysis and justification, or simply by arbitrary lines drawn in the sand. There are no rules within a safety management system for what reasoning to use to establish the level of the safety risk bar. One reason could be that we always did it this way, or another reason could be to do what is least stressful and creates the least workload.

There are regulatory requirements for a safety management system to include a safety policy, a process for setting goals and for measuring the attainment of those goals, a process for identifying hazards and managing the associated risks, a process for training personnel and ensuring that they are competent to perform their duties, a process for reporting and analyzing hazards, incidents and accidents and for taking corrective actions, a document containing all SMS processes and a process for making personnel aware of their responsibilities with respect to them, a quality assurance program, a process for conducting periodic audits, and any additional requirements for the SMS to function as intended. Regulatory requirements are applied to a static operation, where there is no aircraft movement or airside operations. Certificates are issued based on expected performance and maintained based on past performance. A practical application of the SMS comes into play when operations are alive.

If the SMS seems to be overwhelming, with practically unreachable objectives and goals, a simple solution is to move to the Some-Day-Island. On the Some-Day-Island there is no accountability and no responsibility, the island is isolated, there is unlimited safety, it is a perfect place to make excuses, and there are no reasons to get up and work with initiatives. Since the Some-Day-Island is located within a personality and not at an actual geo-location, there are some traits that identify a person living on this island. They would paint a picture of an elephant in the room that is the cause of all their failures. They see all activities to make changes as useless and a waste of time, and believe that there is nothing else to learn, since they already know all that there is to know. A person living on the Some-Day-Island expects their organization, be it an airline or an airport, to be in top-notch condition, but still slumbers on the island hoping that someone else, and often someone they don’t know, will take the initiative to get them into top-notch condition. The Some-Day-Island is a hazardous place to live, but it is also the place where everything is played safe.


A misconception often assumed by the Accountable Executive (AE) is that the primary task of an SMS is to complete the check boxes for regulatory compliance. While it is true that the check boxes must be completed for tracking and data-points purposes, this task is incidental to operational tasks preceding the checkbox entries. The primary task is to complete a task that conforms to regulatory requirements. An effective SMS has established regulatory compliance for each individual task within the system. These tasks are monitored, checkboxes completed and by the end of the day their required daily quality control system is completed.


The practical application of an SMS is its acceptable practices. An acceptable practice may be a written procedure to adhere to, or an unwritten acceptable practice that has taken form over time as a reliable and practical process. Aerial tankers, waterbombers and forest fire air suppression are examples where unwritten and acceptable practices take over and are applied within a safety management system. When SMS takes control, practical application of the SMS becomes paramount. An aerodrome means any area of land, water (including the frozen surface thereof) or other supporting surface used, designed, prepared, equipped or set apart for use either in whole or in part for the arrival, departure, movement or servicing of aircraft and includes any buildings, installations and equipment situated thereon or associated therewith. In layman’s terms, an aerodrome is anywhere an aircraft, including drones, operates.

When a waterbomber scoops up water from a lake, that portion of the lake becomes an aerodrome, and the regulations are applicable to operations. Regulations are not just applicable to the flight crew and air navigation, but also to the aerodrome itself. Aerodrome regulations apply in respect of all aerodromes, which includes any surface of land or water where an aircraft operates. The operator of an aerodrome, other than a water aerodrome, shall install red flags or red cones along the boundary of an unserviceable movement area. By this definition a waterbomber scooping up water may use any portion of a lake to pick up water. On the other hand, a heliport servicing a forest fire area needs to be cleared and delineated.

The airspace above a forest fire is automatically closed and becomes restricted airspace. Only aircraft authorized into the airspace are allowed to enter. Before taking off from, landing at or otherwise operating an aircraft at an aerodrome, the pilot-in-command of the aircraft shall be satisfied that there is no likelihood of collision with another aircraft or a vehicle, and the aerodrome is suitable for the intended operation. When operating out of a lake, there is little or no data available to assess if the aerodrome is suitable. Suitability of the lake becomes an operational, or mental risk analysis task. Prior to the first pickup, the pilot may circle the lake to collect data for the risk analysis. From above any reefs and low grounds are easily identifiable and the task becomes to remember their location to align their approach and runway between the reefs. The second task of the aerodrome risk analysis is to assess both approach and departure ends for snags, hills, or other obstacles. This analysis may be based on prior experience at that particular lake or based on experience from other lakes with similar length and surrounding terrain. A third task of the aerodrome risk analysis is to assess mobile objects, boats, or recreational use of the lake. 


With several waterbombers picking up at the same lake, the pilot in command of each aircraft establishes a procedure to pick up in the same area, make a climbing turn in the same direction after takeoff, and arrive on final for the next pickup in the same order as their previous arrival. When there are operational changes, the first aircraft to change becomes the lead aircraft and the others follow. This is a practical use of the SMS and was established decades before SMS became a regulatory requirement. When looking at SMS as a practical application, the implementation of SMS did not change any operational processes. Any changes were unnecessarily self-imposed by operators.


When operating in an area where acceptable lakes for pickups are few and far between, air tankers are used as the primary tool for forest fire suppression. An air tanker loads up fire retardant at the airport and heads for the fire. These turnarounds could be long and time consuming, but when using large aircraft, such as the DC-10, one load of fire retardant covers a large area. Continuously flying the DC-10 at low altitude is a variation from what the aircraft was originally designed for. The aircraft was designed for long haul transportation at high altitudes. During normal operations there are minimal control movements after takeoff and when established on course. Flying the DC-10 as an airtanker is a special cause variation and requires a root cause analysis of the hazard. A hazard does not imply that the operation is unsafe or dangerous, but that the operation is different from what an airline DC-10 Captain is trained for. In addition, the continuous strain of low-level turbulence and maneuvering is also different from the original certification. In 2002 two airtankers, of personal interest, crashed during firefighting operations due to material fatigue and the strain this type of operation puts on the aircraft and pilots. When practical application of the SMS is applied, special cause variations are analyzed, assessed, and classified by their safety critical area and safety critical function. A practical application of the SMS does not eliminate accidents, since it is impossible to predict an accident until the last minute. What a practical application of the SMS does is to accept that there are inherent risks in aviation, learn from accidents, and apply past experience of how operations went well to build on that knowledge for continuous safety improvements.





Saturday, August 6, 2022

When Safety Gets Involved

By OffRoadPilots

Safety has been involved in aviation since the first flight in 1903, and since then safety results were able to improve randomly, but without direction. Airlines did what they could to improve safety but were unsuccessful in the total elimination of accidents. Over time, as aircraft became larger and there were more of them at the airports, airside accidents became systemic errors. When operators become overly focused on safety, but do not know what to do with it, then safety has become its own worst enemy. No one wants to expose themselves to danger, but the real hazard when overestimating risks is overcontrolling processes to remain safe.

Overcontrolling safety is a common reaction to opinion-based root cause analyses. When a root cause is based on preliminary assumptions, there is a strong temptation to overcontrol safety to ensure, in one's own mind, that everything possible was done for immediate safety improvements. After a severe aircraft occurrence everyone wants answers, but impatience and the desire for instant gratification in finding out why an aircraft crashed are a hazard to aviation safety. When the accident investigation process is not understood, management and others in an organization who have been assigned safety oversight may demand solutions right now, without knowing all the facts. In support of their demand for a solution, they could make irrational statements with reference to safety, and place blame and responsibility on lower-level personnel for lack of safety management. A simple solution to protect a high-level position is to play the safety card. The safety card is played when safety becomes the driving force of operations without considering facts. In addition, the safety card is often played when safety is not defined or measured, or when operational pressure is applied from a third party or social media.


There are no reasons for an immediate finding, cause of accident, or root cause after a single engine aircraft crash, or a large airliner crash. One crash does not render the aviation industry unsafe or demand major changes to operations. If an aircraft crashes due to an unreported and unidentified wind shear with an extreme change in wind velocity, or an aircraft crashes due to contaminated surfaces, there is no justification to cease all aircraft operations, since the aviation industry has already established a track record for being safe. That an investigation is ongoing, and the cause of the crash and root cause are still to be determined, does not imply that an airline must remain idle until an investigation report is published. However, an enterprise is compelled by its accountability to the safety management system to conduct an internal analysis of human factors, organizational factors, supervision factors and environmental factors to determine the factor with the highest probable impact on events leading up to the crash. An internal analysis, prior to the final accident report, is a probability analysis as opposed to a root cause analysis.


There are multiple phases to an aircraft accident investigation. The most common phases are the field phase, the examination and analysis phase, and the report phase. 


In the field phase, an investigator in charge is appointed and an investigation team is formed. The nature of the occurrence determines the makeup of the investigation team, but it can comprise operations, equipment, maintenance, engineering, scientific, and human performance experts. The number of investigators needed depends on the nature of the event, its severity and the composition of parties involved. During the field phase the public is informed, the crash site is secured, and pictures or videos are taken of the wreckage and crash site. Overhead drones are a commonly used tool to document facts. Initially, witnesses, airport personnel, company personnel or government personnel are interviewed. Accepting to be interviewed is voluntary, and information learned from interviews is not used for disciplinary actions against pilots, mechanics or other personnel involved. After the initial facts are documented, the wreckage is removed for further examination by the investigative authorities. The regulator does not investigate aircraft accidents.

The examination and analysis phase takes place away from the accident site. This phase consists of examining the company, aircraft, flight crew, training records, maintenance records and safety management system records. SMS is relatively new in the aviation industry but becomes a vital part of an aircraft accident investigation for analyzing applied processes. Parts and components of the wreckage may be sent to a laboratory for analysis, such as material strength and metallurgical analyses and both destructive and non-destructive testing. All possibilities, including unthinkable options, are analyzed. The examination and analysis phase is an unbiased process without predetermined conclusions. This phase also consists of reading and analyzing recorders and other data, creating simulations, reconstructing events, reviewing autopsy and toxicology reports, conducting further interviews, determining the sequence of events, identifying safety deficiencies, and updating interested parties on progress in the ongoing investigation. If, at any stage of the investigation, the investigator identifies safety deficiencies, they may inform those who can address the problem right away.

The final phase of an investigation is the report phase, where the investigation report is drafted. Selected members of a committee review the draft report and may approve it, ask for amendments, or return it to the investigators for further work. A report may be rejected for any reason but may also be approved for any reason. Once there is consensus on the draft report, it is sent to designated reviewers on a confidential basis for comment. A designated reviewer may be any person at an air carrier, airport, corporation, manufacturer, or association who, in the opinion of the review committee, will contribute to the completeness and accuracy of the report. After this review and its comments, the report is amended as required, and the review committee approves the report for release to affected parties. For a single-engine aircraft crash, this reporting process may take 9-12 months to complete, while for a large airline crash it may take 3-5 years. Since a report is final and conclusive, any evidence, documents and records are destroyed. 


Overcontrolling safety, or when safety gets involved, is to fall into the instant gratification trap and conclude with a root cause before the facts are known. Prior to SMS, the safety manager had all powers, and a root cause statement that included the word “safety” was accepted as fact. With the implementation of a safety management system, safety was no longer a matter of verbal statements, but an intelligent system in which a process was allowed to mature before changes were made to, or controls placed on, a specific item identified. 

A safety management system without statistical process control (SPC) analysis capability is still operating in the pre-SMS era. It is crucial for the validity of an SMS to understand the difference between a process that is in statistical control (stable) and a process that is out of control (unstable). All processes have variation. A common cause variation is a variation in the process that is required for the process to function as designed, or to function within the laws of nature or the laws of physics. A special cause variation is a variation introduced to a process that is not required for the process to function as designed. The migratory bird season is a common cause variation: it is required for the process to work and causes more bird activity around airports in the spring and fall. A flat tire when driving to work is a special cause variation, since it is not a variation required for the process of traveling to work.

If there were more aviation incidents this month than last month, a question to ask is: what happened? This is a common question in many organizations today, but many do not know how to answer it. A major barrier to the use of control charts is that SMS enterprises do not understand the information contained in variation. When they understand this information, they will realize that the type of action required to reduce special cause variation is totally different from the type of action required to reduce common cause variation. Control charts also help SMS enterprises understand why costs decrease as quality improves, and why pointing fault and blame at personnel is totally wrong.
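
As a minimal sketch of how a control chart can answer the "what happened?" question, the example below assumes monthly incident counts are tracked on a c-chart, a standard SPC chart for counts with three-sigma limits at the mean plus or minus three times the square root of the mean. The chart type, the function names and the monthly counts are all assumptions made for illustration; none of them come from an actual SMS enterprise.

```python
import math


def c_chart_limits(counts):
    """Return (center, lcl, ucl) for a c-chart of event counts per period."""
    center = sum(counts) / len(counts)
    spread = 3 * math.sqrt(center)
    # A count cannot be negative, so the lower limit is floored at zero.
    return center, max(0.0, center - spread), center + spread


def classify(counts, baseline):
    """Label each count 'special' if outside the baseline limits, else 'common'."""
    _, lcl, ucl = c_chart_limits(baseline)
    return ["special" if c < lcl or c > ucl else "common" for c in counts]


# Hypothetical monthly incident counts for one year, for illustration only.
baseline = [4, 6, 5, 3, 7, 5, 4, 6, 5, 5, 4, 6]
center, lcl, ucl = c_chart_limits(baseline)

# Two hypothetical new months: one slightly high, one far above the limit.
labels = classify([7, 13], baseline)
print(f"center={center:.1f} lcl={lcl:.1f} ucl={ucl:.1f} labels={labels}")
```

With this made-up baseline the center line is 5.0 and the upper control limit is about 11.7. A month with 7 incidents is higher than average but still inside the limits, so it is treated as common cause variation to be managed. A month with 13 incidents is outside the limits and is a signal of a special cause worth investigating.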


Generally speaking, there are two types of mistakes when looking at data. One mistake is to assume that a data point is due to a special cause when in fact it is due to a common cause, and the second is to assume that a data point is due to a common cause when in fact it is due to a special cause. The corrective action plans for a special cause variation and a common cause variation are different. A special cause variation needs to be removed, while a common cause variation is to be managed within a safety management system.


“When Safety Gets Involved” is when safety makes corrections to, or eliminates a variation in, a stable process, or when safety makes overcontrolling the only acceptable procedure. Simplified, when overcontrolling, or a desire for instant gratification in safety, is happening, the next control point has moved, and will continue to move farther and farther away from the issue until there is a total and unexpected failure. It is crucial for the success of an SMS to know what battles to fight, and determining a root cause of a common cause variation is not one of them. 





Saturday, July 23, 2022

Unlock the Secrets of SMS

By OffRoadPilots

The secrets of the SMS are not all positive; some are pitfalls to watch for, secretly hidden in processes. On the surface, a safety management system (SMS) in aviation is a perfect system to avoid or eliminate accidents. When SMS was first sold to the aviation industry, it was sold as a tool that, if an operator followed the rules, would reduce the number of accidents. The sales philosophy was that even if an operator had not had any major accidents, they would for sure have one in the future if they did not implement the safety management system. The intentions behind the sales pitch were honorable, and the regulator firmly believed in what they sold to operators. Virtual and fictional accident cost scenarios were developed to prove how beneficial an SMS would be in the future. The prediction was that the cost of implementing and maintaining a safety management system would become less significant, and well worth the investment, when contrasted with the cost of doing nothing. Every operator, whether airline or airport, fell for the sales pitch, until they discovered that accidents and incidents, including regulatory findings, still happened. 

The SMS in itself cannot fail, since it paints a true picture of an enterprise. What could fail is accountability by enterprise leaders, managers, and personnel. Safety in aviation was in the past built on failures, and failures, or accidents, were expected to occur so that new safety improvements could be put in place. Continuous improvements to safety were designed and implemented after failures and major accidents. What a functional SMS does is move these failures from physical harm and accidents to virtual failures on the drawing board. When drawing board tests fail, the aviation industry saves the world from a whole lot of grief. SMS does not prevent aviation accidents; it just moves accidents from operations to the drawing board. A secret of the SMS is that flaws discovered in the design or application process may be assigned an opinion that minimizes the finding, and a short-cut may be designed into the process to bypass the failure.  

The safety management system has generated a vast gap between design and operations. This is not due to the design of the SMS itself but to output expectations and application in the organizational hierarchy. When the SMS is viewed as a tool to interfere with operations to ensure safety, it is brought down from the oversight and quality assurance level to the operational level, where it does not belong. A system that is implemented where it does not belong is a failed system to the untrained eye. While it appears that the SMS failed, it is not the system that failed, but the implementation process. There is a time and place for everything. This statement is crucial for a safety management system to survive, and it is vital for the success of an SMS enterprise that an accountable executive comprehends time and place in processes. A secret of the SMS is that it paints a true picture of an organization, and when it is brought down to an operational level, it paints a picture of failed management as opposed to a failed SMS. An SMS is at the incorrect level when findings are prescriptive and assigned to deviations in process outcome rather than to the process itself. 

An SMS integrated at the operations level is recognized by the strain it places on operations to behave in a manner incompatible with the operations themselves. A practical example is the flight operational quality assurance system, or FOQA, that most airlines have integrated into their operations. One event trigger may be a bank angle greater than 30 degrees in flight; when that occurs, the captain is assigned a red flag and becomes a higher risk to the operations. In a strong crosswind, turning base to final on a visual approach may require a 35-degree bank to remain on the approach, which leaves the captain two options. Due to the red-flag policy, a captain may choose to remain at 30 degrees of bank and overshoot the localizer. If the bank is 35 degrees or greater, the captain will at a minimum be questioned and flagged. When placed at the incorrect level in the organizational hierarchy, the system does not fulfil its expectations for improvement toward perfectionism in flying. A secret of the SMS, when management fails a captain, is that the system has identified complacency in operational management, which opted to comply with a faulty arbitrary rule rather than make operational decisions based on aeronautical science and aerodynamics, which deals with the motion of air and the way it interacts with objects in motion, such as an aircraft. Attempting to achieve perfectionism in aviation is in itself a hazard to aviation safety.  
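
To make the bank-angle trigger concrete, here is a small sketch of how such an event might be detected in recorded flight data. It assumes bank angle is available as a plain list of samples in degrees and uses the 30-degree trigger from the example above. The `BankEvent` record, the function name and the sample values are hypothetical and do not represent any real FOQA product or airline policy.

```python
from dataclasses import dataclass

BANK_LIMIT_DEG = 30.0  # trigger value from the example in the text


@dataclass
class BankEvent:
    start_index: int     # first sample above the limit
    end_index: int       # last sample above the limit
    max_bank_deg: float  # largest bank magnitude during the event


def find_bank_events(bank_deg, limit=BANK_LIMIT_DEG):
    """Group consecutive samples whose bank magnitude exceeds the limit."""
    events = []
    start = None
    peak = 0.0
    for i, angle in enumerate(bank_deg):
        magnitude = abs(angle)  # left and right bank are treated alike
        if magnitude > limit:
            if start is None:
                start, peak = i, magnitude
            else:
                peak = max(peak, magnitude)
        elif start is not None:
            events.append(BankEvent(start, i - 1, peak))
            start = None
    if start is not None:  # event still open at the end of the data
        events.append(BankEvent(start, len(bank_deg) - 1, peak))
    return events


# Hypothetical bank-angle samples from a base-to-final turn, in degrees.
samples = [5.0, 18.0, 29.0, 33.0, 35.0, 31.0, 22.0, 8.0]
events = find_bank_events(samples)
print(events)
```

The sketch records only what happened: one event with a peak of 35 degrees. It attaches nothing to the captain. Whether the exceedance points to the crew or to a trigger value that does not fit the operation is a question for oversight of the process, which is the level where the article argues the SMS belongs.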

A safety management system is in principle a simple tool to provide for understanding and engaging in safe working practices. A safety management system defines what hazards exist in aviation, whether at airports or airlines, and is established as an oversight tool to identify whether operational processes are in control, and to assess hazards with a reliable track record of causing an incident or accident. The weather is a factor with a major impact on aviation safety. Weather is not just hazardous weather, but also a factor in achieving safety in operations. Icing and strong winds are, rightfully so, often associated with catastrophic accidents. However, by comprehending that these hazards are common cause variations, and will always be a part of weather system processes, aircraft manufacturers, airport operators and airlines have a golden opportunity to place a safety management system where it belongs, at the organizational level of the accountable executive, applied as a daily quality control and oversight tool. As a reminder, the role of an accountable executive is to be responsible for operations or activities authorized under the certificate and accountable on behalf of the certificate holder, or board of directors, for meeting the requirements of all applicable regulations. 


An SMS is designed to evaluate process expectations in operations. An aircraft on approach may experience a steady crosswind, but just as the aircraft touches down, the wind velocity suddenly increases to triple strength and the aircraft crashes. A well-designed SMS is designed to capture the moment a common cause variation, such as a steady wind, turns into a special cause variation, such as a sudden tripling of wind speed. At an airport applying its SMS, a special cause variation, such as approaching extreme winds, may be identified at an earlier point than in the current common process, and be tracked over an extended distance before it reaches the runway. Accidents occur when an aircraft and a special cause variation converge at the fork in the road. An aircraft operates in a 3D environment measured in time (hours-minutes-seconds), space (geographical location) and compass (direction). Accidents occur when a special cause variation, also measured in 3D, and an aircraft converge anywhere in that 3D environment. Another example of a common cause variation is snow removal in Canada. An airport operator is not required to publish in the aeronautical publications that they operate with a winter operations plan, since it is assumed, due to a common cause variation, that all airport operators will have a winter and snow removal plan. A special cause variation occurs when snow accumulation is extremely low. Parked machinery becomes fragile to the environment and professional skills are forgotten. This requires airport operators to substitute training of their crews for actual snow removal activities. A secret of the SMS is that it uncovers special cause variations of hazards within a process. 


The true secret of an SMS is that the system comes with the capability to find a needle in the haystack when applied at the correct level in the hierarchy.  



