Why Long-Term CAPs Crash
When the Safety Management System (SMS) regulations came into force, there was little or no guidance material available for designing useful long-term corrective actions to findings. Long-term corrective actions were defined by how long they would take to implement. Short-term corrective actions were also defined by the time between design and implementation, but they did not change with the SMS, since the fix or repair required to return to normal operations was already in place. If an aircraft engine failed, the short-term corrective action was to replace it with an engine that had not failed yet.
Paved roads are a long-term CAP. There is no need to change the road after an accident.
A long-term corrective action is a system-level change, and there are seven levels to a long-term corrective action. The first level is discovery, either by hazard identification, audit finding, or an unplanned event occurrence. The second level is the immediate corrective action, which is an immediate reaction to a hazard, finding, or event to establish a degree of supervision and operational management. The third level is the short-term corrective action, or the repair to return to normal operations. The fourth level is the root cause analysis, which is new with the introduction of SMS. The fifth level, and another new element of the safety management system, is the long-term corrective action, or the system repair by continuous or continual changes. For the purpose of a safety management system, a continuous change is a change to the current system, while a continual change is a change of the system itself. A continuous change could be to move from hand-written paper copies to typewritten copies, while a continual change would be to change the system from a paper-document system to an electronic system. At this point in the process, it is unknown whether an implemented change is an improvement or a deterioration of the system. The sixth level, which is crucial to the success of the SMS, is to define the expected outcome of a long-term system change. The seventh level of a corrective action plan is the analysis of expected outcomes, comparing expectations with the actual outcome. Root cause analysis and long-term corrective action are not new to the aviation industry, but they became new elements for operators to consider, since prior to SMS they were only considered by accident investigators and regulators.
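The seven levels described above can be laid out as an ordered sequence. The following is only a minimal sketch: the names `CapLevel`, `NEW_WITH_SMS`, and `next_level` are hypothetical labels chosen for illustration and do not come from any regulation or SMS manual.

```python
from enum import IntEnum


class CapLevel(IntEnum):
    """The seven levels of a long-term corrective action (names are illustrative)."""
    DISCOVERY = 1            # hazard identification, audit finding, or unplanned event
    IMMEDIATE_ACTION = 2     # immediate reaction to establish supervision and control
    SHORT_TERM_ACTION = 3    # repair to return to normal operations
    ROOT_CAUSE_ANALYSIS = 4  # new to operators with SMS
    LONG_TERM_ACTION = 5     # system repair by continuous or continual change
    EXPECTED_OUTCOME = 6     # define what the system change is expected to achieve
    OUTCOME_ANALYSIS = 7     # compare expectations with the actual outcome


# Levels that became new elements for operators when SMS was introduced.
NEW_WITH_SMS = {CapLevel.ROOT_CAUSE_ANALYSIS, CapLevel.LONG_TERM_ACTION}


def next_level(current):
    """Return the level that follows `current`, or None after the last level."""
    if current is CapLevel.OUTCOME_ANALYSIS:
        return None
    return CapLevel(current + 1)
```

Treating the levels as an ordered sequence makes the point that a CAP is not finished at level five: the expected outcome still has to be defined and then compared with what actually happened.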
Not all parts, or systems, of a car are changed out if one of the systems fails.
Systems are interdependent processes to achieve a defined result, comprising policies, processes, procedures, and acceptable work practices. A system is the cause or expected outcome, and conditions are the task requirements triggered by the system design. A system could be the document and records system, where an expected outcome is to generate data for an SMS enterprise to design, develop, and implement action plans. A process defines the 5-W’s + How (What, When, Where, Who (position), Why, and How) to complete a task. A process could be to collect data for flight planning. A procedure is the set of tasks, sequence, and timing of steps required to complete a process. A procedure could be the specific tasks, sequence, and timing of steps to control an engine failure. Acceptable work practices are practices accepted by an SMS enterprise, since it is impossible to have a procedure for everything. An acceptable work practice could be a person’s operational judgement, such as the decision to land an aircraft or initiate a go-around.
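The 5-W’s + How definition of a process can be sketched as a simple record with a completeness check. This is an illustration only: `ProcessDefinition` and `missing_elements` are hypothetical names, and the flight-planning values below are made-up placeholders.

```python
from dataclasses import dataclass, fields


@dataclass
class ProcessDefinition:
    """A process described by the 5-W's + How (illustrative structure)."""
    what: str = ""
    when: str = ""
    where: str = ""
    who: str = ""   # a position, not a named person
    why: str = ""
    how: str = ""


def missing_elements(process):
    """List the 5-W's + How elements that have not been defined yet."""
    return [f.name for f in fields(process)
            if not getattr(process, f.name).strip()]


# Hypothetical, partly defined process: collecting data for flight planning.
flight_planning = ProcessDefinition(
    what="Collect data for flight planning",
    who="Dispatcher",
    why="Support the flight plan",
)
```

Calling `missing_elements(flight_planning)` lists the elements that still need to be defined before the process is complete.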
Long-term corrective actions are highly influenced by the Accountable Executive (AE) and their opinion of the best approach to achieve their goals. The AE is often the CEO of the company, who has a successful track record in business administration and who, without being a data analysis expert, is still the final decision maker for safety in operations.
April 28 was World Day for Safety and Health at Work, recognized around the world to draw attention to the estimated 317 million accidents that take place on the job each year across all industries. A common safety statement is to keep safety above all as the priority, meaning that an AE will never sacrifice safety for any other purpose. This is a well-intended statement, but without safety analytical expertise it falls apart when it goes on to say that a safety approach is common sense and simple: never sacrifice safety rules, policies, or procedures for any other goal; always adhere to rules, policies, and best practices for ensuring quality service; and report any incidents that negatively impact the safety of team members. If safety were common sense and as simple as adhering to rules and policies, there would not be any incidents to report. The pilots in the 1956 Grand Canyon crash followed the rules.
Long-term CAPs go wrong because they are not long-term CAPs. They are corrective action plans that take a long time to complete, but the effect of the CAP is still a short-term fix or repair. Long-term CAPs are system CAPs. Systems are not as complicated as we often make them, and by making them complicated, CAPs often go wrong. The regulator has shown a tendency not to comprehend long-term system CAPs. This became evident to an operator who submitted a comprehensive long-term CAP for regulatory findings. The regulator rejected the CAP, reasoning that it was too comprehensive, too complex, and too detailed; it was irrelevant to the regulator that the outcome was a simple long-term system CAP. The regulator’s long-term CAP form is no larger than a 3x6 index card. It takes more time to plan a project than it takes to build it. Designing long-term CAPs is operational project planning.
A long-term CAP is incremental improvements within a system.
A long-term corrective action plan is meant to provide long-term solutions that correct the problems in the system that led to the unexpected event. An unreasonable expectation is that a long-term CAP ensures that this type of event will never happen again. There are no unreasonable expectations or goals; there are only unreasonable timelines. With an expectation that the same event will never happen again, the timeline of “never” is infinite. An expectation of “never” is an unreasonable timeline, since an event that has occurred will occur again at a later date. History repeats itself. An unreasonable timeline is one reason why a CAP goes wrong. A second unreasonable expectation in a long-term CAP is that all contributing causes and associated systems are corrected. An associated system in a birdstrike event includes birds. Some of the birds have a system called the migratory bird season. This is a common cause variation system, which is a requirement for their system to work, and it is impossible for anyone to correct that system. An unreasonable expectation is a second reason why a CAP goes wrong. When tasked with these two requirements, to ensure that an event will never happen again and to change a common cause variation, the trap that operators fall into, both airlines and airports, large and small, is to design their CAP to include these items for one reason only: to complete the checkbox task and conform to an expectation of what it takes for regulatory compliance. Since regulatory compliance describes operations in a static state, it is possible to comply by ensuring there are no aircraft movements. However, this is not how the real world works, and the purpose of an airport is aircraft movements. When movements are happening, that is when the regulatory compliance gap comes into play.
The purpose of a long-term CAP is not to do a root cause analysis and change operations so that an unexpected event never happens again. Making long-term CAP project plans means designing, developing, and operating with safety cases and internal operations plans. When you have these plans in place, the only change, or long-term CAP, that is needed is to make short-term changes to the plans for incremental safety improvements. Take a minute and assess a gravel runway. There are still airports out there that offer gravel runway services only. An airport operator makes a safety case for a gravel runway. Based on the safety case, they make a gravel operations plan. Their long-term CAP is now the operations plan itself. In the plan, they grade the runway once a month. Then one day there is a runway excursion because of large ruts in the runway. Their long-term CAP fix is now in their operations plan, and the fix is to change grading of the runway to every two weeks and after heavy rain. This is literally how simple a long-term CAP is when an operator comes prepared for it with safety cases and operations plans. The reason long-term CAPs crash is that they are designed to crash.
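The gravel runway example can be sketched as an operations plan with one adjustable rule. This is a rough illustration under stated assumptions: `GravelRunwayPlan` and `grading_due` are hypothetical names, and "once a month" is taken as 30 days.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GravelRunwayPlan:
    """The part of a gravel operations plan that covers runway grading (illustrative)."""
    grading_interval_days: int
    grade_after_heavy_rain: bool = False


def grading_due(plan, days_since_grading, heavy_rain_since_grading=False):
    """Return True if the plan calls for the runway to be graded now."""
    if plan.grade_after_heavy_rain and heavy_rain_since_grading:
        return True
    return days_since_grading >= plan.grading_interval_days


# Original plan: grade once a month (assumed here to mean every 30 days).
original_plan = GravelRunwayPlan(grading_interval_days=30)

# Long-term CAP after the excursion: an incremental change to the same plan.
revised_plan = replace(original_plan,
                       grading_interval_days=14,
                       grade_after_heavy_rain=True)
```

The system itself, the operations plan, stays in place; the long-term CAP is only the change from `original_plan` to `revised_plan`.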