INTRODUCTION
When a failure occurs, there is a tendency to blame an external environmental culprit (often the ground) and then progress through an aggressive adversarial dispute to determine where the fault lies.
Blaming the physical world around us does not align with general experience of why failures occur, which repeatedly shows that it is people who make the mistakes – it is simply part of our human nature. Our culture also needs to change if we are to take real advantage of learning from failures, rather than burying causes in confidentiality agreements and legal bureaucracy. As we already know, we learn more from our mistakes than from our successes, but our systems do not support this adage and we are therefore missing important learning opportunities.
WHAT IS A FAILURE OR AN INCIDENT?
For the purposes of the discussion, it was considered that the following definition could be used to describe an incident or failure:
An unacceptable difference between the expected and the observed performance, or a failure to follow a process or to meet a required result.
The key words here were considered to be ‘difference’ and ‘meet required result’; in other words, things did not turn out as expected. This opens up a very wide range of possible events that could be considered a failure – from minor water leakage to catastrophic structural collapse and a fatality. The majority of the readily available project data on failures concerns incidents that involved some form of collapse, had a significant impact on cost and programme, or significantly affected third parties. This is not surprising, because these are the incidents that attract news coverage and capture the public imagination. Although the discussion was therefore based upon this type of incident, it should be remembered that ‘failure’ can also apply to the less newsworthy incidents that affect projects on a more regular basis.
Not all incidents or failures get recorded, or at least do not get reported beyond the confines of the project. Some of the more obvious reasons for this include:
- Too minor an impact.
- Unpredicted event, but no impact upon the project.
- Confidentiality agreements limiting open discussion.
- An issue developing over the longer term, outside of any contractual requirements.
In addition, the combined effects of some projects’ remoteness and of all the construction activity undertaken before the development of the internet or social media are likely to have limited the spread of news in many cases. This limits the pool of data to incidents that have been reported widely through the press or at conferences, but it should not restrict the consideration of general causes to just that group.
Unfortunately, we can also add to the above list the fact that a fatality, if not the result of another cause affecting construction, has not historically been measured as a failure.
THE DATA
The main existing and publicly available sources of data on failures are listed in the references 1, 2 and 3. References 1 and 2 concentrate more upon the New Austrian Tunnelling Method (NATM) and Sprayed Concrete Lined (SCL) tunnels, although some Tunnel Boring Machine (TBM) driven tunnels are also included. The majority of these incidents are related to some form of collapse. Reference 3 includes the projects from references 1 and 2 along with additional projects and details where these were found to be available. Reference 3 was published on the internet by the Civil Engineering and Development Department of Hong Kong, and is a fairly comprehensive list covering major incidents reported between 1964 and 2014.
The author first analysed this data in 2015 for a project talk, and a year later a paper at the World Tunnelling Conference in San Francisco (see reference 4) summarised and grouped the failures in a similar manner. The published results in reference 4 were similar to those the author had established, although there were some minor differences in the precise naming of the failure categories and in a few of the numbers, some of which can be explained by how each party interpreted and manipulated the data.
QUESTION THE ACCURACY OF THE DATA
The accuracy of the data, and the way that it has been manipulated for presentation purposes, should always be considered. For transparency, some of the reasons why the data and the assessed root causes of an incident may be questioned include:
- Not all incidents are recorded, as already noted, and therefore the data may be skewed.
- The original search criteria may have failed to identify some incidents.
- Reporting bias on the root cause – the ease of placing the blame against the most obvious cause, rather than forensically investigating the cause.
- Assessment bias on understanding of the implied root cause.
It was also considered necessary to be clear about how the data had been manipulated in order to present the results. For clarity, the following manipulation was undertaken for this interpretation:
- Some of the data had been ignored to avoid skewing the results. For example, one project recorded 131 incidents, but this was assessed as a single incident; otherwise it would have overwhelmed all of the other data. Where multiple incidents had occurred on a single project and the details made clear that they were separate, they were counted as separate incidents. Where multiple incidents had occurred and separate details had not been provided, they were assessed as one occurrence along with the associated cause(s).
- Reporting of assessed results was based upon percentages, both here and within reference 4. Here, this is the percentage of mentions of a cause rather than the number of incidents, which may obscure some issues, because some incidents reported multiple contributing causes. For example, within the list of projects ‘the ground’ accounted for 42% of all reported cause mentions, but it was a contributing cause in 65% of the incidents (the distinction is illustrated in the sketch below).
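As a minimal sketch of this distinction – using invented incident records purely for illustration, not the actual dataset analysed here – the difference between the two percentages can be computed as follows:

```python
# Minimal sketch: 'share of cause mentions' versus 'share of incidents'.
# The incident records below are invented for illustration only.
from collections import Counter

incidents = [
    {"project": "A", "causes": ["ground", "design"]},
    {"project": "B", "causes": ["ground"]},
    {"project": "C", "causes": ["workmanship", "design"]},
    {"project": "D", "causes": ["ground", "workmanship"]},
]

# Percentage of all reported cause mentions attributed to 'the ground'
mentions = Counter(cause for record in incidents for cause in record["causes"])
pct_of_mentions = 100 * mentions["ground"] / sum(mentions.values())

# Percentage of incidents in which 'the ground' was a contributing cause
with_ground = sum(1 for record in incidents if "ground" in record["causes"])
pct_of_incidents = 100 * with_ground / len(incidents)

print(f"Ground as share of cause mentions: {pct_of_mentions:.0f}%")  # ~43%
print(f"Ground as share of incidents:      {pct_of_incidents:.0f}%")  # 75%
```

Because an incident can cite several causes, the second figure will generally be the larger of the two, which is how 42% of cause mentions can correspond to 65% of incidents.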
RESULTS OF ASSESSMENT OF DATA
The ground and ground stability stand out as the most blamed cause of a failure. The author’s original interpretation of the data resulted in 42% of the failures being blamed on the ground, while reference 4 resulted in 45%. The author’s concern was that ‘design’ and ‘human influence’ appeared significantly less often than anticipated based upon experience. The data was therefore examined in a different manner, deliberately looking for references to, or contributions from, human decisions or influence. The resulting modified interpretation is shown in figure 1. This assessment, although a little subjective, reduced the blame on the ground to 29% and introduced human error at 42%. It was noted that more than the established 42% might be blamed on human error, but there was no evidence within the details available to support this.
As a check on this result, information from the insurance industry was evaluated. Reference 14 provided additional information, which is summarised in figure 2. This places the blame on design error in 41% of the incidents, but with the exception of force majeure all of the other noted causes may also have had a contribution from human decision making. It was therefore considered that a much higher percentage could be attributed to human error, probably greater than 60% of failures. This coincided more with experience, but was it realistic?
ARE HUMANS THE CAUSE OF INCIDENTS?
The logic of the way we undertake our work could be considered as follows:
- The laws of physics will apply.
- Materials follow the laws of physics and are predictable.
- Loads and forces are largely predictable, with a few exceptions.
- Designs are undertaken to tried and tested codes and standards and include safety factors applied to the loads and to conservatively assessed material properties (a generic check of this kind is sketched after this list).
- The design and the construction methodologies we use are generally tried and tested.
- The ground we deal with as tunnellers has been in place for millions of years and, unless previously disturbed, will not behave differently today to how it would have behaved many years ago, or sometime in the future.
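To illustrate the safety-factor point above, a generic partial-factor (limit-state) check can be written as follows; the symbols and factor values are typical illustrations rather than a quotation from any particular design code:

```latex
% Generic limit-state check: factored action effects must not exceed the
% factored resistance (symbols and factor values are illustrative only).
\[
  E_d \;=\; \gamma_G\, G_k + \gamma_Q\, Q_k
  \;\le\;
  R_d \;=\; \frac{R_k}{\gamma_M}
\]
% Example: with characteristic permanent and variable actions
% G_k = 100 kN and Q_k = 50 kN, and typical factors
% gamma_G = 1.35 and gamma_Q = 1.50, the design action effect is
%   E_d = 1.35 * 100 + 1.50 * 50 = 210 kN,
% which must not exceed the conservatively assessed design resistance
% R_d = R_k / gamma_M.
```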
As a result of these considerations, all we really appear to be left with in the majority of cases when an incident occurs is human error, within which we might include incompetence, carelessness, negligence, misunderstanding, poor communication, poor training and others. Why does this happen?
WHY HUMANS MAKE MISTAKES
Our minds analyse and interpret huge amounts of data from the outside world apparently instantaneously, monitoring and controlling our bodies without conscious thought. We learn new skills that are eventually undertaken with the minimum of conscious effort. Our minds are inventive, creative, empathetic, imaginative and flexible. But they are also messy, disorganised, illogical, confusing, sometimes contradictory, and lazy. The mind is the most complex and least understood part of our whole bodies. And every single one of them works differently.
Our minds are also littered with operational biases (a predisposition to favour a line of thought) and heuristics (short-cuts in thought processes) that affect the way we think and make decisions. There is a subtle difference between a bias and a heuristic, but both refer to processes in thought such as confirmation bias, optimism bias, substitution, availability, representativeness and hindsight bias, rather than the larger issues of gender, religion, ethnicity or age bias. For those who believe these apply to others but not to themselves, there is even a psychological mechanism for that as well – the third-person effect.
As a result of these survival-related thinking processes, our normal decision making is highly efficient and usually very effective, but mistakes still happen, and they can be characterised as deliberate or inadvertent actions (see figure 3). Within the ‘deliberate’ category – ignoring greed or sabotage, which have a much larger potential to contribute to major incidents – the reasons people give when caught in error and questioned include:
- Low potential for detection.
- Peer pressure, competition, or conflicting requirements.
- Being unaware of the consequences.
- Precedent.
Although sometimes present, these are not encountered too often in our industry. Of more relevance are the reasons for ‘inadvertent’ mistakes, which may be considered as either accidental or systemic. Almost anything random can be placed within the accidental errors category – for example, a numerical mistake. The factors that might contribute to making an accidental mistake include:
- Distraction.
- Fatigue.
- Stress.
- Even just how the person is feeling, or the general mood at that time.
Systemic errors are more pervasive and are often the result of organisational or system deficiencies. Factors that allow systemic errors to creep in might include:
- Complexity and distraction – in this case possibly a result of insufficient staff.
- Lack of training to undertake ever more technically complex or new tasks.
- Organisational constraints – such as confusion over limits of responsibility.
Unfortunately, a person’s level of intelligence does not necessarily make them less prone to making mistakes, and some research even suggests that it might make them more prone in some circumstances. We need to be more aware of how our minds operate, and to better understand the internal and external influences that cause people to make mistakes, both to help reduce errors and to understand them better when they occur.
THE CONSEQUENCES OF AN INCIDENT
The most obvious immediate consequences that impact a project following an incident are that there is something to clean up, a problem to solve and an issue to rectify.
- This takes time, incurs cost and uses up resources – not only for the correction or repair itself, with its labour and materials, but also through the potential loss of revenue as a result of a delay.
- The error may introduce a functional weakness, which, even if accepted by the project, might mean that the system is operating at a sub-optimum level. This could increase frustration for the user and the owner, increase operational and maintenance costs, and even unintentionally compromise safety.
- There could be a loss of confidence in the project, even if just in the short term, and reputational damage could potentially impact everyone involved.
Disruption will be compounded when the hunt for the party at fault begins. In many, but not all, cases this can lead to finger pointing and long battles in which every side employs its own teams of lawyers, solicitors and technical experts within an adversarial process to apportion blame.
AN ADVERSARIAL PROCESS
At its worst, our adversarial process follows the rationalisation of finding and denigrating the persons at fault (losers) and sanctifying everyone else (winners). Winners and losers can be determined by the strongest, or possibly in some cases the most expensive, arguer. It has been shown, for example, that in a trial the jury is more convinced by the person who sounds most confident than by the facts themselves. And who is to say that this doesn’t affect more than just jurors?
The process to attribute blame can be a major distraction from other work and it doesn’t necessarily achieve unequivocal acceptance from all parties as to the fairness of the result. Everyone becomes entrenched in their own opinion, takes up defensive positions, and there are numerous engineers, lawyers and solicitors for each party ratcheting up the cost.
The adversarial blame process often ends with results being confidential, judgements being slanted by contractual rather than technical arguments, or even the persecution of individuals for only-too-human mistakes. We seem to have created an adversarial process that thrives on stances of moral superiority. We also appear to have done this deliberately.
Perhaps it is time to adopt a different approach to achieve more open results and enhance the learning opportunities.
A ‘NO-BLAME’ PROCESS
This type of culture is already supported by various bodies, including in the aviation and maritime industries, and it is also the process behind the Rail Accident Investigation Branch. In the aviation industry it is generally known as a ‘Just Culture’, and the basis is that it does not place blame where honest mistakes within someone’s training and experience have occurred.
A Just Culture approach concentrates on the facts, provides a more open learning opportunity, and is more likely to disseminate the results to those who might potentially make similar mistakes. It does not focus on who to blame and therefore avoids defensive positioning and accusations.
This is not an easy option:
- It requires a conscious decision for adoption at the start of the project and high levels of discipline to accept the findings if invoked.
- It requires cooperation and a shared focus amongst all parties towards a resolution.
- Information must be shared without fear that it may be used against the originator.
If instigated, there are still grey areas: subjective decisions might need to be made concerning what should have been within a person’s experience and training.
An investigation of some sort is inevitable, but how it is undertaken can result in very different experiences, numbers of people involved, and costs. Investigations require a unique approach, and it is therefore difficult to generalise, but the process will require neutral scrutineers acceptable to all sides. A range of skills will still be required among the scrutineers – technical, legal and contractual, for example – but resolution should be achieved with less aggravation and fewer diverting accusations and defences, more quickly and at a lower cost.
Our industry is considered to have made progress, and the worst of the adversarial confrontations are diminishing, although not disappearing. It is moving towards a more open culture that actually encourages reporting, investigation and dissemination of findings. However, it probably still has a way to go. If an incident results in sides being taken, and lawyers and technical experts being engaged by each party, then we still haven’t achieved a Just Culture process.
FATALITIES ON PROJECTS
As noted, fatalities in their own right do not appear as incidents within general lists of tunnel failures. A fatality may be the result of an incident, for example a flood or a collapse, but fatalities from other causes, such as the man-machine interface, are not measured as failures. They do, however, deserve consideration even if they are not measured in the conventional data pool.
The projects listed in figure 4 are those published as the most dangerous in terms of the total number of fatalities, but the numbers should be viewed with caution. There is some inconsistency in the numbers when checked against different sources, and some of the data is estimated, in particular the ‘rounded’ numbers towards the top of figure 4. These projects often used labour that was considered unimportant at the time, and records were not kept with any accuracy. It is also likely that other factors affect the accuracy of the tabulated data, or result in projects not appearing in the figure at all, such as political repression and expediency, and a lack of longer-term tracking of workers’ health.
The highest number of fatalities for a tunnel project was on the Hawks Nest Tunnel (USA). The total shown is an estimate because the fatalities were largely the result of silicosis, which did not manifest itself until many years after the project was completed, and the tracking and recording of workers was incomplete. The projects towards the top of the figure are relatively old, while the tunnelling projects towards the bottom were constructed between the late 1970s and 2000, and tunnelling is therefore over-represented within the construction industry during this period. The tunnelling fatalities were primarily a result of the man-machine interface, the use of explosives, flooding or falls of ground. Of these, only the flooding and falling-ground events appear as incidents in the tunnel failure data considered earlier.
Tunnelling, like other areas of construction, has seen a vast reduction in the number of fatalities, particularly since the Channel Tunnel. Table 1 summarises the fatality rates in UK construction covering 10-year periods for the last 40 years, including members of the public who have died as a result of construction activities.
Non-fatal injuries have also decreased significantly, from 103,100 between 2002 and 2011 down to 47,514 between 2012 and 2021. Improvements aside, the numbers are obviously still too high, and the industry needs to change so that injuries and fatalities are treated and recorded as failures.
CONCLUSION
Failures are less likely to be due to the physical world conspiring against us and more likely to be due to our own actions or inactions. Making mistakes is just part of being human, and we need an approach and a process that recognises that learning from failure is a necessary ingredient for all of our future successes.
To enable that to happen we require a culture that promotes open discussion and dissemination of results without fear, rather than one that promotes protection of self-interest and the perceived necessity of confidentiality agreements. The existing ‘Just Culture’ used in other industries might be a good starting point to improve learning opportunities in the industry and prevent the next generation from making the same mistakes that we did.
QUESTIONS AND ANSWERS
Q: Dan Garbutt, Magnox Nuclear Decommissioning Authority: Do you think there should be a role for behavioural psychologists on our project teams to help with cultural change?
Mike King: I believe we do need an understanding of the way that humans operate and the influences on our decision making processes, so there is a role for psychology in our industry. It will make us more understanding about the mistakes that are made and avoid the stances of moral superiority, retribution and blame that we see too often. We need to concentrate on what has gone wrong and disseminate the information.
Q: Bob Ibell, London Bridge Associates (LBA): What impact do you think the introduction of the corporate manslaughter law has had on accident investigation?
MK: This concentrates on punishment rather than technical resolution and avoidance in the future. We obviously cannot have a free-for-all without any justice, so it needs to be in place. A ‘Just Culture’ recognises honest mistakes while punishing negligence.
Bob Ibell: It also changed (reduced) the approach to co-operation.
MK: I agree, and we also lose the opportunity to learn something.
Q: Bill Grose, Independent Consultant: Even with a no-blame culture we still need to tackle ‘who pays’. How do we separate the contractual side from a no-blame investigation?
MK: Part of the issue might be resolved through project insurance rather than individual insurance, but this is only part of it. The investigation will be undertaken by a range of specialists – technical, financial, contractual, legal – and the contract needs to reflect that this will be undertaken by acceptable neutral scrutineers and all parties will abide by their decisions rather than every party employing teams.
Q: David McCann, Jacobs: One of the most memorable things said to me was that unless you can visualize how something fails you will not understand how to design it. Do you feel that today we are too reliant on process and the tools we have and less reliant on an understanding of how structures work?
MK: Yes, you have summarised many of my views. The visualisation of how something is going to break has always been my approach to design, and I’m not sure that the design tools in use today do that in the same way. They result in ‘it stands up or it doesn’t’ based upon the input parameters rather than testing various mechanisms of failure.
Q: Martin Knights, Independent Consulting Engineer and Chair of LBA: A few years ago, I wrote a paper called ‘Mind your signature’ which was about the loss of the controlling mind and accountability. The tendency today is to have checkers checking the checkers; do you think this mitigates the possibility of failure?
MK: A simple answer, no it doesn’t. Despite all the checks – and people use the Swiss-Cheese model of going through every hole before a failure occurs – failures are still occurring, but obviously it helps. I don’t actually believe we always have checkers checking checkers, but I think we might have a problem of checkers sometimes being less experienced than the originators and therefore deferring to their experience when a more robust challenge is required.