Working In Uncertainty

Post-implementation project failure: Internal control weaknesses and project risk management

by Matthew Leitch; first appeared on www.irmi.com in April 2008

Stage 1: Good intentions

Stage 2: First cracks appear

Stage 3: Rising panic

Stage 4: Champagne

Stage 5: Emerging problems

Stage 6: Long recovery

What about Terminal 5?

What lessons might be learned?

Conclusion

Stage 1: Good intentions

People on the project responsible for designing and implementing controls around the process or system think widely about what they need to do and write a long list of things to do. It may be a good list. They make plans. They know management reporting is needed.

Stage 2: First cracks appear

Once the project is underway it may be that things start to fall behind schedule and some strain is felt. Business analysts and process designers take all the time they are allowed and then a little bit more, leaving the controls team with very little time to respond to what is happening. At the same time, they gradually become aware that they are not as skilled or as productive at controls work as they had imagined.

Stage 3: Rising panic

As project health continues to decline, stress rises and people begin to revisit their original intentions and make cuts. It is justified as ‘focusing’ but in reality this is a kind of tunnel vision that gets progressively narrower. The controls team goes back to its original list of things to do and asks ‘What do we really, really need to do by go-live?’ Systems people begin to regard controls work, and management reporting in particular, as ‘nice to haves’ that they can leave until after going live. Some people can't take the pressure and drop out of the project. Testing gets de-scoped and compressed.

As project health declines, the likelihood of high error rates and backlogs on going live increases, so controls development becomes more important and should be boosted. Instead, what usually happens is that it declines with everything else, or more so.

Stage 4: Champagne

It's taken long shifts, working through weekends, and the project is a few weeks late, but it's over. Hooray. Champagne is opened and everyone celebrates. It was a success. We've done it.

Stage 5: Emerging problems

Initially there is little or no evidence of things going wrong, but after a short while the first, faint indications of problems under the surface begin to emerge. As they are investigated, more problems come to light, and this eventually reveals a ghastly mess of faulty data, stuck transactions, or lost items. It's too late to go back to backups. Thousands of incorrect cases already exist, and the reason this wasn't visible immediately is that not enough checks were being done. The controls weren't in place. Worse still, now that there's a lot of work to do, the management information and supervisory control system needed to manage it is not in place either. These could have been developed and tested before going live as part of other testing, but that was de-scoped.

Stage 6: Long recovery

There's nothing for it but to bring in temporary staff to cope with the extra work. It may take months to work through all the mistakes and correct them.

What about Terminal 5?

At this stage few details have emerged about what went wrong, and some of those details differ slightly from a typical meltdown. So far, reports by journalists and the companies involved include the following points.

The construction of Terminal 5 and its associated transport links cost £4.3bn and yet was completed on time and on budget. The number of deaths and injuries during construction was exceptionally low. By all accounts this was a project where risk was well managed, in part because of an innovative approach to contracting that encouraged companies to work together to solve problems rather than waste time assigning blame.
The baggage handling system was assembled and tested in the Netherlands before being delivered to Terminal 5 for more testing, many months before the terminal was to open.
Extensive ‘live trials’ of Terminal 5, including baggage handling, were conducted in the months leading up to going live and these involved some 15,000 people, many of them travellers.
At 4am on 27th March 2008 the terminal opened for business at a planned 70% of full capacity.
That morning terminal staff struggled to find their parking area thanks to poor signage and incorrect instructions. When they got there they found it was too small. They were delayed in getting into the building by security checks and by not knowing their way around the terminal, and then many were unable to log into the computer system. Consequently, when the terminal opened to passengers no desks were opened and passengers just formed a queue. (They had already struggled to find parking themselves but this was just the start of their bad day.)
Delays moving baggage to and from aeroplanes and passengers developed immediately.
There were too few storage bins.
By early morning there were baggage backlogs because staff could not keep up with the flow of baggage coming from the belts. Union representatives were quick to say that staff had received insufficient training and that the union had told BA in advance that the people would not be able to keep up. A baggage handler spoke to the BBC and said that problems had been obvious during the ‘inadequate’ training and had not been sorted out.
That afternoon a luggage belt in the departure lounge broke down.
By midday BA had cancelled 20 flights. Some flights took off without baggage. BA offered passengers the option of flying without their baggage or not flying at all. Roughly 20% of flights were cancelled on the first day.
At 4.30pm check-in was suspended completely.
Many passengers were forced to stay overnight in hotels while waiting for the next flight that could take them. BA wrote to them saying it would cover reasonable costs and giving a maximum limit for hotel room costs. Unfortunately actual room costs were higher, and it has been reported that some hotels may have raised their prices to take advantage of the situation. It rapidly emerged that BA's limit was illegal, and fines seemed likely in addition to heavy expenses.
At least one passenger was quoted by newspapers as saying that he had participated in the ‘live trials’ and had seen so many problems during them that he was not at all surprised at what had happened.
For over a week BA cancelled a proportion of flights every day and the baggage mountain was not cleared. On one day BA estimated it had 20,000 items of baggage to deal with while BAA estimated it was more like 28,000. Not knowing the exact figure put BA in a particularly bad light.
BA decided to send baggage by lorry to a courier company in Milan, Italy to be sorted, a decision that seemed to the British public absurd and desperate, even though BA said it was ‘standard practice’.
Cruelly, on the weekend that BA announced there would be no more cancelled flights, snow fell, resulting in cancelled flights.
The transfer of long haul flights from Terminal 4 to Terminal 5 was postponed.

Overall, a terrible period for everyone affected.

What lessons might be learned?

The most obvious lesson from the Terminal 5 baggage fiasco is that a live trial is still a trial. There's nothing quite like live operation and while trialling can help a lot, it's not enough on its own.

BA seems to have decided that the results of trialling were sufficient to justify starting at 70% of capacity on day one, when starting at 10% or less would have been a more reasonable approach. The background has not been made public but, typically, if you deliver incrementally it is possible to start delivering earlier than if you try to deliver everything (or nearly everything) on day one, and you can learn a lot this way.

After the trialling there would still have been uncertainties about the productivity of people involved in baggage handling and the reliability of the systems. These uncertainties should have been recognised, measured, and resolved through progressive increases in live operations.
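
The idea of resolving uncertainty through progressive increases can be sketched in code. The following Python is a minimal illustration only, not a description of anything BA or BAA did: the names StageResult and next_capacity, the 10-point step, and the 2% backlog threshold are all assumptions made for the example. It raises the operating level only when the backlog measured at the current level stayed small, and otherwise holds.

```python
from dataclasses import dataclass


@dataclass
class StageResult:
    """Measurements from one stage of live operation (hypothetical structure)."""
    capacity_pct: int    # share of full capacity operated during the stage
    items_in: int        # items received during the stage
    items_cleared: int   # items handled correctly during the stage


def next_capacity(result: StageResult,
                  step_pct: int = 10,
                  max_backlog_ratio: float = 0.02) -> int:
    """Return the capacity level for the next stage.

    Step up only if the backlog left at the end of the stage is a small
    fraction of the items received; otherwise hold at the current level.
    The step size and threshold are illustrative assumptions.
    """
    backlog = result.items_in - result.items_cleared
    backlog_ratio = backlog / result.items_in if result.items_in else 0.0
    if backlog_ratio <= max_backlog_ratio:
        return min(100, result.capacity_pct + step_pct)
    return result.capacity_pct  # hold and fix problems before increasing
```

For example, a stage run at 10% of capacity that received 1,000 items and cleared 990 would move up to 20%, while a stage at 20% that cleared only 1,800 of 2,000 would stay at 20% until the causes were fixed.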

As I have argued many times in previous articles for IRMI, our human tendency is to under-estimate uncertainties and, consequently, to do too little about them. Most likely, Terminal 5's baggage handling is another example of this phenomenon.

BA seems to have been taken by surprise by events, as suggested by its sending customers a letter that set an illegal limit on hotel costs and by its inability to put extra people onto moving baggage as soon as a backlog started to develop on the first morning.

Unusually, there is no evidence at this stage of project health problems, with the exception of reports that baggage handling staff were demoralised and unimpressed by the training. It may be that the baggage system project had problems but the construction work had gone so well that senior people felt that the magic of their risk management approach would work once again.

Conclusion

Terminal 5 has hit the headlines but post-implementation meltdown is a common experience with causes that are easy to understand. It is essential to understand the high risk involved, to implement incrementally if at all possible, and to boost controls development if project health is ever in doubt.
