What can we learn from the recent troubles at the NHS and BA?
I’m sure we’ve all seen the recent IT troubles at both British Airways and the NHS, and been relieved it wasn’t us in that position.
But aside from the normal feelings we associate with news of this sort, there must be some lessons we can learn and actions we can all take to help keep our own organisations out of the news.
At the NHS, I think the obvious issue was patch and vulnerability management. There are plenty of third-party tools that can help us understand what vulnerabilities we have (e.g. Qualys), but the drive from business management needs to be there to make sure that funds and resources are available to complete the work and make full patch management a genuine part of business as usual for IT. The less obvious issue must relate to business continuity planning (BCP). Had any thought been put into this kind of risk scenario, and if so, what recovery time were managers expecting to be able to achieve? Again, BCP is one of those areas that can be seen as an afterthought, or as somewhere savings can be made, but this is really not the case.
I hope all IT teams out there are using this as a poster child to demonstrate the value of these activities to general business management. Make sure you dust off your business continuity plans and check that they take account of the issues we face today.
The BA issue was different in that it wasn’t a deliberate attempt to cripple the organisation but a self-inflicted wound. However, the lessons about making sure business continuity plans are available, and actually work in real-life scenarios, are the same. I hope by now every data centre manager has investigated how interruptible their uninterruptible power supply actually is, and has looked at what their restore procedures would be in the event of a genuine power outage across the whole data centre. Reviews of the backup facility and recovery times should also be carried out to make sure that you genuinely can bring back the most important systems within their agreed recovery times. I doubt that BA’s recovery time for its booking system was really 48 hours, so something must have gone awry either in their procedures to invoke DR or in their recovery procedures. I’m sure those of us on the outside will never know, but that doesn’t prevent us from making sure these things don’t happen to us.
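For anyone who wants to make that recovery-time review concrete, here is a minimal sketch in Python of the kind of check I have in mind. Everything in it is an assumption for illustration: the system names, the agreed recovery times and the measured figures are invented, and `SystemRecovery` and `find_rto_gaps` are hypothetical names rather than part of any real tool. The idea is simply to compare what you agreed with the business against what your last DR test actually achieved, and to flag anything that has never been tested at all.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SystemRecovery:
    """One system in the continuity plan (all values here are illustrative)."""
    name: str
    agreed_rto_hours: float                    # recovery time agreed with the business
    measured_recovery_hours: Optional[float]   # result of the last DR test, None if never tested


def find_rto_gaps(systems: List[SystemRecovery]) -> List[Tuple[str, str]]:
    """Return (system name, problem) pairs for systems that miss their agreed recovery time."""
    gaps = []
    for system in systems:
        if system.measured_recovery_hours is None:
            # An untested plan is treated as a gap in its own right.
            gaps.append((system.name, "never tested"))
        elif system.measured_recovery_hours > system.agreed_rto_hours:
            over = system.measured_recovery_hours - system.agreed_rto_hours
            gaps.append((system.name, f"over by {over:.1f}h"))
    return gaps


# Hypothetical figures, invented for this example only.
systems = [
    SystemRecovery("booking", agreed_rto_hours=4, measured_recovery_hours=9.5),
    SystemRecovery("payroll", agreed_rto_hours=24, measured_recovery_hours=6),
    SystemRecovery("email", agreed_rto_hours=8, measured_recovery_hours=None),
]

for name, problem in find_rto_gaps(systems):
    print(f"{name}: {problem}")
```

With these made-up numbers the check flags the booking system (recovered too slowly) and email (never tested), while payroll passes. A spreadsheet would do the same job; the point is that the comparison gets made against real test results rather than assumed ones.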
In short: I think the message here is that we should always pay attention to the risks that might affect our organisations, consider each scenario and make sensible plans to deal with them.