What's Holding BC Back?
Confronting the Conundrums of Business Continuity by Jon William Toigo
In government and business, there continues to be more discussion than doing in the realm of disaster recovery and business continuity. One hears a lot of talk about "10/12" - the next 9/11 - which everyone from the familiar crepe hangers and doomsayers to the most heads-in-the-clouds pollyannas agrees is more or less inevitable.
Add to that the weather-related disaster potentials that NASA weather models predict will worsen this year, the well-documented vulnerabilities of aging power and telecommunications infrastructures, and ongoing problems in information technology that range from poor interoperability standards to improved malware and hacking techniques, and you have a confluence of threats that could best be described as the Perfect Storm.
Yet, 50 percent of respondents to just about every survey taken about disaster preparedness report that they have no DR or BC plan whatsoever. And, of those who say they do have a capability, 30 to 50 percent report that they have never tested their plans - which is tantamount to having no plans at all.
There are many explanations for this misalignment between threat and preparedness. Perhaps the biggest hurdles to better alignment have to do with budgetary constraints, an overcomplicating of the planning process, and a failure to design recovery into the IT infrastructure.
In the current economy, many organizations simply lack the coin to allocate to something they regard as "simply more insurance." That is partly because of the way that DRP and BCP are packaged and represented to management. These days, a full business value case is needed to justify just about any acquisition, and business continuity capabilities are no exception. Planners must communicate how the capability will deliver not only risk reduction, but also cost savings and business process improvement, the other two sides of the business value triangle. For example, a user recovery facility can serve dual purposes, perhaps as a visitor meeting center or training center when not needed for recovery. Dual-use goes to cost savings and value.
Moreover, the process by which DRP and BCP are undertaken (when it is done correctly) involves business process analysis and data classification and modeling. Such modeling has enormous potential value to business and IT decision-makers: it can be used to gain insight into the costs of lines of business and to help predict the consequences of changing or adding business lines. It may also provide a useful framework for solving the data management problems that must be solved for regulatory compliance and efficient resource utilization in IT.
Process Perceived as Overwhelming
A second explanation for the misalignment of threat and preparedness is the perceived enormity - and perhaps the perceived inefficacy - of planning itself. Anyone who saw the video footage of the Pacific tsunami in late 2004 might well view disaster preparedness as an absurd endeavor, something right out of Camus' Myth of Sisyphus.
This attitude is easy to understand given many contributing factors. Not only does the news-as-entertainment bent of mass media focus more on highly visual disaster events than on the less "sexy" images of recovery efforts, but the fast pace of business and technology change causes the ground under planners' feet to shift as they endeavor to work out recovery strategies. Added to these factors are the cult-like attitudes of too many self-styled experts in DR and BCP who suggest that planning is a Byzantine activity known only to a few privileged practitioners with three- or four-letter acronyms following their names on their business cards.
Simply put, if planning is viewed as an overwhelming task, it will not be done. In order to defuse this perception, planners need to tackle the problem of preparedness in parts, rather than as a whole, by first developing strategies that avoid preventable disasters, especially those that compromise the organization's two most irreplaceable assets: personnel and data. Planners can deploy fire protection (annunciation and suppression systems) and water detection systems; install uninterruptible power supplies; deploy network, systems, and storage management software; and implement antivirus software. These investments can pay dividends in safeguarding personnel and preventing avoidable downtime, and the internal return-on-investment arguments for them are strong enough to convince the stingiest accounting department.
After these initial investments, planners should then address data issues. A recent survey by Symantec showed that most people aren't making copies of their data via tape or disk mirroring or other methods. Of those who are, few have ever tried to recover files from their replicated media, a simple test that might raise alarm bells about the efficacy of the implemented process.
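The "simple test" described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any particular backup product: it assumes the replicated media is reachable as an ordinary directory, and the names `restore_test`, `source_dir`, and `backup_dir` are hypothetical. A plain file copy stands in for whatever restore procedure the organization actually uses.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_test(source_dir: Path, backup_dir: Path) -> list[str]:
    """Restore each backed-up file to a scratch area and compare it to the original.

    Returns the relative paths that are missing from the backup or that
    differ from the original after being restored.
    """
    failures = []
    with tempfile.TemporaryDirectory() as scratch:
        for original in sorted(source_dir.rglob("*")):
            if not original.is_file():
                continue
            relative = original.relative_to(source_dir)
            copy = backup_dir / relative
            if not copy.is_file():
                failures.append(relative.as_posix())  # never made it to the backup
                continue
            restored = Path(scratch) / relative
            restored.parent.mkdir(parents=True, exist_ok=True)
            # Assumption: a file copy stands in for the real restore step.
            shutil.copy2(copy, restored)
            if sha256_of(restored) != sha256_of(original):
                failures.append(relative.as_posix())  # restored data is not valid
    return failures
```

An empty result means every file could be recovered intact. A check like this only makes sense while the originals are unchanged since the copy was made; anything else in the list is exactly the kind of alarm bell the test is meant to raise.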
Data disasters are increasingly the definition of disaster itself: We determine what constitutes a disaster in terms of time to data - how long it takes to restore normalized access to a valid set of data so that work can proceed. Data access is at the heart of recovery, and every click of the "SAVE FILE" key is a disaster in waiting, since modern file systems overwrite the last valid copy of data every time a new version is saved.
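One way to picture the overwrite problem, and a hedge against it, is a save routine that sets aside the previous version before replacing it. The sketch below is only an illustration: `save_with_history`, the `.history` directory name, and the `keep` parameter are all hypothetical, and it does not replace a copy stored away from the primary site.

```python
import shutil
from pathlib import Path


def save_with_history(path: Path, data: bytes, keep: int = 5) -> None:
    """Write data to path, first preserving the previous version.

    Earlier versions are stored as numbered files in a sibling
    "<name>.history" directory (a hypothetical convention); only the
    newest `keep` versions are retained.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    history = path.parent / (path.name + ".history")
    if path.exists():
        history.mkdir(exist_ok=True)
        existing = sorted(history.iterdir(), key=lambda p: int(p.name))
        next_number = int(existing[-1].name) + 1 if existing else 1
        preserved = history / str(next_number)
        shutil.copy2(path, preserved)  # the last valid copy survives the save
        existing.append(preserved)
        for stale in existing[:-keep]:
            stale.unlink()  # prune the oldest versions beyond the limit
    # Write to a temporary file, then swap it in, so a failed write
    # does not leave a half-written file in place of the original.
    temp = path.with_name(path.name + ".tmp")
    temp.write_bytes(data)
    temp.replace(path)
```

With something like this in place, a bad save costs one version rather than the only valid copy, which shortens time to data for the most common kind of data disaster.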
This is not to minimize the importance of other component strategies of DR and BCP, such as recovering application hosting environments, networks, or end user work facilities. The logistics and plans developed in these categories are important, but they are always subject to shift in the face of an actual disaster. Nothing ever goes exactly as planned, and implementation of these strategies requires the abandonment of the "script" more often than not. Data recovery, however, is key.
One does not require a certification to plan for a disaster. Planning requires only a straightforward application of common sense, some knowledge of the options and processes that commonly make up a plan (information that is widely available), some project management and communications skills, and a good grounding in information technology itself. Contrary to what others may say, the real purpose of the exercise is to get irreplaceable assets out of harm's way and to rehearse recovery personnel so they can think rationally in the face of a great irrationality.
Recovery (Continuity and Resilience) Excluded from Infrastructure Design
Some technically savvier folk are discouraged from planning because of what they correctly perceive as significant challenges posed by technology itself. This is especially the case when disaster recovery considerations have not been designed into the applications or infrastructure that the organization is now seeking to protect.
Years ago, in homogeneous (single vendor) mainframe environments, it was, comparatively speaking, far easier to "bolt on" a recovery strategy after the fact. Hardware and software replacement was a simple matter of a hot site agreement. Data was protected by a disciplined cadre of IT professionals working in their own little world: the glass house.
Today, a combination of distributed heterogeneous computing and business process deconstruction has created a greater threat profile and a much expanded set of recovery targets, many of which defy recovery planning because of features that enhance user-friendliness. Hot site service vendors rub their hands together with glee when a prospective customer approaches them with a requirement to back up heterogeneous client/server infrastructure. Because of the way these multi-tier environments are commonly constructed, their recovery requires server, network, and storage component replacement on a one-for-one basis: a very lucrative engagement for the vendor.
Much of the cost of infrastructure recovery could be minimized by designing recovery into the infrastructure itself. Selecting appropriate middleware, leveraging good standards-based technology wherever possible, and practicing hardware-neutrality in application engineering could go a long way toward building resiliency and recovery flexibility into infrastructure.
In most organizations this isn't being done, mainly because disaster recovery and business continuity planning are still treated as standalone endeavors rather than being integrated into the processes by which technology solutions are first defined and implemented. Fixing the situation may take corporate mandates requiring that recoverability become a criterion in the selection of hardware and software products, the definition of technology infrastructure solutions, and the validation of platforms once deployed. When this occurs, a lot of cost will be driven out of DRP and BCP and recovery will be built into the infrastructure.
The problem is that such a mandate has not been forthcoming. Partly, this is because many disaster recovery and business continuity planners are out of their depth when it comes to technology itself. Many lack the knowledge of technology to communicate effectively with IT or MIS planners, while those steeped in technology lack the ability to communicate in terms and language that business managers can appreciate. Bridging this knowledge and communication gap should be a top priority for all involved in developing a sound BCP.
Another explanation for the absence of a mandate is that common sense alone does not seem to be an effective driver of management interest in such matters in many organizations. Business leaders do seem to respond well to regulatory mandates, as the current activity around HIPAA, Gramm-Leach-Bliley, Sarbanes-Oxley, and new SEC rules suggests.
That is why many of us were counting on the financial regulators, in their review of the impact of 9/11, to establish clear requirements (in the financial sector, at least) for addressing issues such as the required safe distances for data mirroring and recovery site location. Post-9/11 studies showed that a different terrorist scenario, such as a "dirty bomb" attack combining conventional explosives with low-grade radioactive waste products, would have rendered useless most if not all of the recovery strategies of financial institutions that had successfully recovered operations to mirrored data centers across the river from Manhattan in New Jersey. Early on, the Fed was razor-focused on this issue, but when it released its final report, it waffled, stopping short of mandating a required safe distance between primary and backup facilities.
In the absence of clear regulatory requirements that specify certain recovery plan characteristics, getting management to specify recovery-related policies, and then getting those policies enforced once they are articulated, may be a much more difficult task than the planning itself.
From the points above, the requirements for addressing the conundrum of continuity planning in 2005 may seem to be a diverse mixture of the obvious and the abstruse. In the final analysis, continuity planning must evolve from a set of tasks aimed at producing a plan document, which subsequently must be maintained, into a mindset that influences business and technology decisions.
Evolution takes time, and it is doubtful that all aspects of the continuity conundrum will suddenly resolve themselves by December 31 of this year - or even by December 31 of 2010. What we can do, what we must do, commencing today, is to develop strategies for preventing avoidable disasters and to apply our common sense and all of our business savvy and technological acumen to the immediate problem of developing and testing recovery strategies.
It is the most we can do. And it is a lot.
You can contact Jon directly at firstname.lastname@example.org