Alternate Communications During Times of Disaster

By Dr. Jim Kennedy, NCE, MRP, MBCI, CBRM

Over the last three to five years we have witnessed many disasters, both in the United States and abroad. Based on what we are hearing from NOAA and the National Weather Service, the US is likely to see the same number of tropical storms this year, if not more: storms of the size and ferocity of those that were so devastating to the southern portion of the US in 2005. So, whether it is tropical storms in the US, earthquakes in South America and Asia, or volcanoes anywhere else on the globe, we, humanity, face another year of potential emergencies that will need to be responded to.

One thing that all of these natural disasters have in common, besides the tremendous loss of life and disruption to the everyday lives of the populace, is that they are immediately followed by an almost total loss of the ability to communicate with the outside world. Power is lost, telephone services are discontinued, and cell phone service is either non-existent or so congested that it takes hours to get a call through.

So, every year, companies and emergency planners face the problem of providing continued communication before, during, and after a disaster strikes their areas. This year, more than at any other time, business continuity planners at small, medium and large companies in the southern part of America are looking for alternatives to standard communications so that they can keep their business and critical operations running in the aftermath of a devastating event.

I thought that I would present some alternatives across the spectrum of business types so that those business continuity planners have choices from which to make informed decisions about backup communications.

Before we discuss back-up communications solutions let’s first discuss the failure mechanisms for the communications used during normal times.

Failure Modes

Most companies continue to rely upon the standard telephone system for their communications needs. In order to provide this service the telecommunications carrier, regardless of where you are located in the world, relies upon either copper wire or fiber optic cables from its central offices to its customers’ premises. This ‘last mile’ can be either above ground, as it is in the majority of cases, or underground. We have all seen those graphic pictures of poles and trees uprooted and thrown to the ground after a hurricane or tornado has devastated an area. When this happens, that last mile of connectivity between the business and its telephone provider, Internet provider, or application service provider is abruptly disconnected and utility power is lost. Underground cables are not entirely safe from disruption of service either; flooding and/or power loss often disrupt these underground services as well.

In the case of cell phone providers, the cell towers receive your cell phone’s call and then route it to a local central office. These towers, or the equipment inside them, can also be damaged or destroyed, as can the last mile circuits which connect those cell towers to the local telephone network. So cell phone service is as tenuous as regular telephone service when a disaster strikes.

I should also mention that the southeast US is not the only area where loss of communications services takes place, and hurricanes and tornadoes are not the only natural disasters that disrupt communications and power. In the northeast US, for example, ice storms and blizzards have also taken their toll on communications and power utilities over the last several years.

Usually, following an event like a tornado, hurricane, blizzard or the like, the communications and power service providers work very hard to restore service. However, in most cases we are talking several days, if not a week, for the restoration of power and phone service. This restoration time varies depending on the size and intensity of the disaster. If it is localized, as it could be for a tornado, then service could be restored more quickly.

These copper and fiber optic cables also interconnect the local telephone company’s central offices to other central offices in the region and to long distance providers, cell phone carriers, and Internet and data communications service providers anywhere in the world. These inter-exchange or ‘long haul’ circuits provide interconnectivity and communication beyond the local area. So if your business communicates between offices in Baton Rouge LA and St. Louis MO there are probably several service providers and miles of cables involved in carrying the information from one point to the other. These cables travel above and underground and suffer the same fate as the local last mile circuits do. However, because of the number of calls and subscribers and the importance of these circuits, the carriers or the businesses that use them generally employ circuit ‘diversity’. What this means is that there are multiple paths for the voice or data to travel. If one path fails there is another which can be used to take the call to its intended destination. This works well for such things as car vs. pole accidents and isolated incidents like localized fires and floods, but with mass devastation like we experienced with Hurricane Katrina or the tornadoes in the Midwest US, even the diverse routes are consumed in the overall damage toll.

Power is another failure mode. The central offices and cell phone sites have their own power sources in the form of batteries and emergency generators. If the event is limited to a few hours or a few days they will be fully operational. However, it was found that in the case of the hurricanes and earthquakes of the last few years power has been interrupted for several days even up to several weeks and the power plants, central offices, or cell towers in the areas of devastation were inaccessible for most of that time. This meant that the fuel trucks needed to refuel the generators were unable to get to their destinations and subsequently the central offices and cell sites went off-line.

So now we understand that the power and communications utilities have planned for adverse events, but that the intensity and massive area of devastation often make these plans fail. It is left to the individual business owner or operator to determine the criticality of their services and to properly plan for potential communication and power failures that might impact them.

In the next part of this article, I will endeavor to present the alternatives that exist in case you experience a disastrous event with a communication failure.

Alternatives

Before I discuss the alternatives I feel that it is important to note that power is a main component of any recovery or mitigation strategy. That is, without power to run these technologies they will not operate. So, it is important to have reliable and sustainable power for the duration of the resumption and/or recovery effort. If you cannot verify that this is the case then alternate site recovery is the only viable alternative.

Infrared

One such alternative to commercial communication systems is infrared. This alternative is used if a company needs to interconnect two buildings. Infrared provides an optical data, voice and video transmission system. Like fiber optic cable, infrared communications systems use laser light to transmit a digital signal between two transceivers. However, unlike fiber, the laser light is transmitted through the air. In order for the digital signal to be transmitted and received, there must be a clear line of sight between the units. In other words, there should be no obstructions such as trees or buildings between the transceiver units. So, if your wireline or wireless communications fail you can still provide communications between two points. The only drawbacks are the distance and the line-of-sight requirements.

This solution provides low-cost, high-speed wireless connectivity for a variety of last-mile applications. It provides narrowband voice and broadband data connectivity and the various products provide scalable, wireless alternatives to leased lines. These infrared systems operate at data rates of 1 Megabit to Multi Gigabit speeds and they are deployable in one day, without requiring right-of-way or government permits for installation. They can provide an alternative communication link in hours instead of weeks or months. This is probably not an option for a small business, but for a medium or large business owner the cost is affordable. Cost can range from $10K to $25K per installation capable of distances of up to 1000 meters.

Microwave

Another alternative to commercial communication systems is microwave (wireless). This alternative is used if a company needs to interconnect two buildings that are spaced farther apart than conventional infrared can operate (i.e., in excess of 1000m). Microwave also provides a data, voice and video transmission system. Unlike infrared communications systems, which use laser light to transmit a digital signal between two transceivers, microwave uses ultra-high frequency radio (wireless) transmission. In order for the digital signal to be transmitted and received, there again must be a clear line of sight between the units. However, the distance that this alternative can span is up to 60 miles, as long as no obstructions such as trees or buildings are located between the two locations. If wireline or wireless communications fail, communications between two points can still take place. There are several drawbacks to this solution:

  • Distance limited to up to 60 miles
  • Requires an FCC license to operate
  • Right of Way Permits may be required
  • Needs highly trained technicians to install equipment
  • Cost can be prohibitive for small businesses

The cost of a microwave system can be between $50K and $100K, with installation and license preparation charges in the area of another $15K. It still provides a viable alternative for medium and large businesses.

Small businesses also have an alternative of smaller wireless systems which utilize non-licensed frequencies and which can be installed by an IT person in the business operation. Cost is about $1000 to $2000, but I must warn you that this is not as reliable a solution as the microwave wireless option and reliable speeds may be slower.

Satellite

So far I have provided solutions that have been better suited for the medium and large business operations. Satellite provides alternatives for small, medium and large enterprises and there are various speed and pricing options, which make it a very attractive alternative or mitigation strategy.

Satellite Phones

There are several types of satellite alternatives. If a company is only interested in providing a short-term telephone back-up alternative, then satellite phone services like INMARSAT, AT&T, Iridium, Satcom, Skytel, Worldcell, or Globalstar, to name only a few, offer basic voice, fax and e-mail services. They offer mobile phone services and are not usually capable of providing sustained data communication or Internet types of services. However, this communications strategy is good for keeping your senior executives and critical operations personnel in contact during disasters. You can rent phones for about $40/week and then pay about $1.00/minute for basic service, or you can buy the phones for $700 to $2000 each and negotiate rates in the area of $0.85/minute. So as you can see this is not an inexpensive option, but it is usable depending on the need for communications.
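To make the rent-versus-buy trade-off concrete, here is a minimal Python sketch that uses only the approximate prices quoted above. The function names and the sample usage (two weeks, 600 minutes of calls) are hypothetical and chosen for illustration; actual carrier pricing will differ.

```python
def rental_cost(weeks, minutes, weekly_fee=40.0, per_minute=1.00):
    """Cost of renting one satellite phone (about $40/week plus about $1.00/minute)."""
    return weeks * weekly_fee + minutes * per_minute


def purchase_cost(minutes, handset_price, per_minute=0.85):
    """Cost of buying one phone ($700 to $2000) at a negotiated rate near $0.85/minute."""
    return handset_price + minutes * per_minute


def break_even_minutes(weeks, handset_price, weekly_fee=40.0,
                       rent_rate=1.00, buy_rate=0.85):
    """Minutes of use beyond which buying is cheaper than renting for `weeks`."""
    saving_per_minute = rent_rate - buy_rate
    return max(0.0, (handset_price - weeks * weekly_fee) / saving_per_minute)


if __name__ == "__main__":
    weeks, minutes = 2, 600  # hypothetical outage: two weeks, 600 minutes of calls
    print("rent:            ", rental_cost(weeks, minutes))
    print("buy ($700 phone):", purchase_cost(minutes, 700.0))
    print("break-even mins: ", round(break_even_minutes(weeks, 700.0)))
```

Under these assumed figures, renting is the cheaper choice for a short outage; buying only pays off for a company that expects heavy or repeated use.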

VSAT

VSAT is an acronym for Very Small Aperture Terminal, an earthbound station used in satellite communications of data, voice and video signals. A VSAT consists of two parts: a transceiver that is placed outdoors in direct line of sight to the satellite, and a device that is placed indoors to interface the transceiver with the end user’s communications device, such as a PC. It is very much like a satellite TV setup. VSAT service can be placed into two categories: those that provide basic Internet access services and those that are enterprise grade.

For the small and medium sized business the Internet access type of service is often what is selected. Offerings such as DirectWay, WildBlue, and Connexstar all provide low cost, small business types of back-up solutions which use equipment much like the in-home satellite television services. The data rates are in the area of 200 kbps uplink and 1.5 Mbps downlink, which is very much like residential DSL service. The cost is about $300 for the equipment and around $100 or less each month. This would provide a small business with the ability to utilize VoIP and VPN and to connect to the Internet.

For medium and large size businesses there are more sophisticated satellite services. They require satellite antennas which are 3 to 5 meters in diameter, and much more sophisticated and expensive equipment. Installation of these services can cost in the range of $100K to $250K, with monthly operational service charges from $1000 to $5000/month. They provide quality of service and committed information rates as part of the service. They can provide up to 150 toll-quality phone lines, broadband Internet, and high speed data communications, and can also provide secure (encrypted) communication if required.

Satellite services can also be rented as part of a contract or call-up service, but rental services are on a first-come, first-served basis. As we witnessed during the tropical storms of last year, these portable rental satellite service providers were inundated with requests and, try as they would, there were only so many units to go around. Those who did not plan or contract ahead were left without service.

Last Thoughts

I hope that I have given business continuity planners some food for thought in developing alternative communication mitigation strategies. Each strategy has its benefits and drawbacks. You need to look at each possibility and determine what is right for you. If you are overwhelmed, there are many consulting organizations, and even your own telecommunications services provider, who can help you to identify and select the best options. However, you need to get started today for the next hurricane, tornado, flood, or catastrophe season in your geographic region. It will be too late to plan after an event occurs.

Dr. Jim Kennedy is the Business Continuity Services Practice Lead and a Consulting Member of Technical Staff for Lucent Technologies. Dr. Kennedy has over 25 years experience in the business continuity and disaster recovery fields and holds numerous Master level certifications in network engineering, information security and business continuity.

He has developed more than 30 recovery plans, planned or participated in more than 100 business continuity and disaster recovery tests, helped to coordinate three actual recovery operations, authored many technical articles on business continuity and disaster recovery and is a contributing author for two books, the “Blackbook of Corporate Security” and “Disaster Recovery Planning: An Introduction.”


jtkennedy@lucent.com

brcci.org

Critical Infrastructure

CRITICAL INFRASTRUCTURE PROTECTION IS ALL ABOUT OPERATIONAL RESILIENCE AND CONTINUITY

By Dr. Jim Kennedy, MRP, MBCI, CBRM

It has always been the policy of the United States to ensure the continuity and security of the critical infrastructures that are essential to the minimum operations of our economy and government. This critical infrastructure includes essential government services, public health, law enforcement, emergency services, information and communications, banking and finance, energy, transportation, and water supply.

So even before the events of 9/11, the Executive Branch of our government, the President through Presidential Decision Directive 63 (PDD 63) issued May 22, 1998, ordered the strengthening of the nation’s defenses against emerging unconventional threats to the United States, including those involving terrorist acts, weapons of mass destruction, assaults on critical infrastructures, and cyber-based attacks.

But how many of us really understand what an immense undertaking that was? What is the critical infrastructure in the United States?

  • More than 3,000 government facilities
  • 7,569 Hospitals
  • Telecommunications: 2 billion miles of cable; 1000s of telephone switching central offices
  • Energy: 2800 Electric power plants; 300,000 oil and natural gas producing sites; 104 nuclear power plants
  • Transportation
    • 5000 public airports
    • 500,000 highway bridges
    • 2 million miles of pipelines
    • 300 coastal ports
    • 500 major urban public transit operators
  • 4,893 banks or savings institutions have more than $100 billion in assets
  • 66,000 chemical and hazardous material producing plants
  • 75,000 dams
  • 51,450 fire stations responding to 22,616,500 calls for assistance each year.

US business and every individual rely in some manner on the above every day. We depend on their operational resiliency and continuity of operations.

Initially, critical infrastructure assurance was essentially a state and local concern. With the massive use of information technologies and their significant interdependencies it has become a national concern, with major implications for the defense of our homeland and the economic security of the United States.

However, despite all of the focus on critical infrastructure, one in three critical infrastructure operations still goes without a business continuity or continuity of operations plan, and three out of five of those operations with plans have never tested them as ‘fit for purpose.’

Up until this year the electrical energy sector had no single body setting security and availability standards and practices for its operation. In 2006 the Federal Energy Regulatory Commission (FERC) selected the North American Electric Reliability Council (NERC) as the Electric Reliability Organization (ERO) and standard setting body in the US for electric utilities. Contingency and continuity of operations planning in this segment of the critical infrastructure is minimal at best, as is typical across the entire energy sector (e.g. transmission, generation, oil and gas distribution, etc.).

In the financial sector many institutions, despite regular audits and increased governmental regulations, still do not have adequate continuity plans in place and information security is marginal.

Although the deadline for HIPAA compliance has officially passed, a significant percentage of covered health care organizations still have not achieved basic HIPAA compliance, according to a recent industry survey. They lack emergency operations plans and even in some cases proper disaster recovery plans for patient care systems, which contain critical patient healthcare information.

So even though there are laws and regulations and a very clear focus on the protection and resilience of critical infrastructure operations it has not seemed to translate into practice for the actual critical infrastructure operations across the US.

Critical infrastructure protection is all about operational resilience. In the GAO’s ‘Critical Infrastructure Protection – Significant Challenges in Safeguarding Government and Privately Controlled Systems from Computer-Based Attacks’ the report refers to service continuity controls as: “controls that ensure that when unexpected events occur, critical operations will continue without undue interruption and that crucial, sensitive data are protected.” It (the report) goes on to say that: “Service continuity controls should address the entire range of potential disruptions including relatively minor interruptions, such as temporary power failures or accidental loss or erasure of files, as well as major disasters, such as fires or natural disasters, that would require reestablishing operations at a remote location.”

So how is this to be accomplished? The most effective way is to develop a thorough and comprehensive business continuity or business resiliency management program. That program can be based on the NIPP Risk Management Framework, which consists of:

  • Set Security Goals
  • Identify Assets, Systems, Networks, and Functions
  • Assess Risks
  • Prioritize Mitigation Efforts
  • Implement Mitigation Strategies and Protective Programs
  • Measure Effectiveness
  • Start back at the beginning

I have attempted to outline below a process to aid critical infrastructure operations, utilizing the above NIPP Risk Management Framework coupled with an effective governance model, in addressing business continuity and resiliency needs.

First, a certified business continuity planner needs to be selected and must obtain senior management agreement and sponsorship for the program to be developed. With this sponsorship, budgets and manpower can be allocated for the project.

Second, the planner must solicit aid from multiple areas of the operation or business. This can be accomplished by establishing a Business Continuity or Business Resiliency Steering Committee. This committee will be composed of middle management from across the operation (e.g. technical, operational, financial, HR, etc.). The function of this committee is to establish the direction and approve the program, identify tools to be used, establish metrics, and report to senior management on progress.

Next, if the amount of work to be done is substantial or if the business continuity or resiliency program is starting from scratch, a Business Continuity or Resiliency Program Office should be established. This may comprise one or more individuals who are responsible (using project management disciplines) for ensuring that the planning and mitigation tasks are implemented consistently throughout the organization. They must also track and report on progress.

With the governance in place, work can begin to implement the NIPP framework within the organization. The steering committee will work with senior management to establish the direction and communicate the goals within the organization.

Identifying the critical assets is the next step. In everyday business continuity planning this equates to performing a business impact analysis. Here business continuity planners will work to develop a clear picture of which components (people, process, and/or technology) of the operation are critical to carrying out its mission, and to identify how long the operation can do without, or work around, those components if they become unavailable.

The next step in the NIPP Risk Management Framework is the assessment of risk. This equates to the business continuity planner’s risk assessment. The risk assessment is the process of identifying the risks to an organization, assessing the critical functions necessary for an organization to continue business operations, defining the controls in place to reduce the organization’s exposure, and evaluating the cost of such controls. Risk analysis often involves an evaluation of the probability of a particular event.
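One common way to combine probability, impact and control cost is an expected annual loss calculation (probability multiplied by impact). The Python sketch below illustrates that general technique; it is not a method prescribed by the framework, and every risk name and number in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    annual_probability: float  # estimated chance the event occurs in a given year (0 to 1)
    impact_cost: float         # estimated loss to the operation if the event occurs
    control_cost: float        # estimated annual cost of the control that addresses it

    @property
    def expected_annual_loss(self) -> float:
        # Probability multiplied by impact: a simple, widely used risk measure.
        return self.annual_probability * self.impact_cost

    @property
    def control_worthwhile(self) -> bool:
        # Simplifying assumption: the control fully removes the expected loss.
        return self.control_cost < self.expected_annual_loss


def rank_risks(risks):
    """Order risks from highest to lowest expected annual loss."""
    return sorted(risks, key=lambda r: r.expected_annual_loss, reverse=True)


# Hypothetical figures, for illustration only.
risks = [
    Risk("localized fire", 0.02, 500_000, 20_000),
    Risk("hurricane", 0.10, 2_000_000, 50_000),
    Risk("ice storm", 0.25, 200_000, 10_000),
]

if __name__ == "__main__":
    for r in rank_risks(risks):
        print(f"{r.name}: expected annual loss {r.expected_annual_loss:,.0f}, "
              f"control worthwhile: {r.control_worthwhile}")
```

A ranking like this is only a starting point for discussion; the probabilities and impacts should come from the business impact analysis and from historical data for the region.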

Once the risk assessment is complete it will be necessary to move to the next step in the NIPP Framework, that of prioritizing the risks and developing mitigation strategies based on the operation’s risk appetite. Here is where the organization determines how to address each risk: mitigate it, pass it on to another entity (insurance), or simply accept it.

Whatever makes the best business sense is then translated into a protective plan, which is implemented under the direction of the program office. It is at this point, when the mitigation strategies are identified and are being implemented, that the business continuity or resiliency plan can be developed. Again, business continuity subject matter experts are best utilized to accomplish this task as they have developed plans for similar business operations. Once the mitigation efforts are in place and the plans completed, awareness training and exercising of the plan are appropriate.

Lastly, before starting the whole effort over again, comes measuring effectiveness. Are the plan and the mitigation strategies “fit for purpose”? Do they adequately protect the operation from adverse events? If not, then the plan and mitigation efforts will have to be reviewed and modified as appropriate.

What has been accomplished is the beginning of a continuing effort to maintain the operation of the critical infrastructure. It has no end. It needs to be reviewed for every change to the operation.

I have been fortunate to help many critical infrastructure organizations build business continuity and resiliency into their operations. It is not easy but, as Presidents past and present have indicated, it is of the utmost importance to make sure that the United States’ critical infrastructure is adequately protected, as its citizens rely upon it every day for their safety, protection, and wellbeing. It is difficult but, as has been said, any important journey starts with a single step.

Dr. Jim Kennedy is the Business Continuity Services Practice Lead and a Consulting Member of Technical Staff for Lucent Technologies. Dr. Kennedy has over 25 years experience in the business continuity and disaster recovery fields and holds numerous Master level certifications in network engineering, information security and business continuity. He has developed more than 30 recovery plans, planned or participated in more than 100 business continuity and disaster recovery tests, helped to coordinate three actual recovery operations, authored many technical articles on business continuity and disaster recovery and is a co-author for two books, the ‘Blackbook of Corporate Security’ and ‘Disaster Recovery Planning: An Introduction’ and author of the e-Book entitled: ‘Business Continuity & Disaster Recovery – Conquering the Catastrophic.’ 


jtkennedy@lucent.com

brcci.org

Developing Seamless Business Continuity and Disaster Recovery Plans

By Dr. Jim Kennedy.

Introduction
The recovery times for both the business organization’s business continuity plan and the IT department’s disaster recovery plan need to be developed through the collaboration of both parties for either plan to provide the proper protection. However, in my thirty-five years in the business continuity and resiliency field I have found that in many situations they are not.

The reasons for this can be timing or a lack of knowledge of the overall business continuity and/or disaster recovery planning process coupled with a lack of understanding of each other’s real recovery timing needs.

The purpose of this article is to provide a framework in which the recovery time objectives (RTOs) for the business continuity and the disaster recovery plan can be developed together.

Reason for inconsistencies and failures
Generally the drivers for business continuity and disaster recovery planning are considered to be one and the same, but this is not always the case. Many times the very design process for IT infrastructure requires that the IT organization develop disaster recovery planning thoughts and plans early in the application and/or systems development process. So, early in the timescale of a project to develop a new application or system, IT must have some understanding of what kind of recovery timing and recovery point timing will be needed to support the technology to be deployed. IT will try to obtain the RTO and RPO (recovery point objective) numbers, but the business is most often focused on ensuring that the new business process or function is rolled out on time and within budget. The business organization is not thinking about business continuity planning at this time. So, IT will take it on itself to develop a best guess of the required recovery times, either based on conversations with the business organization or, if the business cannot or will not commit to a number, on its own.

In other cases that I have seen, there is a clear lack of knowledge about business continuity and disaster recovery planning. Each organization knows that it needs either a business continuity or a disaster recovery plan, but its people are not trained in the overall steps in developing such plans. As such, the business organization does not understand the risks, tradeoffs, and costs involved in developing a proper business continuity plan. The business organization also often does not understand that it needs to properly analyze the operation to better understand the recovery requirements during the process/systems/application development phase of the systems/process development life cycle or, as ITIL defines it, the application life cycle (ALC). The business organization needs to quantify the impacts of loss of that process or system, and may not be sure of the right questions to ask, not only in terms of loss of productivity but also in terms of the costs to process manually in case of a system loss or failure. Can the organization develop and use manual processes at all if the system or IT infrastructure fails? Does the organization have the human resources to perform the necessary manual processes, or will it need to bring in contingent workers, and if so for how long and at what cost? Every business organization needs to clearly understand and articulate its operation’s maximum tolerable period of disruption (MTPD).

MTPD is the maximum time an activity or resource can be unavailable before irreparable harm is caused to the organization. This applies to both customer-facing and internal activities. Note that the recovery time objective specifies the time by which an organization intends to recover an activity or resource: the maximum tolerable period of disruption is the upper bound on this time.
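The relationship between the two figures can be checked mechanically. The following Python sketch, with hypothetical activity names and durations, flags any activity whose recovery time objective exceeds its maximum tolerable period of disruption.

```python
from datetime import timedelta


def rto_within_mtpd(rto: timedelta, mtpd: timedelta) -> bool:
    """The RTO is acceptable only if it does not exceed the MTPD (its upper bound)."""
    return rto <= mtpd


def recovery_margin(rto: timedelta, mtpd: timedelta) -> timedelta:
    """Slack between the intended recovery time and the point of irreparable harm."""
    return mtpd - rto


# Hypothetical activities: name -> (RTO, MTPD). Illustrative values only.
activities = {
    "order entry": (timedelta(hours=4), timedelta(hours=24)),
    "payroll": (timedelta(hours=72), timedelta(hours=48)),
}

# Activities whose planned recovery would come too late.
violations = [
    name for name, (rto, mtpd) in activities.items()
    if not rto_within_mtpd(rto, mtpd)
]

if __name__ == "__main__":
    print("RTO exceeds MTPD for:", violations)
```

An activity that appears in the violations list needs either a faster recovery strategy from IT or a workaround from the business that extends how long the outage can be tolerated.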

The business needs to utilize the MTPD to develop its processes and contingency processes, and the IT organization needs to understand the MTPD to properly develop its technology and RTO which, in turn, will enable the business to achieve its recovery objectives.

At the same time, IT needs to utilize the recovery time numbers developed by the business organization as a basis for its system and infrastructure RTO values.

Standards and planning process
There are so many business continuity and disaster recovery standards to choose from, as well as other related standards of practice, that this might be the reason for all of the confusion. The fact that none of these standards really addresses integrating the business recovery and the IT technology recovery plans into the overall process or application development life cycle complicates the matter even further.

There is also the issue that business continuity and/or disaster recovery planning classes are usually only electives in business administration or computer technology/information systems curricula. So we are not exactly preparing our next batch of business or technology leaders to properly understand the methods, or importance, of contingency planning.

All that being said, most of the standards that exist do have a pretty consistent set of predefined steps for being reasonably successful. So if we take all of the contingency planning steps and align them with the ITIL ALC phases, the planning cycle will integrate system development with continuity planning at the best possible time in the development process.

I will outline the steps below in developing business continuity and disaster recovery plans with their corresponding points within the ITIL application development life cycle:

STEPS IN BUSINESS CONTINUITY AND DISASTER RECOVERY PLANNING, WITH THE CORRESPONDING ITIL APPLICATION LIFE CYCLE PHASES

1) Understand the Organization
   a. Risk Assessment
   b. Business Impact Assessment
      i. Determine MTPD for operation
      ii. Develop RTO for Critical Systems
      iii. Develop RPO for Critical Systems
   ITIL phase: Requirements (requirements gathered based on business needs of the organization)

2) Evaluate and Determine Strategy
   a. BC strategy to meet RTO/RPO
   b. DR strategy to meet RTO/RPO
   ITIL phase: Design (requirements translated into specifications)

3) Develop Plans
   a. BCP – Business Organization
   b. DRP – IT Organization
   ITIL phase: Build (application and the operational model are made ready for deployment)

4) Exercise Plan
   ITIL phase: Operate (IT operates the application as part of the business service)

5) Audit and Maintain Plan
   ITIL phase: Optimize

Using the standards and good practices
During the requirements gathering phase of the ITIL ALC the business owner should have also conducted the risk assessment and business impact analysis or BIA. The results of these two activities allow the business owner to clearly see the impact on the business of a failure or discontinuation of operations in either, or both, of the business or IT operations. They can then translate that knowledge from the risk assessment and business impact analysis into quantifiable RTO and RPO numbers to be used in the next phase of business continuity and disaster recovery planning (Evaluate and Determine Strategy) and the Design phase of the ITIL ALC.

The RTO and RPO numbers are used to develop alternative strategies that meet the recovery time and recovery point needs. A cost for each alternative design is developed. The cost is the total of the IT cost to design, implement, build and operate; the business cost for any workarounds or special handling during the outage period; and the cost to load any transactions processed during that outage period into the system (processing resynchronization) after it is brought back online and is processing again as before the incident.
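The cost roll-up for a single alternative can be sketched as follows. The class name, field names and dollar amounts are hypothetical, chosen only to mirror the three cost components listed above.

```python
from dataclasses import dataclass

@dataclass
class AlternativeCost:
    name: str
    it_cost: float          # IT cost to design, implement, build and operate
    workaround_cost: float  # business workarounds / special handling during the outage
    resync_cost: float      # loading outage-period transactions back into the system

    def total(self) -> float:
        """Total cost of the alternative: IT + business workaround + resynchronization."""
        return self.it_cost + self.workaround_cost + self.resync_cost

# Hypothetical figures for one candidate strategy.
warm_site = AlternativeCost("warm site", it_cost=120_000.0,
                            workaround_cost=30_000.0, resync_cost=10_000.0)
warm_site_total = warm_site.total()
```

Keeping the three components separate makes it easier to see whether an alternative is expensive because of IT or because of the burden it places on the business during the outage.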

The alternative strategies are then compared using a cost and benefit (time, reduced workaround complexity, etc.) analysis of each alternative. The best option will accomplish return to operation in a reasonable time with an acceptable cost to the business and IT. However, the alternative selected will require input from both IT and the business to properly address the risk of outage. The business will need to ensure that it can perform the workarounds and still meet all of the business, regulatory and audit needs of the operation for the period that, under the selected alternative, the IT organization needs to restore the IT systems required to restart the application and its associated services.

For the plans to be effective and ‘fit for purpose’ it is very important that the business and IT are on the ‘same sheet of music’ as to recovery times and points. It is no good if the business has planned its resources and workarounds expecting a system recovery time of 24 hours only to find that the system will be down for 48 hours. On the other side of the coin it is not fiscally responsible to pay the cost to expedite the recovery time of an IT system to less than four hours if the business can tolerate an outage period of 24 hours or more at much less cost for the final IT solution.
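One way to picture this trade-off is a small selection routine: discard any strategy that recovers later than the business can tolerate, then take the least expensive of those that remain. The strategy names, hours and costs below are invented for illustration, and "cheapest adequate option" is only one possible decision rule.

```python
from typing import List, NamedTuple, Optional

class Strategy(NamedTuple):
    name: str
    rto_hours: float    # recovery time the strategy is designed to achieve
    total_cost: float   # combined IT and business cost of the strategy

def cheapest_adequate(strategies: List[Strategy],
                      tolerable_outage_hours: float) -> Optional[Strategy]:
    """Return the lowest-cost strategy that recovers within the tolerable outage,
    or None when no strategy is fast enough."""
    adequate = [s for s in strategies if s.rto_hours <= tolerable_outage_hours]
    if not adequate:
        return None
    return min(adequate, key=lambda s: s.total_cost)

# Hypothetical options for a business that can tolerate a 24-hour outage.
options = [
    Strategy("expedited recovery", rto_hours=4, total_cost=500_000),
    Strategy("standard recovery", rto_hours=24, total_cost=150_000),
    Strategy("best-effort rebuild", rto_hours=48, total_cost=60_000),
]
choice = cheapest_adequate(options, tolerable_outage_hours=24)
```

Here the four-hour option is rejected on cost and the 48-hour option on time, which is the same reasoning as the two examples in the paragraph above.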

Once it has been concluded that both plans are consistent with each other, the actual plans can be developed. While the business prepares for implementation of the new application and/or service, IT will make ready the systems and infrastructure needed to also meet the business schedule for implementation.

Exercising the plans
There is one caveat, however. Even if both sides have planned together and developed their plans based on a single and consistent recovery time, the two planning activities still need to verify (via exercising the plans together) that the IT recovery timing (the disaster recovery plan, which includes hardware restoration, software restoration, synchronization of databases, etc.) actually comes in on time to meet the business’s needs as provided for in the business continuity plan.

Only in testing and timing the two recovery processes to ensure that they are coincident can an organization truly be confident that the overall plans will be successful.
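A joint exercise produces measured durations that can be compared with the agreed recovery time. The sketch below assumes the recovery steps run one after another, so their durations simply add up; the step names and hours are hypothetical.

```python
from typing import Dict, Tuple

def exercise_result(step_hours: Dict[str, float], rto_hours: float) -> Tuple[float, bool]:
    """Sum the measured duration of each recovery step (assumed sequential)
    and report whether the total came in within the agreed RTO."""
    total = sum(step_hours.values())
    return total, total <= rto_hours

# Hypothetical timings recorded during an exercise, against a 24-hour RTO.
measured = {
    "hardware restoration": 6.0,
    "software restoration": 8.0,
    "database synchronization": 12.0,
}
total_hours, met_rto = exercise_result(measured, rto_hours=24.0)
```

In this made-up case the exercise takes 26 hours, so the plans look consistent on paper but the IT recovery would still arrive too late for the business.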

The author
Dr. Jim Kennedy, MRP, MBCI, CBRM, CHS-IV, CRISC has a PhD in Technology and Operations Management and is the chief consulting officer for Recovery-Solutions. Dr. Kennedy has over 30 years’ experience in the information security, business continuity and disaster recovery fields and has been published nationally and internationally on those topics. He is the co-author of three books, ‘Security in a Web 2.0 World – a standards based approach,’ ‘Blackbook of Corporate Security’ and ‘Disaster Recovery Planning: An Introduction’ and is author of the e-book, ‘Business Continuity & Disaster Recovery – Conquering the Catastrophic’. Dr. Kennedy can be reached at Recovery-Solutions@xcellnt.com


Disaster Recovery Planning & Cloud Computing

DISASTER RECOVERY PLANNING & CLOUD COMPUTING

Dr. Jim Kennedy, MRP, MBCI, CBRM, CHS-IV

January 2011

If you asked a group of IT practitioners or business people what cloud computing is they would probably answer in a manner consistent with blind men trying to describe an elephant with only the sense of touch. Each would have an answer consistent with their own specific perceptions.

In fact Public Cloud Computing is a relatively new term that has been around for only a few years and refers to the use of information technology services, infrastructure, and resources that are provided on a subscription basis. Public Cloud Computing is a Web or Internet accessed business solution where most or all of the computing infrastructure (computers, network, storage, etc.) is located remotely from the actual business site and is managed by a third party.

Many companies rely upon Public Cloud Computing in part or in whole for their business operations, critical and otherwise. So as we look at disaster recovery and Public Cloud Computing we are looking at a relatively new set of risks that need to be addressed to properly protect a business against unforeseen events.

Before I address the areas of concern to DR planning for public cloud computing let me discuss the various popular forms of public cloud computing available to the business.

There are three basic types:

  • Software as a Service (SaaS)
  • Platform as a Service (PaaS)
  • Infrastructure as a Service (IaaS)

Software as a Service (SaaS) is defined as a service based on the concept of renting software from the service provider rather than buying individually for your business. The software is hosted on network servers which are made functionally available over the web or intranet. This service provides software on demand and is currently the most popular type of public cloud computing because of its flexibility, ability to be scaled, and because maintenance is provided by the service provider as part of the cost of the service. There are many CRM, ERM, and unique applications that are all provided as SaaS services. With web-based services all that employees need to do is register and login to the cloud provided instance. The service provider hosts both the application and the data so the business user is capable of utilizing the service from anywhere potentially across the globe. With SaaS the service provider is responsible for all issues dealing with capacity, upgrades, security and service availability.

Platform as a Service (PaaS) is defined as a service that offers a platform for developers. The business users develop their own code and the service provider uploads that code and allows access to it on the web. The PaaS provider provides services to develop, test, deploy, host and maintain applications on their development environment. The service providers also provide various levels of support for the creation of applications. Thus PaaS offers a quicker and cheaper model for application development and delivery. The PaaS provider will manage upgrades, patches and system maintenance.

Infrastructure as a Service (IaaS) is defined as a service where the service provider delivers the computing infrastructure as a fully outsourced service. The user can purchase various components of the infrastructure according to their requirements when they need it. IaaS operates on a “Pay as you go” model ensuring that the users pay for only what they have contracted for – such as network, computing platforms, rack space, and environmental (HVAC and power). Virtualization has enabled IaaS vendors to offer high volumes of servers to customers. IaaS users purchase access to enterprise grade IT infrastructure and resources, and to the personnel who keep the infrastructure running. No application support or monitoring of databases or data is provided by the hosting vendor above the OS level unless contracted at an additional cost.

Basic Flaw in the “. . . as a Service” Offerings

In the cloud computing definitions that are evolving, the services in the cloud are being provided by third-party providers and accessed by businesses via the Internet. The resources are accessed as a service on a subscription basis. The users of the services being offered most often have very little knowledge of the technology being used, the security being deployed, the availability of the service being offered, or the operating best practices (monitoring, patching, maintenance, etc.) utilized by the service provider. The business subscribers also have little or no control over the infrastructure that supports the technology or service they are using.

How to Take Control

Under the standard of “Due Care” and charged with the ultimate responsibility for meeting business information technology objectives or mission requirements, senior management must ensure that the services they contract, which include these “. . . as a Service” solutions, are appropriate to meet all of the necessary business requirements, including those in the legal, technical, financial, and operational areas.

This business continuity due diligence comes only through a thorough vetting of the “. . . as a Service” provider in several areas. I have listed some of the more important ones below.

Legal & Regulatory

  • Will the service provider meet any of your data breach notification requirements (remember that even though the provider is hosting the data, you remain responsible for the data under your protection, e.g. PHI, PII, etc.)?
  • Will the provider meet data retention requirements of the business?
  • Will the provider meet the standards for data encryption and protection you require?
  • Are “Safe Harbor” needs met?
  • Is data destruction or return at the end of the contract defined well enough to meet your business requirements?
  • What is their incident management program?
  • Are they prepared to react in a timely fashion in case of any eDiscovery needs of data they store for you?

Service Availability

  • Are the facilities housing the service provider adequately secured (video surveillance, access control, etc.)?
  • Are the RPOs and RTOs consistent with the business’ requirements?
  • How often are backups taken, are they maintained off-site, and have backups and restores been tested to your satisfaction?
  • Are standard backup methods and media used just in case the business needs to bring data back into house?
  • Are maintenance practices and maintenance windows compatible with your operational needs?
  • What types of technical security do they employ (e.g., firewalls, virus protection, intrusion detection devices, etc.)?
  • Are their hours of operation coincident with yours?
  • If you are a global company do they provide multilingual support?
  • Are there clear escalation procedures in case of an incident?
  • Does the vendor provide global diversity so that if one site goes down another can be used in its place?
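The RPO/RTO and backup questions in the list above lend themselves to a quick numeric check. The sketch below rests on a simplifying assumption: with periodic backups, the worst-case data loss is roughly the interval between backups, so that interval should not exceed the RPO. The function names and figures are hypothetical.

```python
def backups_meet_rpo(backup_interval_hours: float, rpo_hours: float) -> bool:
    """Simplified check: the worst-case data loss is taken to be the time
    between backups, so the interval must not exceed the RPO."""
    return backup_interval_hours <= rpo_hours

def provider_meets_objectives(backup_interval_hours: float,
                              stated_restore_hours: float,
                              rpo_hours: float,
                              rto_hours: float) -> bool:
    """True when the provider's backup frequency satisfies the RPO and its
    stated restore time satisfies the RTO."""
    return (backups_meet_rpo(backup_interval_hours, rpo_hours)
            and stated_restore_hours <= rto_hours)

# Hypothetical provider: nightly backups, 8-hour stated restore time,
# measured against a business RPO of 4 hours and RTO of 24 hours.
acceptable = provider_meets_objectives(backup_interval_hours=24,
                                       stated_restore_hours=8,
                                       rpo_hours=4,
                                       rto_hours=24)
```

In this example the provider would restore quickly enough but could lose up to a day of data, so the answer to the RPO question is no. A stated restore time should still be confirmed by a tested restore.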

Operational

  • Do they have a current SAS 70 Type II audit findings report?
  • Have they corrected any areas of concern to your business?
  • What capacity planning do they have in place to meet the growing needs of your business?
  • What standards of practice do they adhere to (e.g., ISO 27001, BS 25999, etc.)?
  • Do they have a patch management program in place and what is it? Does it meet your requirements?
  • Do their SLAs meet your business and operational requirements?

I have developed a hosting questionnaire which each “. . . as a Service” vendor is required to answer to the satisfaction of my client and I would recommend that you do the same. Sometimes it takes a few iterations to complete the form to the satisfaction of the client, but when completed it does provide documentation of due diligence and a clearer picture of what can be expected from the service provider. If the vendor will not complete the questionnaire then it would be best to move on to another vendor – regardless of cost. If you can’t come to terms before a contract or Statement of Work is signed it will be ten times more difficult after signature to come to an agreement.

In Summary

Now this article has only scratched the surface and provided information on the basic questions that should be asked and answered to protect businesses utilizing “. . . as a Service” providers. However, the intent of this article was to inform the reader that there are many types of “. . . as a Service” offerings and that there are ways to reduce and/or eliminate the problems that I have experienced over the last few years. The issue the article wants to impress upon the reader is one of due diligence. We as corporate or governmental IT security or business continuity experts need to make sure that our organizational leaders have the necessary information to make informed choices for the protection of critical and sensitive information. That information allows them to decide between implementing adequate controls and safeguards now to protect against risks, or potentially paying later in reparations and in the lost confidence of those whose data they (senior management) were entrusted to protect but have lost or allowed to be taken.

The author

Dr. Jim Kennedy, MRP, MBCI, CBRM, CHS-IV has a PhD in Technology and Operations Management and is the Chief Consulting Officer for Recovery-Solutions. Dr. Kennedy has over 30 years’ experience in the information security, business continuity and disaster recovery fields and has been published nationally and internationally on those topics. He is the co-author of two books, ‘Blackbook of Corporate Security’ and ‘Disaster Recovery Planning: An Introduction’ and author of the e-book, ‘Business Continuity & Disaster Recovery – Conquering the Catastrophic’. Author can be reached at Recovery-Solutions@xcellnt.com

Implementing a Good Information Security Program

IMPLEMENTING A GOOD INFORMATION SECURITY PROGRAM

The frequency and potential impacts of information security breaches are increasing. Dr. Jim Kennedy explains why and looks at what organizations can do about it.

Computer, network, and information security is based on three pillars: confidentiality, integrity, and availability. In my business as an information & cyber security, business continuity and disaster recovery consultant, I see every day how companies of various sizes and types address these three areas. Some do so very well, some not so well, and some really poorly.

Given all the regulations and standards (like HIPAA, SOX, NERC-CIP, FISMA, PIPEDA, etc.) developed and published over the last five years, you would think that business and government would be doing much better in securing their computing systems and network infrastructures. However, based on the ongoing events prominent in the press and trade journals almost every day, this does not seem to be the case.

We continue to be informed that government agencies and private sector companies continue to have numerous cases of data leakage: a politically correct way of saying data loss, theft, or compromise. We hear about the theft of credit card and personal information and worst of all we hear of companies that have lost critical personal and health related information despite the many security controls that were supposed to be in place. Worse yet we hear of extremely large sums of monies extorted from banks and other financial institutions and also of the fragility of our power grids and gas distribution systems world-wide.

And from time-to-time the media will provide on screen experts that speak of ‘script kiddies’ or non-expert computer hackers that use pre-packaged software to break into systems without the use of their own intellect. Often the term is used in a derogatory or sarcastic fashion to denote the less than knowledgeable hacker.

So when it comes to information security, where exactly are we?

Current state

Every government entity or private enterprise business generally has a security plan in place which utilizes numerous types of controls to reduce or attempt to eliminate the adverse effects coming from security risks to their operations. For the most part there are three basic types of controls in use:

  • Technology – software and hardware used to address internal and external threats to the security of the organization.
  • Process – policies, processes, and practices to address vulnerabilities and to reduce security risks while establishing baseline standards of secure operations.
  • Ignore the vulnerability and threat.

The third control type is, disturbingly enough, used more frequently than one would think. However, I will focus on the first two types of controls which are more realistic and really do attempt to provide some safety and security for the information and/or systems being protected. In the controls of the first type (Technology) we find firewalls, intrusion detection/protection systems (IDS/IPS), virus scanning software (AV), data loss prevention systems (DLP) and malware detection software (to protect against key loggers, Trojans, and backdoors).

In the controls of the second type (Process) we find the corporate or government policies, standards of practice, and standard operating procedures.

All of these types of controls, if implemented and maintained correctly, form a good and sound basis for protecting the organization that uses them.

Yet despite the risk and vulnerability assessments, and the implementation of the above mentioned controls, security breaches and information leakage continue to rise. Why?

Failing controls

I have been reviewing, over the last fifteen years, the security breach and incident reports published yearly by Verizon, AT&T, and Ponemon, amongst many others. My research shows that, as a whole, the trend of data breaches and security intrusions continues to increase year after year, despite new government regulations and laws and despite advances in technology and in the understanding of potential threats. Oh yes, we (the information/cyber security experts) have made some progress in some areas only to fall back in others.

However, one thing that I have found is that many of the breaches and intrusions which succeeded did so by attacking known vulnerabilities that had been identified and had been around for years, not through some sophisticated ‘zero-day’ attack which was unidentified and unknown until only yesterday by the security community at large. And, even more disturbing, social engineering continues to be a most successful way to begin and/or precipitate an attack.

So let’s look at why.

One simple thing to remember is that if we look at very successful predators in general (such as the lion or the cheetah) they do not attack the fastest prey or the most protected; they attack the sick, the slow, the tired, or the unwary. Why? Because it presents the least expenditure of energy with the most potential for a successful outcome or food source. So also is the case with information and cyber attacks where the predator is the hacker.

For some small and medium sized companies (and, more often than not, some very large ones) cost and manpower are always an issue. So the upgrade of hardware and software is often slow and arduous. Often budgets for security software and/or hardware upgrades are sparse, or the upgrades are put off in favor of priorities seen as more important to the business until security comes to the forefront of board thinking and funds can be made available. Virus software and signatures are often out of date, systems often go unpatched, and hardware is often years old and cannot run the newer, more secure operating systems. Many times the implementation of hardware security devices, such as firewalls and intrusion detection systems, is done without giving the employees installing them, often for the first time, adequate training, making the installations improper or marginal at best. I have found many large companies that did not have proper or adequate firewall rules established prior to installation of the device, leaving holes for hackers to easily find and penetrate.

Further, I have also found from personal experience that a majority of security breaches could have been avoided if only the security policies and processes already in place and in effect were actually followed.

Companies have done a fairly good job creating policies, but a less than admirable job in ensuring that people are trained on the policies and in making sure that those policies are followed. Often failure to comply with the policies, when uncovered, results in only a stern warning, followed by everyone going back to the ‘business as usual’ of not following the policies already in place. Many times this non-adherence to policy has resulted in the loss of thousands of personal information and/or health records or of company intellectual property, and in still more cases it has acted as the vector that the hacker used to break into the networks or systems of that agency or company.

Another big reason for the increase of security breaches and information leakage is the continuing success of social engineering (the art of manipulating people into performing actions or divulging confidential information).

Why is social engineering so successful? Because most people who work for companies or government generally want to be helpful wherever possible: that is their organization’s mantra. This is preyed upon by malicious hackers every day. To compound the problem, government and companies spend less money and time on security awareness training for their employees than they do yearly on copy paper, and hackers know it. So a caller who claims to be from Tech Support, and who says that to fix the boss’s computer the secretary must change his password to ‘ABC123’, may find a secretary who is happy to comply. Or compliance may follow when the VP of the Marketing and Sales organization gets an unsolicited phone call where the caller indicates that they are from a virus protection firm and they know, based on some trumped up information, that the VP’s computer is infected, but they will clean it up if he or she just logs into a specific web site and then relinquishes control to the tech support person on that site. Once the VP links to the site they find that minutes later their computer stops working and their files have been copied and/or erased. Both of the above situations are actual examples from true situations that I have been called upon to investigate.

Lastly the sophistication of hackers is also increasing. Just as many companies and government agencies purchase off-the-shelf software to accomplish normal business functions rather than develop it on their own, so do hackers. Today, less than successful hackers can purchase or acquire pre-packaged attack tools (such as BackTrack, Metasploit, Nmap, etc.) which are produced by very expert and knowledgeable developers. These sophisticated ‘shrink wrap’ tools are capable of identifying what versions of hardware and software are being run on computers or network systems and what types of attacks will be successful. Would-be hackers using that knowledge, along with well-publicized known vulnerabilities, are then very capable of breaking into many computer systems and networks that are not properly protected. Hacking has become a commodity business, accomplishable by anyone capable of buying, loading and executing pre-packaged software.

Oh, and one last thing. Do not think that because your organization has placed its computing infrastructure in the cloud it is any safer. The security of the cloud has the same issues and shortcomings as your own internal computing infrastructure, as I have explained above. I have personally performed security assessments on over 100 cloud providers over the last few years and have found that some are very secure and many are very vulnerable.

So what can we do?

I have found that some basic steps can yield an order-of-magnitude improvement in security management as it stands today in your environment. Remember that these steps will only be effective if top management agrees that security is important and endorses (acts as a champion for) the security activities to be undertaken.

Step one: Conduct a risk assessment to determine exactly what information and data is most important (mission critical) to your organization and identify security vulnerabilities to those resources. Create a risk register which identifies critical systems, vulnerabilities, internal & external threats, and controls needed. This is a very important first step, so, if you do not feel that you have the expertise in-house it would be prudent to have a knowledgeable security consultant perform this task for you to give you a good baseline from which to operate. It also provides a mechanism to identify projects for budgeting and planning purposes.
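A risk register of the kind described in step one can start out as a very small data structure. In the sketch below the fields follow the items named above (critical system, vulnerability, internal or external threat, control needed). The 1-to-5 likelihood and impact scales and the likelihood-times-impact score are a common convention assumed here for illustration, not something prescribed by this article, and the entries are invented.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RiskEntry:
    system: str          # critical system or information asset
    vulnerability: str
    threat: str
    threat_source: str   # "internal" or "external"
    control_needed: str
    likelihood: int      # assumed scale: 1 (rare) to 5 (frequent)
    impact: int          # assumed scale: 1 (minor) to 5 (severe)

    @property
    def score(self) -> int:
        """Assumed scoring convention: likelihood multiplied by impact."""
        return self.likelihood * self.impact

def prioritized(register: List[RiskEntry]) -> List[RiskEntry]:
    """Order entries from highest to lowest score for budgeting and planning."""
    return sorted(register, key=lambda entry: entry.score, reverse=True)

# Hypothetical register entries.
register = [
    RiskEntry("file share", "weak passwords", "unauthorized access",
              "internal", "password policy", likelihood=3, impact=3),
    RiskEntry("billing database", "unpatched server", "data theft",
              "external", "patch management", likelihood=4, impact=5),
]
top_risk = prioritized(register)[0]
```

Sorting by score gives a rough first cut at which projects to budget for first; the numbers themselves should come out of the risk assessment, not from the tool.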

Step two: Based on the vulnerabilities and threats identified, develop policies (like password policies, acceptable use policies, encryption policies, etc.) to identify the proper processes and standards of practice the organization wants followed. However, recognize that people do not always follow these policies, processes and procedures.

Step three: Implement the necessary technical controls (ensure that they are designed and implemented by knowledgeable personnel, with proper training for internal staff on the new technologies). The reason for technical controls is that, wherever possible, we should endeavor to protect humans from their own bad practices. So if they feel pressured to work around security controls the technology will not allow them to do so.

Step four: Implement security awareness training across the entire staff – from the board to the lowest levels in the organization. Again this should be conducted by knowledgeable people, and bringing in experienced trainers would be not only smart but also most cost effective. Training to address social engineering and Internet/email good practices will go a long way toward protecting an organization.

Step five: Implement a good security monitoring program. Often anomalies or inconsistencies in network traffic or systems access are a precursor of a more intensive attack to come. Make sure that security logs are kept and reviewed at least weekly, and more often if the assets you are protecting are extremely critical to the survival of your organization or its customers.
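As a small illustration of what a routine log review might look for, the sketch below counts failed logins per account and flags accounts that reach a threshold. The event format (account name plus an "OK" or "FAILED" outcome), the threshold of three and the sample data are all assumptions made for the example; real logs would need to be parsed from whatever format your systems produce.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def accounts_to_review(events: Iterable[Tuple[str, str]],
                       threshold: int = 3) -> List[str]:
    """Return, in alphabetical order, the accounts whose number of failed
    logins in the reviewed period reaches the threshold."""
    failures = Counter(account for account, outcome in events
                       if outcome == "FAILED")
    return sorted(account for account, count in failures.items()
                  if count >= threshold)

# Hypothetical events gathered from one week of access logs.
week_of_events = [
    ("jsmith", "FAILED"), ("jsmith", "FAILED"), ("jsmith", "FAILED"),
    ("mlee", "FAILED"), ("mlee", "OK"),
    ("jsmith", "OK"),
]
flagged = accounts_to_review(week_of_events)
```

A run of failures followed by a success, as with the first account here, is the kind of anomaly worth a closer look during the review.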

Step six: In security we have our own mantra: Trust but Verify. So, do not simply trust that steps one through five, once complete, are sufficient. Technology, business operations, hackers, and threats are all continually changing and evolving. What works today may not work tomorrow. So, conduct regular (at least once a year) vulnerability tests. Use an independent third party so you get the real scoop on your security posture, not what your organization’s people think is politically correct.

Information and computer security continues to be a ‘work in progress’, never complete. So, treat it that way.

The author
Dr. Jim Kennedy, MRP, MBCI, CBRM, CEH, CHS-IV, CRISC has a PhD in Technology and Operations Management and is the Lead and Principal Consultant for Recovery-Solutions. Dr. Kennedy has over 35 years’ experience in the information/cyber security, business continuity and disaster recovery fields and has been published nationally and internationally on those topics. He is the co-author of three books, ‘Blackbook of Corporate Security,’ ‘Disaster Recovery Planning: An Introduction,’ and ‘Security in a Web 2.0+ World – a standards based approach,’ and is author of the e-book, ‘Business Continuity & Disaster Recovery – Conquering the Catastrophic’. Dr. Kennedy can be reached at Recovery-Solutions@xcellnt.com

Pearson Grounds Flights During Ice Storm

PEARSON GROUNDS FLIGHTS DURING ICE STORM

In any BC Plan, it is critical to define when and how a disaster should be declared.

Was the GTAA correct in making the decision to impose a ground stop during frigid temperatures of -25 to -45 Celsius?

News Article 1:

Pearson right to ground flights during ice storm

These are the facts from those of us who were working on the ground when this decision was made:

Simply put, there is a very good chance that the GTAA’s decision saved people’s lives. In the 30 hours preceding the ground stop, there were two airplane crashes in similar conditions at New York’s JFK Airport and Aspen, Colorado. Two days later, an aircraft slid off the runway shortly after landing in Saskatoon.

Years of two-tiered wages and contracting out have forced thousands of our co-workers into precarious, near-minimum-wage jobs. This is creating a high turnover rate and a lost opportunity to retain the experience needed to work in irregular operations. Many airports around the world, particularly in the U.S., are implementing Living Wage Ordinances in recognition that skilled, properly paid people on the ground are necessary for your safety.

Most importantly we need to remember that we are all people first. None of us can control bad weather in an industry with zero room for error. Nothing is achieved when we are abusive to each other — worker or passenger. After all, these decisions are made for both of our groups’ safety.

Sheri Cameron, Martyn Smith and Sean Smith are airline workers and representatives on the Toronto Airport Council of Unions encompassing over 20,000 airport ground handlers and flight attendants in both Terminals 1 and 3 at Pearson Airport.

Source Article (Toronto Star News):

https://www.thestar.com/opinion/commentary/2014/01/21/airport_workers_pearson_was_right_to_ground_flights.html

News Article 2:

GTAA criticized for “Ground Hold” at Pearson International Airport

The Greater Toronto Airports Authority is being harshly criticized for their decision to stop all arriving North American flights for more than eight hours at Pearson International Airport, which literally stranded thousands of frustrated passengers and caused serious delays since that day.

As a result, roughly half of all 774 arriving flights, i.e. 381, had to be cancelled as of Tuesday evening. Consequently, hundreds of weary travelers slept on seats or trudged forward in hours-long lines to rebook their cancelled or missed flights.

Vice President of strategy development for the GTAA, Toby Lennox, revealed that the decision to impose a ‘Ground Stop’ at the airport was the CEO’s first in his 15-year career. He said that stops are usually imposed only due to snowstorms or lightning and last only a few hours. He also admitted that “it’s just never been this extreme,” and “no matter how much you prepare, you’re not going to be able to make the event go away. I can’t prepare to make the weather go away.”

Source Article (Oye! Times):

https://www.oyetimes.com/news/canada/57358-gtaa-criticized-for-ground-hold-at-pearson-international-airport

Vital Records and Business Continuity Planning

VITAL RECORDS AND BUSINESS CONTINUITY PLANNING

By Dr. Jim Kennedy, MRP, MBCI, CBRM, CHS-IV

As business continuity and disaster recovery professionals we continue to address the rapidly changing face of business and technology. We are caught up in the frenzy of our employers or clients wishing to converge their voice and data networks. We must maintain the RTOs and RPOs necessary to restore mission critical infrastructures along with all of the electronic data that moves across networks or is stored on magnetic media. We know that companies that go through a severe loss of mission critical computerized records may never reopen.

However, as we have seen from past disasters, like those suffered during hurricanes Katrina and Rita or even the most recent floods in the Midwestern United States, electronic and digital data is not the only medium of information critical to an organization’s business mission. Nor is electronic data the only storage medium of importance to customers or patients, who rely upon critical paper records and their protection for their financial futures or health and wellbeing.

Disasters such as floods, fires and tornadoes can happen almost anywhere and at any time. Some come with prior warning, but most do not. With hurricanes there is often advance warning, but the actual ultimate severity is still pretty much a ‘best guess’ due to the complex factors which can change a category three into a category five or change the final direction of a storm. Those changes can mean the difference between severe flooding, levee breaches, and near absolute destruction of property or just a lot of rain and some local street flooding and wind damage.

We have seen and experienced some of the most destructive weather and natural disasters imaginable in the last ten years. We also know that more localized incidents like a roof collapse under the weight of above average snowfall or a pipe bursting due to age can also cause catastrophic outcomes. As contingency planners we continue to learn and base our future efforts on lessons learned from the past. We have learned to apply an ‘all hazards’ approach when planning.

We also need to take an ‘all media’ approach to data protection. As such, we all need to look very closely at the continued reliance of businesses such as financial, healthcare, government, and education on paper records and information.

Until businesses can move entirely to the use of electronic records and adequately back up that information, organizations will continue to remain vulnerable to all types of disasters. Many organizations today could fail and never reopen their doors if they suffered a loss of just paper records due to a fire or flood.

As we saw during the catastrophic destruction of hurricane Katrina and then Rita, thousands of medical records were permanently lost and healthcare was ultimately compromised in the region. Doctors attempting to treat their patients could not find the patients’ medical records, so they could not check for known allergies to medications or previous illnesses. Patients often did not know the names of their critical prescriptions, so they were forced to go without.

Small and medium businesses that had lost their computers in the storm had also lost several weeks of paper business transactions. Architectural and engineering firms lost many important drawings not maintained on computers and numerous local and county governments lost paper deeds, court records, birth records and many other valuable papers and documents.

Natural disasters are not the only incidents to threaten vital paper records. I was personally involved several years ago in an incident in which a local bank vault, used to archive not only financial records but other vital records of the community, became fully engulfed in a fire. The only way to get to and then put the fire out was to drill several holes in the concrete ceiling above the vault and then fill it with water to extinguish the blaze. The very water used to extinguish the blaze and save the building from the fire also compromised and/or destroyed important documents and records, many of which had been there for over one hundred years. Luckily with the help of a document recovery company the bank was able to restore some of the records over time, but with a very expensive price tag.

Even paper files and records that are kept in an off-site storage facility can be susceptible to the same types of damage and destruction as the businesses that own them. Widespread natural disasters, like floods, often compromise off-site storage facilities in the same manner as the primary sites that sent the records there for protection.

So, as you can see, paper continues to be a medium on which many critical records and much irreplaceable information continue to reside. As contingency planners we need to ensure that our evaluation of a business includes any and all data that is critical to the operation of that business – that includes vital paper documents and records.

Defining, identifying and inventorying vital paper records

This is possibly the most important, and sometimes the most difficult, first step toward proper data protection. This is where organizations need to distinguish between important data and a vital record. A vital record is defined by the Business Continuity Institute as: Computerized or paper record which is considered to be essential to the continuation of the business following an incident.

Typically only between 3 and 15 percent of archived paper records are categorized as vital. However, in the case of healthcare and governmental organizations this number can be quite a bit higher. So, someone at a senior level in the organization must make the final judgment as to what is vital and what is not. Also, many paper records are maintained for legal reasons. Many need to be maintained due to some type of regulation, such as those of the FDA, the SEC, or the Internal Revenue Service, or under HIPAA. The retention period can vary from three to seven years for tax information to the life of a patient for some medical records. So an organization’s legal counsel should also be contacted for their recommendations.

Categories of recorded data, on paper, that typically fall under the category of vital may include:

  • Patient healthcare records, controlled drug administration, results of clinical trials, etc.
  • Birth records, court records, vital statistics, etc.
  • Contracts/agreements that prove ownership of property, equipment, etc.
  • Operational records such as Sarbanes-Oxley accounting records, architectural drawings, shipping delivery records, software licenses, maintenance contracts, etc.
  • Current client files and account information
  • Intellectual property such as source code, formulas, schematics, SOPs, etc.
  • Legal documents such as tax records and correspondence or other documents that are part of ongoing litigation

Assessing the threat to vital records

The identification of hazards that can result in damage or destruction of paper records is the very important next step. Flooding or water damage of records in storage areas can occur due to:

  • Pipes bursting or leaking
  • Roof leaks or collapse (rain, snow)
  • Localized flooding (water main breaks, traffic accidents)
  • Chemical spills

The risk of damage due to fire is possible when:

  • Fire detection and protection mechanisms are not appropriate for the types of materials being protected, or are in place but not maintained and checked annually (e.g., sprinklers can cause more damage from water than the fire would have caused)
  • NO SMOKING protocols are not established and adhered to
  • Improper housekeeping is found in document storage areas (e.g., flammable liquids, cleaning solvents, or other materials are found in the same area as, or in close proximity to, document storage, and there is an accumulation of flammable materials)
  • Paper records are not stored in a UL or CSA rated fireproof/fire safe and water retardant storage cabinet

Other threats to paper records:

  • Theft
  • Sabotage
  • Mishandling
  • Negligence
  • Loss

Some paper records, because of their age or the paper material used, can be damaged by improper handling or by environmental excesses such as temperature, humidity, sunlight, or fluorescent light. As such, these need to be protected by:

  • Air conditioning to maintain constant temperature and humidity levels
  • Storage cabinets that keep the documents out of direct light of any kind

So any threats to the maintenance and operation of air conditioning or environmental controls must be considered as well.

Another red flag is a lack of adequate security protection in the storage containers or spaces used at on-site or off-site storage locations. Adequate access controls and proper 24 x 7 x 365 monitoring of the records must be maintained at any storage facility selected to house vital records.

Establish a plan to protect vital records

In order to protect vital records from disaster many organizations:

  • Move and store the paper records off-site at a facility specializing in transportation of vital records and providing secured vaulting services;
  • Convert paper to other media such as optical disk, microfiche, microfilm, magnetic disk or tape, etc.

Each of these contingencies is good provided that it offers the necessary flexibility to access records when needed and the necessary protection to properly preserve those records. That is, whether the vital records are kept on or off site, the vaulting facility must have adequate security, provide proper environmental controls (humidity and air conditioning), have adequate fire protection facilities, and employ trusted or bonded workers.

In any case, all threats identified in the risk assessment should be addressed in one of three ways: elimination through mitigation; adequate insurance against loss; or a conscious decision by senior management to ignore the threat.
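As a rough illustration of the rule above, the sketch below checks a threat register for items that have not yet been addressed. The class names, field names, and sample threats are hypothetical and chosen only for this example; they are not taken from any standard or tool. It also assumes that a decision to ignore a threat counts only if a senior manager's sign-off is recorded.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Treatment(Enum):
    # The three ways of addressing a threat named in the text.
    MITIGATE = "eliminate through mitigation"
    INSURE = "insure against loss"
    ACCEPT = "senior management decision to ignore"


@dataclass
class Threat:
    name: str
    treatment: Optional[Treatment] = None
    approved_by: Optional[str] = None  # assumed to be required for ACCEPT


def unaddressed(threats):
    """Return names of threats with no treatment, or accepted without sign-off."""
    open_items = []
    for t in threats:
        if t.treatment is None:
            open_items.append(t.name)
        elif t.treatment is Treatment.ACCEPT and not t.approved_by:
            open_items.append(t.name)
    return open_items


# Hypothetical register for a paper records storage room.
register = [
    Threat("Pipe burst above records room", Treatment.MITIGATE),
    Threat("Regional flood", Treatment.INSURE),
    Threat("Theft", Treatment.ACCEPT),        # no sign-off recorded
    Threat("Roof collapse under snow load"),  # not yet addressed
]

print(unaddressed(register))
```

A check of this kind could be run before the plan moves on to the vital records sections, so that no identified threat is left without a recorded decision.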

Once the threats have been addressed the business continuity plan can proceed in the development of the sections on vital records protection, restoration and recovery. The plan should include a thorough inventory of all vital records stored on or off site. The plan should also include a description of how records will be identified, transported, and handled during restoration. Also the plan should designate who is the responsible party within the organization to authorize initial storage and any subsequent recovery of vital records so that the confidentiality and integrity of the data can be maintained.
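One possible way to keep the inventory described above is as a simple list of structured entries. The sketch below is only an assumption about what such an entry might hold (identifier, description, location, media, and the person authorized to approve storage and recovery); the field names and sample records are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VitalRecord:
    record_id: str
    description: str
    location: str    # e.g. "on-site cabinet" or "off-site vault"
    media: str       # e.g. "paper", "microfilm"
    authorizer: str  # who may authorize storage and recovery


def missing_authorizer(inventory):
    """Return IDs of records with no responsible party assigned."""
    return [r.record_id for r in inventory if not r.authorizer.strip()]


def by_location(inventory):
    """Group record IDs by storage location, for use during restoration."""
    groups = {}
    for r in inventory:
        groups.setdefault(r.location, []).append(r.record_id)
    return groups


# Hypothetical sample entries.
inventory = [
    VitalRecord("DEED-001", "Property deeds", "off-site vault", "paper", "Records Manager"),
    VitalRecord("MED-114", "Patient charts", "on-site cabinet", "paper", ""),
    VitalRecord("DRW-020", "Architectural drawings", "off-site vault", "microfilm", "Chief Engineer"),
]

print(missing_authorizer(inventory))
print(by_location(inventory))
```

Listing records with no authorizer makes gaps in responsibility visible before an incident, and grouping by location gives recovery teams a starting point for deciding what to retrieve from where.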

One component of the vital paper records plan should be an agreement or contract with a document recovery and restoration company in case documents are compromised during an incident. This saves time by identifying one of the first organizations to be contacted if paper records are damaged. If a contract is not in place, the plan should at least include emergency contact information for such an organization.

Once the plan has been exercised, including the vital records component, and found to be ‘fit-for-purpose’ the contingency planner can breathe somewhat easier and the plan can be finalized and released.

Summing it up

Paper records can be as critical to the operation and survival of a business as other forms of media. We as business continuity or resilience planners need to adopt an ‘all hazards’ and an ‘all media’ approach when developing plans to ensure that we have provided the necessary due diligence to protect our businesses and their associated operations.

Author

Dr. Jim Kennedy has a PhD in Technology and Operations Management and is the business continuity/security services practice lead and principal consultant for Alcatel-Lucent. Dr. Kennedy has over 30 years’ experience in the information security, business continuity and disaster recovery fields and has been published nationally and internationally on those topics. He is the co-author of two books, ‘Blackbook of Corporate Security’ and ‘Disaster Recovery Planning: An Introduction’ and author of the e-book, ‘Business Continuity & Disaster Recovery – Conquering the Catastrophic’. jtkennedy@alcatel-lucent.com

brcci.org

What is Business Continuity?

WHAT IS BUSINESS CONTINUITY?

By Dr. Akhtar Syed, PhD, CBRM, MABR, CISSP

Disasters can strike quickly and without warning. Webster’s dictionary defines disaster as:

“A calamitous event, especially one occurring suddenly and causing great loss of life, damage, or hardship, as a flood, airplane crash, or business failure”.

Floods, earthquakes, tornadoes, and hurricanes are examples of major calamitous events.

Businesses are vulnerable to the impact of not only major calamities but also minor business disruptions. Factors such as increased dependency on technology and “speed to market” pressures have made businesses sensitive to even minor disruptions. Some examples of minor disruptive events are power outages, information technology (IT) system failures, manufacturing equipment failures, hazardous material contamination, voice and data communication failure, and computer viruses.

Over the past decade, the risks of natural disasters, technical and accidental failures, and malicious activities have increased the possibility of business disruptions. In spite of increased risks, studies show that many businesses have remained complacent. According to Gartner, “… many enterprises that experience a disaster never recover. Gartner estimates that two out of five enterprises that experience a disaster go out of business within five years”. These findings reflect the failure of businesses to invest in adequate disaster planning and preparations.

Serious consequences of business disruptions can be avoided through business continuity planning (BCP). BCP is a discipline that prepares an organization to maintain continuity of business during a disaster through the implementation of a business continuity plan. A business continuity plan is a document that contains procedures and guidelines to help recover and restore disrupted processes and resources to normal operational status within an acceptable time frame.

A business continuity plan cannot function effectively without the collective efforts of the people assigned to various roles and responsibilities defined in the plan. Continuity of business cannot be maintained without the continuous support of critical business processes—tasks and operations performed by business units or functions—and various resources required by these processes.

The figure below depicts the typical resources involved in a business continuity plan, namely, IT infrastructure, data centers, manufacturing and production facilities, critical machinery and equipment, critical records, office work areas, critical data, voice and data communication infrastructure, and off‑site storage facilities.

Conceptually, BCP can be divided into two areas:

1. Business continuity planning management (BCP management)

2. Business continuity planning process (BCP process)

The typical activities of BCP management and BCP process are shown in the figure below on a time line relative to a business disruption.

BCP management focuses on management and organizational components of BCP. Some of the key activities of BCP management are:

  • Issue an organization wide business continuity policy that directs management and staff of each business unit to take responsibility for maintaining continuity of critical business functions and processes in the event of a business disruption.

  • Establish a steering committee with members from senior management to define the BCP scope, provide ongoing BCP support and direction, monitor BCP status and progress, and allocate BCP funding.

  • Initiate a formal project for developing a business continuity plan that covers the entire organization.

  • Ensure that personnel involved in the development and implementation of the business continuity plan are adequately trained. Develop and implement a BCP awareness and training program for the entire organization.

  • Ensure that BCP is in compliance with pertinent government laws and regulations, and industry standards.

  • Coordinate BCP activities with relevant disaster recovery and business continuity agencies and local authorities.

  • Ensure that the business continuity plan remains in a state of readiness at all times.

  • Execute the business continuity plan at the time of disaster.

Together, BCP management and BCP process enable an organization to develop a business continuity plan, maintain it in a constant ready-state, and execute it in the event of a business disruption.

The BCP process defines a life cycle for developing and maintaining a business continuity plan. The BCP process life cycle model consists of the following stages:

Stage 1—Risk Management

Stage 1, risk management, assesses the threats of disaster, existing vulnerabilities, and potential disaster impacts, and identifies and implements controls needed to prevent or reduce the risks of disaster.

Stage 2—Business Impact Analysis (BIA)

Stage 2, business impact analysis, identifies mission-critical processes, and analyzes impacts to business if these processes are interrupted as a result of a disaster.

Stage 3—Business Continuity Strategy Development

Stage 3, business continuity strategy development, assesses the requirements and identifies the options for recovery of critical processes and resources in the event they are disrupted by a disaster.

Stage 4—Business Continuity Plan Development

Stage 4, business continuity plan development, develops a plan for maintaining business continuity based on the results of previous stages, specifically, risk management, BIA, and business continuity strategy development.

Stage 5—Business Continuity Plan Testing

Stage 5, business continuity plan testing, tests the business continuity plan document to ensure its currency, viability, and completeness.

Stage 6—Business Continuity Plan Maintenance

Stage 6, business continuity plan maintenance, maintains the business continuity plan in a constant ready state for execution.

Stages 1 through 5 are part of the “Plan Development Project” activities of BCP management. Stage 6 is part of the “Maintain Disaster Readiness” activity of BCP management.
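The six-stage life cycle and its mapping to BCP management activities can be written down compactly. The sketch below is illustrative only: the stage names and the two activity labels come from the text above, while the identifiers and the function name are assumptions made for this example.

```python
from enum import IntEnum


class Stage(IntEnum):
    RISK_MANAGEMENT = 1
    BUSINESS_IMPACT_ANALYSIS = 2
    STRATEGY_DEVELOPMENT = 3
    PLAN_DEVELOPMENT = 4
    PLAN_TESTING = 5
    PLAN_MAINTENANCE = 6


def management_activity(stage: Stage) -> str:
    """Map a life cycle stage to the BCP management activity it belongs to."""
    if stage <= Stage.PLAN_TESTING:
        return "Plan Development Project"
    return "Maintain Disaster Readiness"


for stage in Stage:
    print(stage.value, stage.name, "->", management_activity(stage))
```

Using an ordered enumeration keeps the stage sequence explicit, so that a status report can show which management activity a given stage falls under.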

At the time of a disaster, the business continuity plan becomes the most critical document to guide the organization towards timely and effective disaster recovery. Adequate and proper training of the business continuity team is crucial in developing, maintaining and executing a comprehensive, effective and reliable business continuity plan.

For comprehensive training and certification in business continuity planning, audit, and IT disaster recovery planning, contact BRCCI (www.brcci.org, 1-888-962-7224):

1. 3-day CBRM (Certified Business Resilience Manager): a comprehensive, all-in-one Business Continuity Planning and Management training and certification course designed to teach practical methods to develop, test, and maintain a business continuity plan and establish a business continuity program.

2. 3-day CBRITP (Certified Business Resilience IT Professional): comprehensive training on how to assess, develop, test, and maintain an information technology (IT) Disaster Recovery Plan for recovering IT and telecommunications systems and infrastructure in the event of a disaster or business disruption. The training provides a step-by-step methodology to ensure a reliable and effective IT disaster recovery and continuity plan consistent with the industry’s standards and best practices.

3. 2-day CBRA (Certified Business Resilience Auditor): two days of intensive Business Continuity Audit training that enables students to determine the effectiveness, adequacy, quality and reliability of an organization’s Business Continuity Program. Students will learn an audit methodology to evaluate compliance of Business Continuity and IT Disaster Recovery Programs with the industry’s current best practices and standards, including:

  • ISO 22301: Business Continuity Management Systems – Requirements
  • NFPA 1600: Standard on Disaster/Emergency Management and Business Continuity Programs
  • ITIL v4: Information Technology Infrastructure Library