Tuesday, May 27, 2008

Disaster recovery planning: The need

Having been around the network a while, I have to admit that I have been a party to some disaster recovery efforts. Thankfully, most of them have been pretty minor. Some were averted thanks to prior planning, and some were recoverable with some ingenious use of technology, such as it was at the time. It was also not as though we could not see them coming.

In some of my past positions I have done some strange things to get the information we needed in order to plan for a disaster scenario. The good news was that we had forward-thinking management at the time who saw the need to plan ahead. The oddest thing I ever did was buy a power company lineman coffee in exchange for letting me see his site planning books long enough to copy the relevant pages. It took a while, but we got what we needed to prove not only that we were single-threaded at our major data center but also that an alternate feed was available, so we could split our power feeds with little effort on the part of the utility. The comment from the field engineering manager was "Boy, you guys sure know a lot about our physical plant!" That "knowledge" cost me a cheap cup of coffee and came in handy on the day that a 150 kV line went up in smoke without explanation and we had no outage.

The other scenario was the recognition that we were single-threaded in our main telecom feeds, which ran along a well-travelled road up north to the first "PoP" of a national network. I do not take credit for the discovery, but when the ultimate disaster scenario came true one day, we were all standing in the data center looking at each other, wondering what we could do about it. We were told that we were hours or longer away from having service restored to one of our primary service systems. To add insult to injury, our "dial backup" service turned out to follow the same path. I will take credit for using my calling card from an alternate provider to establish 14 alternate paths out of our data center and up to a backup site a thousand miles away. It turned out we were down on our primary service for over 13 hours.

If the message has not gotten through already: every person and every organization needs to think about what they would do in the event of a disaster of some kind. More on this topic to follow, based on current events.
