Getting critical data replicated in near real-time to a cloud-based Disaster Recovery (DR) site is all fun and games. Conversely, creating an environment whereby this data can be rapidly deployed in a functional live state is all games and no fun.
Here, we highlight a few of the problems that you’re guaranteed to encounter.
If your DR systems are all internal to your organization, network concerns may be limited. If employees will be working from an alternate location, a secondary VPN server located at the DR site may be all that’s necessary to regain productivity.
The primary point of contention for non-public deployments is the occasional vendor or business partner who restricts access based upon IP address. This method of access control is a bit dated but surprisingly common. Failover to a DR site designed with different public IP addressing will then incur a mad scramble for vendor contact details, and you’d better hope that your failover occurs on a weekday between 9 a.m. and 5 p.m.
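One way to avoid that scramble is to audit partner allowlists ahead of time. The following is only a minimal sketch, assuming you keep your own record of which source addresses each partner permits; the partner names, the `partner_allowlists` structure, and the addresses (taken from documentation ranges) are all hypothetical.

```python
import ipaddress


def uncovered_dr_addresses(allowlist, dr_addresses):
    """Return the DR public IPs that no entry in one partner's allowlist covers."""
    networks = [ipaddress.ip_network(entry, strict=False) for entry in allowlist]
    missing = []
    for addr in dr_addresses:
        ip = ipaddress.ip_address(addr)
        if not any(ip in net for net in networks):
            missing.append(addr)
    return missing


def partners_to_contact(allowlists, dr_addresses):
    """Map each partner to the DR addresses it would reject after failover."""
    result = {}
    for partner, allowlist in allowlists.items():
        missing = uncovered_dr_addresses(allowlist, dr_addresses)
        if missing:
            result[partner] = missing
    return result


# Hypothetical inventory: what each partner currently permits.
partner_allowlists = {
    "payments-vendor": ["198.51.100.0/28"],
    "logistics-partner": ["198.51.100.0/28", "203.0.113.10/32"],
}
# Hypothetical public address the DR site would use for outbound traffic.
dr_public_ips = ["203.0.113.10"]

if __name__ == "__main__":
    for partner, missing in partners_to_contact(partner_allowlists, dr_public_ips).items():
        print(f"{partner}: ask them to permit {', '.join(missing)}")
```

Running a check like this during DR planning turns the vendor phone calls into a task for an ordinary workday rather than the middle of an outage.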
Any servers requiring inbound connections from the Internet present a problem. There are multiple potential solutions but the right one depends on a number of variables.
The most common approaches include:
- BGP – If you control your own netblock(s) you can advertise them through separate ISPs and weight the backup site routes appropriately. This is usually the best solution from a technical perspective, though some organizations may not have the resources to configure and maintain a BGP implementation.
- DNS – It is possible to give DNS records a time-to-live (TTL) of only a few minutes, so that resolvers cache them only briefly. With this, the DNS administrator can effect hostname changes rather quickly. There are, however, a few significant downsides to the DNS approach. Hostname record maintenance (creating/updating records for each site) and possible zone “flapping” or transience due to the low TTLs are just a couple.
- NAT – In cases where publicly addressed systems cannot be brought back online at the DR site using BGP or DNS, it may be necessary to employ NAT. Keep in mind that it is not necessary for the translated systems to be addressed with RFC1918 “private” IP space. It is perfectly acceptable, in the context of a DR scenario, to keep existing public IPs (from the primary site) in place and configure the network interior accordingly. NAT or PAT can then be used to map the DR site’s public IPs to the original “internal” IPs.
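The NAT option can be planned on paper before any firewall is touched. Below is a rough sketch, not a device configuration: it pairs each retained primary-site address with one address from the DR site’s public block to produce a 1:1 translation table. The function name `build_static_nat_map` and the documentation-range addresses are assumptions made for illustration.

```python
import ipaddress


def build_static_nat_map(dr_public_block, original_addresses):
    """Pair each retained original address with one DR public address (1:1)."""
    # iter() guards against hosts() returning a list for very small blocks.
    pool = iter(ipaddress.ip_network(dr_public_block).hosts())
    mapping = {}
    for original in original_addresses:
        try:
            external = next(pool)
        except StopIteration:
            raise ValueError("DR public block is too small for 1:1 NAT") from None
        # Validate the original address while storing it as a string.
        mapping[str(external)] = str(ipaddress.ip_address(original))
    return mapping


if __name__ == "__main__":
    # Hypothetical values: servers keep their primary-site public IPs on the
    # DR interior, and the DR provider routes a different public block to us.
    retained = ["192.0.2.10", "192.0.2.11"]
    nat_table = build_static_nat_map("203.0.113.0/29", retained)
    for external, internal in nat_table.items():
        print(f"{external} -> {internal}")
```

The resulting table is what you would translate into your firewall’s own static NAT syntax. If the DR block has fewer usable addresses than retained servers, the sketch raises an error, which is the point at which PAT would need to be considered instead.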
Have any software or hardware licenses linked to IP addresses or CPU serial numbers? Be prepared to get bitten if you don’t spend some time thinking this one through.
Getting firewall and administrative policies correct for the DR site is relatively straightforward so long as you remember where all of those policies live. One critical point of concern arises when both the primary and DR sites are using identical IP space. When the primary site recovers, will you have difficulty accessing the DR site’s “shadow” system to recover updated data if you’re accessing it via VPN or private line?
Since DR systems typically exist in a “down” state until a recovery operation is manually initiated, this hurdle is commonly and easily overlooked.
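Because the overlap only bites once both sites are up, it helps to test for it while designing the DR network. The snippet below is a small sketch under the assumption that you can list each site’s subnets as CIDR strings; the subnet values and the function name `overlapping_pairs` are hypothetical.

```python
import ipaddress


def overlapping_pairs(primary_subnets, dr_subnets):
    """Return (primary, dr) subnet pairs whose address ranges overlap."""
    pairs = []
    for p in primary_subnets:
        p_net = ipaddress.ip_network(p)
        for d in dr_subnets:
            d_net = ipaddress.ip_network(d)
            # Only compare networks of the same IP version.
            if p_net.version == d_net.version and p_net.overlaps(d_net):
                pairs.append((p, d))
    return pairs


if __name__ == "__main__":
    # Hypothetical subnets for each site.
    primary = ["192.0.2.0/24", "10.20.0.0/16"]
    dr = ["192.0.2.0/25", "10.30.0.0/16"]
    for p, d in overlapping_pairs(primary, dr):
        print(f"Primary {p} overlaps DR {d}: plan NAT or a separate management path")
```

Any pair reported here is a route that a VPN or private line between the two sites cannot disambiguate on its own, so the access path to the DR “shadow” systems needs to be designed before failback day.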
While architecting your DR solution, don’t forget that the door swings both ways. No one (at this point) expects automated failback procedures and everyone is willing, perhaps too much so, to accept a degree of manual reconstitution of data at the primary site. Don’t succumb to the “we’ll cross that bridge when we reach it” mentality. By the time you’re ready to switch back to your primary facility, your team’s technical abilities, patience and sleep schedules will likely have been pushed to their limits. You won’t want to delay failback, as your problems will assuredly increase as a function of time. At that point, you also don’t want a stressed team resolving complex problems concerning your organization’s critical resources. Procedures for synchronizing databases are best devised without flames licking at your feet.
Take time, while you have it, to think through all scenarios that your organization is likely to encounter in the event of DR site deployment. Consider bringing in an experienced professional services organization to help your team address potential challenges and implement a solid DR strategy. Flexible network design and general system maneuverability make cloud a great foundation for DR sites, but keep in mind that even with all of cloud’s wonders, fundamental Newtonian principles still apply.
Edited by Braden Becker