Now in the MX: high availability for datacenters

Avoid downtime and client disruption if a critical datacenter goes offline.

Datacenters (DCs) are used the world over by companies looking to reap the benefits—cost, management, and computing—made possible by centralizing IT resources. If you’re an organization with multiple branch sites connected to datacenters, ensuring these branches can still access resources if a datacenter goes offline is crucial.

 

We’re thrilled to announce that new functionality in our Cisco Meraki MX security appliances will enable immediate, automatic failover for branches tunneled to datacenters via VPN. The MX supports DC-to-DC VPN failover for both mesh and hub-and-spoke topologies.

 

Hub-and-spoke failover

 

When configuring a Meraki MX for hub-and-spoke datacenter failover, the network typically resembles the image below: a select number of branch sites (“spokes”) are tunneled back to an individual datacenter (the “hub”). Other branch sites may be tunneled to a separate datacenter (e.g., DC2). DC1 and DC2 are each configured as the failover datacenter for the other’s branches.

 

A key aspect of this topology is that branch sites cannot communicate directly with each other; all traffic routes through the designated hub first. So, for Branch A to speak to Branch B in the diagram below, traffic routes from Branch A → DC1 → Branch B. For Branch A to speak to Branch E, traffic routes from Branch A → DC1 → DC2 → Branch E.

mx-architecture-HnS

Hub-and-spoke DC failover using the Meraki MX. MXs deployed in Branches A-C tunnel to DC1. In the event DC1 becomes unreachable, these branches will immediately fail over to DC2.
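
To make the routing rule concrete, here is a minimal Python sketch of the path traffic takes between two spokes when both datacenters are online. It is only an illustration of the behavior described above, not MX code: the `PRIMARY_HUB` table and the `branch_to_branch_path` function are hypothetical names, and the assignment of Branches D and F to DC2 is assumed from the example topology.

```python
# Illustrative model only: which hub each branch tunnels to by default.
# Branches A-C home to DC1; Branches D-F are assumed to home to DC2.
PRIMARY_HUB = {
    "Branch A": "DC1", "Branch B": "DC1", "Branch C": "DC1",
    "Branch D": "DC2", "Branch E": "DC2", "Branch F": "DC2",
}


def branch_to_branch_path(src, dst, primary_hub=PRIMARY_HUB):
    """Return the list of hops between two spokes when all hubs are online.

    Spokes never talk to each other directly, so traffic always enters
    the source branch's hub first and crosses to the destination branch's
    hub only if the two branches are homed to different datacenters.
    """
    src_hub = primary_hub[src]
    dst_hub = primary_hub[dst]
    if src_hub == dst_hub:
        return [src, src_hub, dst]
    return [src, src_hub, dst_hub, dst]


print(" -> ".join(branch_to_branch_path("Branch A", "Branch B")))
print(" -> ".join(branch_to_branch_path("Branch A", "Branch E")))
```

The first call goes through DC1 only, while the second crosses both hubs, matching the two examples in the text.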

 

MXs deployed in hub-and-spoke DC-to-DC failover can be configured either as VPN concentrators (shown above) or in NAT mode; NAT mode is typically selected when the MX is to be the default gateway. The number of supported branch sites and datacenters scales up to the maximum number of VPN peers allowed by the MX model deployed in the hub (check our MX sizing guide for specifics).

 

Some common hub-and-spoke datacenter failover scenarios (assume the priority order for Branches A-C is DC1, followed by DC2; the concentrator priority for Branches D-F is DC2, followed by DC1):

 

  • If DC1 goes down completely: all connected branches (Branches A-C) will fail over to DC2
  • If Branch A cannot reach DC1 but Branch B can: Branch A will fail over to DC2 while Branch B will continue to send traffic to DC1
  • If DC1 is online, and Branch A wishes to access resources in DC2 (on a subnet not shared between DC1 and DC2): this is possible; the traffic path will be Branch A → DC2
  • If DC1 is online, and Branch A wishes to access resources in DC2 (on a subnet shared between DC1 and DC2): this is not possible; because Branch A has selected DC1 as its priority, all shared subnets will only be accessible via DC1
  • If DC1 is unavailable, and Branch A wishes to access resources in DC2 (on a subnet shared between DC1 and DC2): this is possible, since Branch A will automatically fail over to DC2
  • If Branch A wishes to communicate with Branch E (both DCs are online): this is possible; the traffic path will be Branch A → DC1 → DC2 → Branch E
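
The scenarios above all follow from one rule: a branch uses the highest-priority hub that it can reach and that advertises the destination subnet. The sketch below models that rule in Python under stated assumptions. The function names, the `ADVERTISED_BY` table, and the subnet values are made up for illustration and do not come from the MX or the dashboard.

```python
def select_hub(priorities, reachable):
    """Return the first hub in a branch's priority list that it can reach."""
    for hub in priorities:
        if hub in reachable:
            return hub
    return None


def hub_for_subnet(subnet, advertised_by, priorities, reachable):
    """Return the hub a branch would use to reach `subnet`, or None.

    Only hubs that advertise the subnet and are currently reachable are
    candidates; the branch's priority order breaks ties.
    """
    owners = advertised_by.get(subnet, set())
    for hub in priorities:
        if hub in owners and hub in reachable:
            return hub
    return None


# Hypothetical subnets: one shared by both DCs, one that exists only in DC2.
ADVERTISED_BY = {
    "10.0.0.0/24": {"DC1", "DC2"},  # shared subnet (assumed value)
    "10.2.0.0/24": {"DC2"},         # DC2-only subnet (assumed value)
}
BRANCH_A_PRIORITIES = ["DC1", "DC2"]

both_up = {"DC1", "DC2"}
dc1_down = {"DC2"}

print(select_hub(BRANCH_A_PRIORITIES, dc1_down))
print(hub_for_subnet("10.2.0.0/24", ADVERTISED_BY, BRANCH_A_PRIORITIES, both_up))
print(hub_for_subnet("10.0.0.0/24", ADVERTISED_BY, BRANCH_A_PRIORITIES, both_up))
print(hub_for_subnet("10.0.0.0/24", ADVERTISED_BY, BRANCH_A_PRIORITIES, dc1_down))
```

Because reachability is passed in per branch, the same functions cover the case where Branch A loses DC1 while Branch B does not: each branch is simply evaluated with its own `reachable` set.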

Hub priorities can be set individually for each branch network. To configure this, navigate to Configure > Site-to-site VPN in the Meraki dashboard. Enable VPN, select a hub-and-spoke topology, and then create a list of priority hubs:
 

HnS DC-DC

Configuring concentrator priorities in hub-and-spoke failover mode.
 
 

Mesh failover

 

When configuring a Meraki MX for mesh DC-to-DC failover, typically the network resembles the image below: branch sites have redundant communication pathways between them. So, for Branch A to speak to Branch B in the diagram below, traffic routes directly from Branch A → Branch B.

mx-architecture-Mesh

Mesh DC failover using the Meraki MX. MXs deployed in Branches A-D all tunnel to DC1. In the event DC1 becomes unreachable, all branches will immediately fail over to DC2.

 

MXs deployed in mesh DC-to-DC failover have some limitations when compared to a hub-and-spoke model. Specifically, datacenter MXs must be configured in VPN concentrator mode, and all branch sites share the same organization-wide concentrator priority list—so by default, all meshed branches will tunnel through the same datacenter, only rerouting to a secondary in the event of failover.

 

As with the hub-and-spoke failover model, MXs deployed in mesh failover can scale to as many locations as the number of VPN peers supported by the deployed MX model.

 

Here’s how mesh DC-to-DC failover works (assume DC1 is configured as the priority):

  • If there’s a shared subnet between online datacenters: DC1 will receive all traffic destined for the shared subnet, since it is the designated primary
  • If DC1 goes down completely: all connected branches will reroute to DC2 for the duration of DC1’s outage
  • If Branch A cannot reach DC1 but Branch B can: Branch A will send traffic destined for any shared subnets to DC2 while Branch B will continue to send traffic to DC1. Traffic between Branch A and Branch B is unaffected
  • If Branch A wishes to access a subnet shared between DC1 and DC2 while both are online: Branch A can only access the shared subnet via DC1 (as determined by the concentrator priority list)
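
The mesh behavior above can be sketched the same way. In this illustrative Python model, every branch shares one organization-wide priority list, branch-to-branch traffic goes direct, and only traffic for a shared datacenter subnet depends on which DC the branch can reach. `BRANCHES`, `ORG_PRIORITIES`, `mesh_next_hop`, and the `"shared-subnet"` label are hypothetical names chosen for the example.

```python
# Illustrative model only: the four meshed branches from the diagram and
# a single concentrator priority list shared by the whole organization.
BRANCHES = {"Branch A", "Branch B", "Branch C", "Branch D"}
ORG_PRIORITIES = ["DC1", "DC2"]


def mesh_next_hop(destination, reachable_dcs):
    """Return the next hop from a meshed branch toward `destination`.

    `destination` is either a peer branch name or the placeholder string
    "shared-subnet" for a subnet advertised by both datacenters.
    """
    if destination in BRANCHES:
        # Meshed branches tunnel to each other directly, so a datacenter
        # outage does not change branch-to-branch paths.
        return destination
    for dc in ORG_PRIORITIES:
        if dc in reachable_dcs:
            return dc
    return None


# Branch B can reach both DCs; Branch A has lost its path to DC1.
print(mesh_next_hop("shared-subnet", {"DC1", "DC2"}))
print(mesh_next_hop("shared-subnet", {"DC2"}))
print(mesh_next_hop("Branch B", {"DC2"}))
```

Unlike the hub-and-spoke sketch, there is no per-branch priority argument here, which reflects the limitation noted above: the order is set once for the whole organization.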

 

To configure mesh DC-to-DC failover, navigate in the Meraki dashboard to Configure > Site-to-site VPN. As with hub-and-spoke DC failover, you must first enable VPN and then select a mesh topology. Organization-wide concentrator priorities can be configured further down on the same page.
 
Mesh DC-DC config

Mesh DC failover relies on Organization-wide concentrator priorities to determine the order in which nodes tunnel to particular hubs.
 
 
If your organization relies on datacenters for critical data storage, computing, or management, having high availability between these hubs in the event of failure or catastrophe is necessary for risk planning. The Meraki MX now offers DC-to-DC VPN failover in two topologies to ensure smooth access to datacenter resources and avoid costly downtime and disruption—allowing IT admins to sleep easier at night.