In-country data backup using Interoute Virtual Data Centre

Introduction
Prerequisites
Requirements for a multi-zone backup architecture
Create private and public networks
Deploy virtual machines
Connecting to the virtual machines
Deploy and mount datadisks
Example backup setup: In-country replication for Linux


Introduction

One of the strengths of the Interoute Virtual Data Centre (VDC) platform is that there are three countries in Europe where VDC has two geographically-separated compute zones: Frankfurt and Berlin in Germany, Geneva and Zurich in Switzerland, and London and Slough in the United Kingdom. VDC also has two zones in the USA: New York and Los Angeles. Moreover, all of the VDC zones are inter-connected by Interoute's high performance 'backbone' network, offering exceptionally low latency and high throughput.

This tutorial shows how to create and configure in Interoute VDC a multi-zone architecture of private networks and virtual machines which provides the infrastructure for any kind of backup or replication setup. Simply by choosing your deployment zones and following the steps, you can create 2-zone architectures within geographical borders, or create distributed architectures which inter-connect any or all of the 14 VDC zones distributed around the world (with 3 new zones planned to open in 2016-17).

Interoute's network and data centres in Europe are owned and operated by Interoute Communications, which is a European-based company, and are backed up by all the certifications you would expect of a network and cloud platform that is built for the enterprise. Interoute's network and services are used and trusted by governments, major scientific research organisations, and most of the major global Internet companies.

Prerequisites

You need a VDC account with API access turned on, and the two 'API keys' required to configure your API connection.

You can follow this tutorial with a VDC free trial account: it only takes a couple of minutes to sign up and get your trial running.

For setup instructions for the VDC API and the Cloudmonkey API tool, see VDC API: Introduction to the API.

Requirements for a multi-zone backup architecture

  • Secure virtual data storage: In Interoute VDC, data is always stored in the precise location of the zone's data centre, and the data will never be moved elsewhere except under your control.

  • Secure private network between zones: Interoute VDC is built into Interoute's backbone network, based on MPLS IPVPN mesh network protocols; the mesh functionality makes it very easy for private networks within zones to be connected together between zones.

  • User-controlled, automated management of networks, virtual machines and storage: VDC has a comprehensive API to control all of its cloud computing elements.

  • Network performance at zero cost: Data transfers between VDC zones are free of charge, so there are no barriers to setting a high frequency of backup (in other words, a very short Recovery Point Objective).

Note: Routing of data over the Interoute backbone network using standard VDC 'Private Direct Connect' networks cannot be guaranteed to stay on any particular route or within geographical areas. Interoute VPN services (at additional cost) can provide a private network with guaranteed routing and throughput capacity between specific endpoints, and with an SLA-backed performance guarantee. It is also possible to create integrated VPNs which connect to your own corporate VPN, or VPNs from other providers.

Create private and public networks

I am going to use the VDC Frankfurt and VDC Berlin zones for this demonstration, treating Frankfurt as the primary zone where users will connect to use software applications, and Berlin as the backup zone.

The first step is to create the networks within and between the two zones.

Each zone will have a private network ('Private Direct Connect') for the movement of data, and these can be interconnected very simply via the MPLS IPVPN mesh networking of the Interoute backbone network. Therefore you don't need to deploy routers, gateways, tunnels or any other such network paraphernalia, as is required by many other public cloud providers.

The network interconnection happens automatically because the networks belong to the same 'Direct Connect Group'. You can read more about how this works, and about the different network types, at VDC API: How to create a network.

In the Frankfurt zone I will create a 'Local with Internet Gateway' network to allow Internet access, as follows.

(local) > createLocalNetwork displaytext='Gateway-Frankfurt' zonename='Frankfurt (ESX)' cidr=192.168.113.0/24 gateway=192.168.113.254

My 'backup' virtual machine(s) in the Berlin zone do not require continuous Internet access, so I will connect them to the Internet via the Internet-connected VM in the Frankfurt zone (this is made easier by setting up SSH port-forwarding tunnels and reverse tunnels).
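For example, here is a minimal sketch (not required for the rest of this tutorial) of giving the Berlin VM on-demand Internet access through the Frankfurt VM using an SSH dynamic (SOCKS) tunnel. It assumes the SSH private key has already been copied to the Berlin VM (as described later), and it uses the VM names and private IP addresses assigned further below:

# Open a SOCKS proxy on localhost:1080 that tunnels via the Frankfurt VM
# (private IP 10.0.101.111) and exits to the Internet through its gateway network
ubuntu1404-BER:~$ ssh -f -N -D 1080 ubuntu@10.0.101.111 -i ~/.ssh/id_rsa_germany

# Example use of the tunnel (curl assumed to be installed on the Berlin VM)
ubuntu1404-BER:~$ curl --socks5-hostname localhost:1080 http://www.ubuntu.com/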

An egress rule needs to be created for the Local Gateway network, so that VMs can communicate outwards to the Internet. I am going to use the simplest, most permissive rule, which is not recommended for a production system:

(local) > create egressfirewallrule networkid=UUID protocol=all cidr=0.0.0.0/0

Replace UUID here with the network ID for your Local Gateway network.
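For a production system you would normally restrict egress to the protocols and ports actually needed. For example (a sketch using the same call as above, to be adapted to your own requirements), allowing only DNS and web traffic outbound:

(local) > create egressfirewallrule networkid=UUID protocol=udp startport=53 endport=53 cidr=0.0.0.0/0
(local) > create egressfirewallrule networkid=UUID protocol=tcp startport=80 endport=80 cidr=0.0.0.0/0
(local) > create egressfirewallrule networkid=UUID protocol=tcp startport=443 endport=443 cidr=0.0.0.0/0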

Now for the private networks. These 'Private Direct Connect' networks require that I have a 'Direct Connect Group' (DCG) in my VDC account. So first I check that a DCG is present:

(local) > listDirectConnectGroups
count = 1
directconnectgroups:
+------+-------+---------+----------+
| sids |   id  |   name  | networks |
+------+-------+---------+----------+
|  []  | 35934 | Default |    []    |
+------+-------+---------+----------+

And I create the networks, using this DCG:

(local) > createPrivateDirectConnect zonename='Frankfurt (ESX)' cidr=10.0.101.0/24 gateway=10.0.101.254 displaytext='privateDC-dcg35934-FRA' dcgid=35934
(local) > createPrivateDirectConnect zonename='Berlin (ESX)' cidr=10.0.102.0/24 gateway=10.0.102.254 displaytext='privateDC-dcg35934-BER' dcgid=35934

Deploy virtual machines

One virtual machine will be deployed in each zone, using an Ubuntu version 14.04 Certified image template, with 1 GByte of RAM and 1 virtual CPU. Certified templates offer improved performance over the standard public Ubuntu templates, and VMs running on Interoute VDC can be covered by Canonical's paid support services for Ubuntu. (And note that Ubuntu Certified templates have the default admin user set to 'ubuntu' and not root.)

In the following API calls, you will need to replace the 'networkids' values with your own UUIDs. Note that for the Frankfurt VM, which has two connected networks, the Local Gateway network must come first in the list so that it becomes the default network for the Ubuntu operating system.
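If you need to look up the other UUIDs (zone, template and service offering), the standard list calls will show them; for example (a sketch, with illustrative filter fields):

(local) > list zones filter=id,name
(local) > list serviceofferings filter=id,name,cpunumber,memory
(local) > list templates templatefilter=executable filter=id,name,zonename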

Frankfurt virtual machine:

(local) > deploy virtualmachine networkids=6202e7b7-38a5-40a0-8ca6-511e2a16159a,e8110a4a-1c38-49d6-b03d-9b22c7cda659 displayname=ubuntu1404-FRA name=ubuntu1404-FRA zoneid=7144b207-e97e-4e4a-b15d-64a30711e0e7 keypair=keypair-germany templateid=ba6490ea-d8f4-4642-86bb-d9e8412798eb serviceofferingid=4cb92069-e001-4637-8848-76d74f406bb8

Berlin virtual machine:

(local) > deploy virtualmachine networkids=d4888f2a-2329-4320-915c-abf409464b91 displayname=ubuntu1404-BER name=ubuntu1404-BER zoneid=fc129b38-d490-4cd9-acf8-838cf7eb168d keypair=keypair-germany templateid=ba6490ea-d8f4-4642-86bb-d9e8412798eb serviceofferingid=4cb92069-e001-4637-8848-76d74f406bb8

I have included the keypair option so that I will be able to connect to the VMs using an SSH keypair, which avoids passwords and is much more secure. You will need to upload an SSH public key in advance (here it is named 'keypair-germany') using the registerSSHKeyPair API call.
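For example (a sketch; paste in the contents of your own public key file, typically ~/.ssh/id_rsa_germany.pub):

(local) > register sshkeypair name=keypair-germany publickey="ssh-rsa AAAAB3... user@workstation"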

Note that you can 'vertical scale' the VMs to a different configuration of CPU and RAM at any time.
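For example, using the standard scaleVirtualMachine call (a sketch with placeholder UUIDs; depending on the template, the VM may need to be stopped before scaling):

(local) > scale virtualmachine id=VM-UUID serviceofferingid=NEW-SERVICEOFFERING-UUID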

Now that the virtual machines are deployed, I need to create a port-forwarding rule so that the Frankfurt VM will be accessible from the Internet for SSH connection. First I need to look up the UUID of the public IP address, which needs to be included in the port-forwarding rule.

(local) > list publicipaddresses filter=id,ipaddress,zonename
publicipaddress:
+----------------+--------------------------------------+-------------------+
|   ipaddress    |                  id                  |      zonename     |
+----------------+--------------------------------------+-------------------+
| 213.xx.xx.139  | 76839351-3323-4791-9dc5-34845bc5b24c |  Frankfurt (ESX)  |
+----------------+--------------------------------------+-------------------+

And the port-forwarding rule:

(local) > create portforwardingrule protocol=tcp enablefirewall=true ipaddressid=76839351-3323-4791-9dc5-34845bc5b24c privateendport=22 publicendport=52201 privateport=22 publicport=52201 virtualmachineid=096417e2-5eaa-40a3-bf92-f9c92b04ff3c

Note that for Local Gateway networks, the firewall is completely closed by default; you need to create port-forwarding rules for every port that will be in use, such as port 80 for a webserver. Remember to add the parameter 'enablefirewall=true' to create the firewall ingress rule, otherwise it will require an additional API call.
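For example, to open port 80 for a webserver running on the Frankfurt VM (same public IP address and VM UUIDs as above):

(local) > create portforwardingrule protocol=tcp enablefirewall=true ipaddressid=76839351-3323-4791-9dc5-34845bc5b24c privateport=80 privateendport=80 publicport=80 publicendport=80 virtualmachineid=096417e2-5eaa-40a3-bf92-f9c92b04ff3c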

After all of the above, I have the following configuration of networks and virtual machines:

― 'Gateway-Frankfurt' (Zone: Frankfurt (ESX), CIDR: 192.168.113.0/24, IP: 213.xxx.xxx.139)
   └ 192.168.113.247: 'ubuntu1404-FRA'  (ports: [52201]->[22]) 
 
― 'Network Private Direct Connect Interoute Tutorial 1'|'privateDC-dcg35934-FRA' (Zone: Frankfurt (ESX), CIDR: 10.0.101.0/24)
   └ 10.0.101.111: 'ubuntu1404-FRA'  

― 'Network Private Direct Connect Interoute Tutorial 4'|'privateDC-dcg35934-BER' (Zone: Berlin (ESX), CIDR: 10.0.102.0/24)
   └ 10.0.102.109: 'ubuntu1404-BER' 

Connecting to the virtual machines

I use the 'ssh' command on Linux or macOS, or PuTTY on Windows:

$ ssh -p 52201 ubuntu@213.xxx.xxx.139 -i ~/.ssh/id_rsa_germany

The file ~/.ssh/id_rsa_germany contains the private key for my SSH keypair. I have xxx'd out the public IP address for reasons of privacy.

One final configuration step is to set a routing rule on the Frankfurt VM so that the networks function correctly, by routing private network traffic through the private network's gateway.

I log in to the Frankfurt VM and enter:

ubuntu1404-FRA:~$ sudo ip route add 10.0.0.0/8 via 10.0.101.254

You will need to set the gateway address (above, 10.0.101.254) according to the actual address that you specified when you set up your network.
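Note that a route added with 'ip route add' does not survive a reboot. One way to make it persistent on Ubuntu 14.04 (a sketch, assuming the Private Direct Connect interface is eth1 and is configured by DHCP) is to add a 'post-up' line to /etc/network/interfaces:

# /etc/network/interfaces (fragment) -- eth1 assumed to be the Private Direct Connect NIC
auto eth1
iface eth1 inet dhcp
    post-up ip route add 10.0.0.0/8 via 10.0.101.254 dev eth1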

To connect to the Berlin VM, I will do a second 'ssh' from the Frankfurt VM. In order for the keypair authentication to work, I need to copy the private key file onto this VM. So I use 'vi' to create a new file:

ubuntu1404-FRA:~$ vi ~/.ssh/id_rsa_germany

I then paste in the contents of the private key file, and run:

ubuntu1404-FRA:~$ chmod 400 ~/.ssh/id_rsa_germany

which sets the file permissions to 'owner read-only'; if you don't do this the 'ssh' program will not accept the key file. (It goes without saying that protecting the private key file from unauthorised use is critical; you should only communicate private keys through encrypted network connections.)
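As an alternative to copy-paste, you could copy the key file from your local machine over the existing public SSH connection, and then apply the same 'chmod 400'; for example:

$ scp -P 52201 -i ~/.ssh/id_rsa_germany ~/.ssh/id_rsa_germany ubuntu@213.xxx.xxx.139:.ssh/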

Now I can connect from the Frankfurt VM to the Berlin VM over the private network:

ubuntu1404-FRA:~$ ssh ubuntu@10.0.102.109 -i ~/.ssh/id_rsa_germany

Note that for Private Direct Connect networks, there are no firewalls (therefore no access rules to set up), and you simply connect to any remote VM using its 'private' network IP address (which is sometimes called 'RFC 1918' network addressing). You can always switch on Ubuntu's native firewall if you need a higher level of protection for your virtual machines and data.
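For example, a minimal ufw setup on the Berlin VM that only accepts SSH from the Frankfurt private network (a sketch; adjust the source CIDR and ports to your own setup):

ubuntu1404-BER:~$ sudo ufw allow from 10.0.101.0/24 to any port 22 proto tcp
ubuntu1404-BER:~$ sudo ufw enable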

I am going to do a quick latency test with the ping program, which is pre-installed in Ubuntu. You can see from the sample output below that the round-trip latency between the Frankfurt and Berlin VDC zones is about 15.7 milliseconds, and it is very consistent.

ubuntu1404-FRA:~$ ping 10.0.102.109
PING 10.0.102.109 (10.0.102.109) 56(84) bytes of data.
64 bytes from 10.0.102.109: icmp_seq=1 ttl=60 time=15.7 ms
64 bytes from 10.0.102.109: icmp_seq=2 ttl=60 time=15.6 ms
64 bytes from 10.0.102.109: icmp_seq=3 ttl=60 time=15.7 ms
64 bytes from 10.0.102.109: icmp_seq=4 ttl=60 time=15.6 ms
64 bytes from 10.0.102.109: icmp_seq=5 ttl=60 time=15.7 ms
64 bytes from 10.0.102.109: icmp_seq=6 ttl=60 time=15.6 ms
--- 10.0.102.109 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5009ms
rtt min/avg/max/mdev = 15.681/15.717/15.779/0.130 ms

Deploy and mount datadisks

To store data in my two-zone backup architecture, I will create a 'datadisk' in each zone, then attach each disk to the virtual machine in that zone. These disks are based on EBS (elastic block storage); they can be configured with a user-specified size and (for a higher unit cost) can be automatically backed up ('EBS Protected' or 'EBS Mirrored').
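You can look up the UUIDs of the available disk offerings with a list call; for example (a sketch, with illustrative filter fields):

(local) > list diskofferings filter=id,name,disksize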

Each new disk will have to be partitioned, formatted and mounted in order to be usable by the virtual machine. Once it is formatted, the data stored on the disk is persistent, and you can detach the disk from a VM and re-attach it to any other VM at any time.
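For example, moving a disk between VMs uses the standard detach and attach calls (a sketch with placeholder UUIDs):

(local) > detach volume id=VOLUME-UUID
(local) > attach volume id=VOLUME-UUID virtualmachineid=OTHER-VM-UUID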

Here is the Cloudmonkey API command to create a new datadisk of size 200 GBytes in the Frankfurt zone:

(local) > create volume name=datadisk-ubuntu01-FRA size=200 zoneid=7144b207-e97e-4e4a-b15d-64a30711e0e7 diskofferingid=e3752c57-6742-4e45-b259-c6b225f155a7

And attach it to the virtual machine:

(local) > attach volume id=73037c3c-0583-4714-a679-93be20f2f3fe virtualmachineid=096417e2-5eaa-40a3-bf92-f9c92b04ff3c

You will need to substitute the UUIDs of your own volumes and VMs. Then repeat the same operations for a datadisk in the Berlin zone.

The 'diskofferingid' in the call above selects the 'EBS Protected' offering, which features automated snapshot backup (every 4 hours) of the datadisk, with the snapshots stored in the same zone ('EBS Mirrored' adds remote snapshot backup into another VDC zone).

After attaching the datadisks, you will need to restart the VMs so that the Ubuntu OS detects the new disk (which will appear initially as '/dev/sdb'). The steps to partition, format and mount the disk are as follows:

ubuntu1404-FRA:~$ sudo fdisk /dev/sdb

(This command presents a series of prompts, for which I entered the values: 'n', 'p', '1', press enter for the default options for the sectors, then finally 'w' to write this configuration to the disk.)

ubuntu1404-FRA:~$ sudo mkfs -t ext4 /dev/sdb1
ubuntu1404-FRA:~$ sudo mkdir /media/datadisk01
ubuntu1404-FRA:~$ sudo mount /dev/sdb1 /media/datadisk01

The mount point name 'datadisk01' is arbitrary; choose whatever suits you. See Ubuntu: Installing a New Hard Drive for more details about the disk installation process in Ubuntu.
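To have the datadisk mounted automatically at boot, you can add it to /etc/fstab; a sketch, using the filesystem UUID reported by 'blkid' (more robust than the /dev/sdb1 device name, which can change):

ubuntu1404-FRA:~$ sudo blkid /dev/sdb1
ubuntu1404-FRA:~$ echo 'UUID=<uuid-reported-by-blkid>  /media/datadisk01  ext4  defaults,nofail  0  2' | sudo tee -a /etc/fstab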

Example backup setup: In-country replication for Linux

That completes the VDC infrastructure of networks, virtual machines and EBS disk storage, for two VDC zones within Germany. I can use this as the basis for a backup system using one of the many different data backup applications in Linux.

As a simple demonstration I am going to install and use the backup program Duplicity. This has the nice feature that all data can be encrypted before leaving the primary machine, and thus the data is transported and stored more securely.

To install Duplicity:

ubuntu@ubuntu1404-FRA:~$ sudo apt-get install duplicity python-paramiko

To get some data to work with, I downloaded and expanded the source image for SageMath, which comprises about 4.4 GBytes of data and 96,000 individual files.

I ran duplicity with this command, which applies encryption by default:

ubuntu@ubuntu1404-FRA:~$ sudo duplicity --volsize 250 --ssh-options="-oIdentityFile=/home/ubuntu/.ssh/id_rsa_germany" /media/datadisk01/SageMath scp://ubuntu@10.0.102.109//media/datadisk01
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: none
GnuPG passphrase: 
Retype passphrase to confirm: 
No signatures found, switching to full backup.
--------------[ Backup Statistics ]--------------
StartTime 1461114651.88 (Wed Apr 20 01:10:51 2016)
EndTime 1461115150.03 (Wed Apr 20 01:19:10 2016)
ElapsedTime 498.15 (8 minutes 18.15 seconds)
SourceFiles 95954
SourceFileSize 4735205214 (4.41 GB)
NewFiles 95954
NewFileSize 4735205214 (4.41 GB)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 95954
RawDeltaSize 4685920181 (4.36 GB)
TotalDestinationSizeChange 1352516590 (1.26 GB)
Errors 0
-------------------------------------------------

The first run of duplicity creates a full backup, while repeated runs will be incremental, that is, they only copy changes that have taken place since the last backup. You can see how this works from the backup files on the Berlin VM:

ubuntu@ubuntu1404-BER:~$ ls -al -h /media/datadisk01
total 1.4G
-rw-rw-r-- 1 ubuntu ubuntu  596 Apr 20 01:19 duplicity-full.20160420T011027Z.manifest.gpg
-rw-rw-r-- 1 ubuntu ubuntu 251M Apr 20 01:12 duplicity-full.20160420T011027Z.vol1.difftar.gpg
-rw-rw-r-- 1 ubuntu ubuntu 251M Apr 20 01:13 duplicity-full.20160420T011027Z.vol2.difftar.gpg
-rw-rw-r-- 1 ubuntu ubuntu 250M Apr 20 01:14 duplicity-full.20160420T011027Z.vol3.difftar.gpg
-rw-rw-r-- 1 ubuntu ubuntu 250M Apr 20 01:15 duplicity-full.20160420T011027Z.vol4.difftar.gpg
-rw-rw-r-- 1 ubuntu ubuntu 250M Apr 20 01:18 duplicity-full.20160420T011027Z.vol5.difftar.gpg
-rw-rw-r-- 1 ubuntu ubuntu  40M Apr 20 01:19 duplicity-full.20160420T011027Z.vol6.difftar.gpg
-rw-rw-r-- 1 ubuntu ubuntu  64M Apr 20 01:19 duplicity-full-signatures.20160420T011027Z.sigtar.gpg
-rw-rw-r-- 1 ubuntu ubuntu  218 Apr 20 01:23 duplicity-inc.20160420T011027Z.to.20160420T012238Z.manifest.gpg
-rw-rw-r-- 1 ubuntu ubuntu  102 Apr 20 01:23 duplicity-inc.20160420T011027Z.to.20160420T012238Z.vol1.difftar.gpg
-rw-rw-r-- 1 ubuntu ubuntu  228 Apr 20 01:26 duplicity-inc.20160420T012238Z.to.20160420T012528Z.manifest.gpg
-rw-rw-r-- 1 ubuntu ubuntu 2.2K Apr 20 01:26 duplicity-inc.20160420T012238Z.to.20160420T012528Z.vol1.difftar.gpg
-rw-rw-r-- 1 ubuntu ubuntu  102 Apr 20 01:23 duplicity-new-signatures.20160420T011027Z.to.20160420T012238Z.sigtar.gpg
-rw-rw-r-- 1 ubuntu ubuntu  430 Apr 20 01:26 duplicity-new-signatures.20160420T012238Z.to.20160420T012528Z.sigtar.gpg
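To complete the picture, here is a hedged sketch of restoring from the Berlin copy, and of scheduling the backup to run automatically. Duplicity will prompt for the GnuPG passphrase when run interactively, and reads it from the PASSPHRASE environment variable when run non-interactively:

# Restore the latest backup from the Berlin VM into a new directory on the Frankfurt VM
ubuntu@ubuntu1404-FRA:~$ sudo duplicity restore --ssh-options="-oIdentityFile=/home/ubuntu/.ssh/id_rsa_germany" scp://ubuntu@10.0.102.109//media/datadisk01 /media/datadisk01/SageMath-restore

# Example root crontab entry: run an incremental backup every hour
0 * * * * PASSPHRASE='your-gpg-passphrase' duplicity --volsize 250 --ssh-options="-oIdentityFile=/home/ubuntu/.ssh/id_rsa_germany" /media/datadisk01/SageMath scp://ubuntu@10.0.102.109//media/datadisk01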
