In-country data backup using Interoute Virtual Data Centre
- Requirements for a multi-zone backup architecture
- Create private and public networks
- Deploy virtual machines
- Connecting to the virtual machines
- Deploy and mount datadisks
- Example backup setup: In-country replication for Linux
One of the strengths of the Interoute Virtual Data Centre (VDC) platform is that in three European countries VDC has two geographically separated compute zones: Frankfurt and Berlin in Germany, Geneva and Zurich in Switzerland, and London and Slough in the United Kingdom. VDC also has two zones in the USA: New York and Los Angeles. Moreover, all of the VDC zones are interconnected by Interoute's high-performance 'backbone' network, which offers exceptionally low latency and high throughput.
This tutorial shows how to create and configure, in Interoute VDC, a multi-zone architecture of private networks and virtual machines that provides the infrastructure for any kind of backup or replication setup. By choosing your deployment zones and following the steps, you can create two-zone architectures within national borders, or distributed architectures that interconnect any or all of the 14 VDC zones around the world (with 3 new zones planned to open in 2016-17).
Interoute's network and data centres in Europe are owned and operated by Interoute Communications, a Europe-based company, and are backed by all the certifications you would expect of a network and cloud platform built for the enterprise. Interoute's network and services are used and trusted by governments, major scientific research organisations, and most of the major global Internet companies.
You need a VDC account with API access turned on, and the two 'API keys' required to configure your API connection.
For setup instructions for the VDC API and the Cloudmonkey API tool, see VDC API: Introduction to the API.
Secure virtual data storage: In Interoute VDC, data is always stored in the precise location of the zone's data centre, and the data will never be moved elsewhere except under your control.
Secure private network between zones: Interoute VDC is built into Interoute's backbone network, which is based on MPLS IPVPN mesh network protocols; the mesh functionality makes it very easy to connect the private networks of different zones to each other.
User-controlled, automated management of networks, virtual machines and storage: VDC has a comprehensive API to control all of its cloud computing elements.
Network performance and (free!) cost: Data transfers between VDC zones are free of charge, so there is no cost penalty for backing up frequently (in other words, for setting a very short Recovery Point Objective).
Note: Data routed over the Interoute backbone network using standard VDC 'Private Direct Connect' networks is not guaranteed to stay on any particular route or within any geographical area. Interoute VPN services (at additional cost) can provide a private network with guaranteed routing and throughput capacity between specific endpoints, and with an SLA-backed performance guarantee. It is also possible to create integrated VPNs which connect to your own corporate VPN, or to VPNs from other providers.
I am going to use the VDC Frankfurt and VDC Berlin zones for this demonstration. I treat Frankfurt as the primary zone, where users will connect to use software applications, and Berlin as the backup zone.
The first step is to create the networks within and between the two zones.
Each zone will have a private network ('Private Direct Connect') for the movement of data, and these can be interconnected very simply via the MPLS IPVPN mesh networking of the Interoute backbone network. Therefore you don't need to deploy routers, gateways, tunnels or any other such network paraphernalia, as is required by many other public cloud providers.
The networks are interconnected because they belong to the same 'Direct Connect Group'. You can read more about how this works, and about the different network types, in VDC API: How to create a network.
In the Frankfurt zone I will create a 'Local with Internet Gateway' network to allow Internet access, as follows.
createLocalNetwork displaytext='Gateway-Frankfurt' zonename='Frankfurt (ESX)' cidr=192.168.113.0/24 gateway=192.168.113.254
My 'backup' virtual machine(s) in the Berlin zone do not require continuous Internet access, so instead of giving them their own Internet gateway I will connect them to the Internet through the Internet-connected VM in the Frankfurt zone when needed (SSH port-forwarding tunnels and reverse tunnels make this easier).
An egress rule is needed for the Local Gateway network so that VMs can communicate outward to the Internet. I am using the simplest, most permissive rule, which is not recommended for a production system:
create egressfirewallrule networkid=UUID protocol=all cidr=0.0.0.0/0
Replace UUID here with the network ID for your Local Gateway network.
Now for the private networks. These 'Private Direct Connect' networks require that I have a 'Direct Connect Group' (DCG) in my VDC account. So first I check that a DCG is present:
listDirectConnectGroups
count = 1
directconnectgroups:
+------+-------+---------+----------+
| sids | id    | name    | networks |
+------+-------+---------+----------+
|      | 35934 | Default |          |
+------+-------+---------+----------+
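If you script these steps instead of typing them, you need to pull values such as the DCG id out of Cloudmonkey's table output. The following Python sketch parses a table in the layout shown above into a list of dictionaries. It assumes that no cell contains a '|' character, and the function names are hypothetical. Switching Cloudmonkey to JSON output, if your version supports it, is a more robust alternative.

```python
def split_row(line):
    """Split one '| a | b |' table row into stripped cell values."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def parse_table(text):
    """Parse a Cloudmonkey-style ASCII table into a list of dicts, one per data row.

    Border lines ('+----+') and non-table lines ('count = 1') are ignored;
    the first '|' row is treated as the header.
    """
    rows = [line for line in text.splitlines() if line.strip().startswith("|")]
    if not rows:
        return []
    header = split_row(rows[0])
    return [dict(zip(header, split_row(row))) for row in rows[1:]]


SAMPLE = """\
count = 1
directconnectgroups:
+------+-------+---------+----------+
| sids | id    | name    | networks |
+------+-------+---------+----------+
|      | 35934 | Default |          |
+------+-------+---------+----------+
"""

# Pick the id of the DCG named 'Default' for use in createPrivateDirectConnect.
dcg_id = next(row["id"] for row in parse_table(SAMPLE) if row["name"] == "Default")
print(dcg_id)  # 35934
```

The same function works for the 'list publicipaddresses' table used later to find the public IP address UUID.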
And I create the networks, using this DCG:
createPrivateDirectConnect zonename='Frankfurt (ESX)' cidr=10.0.101.0/24 gateway=10.0.101.254 displaytext='privateDC-dcg35934-FRA' dcgid=35934
createPrivateDirectConnect zonename='Berlin (ESX)' cidr=10.0.102.0/24 gateway=10.0.102.254 displaytext='privateDC-dcg35934-BER' dcgid=35934
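Before creating networks, it is worth checking the address plan, because the private networks in one Direct Connect Group are routed to each other, and the route added later on the Frankfurt VM covers 10.0.0.0/8. The following Python sketch uses the standard-library ipaddress module to check the three networks in this tutorial. The rules it enforces are my assumptions for a sane plan, not documented VDC constraints: each gateway must be a usable address inside its CIDR, and the private networks must not overlap and must lie within 10.0.0.0/8. The name check_plan is hypothetical.

```python
import ipaddress

# (display name, CIDR, gateway, is it a Private Direct Connect network?)
PLAN = [
    ("Gateway-Frankfurt", "192.168.113.0/24", "192.168.113.254", False),
    ("privateDC-dcg35934-FRA", "10.0.101.0/24", "10.0.101.254", True),
    ("privateDC-dcg35934-BER", "10.0.102.0/24", "10.0.102.254", True),
]

# Assumption: all private networks should fall inside the block routed on the Frankfurt VM.
PRIVATE_SUPERNET = ipaddress.ip_network("10.0.0.0/8")


def check_plan(plan):
    """Return a list of problems found in the plan (empty if it looks consistent)."""
    problems = []
    private = []  # (name, network) of private networks checked so far
    for name, cidr, gw, is_private in plan:
        net = ipaddress.ip_network(cidr)  # raises ValueError if host bits are set
        gateway = ipaddress.ip_address(gw)
        # The gateway must be inside the network and not its network/broadcast address.
        if gateway not in net or gateway in (net.network_address, net.broadcast_address):
            problems.append(f"{name}: gateway {gw} is not a usable address in {cidr}")
        if is_private:
            if not net.subnet_of(PRIVATE_SUPERNET):
                problems.append(f"{name}: {cidr} is outside {PRIVATE_SUPERNET}")
            for other_name, other_net in private:
                if net.overlaps(other_net):
                    problems.append(f"{name} overlaps {other_name}")
            private.append((name, net))
    return problems


print(check_plan(PLAN) or "plan looks consistent")
```

Adding a third zone is then just one more entry in PLAN, for example a 10.0.103.0/24 network.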
One virtual machine will be deployed in each zone, using an Ubuntu 14.04 Certified image template with 1 GByte of RAM and 1 virtual CPU. Certified templates offer improved performance over the standard public Ubuntu templates, and VMs running on Interoute VDC can be covered by Canonical's paid support services for Ubuntu. Note that Ubuntu Certified templates have the default admin user set to 'ubuntu', not root.
In the following API calls, replace the 'networkids' values with your own network UUIDs. For the Frankfurt VM, which is connected to two networks, put the Local Gateway network first in the list so that it becomes the default network for the Ubuntu operating system.
Frankfurt virtual machine:
deploy virtualmachine networkids=6202e7b7-38a5-40a0-8ca6-511e2a16159a,e8110a4a-1c38-49d6-b03d-9b22c7cda659 displayname=ubuntu1404-FRA name=ubuntu1404-FRA zoneid=7144b207-e97e-4e4a-b15d-64a30711e0e7 keypair=keypair-germany templateid=ba6490ea-d8f4-4642-86bb-d9e8412798eb serviceofferingid=4cb92069-e001-4637-8848-76d74f406bb8
Berlin virtual machine:
deploy virtualmachine networkids=d4888f2a-2329-4320-915c-abf409464b91 displayname=ubuntu1404-BER name=ubuntu1404-BER zoneid=fc129b38-d490-4cd9-acf8-838cf7eb168d keypair=keypair-germany templateid=ba6490ea-d8f4-4642-86bb-d9e8412798eb serviceofferingid=4cb92069-e001-4637-8848-76d74f406bb8
I have included the keypair option so that I can connect to the VMs using an SSH keypair, which avoids handling passwords and is much more secure. You will need to upload an SSH public key in advance (here it is named 'keypair-germany') using the registerSSHKeyPair API call.
Note that you can vertically scale the VMs (change to a different configuration of CPU and RAM) at any time.
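The ordering rule for 'networkids' is easy to get wrong when you substitute your own UUIDs. This minimal Python sketch builds the deploy command with the default (Local Gateway) network moved to the front. The helper names are hypothetical, the parameters are exactly those used in the commands above, and the choice of 6202e7b7-... as the Local Gateway network is inferred from its position in the Frankfurt command.

```python
def order_networks(networkids, default_networkid):
    """Return the network IDs with the default (Local Gateway) network first.

    The relative order of the remaining networks is preserved.
    """
    if default_networkid not in networkids:
        raise ValueError("the default network must be one of the VM's networks")
    return [default_networkid] + [n for n in networkids if n != default_networkid]


def deploy_command(name, zoneid, networkids, default_networkid,
                   keypair, templateid, serviceofferingid):
    """Build a Cloudmonkey 'deploy virtualmachine' line with the networks correctly ordered."""
    nets = ",".join(order_networks(networkids, default_networkid))
    return (
        f"deploy virtualmachine networkids={nets} displayname={name} name={name} "
        f"zoneid={zoneid} keypair={keypair} templateid={templateid} "
        f"serviceofferingid={serviceofferingid}"
    )


# Frankfurt VM: the private network is deliberately listed first here;
# deploy_command moves the Local Gateway network to the front.
GATEWAY_NET = "6202e7b7-38a5-40a0-8ca6-511e2a16159a"
PRIVATE_NET = "e8110a4a-1c38-49d6-b03d-9b22c7cda659"
print(deploy_command(
    "ubuntu1404-FRA",
    zoneid="7144b207-e97e-4e4a-b15d-64a30711e0e7",
    networkids=[PRIVATE_NET, GATEWAY_NET],
    default_networkid=GATEWAY_NET,
    keypair="keypair-germany",
    templateid="ba6490ea-d8f4-4642-86bb-d9e8412798eb",
    serviceofferingid="4cb92069-e001-4637-8848-76d74f406bb8",
))
```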
Now that the virtual machines are deployed, I need to create a port-forwarding rule so that the Frankfurt VM is accessible from the Internet for SSH connections. First I look up the UUID of the public IP address, which must be included in the port-forwarding rule.
list publicipaddresses filter=id,ipaddress,zonename
publicipaddress:
+----------------+--------------------------------------+-----------------+
| ipaddress      | id                                   | zonename        |
+----------------+--------------------------------------+-----------------+
| 213.xx.xx.139  | 76839351-3323-4791-9dc5-34845bc5b24c | Frankfurt (ESX) |
+----------------+--------------------------------------+-----------------+
And the port-forwarding rule:
create portforwardingrule protocol=tcp enablefirewall=true ipaddressid=76839351-3323-4791-9dc5-34845bc5b24c privateendport=22 publicendport=52201 privateport=22 publicport=52201 virtualmachineid=096417e2-5eaa-40a3-bf92-f9c92b04ff3c
Note that for Local Gateway networks the firewall is completely closed by default, so you need to create a port-forwarding rule for every port that will be in use, such as port 80 for a webserver. Remember to add the parameter 'enablefirewall=true' to create the matching firewall ingress rule; otherwise you will need an additional API call to create it.
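Because every open port needs its own rule, it can help to generate the commands from a port map. This Python sketch prints one 'create portforwardingrule' line per port pair, using the same parameters as the SSH rule above. The 80-to-80 webserver mapping and the function name are hypothetical examples.

```python
# Public port -> private port on the Frankfurt VM.
# 52201 -> 22 is the SSH rule from this tutorial; 80 -> 80 is a hypothetical webserver rule.
PORTS = {52201: 22, 80: 80}

IP_ADDRESS_ID = "76839351-3323-4791-9dc5-34845bc5b24c"  # from 'list publicipaddresses'
VM_ID = "096417e2-5eaa-40a3-bf92-f9c92b04ff3c"          # the Frankfurt VM


def portforwarding_commands(ports, ipaddressid, vmid, protocol="tcp"):
    """Build one Cloudmonkey 'create portforwardingrule' line per (public, private) port pair."""
    lines = []
    for public, private in sorted(ports.items()):
        for port in (public, private):
            if not 1 <= port <= 65535:
                raise ValueError(f"invalid port number: {port}")
        lines.append(
            f"create portforwardingrule protocol={protocol} enablefirewall=true "
            f"ipaddressid={ipaddressid} privateport={private} privateendport={private} "
            f"publicport={public} publicendport={public} virtualmachineid={vmid}"
        )
    return lines


for line in portforwarding_commands(PORTS, IP_ADDRESS_ID, VM_ID):
    print(line)
```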
After all of the above, I have the following configuration of networks and virtual machines:
― 'Gateway-Frankfurt' (Zone: Frankfurt (ESX), CIDR: 192.168.113.0/24, IP: 213.xxx.xxx.139)
  └ 192.168.113.247: 'ubuntu1404-FRA' (ports: ->)
― 'Network Private Direct Connect Interoute Tutorial 1'|'privateDC-dcg35934-FRA' (Zone: Frankfurt (ESX), CIDR: 10.0.101.0/24)
  └ 10.0.101.111: 'ubuntu1404-FRA'
― 'Network Private Direct Connect Interoute Tutorial 4'|'privateDC-dcg35934-BER' (Zone: Berlin (ESX), CIDR: 10.0.102.0/24)
  └ 10.0.102.109: 'ubuntu1404-BER'
To connect to the Frankfurt VM, I use the 'ssh' command in Linux and macOS (on Windows, use PuTTY):
ssh -i ~/.ssh/id_rsa_germany -p 52201 ubuntu@213.xxx.xxx.139
~/.ssh/id_rsa_germany contains the private key for my SSH keypair. I have xxx'd out the public IP address for reasons of privacy.
One final configuration step is to add a routing rule on the Frankfurt VM. Its default route points to the Local Gateway network, so without an extra rule, traffic for the Berlin private network (10.0.102.0/24) would be sent to the Internet gateway instead of through the private network's gateway.
I login to the Frankfurt VM and input:
sudo ip route add 10.0.0.0/8 via 10.0.101.254
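Set the gateway address (10.0.101.254 above) to the address you specified when you created your private network. Note that a route added with 'ip route add' does not persist across a reboot, so you will need to add it again after restarting the VM, or add it to the VM's network configuration.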
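You can see why the route is needed with a simplified model of the kernel's route lookup, which picks the matching route with the longest prefix. In this Python sketch, the routing table is an assumption reconstructed from this tutorial's networks: a default route via the Local Gateway network's gateway 192.168.113.254, plus the route added above. It ignores route metrics and other details of real Linux routing.

```python
import ipaddress

# Assumed routing table of the Frankfurt VM: (destination CIDR, gateway, or None if on-link).
ROUTES = [
    ("0.0.0.0/0", "192.168.113.254"),   # default route via the Local Gateway network
    ("192.168.113.0/24", None),         # directly attached Local Gateway network
    ("10.0.101.0/24", None),            # directly attached Frankfurt private network
    ("10.0.0.0/8", "10.0.101.254"),     # added with 'ip route add 10.0.0.0/8 via 10.0.101.254'
]


def next_hop(destination, routes=ROUTES):
    """Return the gateway chosen for destination by longest-prefix match ('on-link' if none)."""
    dest = ipaddress.ip_address(destination)
    candidates = [
        (ipaddress.ip_network(cidr), gateway)
        for cidr, gateway in routes
        if dest in ipaddress.ip_network(cidr)
    ]
    if not candidates:
        raise LookupError(f"no route to {destination}")
    _, gateway = max(candidates, key=lambda c: c[0].prefixlen)
    return gateway or "on-link"


without_private_route = [r for r in ROUTES if r[0] != "10.0.0.0/8"]
print(next_hop("10.0.102.109"))                         # 10.0.101.254 (via the private gateway)
print(next_hop("10.0.102.109", without_private_route))  # 192.168.113.254 (the Internet gateway)
print(next_hop("10.0.101.111"))                         # on-link
```

The Berlin VM has only one network, whose gateway is already its default route, so it needs no extra rule in this model.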
To connect to the Berlin VM, I do a second 'ssh' from the Frankfurt VM. For keypair authentication to work, I need to copy the private key file into this VM, so I used 'vi' to create a new file:
vi ~/.ssh/id_rsa_germany
And then I used copy-paste to insert the contents of the private key file. Then I did:
chmod 400 ~/.ssh/id_rsa_germany
which sets the file permissions to 'owner read-only'; if you don't do this, 'ssh' will refuse to use the key file. (Protecting the private key file from unauthorised use is critical: only transfer private keys over encrypted network connections.)
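The check that 'ssh' applies is easy to reproduce: OpenSSH rejects a private key file that grants any permission to group or others. This Python sketch assumes POSIX file permissions, and the function name is hypothetical.

```python
import os
import stat
import tempfile


def key_permissions_ok(path):
    """Return True if the file grants no permissions to group or others.

    This mirrors the OpenSSH rule for private key files (mode & 0o077 must be 0),
    so both 0o400 ('chmod 400') and 0o600 pass.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & 0o077 == 0


# Demonstration on a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    key_path = tmp.name
os.chmod(key_path, 0o664)            # e.g. a new file created under umask 002
print(key_permissions_ok(key_path))  # False: group and others can read it
os.chmod(key_path, 0o400)            # the effect of 'chmod 400'
print(key_permissions_ok(key_path))  # True
os.remove(key_path)
```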
Then I connect to the Berlin VM using its private IP address:
ssh -i ~/.ssh/id_rsa_germany ubuntu@10.0.102.109
Note that Private Direct Connect networks have no firewalls (and therefore no access rules to set up): you simply connect to any remote VM using its 'private' network IP address (sometimes called 'RFC 1918' addressing). You can always switch on Ubuntu's built-in firewall (ufw) if you need a higher level of protection for your virtual machines and data.
Next, a quick latency test with the ping program, which is pre-installed in Ubuntu. The sample output below shows an average round-trip time of 15.7 milliseconds between the Frankfurt and Berlin VDC zones, with very little variation (mdev 0.130 ms).
ping 10.0.102.109
PING 10.0.102.109 (10.0.102.109) 56(84) bytes of data.
64 bytes from 10.0.102.109: icmp_seq=1 ttl=60 time=15.7 ms
64 bytes from 10.0.102.109: icmp_seq=2 ttl=60 time=15.6 ms
64 bytes from 10.0.102.109: icmp_seq=3 ttl=60 time=15.7 ms
64 bytes from 10.0.102.109: icmp_seq=4 ttl=60 time=15.6 ms
64 bytes from 10.0.102.109: icmp_seq=5 ttl=60 time=15.7 ms
64 bytes from 10.0.102.109: icmp_seq=6 ttl=60 time=15.6 ms
--- 10.0.102.109 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5009ms
rtt min/avg/max/mdev = 15.681/15.717/15.779/0.130 ms
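To track this latency over time, for example from a monitoring script, you can extract the summary line from ping's output. This Python sketch assumes the Linux (iputils) output format shown above, and parse_rtt is a hypothetical name.

```python
import re

# Summary line printed by Linux (iputils) ping, e.g.
# "rtt min/avg/max/mdev = 15.681/15.717/15.779/0.130 ms"
RTT_RE = re.compile(
    r"rtt min/avg/max/mdev = "
    r"(?P<min>[\d.]+)/(?P<avg>[\d.]+)/(?P<max>[\d.]+)/(?P<mdev>[\d.]+) ms"
)


def parse_rtt(ping_output):
    """Return the min/avg/max/mdev round-trip times (milliseconds) as floats."""
    match = RTT_RE.search(ping_output)
    if match is None:
        raise ValueError("no rtt summary line found in ping output")
    return {key: float(value) for key, value in match.groupdict().items()}


summary = parse_rtt("rtt min/avg/max/mdev = 15.681/15.717/15.779/0.130 ms")
print(summary["avg"])  # 15.717
```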
To store data in my two-zone backup architecture, I will create a 'datadisk' in each zone and attach it to the virtual machine in that zone. These disks are based on EBS (Elastic Block Storage): you choose their size, and (at a higher unit cost) they can be backed up automatically ('EBS Protected' or 'EBS Mirrored').
Each new disk has to be partitioned, formatted and mounted before the virtual machine can use it. Once it is formatted, the data stored on the disk is persistent, and you can detach the disk from one VM and re-attach it to another VM at any time.
Here is the Cloudmonkey API command to create a new datadisk of size 200 GBytes in the Frankfurt zone:
create volume name=datadisk-ubuntu01-FRA size=200 zoneid=7144b207-e97e-4e4a-b15d-64a30711e0e7 diskofferingid=e3752c57-6742-4e45-b259-c6b225f155a7
And attach it to the virtual machine:
attach volume id=73037c3c-0583-4714-a679-93be20f2f3fe virtualmachineid=096417e2-5eaa-40a3-bf92-f9c92b04ff3c
You will need to substitute your own volume and VM UUIDs. Then repeat these operations to create and attach a datadisk in the Berlin zone.
The disk offering I am using (the diskofferingid above) is 'EBS Protected', which takes an automated snapshot backup of the datadisk every 4 hours and stores the snapshots in the same zone. 'EBS Mirrored' additionally stores snapshot backups remotely in another VDC zone.
After attaching the datadisks, you will need to restart the VMs so that the Ubuntu OS detects the new disk (which will appear initially as '/dev/sdb'). The steps to partition, format and mount the disk are as follows:
$ sudo fdisk /dev/sdb
(fdisk presents a series of prompts, for which I entered: 'n' (new partition), 'p' (primary partition), '1' (partition number), Enter twice to accept the default first and last sectors, and finally 'w' to write the new partition table to the disk.)
$ sudo mkfs -t ext4 /dev/sdb1
$ sudo mkdir /media/datadisk01
$ sudo mount /dev/sdb1 /media/datadisk01
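The mount point name 'datadisk01' is arbitrary; choose whatever name you like. Repeat these steps on the Berlin VM for its datadisk. See Ubuntu: Installing a New Hard Drive for more details about the disk installation process in Ubuntu, including how to mount a disk automatically at boot.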
That completes the VDC infrastructure of networks, virtual machines and EBS disk storage for two VDC zones within Germany. I can use it as the basis for a backup system built with any of the many data backup applications available for Linux.
As a simple demonstration, I am going to install and use the backup program Duplicity. A useful feature of Duplicity is that it can encrypt all data before it leaves the primary machine, so the data is transported and stored more securely.
To install Duplicity:
sudo apt-get install duplicity python-paramiko
To get some data to work with, I downloaded and unpacked the SageMath source code, which comprises about 4.4 GBytes of data in about 96,000 individual files.
I ran duplicity with the following command. Duplicity encrypts by default, which is why it prompts for a GnuPG passphrase:
sudo duplicity --volsize 250 --ssh-options="-oIdentityFile=/home/ubuntu/.ssh/id_rsa_germany" /media/datadisk01/SageMath scp://ubuntu@10.0.102.109//media/datadisk01
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: none
GnuPG passphrase:
Retype passphrase to confirm:
No signatures found, switching to full backup.
--------------[ Backup Statistics ]--------------
StartTime 1461114651.88 (Wed Apr 20 01:10:51 2016)
EndTime 1461115150.03 (Wed Apr 20 01:19:10 2016)
ElapsedTime 498.15 (8 minutes 18.15 seconds)
SourceFiles 95954
SourceFileSize 4735205214 (4.41 GB)
NewFiles 95954
NewFileSize 4735205214 (4.41 GB)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 95954
RawDeltaSize 4685920181 (4.36 GB)
TotalDestinationSizeChange 1352516590 (1.26 GB)
Errors 0
-------------------------------------------------
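The statistics block is easy to post-process, for example to log how long each backup takes. This Python sketch parses the 'Key value' lines shown above and derives two figures: the rate at which source data was processed, and the size of the backup relative to the source. The rate covers reading, compressing, encrypting and transferring together, so it is not a measure of network throughput alone. Duplicity's '(4.41 GB)' annotations match binary units, so the sketch reports MiB/s. The function names are hypothetical.

```python
def parse_stats(text):
    """Map each 'Key value ...' statistics line to a float; other lines are skipped."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].isalpha():
            try:
                stats[parts[0]] = float(parts[1])
            except ValueError:
                pass  # e.g. 'GnuPG passphrase:' or 'Last full backup date: none'
    return stats


def summarize(stats):
    """Processing rate in MiB/s and backup size as a fraction of the source size."""
    mib = 1024 * 1024
    return {
        "rate_mib_per_s": stats["SourceFileSize"] / stats["ElapsedTime"] / mib,
        "stored_fraction": stats["TotalDestinationSizeChange"] / stats["SourceFileSize"],
    }


# Lines copied from the statistics output above.
STATS_SAMPLE = """\
ElapsedTime 498.15 (8 minutes 18.15 seconds)
SourceFiles 95954
SourceFileSize 4735205214 (4.41 GB)
TotalDestinationSizeChange 1352516590 (1.26 GB)
Errors 0
"""

result = summarize(parse_stats(STATS_SAMPLE))
print(f"{result['rate_mib_per_s']:.1f} MiB/s, backup is {result['stored_fraction']:.0%} of source")
# 9.1 MiB/s, backup is 29% of source
```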
The first run of duplicity creates a full backup; subsequent runs are incremental, that is, they only copy the changes made since the previous backup. You can see how this works from the backup files on the Berlin VM:
ls -al -h /media/datadisk01
total 1.4G
-rw-rw-r-- 1 ubuntu ubuntu  596 Apr 20 01:19 duplicity-full.20160420T011027Z.manifest.gpg
-rw-rw-r-- 1 ubuntu ubuntu 251M Apr 20 01:12 duplicity-full.20160420T011027Z.vol1.difftar.gpg
-rw-rw-r-- 1 ubuntu ubuntu 251M Apr 20 01:13 duplicity-full.20160420T011027Z.vol2.difftar.gpg
-rw-rw-r-- 1 ubuntu ubuntu 250M Apr 20 01:14 duplicity-full.20160420T011027Z.vol3.difftar.gpg
-rw-rw-r-- 1 ubuntu ubuntu 250M Apr 20 01:15 duplicity-full.20160420T011027Z.vol4.difftar.gpg
-rw-rw-r-- 1 ubuntu ubuntu 250M Apr 20 01:18 duplicity-full.20160420T011027Z.vol5.difftar.gpg
-rw-rw-r-- 1 ubuntu ubuntu  40M Apr 20 01:19 duplicity-full.20160420T011027Z.vol6.difftar.gpg
-rw-rw-r-- 1 ubuntu ubuntu  64M Apr 20 01:19 duplicity-full-signatures.20160420T011027Z.sigtar.gpg
-rw-rw-r-- 1 ubuntu ubuntu  218 Apr 20 01:23 duplicity-inc.20160420T011027Z.to.20160420T012238Z.manifest.gpg
-rw-rw-r-- 1 ubuntu ubuntu  102 Apr 20 01:23 duplicity-inc.20160420T011027Z.to.20160420T012238Z.vol1.difftar.gpg
-rw-rw-r-- 1 ubuntu ubuntu  228 Apr 20 01:26 duplicity-inc.20160420T012238Z.to.20160420T012528Z.manifest.gpg
-rw-rw-r-- 1 ubuntu ubuntu 2.2K Apr 20 01:26 duplicity-inc.20160420T012238Z.to.20160420T012528Z.vol1.difftar.gpg
-rw-rw-r-- 1 ubuntu ubuntu  102 Apr 20 01:23 duplicity-new-signatures.20160420T011027Z.to.20160420T012238Z.sigtar.gpg
-rw-rw-r-- 1 ubuntu ubuntu  430 Apr 20 01:26 duplicity-new-signatures.20160420T012238Z.to.20160420T012528Z.sigtar.gpg
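The file names encode the backup chain: a full backup at one timestamp, followed by incremental backups, each named with the timestamp it starts from and the timestamp it ends at. This Python sketch reconstructs that chain from a directory listing, assuming the naming pattern shown above. For real restores, use duplicity's own collection-status command, which reports backup chains authoritatively. backup_chain is a hypothetical name.

```python
import re

# Name patterns as they appear in the listing above (an assumption, not a documented format).
FULL_RE = re.compile(r"^duplicity-full\.(\d{8}T\d{6}Z)\.")
INC_RE = re.compile(r"^duplicity-inc\.(\d{8}T\d{6}Z)\.to\.(\d{8}T\d{6}Z)\.")


def backup_chain(filenames):
    """Return [('full', t0), ('inc', t1), ...] starting from the most recent full backup."""
    fulls = set()
    incs = {}  # start timestamp -> end timestamp of each incremental backup
    for name in filenames:
        full = FULL_RE.match(name)
        if full:
            fulls.add(full.group(1))
        inc = INC_RE.match(name)
        if inc:
            incs[inc.group(1)] = inc.group(2)
    if not fulls:
        return []
    current = max(fulls)  # fixed-width timestamps sort chronologically as strings
    chain = [("full", current)]
    # Follow incrementals forward; the length check guards against a malformed cycle.
    while current in incs and len(chain) <= len(incs):
        current = incs[current]
        chain.append(("inc", current))
    return chain


# A subset of the file names from the listing above.
FILES = [
    "duplicity-full.20160420T011027Z.manifest.gpg",
    "duplicity-full.20160420T011027Z.vol1.difftar.gpg",
    "duplicity-full-signatures.20160420T011027Z.sigtar.gpg",
    "duplicity-inc.20160420T011027Z.to.20160420T012238Z.manifest.gpg",
    "duplicity-inc.20160420T011027Z.to.20160420T012238Z.vol1.difftar.gpg",
    "duplicity-inc.20160420T012238Z.to.20160420T012528Z.manifest.gpg",
    "duplicity-inc.20160420T012238Z.to.20160420T012528Z.vol1.difftar.gpg",
    "duplicity-new-signatures.20160420T011027Z.to.20160420T012238Z.sigtar.gpg",
]

for kind, timestamp in backup_chain(FILES):
    print(kind, timestamp)
```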