Building a SaaS application in multiple zones of Interoute Virtual Data Centre

Introduction
Prerequisites
Design overview
Create private and public networks
Deploy virtual machines
Connecting to the virtual machines
Testing the private network latency and throughput
Deploy and mount datadisks
Install and configure a database cluster (MariaDB and Galera)
Install the rest of the LAMP stack: Apache and PHP
Install a SaaS application: Piwigo

 

Introduction

This tutorial shows how to build a photo-sharing web app running in two VDC zones, VDC Frankfurt and VDC Paris, with a MariaDB-Galera clustered database running in active-active synchronisation mode. This setup demonstrates the real, achievable network latency and throughput between Frankfurt and Paris, and shows how quickly (large) volumes of data can be synchronised. Because there are no charges for data transfers in Interoute VDC, it is a very cost-effective cloud platform for distributed computing applications.

Prerequisites

You need a VDC account with API access turned on, and the two 'API keys' required to configure your API connection.

You can follow this tutorial with a VDC free trial account: it only takes a couple of minutes to sign up and get your trial running.

For setup instructions for the VDC API and the Cloudmonkey API tool, see VDC API: Introduction to the API.
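As a quick reminder of the setup covered in that guide, a minimal Cloudmonkey configuration looks roughly like the following; the API endpoint URL and the two keys come from your VDC account settings (the values shown here are placeholders):

$ cloudmonkey set url https://<your-VDC-API-endpoint>
$ cloudmonkey set apikey <your-api-key>
$ cloudmonkey set secretkey <your-secret-key>
$ cloudmonkey sync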

Design overview

The goal of this tutorial is to create a 'Software as a Service' type application in two zones of Interoute VDC, Frankfurt and Paris. In each zone there will be a virtual machine (Ubuntu 14.04 Linux) running a MariaDB database server, an Apache web server with PHP libraries, and a 'LAMP stack' photo-sharing application, Piwigo. There are many applications designed to run on a LAMP stack base, and these should drop in exactly where Piwigo is used in this tutorial. The last element of the design is the Galera database clustering system, which is used to create a live synchronisation between the two MariaDB database servers (sometimes called an 'active-active', or multiple master, setup).

(Some LAMP applications store configuration and/or content data in the local file system, outside of the database. Some or all of this configuration and data will need to be shared across the multiple instances of the application, and Galera clustering will not handle it, so additional file synchronisation would have to be implemented. I am going to ignore that additional complexity for this tutorial.)

This design could be used as a distributed application where both virtual machines are in use simultaneously, the idea being to give users more responsiveness by locating the data and compute closer to them. Alternatively, it could be used as a 'failover' disaster recovery setup where one virtual machine is live (say Frankfurt) and its data is constantly copied to the database at the secondary site (Paris), where the webserver is switched off. In the event that the live machine experiences a failure (in the machine itself, or in the rare case of an outage of the whole zone or its network connection), the secondary site could immediately take over with the least possible loss of service for users. If the two sites are connected to a proxy webserver such as HAProxy, an automated failover of DNS records would give users near-continuity of service.

Create private and public networks

The first step is to create the networks that the virtual machines will use. Each VM will have a public network ('Local with Internet Gateway') connecting to the Internet, and a private network ('Private Direct Connect') through which the databases will synchronise data. The networks interconnect because they belong to the same 'Direct Connect Group'. You can read more about the different network types at VDC API: How to create a network.

First, the 'Local with Internet Gateway' networks:

(local) > createLocalNetwork displaytext='Gateway-Frankfurt' zonename='Frankfurt (ESX)' cidr=192.168.113.0/24 gateway=192.168.113.254
(local) > createLocalNetwork displaytext='Gateway-Paris' zonename='Paris (ESX)' cidr=192.168.104.0/24 gateway=192.168.104.254

And now the private networks. These 'Private Direct Connect' networks require that I have a 'Direct Connect Group' (DCG) in my VDC account. So first I check that a DCG is present:

(local) > listDirectConnectGroups
count = 1
directconnectgroups:
+------+-------+---------+----------+
| sids |   id  |   name  | networks |
+------+-------+---------+----------+
|  []  | 35934 | Default |    []    |
+------+-------+---------+----------+

And I create the networks:

(local) > createPrivateDirectConnect zonename='Frankfurt (ESX)' cidr=10.0.101.0/24 gateway=10.0.101.254 displaytext='privateDC-dcg35934-FRA' dcgid=35934
(local) > createPrivateDirectConnect zonename='Paris (ESX)' cidr=10.0.105.0/24 gateway=10.0.105.254 displaytext='privateDC-dcg35934-PAR' dcgid=35934

Egress rules need to be created for each of the Local Gateway networks, so that the VMs can communicate outward to the Internet to download software. I am going to use the simplest, most permissive rule, which would not be recommended for a production system.

(local) > create egressfirewallrule networkid=UUID protocol=all cidr=0.0.0.0/0

Run this once for each zone, replacing UUID with the network ID of the Local Gateway network.
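If you are not sure of the network UUIDs, you can list them first, for example:

(local) > list networks filter=id,name,zonename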

Deploy virtual machines

One virtual machine will be deployed in each zone, using an Ubuntu version 14.04 Certified image template, with 2 GByte of RAM and 1 virtual CPU. Certified templates offer improved performance over the standard public Ubuntu templates, and VMs running on Interoute VDC can be covered by Canonical's paid support services for Ubuntu. (And note that Ubuntu Certified templates have the default admin user set to 'ubuntu' and not root.)

In the following API calls, you will need to replace the 'networkids' with your own UUIDs. Note that the Local Gateway network must come first in the list, so that it becomes the default network for the Ubuntu operating system.
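If you need to look up the template and compute offering UUIDs for your own account, commands along these lines should work:

(local) > list templates templatefilter=executable filter=id,name,zonename
(local) > list serviceofferings filter=id,name,cpunumber,memory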

Frankfurt virtual machine:

(local) > deploy virtualmachine networkids=6202e7b7-38a5-40a0-8ca6-511e2a16159a,e8110a4a-1c38-49d6-b03d-9b22c7cda659 displayname=ubuntu1404-SaaS-test-FRA name=ubuntu1404-SaaS-test-FRA zoneid=7144b207-e97e-4e4a-b15d-64a30711e0e7 keypair=keypair-saas templateid=cd136c46-4b74-4968-b59a-00e78c577d34 serviceofferingid=7e8da6c8-586b-4226-b12d-151861aba887

Paris virtual machine:

(local) > deploy virtualmachine networkids=1399bc16-01d6-47cf-a471-308ee763d413,4b32613c-e610-40fe-a281-99f6d84218b0 displayname=ubuntu1404-SaaS-test-PAR name=ubuntu1404-SaaS-test-PAR zoneid=374b937d-2051-4440-b02c-a314dd9cb27e keypair=keypair-saas templateid=cd136c46-4b74-4968-b59a-00e78c577d34 serviceofferingid=7e8da6c8-586b-4226-b12d-151861aba887

I have included the keypair option so that I will be able to connect to the VMs using an SSH keypair, which avoids passwords and is much more secure. You will need to upload an SSH public key in advance using the registerSSHKeyPair API call.
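For reference, registering a key looks roughly like this (the key material shown is a placeholder for your own public key):

(local) > register sshkeypair name=keypair-saas publickey="ssh-rsa AAAA...your-public-key... user@host"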

Now that the virtual machines are deployed, I need to create some port-forwarding rules so that the VMs will be accessible from the Internet for SSH connection and web access. First I look up the UUIDs of the public IP addresses, which need to be included in the port-forwarding rules.

(local) > list publicipaddresses filter=id,ipaddress,zonename
count = 2
publicipaddress:
+----------------+--------------------------------------+-------------------+
|   ipaddress    |                  id                  |      zonename     |
+----------------+--------------------------------------+-------------------+
| 213.xx.xx.139  | 76839351-3323-4791-9dc5-34845bc5b24c |  Frankfurt (ESX)  |
| 213.xx.xx.221  | 14d081f4-593e-42d1-95a8-16aa7e8a0e62 |    Paris (ESX)    |
+----------------+--------------------------------------+-------------------+
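
The virtual machine UUIDs are also needed in the rules; they can be listed with, for example:

(local) > list virtualmachines filter=id,name,zonename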

And the four rules:

(local) > create portforwardingrule privateport=22 publicport=52200 ipaddressid=76839351-3323-4791-9dc5-34845bc5b24c virtualmachineid=52c0faf0-c15d-495f-8314-e24f1bfb6134 enablefirewall=true protocol=tcp
(local) > create portforwardingrule privateport=80 publicport=80 ipaddressid=76839351-3323-4791-9dc5-34845bc5b24c virtualmachineid=52c0faf0-c15d-495f-8314-e24f1bfb6134  enablefirewall=true protocol=tcp
(local) > create portforwardingrule privateport=22 publicport=52200 ipaddressid=14d081f4-593e-42d1-95a8-16aa7e8a0e62 virtualmachineid=3a108d7b-3d0e-4d13-b0bd-51d52b80197a enablefirewall=true protocol=tcp
(local) > create portforwardingrule privateport=80 publicport=80 ipaddressid=14d081f4-593e-42d1-95a8-16aa7e8a0e62 virtualmachineid=3a108d7b-3d0e-4d13-b0bd-51d52b80197a enablefirewall=true protocol=tcp

After all of the above, I have the following configuration of networks and virtual machines:

― 'Gateway-Frankfurt' (Zone: Frankfurt (ESX), CIDR: 192.168.113.0/24, IP: 213.xxx.xxx.139)
   └ 192.168.113.40: 'ubuntu1404-SaaS-test-FRA'  (ports: [80]->[80] [52200]->[22]) 
 
 ― 'Network Private Direct Connect Interoute Tutorial 1'|'privateDC-dcg35934-FRA' (Zone: Frankfurt (ESX), CIDR: 10.0.101.0/24)
   └ 10.0.101.2: 'ubuntu1404-SaaS-test-FRA'  

― 'Gateway-Paris' (Zone: Paris (ESX), CIDR: 192.168.104.0/24, IP: 213.xxx.xxx.221)
   └ 192.168.104.230: 'ubuntu1404-SaaS-test-PAR'  (ports: [52200]->[22] [80]->[80]) 
 
 ― 'Network Private Direct Connect Interoute Tutorial 2'|'privateDC-dcg35934-PAR' (Zone: Paris (ESX), CIDR: 10.0.105.0/24)
   └ 10.0.105.147: 'ubuntu1404-SaaS-test-PAR' 

Connecting to the virtual machines

I use the 'ssh' command in Linux and MacOS, or PuTTY in Windows:

$ ssh -p 52200 ubuntu@213.xxx.xxx.221 -i ~/.ssh/id_rsa_SaaSkey

The file ~/.ssh/id_rsa_SaaSkey contains the private key for my SSH keypair. I have xxx'd out the public IP address for reasons of privacy.

One final configuration step is to set a routing rule on each VM so that the private networks function correctly, by routing private network traffic through the private network gateway.

I login to the Frankfurt VM and input:

ubuntu1404-SaaS-test-FRA:~$ sudo ip route add 10.0.0.0/8 via 10.0.101.254

And login to the Paris VM and input:

ubuntu1404-SaaS-test-PAR:~$ sudo ip route add 10.0.0.0/8 via 10.0.105.254

You will need to set the gateway addresses (above, 10.0.101.254 and 10.0.105.254) according to the actual addresses that you specified when you set up the networks.
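Note that a route added with 'ip route add' does not survive a reboot. One way to make it persistent on Ubuntu 14.04 is a 'post-up' line in '/etc/network/interfaces', sketched below for the Frankfurt VM on the assumption that eth1 is the interface attached to the private network (adjust the interface name and gateway for your own setup):

# fragment of /etc/network/interfaces on the Frankfurt VM (assumed interface name)
auto eth1
iface eth1 inet dhcp
    # re-create the private route whenever the interface comes up
    post-up ip route add 10.0.0.0/8 via 10.0.101.254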

Testing the private network latency and throughput

I am going to do a latency test with the ping program, which is pre-installed in Ubuntu, by pinging the Frankfurt VM's private address from the Paris VM. You can see from the sample output below that the latency is just over 9 milliseconds, and very consistent.

ubuntu1404-SaaS-test-PAR:~$ ping 10.0.101.2
PING 10.0.101.2 (10.0.101.2) 56(84) bytes of data.
64 bytes from 10.0.101.2: icmp_seq=1 ttl=62 time=9.09 ms
64 bytes from 10.0.101.2: icmp_seq=2 ttl=62 time=9.09 ms
64 bytes from 10.0.101.2: icmp_seq=3 ttl=62 time=9.10 ms
64 bytes from 10.0.101.2: icmp_seq=4 ttl=62 time=9.16 ms
64 bytes from 10.0.101.2: icmp_seq=5 ttl=62 time=9.09 ms
64 bytes from 10.0.101.2: icmp_seq=6 ttl=62 time=9.15 ms
64 bytes from 10.0.101.2: icmp_seq=7 ttl=62 time=9.11 ms
64 bytes from 10.0.101.2: icmp_seq=8 ttl=62 time=9.08 ms
64 bytes from 10.0.101.2: icmp_seq=9 ttl=62 time=9.16 ms
64 bytes from 10.0.101.2: icmp_seq=10 ttl=62 time=9.08 ms

Next, a TCP throughput test with the 'iperf' program, which requires that you install iperf on both virtual machines:

ubuntu1404-SaaS-test-FRA:~$ sudo apt-get install iperf
ubuntu1404-SaaS-test-PAR:~$ sudo apt-get install iperf

Then on one of the VMs I type the following to run an iperf server process:

ubuntu1404-SaaS-test-PAR:~$ iperf -s

And on the other VM:

ubuntu1404-SaaS-test-FRA:~$ iperf -c 10.0.105.147 -P 10
------------------------------------------------------------
Client connecting to 10.0.105.147, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 11] local 10.0.101.2 port 50996 connected with 10.0.105.147 port 5001
[  3] local 10.0.101.2 port 50988 connected with 10.0.105.147 port 5001
[  4] local 10.0.101.2 port 50989 connected with 10.0.105.147 port 5001
[  5] local 10.0.101.2 port 50990 connected with 10.0.105.147 port 5001
[  6] local 10.0.101.2 port 50991 connected with 10.0.105.147 port 5001
[  8] local 10.0.101.2 port 50993 connected with 10.0.105.147 port 5001
[  9] local 10.0.101.2 port 50994 connected with 10.0.105.147 port 5001
[  7] local 10.0.101.2 port 50992 connected with 10.0.105.147 port 5001
[ 10] local 10.0.101.2 port 50995 connected with 10.0.105.147 port 5001
[ 12] local 10.0.101.2 port 50997 connected with 10.0.105.147 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   224 MBytes   188 Mbits/sec
[  7]  0.0-10.0 sec   100 MBytes  83.8 Mbits/sec
[ 11]  0.0-10.0 sec   547 MBytes   457 Mbits/sec
[  4]  0.0-10.0 sec   202 MBytes   168 Mbits/sec
[ 10]  0.0-10.0 sec   494 MBytes   413 Mbits/sec
[ 12]  0.0-10.0 sec   549 MBytes   459 Mbits/sec
[  5]  0.0-10.1 sec   218 MBytes   182 Mbits/sec
[  6]  0.0-10.0 sec   174 MBytes   145 Mbits/sec
[  9]  0.0-10.1 sec   272 MBytes   227 Mbits/sec
[  8]  0.0-10.1 sec   229 MBytes   191 Mbits/sec
[SUM]  0.0-10.1 sec  2.94 GBytes  2.51 Gbits/sec

The specification for throughput performance on Interoute VDC private networks between zones is 3 Gbits/sec, and the measurements which I made with iperf ranged from about 2.5 to 3.5 Gbits/sec. TCP throughput does decline as network latency increases because TCP communication is two-way: the recipient machine must acknowledge the arrival of every data packet back to the sender (with re-sending of packets in case of data loss), a process which takes longer as the latency/distance grows. So if you measure with iperf between widely-separated zones, the throughput will be lower than measured in this demonstration.
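If you want to explore this effect yourself, iperf lets you vary the TCP window size, the test duration and the number of parallel streams; for example (the values here are arbitrary, not a recommendation):

ubuntu1404-SaaS-test-FRA:~$ iperf -c 10.0.105.147 -P 10 -w 256k -t 30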

Deploy and mount datadisks

In a SaaS setup, data is by far the most fragile element, and you will be doing a range of things to keep your data safe and accessible. Besides what is done at the operating system level, the main way to protect data is to create 'datadisks' in VDC. These disks are based on elastic block storage (EBS); they can be configured with a user-specified size and (for a higher unit cost) can be automatically backed up ('EBS Protected' or 'EBS Mirrored').

I need to create a datadisk in each of the zones that I am using, then attach the disk to the virtual machine. The new disk will have to be partitioned, formatted and mounted in order to be usable by the virtual machine.

Here is the Cloudmonkey API command to create a new datadisk of size 200 GBytes in the Frankfurt zone:

(local) > create volume name=datadisk-mariadb-FRA size=200 zoneid=7144b207-e97e-4e4a-b15d-64a30711e0e7 diskofferingid=e3752c57-6742-4e45-b259-c6b225f155a7

And attach it to the virtual machine:

(local) > attach volume id=14ff2aac-95d0-416a-acc1-5076167f4fbe virtualmachineid=5c8fe455-ff67-41a8-a002-015fdcfff527

You will need to substitute the UUIDs for your own volumes and VMs.

I am using the disk offering for 'EBS Protected', which features automated snapshot backup of the datadisk, with the snapshots stored in the same zone; 'EBS Mirrored' adds snapshot backup into another, geographically-separated zone, but that is less necessary here as I am already using two zones for the databases.
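The available disk offerings and their UUIDs can be listed with:

(local) > list diskofferings filter=id,name,disksize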

The steps to partition, format and mount the disk are as follows.

ubuntu1404-SaaS-test-FRA:~$ sudo fdisk /dev/sdb
ubuntu1404-SaaS-test-FRA:~$ sudo mkfs -t ext4 /dev/sdb1
ubuntu1404-SaaS-test-FRA:~$ sudo mkdir /media/datadisk-mariadb
ubuntu1404-SaaS-test-FRA:~$ sudo mount /dev/sdb1 /media/datadisk-mariadb

(see Ubuntu: Installing a New Hard Drive for more details.)
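Note that the 'mount' command above does not persist across reboots. To remount the datadisk automatically at boot, you can add an entry along these lines to '/etc/fstab' (using the filesystem UUID reported by 'blkid' instead of the device name is more robust):

# example /etc/fstab entry for the datadisk
/dev/sdb1   /media/datadisk-mariadb   ext4   defaults   0   2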

Below, I will need to configure MariaDB to use the new folder '/media/datadisk-mariadb' rather than the default location in the main OS filesystem.

Install and configure a database cluster (MariaDB and Galera)

MariaDB is an open source database project founded by the original creators of MySQL, and MariaDB programs are drop-in compatible with MySQL programs. MariaDB also has some useful extensions; the particular advantage here is that the Galera clustering software comes installed and ready to use.

To get the latest version of MariaDB (10.1.x), with Galera already integrated, it is necessary to add a software repository to Ubuntu's default list. To find the necessary parameters for each type/version of Linux, I used this webpage: https://downloads.mariadb.org/mariadb/repositories. From that webpage I found the following three commands:

ubuntu1404-SaaS-test-FRA:~$ sudo apt-get install software-properties-common
ubuntu1404-SaaS-test-FRA:~$ sudo apt-key adv --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 0xcbcb082a1bb943db
ubuntu1404-SaaS-test-FRA:~$ sudo add-apt-repository 'deb [arch=amd64,i386] http://mirrors.coreix.net/mariadb/repo/10.1/ubuntu trusty main'

Then, refresh the Ubuntu repositories, and install MariaDB:

ubuntu1404-SaaS-test-FRA:~$ sudo apt-get update
ubuntu1404-SaaS-test-FRA:~$ sudo apt-get install mariadb-server

The next step is to change the 'data directory' so that it is located on the additional datadisk. Copy the 'mysql' folder from the default location to the new location:

ubuntu1404-SaaS-test-FRA:~$ sudo cp -R -p /var/lib/mysql /media/datadisk-mariadb

Then copy the default configuration file to '/etc/my.cnf' and move the original out of the way:

ubuntu1404-SaaS-test-FRA:~$ sudo cp /etc/mysql/my.cnf /etc
ubuntu1404-SaaS-test-FRA:~$ sudo mv /etc/mysql/my.cnf /etc/mysql/my.cnf.old

In '/etc/my.cnf' change the value of 'datadir' (in the '[mysqld]' section) as follows:

datadir=/media/datadisk-mariadb/mysql

Now restart the database server program to update to the new configuration:

ubuntu1404-SaaS-test-FRA:~$ sudo /etc/init.d/mysql restart
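To confirm that the server is now using the datadisk, a quick check like the following (using the database root password set during installation) should report '/media/datadisk-mariadb/mysql/':

ubuntu1404-SaaS-test-FRA:~$ mysql -u root -p -e "SELECT @@datadir;"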

The next part is to configure and activate the Galera clustering system, which will synchronise the two database servers automatically.

The minimum recommended cluster size for Galera is three database instances. For demonstration purposes I am only using two instances here.

I edited the MariaDB configuration file '/etc/my.cnf' on each VM so that the '[galera]' section looks like the following. On the Paris VM:

[galera]
wsrep_cluster_name=VDCCluster
wsrep_cluster_address="gcomm://10.0.101.2,10.0.105.147"
wsrep_node_name=VDCNode1
wsrep_node_address="10.0.105.147"
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_on=ON
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
# Allow server to accept connections on all interfaces.
bind-address=0.0.0.0

And on the Frankfurt VM:

[galera]
wsrep_cluster_name=VDCCluster
wsrep_cluster_address="gcomm://10.0.101.2,10.0.105.147"
wsrep_node_name=VDCNode2
wsrep_node_address="10.0.101.2"
wsrep_provider=/usr/lib/galera/libgalera_smm.so
wsrep_on=ON
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
# Allow server to accept connections on all interfaces.
bind-address=0.0.0.0

Now, on the Paris VM, restart the database server with the new configuration and initiate the cluster, based on the information given in the '/etc/my.cnf' file:

ubuntu1404-SaaS-test-PAR:~$ sudo /etc/init.d/mysql stop
ubuntu1404-SaaS-test-PAR:~$ sudo /etc/init.d/mysql start --wsrep-new-cluster

You only need to initiate the cluster once, on one of the VMs. The other VM (Frankfurt, in this case) just needs to have its database server restarted, and the Galera process on each VM should automatically discover the other members of the cluster:

ubuntu1404-SaaS-test-FRA:~$ sudo /etc/init.d/mysql restart
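At this point it is worth checking that the cluster has actually formed. On either VM, the following query should report a cluster size of 2:

ubuntu1404-SaaS-test-PAR:~$ mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size';"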

One last thing to do is to copy the contents of the file '/etc/mysql/debian.cnf' from the Paris VM to the Frankfurt VM. If you don't do this, the mysql program will produce errors like this: "ERROR 1045 (28000): Access denied for user 'debian-sys-maint'@'localhost'". (The password for the 'debian-sys-maint' user is stored both inside the database and in this config file; when the databases connect and synchronise, the two copies no longer match.)

Install the rest of the LAMP stack: Apache and PHP

There are two more pieces of the LAMP stack to install—on both VMs, though I will only show the commands for one. First, the Apache web server:

ubuntu1404-SaaS-test-FRA:~$ sudo apt-get install apache2

If this installs correctly, and the port-forwarding rules on the VDC Local Gateway network are correct, then I can test the web server by pointing a web browser at 'http://213.xxx.xxx.139' and I should see the default Apache welcome screen. If this fails to work, try running this command on the VM itself:

ubuntu1404-SaaS-test-FRA:~$ wget -O - localhost

If you don't see a string of HTML as output, the webserver has a problem; if the HTML looks good, then the problem lies with the VDC network configuration (check the port-forwarding and egress rules).

Second, a set of PHP programs and libraries needs to be installed. The following is a standard set for Ubuntu; you may not need all of these for a specific web application, or you might need to install additional ones.

ubuntu1404-SaaS-test-FRA:~$ sudo apt-get install php5 php5-mysql php5-cli php5-imagick php5-gd php5-mcrypt php5-curl
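On Ubuntu 14.04 the mcrypt extension is usually not enabled automatically after installation; if your application complains about missing mcrypt, enabling it and restarting Apache should look like this:

ubuntu1404-SaaS-test-FRA:~$ sudo php5enmod mcrypt
ubuntu1404-SaaS-test-FRA:~$ sudo service apache2 restart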

Install a SaaS application: Piwigo

Piwigo is an open source photo-sharing application. I had some errors using it, which seemed to be due to incompatible versions of the various PHP libraries or possibly wrong PHP configurations (either of which is usually time-consuming to solve). Nevertheless, it worked well enough to demonstrate having two instances of the Piwigo program synchronising via their databases.

Here is what I did:

  • Download the Piwigo program sources into both VMs, in the default Apache web folder '/var/www/html'.

  • Use 'mysql' in each VM to create a new, empty database with the name 'piwigo' (see the example after this list).

  • Point a web browser at 'http://213.xxx.xxx.139/piwigo', which initiates the Piwigo install process, and fill in the configuration options (the connection details for the 'piwigo' database, plus an administrator account).

  • Do the same thing for the other virtual machine: 'http://213.xxx.xxx.221/piwigo'

  • Login to both instances of Piwigo.

  • Create users, albums, upload photos, etc.
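As a minimal sketch of the database step in the list above (the 'IF NOT EXISTS' clause makes the command harmless if Galera has already replicated the database from the other node):

ubuntu1404-SaaS-test-FRA:~$ mysql -u root -p -e "CREATE DATABASE IF NOT EXISTS piwigo;"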