Synchronise a GlusterFS volume to a remote site using geo-replication
GlusterFS can be used to synchronise a directory to a remote server on a local network for data redundancy or load balancing to provide a highly scalable and available file system.
GlusterFS does not work very well, however, when the storage you would like to replicate to is on a remote network, possibly in a different location. This is because GlusterFS is not designed to work when there is high latency between replication nodes.
For this situation GlusterFS provides a feature called geo-replication, which performs batch-based replication of a local volume to a remote machine over SSH.
The below example will use three servers:
- gfs1.jamescoyle.net is one of the two running GlusterFS volume servers.
- gfs2.jamescoyle.net is the second of the two running GlusterFS volume servers. gfs1 and gfs2 both serve a single GlusterFS replicated volume called datastore.
- remote.jamescoyle.net is the remote file server which the GlusterFS volume will be replicated to.
GlusterFS uses an SSH connection to the remote host using SSH keys instead of passwords. We’ll need to create an SSH key using ssh-keygen to use for our connection. Run the below command and press return when asked to enter the passphrase to create a key without a passphrase.
ssh-keygen -f /var/lib/glusterd/geo-replication/secret.pem
The output will look like the below:
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /var/lib/glusterd/geo-replication/secret.pem.
Your public key has been saved in /var/lib/glusterd/geo-replication/secret.pem.pub.
The key fingerprint is:
46:ba:02:fd:2f:9c:b9:39:ec:6c:90:50:d8:ec:7b:00 [email protected]
The key's randomart image is:
+--[ RSA 2048]----+
| + |
| E + |
| + . |
| ..o o |
| ...+. S |
| .+..o |
| .=oo |
| oOo |
| o=+. |
+-----------------+
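If you are scripting the setup, the passphrase prompts can be skipped. The sketch below is one way to do that, assuming a standard OpenSSH ssh-keygen: the -N "" option sets an empty passphrase and -q suppresses the output. The function name make_geo_key is made up for this example and is not part of GlusterFS.

```shell
# make_geo_key is a hypothetical helper name, not a GlusterFS command.
# It creates a passphrase-less RSA key pair at the given path, and
# leaves an existing key untouched so a re-run does not replace it.
make_geo_key() {
    key_file="$1"

    if [ -f "$key_file" ]; then
        echo "Key already exists: $key_file"
        return 0
    fi

    # Make sure the target directory is there before writing the key.
    mkdir -p "$(dirname "$key_file")"

    # -q: quiet, -t rsa: key type, -N "": empty passphrase, -f: output file
    ssh-keygen -q -t rsa -N "" -f "$key_file"
}
```

Calling make_geo_key /var/lib/glusterd/geo-replication/secret.pem should give the same pair of files as the interactive command above.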
Now you need to copy the public key to the authorized_keys file on your remote server. The remote user must be a super user (currently a limitation of GlusterFS) which is root in the below example. If you have multiple GlusterFS volumes in a cluster then you will need to copy the key to all GlusterFS servers.
cat /var/lib/glusterd/geo-replication/secret.pem.pub | ssh root@remote.jamescoyle.net "cat >> ~/.ssh/authorized_keys"
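Before going further it can be worth confirming that the key really does allow a login without a password. The below is a minimal sketch, assuming OpenSSH on both machines. The BatchMode=yes option makes ssh fail instead of falling back to a password prompt. The function name check_key_login is made up for this example.

```shell
# check_key_login is a hypothetical helper name, not a GlusterFS command.
# It runs the harmless "true" command on the remote host using only the
# given private key, and reports whether the login worked.
check_key_login() {
    key_file="$1"
    remote="$2"

    # BatchMode=yes disables password prompts, so a missing or rejected
    # key shows up as a failure rather than an interactive question.
    if ssh -i "$key_file" -o BatchMode=yes "$remote" true; then
        echo "key login OK: $remote"
    else
        echo "key login FAILED: $remote" >&2
        return 1
    fi
}
```

For the servers in this example you would run check_key_login /var/lib/glusterd/geo-replication/secret.pem root@remote.jamescoyle.net from gfs1.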
Make sure the remote server has glusterfs-server installed. Run the below command to install glusterfs-server on remote.jamescoyle.net. You may need to use yum instead of apt-get for Red Hat versions of Linux.
apt-get install glusterfs-server
Create a folder on remote.jamescoyle.net which will be used for the remote replication. All data transferred to this machine will be stored in this folder.
mkdir /gluster
mkdir /gluster/geo-replication
Start the geo-replication session with Gluster, replacing the below values with your own:
- [SOURCE_DATASTORE] – is the local Gluster data volume which will be replicated to the remote server.
- [REMOTE_SERVER] – is the remote server to receive all the replication data.
- [REMOTE_PATH] – is the path on the remote server to store the files.
gluster volume geo-replication [SOURCE_DATASTORE] [REMOTE_SERVER]:[REMOTE_PATH] start
gluster volume geo-replication datastore remote.jamescoyle.net:/gluster/geo-replication/ start
Starting geo-replication session between datastore & remote.jamescoyle.net:/gluster/geo-replication/ has been successful
Sometimes on the remote machine, gsyncd (part of the GlusterFS package) may be installed in a different location to the local GlusterFS nodes.
Your log file may show a message similar to below:
Popen: ssh> bash: /usr/lib/x86_64-linux-gnu/glusterfs/gsyncd: No such file or directory
In this scenario you can use the config command to specify the remote gsyncd location.
gluster volume geo-replication datastore remote.jamescoyle.net:/gluster/geo-replication config remote-gsyncd /usr/lib/glusterfs/glusterfs/gsyncd
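To find the path to give to remote-gsyncd, you can search for the gsyncd file on the remote machine. The below is a small sketch using the standard find command. The function name find_gsyncd is made up for this example, and the directories you pass to it are your own guess at where the package may have put its files.

```shell
# find_gsyncd is a hypothetical helper name, not a GlusterFS command.
# It prints the full path of every file called gsyncd found under the
# directories given as arguments, skipping directories that do not exist.
find_gsyncd() {
    for root in "$@"; do
        [ -d "$root" ] || continue
        find "$root" -type f -name gsyncd
    done
}
```

On remote.jamescoyle.net you might run find_gsyncd /usr/lib /usr/lib64 /usr/libexec, which assumes the package installs under one of those directories, and use the printed path in the config command.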
You will then need to run the start command to start the volume synchronisation.
gluster volume geo-replication datastore remote.jamescoyle.net:/gluster/geo-replication/ start
You can view the status of your replication task by running the status command.
gluster volume geo-replication datastore remote.jamescoyle.net:/gluster/geo-replication/ status
You can stop your volume replication at any time by running the stop command.
gluster volume geo-replication datastore remote.jamescoyle.net:/gluster/geo-replication/ stop
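The start, status and stop commands above differ only in the last word, so they are easy to wrap in a small helper. The below sketch only builds and prints the command rather than running it, so you can check the result before using it. The function name geo_rep_cmd is made up for this example.

```shell
# geo_rep_cmd is a hypothetical helper name, not a GlusterFS command.
# Arguments: local volume, remote server, remote path, action.
# It prints the matching gluster command line without executing it.
geo_rep_cmd() {
    volume="$1"
    server="$2"
    path="$3"
    action="$4"

    # Only the three actions used in this guide are accepted.
    case "$action" in
        start|status|stop) ;;
        *)
            echo "unsupported action: $action" >&2
            return 1
            ;;
    esac

    echo "gluster volume geo-replication $volume $server:$path $action"
}
```

For example, geo_rep_cmd datastore remote.jamescoyle.net /gluster/geo-replication/ status prints the status command shown above. Once you are happy with the printed command, you can run it yourself.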