Synchronise a GlusterFS volume to a remote site using geo replication

Synchronise a GlusterFS volume to a remote site using geo replication

Get Social!

gluster-orange-antGlusterFS can be used to synchronise a directory to a remote server on a local network for data redundancy or load balancing to provide a highly scalable and available file system.

The problem is when the storage you would like to replicate to is on a remote network, possibly in a different location, GlusterFS does not work very well. This is because GlusterFS is not designed to work when there is a high latency between replication nodes.

GlusterFS provides a feature called geo replication to perform batch based replication of a local volume to a remote machine over SSH.

The below example will use three servers:

  • gfs1.jamescoyle.net is one of the two running GlusterFS volume servers.
  • gfs2.jamescoyle.net is the second of the two running GlusterFS volume servers. gfs1 and gfs2 both server a single GlusterFS replicated volume called datastore.
  • remote.jamescoyle.net is the remote file server which the GlusterFS volume will be replicated to.

GlusterFS uses an SSH connection to the remote host using SSH keys instead of passwords. We’ll need to create an SSH key using ssh-keygen to use for our connection. Run the below command and press return when asked to enter the passphrase to create a key without a passphrase. 

The output will look like the below:

Now you need to copy the public certificate to your remote server in the authorized_keys file. The remote user must be a super user (currently a limitation of GlusterFS) which is root in the below example. If you have multiple GlusterFS volumes in a cluster then you will need to copy the key to all GlusterFS servers.

Make sure the remote server has glusterfs-server installed. Run the below command to install glusterfs-server on remote.jamescoyle.net. You may need to use yum instead of apt-get for Red Hat versions of Linux.

Create a folder on remote.jamescoyle.net which will be used for the remote replication. All data which transferrs to this machine will be stored in this folder.

Create the geo-replication volume with Gluster and replace the below values with your own:

  • [SOURCE_DATASTORE] – is the local Gluster data volume which will be replicated to the remote server.
  • [REMOTE_SERVER] – is the remote server to receive all the replication data.
  • [REMOATE_PATH] – is the path on the remote server to store the files.

Example:

Sometimes on the remote machine, gsyncd (part of the GlusterFS package) may be installed in a different location to the local GlusterFS nodes.

Your log file may show a message similar to below:

In this scenario you can specify the config command the remote gsyncd location.

You will then need to run the start command to start the volume synchronisation.

You can view the status of your replication task by running the status command.

You can stop your volume replication at any time by running the stop command.


Visit our advertisers

Search

Quick Poll

Do you use GlusterFS in your workplace?

Visit our advertisers