GlusterFS performance tuning

Category : How-to

I have been using GlusterFS to provide file synchronisation over two networked servers. As soon as the first file was replicated between the two nodes I wanted to understand the time it took for the file to be available on the second node. I’ll call this replication latency.

As discussed in my other blog posts, it is important to understand what the limitations are in the system without the GlusterFS layer. File system and network speed need to be understood so that we are not blaming high replication latency on GlusterFS when it’s slow because of other factors.
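A baseline can be taken with a simple dd write test before GlusterFS is involved. The sketch below assumes GNU dd on Linux; TARGET_DIR is a variable name made up for this example and the 64 MiB size is arbitrary. Run it once against the local file system that backs the brick and once against the GlusterFS mount, then compare the two figures. Network throughput between the nodes needs a separate check with a network benchmarking tool.

```shell
# Baseline write test. TARGET_DIR is a hypothetical variable: point it at the
# directory to measure (local brick file system first, GlusterFS mount second).
TARGET_DIR="${TARGET_DIR:-/tmp}"
TEST_FILE="$TARGET_DIR/gluster-baseline.$$"

# conv=fsync makes dd flush the data to disk before it reports its timing,
# so the page cache does not inflate the result.
dd if=/dev/zero of="$TEST_FILE" bs=1M count=64 conv=fsync 2>&1 | tail -n 1

# Record how much was written, then clean up the test file.
BYTES_WRITTEN=$(( $(wc -c < "$TEST_FILE") ))
rm -f "$TEST_FILE"
echo "wrote $BYTES_WRITTEN bytes under $TARGET_DIR"
```

Saved as a script (say baseline.sh, a name chosen only for illustration), it could be run as `TARGET_DIR=/mnt/glusterfs sh baseline.sh` for the GlusterFS side of the comparison.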

The next thing to note is that replication latency is affected by the type of file you are transferring between nodes. Many small files will result in lower transfer speeds, whereas very large files will reach the highest speeds. This is because GlusterFS incurs a large fixed overhead for each file it replicates, so the larger the file, the smaller that overhead becomes relative to the time spent transferring the data itself.

As with all performance tuning, there are no magic values which work on all systems. The defaults in GlusterFS are configured at install time to provide the best performance over mixed workloads. To squeeze more performance out of GlusterFS, you need an understanding of the parameters below and how they may be used in your setup.

After making a change, be sure to restart all GlusterFS processes and begin benchmarking the new values.

GlusterFS specific

GlusterFS volumes can be configured with multiple settings. These can be set on a volume using the command below, substituting [VOLUME] for the volume to alter, [OPTION] for the parameter name and [PARAMETER] for the parameter value.

    gluster volume set [VOLUME] [OPTION] [PARAMETER]

Example:
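As a sketch, assuming a volume called myvolume (a hypothetical name) and a cache size picked purely for illustration, setting the read cache and then confirming the change might look like this:

```shell
# "myvolume" and 256MB are placeholders, not recommended values.
gluster volume set myvolume performance.cache-size 256MB

# Changed options are listed under "Options Reconfigured" in the volume info.
gluster volume info myvolume
```

Size values take a unit suffix such as MB or GB; as one of the comments below shows, a bare "1G" was rejected on GlusterFS 3.6.1.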

Or you can add the parameter to the glusterfs.vol config file.

  • performance.write-behind-window-size – the size in bytes to use for the per file write behind buffer. Default: 1MB.
  • performance.cache-refresh-timeout – the time in seconds a cached data file will be kept until data revalidation occurs. Default: 1 second.
  • performance.cache-size – the size in bytes to use for the read cache. Default: 32MB.
  • cluster.stripe-block-size – the size in bytes of the unit that will be read from or written to on the GlusterFS volume. Smaller values are better for smaller files and larger sizes for larger files. Default: 128KB.
  • performance.io-thread-count – the maximum number of threads used for IO. Higher numbers improve concurrent IO operations, provided your disks can keep up. Default: 16.
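When trying several of these options together, it can help to keep them in a small script so each benchmark run is repeatable. The sketch below is only one way to do that: the volume name, the DRY_RUN switch, the set_option helper and every value are assumptions made for this example, not recommendations. By default it only prints the commands it would run.

```shell
# VOLUME and the values below are placeholders for illustration only.
VOLUME="${VOLUME:-myvolume}"
# DRY_RUN=1 (the default here) prints the commands; set DRY_RUN=0 to apply them.
DRY_RUN="${DRY_RUN:-1}"

set_option() {
    # $1 = option name, $2 = value
    if [ "$DRY_RUN" = "1" ]; then
        echo "gluster volume set $VOLUME $1 $2"
    else
        gluster volume set "$VOLUME" "$1" "$2"
    fi
}

set_option performance.write-behind-window-size 4MB
set_option performance.cache-refresh-timeout 4
set_option performance.cache-size 256MB
set_option performance.io-thread-count 32
```

Changing one value at a time and re-running the same benchmark makes it easier to see which option actually made a difference.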

Other Notes

When mounting your storage for the GlusterFS layer, make sure it is configured for the type of workload you have.

  • When mounting your GlusterFS storage from a remote server to your local server, be sure to disable direct-io, as this enables the kernel read-ahead and file system cache. This is sensible for most workloads where caching of files is beneficial.
  • When mounting the GlusterFS volume over NFS, use noatime and nodiratime to avoid access-time updates over NFS.
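The two mount styles above might look like the following. The server name, volume name and mount points are hypothetical; direct-io-mode is the option name reported for GlusterFS 3.4.0 in the comments below, so check it against your own version.

```shell
# Native (FUSE) mount with direct IO disabled, so the kernel page cache
# and read-ahead are used. server1, myvolume and the paths are placeholders.
mount -t glusterfs -o direct-io-mode=disable server1:/myvolume /mnt/glusterfs

# NFS mount without access-time updates. The NFS server built into
# GlusterFS speaks NFSv3, hence vers=3.
mount -t nfs -o vers=3,noatime,nodiratime server1:/myvolume /mnt/gluster-nfs
```

The same options can be placed in the options column of an /etc/fstab entry to make the mounts persistent.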

I haven’t been working with GlusterFS for long so I would be very interested in your thoughts on performance. Please leave a comment below.


17 Comments

Mark

16-Oct-2013 at 9:24 pm

James, I’m confused by the first bullet in Other Notes. Are you saying that the bricks should be mounted with direct-io disabled or that the glusterfs mounted volume should be mounted with direct-io disabled?

    james.coyle

    17-Oct-2013 at 10:49 am

    Hi Mark,

    I have re-worded the bullet point as it was both misleading and incorrect!

    direct-io-mode, when enabled, will disable any kernel-based caching mechanisms such as read-ahead, which is generally bad for performance. There are some exceptions to this rule: applications which implement their own caching (such as databases) would likely suffer a performance degradation by using direct-io-mode=disable.

    My apologies for the confusion – I hope this clears it up.

    Cheers,

    James.

      Alan

      12-Nov-2013 at 12:52 pm

      The mount option for fstab is actually direct-io-mode=[enable|disable]. This is for GlusterFS 3.4.0.

      keiviw

      18-Jun-2016 at 1:05 am

      By “mount -t glusterfs XXX:/testvol -o direct-io-mode=enable mountpoint”, the GlusterFS client will work in direct-io mode, but the GlusterFS server will ignore the O_DIRECT flag; in other words, the data will be cached on the server. Can the server work in direct-io mode? If so, how?

higkoo

22-Jun-2014 at 5:51 am

volume set: failed: option : direct-io-mode does not exist
Did you mean read-hash-mode?

    james.coyle

    23-Jun-2014 at 11:35 am

    Where are you trying to apply this option?

    I haven’t used it for a while so there is a chance it has changed in the more recent versions of GlusterFS.

opsokkebalje

26-Aug-2014 at 9:19 am

I’ve set up a 2-node gluster setup (CentOS 7)
with 1 glusterfs client (a KVM host with 1 guest).

*all servers here are IBM 3550, 32 GB RAM, 15k SAS disks + 1 Gb Cisco switch

The 3550’s native disk speed :
dd if=/dev/zero of=./out1 bs=1G count=1
1073741824 bytes (1.1 GB) copied, 1.48769 s, 722 MB/s

But the same dd test inside the kvm guest (kvm storage is glusterfs mount) is unbelievably slow:
dd if=/dev/zero of=/tmp/out1 bs=1G count=1
1073741824 bytes (1.1 GB) copied, 564.394 s, 1.9 MB/s

So glusterfs appears to be just another rich man’s solution where high availability + speed will only come into play with 10+ nodes, 10Gb eth and SSD … So as a poorman I’m still stuck with drbd.

    Nico

    29-Aug-2014 at 2:44 pm

    I’m pasting my results after a long day of trying GlusterFS.
    Both are running on KVM with CentOS 6.5, standard kernel, running over Proxmox 3.2
    GlusterFS version 3.5.4
    Proxmox is running in a Xeon E5430, 16 GB ram and SAS 15K Raid 1 146 GB, nothing really impressive, but:

    Direct write to folder:
    # dd if=/dev/zero of=/opt/deleteme1.img bs=1M count=1024 conv=fsync
    1024+0 records in
    1024+0 records out
    1073741824 bytes (1,1 GB) copied, 6,6138 s, 162 MB/s

    Write to a GlusterFS mounted folder
    # dd if=/dev/zero of=/mnt/nethd/deleteme1.img bs=1M count=1024 conv=fsync
    1024+0 records in
    1024+0 records out
    1073741824 bytes (1,1 GB) copied, 14,793 s, 72,6 MB/s

    As you can see, something is really wrong in your setup; you should achieve more speed.
    My KVM guests use the Virtio driver for the disk and a raw image instead of qcow or VMware.

    Hope it helps.

Omish_Man

15-Sep-2014 at 10:24 am

I’ve noticed that when copying data from a “regular” FS to GlusterFS that the GlusterFS files take up more blocks. The number of blocks and blocksize appears to be the same, but Gluster is using more blocks to store the same files.

I assume this is because of replication metadata, but the sizes are around 25% larger. I’m dealing with around 1TB of data replicated over a pair of Raspberry Pis.

Have you seen this behavior as well?

Jack

8-Jan-2015 at 2:21 am

James, I am using GlusterFS and ran the following command:
# gluster volume set v3_upload performance.cache-size 1G
volume set: failed: invalid number format “1G” in option “cache-size”
why?

# gluster --version
glusterfs 3.6.1

Jack

8-Jan-2015 at 2:27 am

# gluster volume set v3_upload performance.cache-size 2GB
volume set: success

punit

8-Apr-2015 at 8:48 am

Hi,

I am getting very slow throughput. I have the following setup:

A. 4* host machine with Centos 7(Glusterfs 3.6.2 | Distributed Replicated | replica=2)
B. Each server has 24 SSD as bricks…(Without HW Raid | JBOD)
C. Each server has 2 Additional ssd for OS…
D. Network 2*10G with bonding…(2*E5 CPU and 64GB RAM)

I am getting performance slower than SATA, even though I am using all SSDs in my environment.

Gluster Volume options :-

+++++++++++++++
Options Reconfigured:
performance.nfs.write-behind-window-size: 1024MB
performance.io-thread-count: 32
performance.cache-size: 1024MB
cluster.quorum-type: auto
cluster.server-quorum-type: server
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
nfs.disable: on
user.cifs: enable
auth.allow: *
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
storage.owner-uid: 36
storage.owner-gid: 36
server.allow-insecure: on
network.ping-timeout: 0
diagnostics.brick-log-level: INFO
+++++++++++++++++++

Test with SATA and Glusterfs SSD….
———————
Dell EQL (SATA disk 7200 RPM)
—-
# dd if=/dev/zero of=test bs=64k count=4k oflag=dsync
4096+0 records in
4096+0 records out
268435456 bytes (268 MB) copied, 20.7763 s, 12.9 MB/s
# dd if=/dev/zero of=test bs=64k count=4k oflag=dsync
4096+0 records in
4096+0 records out
268435456 bytes (268 MB) copied, 23.5947 s, 11.4 MB/s

GlusterFS SSD

# dd if=/dev/zero of=test bs=64k count=4k oflag=dsync
4096+0 records in
4096+0 records out
268435456 bytes (268 MB) copied, 66.2572 s, 4.1 MB/s
# dd if=/dev/zero of=test bs=64k count=4k oflag=dsync
4096+0 records in
4096+0 records out
268435456 bytes (268 MB) copied, 62.6922 s, 4.3 MB/s
————————

Please let me know what i should do to improve the performance of my glusterfs…

Thanks,
Punit Dambiwal

    Ram

    1-Apr-2017 at 2:03 pm

    Hi,

    What is the mounting method for the gluster volume, and what is the bonding level?

Shawn

19-Apr-2015 at 12:02 am

Has anyone seen this error?

Has not responded in the last 42 seconds, disconnecting.

Which then leads to:

Transport endpoint is not connected

The only way to recover is to umount and then mount the drive again. The only bad thing is how long it takes to find out about it. Is there any way to stop this? Is it some type of timeout?

keiviw

17-Jun-2016 at 2:02 pm

The servers will ignore direct-io even if you mount the volume in direct-io-mode. According to my test, when the glusterfs client was also the glusterfs server (the file was hashed to that server), it would cache the data. Can the server be made to honour direct-io? Is there something to tune for this?

lenovo

28-Jul-2016 at 7:45 pm

These are my parameters only,

Volume Name: v1
Type: Replicate
Volume ID: 97c9a8e3-90d5-49b0-aa59-187fc321c13f
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 10.0.72.5:/gfs/b1/v1
Brick2: 10.0.72.6:/gfs/b1/v1
Options Reconfigured:
performance.io-thread-count: 32
performance.flush-behind: on
performance.write-behind: on
performance.io-cache: on
performance.write-behind-window-size: 16MB
performance.cache-size: 2GB
performance.readdir-ahead: on
nfs.rpc-auth-allow: 10.0.xxxxx

# dd if=/dev/zero of=testfile.bin bs=100M count=3
3+0 records in
3+0 records out
314572800 bytes (315 MB) copied, 0.343407 s, 916 MB/s

I use 3 x 1TB Samsung EVO in RAID 5 on a P420i array controller, with all features turned off and no array caching. My gluster is 3.7x.

Sivabudh Umpudh

27-Nov-2016 at 12:07 pm

The performance problem with small files might just be gone at long last with the newly implemented feature (October, 2016) called “md-cache upcall.” See: http://blog.gluster.org/2016/10/gluster-tiering-and-small-file-performance/
