qcow2 image format and cluster_size
There are several things to consider when creating a virtual disk for a virtual machine: the size of the disk, whether the disk is sparse, and whether to use compression or encryption, among others.
Many of these choices depend on the type of load placed upon the disk and the requirements of that load. For example, compression reduces the physical size of the disk, and encryption increases the security of your data.
The low-level details of virtual disks are often overlooked and left at their default values, which may or may not be sensible for your scenario. One such detail is the qcow2 virtual disk property cluster_size.
A virtual disk, much like a physical disk as seen by an operating system, is split up into clusters, each of a predefined size and each holding a single unit of data. A cluster is the smallest amount of data that can be read or written in a single operation. An index, often kept in memory, records what information is stored in each cluster and where that cluster is located.
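As a rough illustration of the lookup described above, the sketch below maps a byte offset on the virtual disk to a cluster number and a position inside that cluster. It is a simplification: the function name `locate` is made up for this example, and a real qcow2 image resolves the cluster through its own on-disk tables rather than a single division.

```python
from typing import Tuple


def locate(offset: int, cluster_size: int = 65536) -> Tuple[int, int]:
    """Return (cluster index, byte offset inside that cluster).

    Hypothetical helper for illustration only; not part of any QEMU API.
    """
    if offset < 0 or cluster_size <= 0:
        raise ValueError("offset must be >= 0 and cluster_size must be > 0")
    # divmod gives the quotient (which cluster) and remainder (where in it).
    return divmod(offset, cluster_size)


# Byte 200,000 with the default 64K cluster size.
print(locate(200_000))  # (3, 3392)
```

With a smaller cluster_size the same offset falls into a higher-numbered cluster, which is one way to see why small clusters mean more index entries for the same amount of data.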
qcow2 is a copy-on-write image format (the "cow" in qcow2), which means that if a cluster of data needs to be altered then the whole cluster is re-written, rather than just the changed data. If you have a cluster size of 1024 bytes and you change 1 byte, then 1023 bytes have to be read from the original cluster and 1024 bytes have to be written. That is an overhead of 1023 bytes being read and written on top of the 1 byte you actually changed. That overhead isn't too bad in the grand scheme, but imagine if you had a cluster size of 1 MB and still only changed a single byte!
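The arithmetic in this paragraph can be sketched in a few lines. This follows the simplified model used above (a change inside one cluster causes the whole cluster to be re-written) and is not a measurement of how QEMU behaves in every case; for instance, it ignores situations where an already-allocated cluster can be updated in place. The function name `rewrite_cost` is invented for the example.

```python
def rewrite_cost(changed_bytes: int, cluster_size: int) -> dict:
    """Cost of changing `changed_bytes` inside a single cluster.

    Simplified model from the text: the unchanged bytes are read back
    and the full cluster is written out again.
    """
    if not 0 < changed_bytes <= cluster_size:
        raise ValueError("changed_bytes must be between 1 and cluster_size")
    return {
        "read": cluster_size - changed_bytes,      # bytes preserved from the old cluster
        "written": cluster_size,                   # the whole cluster is written
        "amplification": cluster_size / changed_bytes,
    }


# A 1-byte change with 1 KiB, 64 KiB and 1 MiB clusters.
for size in (1024, 65536, 1024 * 1024):
    print(size, rewrite_cost(1, size))
```

Under this model the cost of a tiny write grows linearly with the cluster size, which is the point the paragraph above makes.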
On the other hand, much larger writes expose a different problem. If you are constantly writing 1 MB files and have a cluster size of 1024 bytes, then each 1 MB file has to be split into 1024 parts, with each part stored in its own cluster. Each time a cluster is written to, the metadata must be updated to reflect the newly stored data, so there is a performance penalty in storing data this way too. A more efficient way of writing 1 MB files would be to have a cluster size of 1 MB, so that each file occupies a single cluster with only one metadata reference.
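The opposite trade-off can be sketched the same way: how many clusters, and therefore how many metadata references in the simplified model above, a single write needs. The helper name `clusters_needed` is made up for this example, "1 MB" is taken to mean 1024 × 1024 bytes as in the text, and the write is assumed to start on a cluster boundary.

```python
def clusters_needed(write_bytes: int, cluster_size: int) -> int:
    """Number of clusters an aligned write of `write_bytes` occupies."""
    if write_bytes < 0 or cluster_size <= 0:
        raise ValueError("write_bytes must be >= 0 and cluster_size must be > 0")
    # Ceiling division: a partly filled cluster still counts as one cluster.
    return -(-write_bytes // cluster_size)


ONE_MB = 1024 * 1024
for size in (1024, 65536, ONE_MB):
    print(size, clusters_needed(ONE_MB, size))
```

For a 1 MB write this gives 1024 clusters at a cluster size of 1024 bytes, 16 at the 64K default, and 1 at a cluster size of 1 MB.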
Testing qcow2 cluster_size
The table below shows how the cluster_size affects the performance of a qcow2 virtual disk image. The tests were all performed on the same hardware, on a single disk that is on its own dedicated bus with no other traffic. The disk itself is a Samsung SSD. Every test uses a 4GB virtual disk image provisioned with preallocation set to full, with encryption disabled and lazy_refcounts off.
Several qcow2 virtual disks were created with varying cluster_size attributes, and a single 134MB file was written to each disk. The table lists the statistics and timings resulting from each test.
| cluster_size | Time to create disk | Time to write | MB/s |
The above table covers the smallest cluster_size of 512, the default of 64K (65536) and the largest possible of 2M.
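To repeat a test like this, the images could be created with qemu-img. The sketch below only builds and prints the command lines, using the settings described above (4G image, preallocation=full, lazy_refcounts=off, no encryption option). The file names and the helper `create_command` are placeholders for this example, and the exact commands used for the original tests are not given in this article.

```python
import shlex
from typing import List


def create_command(path: str, cluster_size: str, size: str = "4G") -> List[str]:
    """Build a `qemu-img create` command for a qcow2 image.

    `path` is a placeholder file name; nothing is executed here.
    """
    options = f"cluster_size={cluster_size},preallocation=full,lazy_refcounts=off"
    return ["qemu-img", "create", "-f", "qcow2", "-o", options, path, size]


# Smallest, default and largest cluster sizes mentioned above.
for cs in ("512", "64K", "2M"):
    print(shlex.join(create_command(f"test-{cs}.qcow2", cs)))
```

The printed commands can be pasted into a shell, or passed to `subprocess.run` as lists. Running `qemu-img info` on a created image should report the cluster_size that was applied.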
As always, test and tune the parameters for your workload.