qcow2 image format and cluster_size

There are several things to consider when creating a virtual disk for a virtual machine, such as the size of the disk, whether the disk is sparse, compression, encryption, and so on.

Many of these things will depend on the type of load placed upon the disk, and the requirements that load has. For example, you can reduce the physical size of the disk if you use compression, and you can increase the security of your data if you use encryption.

Often the low-level details of virtual disks are overlooked and left at their default values, which may or may not be sensible for your scenario. One such detail is the qcow2 virtual disk property cluster_size.

A virtual disk, much like a physical disk under an operating system, is split up into clusters; each cluster is a predefined size and holds a single unit of data. A cluster is the smallest amount of data that can be read or written in a single operation. An index, often kept in memory, records what information is stored in each cluster and where that cluster is located.

The qcow2 format is copy-on-write (the ‘cow’ in qcow2), which means that if data within a cluster needs to be altered then the whole cluster is rewritten, rather than just the changed bytes. If you have a cluster size of 1024 bytes and you change 1 byte, then 1023 bytes have to be read from the original cluster and 1024 bytes have to be written – an overhead of 1023 bytes read and 1023 bytes written on top of the 1 byte you actually changed. That overhead isn’t too bad in the grand scheme of things, but imagine if you had a cluster size of 1MB and still only changed a single byte!

On the other hand, with much larger writes another problem appears. If you are constantly writing 1MB files and have a cluster size of 1024 bytes then you’ll have to split each 1MB file into 1024 parts and store each part in its own cluster. Each time a cluster is written to, the metadata must be updated to reflect the newly stored data, so there is a performance penalty in storing data this way. A more efficient way of writing 1MB files would be to use a cluster size of 1MB so that each file occupies a single cluster with only one metadata reference.

Testing qcow2 cluster_size

The below table shows how cluster_size affects the performance of a qcow2 virtual disk image. The tests are all performed on the same hardware and on a single hard disk that’s on its own dedicated bus with no other traffic. The disk itself is a Samsung SSD. All tests use the same 4GB virtual disk image, provisioned with preallocation set to full; encryption is disabled and lazy_refcounts are off.
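
The disks were created with qemu-img; a command along these lines (the file name is illustrative) sets the relevant options explicitly:

    qemu-img create -f qcow2 \
        -o cluster_size=65536,preallocation=full,lazy_refcounts=off \
        test.qcow2 4G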

Several qcow2 virtual disks have been created with varying cluster_size attributes and a single 134MB file has been written to each disk. The below table shows various statistics and timings resulting from each test.

cluster_size    Time to create disk    Time to write    MB/s
512             1m41.889s              2.69157s         49.9MB/s
1K              49.697s                2.30576s         58.2MB/s
64K             19.085s                1.69912s         79.0MB/s
2M              1.085s                 1.46358s         91.7MB/s

The above table covers the smallest possible cluster_size of 512 bytes, the default of 64K (65536 bytes) and the largest possible of 2M.
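
You can check the cluster_size of an existing image with qemu-img info, which reports it alongside the other image details (the file name is illustrative):

    qemu-img info test.qcow2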

As always, test and tune the parameters for your workload.


qcow2 Physical Size With Different preallocation Settings

The qcow2 image format is the de facto image format for KVM/QEMU virtual machines. The format provides various parameters that can be configured when creating the image, each with its own benefits and drawbacks.

The below section describes the preallocation attribute and how it can affect the size and performance of a virtual machine.

Please see this blog post for more information on preallocation, and then continue on to the results!

The below tests are all performed on the same hardware and on a single hard disk that’s on its own dedicated bus with no other traffic. The disk itself is a mechanical Western Digital Green 2TB. I’ve used a mechanical disk rather than an SSD so that the results are more dramatic and the effect of IO performance is easier to see. The tests all use the same 4GB virtual disk image; encryption is disabled, cluster_size is the default 65536 and lazy_refcounts are off unless otherwise specified.
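
Each test image can be created with qemu-img, varying only the preallocation option (the file names are illustrative):

    qemu-img create -f qcow2 -o preallocation=off disk-off.qcow2 4G
    qemu-img create -f qcow2 -o preallocation=metadata disk-metadata.qcow2 4G
    qemu-img create -f qcow2 -o preallocation=falloc disk-falloc.qcow2 4G
    qemu-img create -f qcow2 -o preallocation=full disk-full.qcow2 4G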

Virtual Disk Creation Time

The first example shows how long it takes to create each virtual disk image and how much physical disk space is being used/reserved for the image.

preallocation setting    Time to create    Physical size on disk
off                      0.312s            196K
metadata                 0.507s            844K
falloc                   0.015s            4.0G
full                     39.402s           4.0G

As you can see, the full preallocation setting takes a huge amount of time because the filesystem the image is being written to has to assign the full size of the file and write empty data to it (in our case around 4GB). The least time is taken by falloc, because qemu-img uses the underlying filesystem’s fallocate function to reserve the disk space without having to write data to consume the full size.
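
You can compare the physical (allocated) size of an image with its apparent size using du and ls (the file name is illustrative):

    du -h disk-falloc.qcow2     # physical space allocated on disk
    ls -lh disk-falloc.qcow2    # apparent file size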

You can download the bash script used for the above test: Disk Test preallocation Disk Size Script.

Virtual Disk Performance

The next thing to consider is the performance of each virtual disk type. For this test each virtual disk is mounted and written to using dd. The performance hit here comes when the virtual disk has to expand and allocate physical disk space for new data clusters and new metadata, with metadata creation being by far the biggest overhead.
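
A write test along these lines was used – the mount point and file size are illustrative, and conv=fdatasync makes dd flush the data to disk before reporting its timing:

    dd if=/dev/zero of=/mnt/virtual-disk/test.bin bs=1M count=128 conv=fdatasync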

preallocation setting    Time to write    MB/s
off                      184.23s          729kB/s
metadata                 85.87s           1.6MB/s
falloc                   100.77s          1.3MB/s
full                     84.31s           1.6MB/s

You can immediately see that virtual disks with no preallocation take by far the longest to write to, and virtual disks with full preallocation are the quickest. Interestingly, a preallocation value of metadata is a very close second to full, which indicates that much of the performance hit comes from assigning and managing metadata.

You can download the bash script used for the above test: Disk Test preallocation Write Performance.


Access a qcow2 Virtual Disk Image From The Host

A disk image, such as the popular qcow2 format, can be read and used as a file system without having to attach it to a running VM. That can be handy when you’ve got information on a backed-up virtual image and don’t want to turn on a whole VM in order to access some data held on it.

If you’re using a host such as Proxmox then you’ll already have everything installed, but if you’re on some other Debian-based system then you’ll need to install the qemu-utils package, which provides the qemu-nbd tool:
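
    apt-get install qemu-utils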

What we’re trying to achieve is a standard mount point on the host that we can access like any other mounted block device. As you can imagine, it’s a little trickier than just using a mount command with a file name, but not by much.

Make sure that you have the required kernel module, nbd, loaded:
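
    modprobe nbd max_part=8    # max_part exposes partition devices such as /dev/nbd0p1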

You should then find that you have plenty of objects in /dev starting with nbd:
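
    ls /dev/nbd*    # lists /dev/nbd0 through /dev/nbd15 by default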

Each of these devices is something you can attach a virtual image to; however, you can only attach one image per device, giving you a total of 16 images in use at any one time.

Attach a qcow2 Virtual Image File

To attach an image file to one of these devices run the below command, substituting the nbd0 device and /var/lib/vz/images/107/vm-107-disk-1.qcow2 with your own values.
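
    qemu-nbd -c /dev/nbd0 /var/lib/vz/images/107/vm-107-disk-1.qcow2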

The device /dev/nbd0 will now contain the virtual image file as a block device and any partitions or volumes on the virtual image will be available for mounting.

You can check the partitions available on the virtual disk using your favorite partitioning tool, gparted, fdisk, etc:
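
    fdisk -l /dev/nbd0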

Partitions are named slightly differently from what you may be used to. With a normally partitioned disk (with no LVM) you’d reference the first partition with /dev/nbd0p1. For example, using a mount command you might use the below (the mount point is an example):
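
    mount /dev/nbd0p1 /mnt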

If you use LVM on the virtual disk image then you won’t be able to mount the partition directly – you’ll need to use the vg suite of tools to detect the logical volumes. Run the two commands below, vgscan and vgchange, to detect and activate the logical volumes.
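
    vgscan
    vgchange -ay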

You can then use lvdisplay to find the path of your logical volume and mount it (the volume group and logical volume names below are examples):
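
    lvdisplay | grep "LV Path"
    mount /dev/vg-name/lv-name /mnt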

Detach a qcow2 Virtual Image File

Once you have finished with the virtual image file, you’ll want to detach it and release the nbd process used for IO operations for that image. Assuming that any mounts based on the image have been unmounted with umount, use the qemu-nbd command with the -d (disconnect) switch:
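
    qemu-nbd -d /dev/nbd0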

You can also remove the kernel module if you’ve detached all of your virtual images from their /dev/nbdX device:
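
    rmmod nbd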