qcow2 image format and cluster_size

There are various things you need to consider when creating a virtual disk for a virtual machine, such as the size of the disk, whether the disk is sparse, compression, and encryption.

Many of these things will depend on the type of load placed upon the disk, and the requirements that load has. For example, you can reduce the physical size of the disk if you use compression, and you can increase the security of your data if you use encryption.

Often the low level details of virtual disks are overlooked and left with the default values which may or may not be sensible for your scenario. One such detail is the qcow2 virtual disk property cluster_size.

A virtual disk, much like a physical disk as seen by an operating system, is split up into clusters, each of a predefined size and holding a single unit of data. A cluster is the smallest amount of data that can be read or written in a single operation. An index, often kept in memory, records what information is stored in each cluster and where that cluster is located.

qcow2 is a copy-on-write image format (the ‘cow’ in qcow2), which means that if a block of data needs to be altered then the whole block is rewritten, rather than just the changed data. If you have a block size of 1024 bytes and you change 1 byte, then 1023 bytes have to be read from the original block and 1024 bytes have to be written: an overhead of 1023 bytes read and written on top of the 1 byte you changed. That overhead isn’t too bad in the grand scheme, but imagine if you had a block size of 1MB and still only changed a single byte!
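The arithmetic above can be sketched in a few lines of shell. The function name cow_overhead is made up for illustration, and the model is deliberately simple: it assumes the change fits inside one cluster and that the whole cluster is rewritten.

```shell
#!/bin/sh
# cow_overhead CLUSTER_BYTES CHANGED_BYTES
# Prints how many unchanged bytes have to be copied along when
# CHANGED_BYTES are modified and the whole cluster is rewritten.
# Simplified model: the change sits inside a single cluster.
cow_overhead() {
    cluster=$1
    changed=$2
    echo $(( cluster - changed ))
}

cow_overhead 1024 1       # 1024-byte cluster, 1 byte changed
cow_overhead 1048576 1    # 1MB cluster, 1 byte changed
```

The first call prints 1023 and the second 1048575, which is the "imagine a 1MB block" case from the paragraph above.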

On the other hand, much larger writes expose another problem. If you are constantly writing 1MB files and have a block size of 1024 bytes, then each 1MB file has to be split into 1024 parts, with each part stored in its own cluster. Each time a cluster is written, the metadata must be updated to reflect the newly stored data, so there is a performance penalty in storing data this way too. A more efficient way of writing 1MB files would be a cluster size of 1MB, so that each file occupies a single cluster with only one metadata reference.
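To put numbers on that, here is a small sketch that counts how many clusters a write touches. The function name clusters_needed is hypothetical, and it assumes the write starts exactly on a cluster boundary.

```shell
#!/bin/sh
# clusters_needed WRITE_BYTES CLUSTER_BYTES
# Number of clusters (and therefore metadata references) needed to hold
# a write of WRITE_BYTES, rounding up to whole clusters.
# Assumption: the write begins on a cluster boundary.
clusters_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

clusters_needed 1048576 1024       # 1MB write into 1024-byte clusters
clusters_needed 1048576 1048576    # 1MB write into 1MB clusters
```

The first call prints 1024 and the second prints 1, matching the two cases described above.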

Testing qcow2 cluster_size

The table below shows how cluster_size affects the performance of a qcow2 virtual disk image. The tests were all performed on the same hardware, on a single disk that’s on its own dedicated bus with no other traffic. The disk itself is a Samsung SSD. Every test uses a 4GB virtual disk image provisioned with preallocation set to full, with encryption disabled and lazy_refcounts off.

Several qcow2 virtual disks have been created with varying cluster_size attributes and a single 134MB file has been written to each disk. The below table shows various statistics and timings resulting from each test.

cluster_size   Time to create disk   Time to write   Write speed
512            1m41.889s             2.69157s        49.9 MB/s
1K             49.697s               2.30576s        58.2 MB/s
64K            19.085s               1.69912s        79.0 MB/s
2M             1.085s                1.46358s        91.7 MB/s


The above table covers the smallest cluster_size of 512, the default of 64K (65536) and the largest possible of 2M.
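If you want to repeat the test, the images can be created along these lines. This is only a sketch based on the settings described above: the file names are made up, and by default it just prints the commands (set DRY_RUN=0 to really run them, which needs qemu-img and roughly 16GB of free space).

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# create_image CLUSTER_SIZE
# Creates a fully preallocated 4G qcow2 image with the given cluster size.
# The test-<size>.qcow2 naming is an assumption for this example.
create_image() {
    run qemu-img create -f qcow2 \
        -o "cluster_size=$1,preallocation=full,lazy_refcounts=off" \
        "test-$1.qcow2" 4G
}

for cs in 512 1K 64K 2M; do
    create_image "$cs"
done
```

Running each qemu-img command under the shell’s time keyword is one way to get creation timings like those in the table.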


As always, test and tune the parameters for your workload.

qcow2 Physical Size With Different preallocation Settings

The qcow2 image format is the de facto image format for KVM/QEMU virtual machines. The format provides various parameters that can be configured when creating the image, each with their benefits and drawbacks.

The section below describes the preallocation attribute and how it can affect the size and performance of a virtual machine.

Please see this blog post for more information on preallocation, and then continue on to the results!

The tests below were all performed on the same hardware, on a single hard disk that’s on its own dedicated bus with no other traffic. The disk itself is a mechanical Western Digital Green 2TB. I’ve used this rather than an SSD so that the results are more dramatic and show how much difference IO performance makes. The tests all use a 4GB virtual disk image; encryption is disabled, cluster_size is the default 65536 and lazy_refcounts is off unless otherwise specified.

Virtual Disk Creation Time

The first example shows how long it takes to create each virtual disk image and how much physical disk space is used or reserved for the image.

preallocation setting   Time to create   Physical size on disk
off                     0.312s           196K
metadata                0.507s           844K
falloc                  0.015s           4.0G
full                    39.402s          4.0G

As you can see, the full allocation setting takes a huge amount of time because the whole file has to be allocated on the underlying filesystem and filled with empty data (in our case around 4GB). The least time is taken by falloc, because qemu-img uses the underlying filesystem’s fallocate function to reserve the disk space without having to write any data.
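This is not the script used for the table above, just a minimal sketch of how the two sizes could be compared. The helper names and the prealloc-<mode>.qcow2 file names are made up; compare_prealloc is defined but not called, because it needs qemu-img and around 8GB of free space.

```shell
#!/bin/sh
# apparent_bytes FILE: the size the file reports (what ls -l shows).
apparent_bytes() {
    wc -c < "$1" | tr -d ' '
}

# physical_kb FILE: kilobytes actually allocated on the host filesystem.
physical_kb() {
    du -k "$1" | cut -f1
}

# compare_prealloc: create one 4G image per preallocation mode and
# report both sizes for each. Call it by hand when you want to run it.
compare_prealloc() {
    for mode in off metadata falloc full; do
        img="prealloc-$mode.qcow2"
        qemu-img create -f qcow2 -o "preallocation=$mode" "$img" 4G >/dev/null || return 1
        echo "$mode: apparent=$(apparent_bytes "$img") bytes, physical=$(physical_kb "$img") KB"
    done
}
```

For the off and metadata modes you would expect the physical figure to be far smaller than the apparent one, and for falloc and full the two should be close.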

You can download the bash script used for the above test Disk Test preallocation Disk Size Script.

Virtual Disk Performance

The next thing to consider is the performance of each virtual disk type. For this test each virtual disk is mounted and written to using dd. The performance hit here is when the virtual disk has to expand and allocate physical disk space for new data clusters and new metadata, with metadata creation being by far the biggest overhead.

preallocation setting   Time to write   Write speed
off                     184.23s         729 kB/s
metadata                85.87s          1.6 MB/s
falloc                  100.77s         1.3 MB/s
full                    84.31s          1.6 MB/s

You can immediately see that virtual disks with no preallocation take by far the longest to write to, and virtual disks with full preallocation are the quickest. Interestingly, a preallocation value of metadata is a very close second to full, which indicates that much of the performance hit is down to assigning and managing metadata.
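The speed column is simply bytes written divided by seconds taken. A quick sketch of that calculation follows; note the assumption that the 134MB test file is 134217728 bytes (128 x 1MiB, which dd reports as 134 MB), which is a guess on my part rather than something stated in the test description. The function name is made up.

```shell
#!/bin/sh
# throughput_mb_s BYTES SECONDS
# Prints the write speed in MB/s (1 MB = 1000000 bytes), one decimal place.
throughput_mb_s() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1000000 }'
}

# Assumed file size: 134217728 bytes (see the note above).
throughput_mb_s 134217728 85.87     # metadata
throughput_mb_s 134217728 100.77    # falloc
throughput_mb_s 134217728 84.31     # full
```

With that assumed file size, the three calls print 1.6, 1.3 and 1.6, in line with the table.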

You can download the bash script used for the above test Disk Test preallocation Write Performance.



Convert Virtual Disk Image VMWare VMDK to VirtualBox VDI

Oracle’s VirtualBox can use a few different virtual disk types, however its own disk type is VDI (VirtualBox Disk Image). It’s not one of the most widely used formats, so if you’ve downloaded a VM it’s unlikely that its disk is in VDI format.

Luckily with the tools from VirtualBox you’re able to move virtual disks between VMDK and VDI formats.

The VBoxManage command is the Swiss Army knife for managing all things VirtualBox. It needs to be in your path before you can use it; the easiest way around that is to run it from your VirtualBox installation directory. On Windows, open a new Command Prompt and navigate to your VirtualBox installation directory:

cd c:\Program Files\Oracle\VirtualBox\

Now run the VBoxManage command with the clonehd switch to create a copy of your VMDK in the VDI format.

Before you start, make sure you remove any snapshots on the source disk, and ensure it’s not attached to a running virtual machine.

Run the below command and substitute your input and output virtual disk image paths:

VBoxManage clonehd --format VDI server1-disk1.vmdk c:\vms\server2\server2-disk1.vdi

In the above example, change server1-disk1.vmdk to your input VMDK disk and c:\vms\server2\server2-disk1.vdi to the path where you’d like to store the output VDI.
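On a Linux or macOS host the same conversion can be wrapped in a small shell function. This is a sketch only: the helper names are made up, it assumes VBoxManage is on your path, and by default it just prints the command (set DRY_RUN=0 to run it).

```shell
#!/bin/sh
# vdi_name VMDK_PATH: suggest an output name by swapping the extension.
vdi_name() {
    echo "${1%.vmdk}.vdi"
}

# convert_to_vdi VMDK_PATH
# Prints the clone command, or runs it when DRY_RUN=0.
convert_to_vdi() {
    src=$1
    dst=$(vdi_name "$src")
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "VBoxManage clonehd --format VDI $src $dst"
    else
        VBoxManage clonehd --format VDI "$src" "$dst"
    fi
}

convert_to_vdi server1-disk1.vmdk
```

The output file lands next to the source disk here; pass a different path to VBoxManage directly if you want it elsewhere.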

qcow2 Disk Images and Performance

qcow2 is a virtual disk image format developed by the guys who created QEMU and is one of the most versatile virtual disk formats available. It’s the default and preferred virtual disk format for the Proxmox VE hypervisor and should be used for most virtual machines.

qcow2 offers the following features:

  • Sparse space allocation which means that the entire virtual disk size doesn’t need to be allocated on the hard drive when it’s created. Only the physical space needed by actual data stored to the virtual disk is required.
  • Snapshots can be stored and rolled back to thanks to the copy-on-write process which is used to write to qcow2 files.
  • Linked or chained files can be used. For example, a read only base file could be used to hold ‘system’ files (a gold plate image, if you will), and any changes could be written to an additional file leaving the original intact and unchanged. Multiple machines could use this base file at once, therefore reducing space requirements.
  • AES encryption can be used to encrypt all data at rest.
  • Compression, based on zlib, to reduce physical space requirements and reduce read bytes.

Because of all these features, qcow2 files carry a processing overhead compared to raw files: any data read from or written to a qcow2 virtual disk has to go through extra steps that can slow the operation down. We have to consider this IO overhead when deciding which features to use.

Increase qcow2 Performance

Sparse Space Allocation

Anything stored on a virtual disk has to be, at some point, stored on a physical medium such as a hard disk. In addition to the data, a virtual disk has a small amount of metadata associated with it that is usually stored in the same file. For example, unlike a hard disk, a virtual disk has no physical constraint on how large it can be, so its size is one of the bits of data we need to store in the qcow2 file.

In addition to that, and just like a physical hard drive, data in a qcow2 file is stored in blocks or clusters and a lookup is required to determine what data is in which cluster. Think of this as a shelf full of numbered boxes, and having a book (or index) which tells you what each box number contains. All of this cluster information is also stored within the qcow2 file consuming disk space that is relative to the data capacity of the qcow2 file. For example, a qcow2 file that can store 1GB of data would have a much smaller metadata footprint than a qcow2 file that can store 100GB of data.


Anyway, back to sparse files. The idea of a sparse file is to remove the need to allocate the full size of the file on a physical disk. I can, for example, create a qcow2 image with a data capacity of 10GB that takes up just a few KB of physical space until data is saved to it. As data is saved to the qcow2 image, the physical space used by the image increases (the data has to be stored somewhere, right?). The metadata grows as well, because each new cluster required by the qcow2 file has its own entry in the metadata section of the file.
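You can see the same effect with an ordinary sparse file, no qcow2 involved. The sketch below assumes a Linux host with GNU coreutils (for truncate) and a filesystem that supports sparse files; it works on a temporary file and cleans up after itself.

```shell
#!/bin/sh
# Create a 10MB sparse file: the size is recorded but no data is written.
f=$(mktemp)
truncate -s 10M "$f"

apparent=$(wc -c < "$f" | tr -d ' ')    # size the file reports, in bytes
physical=$(du -k "$f" | cut -f1)        # KB actually allocated on disk

echo "apparent size: $apparent bytes"
echo "physical size: $physical KB"

# Write 64KB of real data without truncating the file, then look again.
dd if=/dev/urandom of="$f" bs=1024 count=64 conv=notrunc 2>/dev/null
sync
echo "physical size after writing 64KB: $(du -k "$f" | cut -f1) KB"

rm -f "$f"
```

The apparent size stays at 10MB throughout, while the physical size starts near zero and only grows as data is written, which is the behaviour a sparse qcow2 image relies on.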

qemu-img comes with various options for setting the allocation when creating new disk images.

  • preallocation=metadata – allocates the space required by the metadata but doesn’t allocate any space for the data. It provisions quickly, and in the preallocation benchmarks on this page its guest write speed was close to full.
  • preallocation=falloc – allocates space for the metadata and data but marks the blocks as unallocated. It provisions almost instantly because no data is written. In the same benchmarks, guest writes were somewhat slower than with metadata or full.
  • preallocation=full – allocates space for the metadata and data and will therefore consume all the physical space that you allocate (not sparse). All empty allocated space is written as zeros. This is by far the slowest to provision and gave the best guest write performance in the same benchmarks.

Example command:

qemu-img create -f qcow2 -o preallocation=falloc image.qcow2 1G

The performance impact comes when the virtual image needs to grow in order to store new information written to it. For each write to a new area, a new cluster has to be provisioned along with a metadata index entry referencing it. Depending on the option selected above, the OS may have to allocate new space for both the index and the data cluster, incurring a performance penalty. Once the disk has been fully expanded (e.g. with preallocation=full) there is no penalty for assigning a new cluster, as all the clusters are already assigned and available.

See qcow2 preallocation for some examples and benchmarks of the above attributes.


Encryption

qcow2 images are not encrypted by default, so not using encryption couldn’t be simpler. Of course, your data will not be encrypted (unless you use some other process on top of the virtual storage layer) but you’ll save all those CPU cycles when reading and writing the data.


Compression

qcow2 is, at best, a bit weird when it comes to compression, in that compression is a one-time event: a process that you run to compress an existing image. Any data written after this will be stored uncompressed.

The next thing is to understand compression itself – compression (under the right circumstances) will reduce the size of the data stored on disk at the expense of CPU to compress (one off) and decompress (every time the data is accessed) the data. In certain circumstances, compression can result in a quicker read for the process consuming the data, such as where CPU is abundant and IO bandwidth is very small.
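As a sketch of what that one-off process looks like, qemu-img convert accepts a -c option that writes a compressed copy of an image. The file names below are placeholders, and by default the script only prints the command (set DRY_RUN=0 to run it, with the VM shut down).

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) prints the command instead of running it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# compress_image SRC DST
# Writes a compressed qcow2 copy of SRC to DST. SRC is left untouched.
compress_image() {
    run qemu-img convert -c -O qcow2 "$1" "$2"
}

compress_image image.qcow2 image-compressed.qcow2
```

Anything the guest writes to the new image afterwards is stored uncompressed, so the conversion has to be repeated to compress it again.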

As always, testing your scenarios is the best way to understand the impact.

Access a qcow2 Virtual Disk Image From The Host

A disk image, such as the popular qcow2 disk image can be read and used as a file system without having to attach it to a running VM. That can be handy when you’ve got information on a backed up virtual image and don’t want to turn on a whole VM in order to access some data held on it.

If you’re using a host such as Proxmox then you’ll already have everything installed, but if you’re on some other Debian based system then you’ll need to install the required package:

apt-get install qemu-utils

What we’re trying to achieve is a standard mount point on the host that we can access like we would any other mounted block device. As you can imagine, it’s a little more tricky than just using a mount command along with a file name, but not by much.

Make sure that you have the required kernel module, nbd, loaded:

modprobe nbd

You should then find that you have plenty of objects in /dev starting with nbd:

ls /dev/nbd*
/dev/nbd0  /dev/nbd10  /dev/nbd12  /dev/nbd14  /dev/nbd2  /dev/nbd4  /dev/nbd6  /dev/nbd8
/dev/nbd1  /dev/nbd11  /dev/nbd13  /dev/nbd15  /dev/nbd3  /dev/nbd5  /dev/nbd7  /dev/nbd9

Each one of these devices is something you can attach a virtual image to. You can only attach one image per device, giving you a total of 16 images that you can use at any one time.
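If you attach images often, it helps to find a device that isn’t already in use. The sketch below relies on the assumption that an unattached nbd device reports a size of 0 in /sys/block/nbdN/size; the function name is made up, and the directory argument exists only so the function can be tried against a fake tree.

```shell
#!/bin/sh
# first_free_nbd [SYSFS_BLOCK_DIR]
# Prints the first /dev/nbdN whose sysfs size is 0 (nothing attached).
# Returns non-zero if no such device is found.
first_free_nbd() {
    sysdir=${1:-/sys/block}
    n=0
    while [ -e "$sysdir/nbd$n/size" ]; do
        if [ "$(cat "$sysdir/nbd$n/size")" = "0" ]; then
            echo "/dev/nbd$n"
            return 0
        fi
        n=$(( n + 1 ))
    done
    return 1
}

first_free_nbd || echo "no free nbd device found (is the nbd module loaded?)"
```

There is a small race here: another process could grab the device between the check and your qemu-nbd call, so treat the result as a hint.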

Attach a qcow2 Virtual Image File

To attach an image file to one of these devices run the below command, substituting the nbd0 device and /var/lib/vz/images/107/vm-107-disk-1.qcow2 with your own values.

qemu-nbd -c /dev/nbd0 /var/lib/vz/images/107/vm-107-disk-1.qcow2

The device /dev/nbd0 will now contain the virtual image file as a block device and any partitions or volumes on the virtual image will be available for mounting.

You can check the partitions available on the virtual disk using your favorite partitioning tool, gparted, fdisk, etc:

partx -l /dev/nbd0

Partitions are named slightly differently to what you may be used to. With a normal partitioned disk (with no LVM) you’d reference the first partition with /dev/nbd0p1. For example, using a mount command you might use the below:

mount /dev/nbd0p1 /mnt/mntpoint

If you use LVM on the virtual disk image then you won’t be able to mount the partition directly. You’ll need to use the LVM tools to detect the logical volumes. Run vgscan and then vgchange, as below, to detect and activate them.

vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "pve" using metadata type lvm2
vgchange -ay
   3 logical volume(s) in volume group "pve" now active

You can then use lvdisplay to find your logical volume path and mount it.

lvdisplay
  --- Logical volume ---
  LV Path                /dev/pve/myvolume
  LV Name                myvolume
  VG Name                pve
  LV UUID                jgok7M-c9x1-dTdt-PXXh-8NXf-BzgG-aRsaY7
  LV Write Access        read/write
  LV Creation host, time proxmox, 2015-04-06 20:28:28 +0100
  LV Status              available
  # open                 1
  LV Size                20.00 GiB
  Current LE             5120
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:0

mount /dev/pve/myvolume /mnt/mntpoint

Detach a qcow2 Virtual Image File

Once you have finished with the virtual image file, you’ll want to detach it and release the nbd process used for IO operations for that image. Assuming that any mounts based on the image have been unmounted, use the qemu-nbd command with the -d switch:

qemu-nbd -d /dev/nbd0

You can also remove the kernel module if you’ve detached all of your virtual images from their /dev/nbdX device:

rmmod nbd
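Putting the attach and detach steps together, the whole cycle for a plain (non-LVM) partition might look like the sketch below. The function name is made up, and the -r (read-only) option to qemu-nbd and the ro mount option are my additions for safety when browsing a backup. By default it only prints the commands; set DRY_RUN=0 and run as root to execute them.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# browse_image IMAGE NBD_DEVICE PARTITION_NUMBER MOUNT_POINT
browse_image() {
    image=$1
    dev=$2
    part=$3
    mnt=$4
    run modprobe nbd
    run qemu-nbd -r -c "$dev" "$image"          # attach read-only
    run mount -o ro "${dev}p${part}" "$mnt"     # mount the chosen partition
    # ... copy whatever you need out of $mnt here ...
    run umount "$mnt"
    run qemu-nbd -d "$dev"                      # detach again
}

browse_image /var/lib/vz/images/107/vm-107-disk-1.qcow2 /dev/nbd0 1 /mnt/mntpoint
```

Some filesystems refuse a read-only mount if they were not cleanly shut down, in which case you may need to drop the read-only options and work on a copy of the image instead.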


Reclaim disk space from a sparse image file (qcow2/vmdk)

Sparse disk image formats such as qcow2 only consume the physical disk space which they need. For example, if a guest is given a qcow2 image with a size of 100GB but has only written to 10GB then only 10GB of physical disk space will be used. There is some slight overhead associated, so the above example may not be strictly true, but you get the idea.

Sparse disk image files allow you to over allocate virtual disk space – this means that you could allocate 5 virtual machines 100GB of disk space, even if you only have 300GB of physical disk space. If all the guests need 100% of their 100GB disk space then you will have a problem. If you use over allocation of disk space you will need to monitor the physical disk usage very carefully.
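The example above works out as follows. This is just the arithmetic, with a made-up function name, using integer maths so the result is rounded down.

```shell
#!/bin/sh
# overcommit_percent VM_COUNT GB_PER_VM PHYSICAL_GB
# Prints allocated virtual disk space as a percentage of physical space.
# Anything over 100 means the guests could, between them, ask for more
# space than the host has.
overcommit_percent() {
    echo $(( $1 * $2 * 100 / $3 ))
}

overcommit_percent 5 100 300    # five 100GB guests on 300GB of disk
```

This prints 166, meaning the guests have been promised about two thirds more space than physically exists.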

There is another problem with sparse disk formats: they don’t automatically shrink. Let’s say you fill 100GB of a sparse disk (we know this will roughly consume 100GB of physical disk space) and then delete some files so that you are only using 50GB. The physical disk space used should be 50GB, right? Wrong. Because the disk image doesn’t shrink, it will stay at 100GB on the file system even if the guest is now using less. The steps below detail how to get round this issue.

On Linux

We need to fill the free space on the guest’s disk with zeros so that the unused space can be dropped when the disk image is converted later.

In a terminal, run the below command until you run out of disk space. Before running this, be sure to stop any applications running on the guest otherwise errors may result.

dd if=/dev/zero of=/mytempfile

Once the command errors out (this may take a while depending on your disk image size and physical disk speed) delete the file.

rm -f /mytempfile

Shut down the guest and follow the steps below under All OS’s.

On Windows

You will need to download a tool called sdelete from Microsoft, which will fill the free space on the disk with zeros so that it can be reclaimed later.

Download: http://technet.microsoft.com/en-gb/sysinternals/bb897443.aspx

Once you have downloaded and extracted sdelete, open up a command prompt and enter the following. This assumes that sdelete was extracted into c:\ and c:\ is the disk you would like to use to reclaim space

c:\sdelete.exe -z c:

Once this completes (this may take a while depending on your disk image size and physical disk speed), shut down the guest and follow the steps below under All OS’s.

All OS’s

The rest of the process is done on the host so open up a terminal window and SSH to your Proxmox host. Move to the directory where the disk image is stored and run the below commands.

Make sure you have shut down the virtual machine which is using the qcow2 image file before running the below commands.

mv original_image.qcow2 original_image.qcow2_backup
qemu-img convert -O qcow2 original_image.qcow2_backup original_image.qcow2

The above commands move the original image file and then convert it into a new image under its original name. Zeroed areas are not copied into the new image, so the new qcow2 file consumes less physical disk space.
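The same two commands can be wrapped with a couple of sanity checks. This is a sketch with a made-up function name: the qemu-img check and du steps are my additions, and by default it only prints the commands (set DRY_RUN=0 to run them, with the VM shut down).

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# shrink_image IMAGE
# Keeps the original as IMAGE_backup and writes a freshly converted copy
# under the original name, then checks it and shows both sizes.
shrink_image() {
    img=$1
    backup="${img}_backup"
    run mv "$img" "$backup" || return 1
    run qemu-img convert -O qcow2 "$backup" "$img" || return 1
    run qemu-img check "$img"
    run du -h "$backup" "$img"
}

shrink_image original_image.qcow2
```

Make sure the host has enough free space for both files, since the backup and the new image exist side by side until you delete the backup.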

You can now start the guest and check that everything is in working order. If it is, you can remove the original_image.qcow2_backup file.
