qcow2 image format and cluster_size

There are various things you need to consider when creating a virtual disk for a virtual machine, such as the size of the disk, whether the disk is sparse or not, compression, and encryption.

Many of these things will depend on the type of load placed upon the disk, and the requirements that load has. For example, you can reduce the physical size of the disk if you use compression, and you can increase the security of your data if you use encryption.

Often the low-level details of virtual disks are overlooked and left at their default values, which may or may not be sensible for your scenario. One such detail is the qcow2 virtual disk property cluster_size.

A virtual disk, much like a physical disk as seen by an operating system, is split up into clusters; each cluster is a predefined size and holds a single unit of data. A cluster is the smallest amount of data that can be read or written in a single operation. An index lookup, often kept in memory, records what information is stored in each cluster and where that cluster is located.

qcow2 is a copy-on-write format (the 'cow' in qcow2), which means that if part of a cluster needs to be altered then the whole cluster is re-written, rather than just the changed bytes. This means that if you have a cluster size of 1024 bytes and you change 1 byte, then 1023 bytes have to be read from the original cluster and 1024 bytes have to be written back – that's an overhead of 1023 bytes read and 1023 bytes written above the 1 byte change you made. That overhead isn't too bad in the grand scheme, but imagine if you had a cluster size of 1MB and still only changed a single byte!

On the other hand, with much larger writes another problem appears. If you are constantly writing 1MB files and have a cluster size of 1024 bytes then you'll have to split each 1MB file into 1024 parts and store each part in its own cluster. Each time a cluster is written to, the metadata must be updated to reflect the newly stored data, so again there is a performance penalty in storing data this way. A more efficient way of writing 1MB files would be to have a cluster size of 1MB so that each file occupies a single cluster with only one metadata reference.

Testing qcow2 cluster_size

The table below shows how cluster_size affects the performance of a qcow2 virtual disk image. The tests are all performed on the same hardware, on a single hard disk that's on its own dedicated bus with no other traffic. The disk itself is a Samsung SSD. Each test uses the same 4GB virtual disk image provisioned with preallocation=full, encryption disabled and lazy_refcounts off.
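As a rough sketch of how such images could be created and exercised (the file name is a placeholder, and qemu-io is shown as just one convenient way to time a large sequential write – it isn't necessarily the exact method used for the figures below):

    # create a 4GB qcow2 image with a given cluster_size, fully preallocated
    time qemu-img create -f qcow2 \
        -o cluster_size=65536,preallocation=full,lazy_refcounts=off \
        test-64k.qcow2 4G

    # time a large sequential write (roughly the size of the 134MB test file)
    time qemu-io -f qcow2 -c 'write 0 128M' test-64k.qcow2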

Several qcow2 virtual disks have been created with varying cluster_size attributes and a single 134MB file has been written to each disk. The below table shows various statistics and timings resulting from each test.

cluster_size   Time to create disk   Time to write   MB/s
512            1m41.889s             2.69157s        49.9MB/s
1K             49.697s               2.30576s        58.2MB/s
64K            19.085s               1.69912s        79.0MB/s
2M             1.085s                1.46358s        91.7MB/s

The above table covers the smallest cluster_size of 512, the default of 64K (65536) and the largest possible of 2M.

As always, test and tune the parameters for your workload.


qcow2 Disk Images and Performance

qcow2 is a virtual disk image format developed by the guys who created QEMU and is one of the most versatile virtual disk formats available. It’s the default and preferred virtual disk format for the Proxmox VE hypervisor and should be used for most virtual machines.

qcow2 offers the following features:

  • Sparse space allocation which means that the entire virtual disk size doesn’t need to be allocated on the hard drive when it’s created. Only the physical space needed by actual data stored to the virtual disk is required.
  • Snapshots can be stored and rolled back to thanks to the copy-on-write process which is used to write to qcow2 files.
  • Linked or chained files can be used. For example, a read only base file could be used to hold ‘system’ files (a gold plate image, if you will), and any changes could be written to an additional file leaving the original intact and unchanged. Multiple machines could use this base file at once, therefore reducing space requirements.
  • AES encryption can be used to encrypt all data at rest.
  • Compression, based on zlib, to reduce physical space requirements and reduce read bytes.

Because of all these features, qcow2 files have a processing overhead when compared to raw files: any data read from or written to a qcow2 virtual disk has to go through a process that can slow the read or write operation. This overhead, compared to raw type storage, is something we have to consider when deciding which features to use.

Increase qcow2 Performance

Sparse Space Allocation

Anything stored on a virtual disk has to be, at some point, stored on a physical medium such as a hard disk. In addition to the data, a virtual disk has a small amount of metadata associated with it that is usually stored in the same file. For example, unlike a physical hard disk, a virtual disk has no inherent constraint on how large it can be, so the disk's size is one of the pieces of information that needs to be stored in the qcow2 file.

In addition to that, and just like a physical hard drive, data in a qcow2 file is stored in blocks or clusters and a lookup is required to determine what data is in which cluster. Think of this as a shelf full of numbered boxes, and having a book (or index) which tells you what each box number contains. All of this cluster information is also stored within the qcow2 file consuming disk space that is relative to the data capacity of the qcow2 file. For example, a qcow2 file that can store 1GB of data would have a much smaller metadata footprint than a qcow2 file that can store 100GB of data.

Anyway, back to sparse files. The idea of a sparse file is to remove the need to allocate the full size of the file on the physical disk. I can, for example, create a qcow2 image with a data capacity of 10GB that will take up just a few KBs of physical space until data is saved to it. As data is saved to the qcow2 image, the physical space used by the image will increase (the data has to be stored somewhere, right?). The metadata will grow too, because each new cluster required by the qcow2 file will have its own entry in the metadata section of the file.
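A minimal sketch of this behaviour (the file name is a placeholder and the exact figures will vary):

    # create a sparse 10GB qcow2 image
    qemu-img create -f qcow2 sparse-test.qcow2 10G

    # the virtual size is reported as 10G...
    qemu-img info sparse-test.qcow2

    # ...but the physical space consumed is only a few hundred KB
    du -h sparse-test.qcow2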

qemu-img comes with various options for setting the allocation when creating new disk images.

  • preallocation=metadata – allocates the space required by the metadata but doesn’t allocate any space for the data. This is the quickest to provision but the slowest for guest writes.
  • preallocation=falloc –  allocates space for the metadata and data but marks the blocks as unallocated. This will provision slower than metadata but quicker than full. Guest write performance will be much quicker than metadata and similar to full.
  • preallocation=full – allocates space for the metadata and data and will therefore consume all the physical space that you allocate (not sparse). All empty allocated space will be set as a zero. This is the slowest to provision and will give similar guest write performance to falloc.

Example command:
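(A sketch; the image name and size below are placeholders.)

    qemu-img create -f qcow2 -o preallocation=falloc vm-disk.qcow2 10G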

The performance impact here comes when the virtual image needs to grow in order to store new information written to it. For each new write a new cluster needs to be provisioned and a metadata index entry created referencing the new cluster. Depending on the option selected above, the OS may have to allocate a new sector for both the index and the data cluster, incurring a performance penalty. Once the disk has been fully expanded (e.g. with preallocation=full) there is no penalty in assigning a new cluster as all the clusters are already assigned and available.

See qcow2 preallocation for some examples and benchmarks of the above attributes.

Encryption

qcow2 images are not encrypted by default, so not using encryption couldn't be simpler. Of course, your data will not be encrypted (unless you use some other process on top of the virtual storage layer), but you'll save all those CPU cycles when reading and writing the data.

Compression

qcow2 is, at best, a bit weird when it comes to compression (encryption works the same way, too!) in that compression is a one-time event – a process that you run to compress an existing image. Any data written after this will be stored uncompressed.
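As a sketch, an existing image can be compressed by converting it with qemu-img, where the -c flag writes the destination with compressed clusters (the file names are placeholders):

    qemu-img convert -O qcow2 -c source.qcow2 compressed.qcow2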

The next thing is to understand compression itself – compression (under the right circumstances) will reduce the size of the data stored on disk at the expense of CPU to compress (one off) and decompress (every time the data is accessed) the data. In certain circumstances, compression can result in a quicker read for the process consuming the data, such as where CPU is abundant and IO bandwidth is very small.

As always, testing your scenarios is the best way to understand the impact.


Access a qcow2 Virtual Disk Image From The Host

A disk image, such as the popular qcow2 disk image, can be read and used as a file system without having to attach it to a running VM. That can be handy when you've got information on a backed-up virtual image and don't want to start a whole VM just to access some data held on it.

If you’re using a host such as Proxmox then you’ll already have everything installed, but if you’re on some other Debian based system then you’ll need to install the required package:

What we’re trying to achieve is a standard mount point on the host that we can access like we would any other mounted block device. As you can imagine, it’s a little more tricky than just using a mount command along with a file name, but not by much.

Make sure that you have the required kernel module, nbd, loaded:
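(Setting max_part is optional, but on some kernels it's required for partitions such as /dev/nbd0p1 to be exposed.)

    modprobe nbd max_part=8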

You should then find that you have plenty of objects in /dev starting with nbd:
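    ls /dev/nbd*
    # /dev/nbd0  /dev/nbd1  ...  /dev/nbd15 with the default module settings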

Each one of these devices is something you can attach a virtual image to; however, you can only attach one image per device, giving you a total of 16 images in use at any one time.

Attach a qcow2 Virtual Image File

To attach an image file to one of these devices run the below command, substituting the nbd0 device and /var/lib/vz/images/107/vm-107-disk-1.qcow2 with your own values.
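    qemu-nbd -c /dev/nbd0 /var/lib/vz/images/107/vm-107-disk-1.qcow2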

The device /dev/nbd0 will now contain the virtual image file as a block device and any partitions or volumes on the virtual image will be available for mounting.

You can check the partitions available on the virtual disk using your favorite partitioning tool, gparted, fdisk, etc:
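    fdisk -l /dev/nbd0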

Partitions are named slightly differently to what you may be used to. With a normal partitioned disk (with no LVM) you’d reference the first partition with /dev/nbd0p1. For example, using a mount command you might use the below:
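(The mount point /mnt/disk is just an example.)

    mkdir -p /mnt/disk
    mount /dev/nbd0p1 /mnt/disk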

If you use LVM on the virtual disk image then you won't be able to mount the partition directly – you'll need to use the LVM suite of tools to detect the logical volumes. Run the two commands below, vgscan and vgchange, to detect and activate them.
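    vgscan
    vgchange -ay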

You can then use lvdisplay to find your logical volume name and mount it.
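(The volume group and logical volume names below are placeholders; lvdisplay will show the real paths.)

    lvdisplay
    mount /dev/[vg-name]/[lv-name] /mnt/disk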

Detach a qcow2 Virtual Image File

Once you have finished with the virtual image file, you'll want to detach it and release the nbd process used for IO operations for that image. Assuming that any mounts based on the image have been umount'd, use the qemu-nbd command with the -d (disconnect) switch:
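    qemu-nbd -d /dev/nbd0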

You can also remove the kernel module if you’ve detached all of your virtual images from their /dev/nbdX device:
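    rmmod nbd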

Proxmox OpenVZ Container and KVM Startup and Shutdown Order

From time to time you will need to shut down and start up your Proxmox hardware node. There are many reasons for this, such as adding new hardware or applying new kernel updates.

Before you shut down the hardware node you must cleanly shut down all OpenVZ containers and all KVMs. To avoid having to do this manually, Proxmox will issue all running containers and VMs the shutdown command as part of the hardware node's shutdown procedure. Proxmox also gives us the option to specify which containers and VMs should be started when the hardware node is turned on. This means that the final part of the hardware node's startup sequence will be to start your containers and VMs. Depending on your environment, this may need to be done in a specific order. For example, you may wish your database VM or container to be started before your web application.

Enable automatic startup of containers and VMs

To enable the automatic startup of an OpenVZ container or VM you must specify the Start at boot attribute in the Proxmox web interface. Simply click on the Options tab of the container or VM and double click the Start at boot attribute.

Make sure the checkbox is ticked in the new dialogue box which pops up and click OK.

Your container or VM will now automatically be started when your hardware node next starts.

Startup Order for VMs

It is sometimes necessary to specify which containers and VMs startup first and how long the hardware node should wait until issuing the next startup command. This is easy for KVMs as this can be done through the Proxmox web interface.

Simply click on the Options tab of the container or VM and double click the Start/Shutdown order attribute.

You can specify the following attributes for the KVM:

  • Start/Shutdown order – this is the order the VMs will be started in. For example, setting this to 1 means this VM will be instructed to start first. You can specify the same number for multiple VMs, which means that all VMs with the same number will be instructed to start at the same time. The reverse order is used when the machines are automatically shut down.
  • Startup delay – this is the time in seconds which Proxmox will wait before moving on to the next priority. If you set a VM with a Start/Shutdown order of 1 and a Startup delay of 30, Proxmox will wait 30 seconds before instructing the VM with a Start/Shutdown order of 2 to start.
  • Shutdown timeout – when a VM is asked to shut down, an ACPI shutdown request is sent to the VM which should initiate the guest's shutdown procedure. If ACPI requests are not supported by the guest, or an exception occurs during shutdown, the process may not complete. In this case Proxmox will wait for the Shutdown timeout threshold to pass before forcefully terminating the VM. If no value is specified then the defaults are used, which are 60 seconds for containers and 180 seconds for VMs.

Any VMs which have the Start at boot attribute enabled but no Start/Shutdown order attribute will be started after all VMs with a Start/Shutdown order attribute set have been processed.

Startup Order for OpenVZ Containers

Unfortunately Proxmox has not provided this functionality in the web based GUI for OpenVZ containers. To specify the order for a container we need to use the command line on the Proxmox hardware node.

Log in to your Proxmox hardware node as the root user and issue the below command to set the container startup order. You will need to change [PRIORITY] to the priority value to use and [VMID] to the ID of the container.
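A sketch using vzctl's --bootorder option ([VMID] and [PRIORITY] are the placeholders described above):

    vzctl set [VMID] --bootorder [PRIORITY] --save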

You can use vzlist if you do not know the [VMID] of the container.
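    vzlist -a
    # lists all containers, running and stopped, with their IDs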

You must have already enabled Start at boot for the container to start up. Any containers which have the Start at boot attribute enabled but no Start/Shutdown order attribute will be started after all containers with a Start/Shutdown order attribute set have been processed.


How much disk space does Proxmox 2.x require?

Category: How-to

Proxmox can be installed on very little disk space. I tested Proxmox version 2.2 on a 10GB hard disk and it installed without an issue. On an earlier test, a 2GB hard disk was not sufficient and the installer showed an error.

The official webpage for Proxmox hardware requirements does not state a required or recommended size. I would suggest that more is better because each VM or OpenVZ container will be stored on your hard disk. If you have multiple hard disks, you should always put your OpenVZ containers and virtual disks on the fastest one. A common setup with just two disks is detailed below.

Disk A
  • Proxmox install
  • ISO images
  • OpenVZ templates
  • Backups
Disk B
  • OpenVZ containers
  • Virtual disks

This setup should reduce the time it takes to backup and create your OpenVZ containers and virtual machines as the reads and writes are on different disks. In addition it keeps your backups on a separate disk to the OpenVZ containers and virtual disks so if one disk fails, you will still have a copy.

For a test setup, two 250GB disks would be ample and enable you to try Proxmox and create multiple guests. For a production setup, the size of the disks would depend on the workload and volume of guests required. One thing is for sure though, faster is better. With SSDs you will get the best performance with near-zero seek times, however they cost the most per gigabyte. For mechanical drives, use 15k RPM disks in RAID 0 for the best speed, or RAID 10 for a balance of speed and redundancy.


Proxmox 2.2 is now available

Category: Tech News

A new version of Proxmox is now available to download. If you are new to Proxmox you can download the 2.2 version ISO from http://www.proxmox.com/downloads/proxmox-ve/iso-images/132-proxmox-ve-2 or you can simply upgrade your existing Proxmox 2.1 install.

How to upgrade from Proxmox 2.x

Before upgrading to Proxmox 2.2, make sure all your VMs have been stopped. Run the below commands on each server in your cluster.
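A sketch of a typical in-place upgrade using the standard Debian package tools (assuming the Proxmox 2.x repositories are already configured):

    apt-get update
    apt-get dist-upgrade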

Restart all Proxmox servers to complete the installation.

New features of Proxmox 2.2 include:

  • Support for up to 32 KVM network adapters
  • Live snapshot support for KVM images
  • Updated qemu-kvm to 1.2.0
  • New PVE Kernel 2.6.32-042
  • New option for SCSI controller in KVM

You can see the full change log at http://pve.proxmox.com/wiki/Roadmap#Proxmox_VE_2.2

Take a look at the below screenshots for some of the visible changes.

SCSI hardware selection:

Snapshot manager