Virtualization on Linux using the KVM/QEMU/Libvirt stack
Introduction
Virtualization is a great learning tool that provides a risk-free environment to experiment with a multitude of topics. Among other things, it allows one to experiment with different OSes without requiring new hardware. Additionally, in the case of Linux, one can build their own custom kernel and test it in depth. Finally, virtual networking can also be a great tool to learn about different networking protocols without the need for expensive networking equipment. I have known about these learning opportunities for quite some time, but had never gotten into virtualization before. This was due to multiple reasons, chief among them not having hardware performant enough for the task. This situation has recently changed, and as I have detailed in my computer build series, my current machine provides me with all the horsepower I will need for such an endeavor.
Virtualization under Windows seems to be an easy task, with a lot of the details being hidden away from the user and handled by very complete software suites (e.g. Microsoft’s own Hyper-V). On Linux, things are also more or less straightforward, and there is a multitude of tutorials online which explain in simple steps the few software components required for a complete virtualization setup. However, one thing I found to be lacking in most tutorials was a simple explanation of why the different listed software components are required, and how they interact.
In this article I would like to cover some basic notions about one of the most common virtualization stacks under Linux. This stack is usually broken into three main layers/components: the hypervisor, the emulator and virtual machine management. In the following, we will highlight the different functions of each of these components, and explain how they interact with each other.
KVM: The hypervisor
What is a hypervisor?
Figure 1: The hypervisor in the virtualization stack
The first component of the virtualization stack we will be looking into is the hypervisor. A hypervisor is a piece of software which allows multiple virtualized operating systems, referred to as virtual machines or VMs, to run concurrently on a computer. As illustrated in Figure 1 [1], the hypervisor sits between the virtual machines and the actual physical machine, and manages system calls to the hardware.
Depending on their architecture, hypervisors can be one of two types:
- Type 1: So-called “bare metal” hypervisors sit directly above the hardware. Every operating system running on the system will need to request access to the hardware from the hypervisor. In such a setup, the different virtual systems are run within domains (or doms). The first domain (dom0) is started by the hypervisor right after boot, as it is used to manage the hypervisor itself, and consequently the other systems/domains. For this reason, dom0 is sometimes referred to as the Management Domain [2].
- Type 2: Also known as “hosted” hypervisors, these are designated as such because they run within another operating system which itself has direct control over the hardware. Hypervisors of this type share the resources of the host system with the OS they run on top of. This is in contrast to hypervisors of the previous type, which have full control over the hardware.
Figure 2 [3] showcases the difference between the two hypervisor types.
Figure 2: Hypervisors: Type-1 (left) vs Type-2 (right)
In the following, we will learn about the software component which most commonly assumes the role of the hypervisor in the Linux virtualization stack. However, before we can do that, we need to learn about an important technology which is the main enabler of efficient virtualization on modern CPUs.
Hardware-assisted virtualization
During virtualization, when a guest system is executed, some of the instructions it issues can change the state of the host system. These are referred to as sensitive instructions [1]. It is critical for the CPU to be able to detect such instructions in order to instruct the hypervisor to take over, so as to avoid conflicting states with the host system or other possible guest systems. During the design of the x86 platform, virtualization was not a major consideration [4]. Due to their design, some of the sensitive instructions that could be encountered during virtualization would go undetected by x86 CPUs.
Hardware-assisted virtualization changed this by adding a new CPU execution mode known as guest mode to complement the previously existing user and kernel modes. This mode is used to execute the code of guest systems. Additionally, new CPU instructions that enable switching from and to this mode on the detection of sensitive instructions have been introduced [1]. When a sensitive instruction is detected, the CPU notifies the hypervisor, which in turn exits guest mode and handles this instruction (in kernel mode). Once the instruction handling is done, a switch back to guest mode is made and control is given back to the guest system. In comparison to its software-based counterpart, this hardware-based virtualization comes with great performance advantages. This is due to the detection/patching/execution of sensitive instructions now being handled completely in hardware, instead of being managed by software as was previously the case [4].
Hardware-assisted virtualization has been part of all Intel and AMD processors produced after 2006 [5]. On Intel CPUs it goes by the name Intel VT-x, whereas it is known as AMD-V on AMD processors.
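As a quick illustration, on Linux the presence of one of these extensions can be read from the flags line of /proc/cpuinfo, where vmx indicates Intel VT-x and svm indicates AMD-V. The small C sketch below simply scans that file for those flags (the usual shell equivalent is grepping /proc/cpuinfo); note that it only checks that the CPU reports the feature, not that virtualization is actually enabled in the firmware.

```c
/* check_hvm.c - report whether /proc/cpuinfo advertises vmx (Intel) or svm (AMD) */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[4096];
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (!f) {
        perror("/proc/cpuinfo");
        return 1;
    }

    while (fgets(line, sizeof(line), f)) {
        /* the CPU feature flags are listed on the "flags" line */
        if (strncmp(line, "flags", 5) != 0)
            continue;
        if (strstr(line, " vmx"))
            puts("Intel VT-x (vmx) reported by the CPU");
        else if (strstr(line, " svm"))
            puts("AMD-V (svm) reported by the CPU");
        else
            puts("no hardware-assisted virtualization flag found");
        break;
    }

    fclose(f);
    return 0;
}
```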
KVM
KVM (Kernel-based Virtual Machine) is a Linux kernel module which has been part of the mainline kernel since version 2.6.20, released on February 5th, 2007. When enabled, it allows the kernel to operate as a type-1 hypervisor. KVM relies on the host CPU’s support for hardware-assisted virtualization. The main role of KVM is to handle switches to and from guest mode on the host CPU during the execution of the virtualized guest systems. In doing so, KVM emulates virtual CPUs (or vCPUs) and presents them to the guest systems. To a guest system, a vCPU operates indistinguishably from a real physical CPU. Behind the scenes, KVM makes sure of this through its handling of the different modes of the real physical CPU.
In the previous section, we mentioned that switches between guest and kernel mode occur on attempts of the guest system(s) to execute sensitive instructions which may cause the host system’s state to change. Good examples of such instructions are I/O calls. Whenever a virtual machine tries to make such a call (e.g. by attempting to access a peripheral), KVM needs to step in and take over to handle it. However, KVM does little in the way of I/O handling. In fact, KVM’s job stops at handing virtual CPUs to the guest systems and managing memory, opting to delegate I/O emulation (e.g. disks, serial ports, PCIe devices) to other more specialized components [6]. As such, when an I/O call causes control to be switched to KVM, all it does is give control to another component in the virtualization stack which handles I/O emulation. Once the I/O call is handled, control is returned to KVM and then to the guest system. Unsurprisingly, the component which handles this emulation is known as the emulator.
In addition to I/O emulation, there are other tasks entailed in the virtualization process which KVM leaves to other software components. Although it emulates virtual CPUs to be used by guest systems, KVM does not initiate nor does it handle the guest processes that run on those CPUs. Additionally, KVM does not set up the address space to be used by the guest systems. These tasks are delegated to another software component, the virtualizer.
In the Linux virtualization stack, the roles of emulator and virtualizer are taken over by one software component, which is the subject of the next section.
QEMU: The emulator/virtualizer
The official website of the QEMU project describes it as:
A generic and open source machine emulator and virtualizer.
Perhaps this is a good time to clarify the difference between emulation and virtualization. The two terms are sometimes used interchangeably, as they achieve a similar result, namely the execution of a guest system on a host system. However, the way in which this end result is achieved is different between the two processes:
- Emulation relies on the interpretation of the instructions of the guest system. These interpreted instructions are then translated into instructions compatible with the host system before being executed. The end effect is that the guest system’s behavior is emulated. Emulation achieves this without relying on any specific prerequisite of the host hardware.
- Virtualization is built upon the creation of a complete virtual environment on top of the physical hardware of the host system. Instructions of a guest system are then passed down to this virtual environment and executed without interpretation. Virtualization requires support from the underlying hardware.
In the following we will learn how QEMU can play both roles of an emulator and a virtualizer, and how it integrates into the Linux virtualization stack.
QEMU as an emulator
An emulator is a piece of software which allows a computer system (the host) to behave as another system (the guest). An emulator can also be a piece of hardware, though that is not relevant to our context. Among other things, emulators allow software originally intended for a certain platform to run on a completely different one (e.g. running ARM software on an x86 system). This is usually achieved using a translation layer which translates instructions from the guest system into ones that can be run on the host system.
As an emulator, QEMU allows a certain guest platform to be emulated on a different host platform through dynamic binary translation. This means that QEMU can take in instructions from the guest platform, and “on the spot” translate them into instructions which can be executed on the host platform [7]. QEMU emulation can be conducted in one of two modes:
- User-mode: Here the environment of a singular program is emulated.
- System-mode: In which a complete operating system is emulated.
On the x86 platform, in system-mode, QEMU can emulate a multitude of devices including CD/DVD-ROMs, serial and parallel ports, disk interfaces and controllers, and a few sound and video cards. As an emulator, QEMU is used in the Linux virtualization stack to emulate these devices and make them available to the guest systems.
QEMU as a virtualizer
A virtualizer is a slightly more delicate term to define. Virtualizer is simply another name used to describe a hypervisor; a third term that is also used is Virtual Machine Monitor (VMM). However, as I have come to understand it, these terms describe different aspects of the tasks that the hypervisor fulfills:
- Hypervisor: this is the term most widely used, and it usually encapsulates the two other terms. It deals with the lowest level of functionality, namely interaction with the hardware.
- Virtualizer: reflects higher-level functions, namely managing the processes of the virtual machines, assigning address space to the VMs and supplying them with a firmware image.
- Virtual Machine Manager: refers to the tasks of creating, managing and governing the virtual machines.
As mentioned before, the three terms are often used interchangeably.
As explained in the previous section, KVM on its own is not a fully functional hypervisor. As a virtualizer, QEMU is used in the Linux virtualization stack to complement KVM. KVM provides the infrastructure required for hardware-based full virtualization, providing virtual CPUs and directly mapping them to physical CPUs through the use of hardware-assisted virtualization. However, KVM requires QEMU in order to start and manage the guest system processes, assign memory to these systems and provide emulated hardware to them. QEMU takes over the tasks of virtualization and acts as a type-2 hypervisor, making use of the virtualization infrastructure made available by the type-1 hypervisor that is the Linux kernel with the KVM module. This operating mode of QEMU is referred to as KVM-hosted mode.
The KVM/QEMU model
With our current understanding of the operation of QEMU and KVM, we can now attempt to piece together the big picture of the KVM/QEMU model. The diagram in Figure 3 [8] illustrates the integration of QEMU into KVM.
Figure 3: Inter-working of QEMU and KVM
The following are the main aspects represented in Figure 3:
- KVM operates from within the Linux kernel. Its main task is presenting virtual CPUs to the guest systems. These CPUs are directly mapped to the physical CPUs in the hardware layer, thanks to hardware-assisted virtualization.
- The virtual CPUs are to be used to run the code of the guest systems. However, it is up to QEMU to initiate the processes that run this code, and to instruct KVM when to start them up. In addition to effectively managing the vCPUs, QEMU also assigns memory space to the virtual machines, and handles emulation of I/O devices for the guest systems.
- Finally, one aspect we haven’t covered so far is the communication between QEMU and KVM. KVM creates the device file /dev/kvm, which QEMU uses to convey instructions to KVM [1]. To achieve this, QEMU makes use of the ioctl() system call, which is part of the Linux kernel.
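To make this interface a bit more concrete, here is a heavily simplified C sketch of what an emulator does with /dev/kvm. It is loosely based on the minimal examples from the kernel’s KVM API documentation rather than on QEMU’s actual code: it creates a VM with a single vCPU, maps one page of guest memory containing a few hand-written real-mode instructions, and then runs the vCPU while handling the I/O and halt exits discussed above. The guest code, port number and addresses are made up for illustration, and error handling is omitted for brevity.

```c
/* kvm_sketch.c - a toy userspace "emulator" talking to KVM via /dev/kvm */
#include <fcntl.h>
#include <linux/kvm.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* a tiny real-mode guest: write 'A' to port 0x10, then halt */
    const uint8_t code[] = {
        0xba, 0x10, 0x00, /* mov dx, 0x10 */
        0xb0, 'A',        /* mov al, 'A'  */
        0xee,             /* out dx, al   */
        0xf4,             /* hlt          */
    };

    int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
    printf("KVM API version: %d\n", ioctl(kvm, KVM_GET_API_VERSION, 0));

    int vm = ioctl(kvm, KVM_CREATE_VM, 0);

    /* one page of guest "RAM", mapped at guest physical address 0x1000 */
    void *mem = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    memcpy(mem, code, sizeof(code));
    struct kvm_userspace_memory_region region = {
        .slot = 0,
        .guest_phys_addr = 0x1000,
        .memory_size = 0x1000,
        .userspace_addr = (uint64_t)(uintptr_t)mem,
    };
    ioctl(vm, KVM_SET_USER_MEMORY_REGION, &region);

    /* create a vCPU and map its shared kvm_run structure */
    int vcpu = ioctl(vm, KVM_CREATE_VCPU, 0);
    int run_size = ioctl(kvm, KVM_GET_VCPU_MMAP_SIZE, 0);
    struct kvm_run *run = mmap(NULL, run_size, PROT_READ | PROT_WRITE,
                               MAP_SHARED, vcpu, 0);

    /* point the vCPU (still in real mode) at our guest code */
    struct kvm_sregs sregs;
    ioctl(vcpu, KVM_GET_SREGS, &sregs);
    sregs.cs.base = 0;
    sregs.cs.selector = 0;
    ioctl(vcpu, KVM_SET_SREGS, &sregs);
    struct kvm_regs regs = { .rip = 0x1000, .rflags = 0x2 };
    ioctl(vcpu, KVM_SET_REGS, &regs);

    /* the emulator loop: run in guest mode until a sensitive instruction exits */
    for (;;) {
        ioctl(vcpu, KVM_RUN, 0);
        if (run->exit_reason == KVM_EXIT_IO) {
            /* the guest touched an I/O port: userspace must emulate the device */
            char *data = (char *)run + run->io.data_offset;
            printf("guest wrote '%c' to port 0x%x\n", *data, run->io.port);
        } else if (run->exit_reason == KVM_EXIT_HLT) {
            break; /* guest halted, we are done */
        }
    }
    return 0;
}
```

The loop at the end is exactly the hand-off described earlier: KVM runs the guest until a sensitive instruction (here, the port write) forces an exit to user space, the emulator handles it, and the next KVM_RUN call switches back into guest mode.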
QEMU in itself is a simple command line utility which does not save/manage any VM settings or configurations (e.g. number of CPU cores to be used, or memory size). Additionally, other than a window showing the VM’s video output, QEMU does not provide a graphical user interface [9]. These tasks are left to software components better suited for VM management. This is the topic of the next section.
Libvirt & virt-manager: VM management
The final element in the Linux virtualization stack we will be looking at is Libvirt. Libvirt is a library which provides a toolkit that includes a daemon (libvirtd) and a command line management tool (virsh). This toolkit can be used to manage a multitude of supported virtualization platforms/hypervisors, including KVM/QEMU.
The features provided by libvirt include [10]:
- VM management: starting/pausing/stopping guest systems, as well as their backup, restoration and migration are all possible.
- Remote support: Libvirt supports remote VM control through multiple network protocols, the simplest being SSH, which does not require any specific configuration.
- Storage management: Different types of storage devices can be created, mounted and connected to the managed VMs.
- Virtual networking: Libvirt can be used to create virtual networks to provide VMs with customized network access.
As mentioned, libvirt uses a daemon/client model. The libvirtd daemon can be interacted with using a client, and libvirt ships with virsh, a command line client. Other clients exist which can be used to control libvirt; one of the most popular is perhaps virt-manager, which provides a very easy to use graphical interface.
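Since libvirt is first and foremost a library, clients like virsh and virt-manager are ultimately front ends to its API. As a rough illustration (assuming the libvirt development headers are installed), the short C sketch below connects to the local QEMU/KVM hypervisor through the standard qemu:///system URI and lists the currently running domains; compile it with -lvirt. This is, in essence, the kind of query a virsh list command performs through the same library.

```c
/* list_running.c - list active domains via the libvirt C API (cc list_running.c -lvirt) */
#include <libvirt/libvirt.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* connect to the local QEMU/KVM hypervisor managed by libvirtd */
    virConnectPtr conn = virConnectOpen("qemu:///system");
    if (conn == NULL) {
        fprintf(stderr, "failed to connect to qemu:///system\n");
        return 1;
    }

    /* fetch only the domains that are currently running */
    virDomainPtr *domains = NULL;
    int n = virConnectListAllDomains(conn, &domains,
                                     VIR_CONNECT_LIST_DOMAINS_ACTIVE);
    for (int i = 0; i < n; i++) {
        printf("running: %s\n", virDomainGetName(domains[i]));
        virDomainFree(domains[i]);
    }

    free(domains);
    virConnectClose(conn);
    return 0;
}
```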
Summarizing the virtualization stack
Figure 4 summarizes what we have learned about the KVM/QEMU/Libvirt virtualization stack.
Figure 4: The KVM/QEMU/Libvirt stack
To summarize:
- KVM is the lowest component in the stack and, as a Linux kernel module, interacts directly with the hardware. It makes use of the hardware-assisted virtualization supported by the CPU in order to allow hardware-based full virtualization.
- QEMU sits above KVM and complements its functionality. It manages the processes of the guest systems and emulates devices being passed to them.
- Libvirt is the library used to control QEMU and manage the virtual machines. It uses a daemon/client model.
- Virt-manager is a graphical client which can be used to interact with libvirt.
Conclusions
Through our discussion we learned about the different puzzle pieces that make up the KVM/QEMU/Libvirt virtualization stack. This is one of the most common virtualization stacks under Linux, so having a basic understanding of the operation of its components and how they interact with each other will probably go a long way. Throughout this article, I have been referring to this stack as the Linux virtualization stack. It is true that the KVM/QEMU/Libvirt stack is the most popular for virtualization under Linux. However, it is important not to forget that virtualization can still be achieved under Linux using different components and a different stack (see for example the Xen hypervisor).
This article has not been an easy one to pin down, and I have had to go back and restructure it at least a couple of times. At the start of writing, I had a false understanding of how QEMU integrated with KVM, and somehow the operation of both components did not make sense to me. It was only during my research, as I was on my second attempt, that I managed to find more relevant resources which helped me gain a better understanding.
I am still not completely satisfied with some aspects of this article. In particular, VM processes being managed by QEMU, yet at the same time handed down to KVM for execution on the hardware, is still not very clear to me. I believe a better understanding of that would require me to look more into the internals of the Linux kernel, which I have been wanting to do for a while now. However, that would be out of the scope of this article, which again ended up being quite long for the introduction it is supposed to be… So perhaps this can be the topic of a future article? In the meantime, I hope the explanations above, in combination with the references below, will help you if, like me, you want to learn a tiny bit more about how virtualization works on Linux.
Until next time!
References
1. Yasunori Goto - Kernel-based Virtual Machine Technology (archived on 05.07.2022)
2. Oracle® VM Concepts Guide for Release 3.3 - 2.2.2.1 Management Domain (dom0) (archived on 26.12.2022)
3. Mastering KVM Virtualization, Second Edition - Chapter 2: KVM as a Virtualization Solution
4. Wikipedia: X86 virtualization#Hardware-assisted virtualization
5. KVM: Kernel-based Virtualization Driver - White Paper - Copyright © 2006 Qumranet Inc. (archived on 08.10.2007)