The Xen community was very interested in (and a little worried by!) the recent performance comparison of “Baremetal, Virtual Box, KVM and Xen”, published by Phoronix, so I took it upon myself to find out what was going on.
Upon investigation, I found that the 3.0 Linux kernel used in Ubuntu 11.04 was lacking a rather key set of patches in domain 0 which inform the Xen hypervisor about the power management (specifically cpufreq scaling) properties of the processors in the system. Without these patches, Xen will not make use of the highest performing CPU frequencies. These patches are in the process of being upstreamed to Linux but are already readily available and reasonably easy to apply to kernels from 3.0 onwards. You can find them at:
I reran the benchmarks presented by Phoronix in the following scenarios:
- Baremetal Baseline
- KVM Baseline
- Xen PVHVM Baseline
- Xen PVHVM Rebuilt
- Xen PVHVM CPUFreq
- Xen PVHVM 3.1+CPUFreq
The “Baseline” results are stock Ubuntu 11.04. “Xen PVHVM Rebuilt” is a straight rebuild of the stock Ubuntu 11.04 kernel (to rule out a simple rebuild impacting the results too much), “Xen PVHVM CPUFreq” is that stock kernel plus the cpufreq patches and “Xen PVHVM 3.1+CPUFreq” is a mainline 3.1 plus those patches (only really included because that’s where those patches were originally developed; comparing 3.0 and 3.1 is a bit apples and oranges). In all cases only the dom0 kernel was modified and the guest was always using the stock 11.04 kernel.
All test cases were run on the same hardware. The baremetal results used 32GB of RAM, 250GB disk and 16 cores while in all cases the virtual machines were given 24GB of RAM, 24GB of disk and 16 cores.
The Xen guest was using a “PVHVM” configuration, that is an HVM (fully-virtualised) guest making full use of paravirtualised drivers and PV extensions (PV timers, PV interrupt injection, all of which are enabled by default). The KVM guest was configured to use the virtio drivers for IO as well as any other paravirtualisation which is enabled by default.
Here are the raw results as reported by the Phoronix Test Suite:
The following table compares the baseline KVM figures (nb: the patches are to Xen specific code and will not impact KVM) to the “Xen PVHVM CPUFreq” case and tells a very different story to the numbers shown by Phoronix.
As you can see, in many cases the results were very close (9/17 cases were +/- 1% in their respective comparison to native), and of the remaining 8 cases, 4 favoured Xen and 4 favoured KVM. Overall, 7 cases favoured Xen and 8 favoured KVM, with 2 having identical results. This is not surprising, since many of the test cases are heavily CPU bound and you would therefore naturally expect two virtualisation solutions making full use of hardware virtualisation facilities to be approximately equivalent.
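For readers who want to repeat this kind of comparison on their own numbers, here is a minimal sketch of one way to do the arithmetic. It assumes that “+/- 1% in their respective comparison to native” means the Xen and KVM scores, each expressed as a percentage of the baremetal score, differ by no more than one percentage point. The function names and the figures in the usage example are hypothetical and are not taken from the results above.

```python
def relative_to_native(value, native, higher_is_better=True):
    """Express a guest result as a percentage of the baremetal result.

    100.0 means identical to native. For "lower is better" metrics
    (e.g. elapsed seconds) the ratio is inverted so that a bigger
    percentage always means better performance.
    """
    if higher_is_better:
        return 100.0 * value / native
    return 100.0 * native / value


def classify(xen_pct, kvm_pct, tolerance=1.0):
    """Return "close", "xen" or "kvm" for one benchmark.

    `tolerance` is in percentage points of native performance
    (an assumed reading of the "+/- 1%" criterion).
    """
    diff = xen_pct - kvm_pct
    if abs(diff) <= tolerance:
        return "close"
    return "xen" if diff > 0 else "kvm"


if __name__ == "__main__":
    # Hypothetical figures, purely for illustration:
    # (name, native, xen, kvm, higher_is_better)
    rows = [
        ("made-up throughput test", 200.0, 190.0, 180.0, True),
        ("made-up timed test", 100.0, 102.0, 101.0, False),
    ]
    for name, native, xen, kvm, hib in rows:
        xen_pct = relative_to_native(xen, native, hib)
        kvm_pct = relative_to_native(kvm, native, hib)
        print(name, round(xen_pct, 1), round(kvm_pct, 1), classify(xen_pct, kvm_pct))
```

Normalising every result against native first keeps “higher is better” and “lower is better” tests on the same scale, which makes it simpler to tally how many cases fall on each side.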
I sent the above results and analysis to Michael Larabel, the Phoronix author of the article, on 17 November but have yet to hear any response. In the meantime he has posted another article containing results of a set of tests clearly chosen to highlight the power management impact of not applying these patches. It’s disappointing that Phoronix chose not to engage with the Xen community before publishing these results despite being contacted several times by a variety of people. Of course, we are not the only community which recently has been affected by unbalanced reporting (see “About the Kernel 3.0 “Power Regression” Myth”) and one would do well to think carefully about the reliability of performance measurements from folks who do not take minimal steps to understand or explain the results which they are seeing.
The full test results are available upon request. I won’t delve any deeper here since I don’t feel the kind of vacuous “analysis” performed by Phoronix really adds much to the raw data and there really isn’t much else to say about them.
In Summary
- The results published by Phoronix in “Baremetal, Virtual Box, KVM and Xen” which favour KVM over Xen are caused by missing patches in the Linux kernel.
- Patches which fix this issue are available. With those patches applied to the dom0 kernel, the performance measured using the Phoronix benchmarks is very similar on KVM and Xen.
- The behaviour observed by Phoronix mainly applies to hardware which aggressively uses the power management capabilities of processors (i.e. laptops are more affected than servers).