Getting your virtual machine dimensions right

Tailor Made

Article from ADMIN 84/2024
Massive, performance-hungry VMs require proper handling to meet their dynamic requirements. We give you some rules to help size these monsters properly.

Admins often face the challenge of meeting dynamic requirements. Although the trend is toward cloud-native applications and microservices – seen primarily in the form of large numbers of comparatively small virtual machines (VMs) acting as container hosts – some applications still require massive, performance-hungry VMs. SAP HANA, an in-memory, column-oriented relational database management system, is a prime example. These monster VMs require proper handling.

Running large, resource-hungry, and monolithic applications such as Oracle, Microsoft SQL Server, and especially SAP HANA successfully on VMware vSphere is challenging, because they involve many more factors than smaller VMs do. Many of the configuration best practices mentioned in this article originate from the SAP HANA environment but are just as useful for other monster VMs and their applications. An in-depth report would go beyond the scope of this article, which is why the focus is on the most important rules. These rules come into play in all life cycle phases of a monster VM, which is difficult to create, manage, and move. The challenges include:

  • clean configuration of the physical server (BIOS, CPU, memory, network cards),
  • correct initial sizing (vCPUs, NUMA alignment, reserves for the hypervisor), and
  • vMotion (timing, duration, performance, negative effects).

Correct CPU Usage

The virtual CPU configuration plays a central role in the VM's performance. If you get the CPU configuration right, each VM will have the compute power it needs to perform its tasks efficiently without unnecessarily overloading the host's resources.

vSphere distinguishes between physical CPUs (pCPUs) and virtual CPUs (vCPUs). pCPUs sit in the physical server and are the hardware components that perform the compute operations. A pCPU socket can contain several CPU cores, and each core can process two threads simultaneously by hyperthreading, which leads to the first performance pitfall: Hyperthreading presents each physical core as two logical CPUs (the hyperthreads), so with hyperthreading enabled, the scheduler sees twice as many places to execute a thread as there are physical cores installed.

The hypervisor thus has more options for scheduling vCPUs and can run them more smoothly. However, hyperthreading does not double performance, which is why the sizing of a CPU-hungry application should always be based on the physical core count and not on the number of hyperthreads, even though the hyperthread count usually determines how many vCPUs are available to the VM. In other words, the performance gain from hyperthreading needs critical evaluation. In some cases, especially with applications that depend heavily on the CPU cache, hyperthreading can even cause performance losses, because several threads share the same cache. Thorough testing before commissioning and continuous monitoring in production are essential to ensure that hyperthreading offers the benefits you expect.

In contrast, vCPUs are an abstraction that lets a VM use computing resources as if it were accessing the physical hardware directly. The number of vCPUs assigned to a VM should not be based on the maximum possible but on the performance or scaling required (e.g., the number of parallel database users). Application vendors usually offer good sizing tools for this step.
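
To make the core-count rule tangible, Listing 1 shows a minimal Python sketch that checks a planned vCPU count against a host's physical core count rather than its logical (hyperthreaded) CPU count. The host values and the threads-per-core factor are assumed example inputs, not output from any VMware tool.

Listing 1: A vCPU sizing sanity check based on physical cores (example values assumed).

# All inputs are assumed example values.
def check_vcpu_sizing(vcpus, sockets, cores_per_socket, threads_per_core=2):
    physical_cores = sockets * cores_per_socket
    logical_cpus = physical_cores * threads_per_core
    print(f"Host: {physical_cores} physical cores, "
          f"{logical_cpus} logical CPUs (hyperthreads)")
    if vcpus > logical_cpus:
        print(f"ERROR: {vcpus} vCPUs exceed even the logical CPU count.")
    elif vcpus > physical_cores:
        # Runs, but relies on hyperthreads; expect less than linear scaling.
        print(f"WARNING: {vcpus} vCPUs exceed {physical_cores} physical "
              f"cores -- size CPU-hungry VMs on cores, not threads.")
    else:
        print(f"OK: {vcpus} vCPUs fit within the physical core count.")

# Example: a two-socket host with 24 cores per socket
check_vcpu_sizing(vcpus=56, sockets=2, cores_per_socket=24)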

Non-Uniform Memory Access

Non-uniform memory access (NUMA) is a decisive aspect of the architecture of modern multicore and multisocket systems and determines how efficiently RAM can be accessed. In NUMA systems (multiprocessor systems, to put it simply), the RAM is divided into different areas, each of which is assigned to a specific CPU socket.

The combination of a processor (socket) and its directly connected main memory is a NUMA node. If a VM's vCPUs run on one socket, the VM can access the directly connected RAM with the highest bandwidth and the lowest latency – a property known as NUMA locality. When a CPU accesses RAM that resides behind the memory controller of another CPU, the access latency increases. Figure 1 illustrates this for a two-socket system.

Figure 1: A two-socket system (two NUMA nodes) showing local and remote RAM access from CPU0.

Depending on the server type (two-socket, four-socket, or eight-socket system), different memory bus architectures are used, which causes different path lengths from the RAM of one CPU to another. In other words, the choice of server can directly affect the expected performance of the virtualized monster application.
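
As a rough illustration of NUMA locality, Listing 2 checks whether a planned VM fits within a single NUMA node – that is, whether its vCPUs and RAM can be served by one socket and its locally attached memory. The node sizes are assumed example values for a two-socket host.

Listing 2: Checking whether a VM fits into one NUMA node (example values assumed).

# Node sizes are assumed example values for a two-socket host.
def fits_one_numa_node(vm_vcpus, vm_ram_gb, cores_per_node, ram_gb_per_node):
    """True if the VM can be scheduled entirely within one NUMA node."""
    return vm_vcpus <= cores_per_node and vm_ram_gb <= ram_gb_per_node

cores_per_node, ram_per_node = 24, 768   # two nodes, 24 cores/768GB each

for vcpus, ram in [(16, 512), (32, 1024)]:
    local = fits_one_numa_node(vcpus, ram, cores_per_node, ram_per_node)
    print(f"{vcpus} vCPUs / {ram}GB:",
          "fits one node (local access)" if local
          else "spans nodes (expect remote access latency)")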

vSphere supports NUMA in virtual environments by ensuring that the memory and CPU resources are used efficiently to maximize performance. A correct NUMA configuration can offer significant performance benefits, especially for large VMs with high memory and compute requirements. Regardless of the server type and size, various BIOS/UEFI settings are fundamental to high-performance monster VMs:

  • Enabling Turbo Boost (if configurable), which lets heavily loaded cores clock higher while other cores are idle.
  • Hyperthreading for better CPU scheduling of the VMs.
  • Enabling NUMA by deactivating node interleaving in systems such as HPE servers.
  • Enabling extended CPU functions such as Intel and AMD virtualization (VT-x and AMD-V), extended page tables (EPT), no execute and execute disable (NX/XD), and rapid virtualization indexing (RVI).
  • Disabling all unused devices and BIOS functions (e.g., video RAM cacheable, onboard audio, serial ports, CD-ROM, or USB).
  • Switching off the "C1E Halt State."
  • Disabling "Enhanced C-States."
  • Configuring power management for high performance, which prevents power saving and the subsequent performance hits.

These host configurations lay the foundations for creating monster VMs.
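
Some of these prerequisites can also be verified from outside the BIOS. Listing 3 is a sketch that assumes the pyVmomi Python SDK and a reachable vCenter or ESXi host (the hostname and credentials are placeholders); it reads each host's CPU topology, hyperthreading state, and active power policy.

Listing 3: Querying CPU topology, hyperthreading, and power policy with pyVmomi (hostname and credentials are placeholders).

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()   # lab use only
si = SmartConnect(host="vcenter.example.com",
                  user="administrator@vsphere.local",
                  pwd="secret", sslContext=ctx)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True)

for host in view.view:
    cpu = host.hardware.cpuInfo
    ht = host.config.hyperThread               # hyperthreading state
    policy = host.config.powerSystemInfo.currentPolicy
    print(f"{host.name}: {cpu.numCpuPackages} sockets, "
          f"{cpu.numCpuCores} cores, {cpu.numCpuThreads} threads")
    print(f"  Hyperthreading active: {ht.active}")
    print(f"  Power policy: {policy.shortName}")

view.Destroy()
Disconnect(si)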

Correct VM Size

The "bigger is better" principle is only partially true in the context of virtual machines. Of course, the aim is to make maximum use of the host and its CPU and RAM resources. Any SAP HANA or Oracle admin will also want to use as many CPUs, cores, and hyperthreads as possible for the VM – which is where the second big mistake happens.

The hypervisor also needs resources. The ESXi server's VMkernel has a lot to take care of, because it provides the basic hypervisor functionality: CPU scheduling, RAM management, the virtual network (i.e., the standard vSwitch (VSS) or distributed virtual switch (VDS)), and the storage stack (NFS, the clustered VMFS filesystem, virtual volumes (vVols), and virtual storage area network (vSAN)), including multipathing for block storage, to name just a few of the central components.

Another need might be for integration, such as with VMware network and security virtualization (NSX) and distributed firewalling (DFW). Finally, functions such as the distributed resource scheduler (DRS), high-availability (HA) clustering, and vMotion also need to be usable.

On top of this, each VM also requires VMkernel processes at runtime to monitor the CPU and RAM and map virtual network cards, virtual SCSI adapters, virtual graphics cards, and the like. If you are looking for a generic term, you can call this the virtualization overhead. CPU and RAM resources need to be reserved for the kernel processes and the virtualization overhead of the VMs (large or small); in other words, these resources must not be occupied by the VMs themselves.
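
To see why a monster VM must not be sized to the full host, Listing 4 subtracts an assumed VMkernel reservation and per-VM virtualization overhead from the installed RAM. The overhead figures are illustrative assumptions – the real values depend on the VM configuration and the vSphere version.

Listing 4: Estimating the maximum VM size after hypervisor overhead (overhead figures are assumptions).

# Overhead figures are assumptions; real values depend on the
# VM configuration and the vSphere version.
host_ram_gb = 1536           # installed RAM
vmkernel_reserve_gb = 32     # assumed VMkernel/services reservation
overhead_pct = 0.02          # assumed per-VM overhead (~2% of VM RAM)

available = host_ram_gb - vmkernel_reserve_gb
# Solve: vm_ram + vm_ram * overhead_pct <= available
max_vm_ram = available / (1 + overhead_pct)

print(f"Installed RAM:       {host_ram_gb} GB")
print(f"After VMkernel:      {available} GB")
print(f"Max monster VM size: {max_vm_ram:.0f} GB "
      f"(~{available - max_vm_ram:.0f} GB headroom for overhead)")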
