
VM management: How to walk the data center resource balancing act

Use these tools and techniques to improve VM management efficiency and address problems before they become a crisis.

Today’s data center must stay robust, efficient and – most of all – properly utilized. Idle resources within an environment waste money. But heavily used data centers with improper resource configurations create dangerous scenarios where a hardware failure could spell trouble for other physical hosts. The challenge for IT administrators is to manage and to utilize computing resources that span the entire environment, often including physical, virtual and cloud resources. In this tip, we’ll discuss resource planning and issue mitigation as means of optimizing resource use. We’ll also address how to quell issues before they become serious problems.

Best practices for resource planning
Almost all of today’s data center environments have or will have some form of virtualization deployed. This demands additional considerations when deploying a virtualized physical platform. We now have multiple virtual machines (VMs) relying on a single hardware platform for computing resources such as CPU, memory, and I/O for network – and sometimes disk – traffic. When it comes to resource management and spotting problems, proactive planning can help with resource utilization issues.

Resource load balancing
Whether working in a virtual or physical environment, it’s crucial to know what resources are being used and to which VMs those resources are being allocated. For example, if we have one physical host with 8 cores and 24 GB of RAM, an engineer would not want to use all of the available resources because there would be no room for adjustment or failure.
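As a rough illustration of leaving room for adjustment or failure, the sketch below holds back a fixed share of the example host's 8 cores and 24 GB of RAM. The `Host` class, the `allocatable` helper and the 25% reserve are assumptions made for this example. They are not a XenServer setting or a recommended figure.

```python
from dataclasses import dataclass


@dataclass
class Host:
    cores: int
    ram_gb: float


def allocatable(host: Host, reserve_fraction: float = 0.25) -> Host:
    """Return the capacity left for VMs after holding back a reserve.

    reserve_fraction is a hypothetical planning value, not a vendor default.
    """
    if not 0 <= reserve_fraction < 1:
        raise ValueError("reserve_fraction must be in [0, 1)")
    return Host(
        cores=int(host.cores * (1 - reserve_fraction)),
        ram_gb=host.ram_gb * (1 - reserve_fraction),
    )


# The single host from the example above: 8 cores, 24 GB of RAM.
host = Host(cores=8, ram_gb=24)
budget = allocatable(host)
print(budget)  # Host(cores=6, ram_gb=18.0)
```

With a 25% reserve, VMs on this host would be planned against 6 cores and 18 GB of RAM, leaving the remainder for spikes and emergencies.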

Figure 1 — Citrix XenServer 6.0 Enterprise Hypervisor is configured as a single host with local storage only. For small environments this works well; however, more hosts are required as businesses change.

As you see in Figure 1, a single XenServer host delivers several highly utilized workloads to the end user. Although the solution will work, it leaves little room to adjust for usage spikes or for the potential need for additional virtual servers. In the example above, engineers would have to remove virtual resources (RAM in this case) from other virtual machines to allow for growth on a single physical box. Physical resources must be made available to all VMs as required. Leaving room for emergencies or expansion is a must in any environment. This is where load balancing and VM management come into play.

Both virtual and physical servers must have the right type of resources assigned to them. When heavily used workloads are deployed, resources must be planned and delivered to each VM without affecting other virtual or physical workloads. Using the same example scenario, we can introduce a secondary physical host in Figure 2 with similar hardware specifications and begin to load-balance both the virtual and physical resource elements.

Figure 2 — Citrix XenServer 6.0 Enterprise Hypervisor illustrates multiple hosts configured within a pool. This pool shares resources because VMs are capable of moving live between the hosts across a storage area network (SAN) backbone.

In the new scenario, we have two physical hosts joined to a pool where resources are shared between VMs. Since every environment is unique, the process of load-balancing physical resources will require individual planning. In this example, the two physical hosts have extra resources available as required by the existing virtual machines. Extra CPU, RAM and storage requirements have been set for this specific environment so that VM agility can be guaranteed.  

Many environments will also want to load-balance for high availability (HA). Using XenServer 6.0 as the sample hypervisor, built-in tools will help with this process. By going into the Pooled Server HA feature, administrators can see which machines can safely fail over to the other host. From there, engineers can determine how many resources are required to handle the load per physical box. The most important element to remember is that these two servers are load balanced, with resources available for new VM creation, failover and workflow automation. In terms of HA, if one of the above physical servers fails, the other will be able to handle the critical virtual machines of the failed host.

When resources are balanced between machines, VMs have the ability to move as needed between physical hosts without affecting the current resource state. Consider disaster recovery (DR) as one possible example. If a physical host fails in this type of load-balanced scenario, VMs will migrate to the next available host where resources can be found. If either of these machines were completely utilized, DR and failover would be impossible because the other server simply would not have the resources available to support the influx of additional VMs.
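The failover reasoning above can be sketched as a simple capacity check. This is a minimal sketch that considers RAM only and treats the surviving hosts' free memory as one pool. The host names, the usage figures and the `can_absorb_failure` function are hypothetical, and it does not reflect how XenServer's HA planner actually works.

```python
def can_absorb_failure(hosts: dict[str, dict[str, float]]) -> dict[str, bool]:
    """For each host, report whether the other hosts together have enough
    free RAM to take over its VMs if it fails."""
    result = {}
    for name, failed in hosts.items():
        spare_gb = sum(
            other["ram_gb"] - other["used_gb"]
            for other_name, other in hosts.items()
            if other_name != name
        )
        result[name] = failed["used_gb"] <= spare_gb
    return result


# Two pooled hosts with headroom (illustrative numbers).
balanced_pool = {
    "host-a": {"ram_gb": 24, "used_gb": 10},
    "host-b": {"ram_gb": 24, "used_gb": 12},
}
print(can_absorb_failure(balanced_pool))
# {'host-a': True, 'host-b': True}

# The same pool with host-a completely utilized.
saturated_pool = {
    "host-a": {"ram_gb": 24, "used_gb": 24},
    "host-b": {"ram_gb": 24, "used_gb": 12},
}
print(can_absorb_failure(saturated_pool))
# {'host-a': False, 'host-b': False}
```

In the saturated case neither host can cover for the other: host-b lacks room for host-a's VMs, and host-a has no free memory at all.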

Workflow automation
Many environments using both virtual and physical servers may require an element of workflow automation. For example, if one particular server is heavily utilized, new VMs can be spun up to relieve the existing server. Built on the Microsoft .NET Framework, Windows Workflow Foundation and Windows PowerShell, Citrix Workflow Studio allows engineers to dynamically create new virtual resources to respond to capacity needs, on-premises or off-premises. In this scenario, it’s important to have the proper resources aligned to the spare physical machines so new virtual machines will have RAM and CPUs available to them.

There will always be a need for new VMs to be created in an environment, and it will be up to administrators to apply the correct amount of resources to each new VM. Over- and under-allocating resources can waste both time and money. This is why having a strategic plan for VM management around existing resources is important. By knowing what is currently available within the data center, engineers can deliver workloads more efficiently.

This means that administrators will need to keep a keen eye on their physical and virtual environments and know how many users or machines can live on each host safely and efficiently. For example, consider a virtual desktop infrastructure. As users log in, they will begin to consume resources located on a machine being monitored in Figure 3.

Figure 3 — Citrix XenServer 6.0 Enterprise Hypervisor shows an isolated XenServer host utilized only for virtual desktop infrastructure. VMs are stored either locally or can be stored on a backbone SAN.

Currently, the machine shown in Figure 3 is not being heavily utilized. However, with an influx of users, that count can increase quickly and create problems for poorly balanced environments. Use this data to size virtual workloads accordingly. For example, setting a cap on resource use will allow a safe number of workloads to be launched on a physical box within a well-managed data center.
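One way to turn a resource cap into a workload count is sketched below. The per-desktop RAM figure, the desktops-per-core ratio and the 80% cap are assumptions chosen for illustration. Real values would come from monitoring data such as that shown in Figure 3.

```python
import math


def max_desktops(
    host_ram_gb: float,
    host_cores: int,
    ram_per_desktop_gb: float,
    desktops_per_core: float,
    cap: float = 0.8,
) -> int:
    """Return how many desktops fit on a host while staying under `cap`
    (a fraction of total capacity) for both RAM and CPU."""
    by_ram = math.floor(host_ram_gb * cap / ram_per_desktop_gb)
    by_cpu = math.floor(host_cores * cap * desktops_per_core)
    # The scarcer resource decides the safe limit.
    return min(by_ram, by_cpu)


# Hypothetical sizing: 24 GB / 8-core host, 2 GB per desktop, 4 desktops per core.
print(max_desktops(24, 8, ram_per_desktop_gb=2, desktops_per_core=4))  # 9
```

Here RAM is the limiting resource: the CPU budget would allow 25 desktops, but the memory budget allows only 9.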

Working with resource alerts and alarms
Creating alerts and notifications within a data center will help to maintain a healthy environment and improve VM management. Catching an issue before it becomes noticeable to users or jeopardizes a service-level agreement (SLA) will help keep both physical and virtual machines within a data center running longer and more efficiently. From a resource perspective, leading hypervisors will have alerts and notifications that can be set up, such as in Figure 4.

Figure 4 — Citrix XenServer 6.0 Enterprise Hypervisor provides alerts that can be configured per VM and per physical host. In this case, alerts are set up for a Windows Server 2008R2 Enterprise Licensing Server.

With alert monitoring, engineers can set up CPU, network and disk alarms to warn of encroaching trouble, allowing technicians to mitigate resource issues before they affect the end user. Setting up resource alerts is crucial in the planning and deployment process. Many environments leave this for the last step, only to run into resource-based issues quickly within their data center.
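The idea behind a threshold alarm can be sketched in a few lines. This is a generic illustration, not XenServer's alert mechanism or API. The `AlertRule` class, the 90% threshold and the three-sample window are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    metric: str             # e.g. "cpu", "network", "disk"
    threshold_pct: float    # fire above this utilization percentage
    sustained_samples: int  # consecutive readings required before firing


def triggered(rule: AlertRule, samples: list[float]) -> bool:
    """Return True if the most recent `sustained_samples` readings all
    exceed the rule's threshold."""
    if rule.sustained_samples < 1:
        raise ValueError("sustained_samples must be at least 1")
    recent = samples[-rule.sustained_samples:]
    return len(recent) == rule.sustained_samples and all(
        value > rule.threshold_pct for value in recent
    )


cpu_rule = AlertRule(metric="cpu", threshold_pct=90, sustained_samples=3)

print(triggered(cpu_rule, [40, 92, 95, 97]))  # True: last three readings exceed 90
print(triggered(cpu_rule, [95, 40, 96, 97]))  # False: a dip falls inside the window
```

Requiring several consecutive readings above the threshold is one simple way to avoid alarms on momentary spikes while still warning before users notice a problem.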

Using existing and third-party resource monitoring tools
Administrators often need to check on resources that directly affect a specific physical or virtual server. In these cases, there are granular third-party tools capable of reporting on specialized database servers, cloud-based machines and other heavily utilized workloads. One such tool, up.time, by uptime software Inc., helps administrators monitor servers, virtual machines, cloud resources, colocation facilities and more. Using up.time’s graphical server monitoring software, an administrator can graph and analyze all critical server resources running inside the data center, independent of the operating system in use. In-depth, granular monitoring of resources such as CPU, memory, disk, processes, workload, network, user, service status and configuration data can help engineers properly allocate and plan out their data center resources.

Another solid network monitoring tool comes from SolarWinds. The tool is called Orion Network Performance Monitor (NPM) and provides granular network traffic and performance monitoring. Data center engineers can use it to gather, track and analyze network information from SNMP-enabled devices.

In addition, resource issues can usually be diagnosed and resolved using onboard tools. For example, Resource Monitor, offered by the Windows OS platform, graphs resource utilization on a machine and shows an administrator how those resources are used.

Figure 5 — Resource Monitor is showing memory utilization on a Windows Server 2008 R2 Enterprise Exchange Server.

In the scenario of Figure 5, this server is having RAM issues with store.exe. Engineers should be aware that Exchange can be RAM intensive, so seeing this type of utilization is common. However, having this information, engineers are able to either add more resources to this box or offload some of its workload to other machines.

Natively, Resource Monitor has several tabs that help engineers probe their machines and see where resources are being used. Another example is network throughput. In Figure 6, we see a normally operating server. However, if there were a network spike, we would be able to see the source and then decide how best to mitigate the issue. To get a granular look, engineers are able to dive into the environment and create their own monitors to see where the data center is lacking or how well it’s performing.

Figure 6 — Resource Monitor is showing network traffic and utilization on a Windows Server 2008 R2 Enterprise Exchange Server.

Data center storage resource considerations
Storage resources can be very limited and expensive. Improper storage utilization can lead to performance problems and very costly resolutions. It’s always important to monitor how storage is being used in both virtual and physical environments. Intelligent storage tools can help ease workload pains by consolidating data and delivering it efficiently. A major vulnerability of a data center SAN environment is usage spikes, such as when a large number of users access the system at the same time.

In these situations, disks become heavily utilized and performance can slow to a near halt. To combat this, SAN manufacturers look to solid-state technologies and intelligent deduplication-aware caching mechanisms to reduce performance bottlenecks.

Figure 7 — Graph shows how Flash Cache affects a NetApp 3000 series controller and its disk aggregate. This capture shows a total of 80 minutes of activity – with the first 20 being without caching.

In Figure 7, we see a heavily utilized workload being accessed by a number of users. The device in this example is a NetApp controller. You can see the difference in disk performance without cache and with onboard caching enabled. These types of data center efficiency solutions help keep an environment running longer and smoother. In this case, engineers will not have to buy a larger aggregate of disks for resource distribution. Rather, they are able to use their existing storage to more efficiently deliver large workloads to the end user.

Planning and attention pay dividends
Always remember that computing resources are finite. It can be extremely expensive to add resources when reacting to an unexpected event or shortage. This means administrators must keep a proactive eye on the entire data center environment and catch resource utilization problems before they begin to affect the workload or the end user.

About the author: Bill Kleyman, MBA, MISM, is an avid technologist with experience in network infrastructure management. His engineering work includes large virtualization deployments as well as business network design and implementation. Currently, he is the Virtualization Architect at MTM Technologies Inc. He previously worked as Director of Technology at World Wide Fittings Inc.
