Availability anxiety in virtualized environments has three layers. First, there’s the availability of the physical infrastructure: the physical servers and the network and storage layers. Second, there’s the perceived risk of having five, 10 or 20 virtual machines (VMs) running on a single server or blade, all dependent on the availability of that one piece of hardware. And finally, there’s the anxiety surrounding the availability of the services that run within the guest operating system inside each VM. To some degree, availability anxiety has always been present in our industry, with customers worrying about their contingencies if a server dies or if a service stubbornly refuses to start. Those worries predate the adoption of virtualization within most businesses. It is the adoption of virtualization in the last 10 years that has introduced an entirely new anxiety.
Boosting consolidation ratios
At the heart of virtualization efficiencies is the consolidation of servers, and now increasingly desktops, onto a small number of physical hosts. The more VMs you can run on a single piece of tin, the cheaper each VM becomes.
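As a back-of-the-envelope sketch of that economics (all figures here are invented for illustration, not benchmarks):

```python
# Illustrative cost per VM as the consolidation ratio rises.
# host_cost is a made-up figure covering purchase, power and support
# for one physical host over its lifetime.
host_cost = 10_000.0

for vms_per_host in (1, 5, 10, 20):
    cost_per_vm = host_cost / vms_per_host
    print(f"{vms_per_host:>2} VMs per host -> ${cost_per_vm:,.2f} per VM")
```

The per-VM cost falls in direct proportion to the consolidation ratio, which is exactly why the ratio keeps being pushed upward, and why the basket keeps getting fuller.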
But with these greater consolidation ratios have come additional anxieties, because we are putting more eggs in one basket. The easy way to avoid the eggs-in-one-basket anxiety altogether, of course, is to virtualize with just one VM per physical box. There are benefits to virtualization above and beyond consolidation, but most organizations want high consolidation ratios. They’ll also add availability technologies to their environments to protect themselves from a possible failure of their chosen platforms.
Consolidation ratios increase year by year, but the quality of the availability technologies may not be keeping pace with this growth. This has led to a growing “availability gap”: availability technologies are being outstripped by increasing physical resources (CPU and RAM) and by more capable hypervisors able to address those resources.
When considering availability, businesses become overly focused on merely buying technology to fix the anxiety they feel. Despite reminders from vendors that no product is a silver bullet, customers go looking for one anyway.
What’s frequently overlooked is that the failures that undermine availability are caused as much by human error as by hardware failure. So when you’re building an availability solution, you must look at both the hard and soft processes that can undermine your good work if they are not properly validated.
Worse still is a business that wastes countless dollars in software or hardware availability solutions that don’t work when the business really needs them to. That can spell disaster.
Focusing only on technology to address availability can lead customers to pick the wrong technology from the start. Rather than first working out the best way to solve the problem, some businesses embark on endless proofs of concept before carrying out a proper assessment.
It’s the initial, admittedly somewhat tedious, evaluation process that counts, rather than the sexier product installation and configuration. Without this initial assessment, you can find yourself using the wrong technology to solve the problem.
I have frequently seen customers bend and twist technology to make it fit a job it wasn’t designed for. They then complain that the technology is no good, in precisely the way a bad workman blames his tools after trying to drive home a nail with a screwdriver or tighten up a nut with a hammer.
Truthful answers to availability questions
The problem isn’t the technology but the person using it. In fairness, we don’t help ourselves when the IT industry, including ISVs, resellers, distributors and consultants, erroneously conflates availability technologies with disaster recovery technologies. Additionally, there is always commercial pressure to re-engineer Product A to fix Problem Z, because customers are looking for ways to save money. But leaving these excuses to one side: without the initial upfront assessment, you could find yourself getting burnt.
For example, I’ve seen customers attempt to create “stretched clusters” using VMware’s High Availability (HA) technology, and there is no shortage of storage companies falling over themselves to sell the storage infrastructure, such as NetApp MetroCluster and EMC vPlex, to achieve it. These are powerful solutions, but VMware HA was designed to cover the failure of a physical server within a single site. It was never designed to be an availability solution used across metropolitan distances.
Yet every week I am asked if it is possible. The answer is yes, it certainly is possible to bend and twist the technology to work this way, but the bigger question is: at what cost?
After completing this feat of re-engineering, would the technology alone meet the recovery point objective (RPO) and recovery time objective (RTO)? These two concepts describe what state your data is in when recovery happens (RPO) and how long it takes you to get to that state (RTO).
It’s about objectives, not technology
Before you even think about selecting a technology, you have to ask yourself the right questions about what the availability objectives are. This allows you to map the correct technology to the problem. Once that process is completed you can focus on ensuring that your availability “insurance policy” will work. Consider some of the following questions:
- What do you want to protect—site, host, virtual machine or service?
- What level of availability do you want to achieve?
- How valuable is the system you are protecting to the business set against the cost of the availability solution?
- What state should your data be in if you have to make a claim on your insurance policy?
When thinking about application availability, it makes sense to address these service-level questions first, because services often carry the most extreme availability demands within the business.
ABOUT THE AUTHOR:
Mike Laverick is a professional instructor with 15 years of experience in technologies such as Novell, Windows and Citrix. He has also been involved with the VMware community since 2003. Laverick is a VMware forum moderator and member of the London VMware User Group Steering Committee. In addition to teaching, Laverick is the owner and author of the virtualization website and blog RTFM Education, where he publishes free guides and utilities aimed at VMware ESX/VirtualCenter users. In 2009, Laverick received the VMware vExpert award and helped found the Irish and Scottish user groups. Laverick has had books published on VMware Virtual Infrastructure 3, VMware vSphere 4 and VMware Site Recovery Manager.