SAN FRANCISCO – One of the rock stars of the cloud and DevOps movement shared lessons learned from using Opscode's Chef configuration management tools.
Phil Dibowitz, systems engineer at Facebook, demonstrated at ChefConf here this week how his company uses Chef and other open source utilities to manage an environment consisting of multiple clusters of 10,000 or more servers each.
Dibowitz and Pedro Canahuati, director of production engineering at Facebook, sat down with SearchDataCenter for a more in-depth discussion about DevOps, advice for DevOps newbies, how Facebook selected Chef, and what's on the wish list for the tool.
What do you think of the DevOps term? Is it a real-world concept or an empty buzzword?
Phil Dibowitz: I think people like to make terms. People like to label stuff. But I think the concept underneath it is useful.
When I was a junior admin in the late '90s and early 2000s, someone said to me that the difference between a senior admin and a junior admin is that a senior admin can read and write code. I thought that was useful, and I had a computer science degree, and the concept of sitting in front of a huge body of code all the time did not sound interesting to me, but I could see why it was useful. I think that was kind of an early seed of what you see today, and people like to call it DevOps.
At the end of the day, people who are in operations need to be able to read and write code; they need to be able to understand the applications they're supporting; but moreover, as we scale, if you can't write code, you can't automate effectively.
Logging into each machine by hand just doesn't fly anymore. We've gotten past that scale. So if you have tens of thousands of machines, SSHing into each one and running a command just isn't going to fly, but if you can automate that, then you can actually run that environment efficiently.
Everyone will give you a slightly different definition. That's the heart and soul of it, and I don't really care what you call it. I call it being good at my job. Some people call it DevOps.
Would you have any advice for beginners to the DevOps concept?
Dibowitz: The biggest thing is that it's a mind-set shift. If you're in a legacy infrastructure, it's really tempting to think about, 'I have my DNS server, and I have my mail server, and I need to log in to x server and do x thing.' And that mind-set doesn't scale. Because one day you need five DNS servers or 10 mail servers, or 30 Web servers -- or hundreds, or thousands.
Think about your environment, as opposed to your servers. Think about how you want to express the desired state of your world in a useful way that you can push that out, and always regenerate everything you have quickly and easily, in an automated fashion.
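The idea of describing an environment rather than individual servers can be sketched roughly as follows. This is an illustration only, not Facebook's tooling or Chef's API: the `plan_changes` function and the role names and counts are hypothetical, borrowed from the example numbers in the answer above.

```python
# Hypothetical sketch: compare a desired environment with what exists
# and work out what has to change. Not Chef's or Facebook's actual code.

def plan_changes(desired, actual):
    """Return how many servers to add (positive) or remove (negative) per role."""
    plan = {}
    for role in sorted(set(desired) | set(actual)):
        delta = desired.get(role, 0) - actual.get(role, 0)
        if delta != 0:
            plan[role] = delta
    return plan


# Desired state is declared once, for the whole environment.
desired_state = {"dns": 5, "mail": 10, "web": 30}

# What is actually running today (assumed values for illustration).
actual_state = {"dns": 1, "mail": 1, "web": 1}

plan = plan_changes(desired_state, actual_state)
print(plan)

# Once the environment matches the description, there is nothing left to do.
converged_plan = plan_changes(desired_state, desired_state)
print(converged_plan)
```

The point of the design is that the description, not a sequence of manual logins, is the source of truth; the same comparison can be rerun at any time to regenerate or repair the environment.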
Phil, in your talk this morning you mentioned 'configuration as data.' What exactly does that mean?
Dibowitz: The reality of it is that if you want to delegate some subset of configuring a system or group of systems to a software developer, they don't necessarily know how to be a sysadmin, they don't necessarily know all of the Chef, Puppet or CFEngine bits, or whatever it is you're working with. What they do know is, 'I need x megabytes of this,' or 'I need my core files in a different directory' or whatever. And that's just data. 'I want this amount of this. I want this in this place.'
And so, if you can give them a way to express that in data, if you can give them a hash that represents settings of system controls or arrays of packages, you get to a place where developers can manage the pieces of configuration the application needs in their environment, in a way that works well with the thing they're used to; it's just code and data at that point, and every developer has worked with code and data.
Pedro Canahuati: I think to take his example even further, a developer maybe needs to know, 'I need more shared memory.' But he doesn't need to know, for example, that he's running on a Linux box, that the Linux system has a sysctl file that's in /etc, doesn't need to know that after you run that file you have to run a [command] to make those settings correct.
And so, what configuration as data does for us basically says, 'Here is a file that tells you [that] you want more shared memory. Here's one small portion of that hash and all you have to do is change it, and then the rest is magic underneath.' Developers love that because they don't need to know systems administration. They just have to know that they need more shared memory.
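A minimal sketch of what "configuration as data" might look like, assuming a plain dictionary of kernel settings that a developer can override. The setting values and the function names `merge_settings` and `render_sysctl_conf` are hypothetical and are not taken from Chef or from Facebook's cookbooks; the sketch only renders the file contents and does not write to /etc or run any command.

```python
# Hypothetical sketch of configuration as data. The values and helper
# names are illustrative assumptions, not Chef's or Facebook's real code.

# Defaults owned by the systems team.
DEFAULT_SYSCTL = {
    "kernel.shmmax": 68719476736,
    "net.core.somaxconn": 1024,
}


def merge_settings(defaults, overrides):
    """Return a new dict with the developer's overrides applied on top of defaults."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged


def render_sysctl_conf(settings):
    """Render settings as the text of a sysctl-style config file."""
    lines = [f"{key} = {value}" for key, value in sorted(settings.items())]
    return "\n".join(lines) + "\n"


# The developer's entire change: "I need more shared memory."
app_overrides = {"kernel.shmmax": 137438953472}

conf_text = render_sysctl_conf(merge_settings(DEFAULT_SYSCTL, app_overrides))
print(conf_text)
```

The developer touches only the one entry in `app_overrides`; where the file lives and which command reloads it stay with whatever applies the rendered output.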
Did you do a bake-off between configuration automation tools?
Dibowitz: We did. We looked at Puppet, Chef and Spine. Spine you've probably never heard of, but it's a thing I co-wrote when I was at Ticketmaster. … Rather than give everyone their favorite tool, we gave everyone a tool they thought would fail, and said, 'You have to go and sell this tool as best you can.' We gave everyone a couple of weeks, and then we met and implemented a couple of different features -- sysctl being one of them, as you might imagine -- and our SSH configs, which are fairly complicated. And then we came back and looked at the code, we looked at the experience each person had trying to use it, and how well it would fit into our model.
All three tools were able to do the job, but in our case, the flexibility of Chef met our workflow needs best. You have the full power of Ruby to express and modify your configuration, but you also have the full power of Ruby to extend and modify Chef itself. We modified node.save, which is an internal piece of Chef -- we modified how that behaved.
We could bend the tool to what we wanted it to be, rather than bend our workflow to the tool. The other really big draw for Chef was, because everything happens in the client, using it in a server mode was an option for us, where it wouldn't have been for other tools.
Is there anything on your wish list for Chef?
Canahuati: One of the things we struggle with most is making sure that the changes that we make are not affecting what we don't expect to be changed -- the unintended consequences. Chef gives us the ability to test things, but I'd like that to be more transparent and easier to use at the beginning, where if I'm going to modify a cache server, I know I will not be inadvertently modifying a database server. That's a really, really hard problem to solve.