NEW YORK -- When a company modernizes its app development process, an IT infrastructure upgrade often follows.
That was the message from three household-name web companies that shared their infrastructure overhaul experiences here this week: Meetup, Netflix and Dropbox.
Meetup gets together with multiple cloud providers
Meetup is just 15 years old, but it became clear its IT infrastructure was from a bygone era, as the company recently looked to implement new application development and delivery pipelines. Its two geographically separate data centers with bare-metal servers lacked the flexibility of cloud computing, so Meetup started down the road of a radical IT infrastructure change.
The company wanted to design its new continuous delivery (CD) environment for developer self-service and empowerment, fast feedback and speed, said Yvette Pasqua, CTO for the New York-based network of local groups.
The process "didn't start off awesome," Pasqua confessed. "It took longer to get going than we thought it would."
When fighting daily production fires got in the way of the IT infrastructure upgrade, Meetup eventually realized it had to change its organizational structure and dedicate groups of engineers to the project.
Once that was done, the company quickly narrowed the field down to two finalists: Google Cloud Platform and Amazon Web Services (AWS). Pasqua said the outcome of the decision about which cloud to implement surprised everyone -- the answer turned out to be both, because each cloud suited different application development needs.
Google Cloud Platform was found to be more appropriate for the company's new microservices-based CD pipeline, which Meetup would also convert from Jenkins to Travis CI. AWS, with its plethora of secondary services, was deemed the better bet for older, monolithic applications.
For new apps, Meetup wanted an immutable infrastructure as much as possible and a team focused on application tooling, rather than operations. Google Container Engine "ended up checking all those boxes for us," Pasqua said. Meetup also liked Google's strategic direction with regard to data-processing products, and its data pipeline performed faster than Amazon's Kinesis in Meetup's testing.
On the other hand, Amazon's onboarding and support for cloudifying current services made it a better fit for more mature applications, Pasqua said.
With Google, "we are charting our own territory," Pasqua said, but Amazon offered credits and engineering help to do the migration.
Secondary services relieve ops pressures
Meetup also converted from a self-managed MySQL database to Amazon Aurora, changed from a self-administered memcached deployment to Amazon ElastiCache and is using OpsWorks instead of rolling its own Chef environment. It uses CloudFormation for its "cattle" servers, which now form the majority of its infrastructure on AWS.
This was a major cultural change, in addition to the move to cloud and the adoption of continuous integration (CI), Pasqua said, but getting engineers' minds around the shift was well worth it.
"Managed services from cloud providers reduce our operations overhead and focus our engineers on making Meetup better," she said.
Meetup is running Docker on Amazon using the EC2 Container Service. The company initially was "hesitant and skeptical" about the service, but became less so once the AWS application load balancer became available last month, Pasqua said.
Some Google secondary services are also in use, such as Google Cloud Storage, instead of a three-way replicated Hadoop Distributed File System cluster for analytics logging and processing. Google's Cloud Dataproc service and Spark clusters speed data through the pipeline in hours, where it used to take a day, and Meetup no longer has to wait for nightly batch jobs to push data to a warehouse, thanks to Google's Kafka-based data pipeline.
Before the IT infrastructure upgrade, "our engineers were really hungry" for a more flexible environment to develop in, Pasqua said. Meetup is now "quite happy" with its "unexpected hybrid" infrastructure.
Netflix Project Titus jump-starts container-based infrastructure
It may not seem like Netflix has much room for infrastructure revolution, given it pioneered much of today's cloud computing infrastructure and many modern app-development techniques. The company's VM-based cloud infrastructure was also working handsomely before containers came along, said two of its engineers, who presented here at the Velocity conference this week on how containers added to an already-proven cloud architecture.
Containers are all the rage, but Netflix didn't jump on them right away, given its infrastructure was working well. After some investigation, though, the company decided containers could offer better quality of service for batch jobs, as well as improve code flexibility for services and form the foundation for a next-generation CI process.
Thus, the cloud bellwether embarked on Project Titus, a container cloud it designed to work with AWS Auto Scaling, Virtual Private Cloud and Identity and Access Management roles, alongside its internally developed Chaos Monkey reliability tool, Atlas monitoring utility and Eureka service registry.
Docker and Project Titus "helped generalize the use case" for batch data processing at an otherwise heavily services-oriented company, said Andrew Spyker, open source coordinator for Netflix, based in Los Gatos, Calif.
Previously, engineers who wanted to run batch jobs needed to know exactly what files and formats they had and fit them to a support list. Advanced scheduling was required, and the batch system initially ignored failures.
"That led us not to focus on reliability" or batch job processing, Spyker said. Containers have allowed Netflix to support many more batch scenarios, including fair scheduling.
Meanwhile, the developer side of the house is planning a new CI process that will result in something consistent with capabilities found in open source-based Travis CI and less complex than the current Spinnaker process, which largely supports Java apps. Containers will allow developers to use languages, such as Node.js, without having to worry about baked-in Java components in Amazon Machine Images, said Mike McGarr, developer productivity manager.
Members of the audience asked if Netflix would add Project Titus to its Open Source Software suite, but Spyker and McGarr said there are no current plans to do so and declined further comment.
Dropbox is no exception
Large-enough web companies often see economies of scale and unique scalability needs that prompt engineers to roll their own IT infrastructure utilities. Dropbox, which has written its own MySQL management and block data storage platforms to support a fleet of thousands of database servers and hundreds of petabytes of data in its eight-year lifetime, is no exception.
Still, Dropbox had advice in a session presented here at Velocity that applies to anyone looking to maximize database performance and improve production-code rollouts.
In the area of performance, solid-state drives are the only way to fly in environments with lots of random data queries and sizable amounts of data. They can be more expensive than traditional spinning disks, but Dropbox has found their reliability is superior to hard disk drives, said Tammy Butow, site reliability engineering manager for Dropbox.
When rolling out InnoDB compression to further maximize storage space, Dropbox discovered the importance of matching full-production rollouts exactly to the process of partial canary deployments, Butow said.
In the case of InnoDB, the canary rollout worked perfectly, but there was a lag between steps when the changes were rolled out across the production environment, which caused performance issues.
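The lesson about matching canary and full rollouts can be illustrated with a minimal Python sketch. It is not Dropbox's tooling: the step names, host names and the `rollout` and `per_host_sequence` helpers are hypothetical, chosen only to show one way of running every step back-to-back on each host so the full rollout follows the same per-host procedure as the canary.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical step names; the session summary does not list the real ones.
STEPS: Tuple[str, ...] = (
    "pause_traffic",
    "enable_compression",
    "rebuild_tables",
    "resume_traffic",
)


def rollout(
    hosts: Iterable[str],
    apply_step: Callable[[str, str], None],
    steps: Sequence[str] = STEPS,
) -> List[Tuple[str, str]]:
    """Run all steps on one host before moving to the next host.

    The same function serves the canary (one host) and the full fleet,
    so no host waits between steps while other hosts are processed.
    """
    log: List[Tuple[str, str]] = []
    for host in hosts:
        for step in steps:
            apply_step(host, step)
            log.append((host, step))
    return log


def per_host_sequence(log: List[Tuple[str, str]], host: str) -> List[str]:
    """Return the ordered list of steps that were applied to one host."""
    return [step for logged_host, step in log if logged_host == host]
```

In this sketch, calling `rollout(["db-canary"], apply)` and then `rollout(fleet, apply)` with the same `apply` callback gives every production host the step sequence the canary saw. A fleet-wide loop that ran the first step on all hosts before starting the second would introduce the kind of gap between steps described above.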