AUSTIN, Texas -- Kubernetes release 1.9, which ships this week, boasts better stability, but users have said it remains a challenge to operate.
The rapid growth of KubeCon, the annual gathering for Kubernetes users and open source contributors held here last week, reflects the container orchestration platform's rise from obscurity to near-mainstream status. Attendance soared to 4,300, more than double the previous year's count.
Updates introduced to projects maintained by members of the Cloud Native Computing Foundation (CNCF) during KubeCon were almost as numerous. In addition to Kubernetes 1.9, in which several core APIs reached stability, a number of utilities that underpin the platform have attained version 1.0 status. Among them are Docker's containerd container runtime project, the networking project CoreDNS, a distributed log data collection platform called Fluentd, and Jaeger, the distributed tracing platform developed at Uber.
In the Kubernetes 1.9 release, a group of APIs dubbed the Workload APIs became generally available: DaemonSet, Deployment, StatefulSet and ReplicaSet. A DaemonSet runs a copy of a daemon on every node (or on a selected set of nodes) in a Kubernetes cluster, which is important for shared data stores and log collection. The Deployment API supports declarative updates to Pods and ReplicaSets. StatefulSets support long-running stateful applications with persistent storage, and ReplicaSets keep a specified number of identical Pod replicas running, which supports high availability and recovery from Pod failures.
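To make the Workload APIs more concrete, here is a minimal sketch of a Deployment object built as a plain Python dictionary. It assumes the apps/v1 API group, where these APIs became generally available in 1.9; the name "web", the image "nginx:1.13" and the helper deployment_manifest are hypothetical values chosen for illustration, not anything the Kubernetes project prescribes.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int) -> dict:
    """Build a minimal apps/v1 Deployment as a dict (hypothetical helper and values)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",  # API group/version of the GA Workload APIs
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            # The Deployment manages a ReplicaSet that keeps this many Pods running.
            "replicas": replicas,
            # In apps/v1 the selector is required and must match the Pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


if __name__ == "__main__":
    manifest = deployment_manifest("web", "nginx:1.13", replicas=3)
    print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed output could be saved to a file and submitted with kubectl apply -f. Changing the image field and reapplying is the kind of declarative update the Deployment API is designed for.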
With these updates, the maintainers of the core Kubernetes platform APIs have assured users they won't release changes that break fundamental functions in future versions. That, in turn, means the container orchestration platform is now ready for production use in enterprise IT shops, Kubernetes release team leads said.
"It's a big cue for enterprise users with reservations about moving production workloads to anything beta that Kubernetes is ready," said Jaice Singer DuMars, Kubernetes project ambassador at Microsoft. The move of Windows container support from alpha to beta with this Kubernetes release is another sign the platform is ready for mainstream IT use, he said.
However, users who have already deployed Kubernetes in production call it a powerful tool but still urge caution, and above all attention to detail, as their enterprise peers explore it.
Kubernetes release shows maturity, but not simplicity
Kubernetes has captured interest because it allows enterprises to solve thorny IT problems in a novel way. For example, Kubernetes helped Gannett's USA Today Network stay on top of its coverage on the night of the 2016 presidential election, when its web traffic surged to more than 83 million page views, an increase of more than 1,000% over its typical daily traffic. The platform as a service (PaaS) team for USA Today also handled 170 deployments that night.
This year, as hurricanes Harvey and Irma struck the U.S. mainland, the platform handled more than 1,500 successful deployments. Meanwhile, daily PaaS costs dropped by hundreds of dollars, and website deployments that each took two hours before Kubernetes were reduced to an average of 25 minutes.
"It was a transformative project for us," said Ronald Lipke, senior engineer in the PaaS team at Gannett, in a presentation at KubeCon. "We could not have made those [1,500] deployments the [old] way."
This Kubernetes release also addresses some tricky technical details that have frustrated experienced users at Nordstrom, the Seattle-based clothing retailer. Its engineers, who worked with Kubernetes prior to the 1.0 release, shared war stories in a KubeCon session called "101 Ways to Crash Your Cluster."
Kubernetes release 1.9 fixes an obscure issue that crashed the Nordstrom cluster this year and caused an outage in which "it was not simply out of service, but violently wrong," said Emmanuel Gomez, principal software engineer at Nordstrom. "Nothing made sense."
Nordstrom engineers traced the issue back to the nodes that ran etcd, the database that stores state for the Kubernetes cluster. The database entered something akin to* a split-brain scenario, in which data was inconsistent among nodes because, before Kubernetes 1.9, read operations could return stale data. Kubernetes 1.9 addresses this by making quorum reads the default behavior, so a read is confirmed by a majority of etcd members and reflects the latest committed data.
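The difference between a plain read and a quorum read can be illustrated with a toy model. This is a simplified sketch of the general idea only, not how etcd implements it: etcd relies on its Raft consensus protocol (coordinated through the cluster leader) rather than comparing revisions across members. The Replica class and the local_read and quorum_read functions are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    """One member of a toy replicated key-value store (not etcd's real data model)."""
    name: str
    value: str
    revision: int  # higher revision = more recently committed write


def local_read(replica: Replica) -> str:
    # Serve the read from whichever member received it, even if that member lags.
    return replica.value


def quorum_read(replicas: List[Replica]) -> str:
    # Toy majority read: ask a majority of members and keep the newest answer.
    # Any majority overlaps the majority that acknowledged the latest write,
    # so the newest committed value is always among the responses.
    majority = len(replicas) // 2 + 1
    responses = replicas[:majority]
    newest = max(responses, key=lambda r: r.revision)
    return newest.value


if __name__ == "__main__":
    # etcd-2 missed the latest write; the other two members acknowledged it.
    cluster = [
        Replica("etcd-2", "old", 1),
        Replica("etcd-0", "new", 2),
        Replica("etcd-1", "new", 2),
    ]
    print(local_read(cluster[0]))  # prints "old": a stale read
    print(quorum_read(cluster))    # prints "new": the majority sees the latest write
```

The trade-off in this model is latency: a quorum read must hear from several members before answering, which is why it was not always the default.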
"Any number of small things could turn into a big issue" with Kubernetes at scale, Gomez said. And he urged fellow enterprise users to be vigilant with the platform's frequent updates: "Make sure you read the documentation like going to church on Sunday." That is to say, regularly and often.
Lipke's team has struggled to keep pace with frequent Kubernetes releases, and testing and deploying version 1.9 will probably stretch well into the first quarter of 2018, he said.
"If we were working on just this, we could probably do it [faster], but we have a whole other platform to manage, plus our integrations with [HashiCorp's] Vault and Consul," he said.
Cloud portability compounds complexity
CNCF officials point to broad support for Kubernetes among cloud service providers as the cure for the difficulty of managing it. Enterprises without the hardcore expertise to deploy Kubernetes on premises will turn to cloud and managed service providers that offer turnkey Kubernetes services, said Chris Aniszczyk, COO of the CNCF.
But this won't address the popular enterprise hybrid cloud approach, which still requires on-premises Kubernetes deployments to support workload portability and unified management between public and private clouds.
The CNCF will work toward making cluster deployments more consistent across clouds with another feature introduced in the Kubernetes 1.9 release: alpha support for the Container Storage Interface (CSI). CSI will make storage providers' wares compatible with Kubernetes clusters, as well as with clusters managed by Docker Swarm Mode, Mesosphere's DC/OS and Cloud Foundry, Aniszczyk said. Next year, cloud providers will also certify their Kubernetes offerings with the CNCF to ensure consistency between them.
Still, networking bugs crop up even when separate clouds share the same data center, as Bloomberg found in an experiment that attempted to link a bare-metal Kubernetes cluster with one built on OpenStack. And while CSI opens the door for more storage providers to join the Kubernetes persistent storage market, it will take time to mature, said Steven Bower, search and data science infrastructure lead at Bloomberg, the global finance, media and tech company based in New York.
Enterprise IT shops also lack a solid open source, cloud native distributed storage system for running stateful Kubernetes workloads on premises, said Andrey Rybka, technical architect in the office of the CTO at Bloomberg.
"Kubernetes doesn't have any awareness of infrastructure as a service," Rybka said. "You need some other provisioning tool below Kubernetes to handle resource utilization at the VM and physical machine level, and the public IaaS APIs are all different."
Enterprises often stick to cloud-hosted versions of Kubernetes, but there are kinks to work out there, too. USA Today, for example, ran into a bug involving the Container Network Interface (CNI) and Google Container Engine, which is now known as Google Kubernetes Engine (GKE).
"If you set a 'cloud provider equals GKE' kubelet flag, which is required to get GKE-specific features like persistent volumes, and you're using a CNI plug-in, it will still try and create [network] routes in GKE, and you'll have nodes that come up, but there are no routes to them," Lipke said. "The workaround right now is to not set a cloud provider, which kind of doesn't work, because then we can't use persistent volumes or StatefulSets."
* information updated after publication