Machine learning is coming to the data center, both to improve internal IT management and to embed intelligence into key business processes. You have probably heard of the somewhat mystical "deep learning" that threatens to infuse everything from systems management to self-driving cars. Is deep learning some really smart artificial intelligence that was just created and is about to be unleashed on the world, or simply marketing hype aiming to relaunch complex machine learning algorithms in a better light?
It definitely fires the imagination, but it's actually not that complicated. At a technical level, deep learning mostly refers to large compute-intensive neural networks running at scale. These networks are often trained over big data sets that might, for example, include imagery, speech, video and other dense data with inherently complex patterns difficult for more logical, rules-based machine learning approaches to master.
Neural networks and deep learning themselves are not new. Almost from the beginning of the modern computer age, neural network algorithms have been researched to help recognize deep patterns hidden in complex data streams. In that sense, deep learning is built on familiar machine learning techniques. Yet the application of newer, more computationally complex forms of neural network algorithms to today's big data sets creates significant new opportunities. These "deep" models can be created and applied in real-time (at least faster than human time) at large scales, using affordable clouds or commodity scale-out big data architectures.
Impressionable neural networks
Neural networks were first explored back in the '50s and '60s as a model for how the human brain works. They consist of layers of nodes that are linked together like neurons in the brain into a large network. Each node receives input signals, and in turn, activates an outgoing signal sent to other nodes according to a pre-defined "activation function" that determines when that node should turn on. Basically you can think of how a node works in terms of excitement -- as a node gets increasingly excited by the combination of its inputs, it can generate some level of output signal to send downstream. Interestingly, a node can get excited and signal either positively or negatively; some nodes when activated actually inhibit other nodes from getting excited.
Nodes are interconnected by links that each have their own weight variable. A link's weight modifies any signal it carries. Neural networks adapt and learn to recognize patterns by incrementally adjusting their whole network of link weights, so that eventually only recognized patterns create a full cascade of excitement through the network.
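To make the idea of an excitable node concrete, here is a minimal sketch in Python. It assumes a sigmoid activation function and made-up weights; the names sigmoid and node_output are illustrative only and do not come from any particular library.

```python
import math

def sigmoid(x: float) -> float:
    """One common activation function: squashes any total into the range 0..1."""
    return 1.0 / (1.0 + math.exp(-x))

def node_output(inputs, weights, bias=0.0):
    """Weight each incoming signal by its link, sum them, then apply the activation."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(total)

# Two positive link weights: both inputs excite the node.
excited = node_output([1.0, 1.0], [2.0, 2.0])

# One negative link weight: the second input inhibits the node.
inhibited = node_output([1.0, 1.0], [2.0, -4.0])

print(round(excited, 3), round(inhibited, 3))
```

With the same two inputs, the node fires strongly when both link weights are positive and stays mostly quiet once one link carries a negative weight. Adjusting those weights is all that "learning" changes.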
Generally, input data is formatted into an incoming signal linked into a first layer of exposed nodes. These nodes in turn send signals into one or more hidden layers, with a final output layer of nodes assembling an "answer" for the outside world. Because the learning (i.e., the intelligence) becomes embedded in the link weights, the key to practical use is figuring out how to adjust or train all the hidden link weights to respond to the right patterns. Today, neural networks mainly learn to recognize patterns found in training data by using an incremental technique called back-propagation. This method proportionally "rewards" links when they contribute to a correct answer and penalizes them when they contribute to a wrong one.
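The sketch below shows roughly what that incremental adjustment looks like for a tiny network with two inputs, one hidden layer of two nodes and one output node. Everything specific here is an assumption made for illustration: the toy training data (the output should turn on when either input is on), the starting weights, the sigmoid activation, the squared-error measure and the learning rate. Real deep learning frameworks automate these steps at far larger scale.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical training data: the output should be 1 when either input is on.
DATA = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

# Fixed, arbitrary starting weights so the run is repeatable.
w_hidden = [[0.5, -0.4], [-0.3, 0.6]]  # w_hidden[j][i]: link from input i to hidden node j
b_hidden = [0.1, -0.1]
w_out = [0.3, -0.2]                    # link from hidden node j to the output node
b_out = 0.0

def forward(x):
    """Push an input through the hidden layer to the output node."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(w_hidden[j], x)) + b_hidden[j])
              for j in range(2)]
    out = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, out

def mean_squared_error():
    return sum((forward(x)[1] - t) ** 2 for x, t in DATA) / len(DATA)

def train(epochs=2000, rate=0.5):
    global b_out
    for _ in range(epochs):
        for x, target in DATA:
            hidden, out = forward(x)
            # Error signal at the output node (squared error through the sigmoid).
            delta_out = (out - target) * out * (1.0 - out)
            # Pass the error back to each hidden node through its outgoing link.
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                            for j in range(2)]
            # Nudge every link weight against its share of the error.
            for j in range(2):
                w_out[j] -= rate * delta_out * hidden[j]
                for i in range(2):
                    w_hidden[j][i] -= rate * delta_hidden[j] * x[i]
                b_hidden[j] -= rate * delta_hidden[j]
            b_out -= rate * delta_out

error_before = mean_squared_error()
train()
error_after = mean_squared_error()
print(error_before, error_after)
```

Each pass makes only a small correction, which is why training is described as incremental: the error shrinks over many repetitions rather than in one step.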
However, there is no one right network architecture for any given problem. This is one area in which machine learning expertise is invaluable, as there can be an effectively infinite number of potential neural network configurations considering the number of nodes, their activation functions, the number of hidden layers and how all the nodes are interconnected (e.g., with dense or sparse links, with or without internal feedback or recurring loops, and so on). Traditionally, neural networks have been limited to only a handful of hidden layers, which can still be surprisingly adept at learning patterns well beyond conscious human capabilities. Deep learning neural networks, by contrast, may have hundreds of layers to capture deep subtleties.
The key to the practical application of deep learning is in figuring out how to effectively scale out large neural networks across hundreds or thousands of compute cores in parallel, and then efficiently train them on massively large data sets. This used to require unique high performance computing (HPC), beyond the scope of the enterprise data center. Today, companies like NVIDIA, Mellanox and DataDirect Networks are bringing HPC within reach of the corporate data center. For example, NVIDIA's DGX-1 box is basically a hyper-converged supercomputer designed for deep learning with eight high-end GPUs wrapped in a surprisingly affordable 4U appliance.
AlphaGo's neural networks and deep learning combo goes big time
Cloud vendors like Google also offer hosted machine learning tools. As an example, Google's AlphaGo game-playing program recently beat a world-class Go champion at the highest level of play. The game of Go was thought to be one of the last frontiers where the unique capabilities of human intelligence couldn't be matched by a machine, given that Go can't be practically solved by simple brute-force computing (fully computing the best move on a 19x19 Go board requires more computing power than is ever likely to exist). Instead, you could say the AlphaGo team took a shortcut by training its deep learning program to play only the best moves that any human player ever played before. The program was then able to get even better by playing another version of itself.
Under the hood, AlphaGo consists mainly of two large neural networks linked together (with some Monte Carlo simulation to prune any large set of "too many choices to compute" down to a smaller set of the more likely options). The first neural network was trained with millions of past game board positions so that it can identify the most likely "next play" (as made by a player who would go on to win the game). The second neural network was trained to estimate the value each new board position might have, basically rewarding positions that occurred more often in games that were eventually won. These two networks were used recursively (in combination) to look ahead a finite number of moves to help pick the most valuable current move. The bottom line here is that deep learning methods learned from the best players, and now can beat them in real time without brute force computation.
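The way a "next play" network and a position-value network can combine in a look-ahead can be sketched on a far simpler game. The code below is not AlphaGo's code. It assumes a toy stone-taking game (take one to three stones, whoever takes the last stone wins), uses hand-written stand-in functions named policy and value in place of trained neural networks, and uses a plain depth-limited search rather than the Monte Carlo simulation described above.

```python
def policy(stones):
    """Stand-in for the first network: rank legal moves, most promising first.
    The ranking rule here is made up purely for illustration."""
    moves = [m for m in (1, 2, 3) if m <= stones]
    return sorted(moves, key=lambda m: (stones - m) % 4 == 0, reverse=True)

def value(stones):
    """Stand-in for the second network: estimated worth of a position
    for the player about to move (positive is good for that player)."""
    if stones == 0:
        return -1.0  # the opponent just took the last stone
    return -0.5 if stones % 4 == 0 else 0.5

def look_ahead(stones, depth, top_k=2):
    """Search a finite number of moves ahead, considering only the top_k
    moves the policy suggests and scoring the end positions with value()."""
    if stones == 0 or depth == 0:
        return value(stones), None
    best_score, best_move = -float("inf"), None
    for move in policy(stones)[:top_k]:
        # The opponent's gain is our loss, so flip the sign of their score.
        score = -look_ahead(stones - move, depth - 1, top_k)[0]
        if score > best_score:
            best_score, best_move = score, move
    return best_score, best_move

print(look_ahead(10, depth=3))  # prints (0.5, 2): take 2 stones, leaving 8
```

The policy stand-in keeps the search narrow by discarding unlikely moves, and the value stand-in lets the search stop early instead of playing every game to its end. Those are the two jobs the article attributes to AlphaGo's two networks.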
Deeply learning enterprise IT?
Deep learning programs can even surprise their trainers by learning sophisticated patterns from past data that work well in unexpected or seemingly orthogonal situations. But fundamentally, they can't actually predict a pattern they haven't been trained on -- they can only learn from situations they have encountered before. In addition, they can't explain what they've learned in logical terms or rules that can be easily extracted.
There are also limits in ultimate accuracy. With any machine learning technique, there is always a fine balance between becoming too specific (e.g., too closely memorizing exact historical data as a kind of lookup table) and remaining too general to be useful (e.g., simply giving the single most likely value, no matter the input). Data scientists strive to find an optimal balance for the specific problem at hand.
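A small sketch can illustrate both extremes. The readings and labels below are invented for illustration (hypothetical utilization percentages marked "ok" or "alert"), and the three "models" are deliberately simplistic stand-ins rather than neural networks: a lookup table, a constant guess, and a learned threshold that sits between the two.

```python
from collections import Counter

# Invented training data and unseen data: (reading, label) pairs.
TRAIN = [(10, "ok"), (35, "ok"), (50, "ok"), (65, "ok"), (85, "alert"), (95, "alert")]
UNSEEN = [(20, "ok"), (60, "ok"), (70, "ok"), (90, "alert"), (99, "alert")]

# Too specific: memorize the exact historical data as a lookup table.
LOOKUP = dict(TRAIN)

def memorizer(reading):
    return LOOKUP.get(reading)  # has no answer for a reading it never saw

# Too general: always give the single most likely value, whatever the input.
MOST_COMMON = Counter(label for _, label in TRAIN).most_common(1)[0][0]

def constant(reading):
    return MOST_COMMON

# In between: learn one cut-off halfway between the two groups.
highest_ok = max(r for r, label in TRAIN if label == "ok")
lowest_alert = min(r for r, label in TRAIN if label == "alert")
THRESHOLD = (highest_ok + lowest_alert) / 2

def threshold_rule(reading):
    return "alert" if reading > THRESHOLD else "ok"

def accuracy(model, examples):
    """Fraction of examples the model labels correctly."""
    return sum(model(r) == label for r, label in examples) / len(examples)

for name, model in [("memorizer", memorizer), ("constant", constant),
                    ("threshold", threshold_rule)]:
    print(name, accuracy(model, TRAIN), accuracy(model, UNSEEN))
```

On this toy data the memorizer is perfect on the readings it has seen and wrong on every new one, the constant guess is mediocre everywhere, and the simple threshold handles both sets. Real data is rarely this tidy, which is why finding the balance takes judgment.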
Deep learning can be incredibly useful anytime we have large amounts of training data available. Every day in IT we generate more and more machine data that could be used to develop useful intelligence. For example, in security applications, neural networks can learn to identify deep patterns signaling possible intrusion or hacking. Neural networks can even be trained on time-series data to learn the dynamic normal (and abnormal) behavior of both key workloads and resources. It's entirely likely that Google is exploring how to leverage AlphaGo-like capabilities to help avoid outages and optimize resource utilization across its cloud-scale infrastructure.
If you want to learn more about neural networks and deep learning, I recommend spending a few minutes playing around with the neat interactive example. We should all get prepared for the day when we will be able to wire our brains directly into the data center network!
About the author:
Mike Matchett is senior analyst at Taneja Group. Reach him on Twitter: @smworldbigdata.