A new breed of management tools is gaining traction in IT organizations. Rather than imposing strenuous new tests or metrics, these emerging tools look at perhaps the most mundane aspect of systems and applications: the log files.
Despite the sophisticated management tools used in complex and demanding data center environments to look for potential problems, subtle cause-and-effect relationships are often impossible to divine, complicating everyday troubleshooting and optimization efforts. Log analysis tools help IT pros make better decisions faster in complex data center infrastructures.
Virtually all systems and applications produce log files. Logs are detailed, time-stamped records of behaviors, conditions and events. In practice, individual log files have only limited value. But when tools automatically digest and compare a multitude of diverse log files, administrators gain a new picture of important and problematic behaviors, along with insights that would have been impossible to discern through painstaking manual log file analysis.
An embarrassment of riches
Documenting key events in computer program execution is not a new idea. Programmers have used "output" statements (depending on the programming language) to report important events since the dawn of software development; saving each event to a file for later reference is a standard practice. Today, operating systems and applications routinely log all manner of events, along with positive and negative outcomes: A Web server might record each successful or failed page request; Microsoft Active Directory tracks user logon attempts or changes; a database server logs queries and results, and so on.
Simple logging has its limitations. Each log file only pertains to the related application or operating system, so an administrator needs to open and review logs individually. It's not hard to locate errors or critical events taking place in a particular log, but it's exceedingly difficult to identify cause-and-effect relationships between different systems or applications. Even the most skilled administrator doesn't have the time to sit and compare multiple logs side-by-side; it's simply not practical for humans to digest all of the activities taking place to see complex relationships.
Log analytics on the case
Troubleshoot: Trace an erratic or seemingly random problem back to the fault, configuration change or other setting that caused it.
Correlate: Log analytics bring to light cause-and-effect relationships between disparate data center systems -- especially among heterogeneous systems lacking common management tools.
Secure: Find malicious users through access attempts or other suspicious activity.
Convince: Use log data on capacity and performance to justify architectural changes and upgrades.
Log management and analytical tools digest, process and report on all of the log data produced by hardware systems, operating systems, hypervisors and applications across the data center. The results of such analysis help inform critical decisions.
Log analytics help with event correlation and troubleshooting. For example, a change in a network switch configuration might cause storage subsystem timeouts for certain applications' users. Log analytics show the point where issues began and note any preceding events. This improves root cause identification and enables decisive fixes for erratic or seemingly unrelated problems.
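As a rough illustration of that kind of correlation, the Python sketch below merges three small logs into one timeline and lists what happened shortly before the first error. The log entries, host names, the `(timestamp, source, message)` layout and the five-minute window are all assumptions made for the example; commercial tools use their own formats and far more sophisticated matching.

```python
from datetime import datetime, timedelta

# Hypothetical, simplified entries: (ISO timestamp, source, message).
switch_log = [
    ("2015-03-02T10:14:55", "switch-01", "config change: port 12 settings updated"),
]
storage_log = [
    ("2015-03-02T10:15:20", "san-01", "ERROR timeout on LUN 7"),
    ("2015-03-02T10:16:05", "san-01", "ERROR timeout on LUN 7"),
]
app_log = [
    ("2015-03-02T10:05:00", "app-01", "INFO nightly job finished"),
    ("2015-03-02T10:15:25", "app-01", "ERROR storage read failed"),
]

def merge_logs(*logs):
    """Combine several logs into one timeline ordered by timestamp."""
    events = [
        (datetime.fromisoformat(ts), source, message)
        for log in logs
        for ts, source, message in log
    ]
    return sorted(events, key=lambda event: event[0])

def events_before_first_error(timeline, window=timedelta(minutes=5)):
    """Return the first ERROR event and every event in the window before it."""
    first_error = next((e for e in timeline if "ERROR" in e[2]), None)
    if first_error is None:
        return None, []
    start = first_error[0] - window
    preceding = [e for e in timeline if start <= e[0] < first_error[0]]
    return first_error, preceding

timeline = merge_logs(switch_log, storage_log, app_log)
first_error, preceding = events_before_first_error(timeline)

print("First error:", first_error[1], first_error[2])
for when, source, message in preceding:
    print("Preceded by:", when.isoformat(), source, message)
```

Here the switch configuration change is the only event inside the window before the first storage timeout, which makes it the obvious place to start investigating. It is a lead, not proof of cause.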
"I'm starting to drum up [in-house] interest in IT operational analytics, where log file entries for infrastructure capabilities can be aggregated to identify broad issues that were only previously associated with a single server or single application," said Tim Noble, IT director and advisory board member from Reach IPS, a cloud and managed services provider in Cupertino, Calif. Noble expects to focus on troubleshooting and optimization efforts across a broad array of systems.
Security is another vital use for log analytics. For example, a new user added to Active Directory might coincide with a significant uptick in unauthorized storage access attempts. Log analytics can report those attempts and relate noteworthy events that provide important clues about security violations or lead to the identity of the malicious user. As another example, organizations subject to government or industry regulation can rely on log analytics to ensure compliance with regulations regarding security, system access and so on.
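A minimal sketch of the security case, in Python: count denied access attempts per user and flag anyone at or above a threshold. The `ACCESS_DENIED user=...` line format, the user names and the threshold of three are hypothetical; real directory and storage logs look different and would need their own parsing rules.

```python
import re
from collections import Counter

# Hypothetical log lines; real logs use vendor-specific formats.
access_lines = [
    "2015-03-02T09:00:01 ACCESS_OK user=jsmith share=/finance",
    "2015-03-02T09:02:10 ACCESS_DENIED user=newuser share=/finance",
    "2015-03-02T09:02:14 ACCESS_DENIED user=newuser share=/hr",
    "2015-03-02T09:02:19 ACCESS_DENIED user=newuser share=/legal",
    "2015-03-02T09:05:43 ACCESS_DENIED user=jsmith share=/hr",
]

DENIED = re.compile(r"ACCESS_DENIED user=(\S+)")

def denied_attempts(lines):
    """Count denied access attempts per user."""
    counts = Counter()
    for line in lines:
        match = DENIED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

def suspicious_users(counts, threshold=3):
    """Return users with at least `threshold` denied attempts, sorted by name."""
    return sorted(user for user, n in counts.items() if n >= threshold)

counts = denied_attempts(access_lines)
print(suspicious_users(counts))
```

A single denied attempt, like the one from `jsmith`, stays below the threshold, while the burst from `newuser` is flagged for a closer look.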
Log reports and error messages also help justify capacity planning or architectural changes to maintain or improve service performance.
Tools of the trade
As log analysis tools proliferate, IT decision makers must select the best one or suite for their specific data center and business needs. This requires a careful consideration of each product's feature set and requirements.
"I evaluate [SaaS log tools] on price, security, reliability and features. I need to organize the data, create meaningful dashboards, interact via APIs and set up log monitoring and alerts," said Ben Whaley, a San Francisco Bay area technology consultant.
Points to consider when choosing the best tool include the following:
- Assess the analytical needs. Tools vary in their ability to ingest, parse and process log files, so decide which log files are required in the management and analytical initiative. Examples include operating system logs from Windows or Linux servers; Windows Active Directory logs; and network logs from DHCP servers, firewalls, VPNs, routers and switches. Log management and analytics tools should always be virtualization-aware. Security-focused log analytics might need to support endpoint security or identity authentication tools like Lightweight Directory Access Protocol (LDAP), Trustwave Data Loss Prevention, Vormetric data security products and others. You may also need logs for specific business applications like Microsoft SharePoint, database platforms such as Oracle or SQL, electronic medical records and so on.
- Weigh analytical capabilities and reporting. Compatibility with current log files isn't quite enough. As log management and analytics tools proliferate, they will also specialize, which will matter to the IT team using them. For example, if recognizing and investigating security events is your principal goal, consider a security information and event management (SIEM) tool, rather than a tool with extra features around capacity prediction.
Also look at the way in which data is processed, accessed and reported. Some tools provide dashboard-style instrumentation while other tools generate detailed, formal and configurable reports.
Searchable analytics allow users to locate and correlate events on demand. "Integrating log analytics lets us identify both negative and positive events in the system using keywords like 'successful startup' and 'failed process' to assess the success of our upgrades and identify and track issues as they occur," said the CTO of one federal government contractor.
- Consider the platform approach. Some log management and analysis tools are installed locally like any other traditional management tool. Examples include ManageEngine's EventLog Analyzer, SolarWinds' Log and Event Manager and AWStats. With local installation, users exercise direct control over the installation and behaviors associated with data collection, storage, processing and reporting.
A growing number of tools -- Loggly, Splunk, Sumo Logic, Sematext -- are available online as cloud services or software as a service (SaaS). These services work for a monthly fee without any of the hardware or IT staff overhead associated with installing or maintaining management tools.
Don't forget APIs, which enhance the interface between the tool and various logs or other applications in the enterprise.
- Check the tool's scalability. Regardless of the platform choice, the selected log management and analytics platform must support your operation's current and foreseeable future scale. Collecting, storing, processing, correlating and reporting log data from tens of thousands of servers and devices carries different requirements than managing only several hundred systems.
- Evaluate the installation requirements. Any log management and analytics software will impose computing overhead on the environment, so verify that you have ample server and storage resources to deploy the tool properly. As one example, SolarWinds' Log and Event Manager notes system requirements that include VMware ESXi 4.0 or Hyper-V Server 2008 R2 and later, a dual-processor server at 3 GHz, 8 GB of system memory and 250 MB of storage for the application. Increased data collection rates or deployments involving large data center infrastructures often impose greater requirements, such as more processors and additional network bandwidth.
Tools deployed as appliances are another option: SevOne's Performance Log Appliance, for example, essentially handles any hardware requirements within the appliance itself.
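When comparing search features, it helps to have a baseline in mind. The keyword searches the federal contractor's CTO describes above, using phrases like "successful startup" and "failed process," amount to something like the Python sketch below. The log lines and the `search_logs` helper are hypothetical and for illustration only; a real tool indexes the data rather than scanning every line.

```python
# Hypothetical log lines from an upgrade window.
upgrade_lines = [
    "2015-03-02T22:00:04 web-01 Successful startup of service httpd",
    "2015-03-02T22:00:09 web-02 Failed process: httpd exited with code 1",
    "2015-03-02T22:01:30 db-01 Successful startup of service postgres",
    "2015-03-02T22:02:11 web-02 Retrying service httpd",
]

def search_logs(lines, keywords):
    """Map each keyword to the lines that contain it, ignoring case."""
    hits = {keyword: [] for keyword in keywords}
    for line in lines:
        lowered = line.lower()
        for keyword in keywords:
            if keyword.lower() in lowered:
                hits[keyword].append(line)
    return hits

results = search_logs(upgrade_lines, ["successful startup", "failed process"])
for keyword, matches in results.items():
    print(f"{keyword}: {len(matches)} match(es)")
    for line in matches:
        print("   ", line)
```

A tool under evaluation should do at least this much on demand, across all sources at once, and let you save the search as an alert.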
Avoid log analysis pitfalls
It sounds so simple: Just funnel all of your log files into a tool and analytical wizardry will reveal events and associations. Not so fast.
The most notable issue is log compatibility. Log file structures, formats, context and content vary substantially between hardware devices, operating systems, applications and other sources. One log management and analytics tool may struggle to open and ingest the dizzying array of log types and formats generated across the infrastructure. Assessing the organization's analytical needs up front reduces the chances of compatibility problems, but they still crop up.
"There are some issues with multi-line logging that most vendors don't seem to get right," said Whaley. "If a log statement comprises multiple lines, the message is split in the interface and it can be difficult to piece together."
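One common way to handle the multi-line problem is to treat any line that does not begin with a timestamp as a continuation of the previous record. The Python sketch below assumes every new record starts with an ISO-style timestamp, which is an assumption about the log format and not something every application guarantees; the sample stack trace is made up.

```python
import re

# Assumption: a new record always starts with a timestamp like 2015-03-02 10:15:20.
STARTS_RECORD = re.compile(r"^\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}")

def group_multiline(lines):
    """Join continuation lines (no leading timestamp) onto the preceding record."""
    records = []
    for line in lines:
        text = line.rstrip("\n")
        if STARTS_RECORD.match(text) or not records:
            records.append(text)          # start of a new record
        else:
            records[-1] += "\n" + text    # continuation, e.g. a stack trace line
    return records

raw_lines = [
    "2015-03-02 10:15:20 ERROR Unhandled exception in OrderService",
    "java.lang.NullPointerException",
    "    at com.example.OrderService.submit(OrderService.java:42)",
    "    at com.example.Main.main(Main.java:10)",
    "2015-03-02 10:15:21 INFO Request completed",
]

records = group_multiline(raw_lines)
print(len(records), "records")
print(records[0])
```

The five raw lines collapse into two records, so the exception and its stack trace stay together as one searchable message.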
A second issue occurs with underlying log message timestamps. Log tools typically depend on timestamps to correlate events in each log (especially if analytics are not occurring in real time). Variations in clock coordination won't prevent serious events from being recorded or reported, but they might cause the log tool to miss possible cause-and-effect associations between logs -- one of the main reasons you deployed it. Check if you need to synchronize device clocks or use tools to normalize the time tracking between different logs.
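To see why normalization matters, consider the Python sketch below. It converts timestamps that carry different UTC offsets to UTC and then subtracts a known clock error for each device. The host names, the offsets and the 30-second skew in `CLOCK_SKEW` are hypothetical values chosen for the example; in practice, synchronizing clocks with a protocol such as NTP is preferable to correcting them after the fact.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical measured clock error per source: how far the device clock
# runs AHEAD of true time. Unlisted sources are assumed to be accurate.
CLOCK_SKEW = {"switch-01": timedelta(seconds=30)}

def normalize(timestamp, source):
    """Convert an ISO timestamp with a UTC offset to skew-corrected UTC."""
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        raise ValueError(f"timestamp has no UTC offset: {timestamp!r}")
    as_utc = parsed.astimezone(timezone.utc)
    return as_utc - CLOCK_SKEW.get(source, timedelta(0))

# The storage array logs in Pacific time; the switch logs in UTC but runs fast.
storage_error = normalize("2015-03-02T02:15:20-08:00", "san-01")
switch_change = normalize("2015-03-02T10:15:25+00:00", "switch-01")

print("storage error:", storage_error.isoformat())
print("switch change:", switch_change.isoformat())
print("change came first:", switch_change < storage_error)
```

Taken at face value, the switch change appears to come five seconds after the storage error. Once the time zone and the clock skew are accounted for, it lands 25 seconds before the error, which is the ordering a correlation engine needs to see.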
A third limitation arises in the analytical results. You may not want to see every event; you may only want to see negative events, or to search for specific events. Limited or absent search capabilities can make it harder to find specific issues like storage errors or failed logon attempts. There is also no guarantee that the tool will provide actionable guidance once it identifies an issue and produces an alert. A log management and analytics tool is only useful if it can alert you to issues, help locate issues on demand and provide actionable advice when issues are detected. Seeing correlated activities helps, but you might still have to settle in for some serious troubleshooting or investigation to determine the actual problem.
Due diligence can thwart each of these potential issues. Test prospective log management and analytics products with free vendor demos and invest in long-term proof-of-principle projects to vet log tools carefully. Find the product or service that will deliver results that can best benefit your particular organization, and be sure to examine the product roadmap to see if future iterations of the log tool match your own data center plans.
Digging for gold
In the years ahead, expect more uniformity in log content, making tools even better at spotting and reporting problems.
"I expect to have automatic alerts when abnormal patterns are detected," said Whaley. "I would love to see some standards supported by all vendors that developers can then adopt within applications to make log messages universally understandable."
Log management and analytics tools aren't replacements for the rich suite of systems or infrastructure management tools that many IT organizations already use. But the potential to assess and utilize a wealth of existing log information -- often overlooked today -- is a tantalizing new capability that can coexist with current management deployments.