One of the most persistent axioms in business these days is “If you can’t measure it, you can’t manage it.” While this may not be universally true, it nonetheless captures the importance of using data to gain real insight into today’s business processes and models.
But monitoring is proving to be a significant challenge in the emerging DevOps era, not only because the worlds of development and IT operations use different tools to peer into their respective environments, but also because, in many cases, they have completely different ideas of what success actually means.
This is leading to an explosion of DevOps-facing monitoring platforms, which in itself is a good sign that enterprises undergoing the transition to agile understand the importance of accurate measurement and analysis. However, the fundamental problem remains: How, exactly, can such a fluid, fast-moving work environment be measured accurately when the goals and objectives of individual projects vary so dramatically?
According to Kent Erickson, principal product manager at IT monitoring vendor Zenoss, a top challenge is handling the speed at which DevOps processes occur. In the old days, new software products came out once a year, if that, and were generally implemented on a piecemeal basis. Nowadays, new products and upgrades are emerging weekly and are pushed out to all users automatically, which means the speed and depth of monitoring must be enhanced dramatically.
“Monitoring must become an automated step in the deployment process,” he said. “As soon as it sees code, sees new product behavior, it must make a scripted call to [reflect those changes].”
In addition, DevOps monitoring processes must account for the fact that while IT administrators may be accustomed to multitasking, developers are not.
“Interrupts destroy developer productivity because it takes a long time to get back into the workflow,” Erickson said. “You can cut interrupts in half by feeding all issues into a service desk and then kick off scripts on a system that fixes them automatically.”
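The pattern Erickson describes — routing every alert into a service desk and letting scripts handle the known fixes — can be sketched roughly as follows. The issue types and remediation scripts here are hypothetical examples, not from any particular product:

```python
# Minimal sketch of auto-remediation: alerts are filed as service-desk
# tickets rather than paging a developer, then matched against scripted
# fixes. Issue types and script names are hypothetical examples.

REMEDIATIONS = {
    "disk_full": "rotate_logs.sh",        # hypothetical cleanup script
    "service_down": "restart_service.sh", # hypothetical restart script
}

def file_ticket(queue, issue):
    """Append the issue to the service-desk queue and return a ticket id."""
    queue.append(issue)
    return len(queue) - 1

def auto_remediate(issue):
    """Return the script for a known issue, or None to escalate to a human."""
    return REMEDIATIONS.get(issue["type"])

queue = []
ticket = file_ticket(queue, {"type": "disk_full", "host": "web01"})
script = auto_remediate(queue[ticket])  # "rotate_logs.sh"
```

Only the unrecognized issues ever interrupt a developer; everything else is resolved by the mapped script.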
What to Measure?
Many organizations are also struggling with establishing the right metrics to judge DevOps projects. While traditional performance indicators like CPU usage, data throughput and storage consumption are still necessary, DevOps practitioners are also taking into account more esoteric elements like customer satisfaction, development lead time, and mean time to detection and recovery.
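Metrics such as mean time to detection and mean time to recovery are simple averages over incident timestamps. A minimal sketch, with illustrative field names and timestamps in epoch seconds:

```python
# Compute mean time to detection (MTTD) and mean time to recovery (MTTR)
# from incident records. Field names are illustrative, not from any tool.

def mttd(incidents):
    """Average seconds from fault occurrence to detection."""
    return sum(i["detected"] - i["occurred"] for i in incidents) / len(incidents)

def mttr(incidents):
    """Average seconds from detection to recovery."""
    return sum(i["recovered"] - i["detected"] for i in incidents) / len(incidents)

incidents = [
    {"occurred": 0,    "detected": 120,  "recovered": 720},
    {"occurred": 1000, "detected": 1060, "recovered": 1360},
]
# MTTD = (120 + 60) / 2 = 90 seconds; MTTR = (600 + 300) / 2 = 450 seconds
```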
Ultimately, of course, each organization is likely to craft a unique monitoring environment based on the goals and objectives of their application load. As data environments sever their direct ties to the physical world of servers, networking and storage in favor of abstract, virtual infrastructure, expect to see a rise in highly customized applications and services that will require equally customized monitoring and management systems.
Here are 10 of the leading DevOps monitoring solutions on the market today:
Collectl folds numerous performance monitoring tools into a single platform. It can run as a daemon or interactively and can monitor a wide range of subsystems, including processors, storage, nodes, file systems and TCP. It runs on all Linux distributions and is available in the Red Hat and Debian repositories.
Consul, by HashiCorp, provides service discovery, key-value storage, failure detection and other functions across multi-data-center environments. It features a built-in DNS server for querying services and supports existing infrastructure without code changes. A simple HTTP-based API provides a hierarchical key-value store, while long polling enables near-instant propagation of configuration changes.
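The hierarchical key-value idea — slash-separated keys that can be fetched individually or listed by prefix, as Consul exposes over its HTTP API — can be illustrated with an in-memory toy. This is a concept sketch only; real Consul speaks HTTP (endpoints such as `GET /v1/kv/<key>?recurse`), and the keys below are hypothetical:

```python
# Toy hierarchical key-value store illustrating the idea behind Consul's
# KV API, where keys like "config/app/db_host" can be read one at a time
# or enumerated by prefix. Concept sketch, not Consul's implementation.

class KVStore:
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def list(self, prefix):
        """Return all keys under a prefix, akin to a recursive KV query."""
        return sorted(k for k in self._data if k.startswith(prefix))

kv = KVStore()
kv.put("config/app/db_host", "10.0.0.5")
kv.put("config/app/db_port", "5432")
kv.put("config/web/workers", "8")
```

Because keys are plain paths, a service can watch just its own subtree — the property that makes the store useful for per-service configuration.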
Ganglia uses a hierarchical design optimized for federations of clusters. It relies on common technologies like XML and XDR for data representation and transport, as well as a unique data structure and algorithmic approach to reduce overhead on the node and implement a high level of concurrency.
God leverages a Ruby framework to provide a simplified approach to monitoring. Although available only on Linux, BSD and Darwin systems, it makes it easy to write custom poll and event conditions, allows each poll condition to run on its own interval, and provides an integrated, customizable notification system.
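Because a God configuration file is itself Ruby, conditions read like ordinary code. A minimal watch with a per-condition poll interval might look like the following sketch; the process name, paths and thresholds are hypothetical examples:

```ruby
# Sketch of a God config file, loaded with `god -c this_file.god`.
# Process name, pid file and the memory thresholds are hypothetical.
God.watch do |w|
  w.name     = "my-app"                       # hypothetical service
  w.start    = "/usr/local/bin/my-app start"  # hypothetical command
  w.pid_file = "/var/run/my-app.pid"

  # A poll condition with its own interval: restart the process if
  # memory stays above the threshold on three consecutive checks.
  w.restart_if do |restart|
    restart.condition(:memory_usage) do |c|
      c.interval = 5.minutes
      c.above    = 150.megabytes
      c.times    = 3
    end
  end
end
```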
Icinga provides monitoring-as-code capabilities through object-based configuration or provisioning via its REST API. It uses SSL to secure scaled-out cluster deployments, while integrating with numerous leading DevOps tools and platforms.
Nagios provides server, network and application monitoring using a combination of agent-based and agentless software tools for Windows, Linux, Unix and web environments. The system reports availability, uptime and response times in a variety of visualization and reporting formats, while community partners have already delivered more than 5,000 add-ons and plug-ins to the core software module.
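Much of that plug-in ecosystem rests on Nagios’s famously small check interface: a plug-in prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). A minimal disk-style check in that convention, with hypothetical 80%/90% thresholds:

```python
# Minimal Nagios-style plug-in: one status line on stdout, exit code
# encoding the state (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN).
# The 80%/90% thresholds are hypothetical examples.
import sys

OK, WARNING, CRITICAL = 0, 1, 2

def check_disk(percent_used, warn=80, crit=90):
    """Return (exit_code, status_line) in the Nagios plug-in convention."""
    if percent_used >= crit:
        return CRITICAL, f"DISK CRITICAL - {percent_used}% used"
    if percent_used >= warn:
        return WARNING, f"DISK WARNING - {percent_used}% used"
    return OK, f"DISK OK - {percent_used}% used"

if __name__ == "__main__":
    code, line = check_disk(72)
    print(line)
    sys.exit(code)
```

Anything that follows this print-and-exit contract — in any language — can be scheduled by the Nagios core, which is why the add-on catalog grew so large.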
New Relic offers deep analytics, performance monitoring and visibility to provide a clear picture of infrastructure conditions and application activity as it relates to business outcomes. The platform includes a browser-side visibility module, as well as optimization of Android and iOS applications. It also has a simulator to gauge the impact of changes before going live.
Specific to Unix-like systems, Monit provides automatic maintenance and repair and can even execute corrective actions in error situations. It is particularly adept at monitoring boot-level daemon processes such as sendmail and sshd. It also watches local files and directories for changes to timestamps, checksums and other attributes.
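A Monit control-file entry for a boot-level daemon such as sshd might look like the following sketch; pid paths and init commands vary by distribution:

```
# Sketch of /etc/monitrc entries; paths and commands vary by distro.
check process sshd with pidfile /var/run/sshd.pid
  start program = "/etc/init.d/ssh start"
  stop program  = "/etc/init.d/ssh stop"
  if failed port 22 protocol ssh then restart

check file sshd_config with path /etc/ssh/sshd_config
  if changed checksum then alert
```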
Prometheus uses a dimensional data model in which each time series is identified by a metric name and a set of key-value label pairs. This provides for more robust queries and improved graphical representation and visualization. The system also uses a custom time-series storage engine designed for efficient scaling, sharding and federation.
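The dimensional model — a series keyed by metric name plus label pairs, selectable by any label combination — can be illustrated in a few lines. This is a concept sketch, not Prometheus’s actual storage engine, and the metric and label names are hypothetical:

```python
# Concept sketch of Prometheus's dimensional data model: each time series
# is keyed by metric name plus a set of label key-value pairs, so queries
# can select on any label subset. Not Prometheus's real storage engine.

class SeriesStore:
    def __init__(self):
        self._series = {}

    def append(self, name, labels, timestamp, value):
        key = (name, frozenset(labels.items()))
        self._series.setdefault(key, []).append((timestamp, value))

    def query(self, name, **label_matchers):
        """Return samples from every series matching the name and labels,
        in the spirit of a selector like http_requests_total{method="GET"}."""
        out = []
        for (n, labels), samples in self._series.items():
            if n == name and set(label_matchers.items()) <= labels:
                out.extend(samples)
        return out

db = SeriesStore()
db.append("http_requests_total", {"method": "GET", "code": "200"}, 1, 10)
db.append("http_requests_total", {"method": "POST", "code": "200"}, 1, 3)
```

Selecting on `method` narrows to one series, while selecting only on `code` matches both — the flexibility that label-based identification buys over flat metric names.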
Zenoss provides a wide range of monitoring tools in its Service Dynamics platform, ranging from hybrid infrastructure management to event management, root-cause analysis and service impact. The system integrates with a wide range of IT platforms, such as HPE, IBM and Cisco, as well as analytics and orchestration tools like Splunk, Chef and Puppet.
Arthur Cole writes about infrastructure for IT Business Edge. Cole has been covering the high-tech media and computing industries for more than 20 years, having served as editor of TV Technology, Video Technology News, Internet News and Multimedia Weekly. His contributions have appeared in Communications Today and Enterprise Networking Planet and as web content for numerous high-tech clients like TwinStrata and Carpathia. Follow Art on Twitter @acole602.