December 13, 2018

Harness 24x7 Service Guard Empowers Developers with Total Operational Control

When Harness came out of stealth, we entered the Continuous Delivery market with several unique capabilities. Our Smart Automation helped customers build deployment pipelines in minutes, and our Continuous Verification helped developers automate the verification and rollback of their deployments.

That's how we've helped our first 35 customers move fast without breaking things.

Today, we're announcing 24x7 Service Guard, which is Continuous Verification on steroids.

Harness 24x7 Service Guard is like giving developers a dedicated bodyguard that watches their production apps 24x7. If something bad happens, it automatically rolls back code changes to protect them.

Why 24x7 Service Guard?

Our initial Continuous Verification focused on deployments or canary phases, analyzing the performance and quality of new code during the first 15-20 minutes of its life. Customers could customize this verification duration, but it was always finite in scope. This capability was great at catching two-thirds of performance anomalies and quality regressions because most applications fail within minutes of deployment.

24x7 Service Guard was created to catch the anomalies and regressions that surface many hours after a new deployment. Sometimes deployments are done out-of-hours when minimal traffic is using the app, or specific functionality in the app might not be accessed or stressed immediately by users.

At the same time, our customers were struggling with monitoring tool fatigue. They had one of everything to monitor different aspects of their application. In a microservices world, a customer could have tens of microservices with tens of different monitoring tools, logs, and instrumentation. Unifying these data sets is a huge challenge for developers and teams.

Catching post-deployment issues and reducing tool fatigue are why we created 24x7 Service Guard. We want to give developers total operational visibility of their production apps across all tools, and protect them when they aren't looking.

Powered By Unsupervised Machine Learning

Like Continuous Verification, our 24x7 Service Guard sits on top of all your APM, monitoring, and log tools. However, we've modified our unsupervised machine learning significantly to scale for 24x7 data streams.

We're still using the core algorithms such as Symbolic Aggregate approXimation (SAX) and Hidden Markov Models, but we're also applying entropy and several new neural nets so we can continuously learn and detect the unknown unknowns as well as reduce the false positives. Watch this webinar if you want a tech deep dive on our AI/ML.
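For readers who haven't met SAX before, here is a minimal textbook-style sketch of the idea: normalize a metric series, average it into a few segments, and map each segment to a letter so that a spike shows up as an unusual symbol. This is not Harness's implementation. The function name `sax`, the four-letter alphabet, and the sample response times are assumptions made purely for illustration.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax(series, segments, alphabet="abcd"):
    """Turn a numeric series into a short SAX word (illustrative sketch only)."""
    if len(series) % segments != 0:
        raise ValueError("series length must be a multiple of segments")

    # 1. Z-normalize so series with different scales become comparable.
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        normalized = [0.0] * len(series)
    else:
        normalized = [(x - mu) / sigma for x in series]

    # 2. Piecewise Aggregate Approximation: average each equal-width segment.
    size = len(series) // segments
    paa = [mean(normalized[i * size:(i + 1) * size]) for i in range(segments)]

    # 3. Discretize using breakpoints that split a standard normal
    #    distribution into equally likely regions, one per letter.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)


# Hypothetical response times (ms): flat, then a spike in the last segment.
word = sax([10, 10, 10, 10, 10, 10, 50, 50], segments=4)
print(word)  # bbbd
```

Once a stream is reduced to symbols like this, comparing a new window against learned baseline patterns becomes a string problem rather than a raw-number problem, which is one reason SAX is popular for time-series anomaly detection.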

Harness uses Harness for Continuous Delivery, so we've been battle-testing 24x7 Service Guard for some time and refining its accuracy for several weeks.

Unifying APM, Log, and Observability Data

Simply add one or more of your monitoring tools to Harness in minutes by registering each tool's URL, API or webhook details, and login credentials.

Next, for each application in Harness, add the verifications you want for each environment (dev, QA, staging, production) and Harness will figure out the rest.

Once set up, click on the top Continuous Verification navigation tab and you'll see something like this:

24x7 Service Guard Screenshot

We can see above that 24x7 Service Guard is protecting the Web Online Application and is observing 4 monitoring sources (AppDynamics, Datadog, Splunk, and Prometheus) for the production environment. You will see the same view for every application and environment you enable 24x7 Service Guard for.

At a glance, developers can now observe the health of any service in any environment for any monitoring tool in seconds. Excuse me for a second, but that's pretty badass. I haven't even gotten to the best bits yet.

Users can select from several time resolutions: 12 hours, 1 day, 7 days, and 30 days.

Based on the data that 24x7 Service Guard observes from each monitoring tool, it will paint a heat map of service health for each time slice square. It will also show and correlate any deployments so users get full operational visibility.
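To make the heat map idea concrete, here is a rough sketch of how per-time-slice health colors could be derived from a stream of anomaly scores. It is only an illustration under stated assumptions: the function name `health_heatmap`, the 0-to-1 score scale, the `warn` and `alert` thresholds, and the color names are all hypothetical and are not taken from Harness.

```python
from datetime import datetime, timedelta


def health_heatmap(scores, start, slice_minutes, slices, warn=0.3, alert=0.7):
    """Color each time slice by its worst anomaly score (hypothetical thresholds).

    scores: iterable of (timestamp, score) pairs, score assumed in [0, 1].
    """
    width = timedelta(minutes=slice_minutes)
    worst = [None] * slices

    for ts, score in scores:
        # Floor division of two timedeltas gives the slice index as an int.
        idx = (ts - start) // width
        if 0 <= idx < slices:
            worst[idx] = score if worst[idx] is None else max(worst[idx], score)

    def color(score):
        if score is None:
            return "grey"  # no data observed in this slice
        if score >= alert:
            return "red"
        if score >= warn:
            return "yellow"
        return "green"

    return [color(s) for s in worst]


start = datetime(2018, 12, 13, 0, 0)
samples = [
    (start + timedelta(minutes=5), 0.1),
    (start + timedelta(minutes=20), 0.8),
    (start + timedelta(minutes=25), 0.2),
    (start + timedelta(minutes=50), 0.4),
]
print(health_heatmap(samples, start, slice_minutes=15, slices=4))
# ['green', 'red', 'grey', 'yellow']
```

Taking the worst score per slice, rather than the average, keeps a short spike from being diluted by the healthy minutes around it.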

Understanding Service & Business Impact

Traffic lights are an easy way to understand service health at a high level. With 24x7 Service Guard, developers can drill down beyond traffic lights into the business impact of a service in one click.

For example, if we click on the red time-slice highlighted below, we immediately see the business transactions in AppDynamics that are impacted along with related anomalies/regressions highlighted in red.

We can see below that the transaction /online/Payment is experiencing high response times:

The Impact of 24x7 Service Guard.

The insight of this drill-down capability is driven by the type of monitoring tool behind the data. For example, if Datadog were showing a service impact, 24x7 Service Guard would show the cloud infrastructure resource metrics that are anomalous. If Splunk were showing an impact, it would show the errors or exceptions that are causing the regressions, and so on.

Think of the above as an easy way for a developer to immediately understand what is going on across their monitoring data sets.

Drill-Down To Root Cause With Context

24x7 Service Guard doesn't stop there. It gets better.

Harness also provides contextual drill-down that takes the developer from Harness into their monitoring tools in the context of the metric or event they are troubleshooting.

In the above example, a developer can click on the red /online/Payment transaction and it will take them directly into the AppDynamics UI for the specific transaction and time period that was anomalous.

In one click you can go from this:

Drill Down in One Click.


to this:

Inside AppD.


24x7 Service Guard will do the same for all your favorite APM and Log tools.

This capability gives developers a unified view of their monitoring tools/data. It also allows them to take a shortcut to the root cause in just a few clicks. Sounds simple, but it's extremely powerful.

Automatically Roll Back Code Changes

The final part of 24x7 Service Guard is its ability to automatically roll back code changes (if needed) when the developer isn't looking. For example, let's imagine a developer performed a production deployment using Harness at 3 pm on a Friday. After 10-15 minutes of verification, everything from a performance and quality perspective looks good in Harness. Shortly after, the developer hits the bar and proceeds to drink 5 pints of Guinness. As the 5th pint of Guinness goes down, the application starts to grind to a halt. 24x7 Service Guard detects this performance anomaly, and as a precaution, automatically rolls back the application to its last working state.

The developer wakes up the next day with a hangover and notices an alert from Harness from the night before. In one click, the developer launches Harness in the context of the performance anomaly and identifies which transactions were responsible, along with links to the root cause of the performance issue inside their APM tool.
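The rollback step in the Friday-night scenario can be sketched in a few lines. This is a deliberately simplified model, not how Harness works internally: the `Service` class, the `guard` function, the 0-to-1 anomaly score, and the 0.7 threshold are all hypothetical names and values chosen for illustration. The sketch treats "last working state" as the most recent earlier version that passed verification.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """Hypothetical record of what has been deployed to one environment."""
    name: str
    history: list = field(default_factory=list)   # versions, oldest first
    verified: set = field(default_factory=set)    # versions that passed verification

    def deploy(self, version):
        self.history.append(version)

    def mark_verified(self, version):
        self.verified.add(version)

    @property
    def live(self):
        return self.history[-1]


def guard(service, anomaly_score, threshold=0.7):
    """If the live version looks anomalous, redeploy the last verified earlier version.

    Returns the version rolled back to, or None if no action was taken.
    """
    if anomaly_score < threshold:
        return None
    for version in reversed(service.history[:-1]):
        if version in service.verified:
            service.deploy(version)  # a rollback is modeled as a redeploy
            return version
    return None


svc = Service("web-online")
svc.deploy("v1")
svc.mark_verified("v1")
svc.deploy("v2")          # Friday 3 pm deployment
svc.mark_verified("v2")   # looked fine for the first 10-15 minutes

print(guard(svc, anomaly_score=0.2))  # None: healthy, nothing to do
print(guard(svc, anomaly_score=0.9))  # v1: anomaly detected hours later
print(svc.live)                       # v1
```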

Supported Applications and Tools

Harness supports both non-container and containerized applications across all cloud providers and bare metal data center infrastructure.

We currently support AppDynamics, New Relic, Dynatrace, Datadog, and Prometheus for APM and time-series metrics, and we have an API to support custom time-series data. For log and event data, we support Splunk, Elastic/ELK, Sumo Logic, Bugsnag, and Logz.io.

Sign up for your free trial of the Harness platform today and give 24x7 Service Guard a shot.
