May 31, 2023

SRM Feature Update - May 2023


At Harness, we’re gaining great momentum with our Service Reliability Management (SRM) module. We’ve been speaking with loads of great people in this amazing community at the major SRE and reliability events, and we’ve had tons of customer conversations that help us move the product in the direction that adds the most value. Our product team has been heads down adding great new features, and now we’re ready to show you the results. We hope you’ll love them as much as we do.

In short, we’ve built on the existing foundational SLO management capabilities within SRM and added the advanced functionality required to support all of your reliability goals, hopes, and dreams 🙂

Here’s a summary of what’s been added:

  • Composite SLOs - combine multiple individual SLOs into a single high-level objective
  • Request-Based SLOs - define your objective in terms of the individual requests your system receives
  • SLO Downtime - don't get penalized against your error budget during scheduled downtime or outside of business hours
  • SLO Reporting - identify trends and patterns in your SLO performance
  • Chaos Engineering Integration - quickly identify that a chaos experiment was the cause of an SLO violation
  • Feature Flags Integration - immediately know the service impact of a feature toggle
  • Custom Change Sources - import change events from any source to understand the service impact

If you’d like to see these features in action, our SRM Staff Product Manager Shankar Hariharan has created a great demo video.

The value of our latest SRM features

Let’s explore how these features will impact your work life.

Composite SLOs - Creating individual SLOs is a foundational capability of any SLO management process. But what do you do when certain services are more critical to your business than others? For example, a payment SLO might be deemed absolutely business critical because you can't afford payment failures, so you would assign it a higher weight than the SLO for a less important service such as chat. Composite SLOs make it possible to align your SLO management process with the needs of your business.

Composite SLOs in Harness SRM
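To make the weighting idea concrete, here is a minimal sketch of one common way to combine SLOs: a weighted average of the underlying SLIs. The `WeightedSLO` class, the `composite_sli` function, and the numbers are all hypothetical and only illustrate the concept; this is not the Harness API, and Harness may aggregate composite SLOs differently.

```python
from dataclasses import dataclass


@dataclass
class WeightedSLO:
    """One underlying SLO and its relative importance (hypothetical structure)."""
    name: str
    sli_percent: float  # measured SLI for the period, 0-100
    weight: float       # relative weight; higher means more business critical


def composite_sli(slos: list[WeightedSLO]) -> float:
    """Return the weighted average of the underlying SLIs, as a percentage."""
    total_weight = sum(s.weight for s in slos)
    if total_weight <= 0:
        raise ValueError("at least one SLO must have a positive weight")
    return sum(s.sli_percent * s.weight for s in slos) / total_weight


# Assumption: payment is weighted three times as heavily as chat.
slos = [
    WeightedSLO("payment", sli_percent=99.9, weight=3.0),
    WeightedSLO("chat", sli_percent=99.0, weight=1.0),
]
composite = composite_sli(slos)
print(f"composite SLI: {composite:.3f}%")
```

With these illustrative numbers the composite lands at roughly 99.675%, much closer to the payment SLI than to the chat SLI, which is the point of weighting.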

Request-Based SLOs - A common use case is to set an objective for the percentage of requests that are considered “good”. The formula for this type of SLO is (# good requests / # total requests) × 100. Harness already had time-based SLOs, which track the ratio of good time periods to bad time periods, and the addition of request-based SLOs ensures that you can configure SLOs exactly as your needs dictate.

Request-based SLOs in Harness SRM
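The request-based formula above is simple enough to sketch in a few lines. The function names, the 99.9% target, and the request counts below are assumptions made for illustration, and the error budget helper uses the standard definition (allowed bad requests = total × (1 − target)) rather than anything specific to Harness.

```python
def request_based_sli(good_requests: int, total_requests: int) -> float:
    """(# good requests / # total requests) x 100."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= good_requests <= total_requests:
        raise ValueError("good_requests must be between 0 and total_requests")
    return good_requests / total_requests * 100


def error_budget_remaining(good_requests: int, total_requests: int,
                           target_percent: float) -> float:
    """Fraction of the error budget still unspent (1.0 = untouched, <= 0 = exhausted)."""
    allowed_bad = total_requests * (1 - target_percent / 100)
    if allowed_bad <= 0:
        raise ValueError("target_percent must be below 100")
    actual_bad = total_requests - good_requests
    return 1 - actual_bad / allowed_bad


# Illustrative numbers: 100,000 requests, 50 of them bad, 99.9% target.
sli = request_based_sli(99_950, 100_000)
remaining = error_budget_remaining(99_950, 100_000, target_percent=99.9)
print(f"SLI: {sli:.2f}%, error budget remaining: {remaining:.0%}")
```

In this example the target allows about 100 bad requests, so 50 failures leave roughly half of the error budget.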

SLO Downtime - For certain services, your SLO metrics shouldn’t suffer during a planned downtime event or during hours when your service is not in production. Harness enables you to configure these time periods so that you don’t have to adjust your metrics manually (automate away this toil).

SLO downtime in Harness SRM
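Conceptually, a downtime window just removes some measurements from the SLI calculation. The sketch below shows that idea under stated assumptions: the sample format (timestamp plus a good/bad flag), the half-open window convention, and the function names are all hypothetical and are not how Harness stores or configures downtime.

```python
from datetime import datetime
from typing import Optional

# A downtime window as (start inclusive, end exclusive); hypothetical format.
Window = tuple[datetime, datetime]


def in_downtime(ts: datetime, windows: list[Window]) -> bool:
    """True if the timestamp falls inside any configured downtime window."""
    return any(start <= ts < end for start, end in windows)


def sli_excluding_downtime(samples: list[tuple[datetime, bool]],
                           windows: list[Window]) -> Optional[float]:
    """Percentage of good samples, ignoring samples taken during downtime.

    Returns None when every sample falls inside a downtime window.
    """
    counted = [good for ts, good in samples if not in_downtime(ts, windows)]
    if not counted:
        return None
    return 100.0 * sum(counted) / len(counted)


def minute(m: int) -> datetime:
    return datetime(2023, 5, 31, 2, m)


# Four one-minute samples; the two bad ones fall inside planned maintenance.
samples = [(minute(0), True), (minute(1), False), (minute(2), False), (minute(3), True)]
maintenance = [(minute(1), minute(3))]

with_downtime = sli_excluding_downtime(samples, maintenance)
without_downtime = sli_excluding_downtime(samples, [])
print(with_downtime, without_downtime)
```

Here the same four samples score 100% when the maintenance window is configured and 50% when it is not, which is exactly the penalty a downtime window is meant to avoid.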

SLO Reporting - There’s a lot of valuable information tucked away in those SLIs and SLOs. Today, Harness SRM comes with two out-of-the-box dashboards. The SLO Health dashboard gives you quick insight into your SLO health across different projects. Get this report emailed to you daily so that you don’t need to log in and check the UI every day.

SLO Health Report in Harness SRM

The SLO Historical Performance dashboard helps you identify trends and patterns. For example, you could identify teams that consistently stay within their error budget, as well as teams that are regularly out of compliance. This type of reporting enables you to structure incentives to reward or penalize teams accordingly (if that’s something your business wants to do).

SLO History report in Harness SRM

Chaos Engineering Integration - Do you know when a chaos experiment has just caused a customer impact in production? If this happens, how long does it take your team to realize it and take corrective action? With Harness SRM’s latest Chaos Engineering integration, you can know exactly what happened within a few minutes and take corrective action. This has two benefits: faster MTTR and a further reduction of the toil associated with manual processes.

Integration between Chaos Engineering and Harness SRM

Feature Flags Integration - Engineering teams that create feature flags can turn on new features while a service is running in production. If these newly enabled features cause customer impact, it can take hours of manual exploration to figure out what happened. Now, with the Feature Flags integration in Harness SRM, you will be able to identify the cause in a few minutes. More toil automated away by Harness.

Integration between Feature Flags and Harness SRM

Custom Change Sources - The first question asked during any incident is “What changed?” Those two tiny words can stymie a group of 20+ highly compensated IT professionals, but knowing the answer is absolutely critical, since at least 80% of production incidents are self-inflicted due to change. Harness already includes multiple change sources, but it’s impossible to cover them all. To accommodate this, we’ve made it possible to import change events from any source, so you will know in minutes exactly what changed, making it much easier and faster to restore service during an incident.
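Answering “What changed?” boils down to lining up change events from every source against the time of the SLO violation. The sketch below illustrates that lookup only; the `ChangeEvent` fields, the source names, and the 30-minute lookback are hypothetical and do not represent the Harness change event schema or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ChangeEvent:
    """A change normalized from any source (hypothetical fields)."""
    source: str       # e.g. "deployment", "feature-flag", "chaos", "custom"
    service: str
    summary: str
    timestamp: datetime


def recent_changes(events: list[ChangeEvent], service: str,
                   violation_time: datetime,
                   lookback: timedelta = timedelta(minutes=30)) -> list[ChangeEvent]:
    """Changes to a service shortly before a violation, most recent first."""
    window_start = violation_time - lookback
    matches = [
        e for e in events
        if e.service == service and window_start <= e.timestamp <= violation_time
    ]
    return sorted(matches, key=lambda e: e.timestamp, reverse=True)


violation = datetime(2023, 5, 31, 14, 30)
events = [
    ChangeEvent("deployment", "payment", "release rolled out", datetime(2023, 5, 31, 9, 0)),
    ChangeEvent("feature-flag", "payment", "new checkout flag enabled", datetime(2023, 5, 31, 14, 10)),
    ChangeEvent("custom", "payment", "firewall rule updated", datetime(2023, 5, 31, 14, 25)),
    ChangeEvent("chaos", "chat", "pod delete experiment", datetime(2023, 5, 31, 14, 20)),
]

suspects = recent_changes(events, "payment", violation)
for event in suspects:
    print(event.timestamp.time(), event.source, event.summary)
```

With this sample data, the lookup surfaces the custom firewall change and the feature flag toggle, and skips both the morning deployment and the chaos experiment on a different service.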

Conclusion

The latest updates to the Harness SRM module provide advanced capabilities that help SREs align their reliability goals with the nuanced realities of the business they support. Additionally, these updates help SREs troubleshoot issues faster and provide historical insights into how teams are performing relative to their reliability targets. Be sure to watch the video demo on this page and then request a personalized conversation about how SRM can help you achieve your reliability goals.
