Harness Chaos Engineering (CE) - Key Capabilities

Let's take a deep dive into Harness Chaos Engineering’s key capabilities, including unified experimentation across SaaS and self-hosted deployments, chaos orchestration in CI/CD pipelines, steady state management, and more.

Published on 11/9/22

Chaos engineering helps organizations minimize the financial and reputational impact of unplanned downtime. It also lets developers focus on software delivery rather than fire-fighting production incidents. Chaos experiments go beyond traditional unit, integration, and system tests by reproducing the kinds of random failures that occur in real-world production environments. This realism provides insight into how systems behave, helping teams understand the weaknesses in their applications and infrastructure and proactively build the resilience that prevents costly downtime. This blog takes a close look at Harness CE's key capabilities and how they help teams solve these challenges.

Harness CE provides: 

  • Unified experimentation across cloud providers and self-hosted platforms
  • Chaos orchestration in CI/CD pipelines for continuous verification
  • Steady state management for baselining and improving reliability
  • Observability and ecosystem integration for visibility
  • Robust experiment control methods for safe testing and automatic recovery rollbacks
  • Enterprise dashboards, analytics, logs, and reports for clear communication
  • Enterprise-grade audit trail and role-based access control (RBAC) for security
  • Enterprise support to help businesses scale the practice quickly

Let’s dive deeper into the capabilities that teams can leverage to increase reliability.

Harness Chaos Engineering workflow


Key Capabilities

Unified Experimentation Platform

Implement chaos engineering using our SaaS, self-hosted, on-premises, or air-gapped deployments to align with your business and security requirements. Harness supports running experiments on multiple platforms and environments. The Enterprise ChaosHub is a catalog of advanced experiments covering VMware, AWS, GCP, and Azure, plus a full range of Kubernetes chaos experiments. Users can manage, edit, schedule, and run experiments from within the UI, which improves collaboration. Harness provides the largest and most diverse library of chaos experiments available today, with more added every month.

Harness Chaos Engineering Enterprise ChaosHubs

Chaos Orchestration and Reliability Management

Chaos orchestration enables users to build a CE practice quickly by letting the Harness solution fill the gaps in the organization's knowledge, processes, and tools. Use Harness CE to train new and existing employees and level everyone up on software reliability.

Roll out chaos engineering to the entire enterprise from a Git repository instead of spending years adopting the practice team by team, and scale software reliability to every application. GitOps and CI/CD integrations automate away the complexity and meet developers where they are: chaos experiments are defined in declarative YAML files, which improves the developer experience.

The GitOps feature lets you configure a single source of truth for your chaos experiments and execute them directly from Git, which opens up a wide range of automation in CI/CD pipelines.
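At its core, a GitOps workflow reconciles the desired state stored in Git with what is actually deployed. The sketch below illustrates that idea for chaos experiments: it compares experiment definitions from a repository with those on the platform and plans which to create, update, or delete. The `ExperimentSpec` fields and the `plan_sync` helper are hypothetical and are not Harness's actual manifest schema or API.

```python
from dataclasses import dataclass, field


# Hypothetical shape of a declarative experiment definition (not the real Harness schema).
@dataclass(frozen=True)
class ExperimentSpec:
    name: str
    fault: str
    target: str
    duration_s: int


@dataclass
class SyncPlan:
    create: list[str] = field(default_factory=list)
    update: list[str] = field(default_factory=list)
    delete: list[str] = field(default_factory=list)


def plan_sync(in_git: dict[str, ExperimentSpec],
              in_platform: dict[str, ExperimentSpec]) -> SyncPlan:
    """Treat Git as the source of truth and list the changes needed on the platform."""
    plan = SyncPlan()
    for name, spec in sorted(in_git.items()):
        if name not in in_platform:
            plan.create.append(name)      # defined in Git, missing on the platform
        elif in_platform[name] != spec:
            plan.update.append(name)      # definition drifted from what Git says
    for name in sorted(in_platform):
        if name not in in_git:
            plan.delete.append(name)      # no longer declared in Git
    return plan


git = {
    "pod-delete-cart": ExperimentSpec("pod-delete-cart", "pod-delete", "cart", 60),
    "cpu-hog-api": ExperimentSpec("cpu-hog-api", "cpu-hog", "api", 120),
}
platform = {
    "cpu-hog-api": ExperimentSpec("cpu-hog-api", "cpu-hog", "api", 60),
    "old-network-loss": ExperimentSpec("old-network-loss", "network-loss", "db", 30),
}
plan = plan_sync(git, platform)
print(plan.create, plan.update, plan.delete)
# ['pod-delete-cart'] ['cpu-hog-api'] ['old-network-loss']
```

Because the experiments are plain data, a CI/CD pipeline step could run a check like this on every merge and apply the resulting plan automatically.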

Teams can manage reliability through the resilience score: define, measure, and tune each experiment, track resilience over time, and automate the evaluation of experiment results.
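One way to turn many experiment results into a single trackable number is a weighted average. The sketch below assumes each fault carries a weight reflecting its importance and reports the percentage of its probes that passed; the score is then the weight-adjusted mean on a 0 to 100 scale. This formula is an illustrative assumption, not a definitive description of how Harness computes its resilience score.

```python
def resilience_score(results: list[tuple[float, float]]) -> float:
    """Weighted average of probe success percentages (0-100).

    Each entry is (weight, probe_success_pct). The weighting scheme is an
    assumption for illustration.
    """
    total_weight = sum(w for w, _ in results)
    if total_weight <= 0:
        raise ValueError("at least one fault needs a positive weight")
    return sum(w * pct for w, pct in results) / total_weight


# A critical fault (weight 10) whose probes all passed, a fault (weight 5)
# with half its probes passing, and a minor fault (weight 5) that failed.
score = resilience_score([(10, 100.0), (5, 50.0), (5, 0.0)])
print(score)  # 62.5
```

Recording this value for each run gives a time series, so a team can see whether tuning an experiment or fixing a weakness actually moves resilience upward.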

Harness Chaos Engineering Reliability Score

Steady State Measurement

Rather than having developers manually watch monitoring dashboards, with “eyes on glass” across multiple browser tabs, Harness CE provides probes that automate an experiment's measurement. Probes are editable checks you can define for any chaos experiment to determine its success and failure conditions. Examples range from simple queries against an application's health check to checks on system steady-state metrics.
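To make the idea of a probe concrete, here is a minimal sketch of a health check probe with retries. The `http_status_probe` function and its parameters are hypothetical; `get_status` stands in for a real HTTP call so the success and failure logic can be exercised without a live service.

```python
import time
from typing import Callable


def http_status_probe(
    get_status: Callable[[], int],
    expected_status: int = 200,
    attempts: int = 3,
    interval_s: float = 1.0,
) -> bool:
    """Pass if the health check returns `expected_status` within `attempts` tries.

    `get_status` is a stand-in for an HTTP client call against the
    application's health endpoint (hypothetical for this sketch).
    """
    for attempt in range(attempts):
        try:
            if get_status() == expected_status:
                return True
        except OSError:
            pass  # connection errors count as a failed check; retry
        if attempt < attempts - 1:
            time.sleep(interval_s)  # wait before the next attempt
    return False


# The service recovers on the third check, so the probe passes.
responses = iter([503, 503, 200])
print(http_status_probe(lambda: next(responses), interval_s=0.0))  # True
```

Running the same check before, during, and after fault injection is what lets a probe distinguish a tolerated disruption from a broken steady state.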

Harness Chaos Engineering Reliability Probes

Experiment Control Methods

Harness chaos experiments are declarative, so you can define their configuration in a code repository, version it, and edit it through automation. This declarative approach empowers developers to build and automate reliability in their code.

Harness Chaos Engineering lets you run faults in parallel (for example, a CPU fault plus a memory fault) to mimic real-world events. You can also run entire chaos experiments in parallel to model complex IT outages, which often stem from multiple failure modes.

Running different experiments on different targets simulates cascading failures across larger sets of services. For example, you can disrupt the network in one cloud provider’s availability zone while simultaneously running a resource exhaustion experiment that simulates traffic moving over to the redundant system.
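The scheduling idea behind parallel faults can be sketched with Python's standard `concurrent.futures` module: each fault runs in its own worker so their effects overlap in time. The `inject_fault` function is a hypothetical stub that only sleeps; a real fault would stress CPU, memory, or the network on its target.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def inject_fault(name: str, target: str, duration_s: float) -> str:
    """Hypothetical fault stub: sleeping stands in for the chaos duration."""
    time.sleep(duration_s)
    return f"{name} on {target} completed"


# A network outage in one zone overlapping with resource exhaustion
# on the system that is expected to absorb the failover traffic.
faults = [
    ("network-loss", "zone-a", 0.2),
    ("memory-hog", "zone-b", 0.2),
]

start = time.monotonic()
with ThreadPoolExecutor(max_workers=len(faults)) as pool:
    # map() preserves input order in the returned results
    results = list(pool.map(lambda f: inject_fault(*f), faults))
elapsed = time.monotonic() - start

print(results)
print(f"elapsed ~{elapsed:.1f}s")  # roughly 0.2s rather than 0.4s, since the faults overlap
```

Running the faults concurrently, rather than one after another, is what exposes interactions such as a failover path that only breaks when it is already under resource pressure.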

Lastly, you can abort an in-flight experiment that causes more impact than the test intended. Aborts can be triggered manually, or automatically through abort conditions defined with probes on the tested system's health metrics, and recovery scripts can be automated.
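An automatic abort condition boils down to watching a health metric while the fault is active and triggering recovery as soon as it crosses a limit. The minimal sketch below assumes the metric is an error rate; `run_with_abort`, its threshold, and the `recover` callback are hypothetical names for illustration.

```python
from typing import Callable, Iterable


def run_with_abort(
    health_samples: Iterable[float],
    max_error_rate: float,
    recover: Callable[[], None],
) -> str:
    """Abort and run recovery once the observed error rate exceeds the threshold.

    `health_samples` stands in for a probe polling the system's error rate
    while the fault is active.
    """
    for i, error_rate in enumerate(health_samples):
        if error_rate > max_error_rate:
            recover()  # e.g. remove the fault and restore the original state
            return f"aborted at sample {i} (error rate {error_rate:.0%})"
    return "completed"


recovered = []
status = run_with_abort([0.01, 0.03, 0.12, 0.20], max_error_rate=0.05,
                        recover=lambda: recovered.append(True))
print(status)     # aborted at sample 2 (error rate 12%)
print(recovered)  # [True]
```

Stopping at the first breach keeps the blast radius bounded: the remaining samples are never reached because the fault has already been removed.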

Harness Chaos Engineering Declarative Experiments

Observability and Ecosystem Integrations

Harness CE can send chaos metrics to popular observability and application performance monitoring (APM) solutions, so developers can integrate it with the reliability tools they already use. Because Harness CE plugs into their existing systems, this reduces developer toil. Supported integrations include Prometheus, Grafana, Dynatrace, Keptn, and more. Beyond observability and monitoring, you can integrate with load-testing tools or run your own tests with a custom script.

Enterprise Dashboards, Analytics, and Reports 

Different roles need different views in dashboards and reports. Executives might want a high-level risk assessment on a single dashboard, while an engineering manager might want to see the reliability status of every service. Harness CE holds all the experiment data, analytics, and reporting capabilities needed to serve as the centralized source for reliability.

Harness Chaos Engineering Enterprise Dashboards and Reports

Enterprise-Grade Audit Trails and RBAC

Harness has built a reputation in the CI/CD industry for detailed audit trails and fine-grained RBAC. These audit trails make it quick and easy for engineering teams to pass audits, often turning what would be days of effort into just a few hours. Our fine-grained RBAC model means you can implement a permissions system that meets your organization's needs, no matter how complex they are.
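Fine-grained RBAC typically means permissions are granted per role and scoped to part of a resource hierarchy. The minimal sketch below shows that pattern for chaos experiments; the role names, permission strings, and scope paths are hypothetical and do not reflect Harness's actual RBAC model.

```python
from dataclasses import dataclass

# Hypothetical roles mapped to the actions they allow.
ROLE_PERMISSIONS = {
    "chaos-viewer": {"experiment:view"},
    "chaos-operator": {"experiment:view", "experiment:run"},
    "chaos-admin": {"experiment:view", "experiment:run",
                    "experiment:edit", "experiment:delete"},
}


@dataclass(frozen=True)
class Binding:
    role: str
    scope: str  # e.g. "org" or "org/payments"; applies to that scope and its children


def is_allowed(bindings: list[Binding], permission: str, resource_scope: str) -> bool:
    """Allow if some binding grants the permission on the resource's scope or a parent scope."""
    for b in bindings:
        in_scope = resource_scope == b.scope or resource_scope.startswith(b.scope + "/")
        if in_scope and permission in ROLE_PERMISSIONS.get(b.role, set()):
            return True
    return False


alice = [Binding("chaos-operator", "org/payments"), Binding("chaos-viewer", "org")]
print(is_allowed(alice, "experiment:run", "org/payments/checkout"))  # True
print(is_allowed(alice, "experiment:run", "org/search"))             # False
print(is_allowed(alice, "experiment:view", "org/search"))            # True
```

Here Alice can run chaos against payments services but only view experiments elsewhere, which is the kind of least-privilege split that keeps fault injection from reaching teams that have not opted in.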

Enterprise Support 

Harness recognizes that enterprises need to move fast and scale quickly to meet the demands of their business, so we’re equipped to offer enterprise support to ensure your chaos engineering practice can begin as quickly and safely as possible. Harness CE was built by the same team of experts that created the CNCF open-source project, LitmusChaos. This team is ready to support SaaS, on-premises, self-hosted, or air-gapped installations and provide onboarding assistance, feature enhancements, chaos best practices, and custom tooling integration for CI/CD and observability platforms.

Start Improving Software Reliability Today with Harness Chaos Engineering

Getting started with chaos engineering has never been so simple. If you are ready to see how your organization can adopt this practice and improve reliability, request a demo and sign up for the on-premises trial or SaaS version today!
