Updated on 5 Dec 2024
Chaos Platform
Capability | Harness | Gremlin
SaaS | Yes | Yes
On-Premises / Self-Managed | Yes | No
Air-gapped | Yes | No
Target Personas | Developers, QA engineers and testers, SREs, DevOps, platform engineers, APIs | SREs
Primary Outcomes and Use Cases | Continuous Resilience™, continuous reliability management, developer productivity, digital transformation, customer experience, IT disaster recovery | Static reliability management, fault injection
Chaos Experiments and Faults
Capability | Harness | Gremlin
Chaos Faults | 92+ | 11
Enterprise ChaosHub | Yes | No
Visual Experiment Builder with built-in YAML Editor | Yes | No
Multi-Cloud Support | Yes | Yes
Serverless Support | Yes | No
Bring Your Own Chaos | Yes | No
Observability and Steady State Management
Capability | Harness | Gremlin
Native Health Metrics | Yes | No
Resilience Probes | Yes (HTTP, command, Kubernetes, Prometheus, service level objectives) | Yes (status checks)
Customizable Resilience Probes | Yes (templated and fully customizable) | No
Integrates with Leading Observability Solutions | Yes (Datadog, New Relic, Dynatrace, AppDynamics, Splunk, Google Cloud Operations, Prometheus) | Yes (Datadog)
Experiment Control Methods
Capability | Harness | Gremlin
Declarative Chaos Experiments | Yes | No
Run Chaos Faults in Parallel | Yes | No
Multiple Kubernetes Cluster Experimentation | Yes | Yes (multiple experiment setup)
Ability to Abort Experiment | Yes | Yes
Chaos Orchestration and Reliability Management
Capability | Harness | Gremlin
GitOps / Event-driven Chaos | Yes | No
Scheduling Experiments | Yes | Yes
Export Chaos Experiments to ChaosHub | Yes | No
Reliability Management with a Score | Yes (customizable weighted values per experiment) | Yes (not customizable; scored per service)
GameDay Portal | Yes | Yes (additional add-on)
Chaos Integration and SDKs
Capability | Harness | Gremlin
Continuous Resilience™ | Yes | No
Automated Deployment Verification | Yes | No
Automated Rollback of Failed Deployments | Yes | No
Reliability Guardrails with CI/CD Pipelines | Yes | No
Chaos Integration with SLOs | Yes | No
Chaos Integration with Feature Flags | Yes | No
Pipeline Integration with API | Yes | Yes
Load Testing Integration | Yes (custom) | Yes (Grafana)
API Support for Platform Self-Service | Yes | No
Python SDK | Yes | Yes
GoLang SDK | Yes | No
Ansible SDK | Yes | No
Administration, Security, and Governance
Capability | Harness | Gremlin
Comprehensive APIs | Yes | Yes
Built-in User Management and Authentication | Yes | Yes
Single Sign-On (OAuth 2.0, SAML, LDAP, 2FA) | Yes | Yes
Reporting | Yes | Yes
Role-based Access Control | Yes | Yes
Custom RBAC | Yes | No
Chaos Experiment Logs Availability | Yes (automated into AWS S3 buckets) | Yes (manual)
Full Audit Trails | Yes (2 years) | Yes
Policy-Based (OPA) | Yes | No
Pipeline Governance | Yes | No
Integrated Secrets Management | Yes | No
IP Address Allowlist Management | Yes | No
Support
Capability | Harness | Gremlin
SLA Guarantee | Yes | Yes
Training and Support | Yes | Yes
Community Developer Hub | Yes | No
Unified Software Delivery Platform | Yes | No
Harness and Gremlin provide fault injection and chaos engineering capabilities. Harness Chaos Engineering (CE) is designed to facilitate greater collaboration between SREs, QA engineers, and developers while automating many actions related to resilience testing.
Gremlin has rebranded as a reliability management vendor that provides failure-testing strategies for a narrow set of use cases. Gremlin’s reliability management features can be useful to SRE teams getting started with chaos engineering. However, it does not provide automation across the software delivery lifecycle to scale reliability across the enterprise, as the following analysis shows.
Harness supports SaaS and self-managed deployment models, enabling businesses to deploy a chaos platform across every architecture, including high-security environments that require air-gapped installations. Our primary users are developers, QA engineers and testers, SREs, DevOps engineers, platform engineers, and API users. Our fault injection tool supports outcomes and use cases such as Continuous Resilience™, continuous reliability management, improved developer productivity, accelerated digital transformation, improved customer experience, and IT disaster recovery automation.
Gremlin offers a SaaS version of its product that primarily serves SREs and use cases such as reliability management and fault injection. Its reliability management platform provides static, scheduled snapshots of measured reliability for a defined service.
Harness has 92+ chaos faults that run out of the box, enabling teams to scale quickly across all platforms, and releases more each month to support new platforms and new failure modes. Harness provides an Enterprise ChaosHub that lets users organize their experiments, execute tests quickly, and schedule them directly from the ChaosHub. Experiment workflows can be customized with a visual workflow builder or a built-in YAML editor, so users can work in whichever mode they prefer.
Harness also supports experiments across VMware, GCP, Azure, AWS, Kubernetes, and serverless platforms, enabling enterprises to build resilience across multiple environments no matter where their developers work. For unsupported platforms, we offer a “Bring Your Own Chaos” experience.
Gremlin has 11 fault types that can run on many platforms. However, each fault must be configured specifically for the experiment you want to build, which requires considerable knowledge of the infrastructure and its failure modes.
Harness Chaos Infrastructure includes native health metrics for Kubernetes to confirm that your infrastructure and experiments are operating in a healthy state.
Harness has five resilience probe types that are highly tunable and support custom configuration for various environments. The probes let engineers query HTTP responses, Kubernetes services, Prometheus metrics, and service level objectives, and a command probe can run a custom script. Resilience probes can also run before, during, and after an experiment, so users can monitor the precise impact on their system.
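To make the before/during/after probe flow concrete, here is a minimal Python sketch. It is illustrative only and does not use the Harness CE probe schema or API: `Probe`, `http_probe`, and `run_experiment` are hypothetical names, and fault injection and rollback are passed in as plain functions.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


# Hypothetical probe model (not the Harness CE schema): a named boolean check
# plus the experiment phases in which it should run.
@dataclass
class Probe:
    name: str
    check: Callable[[], bool]
    phases: Tuple[str, ...] = ("before", "during", "after")


def http_probe(url: str, expected_status: int = 200, timeout: float = 2.0) -> Callable[[], bool]:
    """Build a check that passes when `url` answers with `expected_status`."""
    def check() -> bool:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.status == expected_status
        except urllib.error.HTTPError as err:  # error status codes raise HTTPError
            return err.code == expected_status
        except OSError:  # connection refused, DNS failure, timeout, etc.
            return False
    return check


def run_experiment(inject_fault: Callable[[], None],
                   revert_fault: Callable[[], None],
                   probes: List[Probe]) -> Dict[str, Dict[str, bool]]:
    """Evaluate probes before, during, and after a fault; return results per phase."""
    results: Dict[str, Dict[str, bool]] = {"before": {}, "during": {}, "after": {}}

    def evaluate(phase: str) -> None:
        for probe in probes:
            if phase in probe.phases:
                results[phase][probe.name] = probe.check()

    evaluate("before")      # establish the steady state
    inject_fault()
    try:
        evaluate("during")  # observe behavior while the fault is active
    finally:
        revert_fault()      # always revert, even if a probe raises
    evaluate("after")       # confirm the system recovered
    return results
```

For example, `Probe("checkout", http_probe("http://checkout.internal/healthz"))` would poll a hypothetical health endpoint in all three phases, while a probe declared with `phases=("during",)` only observes the system while the fault is active.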
Harness has connectors that integrate with observability tools such as Datadog, New Relic, Dynatrace, AppDynamics, Splunk, Google Cloud Operations, and Prometheus. These connectors provide a complete integration experience and remove the cognitive load of complex manual setup.
Gremlin has a status check feature that lets users manually define a health status. Its configuration is basic and gives users little flexibility in how they monitor experiments.
Harness supports cloud-native platforms with a declarative, YAML-based approach to experiments. Users can also run experiments in parallel on multiple Kubernetes clusters and abort experiments safely and programmatically.
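The idea of running a fault in parallel on several clusters while keeping a single, programmatic abort switch can be sketched with Python's standard library. This is a hypothetical illustration, not how Harness implements it: `run_in_parallel` and `timed_fault` are made-up names, and the fault only waits instead of injecting real chaos.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# A fault receives its target cluster and a shared abort event, and returns a status string.
Fault = Callable[[str, threading.Event], str]


def run_in_parallel(clusters: List[str], fault: Fault, abort: threading.Event) -> Dict[str, str]:
    """Run the same fault against several clusters at once; any caller can set `abort`."""
    with ThreadPoolExecutor(max_workers=max(1, len(clusters))) as pool:
        futures = {c: pool.submit(fault, c, abort) for c in clusters}
        return {c: f.result() for c, f in futures.items()}


def timed_fault(cluster: str, abort: threading.Event,
                duration: float = 1.0, tick: float = 0.05) -> str:
    """Hypothetical fault that holds chaos for `duration` seconds unless aborted."""
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        # Event.wait returns True as soon as the event is set, so aborts are prompt.
        if abort.wait(timeout=tick):
            # A real fault would revert its injected chaos here before returning.
            return f"{cluster}: aborted"
    return f"{cluster}: completed"
```

Sharing one `threading.Event` means a single `abort.set()` call, for example from a UI handler or a failing probe, stops every in-flight fault instead of requiring one abort per cluster.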
Gremlin offers basic attack and scenario control methods. Tests run serially, which suits traditional testing but not the ephemerality of cloud-native environments. Gremlin does support aborting a test.
Harness provides multiple orchestration methods, including native integration with Harness CD and a simplified API for integration with other CI/CD tooling. Harness also supports GitOps through event-driven chaos injection: target resources can be configured to trigger chaos experiments automatically whenever their resource specification changes. Supported events currently include image changes, configuration changes, and changes in replica count, among others. Event-driven chaos injection lets Harness integrate with traditional GitOps flows that automatically deploy applications or workloads, so chaos experiments can be triggered automatically based on your organization’s needs.
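The core of event-driven chaos injection, comparing the old and new specification of a resource and triggering an experiment only when a watched field changed, can be sketched as follows. The field paths follow the standard Kubernetes Deployment layout (`spec.replicas`, `spec.template.spec.containers[].image`); which fields Harness actually watches, and how it receives update events, are assumptions left out of this sketch. `WATCHED_FIELDS`, `detect_changes`, and `on_resource_update` are hypothetical names.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence, Union

FieldPath = Sequence[Union[str, int]]

# Fields watched for changes, keyed by change type (assumed for illustration).
WATCHED_FIELDS: Dict[str, FieldPath] = {
    "image": ("spec", "template", "spec", "containers", 0, "image"),
    "config": ("spec", "template", "spec", "containers", 0, "env"),
    "replicas": ("spec", "replicas"),
}


def _lookup(spec: Any, path: FieldPath) -> Optional[Any]:
    """Follow `path` through nested dicts/lists, returning None if any step is missing."""
    node = spec
    for key in path:
        try:
            node = node[key]
        except (KeyError, IndexError, TypeError):
            return None
    return node


def detect_changes(old: Dict[str, Any], new: Dict[str, Any]) -> List[str]:
    """Return the change types whose watched field differs between two specs."""
    return [kind for kind, path in WATCHED_FIELDS.items()
            if _lookup(old, path) != _lookup(new, path)]


def on_resource_update(old: Dict[str, Any], new: Dict[str, Any],
                       trigger: Callable[[List[str]], None]) -> List[str]:
    """Call `trigger` (e.g. start a chaos experiment) only when a watched field changed."""
    changes = detect_changes(old, new)
    if changes:
        trigger(changes)
    return changes
```

Scaling a Deployment from 3 to 5 replicas would report `["replicas"]` and fire the trigger once, while an update that touches none of the watched fields leaves the trigger untouched.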
Harness also provides simple experiment scheduling and the ability to export an experiment to a new ChaosHub to share experiments with other teams.
Harness provides a resilience score per experiment that can be configured to your needs. Each score can use custom weights, so you can prioritize which experiments contribute most to the measurement. Resilience scores can also be used in the GameDay feature.
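To show how custom weights shift a score, the sketch below computes a weighted average of probe success percentages. The exact formula is an assumption made for illustration (consult the Harness documentation for its scoring rules); the point is that raising one item's weight makes its result dominate the score.

```python
from typing import Sequence, Tuple


def resilience_score(results: Sequence[Tuple[float, float]]) -> float:
    """Weighted average of probe success percentages (illustrative formula).

    `results` holds one (weight, probe_success_pct) pair per weighted item,
    where probe_success_pct is in [0, 100].
    """
    if any(w < 0 for w, _ in results):
        raise ValueError("weights must be non-negative")
    if any(not 0 <= pct <= 100 for _, pct in results):
        raise ValueError("probe success percentages must be in [0, 100]")
    total_weight = sum(w for w, _ in results)
    if total_weight == 0:
        raise ValueError("at least one weight must be positive")
    return sum(w * pct for w, pct in results) / total_weight
```

With weights 10, 5, and 1 and probe success of 100%, 50%, and 0%, the score is 1250 / 16 = 78.125, whereas equal weights would give 50.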
Gremlin provides an API to orchestrate experiments automatically, along with manual scheduling that is simple to turn on.
Harness CE features are built to function across the entire software delivery lifecycle. The native integration with Harness CD is an industry first, enabling developers to automate chaos experiments in a pipeline and achieve Continuous Resilience™. For teams that use the module standalone, the API remains simple to use with other CD tools.
Native platform integration also gives Harness users a CD experience with automated deployment verification, automated rollback of failed deployments, and reliability guardrails that make service level objectives (SLOs) and the impact of a chaos experiment visible.
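A reliability guardrail ultimately reduces to a promote-or-rollback decision in the pipeline. The sketch below is a hypothetical illustration of that decision, not Harness CD's verification logic: the 80-point threshold, the error-budget input, and the names `Decision` and `evaluate_guardrail` are all assumptions chosen for the example.

```python
from enum import Enum
from typing import List, Tuple


class Decision(Enum):
    PROCEED = "proceed"
    ROLLBACK = "rollback"


def evaluate_guardrail(resilience_score: float,
                       error_budget_remaining: float,
                       min_score: float = 80.0) -> Tuple[Decision, List[str]]:
    """Decide whether a pipeline should promote a deployment or roll it back.

    resilience_score: score reported by the chaos step, 0-100.
    error_budget_remaining: fraction of the SLO error budget left, 0.0-1.0.
    min_score: hypothetical threshold chosen for this example.
    """
    reasons: List[str] = []
    if resilience_score < min_score:
        reasons.append(f"resilience score {resilience_score:.1f} is below {min_score:.1f}")
    if error_budget_remaining <= 0:
        reasons.append("SLO error budget is exhausted")
    return (Decision.ROLLBACK if reasons else Decision.PROCEED), reasons
```

Returning the reasons alongside the decision keeps the outcome visible: a pipeline step can log why it rolled back rather than failing silently.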
Harness has many integrations, such as Dynatrace, Datadog, and New Relic. New integration connectors can be built quickly, and coverage keeps growing. Harness also supports integrations with feature flags, other SLO providers, and load-testing tools.
Gremlin integrates with Datadog for experiment workflows, Jira for ticket creation, and Grafana k6 for load testing. There is no native integration with CI/CD tooling, but you can use Gremlin’s API to run experiments.
Harness officially supports Python, GoLang, and Ansible SDKs, allowing users to build custom experiments and API wrappers for custom self-service enterprise solutions. This matters for enterprises building platform solutions, because wrapping the Harness API with an SDK makes chaos engineering self-service.
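As a rough illustration of wrapping a chaos API for self-service, the sketch below puts a small facade with a platform-level allowlist in front of an injected HTTP transport. It does not use the official Harness SDKs: `ChaosSelfService`, the endpoint path, and the `requestedBy` field are hypothetical placeholders, and a real wrapper would call the documented Harness API instead.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, FrozenSet

# A transport sends (method, path, json_body) and returns the decoded JSON response.
# Injecting it keeps the wrapper testable and independent of any specific HTTP client.
Transport = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


@dataclass
class ChaosSelfService:
    """Hypothetical self-service facade over a chaos engineering REST API.

    The endpoint path below is a placeholder, not a documented Harness route.
    """
    transport: Transport
    approved_experiments: FrozenSet[str] = field(default_factory=frozenset)

    def run_experiment(self, team: str, experiment_id: str) -> Dict[str, Any]:
        # Platform-level guardrail: teams may only launch pre-approved experiments.
        if experiment_id not in self.approved_experiments:
            raise PermissionError(f"{experiment_id!r} is not approved for self-service")
        return self.transport(
            "POST",
            f"/hypothetical/experiments/{experiment_id}/run",  # placeholder path
            {"requestedBy": team},
        )
```

The design choice here is that the platform team owns the allowlist and the transport, so application teams get a one-line call without needing direct access to every experiment.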
Gremlin has unofficial SDK support for Python.
At Harness, we take security extremely seriously, as evidenced by our DevSecOps approach. We took the same approach with Chaos Engineering, making RBAC and SSO available from inception and adding more controls as security postures evolve. We have a comprehensive API to automate every task. Chaos experiment logs can be backed up automatically to an AWS S3 bucket, so you always have access to logs and reporting.
As part of the platform integration, you also benefit from full audit trails, policy-based governance through Open Policy Agent, integrated secrets management, and much more.
Gremlin supports RBAC and SSO and provides audit logs and reporting. It is a SaaS-only provider and does not support a self-hosted model.
Harness provides SLAs for all of our product modules and the platform. If your enterprise needs training and support, we offer enterprise options. We also provide a Community Developer Hub, which includes tutorials, documentation, and customer-driven content that can help you build a community.
Gremlin provides an SLA, training and support.
Harness CE is the Chaos Engineering module of the Harness Software Delivery Platform. Each module can be used standalone (integrated as a best-of-breed solution in a DevOps toolchain) or as part of the platform. When used as part of the platform, each module passes metadata and can provide greater levels of automation than when used standalone.
Gremlin does not offer a unified software delivery platform.
*Please note: Our competitors, just like us, release updates to their products on a regular cadence. We keep these pages updated to the best of our ability, but there are bound to be discrepancies. For the most up-to-date information on competitor features, browsing the competitor’s release pages and communities is your best bet.