What is a Production Incident?
A production incident, commonly called an "incident," is an unexpected event or problem in our live production environments that results in a complete or partial service disruption. A partial incident renders one or more functions of a module nonfunctional or inaccessible.
All production incidents with a high blast radius or impact are posted on our status page (https://status.harness.io), and users can subscribe to its feeds to be notified.
These major incidents follow an escalated, all-hands-on-deck process with shorter timeframes and higher urgency in order to accelerate resolution.
Production Incident Criteria:
- A P0 incident means Harness is down or unusable for 5+ customers in a specific cluster.
- A P1 incident means a major Harness feature or function is not available to any of our users. This includes regressions (new releases breaking existing behavior).
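The criteria above can be read as a simple decision rule. The following is a minimal sketch of that rule, not an actual Harness tool: the `IncidentReport` type, its field names, and the `classify` function are hypothetical names introduced only for illustration, and the sketch assumes that "5+" means five or more customers.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: "5+ customers" is read as five or more.
P0_CUSTOMER_THRESHOLD = 5


@dataclass
class IncidentReport:
    """Hypothetical summary of a reported production problem."""
    platform_down_or_unusable: bool      # Harness is down or unusable
    affected_customers_in_cluster: int   # customers affected in one cluster
    major_feature_unavailable: bool      # includes regressions from a new release


def classify(report: IncidentReport) -> Optional[str]:
    """Return "P0", "P1", or None if neither criterion is met."""
    if (report.platform_down_or_unusable
            and report.affected_customers_in_cluster >= P0_CUSTOMER_THRESHOLD):
        return "P0"
    if report.major_feature_unavailable:
        return "P1"
    return None
```

For example, `classify(IncidentReport(True, 7, False))` yields `"P0"`, while a report with only `major_feature_unavailable=True` yields `"P1"`. Real triage also involves judgment about blast radius that a rule like this does not capture.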
Here is the general workflow for how an incident is managed at Harness.