Connecting observability and logging tools to your production release process is powerful. With machine learning, you can identify when error rates or other metrics degrade and quickly initiate a rollback. This approach can significantly reduce the risk of change.
An existential question for any engineer with deployment responsibility is “when is your deployment over?” Digging into that question, an immediate follow-up would be “what are you deploying?” In the Kubernetes world, some might draw the line at the moment the Readiness Probe passes. Application-centric views might call it done when the first set of transactions succeed or their transactional boundaries are crossed. One school of architectural thought holds that a deployment is over only when the next deployment replaces it.
Regardless of where you fall on that spectrum, a common thread binds all of these approaches: some sort of validation has to occur. From an engineer hitting their newly updated endpoint to confirm the change went in, to SREs interrogating the observability stack, validation has to happen before anyone can make the judgment call.
The go or no-go decision has become more complex for modern software delivery teams, usually because there is no absolute failure. If there were an absolute failure, for example the deployment failing to start, the call to action would be quick; a rollback would ensue almost immediately. But with the scrutiny of modern software delivery and the rise of more atomic microservices, a deployment, in the sense of the application simply starting, can be expected to be initially successful in production. This absence of a major event makes hunting for the cause of degradation consistent with the adage “there is no root cause.”
Site Reliability Engineering is about finding the needle in the haystack before the needle is even placed in the haystack. Closely interrogating myriad observability, monitoring, and logging tools for trends toward failure is not an easy process, and dedicating an SRE to analyze this data for every deployment can be costly. This is why Harness created Continuous Verification (CV) to perform this analysis on your behalf.
Continuous Verification is Harness’s approach to deployment verification. Included as part of your free or paid Harness Continuous Delivery subscription, Continuous Verification validates your deployments against your observability, monitoring, and logging tools and platforms.
By adding a Verify Step to your Harness Pipeline, you allow Harness to query one or more Health Sources, such as Prometheus, Datadog, Splunk, AppDynamics, or even a Custom Health Source, to help make the judgment call on your behalf. You can find a more exhaustive list in the CV documentation.
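As a rough illustration, here is what a Verify step can look like in Harness NextGen pipeline YAML. This is a sketch, not a canonical reference: the identifiers, the MEDIUM sensitivity, and the 15-minute duration are illustrative values, and field names can vary by Harness version, so treat the CV documentation as the source of truth.

```yaml
# Illustrative Verify step inside a deployment stage's execution steps.
# Sensitivity, duration, and the tag expression are example values.
- step:
    name: Verify
    identifier: verify
    type: Verify
    timeout: 2h
    spec:
      type: Canary                  # analysis type, matched to the deployment strategy
      monitoredService:
        type: Default               # reuse the stage's service and environment mapping
        spec: {}
      spec:
        sensitivity: MEDIUM         # how strict the analysis is
        duration: 15m               # how long CV collects and analyzes data
        deploymentTag: <+serviceConfig.artifacts.primary.tag>
```

The Health Sources themselves (Prometheus queries, Datadog monitors, and so on) are configured on the Monitored Service that the step references, which keeps the pipeline definition itself small.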
Because Continuous Verification is integrated into your Harness Pipeline, you can let Harness take automatic action on your behalf or enact any Failure Strategy the Harness Pipeline supports, such as manual intervention.
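For instance, a Failure Strategy attached to the Verify step can route a verification failure to manual intervention and roll the stage back if no one responds in time. Again a sketch; the timeout and actions are illustrative:

```yaml
# Illustrative failure strategy: on a verification failure, pause for
# manual intervention; if nobody acts within the timeout, roll back.
failureStrategies:
  - onFailure:
      errors:
        - Verification
      action:
        type: ManualIntervention
        spec:
          timeout: 2h
          onTimeout:
            action:
              type: StageRollback
```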
Continuous Verification also works in conjunction with your deployment strategy. For example, when executing a Canary Deployment, CV can analyze each canary phase before the deployment proceeds.
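Sketching that out for a Kubernetes canary, the Verify step would typically sit between the canary deployment and the primary rollout, so only a verified canary gets promoted. The step types below follow the Harness Kubernetes schema, while the names and the single-pod count are illustrative:

```yaml
# Illustrative ordering inside a canary stage: deploy the canary,
# verify it, and only then roll out the primary deployment.
execution:
  steps:
    - step:
        type: K8sCanaryDeploy
        name: Canary Deployment
        identifier: canaryDeploy
        spec:
          instanceSelection:
            type: Count
            spec:
              count: 1              # a single canary pod, for illustration
    - step:
        type: Verify
        name: Verify Canary
        identifier: verifyCanary
        spec:
          type: Canary
          # monitoredService, sensitivity, and duration as in the earlier sketch
    - step:
        type: K8sRollingDeploy      # promote to the full fleet once CV passes
        name: Rolling Deployment
        identifier: rollingDeploy
        spec:
          skipDryRun: false
```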
Enabling Continuous Verification on your Pipeline is an important step toward achieving resiliency goals for your deployments.
Combine two engineering sayings, “quality is everyone’s responsibility” and “slowness is the new down,” and you get a third: “reliability is everyone’s responsibility.” Your Continuous Delivery Pipeline is a culmination of expertise across multiple disciplines. Having an SRE manually verify every deployment is not scalable, so a system that assists with the most arduous validation tasks allows you to scale. With Continuous Verification, SRE expertise can be consistently and systematically applied across your Delivery Pipeline.
Change inherently brings risk, and reliability is often what bears that risk. Even slight blips in reliability can cause reputational damage and revenue loss. Yet innovation requires change, and balancing innovation against reliability is a paradox many engineers face. By adding a layer of systematic verification and validation, Continuous Verification lets change push forward while keeping a baseline of what is, or was, normal.
Having verification and validation as part of your pipeline is crucial. Even if you are not using Harness for your Delivery Pipelines, you should leverage something like Spinnaker’s Kayenta or Argo Rollouts’ analysis if you are using either of those tools. Harness’s Continuous Verification is designed for ease of use and flexibility no matter which deployment strategy you take, so trying CV out is a great first step.
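For comparison, automated analysis in Argo Rollouts is expressed as an AnalysisTemplate that a Rollout references during its canary steps. The sketch below, adapted from the common pattern in the Argo Rollouts documentation, queries Prometheus for a success rate and fails the rollout when it dips; the Prometheus address, metric names, and 95% threshold are placeholders:

```yaml
# Illustrative Argo Rollouts AnalysisTemplate: fail the rollout if the
# success rate reported by Prometheus drops below 95%.
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 1m
      successCondition: result[0] >= 0.95
      failureLimit: 1
      provider:
        prometheus:
          # placeholder address and metric; substitute your own
          address: http://prometheus.monitoring.svc.cluster.local:9090
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",code!~"5.."}[2m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[2m]))
```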
To get started with Harness Continuous Verification, check out this tutorial on the Harness Developer Hub, which walks through verifying a Kubernetes deployment using Prometheus. Since “reliability is everyone’s responsibility,” adding a Verification Step to your Harness Pipeline is a prudent step, and it is included as part of your Harness CD subscription (Sign Up Here). Take a look at the tutorial and go further on your reliability journey, today.