Harness Continuous Verification (CV) uses machine learning (ML) to verify new deployments. It rapidly detects regressions or anomalies, and enables fast rollback of failed deployments. CV offers many benefits, but this article focuses on the technical aspects.
Once you set up a verification step within a workflow, Harness makes an API call to the verification provider and relies on the provider to return the data. Harness then collects data for the configured Analysis Time Duration. The ML algorithm compares the received data according to the selected verification type (Previous or Canary) and marks each transaction as Risk or No Risk. This helps isolate issues before the final deployment, and lets you roll back if anomalies are found.
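The flow above can be sketched in a few lines. This is a structural illustration only, under stated assumptions: `fetch_metrics` stands in for the provider API call and `classify` for the ML comparison; neither is a real Harness API, and the names are hypothetical.

```python
ANALYSIS_DURATION_MIN = 5  # the configured Analysis Time Duration

def run_verification(fetch_metrics, classify):
    """Structural sketch: poll the provider once per collection interval,
    then hand the collected samples to the comparison step."""
    samples = []
    for _minute in range(ANALYSIS_DURATION_MIN):
        samples.append(fetch_metrics())  # one provider API call per interval
        # (the real step waits for the next collection interval here)
    return classify(samples)  # Previous/Canary comparison -> Risk / No Risk

# Example with a stubbed provider and a trivial stand-in classifier:
verdict = run_verification(
    fetch_metrics=lambda: 6.0,
    classify=lambda s: "No Risk" if max(s) - min(s) < 1.0 else "Risk")
print(verdict)  # No Risk
```

A "Risk" verdict at this point is what lets the workflow roll back before the failed version reaches all users.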
You can see all of the API calls made to the provider under the third-party API call tab within the execution; similar calls are made for every provider. In the case of a custom APM, the call follows your configured query. The following is a sample request/response for AppDynamics.
Here you can see that data is returned for node1234 for the metric Calls per Minute, under the business transaction (BT) _APPDYNAMICS_DEFAULT_TX_. Multiple calls like this are made during the analysis period to retrieve the different metrics.
Harness also provides the flexibility to configure custom queries, giving you complete control in the case of custom metric and log verifications.
https://<verification-provider>/controller/rest/applications/11/metric-data?output=JSON&time-range-type=BETWEEN_TIMES&metric-path=Business%20Transaction%20Performance|Business%20Transactions|app|_APPDYNAMICS_DEFAULT_TX_|Individual%20Nodes|node1234|*&start-time=1636710979679&end-time=1636711279679&rollup=false
[{
  "metricName": "BTM|BTs|BT:8571397|Component:128319|Calls per Minute",
  "metricId": 13,
  "metricPath": "Business Transaction Performance|Business Transactions|dynamic|_APPDYNAMICS_DEFAULT_TX_|Individual Nodes|node1234|Calls per Minute",
  "frequency": "ONE_MIN",
  "metricValues": [{
    "startTimeInMillis": 1636710960000,
    "value": 6.0,
    "min": 6,
    "max": 6,
    "current": 6,
    "sum": 6,
    "count": 1,
    "standardDeviation": 0.0,
    "occurrences": 1,
    "useRange": true
  }]
}]
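For reference, the sample request above can be reconstructed with Python's standard library. The controller host here is a placeholder; the endpoint path and query parameters mirror the sample AppDynamics URL shown earlier.

```python
from urllib.parse import urlencode

def build_metric_data_url(controller, app_id, node, start_ms, end_ms):
    """Build the AppDynamics metric-data URL shown in the sample request."""
    metric_path = ("Business Transaction Performance|Business Transactions|app|"
                   "_APPDYNAMICS_DEFAULT_TX_|Individual Nodes|{}|*".format(node))
    params = {
        "output": "JSON",
        "time-range-type": "BETWEEN_TIMES",
        "metric-path": metric_path,
        "start-time": start_ms,
        "end-time": end_ms,
        "rollup": "false",
    }
    return "{}/controller/rest/applications/{}/metric-data?{}".format(
        controller, app_id, urlencode(params))

# Placeholder controller host; values taken from the sample request above.
url = build_metric_data_url("https://controller.example.com", 11,
                            "node1234", 1636710979679, 1636711279679)
print(url)
```

Running the equivalent query yourself (via curl or a script like this) is a quick way to confirm the provider returns data for the node under analysis.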
Nodes are identified under the service instance. In a Canary verification, “New” refers to the newly deployed instances and “Previous” refers to the older nodes (shown on the left in the following screenshot). In a Previous verification, the data is taken from an old execution, which shows as the Baseline. You can select the three dots to see all of the identified nodes.
Once analysis begins, verify that the correct node is identified and that data is returned with the correct values while the analysis runs.
Data points are marked on the graph after averaging the collected data: each point is the mean of a sample and its two neighbors, (x[i-1] + x[i] + x[i+1])/3. This means the graph may not show one point per sample in the data-collection period, or one per API response received. For example, in the following execution, data was collected for five minutes, but only two data points are marked.
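The averaging described above works like a centered three-point moving average, sketched below. The exact windowing Harness applies is internal, so treat this as an illustration of why five minutes of raw samples can collapse into fewer plotted points.

```python
def moving_average_3(values):
    """Centered 3-point moving average: (x[i-1] + x[i] + x[i+1]) / 3."""
    return [sum(values[i - 1:i + 2]) / 3 for i in range(1, len(values) - 1)]

raw = [6.0, 6.0, 9.0, 6.0, 6.0]  # one sample per minute for five minutes
print(moving_average_3(raw))      # [7.0, 7.0, 7.0] -- fewer points than samples
```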
Once a data point is marked, it is compared with the control host data and marked as Risk or No Risk. If you want to review an individual node comparison, select the info icon after the metric name to see the full node-by-node comparison.
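Conceptually, the node-by-node view flags each new node against the control host data individually. A minimal sketch, assuming a simple relative-deviation rule (the node names, tolerance, and rule are illustrative, not Harness's actual ML model):

```python
from statistics import mean

def compare_nodes(test_nodes, control_values, tolerance=0.5):
    """Return {node: "Risk" | "No Risk"} by relative deviation from control."""
    control = mean(control_values)
    verdicts = {}
    for node, values in test_nodes.items():
        deviation = abs(mean(values) - control) / control
        verdicts[node] = "Risk" if deviation > tolerance else "No Risk"
    return verdicts

# node5678's calls-per-minute diverges sharply from the control hosts:
print(compare_nodes({"node1234": [6.1, 6.3], "node5678": [40.0, 41.0]},
                    [6.0, 6.5, 7.0]))
```

The per-node breakdown is what makes it possible to isolate a single misbehaving canary instance rather than failing the whole deployment on aggregate numbers.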
If the verification step is not working as expected, check and verify the following:
1. First, test the setup from the workflow stage, and make sure that Harness can make an API call to the provider and receives a proper response.
2. Next, once the execution has started, check the third-party API call tab and validate the API request and response. In the case of a custom APM, make sure that your query is correct and returns valid data for the corresponding node while the analysis is running.
3. Make sure that the correct BT/query/metric data is returned, matching the configuration for the corresponding new/previous node at the time of the analysis.
4. The Execution Logs tab provides more information (timestamps for data collection, analysis completion, and so on). It can also flag issues such as previous node data not found or no baseline found.
5. You can use Custom Thresholds to keep known anomalies from failing the verification.
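The idea behind item 5 can be sketched as follows: a custom threshold defines a band of values to ignore for a given metric, so expected noise never reaches the risk analysis. The rule shape here (a simple min/max band keyed by metric name) is illustrative, not Harness's exact threshold schema.

```python
def should_ignore(metric_name, observed_value, thresholds):
    """Return True if a custom threshold says this value should be ignored."""
    rule = thresholds.get(metric_name)
    if rule is None:
        return False  # no threshold configured for this metric
    lo, hi = rule
    return lo <= observed_value <= hi

# Ignore any Calls per Minute reading between 0 and 50 (known-noisy band):
thresholds = {"Calls per Minute": (0.0, 50.0)}
print(should_ignore("Calls per Minute", 42.0, thresholds))  # True
print(should_ignore("Calls per Minute", 75.0, thresholds))  # False
```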
In this article, we explored the fundamentals of Harness Continuous Verification. Harness supports the major APM and logging vendors, and also lets you add a custom APM so you can query custom metrics.
For further reading, go through the Importance of Continuous Verification as well as Applying ML with Harness.