Cloud Cost Management and Deployments are often analyzed in isolation and owned by different teams within an organization. What happens when these teams collaborate closely? Does it even make sense to look at costs and releases in conjunction with each other? In this post, we’ll look at three reasons why every organization should consider correlating Engineering releases with their cloud costs.
The lift-and-shift strategy was common among organizations making their first forays into the cloud, but organizations have increasingly adopted fully-managed cloud services. Introducing new fully-managed services into an architecture brings its own class of bugs: invalid configurations, misinterpreted usage-based pricing, and so on.
For example, in Amazon CloudWatch Events, you can create rules that lead to infinite loops, where a rule fires repeatedly. A rule might detect that ACLs have changed on an Amazon Simple Storage Service (Amazon S3) bucket, and then trigger software to change them to the desired state. If the rule isn't written carefully, the subsequent ACL change fires the rule again, creating an infinite loop that can quickly drive charges higher than expected.
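The fix for this pattern is to make the remediation idempotent: only write when the observed state deviates from the desired state, so the remediation's own change event does not re-trigger it. Here is a minimal, purely illustrative sketch (an in-memory stand-in, not real AWS APIs) that contrasts the looping handler with the idempotent one:

```python
# Illustrative sketch only: a fake bucket and event queue stand in for
# S3 and CloudWatch Events. No real AWS APIs are used here.

DESIRED_ACL = "private"
events_fired = []

def put_acl(bucket, acl):
    """Simulates an ACL write that, like S3, re-fires the change event."""
    bucket["acl"] = acl
    events_fired.append({"bucket": bucket, "acl": acl})

def naive_handler(event):
    # BAD: writes unconditionally, so every event triggers another write.
    put_acl(event["bucket"], DESIRED_ACL)

def idempotent_handler(event):
    # GOOD: only write when the ACL actually deviates from the desired
    # state, so the remediation event does not re-trigger remediation.
    if event["bucket"]["acl"] != DESIRED_ACL:
        put_acl(event["bucket"], DESIRED_ACL)

def run(handler, bucket, max_events=10):
    """Drives the event loop; returns how many events fired before quiescing."""
    events_fired.clear()
    put_acl(bucket, "public-read")  # the external change that starts it all
    processed = 0
    while events_fired and processed < max_events:
        handler(events_fired.pop(0))
        processed += 1
    return processed

print(run(idempotent_handler, {"acl": "private"}))  # 2: settles immediately
print(run(naive_handler, {"acl": "private"}))       # 10: hits the safety cap
```

The naive handler only stops because of the `max_events` cap; in a real account it would fire (and bill) indefinitely.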
In GCP's Stackdriver Logging, developers might add log lines to a hot path in the code. A line that adds little or no value can execute millions of times a day, causing a spike in Stackdriver Logging costs.
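One common mitigation is to sample the log statement rather than remove it entirely. Stackdriver Logging bills by ingested volume, so emitting one line per N iterations bounds the cost while keeping some visibility. The sampler below is a hypothetical sketch, not a Stackdriver API:

```python
# Hypothetical sketch: rate-limiting a log statement on a hot code path.
# The EveryN sampler and the 1-in-1000 rate are illustrative assumptions.
import logging

log = logging.getLogger("hot-path")

class EveryN:
    """Deterministic sampler: lets through one call in every n."""
    def __init__(self, n: int):
        self.n = n
        self.calls = 0

    def should_log(self) -> bool:
        self.calls += 1
        return self.calls % self.n == 1

sampler = EveryN(1000)
emitted = 0
for item in range(1_000_000):        # the hot loop
    if sampler.should_log():         # instead of logging unconditionally
        log.info("processed item %d", item)
        emitted += 1

print(emitted)  # 1000 lines emitted instead of 1,000,000
```

A deterministic one-in-N sampler keeps log volume proportional to traffic at a 1000x discount; a randomized sampler would work equally well.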
Similarly, with all of the microservices built by development teams, code with freshly-introduced bugs can creep into production. What if a frontend microservice makes multiple calls to the same API endpoint? Caught early in the development cycle, this is a minor coding error. In production, however, it degrades page performance and also drives up costs if the API layer is built on cloud services that are typically priced by usage. Consider, for example, an AWS Lambda function, where you pay per invocation and for the compute time consumed.
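Deduplicating those repeated calls is usually a one-line fix. The sketch below is purely illustrative: `fetch_profile` stands in for a billable, Lambda-backed endpoint, and a process-level `lru_cache` stands in for whatever request-scoped memoization your frontend framework provides:

```python
# Hypothetical sketch: collapsing duplicate calls to a usage-priced API.
# `fetch_profile` is a made-up stand-in for a billable backend call.
from functools import lru_cache

invocations = 0

@lru_cache(maxsize=None)              # collapse duplicate calls
def fetch_profile(user_id: str) -> str:
    global invocations
    invocations += 1                  # each real invocation is billed
    return f"profile-for-{user_id}"

# A page render that (buggily) asks for the same profile five times:
for _ in range(5):
    fetch_profile("42")

print(invocations)  # 1 -- four billable invocations avoided
```

In a real frontend you would scope the cache to a single request or page render rather than the whole process, but the cost effect is the same: duplicate calls stop reaching the metered API.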
Moreover, correlating deployments with cost spikes helps developers find bugs earlier in the life cycle, even when those bugs inevitably creep into higher environments such as staging and production.
Developers love to optimize. Once they start treating cost as just another boundary condition to optimize against, organizations will spend more efficiently on the cloud. Correlating costs with engineering releases unearths numerous opportunities for optimization. Here are a few examples:
You might be querying a data warehouse, such as BigQuery, which charges based on the amount of data scanned per query. Omitting a partition filter in a new query can significantly increase the amount of data scanned, sending costs through the roof. Similarly, with Stackdriver metrics, pushing new metrics that aren't optimized (say, with more group-by fields than necessary) causes costs to spiral.
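The partition effect is easy to see with back-of-the-envelope numbers. The sketch below simulates a daily-partitioned table; the table layout and byte counts are made-up assumptions, but the pruning arithmetic mirrors how BigQuery bills on-demand queries by bytes scanned:

```python
# Illustrative model of partition pruning. All sizes are invented:
# ~a year of daily partitions, 10 GB each.
GB = 10**9
partitions = {f"2023-{m:02d}-{d:02d}": 10 * GB
              for m in range(1, 13) for d in range(1, 29)}

def bytes_scanned(partition_filter=None):
    """Sum the size of every partition the query must read."""
    return sum(size for day, size in partitions.items()
               if partition_filter is None or partition_filter(day))

full_scan = bytes_scanned()                                    # no filter
pruned = bytes_scanned(lambda day: day.startswith("2023-06"))  # one month

print(full_scan // GB, "GB vs", pruned // GB, "GB")  # 3360 GB vs 280 GB
```

In practice you don't need to model this yourself: BigQuery's dry-run mode (`QueryJobConfig(dry_run=True)` in the Python client) reports the bytes a query would scan before you pay for it, which makes a handy pre-deployment check.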
More often than not, organizations benefit from spending time to understand the pricing models of the various cloud services an application consumes, as well as how each engineering choice affects cost. The easiest way to do that is to closely monitor how every deployment affects costs.
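The core of that monitoring is a simple join between deployment timestamps and a daily cost series. The sketch below is a minimal, hypothetical version: the cost figures, release names, and the 1.5x spike threshold are all invented for illustration.

```python
# Hypothetical sketch: flag releases that precede a day-over-day cost spike.
# All data and the threshold are illustrative assumptions.
from datetime import date, timedelta

daily_cost = {                        # USD per day, made-up billing data
    date(2023, 6, 1): 100, date(2023, 6, 2): 102,
    date(2023, 6, 3): 101, date(2023, 6, 4): 180,
    date(2023, 6, 5): 185,
}
deployments = {date(2023, 6, 3): "release-v1.4.0"}  # deploy date -> release

def suspect_releases(costs, deploys, threshold=1.5):
    """Return releases whose next-day cost exceeds threshold x the deploy day."""
    flagged = []
    for day, release in deploys.items():
        before = costs.get(day)
        after = costs.get(day + timedelta(days=1))
        if before and after and after > threshold * before:
            flagged.append(release)
    return flagged

print(suspect_releases(daily_cost, deployments))  # ['release-v1.4.0']
```

Even this crude heuristic turns "costs went up sometime this month" into "costs jumped the day after release-v1.4.0 shipped", which is exactly the correlation this post argues for.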
Short-Circuiting the Feedback Loop
Correlation also shortens the feedback cycle for narrowing down the root cause of a cost spike, which makes remediation far easier. Catching cost spikes and issues sooner in the product life cycle reduces business impact and keeps users happy.
How Can Harness Help?
Harness is an end-to-end software delivery platform that includes both Continuous Delivery and Cloud Cost Management on the same platform, paired with custom dashboarding capabilities that cut across modules. This makes it extremely easy to correlate cost spikes with deployments. Further, Cloud Cost Management has a first-class Anomaly Detection feature that notifies users of cost anomalies, which removes the need to constantly check for cost spikes by hand.