Cloud spend is forecast to exceed $360B by 2022, fueled largely by speed of delivery and innovation (Source: Gartner).
At the core of this innovation are engineering decisions that can prove costly, leading to bloated cloud bills and inefficient resource usage. As we learned from our users, balancing software velocity with operational efficiency is not easy.
We launched Harness Continuous Efficiency in beta and have spent the past few weeks working with a few customers to understand their specific problems in managing cloud costs. The feedback so far has been overwhelmingly positive. One of our users realized over $500K in annual savings within days of using the product. Another said that, with cost visibility available in hours rather than weeks, they could now be more proactive about feature releases and anticipated traffic.
Here’s a summary of what we learned and how we iterated quickly to address their challenges.
#1: Shifting down & driving a cost-aware culture
You have probably heard of shift-left and shift-right. But what is shift-down? Shift-down means empowering engineering (developers, DevOps engineers, SREs) with useful, actionable cost insights that have traditionally been in the hands of tech-savvy financial analysts and executives. A majority of our users mentioned that most members of their teams didn’t have access to cost-management products.
Those on the business side who did have access lacked context on the decisions that caused costs to change. One of our beta users mentioned, “It helps to know that my developers have cost information at their fingertips. It helps them be more proactive about their spend while at the same time not get overly nervous about what every minor change might cost them.” Shifting down drives cost transparency across key stakeholders while empowering individual developers and DevOps engineers to build trust with their finance counterparts.
One of our users claimed that they finally have a tool that is friendly to people spending the money, and not just the ones counting it.
#2: Digging into cost data or Root-Cost Analysis
Containers and container orchestrators are a popular choice for most teams. However, traditional cost allocation strategies become very hard to apply to them because granular cost data is not readily visible.
Cloud providers provide billing data only for the usage of the underlying server instance. When that instance is running multiple containers, you need additional information to figure out the cost of running each of them. How much of that server was used by a container for service foo vs. service bar? How many of those nodes had unallocated resources that no container claimed? How many of those containers had resources that were claimed but sat idle?
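To make those three questions concrete, here is a minimal Python sketch that splits the bill for one node across the containers running on it. It is an illustration only, not how Harness computes costs: the `ContainerUsage` type, the `allocate_node_cost` function, and the choice to allocate by CPU requests alone (ignoring memory and storage) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ContainerUsage:
    """Hypothetical per-container usage record for a single node."""
    service: str
    cpu_requested: float  # cores claimed (requested) by the container
    cpu_used: float       # cores the container actually consumed on average


def allocate_node_cost(node_cost, node_cpu, containers):
    """Split one node's cost into per-service, idle, and unallocated parts.

    Assumption: cost is proportional to CPU cores only.
    """
    cost_per_core = node_cost / node_cpu
    by_service = {}
    idle = 0.0
    requested_total = 0.0
    for c in containers:
        # Each service is charged for what it claimed on the node.
        by_service[c.service] = (
            by_service.get(c.service, 0.0) + c.cpu_requested * cost_per_core
        )
        # Idle cost: claimed by the container but not actually used.
        used = min(c.cpu_used, c.cpu_requested)
        idle += (c.cpu_requested - used) * cost_per_core
        requested_total += c.cpu_requested
    # Unallocated cost: node capacity that no container claimed.
    unallocated = max(node_cpu - requested_total, 0.0) * cost_per_core
    return {"by_service": by_service, "idle": idle, "unallocated": unallocated}
```

For a 4-core node, a `foo` container requesting 2 cores but using 1, and a `bar` container requesting and using 1 core, the sketch reports the cost claimed by each service, the portion of `foo`'s claim that sat idle, and the one core nobody claimed.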
Our beta users pointed out that being able to start at the account or service instance and drill all the way down to a specific microservice or container was critical to optimizing costs. At Harness, we call this root-cost analysis. Root-cost analysis led one of our beta customers to save over 182K in annual costs for a single overprovisioned Kubernetes cluster in one of their dev environments. Another explained that what would ordinarily take 4-5 days with their existing cloud cost management product now took them 20 minutes.
#3: Allocating costs by application, service, clusters, etc.
To understand the true cost drivers of any service, one needs to look beyond the aggregated daily/weekly/monthly details that the native cost products from cloud providers such as AWS, GCP & Azure offer. They are very useful for getting started, but our beta users noted that they’d hit a wall trying to understand how much a service, application or product cost, and what each of their teams spent in research & development or in production to serve their end customers. Small and mid-sized companies used show-back to let teams know what they were spending, while the more mature companies used chargeback to associate costs with each team’s budget and P&L.
We worked with users on either side of the spectrum, and the common pain point across most of them was that it isn’t easy to isolate resources and map them to an application or a product without diligently tagging them. You can circumvent the need to tag your services, at least the ones you deploy with Harness, since we understand the construct of your applications/services and the infrastructure they are deployed to. The representation below illustrates how such a breakdown occurs by leveraging existing Harness deployments.
Microservices costs leveraging Harness deployments
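The idea of mapping resources to applications from deployment records rather than tags can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Harness implementation: the `roll_up_costs` function, the shape of the billing line items, and the `deployments` lookup are all hypothetical.

```python
from collections import defaultdict


def roll_up_costs(line_items, deployments):
    """Roll up billing line items by (application, service).

    line_items:  iterable of (resource_id, cost) pairs (hypothetical shape).
    deployments: dict mapping resource_id -> (application, service), assumed
                 to come from deployment records instead of manual tags.
    """
    totals = defaultdict(float)
    for resource_id, cost in line_items:
        # Resources with no known deployment are kept visible, not dropped.
        key = deployments.get(resource_id, ("unattributed", "unattributed"))
        totals[key] += cost
    return dict(totals)
```

Keeping an explicit `unattributed` bucket is a deliberate choice in this sketch: it shows how much spend still cannot be mapped to any application or service.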
#4: Protecting cost margins
To understand the cost of serving customers, it is important to know your cloud spend in Research & Development (Dev, QA, Staging, etc.) and in Production. We saw many variants of how our users were able to do this and learned quite a bit from each of their experiences. Some had their spend distributed across Kubernetes Namespaces and knew precisely what they were. Some used Kubernetes labels to group pods by their end customers, team/owners, or release/version. Another user wanted to understand the cost of their newly launched service and whether it was more than they’d expected.
My personal favorite came from a user who wanted to run their workloads for a few hours and understand costs on a per-hour basis to simulate a real production workload. We rolled out the hourly granularity feature within days, and it helped them better understand their spend and what to charge their end customers. Our users are just scratching the surface, and there are plenty of opportunities to leverage Harness for understanding and protecting cost margins.
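Grouping spend by namespace or by label, as these users did, amounts to a simple aggregation once a cost is known per pod. The Python sketch below assumes per-pod costs have already been computed; the record shape (`namespace`, `labels`, `cost`) and the function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def cost_by_namespace(pods):
    """Sum pod costs per Kubernetes namespace."""
    totals = defaultdict(float)
    for pod in pods:
        totals[pod["namespace"]] += pod["cost"]
    return dict(totals)


def cost_by_label(pods, label_key):
    """Sum pod costs per value of one label, e.g. 'customer' or 'team'."""
    totals = defaultdict(float)
    for pod in pods:
        # Pods missing the label are grouped together so they stay visible.
        value = pod["labels"].get(label_key, "<unlabeled>")
        totals[value] += pod["cost"]
    return dict(totals)
```

Calling `cost_by_label(pods, "customer")` gives a per-customer view, while `cost_by_namespace(pods)` gives the split across environments when each environment lives in its own namespace.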
Cost of Kubernetes clusters by the hour
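Hourly granularity can be pictured as bucketing cost records by the hour they fall in and, if you want a number to price against, dividing by the work served in that hour. The following Python sketch is a hedged illustration only; `hourly_costs`, `cost_per_unit`, and the record shapes are assumptions, not the product's data model.

```python
from collections import defaultdict
from datetime import datetime


def hourly_costs(records):
    """Bucket (timestamp, cost) records into per-hour totals.

    records: iterable of (datetime, float) pairs (hypothetical shape).
    """
    buckets = defaultdict(float)
    for ts, cost in records:
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour] += cost
    return dict(sorted(buckets.items()))


def cost_per_unit(hourly, units_served):
    """Divide each hour's cost by the units (e.g. requests) served in it.

    Hours with no recorded units are skipped to avoid dividing by zero.
    """
    return {
        hour: cost / units_served[hour]
        for hour, cost in hourly.items()
        if units_served.get(hour)
    }
```

A per-unit figure like this is one possible input when deciding what to charge end customers for a workload that only runs a few hours at a time.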
#5: Granular Cost Governance
Most users traditionally managed costs broadly by cloud accounts, subscriptions, projects or services. Some had weekly or monthly reports sent to key business executives or finance stakeholders and would typically wait until the end of the month. Others were more proactive in managing budgets, meeting more often and having open and transparent conversations.
What we learned was that the alerts they received were informational for the most part. The goals weren’t broken down into granular targets for the teams who owned specific microservices, namespaces, clusters or environments (such as production, staging, development, QA, etc.). There was also a desire to forecast spend at a finer granularity, for example for a single namespace over the next quarter.
Harness provides the ability to govern spend at every level of granularity and to divide company goals into smaller targets that teams own, so they can self-manage costs.
One of our beta customers, a services company in the public sector, said, “Since our clusters host applications from many different business units, it has been very difficult for our finance team to accurately allocate cloud computing costs to the right P&L. With Harness CE, that task becomes trivial and we can now shift down the responsibility of cost to our engineering teams.”
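As a rough sketch of what granular governance can look like, the Python below projects period-end spend for each scope (a namespace, cluster, or environment) and flags the ones on track to exceed their target. The linear run-rate forecast, the function names, and the scope keys are assumptions made for illustration; they do not describe how Harness forecasts or alerts.

```python
def forecast_spend(spend_to_date, days_elapsed, days_in_period):
    """Naive run-rate forecast: assume the average daily spend continues."""
    if days_elapsed <= 0:
        raise ValueError("days_elapsed must be positive")
    return spend_to_date / days_elapsed * days_in_period


def scopes_over_target(targets, spend_to_date, days_elapsed, days_in_period):
    """Return {scope: projected_spend} for scopes projected to exceed target.

    targets:       dict scope -> budget target for the period (hypothetical).
    spend_to_date: dict scope -> spend so far in the period.
    """
    flagged = {}
    for scope, target in targets.items():
        projected = forecast_spend(
            spend_to_date.get(scope, 0.0), days_elapsed, days_in_period
        )
        if projected > target:
            flagged[scope] = projected
    return flagged
```

Because each scope carries its own target, the team that owns it can act on an alert directly instead of waiting for a company-wide report.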
Harness costs governance for QE
#6: And finally, cost isn’t a one-time fix. It is Continuous.
A key lesson for us from observing users during the beta was that cost isn’t a one-time fix. It needs continuous monitoring when infrastructure is provisioned in the cloud, when test environments are created but not destroyed, when clusters are auto-scaled with considerable cushions specifically for production workloads, or when there are investments in new initiatives.
Our users in beta encountered all these scenarios, some more frequently than others. While they told us they didn’t want to constantly watch dashboards, they did want to be notified when costs spiked. Besides having access to the cost data, they considered having deployment tooling and cost data in one place a huge benefit, since they didn’t have to switch between multiple tools to know the cost impact of their pipelines.
A deployment event such as launching a new version of a service could be looked at in the context of its impact on cost, for instance. Below is a trend of one of the namespaces our customers enabled to monitor their team’s cost. As you can see, the costs tend to creep back up and need to be continuously monitored to keep them within bounds.
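Being notified of a spike instead of watching a dashboard boils down to comparing each new data point against a recent baseline. Here is a minimal Python sketch of that idea; the trailing-average baseline, the 7-day window, the 25% threshold, and the `detect_spikes` name are all assumptions for illustration rather than the product's actual detection logic.

```python
from statistics import mean


def detect_spikes(daily_costs, window=7, threshold=0.25):
    """Return indexes of days whose cost exceeds the trailing average.

    A day is flagged when its cost is more than `threshold` (as a fraction)
    above the mean of the previous `window` days.
    """
    spikes = []
    for i in range(window, len(daily_costs)):
        baseline = mean(daily_costs[i - window:i])
        if baseline > 0 and daily_costs[i] > baseline * (1 + threshold):
            spikes.append(i)
    return spikes
```

In practice, the flagged days could be lined up against deployment events to see which release, if any, coincided with the jump.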
For more information and to request a demo of Harness Continuous Efficiency, visit https://harness.io/continuous-efficiency/