Cloud cost management is a hot topic and has become an imperative for software companies. Despite the variety of cloud providers and platforms, such as Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure, none seems to equip engineering teams well enough to actually manage their cloud costs.
In fact, AWS itself admitted as early as 2017 that about 35% of cloud spend is waste, which at the time amounted to $6.4B of its revenue. In 2020, Gartner predicted that the public cloud market would grow to $257.9B for the year. At a waste rate of about 35%, that means companies were projected to collectively waste roughly $90B in public cloud spend. The public cloud providers probably aren’t complaining about that, but engineering and company budgets definitely are.
The reality is that cloud bills are exploding and everyone wants to bring them down. In this post, we’re going to walk through what contributes to crazy cloud bills, dive into some ways to tangibly reduce cloud costs, and explore some tools that will help make all of this easier.
The Importance of Cloud Cost Optimization
The biggest and most obvious reason to optimize your cloud costs is simple: to keep your cloud bill as small as possible. In the same way that we don’t want to empty our personal wallets and try to keep costs down where we can, we want to keep our cloud spend at a manageable level so we have money to spend down the line. You can check out our post about picking the right cloud cost management method to see what might work for you.
Imagine if you had a $100 cloud budget that you wanted to stretch for a month, but because of non-existent cloud cost management practices or visibility, you went through that in a day. You went bust in the first round. In the same vein, engineering teams have budgets — or discounted commits — with cloud providers that they expect to last the entire year. What happens when costs go out of control and the budget is hit (or vastly overrun) because we don’t know what’s going on with the cloud bill?
There are horror stories of startups that have straight up run out of money because one small mistake ended in a massive cost overrun. Getting visibility into cloud expenses and figuring out how to manage and optimize them is becoming increasingly important, and as time goes on, more and more companies expect developers themselves to take a hand in keeping cloud costs optimized. Today, managing cloud costs often falls to an external team that doesn’t have the right context, so while that team can find cost savings, it does so without understanding what’s causing the costs.
What Are the Biggest Contributors to High Cloud Costs?
Contributor #1: Unused Resources
All environments are prone to unused resources. These include unattached storage volumes and obsolete snapshots (both charged per gigabyte (GB) per month), idle load balancers, and unused but still running instances (charged per hour).
You’re probably all too familiar with how easy it is to spin up new cloud resources, and how easy it is to forget they exist when your focus is on speed of development. If you use AWS, you’re probably guilty (as am I) of spinning up a few EC2 instances for a small test application and then letting them run even though you’re not using them anymore. That, of course, means you’re still paying for these unused EC2 instances, and it’ll hit the wallet later. At the same time, you might even be spinning up new compute instances that add to the cost! Oops.
Developers should carefully maintain their sandbox environments to ensure that any compute instances they spin up are cleaned up or shut down when they are no longer needed.
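To make that cleanup habit concrete, here is a minimal sketch of how you might flag cleanup candidates from a resource inventory. Everything in it is an assumption for illustration: the `Resource` fields, the 14-day idle threshold, and the sample costs are hypothetical, and in practice the inventory would come from your cloud provider’s APIs or billing export rather than a hard-coded list.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    kind: str            # e.g. "instance", "volume", "snapshot", "load_balancer"
    attached: bool       # False for things like unattached storage volumes
    days_idle: int       # days since the resource last did meaningful work
    monthly_cost: float  # estimated monthly cost in USD (hypothetical numbers)


def find_cleanup_candidates(resources, max_idle_days=14):
    """Return resources that are unattached or idle for more than max_idle_days."""
    return [r for r in resources if not r.attached or r.days_idle > max_idle_days]


def potential_monthly_savings(candidates):
    """Sum the monthly cost of everything flagged for cleanup."""
    return sum(r.monthly_cost for r in candidates)


# Hypothetical sandbox inventory.
inventory = [
    Resource("test-app-1", "instance", attached=True, days_idle=30, monthly_cost=70.0),
    Resource("prod-api", "instance", attached=True, days_idle=0, monthly_cost=140.0),
    Resource("old-data", "volume", attached=False, days_idle=90, monthly_cost=8.0),
]

candidates = find_cleanup_candidates(inventory)
print([r.name for r in candidates])           # ['test-app-1', 'old-data']
print(potential_monthly_savings(candidates))  # 78.0
```

A report like this is only a starting point; someone who knows the workload should still confirm that a flagged resource is really safe to remove.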
Contributor #2: Compute Resources
In container environments, developers and administrators can use Kubernetes to manage and allocate compute resources at the pod, node, and namespace level. It’s a fairly common practice to specify resource requests and limits for the amount of CPU (measured in cores) and memory (measured in bytes) an application can use. For public cloud costs, however, this can lead to idle costs, where applications are over-allocated and do not use all the resources that are reserved for them. Likewise, empty clusters, environments, and workspaces contribute to unallocated costs.
Developers should use tools to monitor the performance of their applications. Gathering information on compute, memory, network, and storage consumption can help you optimize resource utilization and allocation within the cloud. Sharing these findings with your cloud administrator can help reduce idle and unallocated costs, or better yet, you might even be able to deal with them yourself.
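As a rough illustration of the difference between idle and unallocated costs, the sketch below splits one node’s hourly cost into three buckets. It assumes, purely for simplicity, that cost is spread evenly across CPU cores and ignores memory; the function name and the numbers are hypothetical. Here, idle means capacity that workloads have reserved (requested) but are not using, and unallocated means capacity that no workload has reserved at all.

```python
def cost_breakdown(node_cores, requested_cores, used_cores, node_cost_per_hour):
    """Split a node's hourly cost into utilized, idle, and unallocated parts.

    Simplifying assumption: cost is attributed evenly per CPU core.
    """
    if not 0 <= used_cores <= requested_cores <= node_cores:
        raise ValueError("expected 0 <= used <= requested <= node capacity")
    cost_per_core = node_cost_per_hour / node_cores
    return {
        "utilized": used_cores * cost_per_core,
        "idle": (requested_cores - used_cores) * cost_per_core,
        "unallocated": (node_cores - requested_cores) * cost_per_core,
    }


# Hypothetical node: 8 cores at $0.40/hr, 6 cores requested, only 2 actually used.
breakdown = cost_breakdown(
    node_cores=8, requested_cores=6, used_cores=2, node_cost_per_hour=0.40
)
for bucket, cost in breakdown.items():
    print(f"{bucket}: ${cost:.2f}/hr")
```

In this made-up example, only a quarter of the node’s cost goes to work the application is actually doing, which is the kind of finding worth bringing to whoever sets the requests.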
Contributor #3: The Hidden Costs of Storage Services
It’s easy to understand that storage comes at a cost (per GB/month), but many developers forget that I/O (input/output) operations on storage files also cost money. Service providers charge per request/API call. Data transfers are also charged per GB when transferring data between different regions, Virtual Private Clouds (VPCs), or out of the cloud.
In local environments, many of us are used to freely reading and writing files while developing and testing, so it’s understandable that we may treat reading or writing to an AWS S3 bucket the same way. But that’s where we have to be careful, because every one of those reads and writes costs money when working in the cloud.
Developers should carefully consider the hidden costs of storage when architecting data transfer jobs or data-intensive applications. One option is to keep I/O to local files until you’re confident they’re optimized for the cloud or they meet the architectural requirements to minimize these operations.
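To see how quickly those hidden charges can add up, here is a small estimator. The three prices passed in below are placeholders chosen for illustration only, not any provider’s actual rates, and the function and parameter names are hypothetical; check your provider’s current pricing page before relying on numbers like these.

```python
def monthly_storage_cost(stored_gb, requests, egress_gb,
                         price_per_gb_month, price_per_1k_requests,
                         price_per_gb_egress):
    """Estimate a monthly storage bill from its three main components."""
    storage = stored_gb * price_per_gb_month
    request = (requests / 1000) * price_per_1k_requests
    transfer = egress_gb * price_per_gb_egress
    return {
        "storage": round(storage, 2),
        "requests": round(request, 2),
        "transfer": round(transfer, 2),
        "total": round(storage + request + transfer, 2),
    }


# Hypothetical workload: 500 GB stored, 20 million requests, 200 GB transferred out.
bill = monthly_storage_cost(
    stored_gb=500,
    requests=20_000_000,
    egress_gb=200,
    price_per_gb_month=0.02,      # placeholder price
    price_per_1k_requests=0.005,  # placeholder price
    price_per_gb_egress=0.09,     # placeholder price
)
print(bill)
```

With these placeholder prices, the request charges end up far larger than the cost of simply storing the data, which is exactly the part of the bill that is easy to overlook.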
3 Ways to Reduce Cloud Costs
Let’s look at some of the ways in which you can reduce your cloud costs, even as a developer. If you’re on the end of the spectrum where you’re responsible as an engineer for keeping costs low, this is for you.
Rightsizing Cloud Resources
Rightsizing is a fancy way of saying “reduce idle costs,” or even more simply, “don’t use bigger instances than you need.” Obviously, this is very hard to get right the first time in practice, because you don’t know exactly what you’re going to need or what the app usage will look like before you’ve built the thing!
Rightsizing generally comes in after the build, when you know what you actually need and can pick the right kinds of resources. A simple example is that you might provision a large AWS EC2 instance while you’re building, but you don’t end up using most of that compute, so you find you can actually use a medium EC2 instance instead. In the illustration below, you’ll see that if you save $5/hr on your EC2 instance cost, that turns into $43,800 of cost savings across the whole year.
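The $43,800 figure is just the hourly savings multiplied out over a year. A quick sketch of that arithmetic, assuming the instance runs 24 hours a day for all 365 days (the function name is ours, for illustration):

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_savings(hourly_savings, instance_count=1):
    """Yearly savings for instances that run around the clock."""
    return hourly_savings * HOURS_PER_YEAR * instance_count


print(annual_savings(5))  # 43800
```

If the instance only runs part of the time, the savings shrink proportionally, so an always-on assumption gives you the upper bound.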
Reduce Unallocated Resources
Oftentimes, you’ll run into situations where it’s not the individual instance itself that is overprovisioned, but rather a set of resources that you can use more optimally. In the same vein as rightsizing, it’s difficult to do this ahead of time, but over time it becomes much more obvious where you’re not using resources and where you can use them better.
Using a simple non-cluster example, let’s say you have 40% steady usage on each of three AWS EC2 instances. If you don’t need all the extra capacity in each of those EC2 instances, could you shift the compute from one of the three onto another? That would get you to 80% usage on one instance and 40% on a second, and you can spin down the third because you don’t need it anymore. If each instance costs $5/hr, then you’ll be saving $43,800 over the course of a year. Of course, this doesn’t work if you’re keeping that capacity handy because of expected usage spikes in production.
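The consolidation above can be sketched as a simple packing problem. The snippet below uses a first-fit-decreasing heuristic, which is our choice for illustration rather than anything a cloud provider prescribes, and it assumes workloads can be moved freely between identical instances. The 80% ceiling is also an assumption, picked to match the example and to leave some headroom.

```python
def consolidate(utilizations, max_utilization=80):
    """Pack workloads (as percent of one instance) onto as few instances as
    a first-fit-decreasing heuristic finds. Returns the load per instance."""
    instances = []
    for load in sorted(utilizations, reverse=True):
        if load > max_utilization:
            raise ValueError(f"a {load}% workload cannot fit under the ceiling")
        for i, current in enumerate(instances):
            if current + load <= max_utilization:
                instances[i] = current + load  # fits on an existing instance
                break
        else:
            instances.append(load)  # nothing had room, keep another instance
    return instances


before = [40, 40, 40]
after = consolidate(before)
freed = len(before) - len(after)

print(after)              # [80, 40]
print(freed)              # 1
print(freed * 5 * 8760)   # 43800, at $5/hr per freed instance running all year
```

Lowering `max_utilization` is one way to model the headroom you want to keep for production spikes; at a 70% ceiling, for example, none of the three instances in this example could be freed.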
In many cases, you’ll be working with containers, and it’s the nodes you’re running on that will be overprovisioned. The illustration below demonstrates the same concept as the non-cluster example above, but shows how it would work within your clustered workloads, perhaps in Kubernetes or AWS ECS. Using the same methodology of moving workloads over, you can effectively kill one of the nodes and net a cost savings of $5/hr, or $43,800 per year.
Commit to Reserved or Volume Discounts
Do you know the minimum number of cloud instances you’ll need over the coming months? If so, you’ll definitely want to cash in on the discounts that all of the cloud providers will happily offer in exchange for a minimum commitment over a given time period.
You may have heard the terms on-demand vs. reserved vs. volume as they relate to the costs of cloud resources. Even if you haven’t, they’re basically just pricing tiers.
- On-demand will cost the most because you’re asking your cloud provider to provide compute instances right when you need them. It’s the same concept as needing a contractor to come out to your house that day to make a repair, which will cost you more.
- Reserved is cheaper than on-demand because you’re saying, “I need this compute for a longer period of time and I’m willing to pay in advance to have that set aside for me.” Just as when you book your contractor in advance, you’ll find some cost savings in the form of a discount.
- Volume purchases give you the best discounts because you’re saying, “I’ll guarantee you $X over this period of time if you give me a discount.” Because you’re paying upfront and giving the cloud provider a guarantee of income, they’re willing to give you the biggest discount. This is almost like having your contractor on retainer; because you’re guaranteed to be paying them no matter what, you’re going to get a break on services when needed.
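One way to reason about these tiers is to compare what you’d pay on-demand for the hours you actually use against what you’d pay for a discounted commitment that you’re billed for whether you use it or not. The sketch below does that for a single instance. The 40% discount is a made-up number for illustration, and the function names are ours; real discounts vary by provider, term, and payment option.

```python
HOURS_PER_YEAR = 24 * 365


def on_demand_cost(hourly_rate, hours_used):
    """Pay the full rate, but only for the hours you actually run."""
    return hourly_rate * hours_used


def committed_cost(hourly_rate, discount_percent, hours_committed=HOURS_PER_YEAR):
    """Pay a discounted rate for every committed hour, used or not."""
    return hourly_rate * (100 - discount_percent) / 100 * hours_committed


def cheaper_option(hourly_rate, discount_percent, hours_used):
    committed = committed_cost(hourly_rate, discount_percent)
    return "committed" if committed < on_demand_cost(hourly_rate, hours_used) else "on-demand"


# Hypothetical $5/hr instance with an assumed 40% commitment discount.
print(cheaper_option(5, 40, hours_used=8760))  # always on: committed
print(cheaper_option(5, 40, hours_used=4000))  # under half the year: on-demand
```

Under these assumptions, the commitment only pays off once you use the instance for more than 60% of the year, which is why it helps to know your minimum usage before you sign up.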
Tools for Reducing Cloud Costs
Reducing your cloud costs sounds great, and the methods above may well work for you. But they nonetheless take time to do yourself, and you can always make it easier with the right tools.
Continuous Efficiency Tools (Harness)
Tools that let you see your exact usage as it happens are a lot like log files: they show you exactly what is going on in real time or at any timestamp. They are the best way for developers in the weeds to see exactly what costs they’re incurring in the cloud and where those costs are coming from, so they can optimize cloud costs at any moment at the ground level.
Harness has a product called Continuous Efficiency that does just this.
Cloud Cost Management Tools (CloudHealth, Cloudability)
These tools are good for assessing point-in-time (or periodic) cost savings opportunities, rather than opportunities to optimize every resource. Typically, teams will look at these tools every few weeks or months to see where they can save money. The downside is that they might not get into the deep details of where every cost is coming from, meaning that for you as a developer, they might not get into the nitty-gritty you need to handle costs yourself.
Cloud Provider Billing Tools (AWS Cost Explorer, GCP Billing, Azure Billing)
Each of the cloud providers offers these tools to give you basic visibility into where your cloud costs are coming from. They are usually most useful for people who own the budget and want to see at a high level what things are costing. They’re not as useful for in-the-weeds developers because they don’t necessarily provide the granularity developers need to understand their specific usage, and they don’t dive into containerization costs such as those from Kubernetes.
Reduce Cloud Waste and Lower Cloud Costs with Harness
As developers take on an increasing amount of responsibility for optimizing cloud costs as part of their CI/CD process, it’s important for them to have the tools to be able to understand their own usage and head off cloud bill oopsies before they become an issue.
Harness Continuous Efficiency is built with the needs of developers in mind, making it easy to get visibility at any level they need. For the first time, developers can take charge of their own cloud expenses and do what engineers do best: optimize.
Want to learn more about the tools that are out there? We mapped out the top cloud cost management tools to consider.