One of the most effective ways to reduce cluster resource utilization is to configure each workload with a deliberate trade-off between performance and cost savings. In turn, this lets us run a smaller cluster, which saves money through either fewer nodes or cheaper virtual instances, and helps with cloud cost optimization.
Therefore, correct resource requests and limits are necessary in Kubernetes if you don't want to starve your Pods of compute resources, don't want them to be killed on high memory utilization, and want to save on cost by not overprovisioning.
spec:
  containers:
  - name: app
    image: images.my-company.example/app:v4
    resources:
      requests:
        memory: "6Gi" # at least 6 GiB of memory is always available to the container
        cpu: "2" # at least 2 vCPUs are always available to the container
      limits:
        memory: "10Gi" # the container is not allowed to exceed 10 GiB of memory
Note that the example above does not configure a CPU limit. This is a widely followed practice across the industry.
Without a CPU limit, the container is not throttled when its CPU utilization exceeds the requested CPU for a period of time, so performance is not affected. Moreover, this doesn't starve other containers of their guaranteed CPU (configured using requests).
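To make the pattern above easy to check, here is a minimal Python sketch that reviews a container's `resources` section: CPU and memory requests present, a memory limit present and not below the request, and no CPU limit. The function names `parse_quantity` and `review_resources` are hypothetical helpers written for this post, not part of Kubernetes or Harness, and the parser only covers the common quantity suffixes.

```python
from decimal import Decimal

# Common suffixes used by Kubernetes resource quantities (not exhaustive).
_SUFFIXES = {
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30, "Ti": Decimal(2) ** 40,
    "k": Decimal(10) ** 3, "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9, "T": Decimal(10) ** 12,
    "m": Decimal(1) / Decimal(1000),  # milli, e.g. "500m" CPU = 0.5 vCPU
}


def parse_quantity(text):
    """Convert a quantity such as "6Gi", "500m", or "2" to a plain number."""
    # Try two-letter suffixes before one-letter ones.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(text)


def review_resources(resources):
    """Return a list of findings for one container's `resources` mapping."""
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    findings = []
    for name in ("cpu", "memory"):
        if name not in requests:
            findings.append(f"missing {name} request")
    if "memory" not in limits:
        findings.append("missing memory limit")
    elif "memory" in requests and (
        parse_quantity(limits["memory"]) < parse_quantity(requests["memory"])
    ):
        findings.append("memory limit is below memory request")
    if "cpu" in limits:
        findings.append("cpu limit set; the container may be throttled")
    return findings


# The `resources` section from the manifest above, as a Python dict.
example = {
    "requests": {"memory": "6Gi", "cpu": "2"},
    "limits": {"memory": "10Gi"},
}
print(review_resources(example))
```

For the manifest above the list of findings is empty. Adding a `cpu` entry under `limits` would produce the throttling warning.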
Coming up with proper resource requests can be difficult if you don't know the actual utilization of the Pods. Even if you know the instantaneous utilization, it doesn't account for seasonality, so you can't tell whether it will be sufficient on days when utilization is very high. For that, you need your Pod's historical utilization data, which gives you a rough idea of the maximum resource utilization over that time period. Furthermore, what if the workload has multiple replicas and each one has a different utilization?
That's where Harness Cloud Cost Workload Recommendation comes into play: it recommends compute resources suitable for the workload.
When you configure Cost Visibility with a Kubernetes Connector, the delegate associated with the Connector starts pulling CPU and memory utilization metrics every minute for every Node and Pod (including individual containers) in the cluster, with the help of the Metrics Server.
We use histogram aggregation to collect utilization data, and aggregate it over a 20-minute window in the delegate. This 20-minute aggregated histogram data is then sent to Harness, where we aggregate it again over a 24-hour window and save it in the database managed by Harness. For a particular workload, each of these daily histograms has equal weight, because it is not a decaying histogram. Therefore, if you choose to aggregate the last 30 days of data, all 30 days are weighted equally.
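The roll-up described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Harness implementation: the bucket width, the sample values, and the helper names `build_histogram` and `merge` are all assumptions made for this example.

```python
from collections import Counter

BUCKET_WIDTH = 0.25  # assumed bucket width in vCPU, for illustration only


def build_histogram(samples):
    """Count how many samples fall into each fixed-width bucket."""
    return Counter(int(sample // BUCKET_WIDTH) for sample in samples)


def merge(histograms):
    """Add bucket counts together; every sample keeps the same weight."""
    merged = Counter()
    for histogram in histograms:
        merged.update(histogram)
    return merged


# 40 hypothetical one-minute CPU readings for one container.
minute_samples = [0.3] * 30 + [1.1] * 10

# Delegate side: one histogram per 20-minute window.
windows = [minute_samples[i:i + 20] for i in range(0, len(minute_samples), 20)]
window_histograms = [build_histogram(window) for window in windows]

# Harness side: roll the windows up into a daily histogram, then merge days
# without any decay factor, so older days count as much as recent ones.
daily = merge(window_histograms)
last_two_days = merge([daily, daily])

print(dict(daily))
```

Because merging only adds bucket counts, the daily histogram built from 20-minute windows is identical to one built directly from all of the one-minute samples, which is why the data can be aggregated in stages.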
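A decaying histogram might look like a tempting solution, but it doesn't effectively take into consideration the seasonality of high resource utilization on some days of the week. Suppose your application has very high traffic (and consequently high resource utilization) on weekends. By the following Friday, a decaying histogram would have discounted that weekend peak in favor of the quieter weekdays, and the resulting recommendation could fall short for the upcoming weekend.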
That's why we give equal preference to all previous days while calculating the Recommendation.
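A small Python sketch makes the difference concrete. The numbers are made up, the two-day half-life is an assumed decay rate, and the example works on one peak value per day instead of full histograms to keep it short; `weighted_percentile` is a hypothetical helper, not a Harness API.

```python
def weighted_percentile(values, weights, percentile):
    """Smallest value whose cumulative weight reaches the requested share."""
    pairs = sorted(zip(values, weights))
    threshold = sum(weights) * percentile / 100.0
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= threshold:
            return value
    return pairs[-1][0]


# Hypothetical daily peak CPU (vCPU) for 14 days, oldest first, ending on a
# Friday. Weekend days peak at 4.0 vCPU and weekdays at 1.0 vCPU.
daily_peak = [4.0, 4.0, 1.0, 1.0, 1.0, 1.0, 1.0,
              4.0, 4.0, 1.0, 1.0, 1.0, 1.0, 1.0]

# Equal weighting: every day counts the same.
equal_weights = [1.0] * len(daily_peak)

# Decaying weighting: a day's weight halves every HALF_LIFE_DAYS.
HALF_LIFE_DAYS = 2.0  # assumed decay rate, for illustration only
ages = [len(daily_peak) - 1 - i for i in range(len(daily_peak))]
decayed_weights = [0.5 ** (age / HALF_LIFE_DAYS) for age in ages]

equal_p90 = weighted_percentile(daily_peak, equal_weights, 90)
decayed_p90 = weighted_percentile(daily_peak, decayed_weights, 90)
print(equal_p90, decayed_p90)
```

With these numbers, the equally weighted 90th percentile is 4.0 vCPU, while the decayed one drops to 1.0 vCPU because the most recent weekdays dominate the weights.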
Note that this histogram is calculated for each container in the Pod. If a Pod has multiple replicas, the data from each replica is treated as a separate sample for that container. In other words, the utilization data from all of the replicas is aggregated for the same container. As a result, the utilization data for all of the replicas is normalized, and a suitable resource is recommended based on the percentile chosen.
Furthermore, since the recommended resource is per replica, the current number of replicas can be left unchanged when applying the Recommendation.
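As a rough illustration of pooling replicas per container, the Python sketch below merges samples from every replica and picks a percentile for each container. The replica names, container names, sample values, and the nearest-rank percentile method are all assumptions for this example; the actual calculation works on histograms rather than raw samples.

```python
import math
from collections import defaultdict


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample covering pct% of the data."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def recommend(samples_by_replica, pct):
    """Pool samples from all replicas per container, then take a percentile."""
    pooled = defaultdict(list)
    for containers in samples_by_replica.values():
        for container, samples in containers.items():
            pooled[container].extend(samples)
    return {name: percentile(samples, pct) for name, samples in pooled.items()}


# Hypothetical CPU samples (vCPU) for two replicas of the same workload.
usage = {
    "web-0": {"app": [0.4, 0.5, 0.6, 0.5], "sidecar": [0.1, 0.1, 0.1, 0.1]},
    "web-1": {"app": [1.2, 1.4, 1.3, 1.5], "sidecar": [0.1, 0.2, 0.1, 0.1]},
}

print(recommend(usage, 90))
print(recommend(usage, 50))
```

Here the busier replica `web-1` pulls the 90th percentile for `app` up to 1.5 vCPU, and that single per-replica value is what each replica would be sized to. Lowering the percentile to 50 gives 0.6 vCPU for the same container.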
How are Recommendations calculated when the resource requests and limits are not configured?
The recommended resource is based purely on the utilization metrics pulled from the Metrics Server. Therefore, it makes no difference whether or not resource requests and limits are configured.
Do Recommendations take burst of CPU into consideration?
Yes, at one-minute granularity. We collect metrics data every minute, and the value reported by the Metrics Server for any container is the average over the last one-minute window.
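To see what a one-minute average means for a short burst, consider this small Python sketch. The per-second readings are hypothetical and are only there to show the arithmetic; the Metrics Server does not expose per-second data in this form.

```python
def one_minute_average(per_second_cpu):
    """Average CPU usage over the readings that make up one window."""
    return sum(per_second_cpu) / len(per_second_cpu)


# Hypothetical minute: 50 quiet seconds at 0.5 vCPU, then a 10-second burst
# at 3.0 vCPU.
minute = [0.5] * 50 + [3.0] * 10

average = one_minute_average(minute)
print(round(average, 2), max(minute))
```

The burst raises the recorded sample for that minute to roughly 0.92 vCPU, so it does show up in the histogram, but as part of the one-minute average and not as the 3.0 vCPU instantaneous peak.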
What if there are multiple containers inside of the Pod?
In that case, you get a separate Recommendation for each container. Recommendations are made at the container level, not at the Pod level.
Recommendations often suggest different limits than requests, which impacts the QoS.
You don't have to apply the recommended requests and limits exactly as given. You can use the slider on the histogram to choose a suitable resource based on the percentile values, and the recommended resource changes accordingly. For example, to get Guaranteed QoS, choose suitable CPU and memory requests based on the selected percentile, and use the same values as the CPU and memory limits, since Guaranteed QoS requires requests and limits to be equal.
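If you want to check which QoS class a set of requests and limits would produce before applying them, the Kubernetes rules can be sketched in Python as follows. `qos_class` is a hypothetical helper for this post, and it compares quantities as plain strings, so it assumes equal values are written identically (for example `"1"` and `"1000m"` would not match).

```python
def qos_class(containers):
    """Classify a Pod from its containers' `resources` mappings.

    Each item looks like {"requests": {...}, "limits": {...}}.
    """
    any_set = False
    guaranteed = True
    for resources in containers:
        requests = resources.get("requests", {})
        limits = resources.get("limits", {})
        if requests or limits:
            any_set = True
        for name in ("cpu", "memory"):
            limit = limits.get(name)
            # Kubernetes defaults a missing request to the limit, if one is set.
            request = requests.get(name, limit)
            if limit is None or request != limit:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


# Requests and a memory limit only, as in the manifest earlier in this post.
burstable = [{"requests": {"cpu": "2", "memory": "6Gi"},
              "limits": {"memory": "10Gi"}}]

# Requests equal to limits for both CPU and memory.
guaranteed = [{"requests": {"cpu": "2", "memory": "6Gi"},
               "limits": {"cpu": "2", "memory": "6Gi"}}]

print(qos_class(burstable), qos_class(guaranteed), qos_class([{}]))
```

The first Pod is Burstable, the second is Guaranteed, and a Pod with no requests or limits at all is BestEffort.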
The Workload Recommendation simply gives you visibility into your container resource utilization over a period of time. With its help, you can choose the resources that suit your workload.
In this post, we learned how to configure the right resource requests and limits for Kubernetes workloads. This frees up capacity inside the cluster, allowing us to run more workloads on the same cluster or to run a smaller cluster, which ultimately saves us cloud costs.
Now that we have rightsized the workloads, an upcoming post will cover how to rightsize a node pool by choosing the most suitable node instance type and number of nodes using our Cloud Cost Node Pool Recommendations. You can refer to our documentation for Node Pool Recommendations today.