Kubernetes Autoscaling makes it possible to scale the cluster automatically by adding more nodes, ensuring that the cluster always provides enough capacity to run an application.
Kubernetes is ruling the container orchestration world. It’s a truly portable system that has powerful capabilities for deploying, scaling and managing containerized applications.
It is an interesting space to look into, and it becomes even more interesting when you add autoscaling into the mix.
Why Custom Metrics instead of Traditional Metrics?
Autoscaling is natively supported in Kubernetes. By default, you have the option to automatically scale the number of Kubernetes pods based on observed CPU utilization (a traditional metric).
However, in many scenarios, you want to scale your application based on other monitored metrics, such as the number of incoming requests or memory consumption. In Kubernetes 1.7, you have the capability to do that by leveraging the Prometheus and Kubernetes aggregator layers.
Custom metrics give you more control and visibility into the parameters on which a service needs to be autoscaled.
Flow diagram for Autoscaling
Prometheus is widely used to monitor all the components of a Kubernetes cluster including the control plane, the worker nodes, and the applications running on the cluster.
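For example, a common convention (assuming Prometheus is configured with the usual Kubernetes pod-discovery scrape job — this depends on your Prometheus configuration, and the port and path below are illustrative) is to annotate application pods so Prometheus scrapes them:

```yaml
# Pod template annotations that the typical kubernetes-pods scrape
# job uses for discovery; values here are examples, not defaults.
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
    prometheus.io/path: "/metrics"
```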
A sidecar for the Prometheus server can send metrics to Stackdriver. Once Prometheus scrapes the metrics from the various pods, this sidecar forwards the metric data to Stackdriver.
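As a sketch, the sidecar is added as an extra container in the Prometheus server's pod spec, sharing the data volume so it can read the write-ahead log. The image tag, project ID, and flags below are assumptions and should be checked against the sidecar's documentation:

```yaml
# Extra container alongside the Prometheus server container
# (image version and flag values are illustrative assumptions).
- name: stackdriver-sidecar
  image: gcr.io/stackdriver-prometheus/stackdriver-prometheus-sidecar:0.4.0
  args:
  - --stackdriver.project-id=my-gcp-project   # hypothetical project ID
  - --prometheus.wal-directory=/data/wal      # shared with the Prometheus container
  volumeMounts:
  - name: prometheus-data                     # same volume Prometheus writes to
    mountPath: /data
```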
The metrics server uses the Kubernetes API to expose the metrics, so they are available in the same manner as the Kubernetes API itself. The metrics server aims to provide only the core metrics, such as the memory and CPU of pods and nodes; for all other metrics, you need the Custom Metrics API.
Custom Metrics API
The Custom Metrics Server exposes its endpoint through the Kubernetes API, but before that, the metrics need to be converted to a suitable format.
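The endpoint is wired into the Kubernetes API through the aggregation layer by registering an APIService object for the `custom.metrics.k8s.io` group. A minimal sketch follows; the service name and namespace are assumptions that depend on how the custom metrics adapter was deployed:

```yaml
# Registers the custom metrics adapter with the API aggregation layer,
# so requests to /apis/custom.metrics.k8s.io/v1beta1 are proxied to it.
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  service:
    name: custom-metrics-apiserver   # hypothetical adapter service name
    namespace: monitoring            # hypothetical namespace
  group: custom.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
```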
The following manifest file describes a HorizontalPodAutoscaler object that scales a Deployment based on the target average value for the metric.
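A minimal sketch of such a manifest is shown below, using the `autoscaling/v2beta1` API available in that era of Kubernetes. The Deployment name, metric name, and target value are hypothetical placeholders:

```yaml
# HPA that scales a Deployment on a per-pod custom metric
# (names and values here are illustrative assumptions).
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                           # hypothetical Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metricName: http_requests_per_second  # hypothetical custom metric
      targetAverageValue: 100
```

The HPA controller queries the Custom Metrics API for the average of this metric across the target pods and adjusts the replica count to keep it near the target value.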
Once you have all these components set up, running describe on the HPA should result in the following output.
Once the HPA comes into play, the deployment starts scaling up and down automatically based on the configuration provided, as shown in the image. HPA events are recorded whenever the pods scale up or down.
Harness & Kubernetes with HPA
Harness provides an easy way of configuring HPA for services deployed on Kubernetes, with the configuration entirely in YAML. The user simply provides the HPA details while configuring a Service in Harness, and that's all.
After the deployment is done, the Service will be up and running with HPA enabled. Beyond this ease of configuration, Harness supports all types of HPA:
- Multiple metrics based HPA
- Default metrics based HPA
- Custom/External metrics based HPA
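To illustrate how these types differ, the `metrics` section of an HPA spec can mix them freely; the metric names and targets below are hypothetical examples, and the External type requires a later `autoscaling/v2beta1` release than the Resource and Pods types:

```yaml
# Mixed metrics in one HPA spec: the controller computes a desired
# replica count for each metric and uses the largest of them.
metrics:
- type: Resource              # default/traditional metric
  resource:
    name: cpu
    targetAverageUtilization: 60
- type: Pods                  # custom per-pod metric
  pods:
    metricName: http_requests_per_second   # hypothetical metric
    targetAverageValue: 100
- type: External              # metric from outside the cluster
  external:
    metricName: queue_messages_ready       # hypothetical metric
    targetAverageValue: 30
```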
In this blog, I covered how to use Prometheus, the Custom Metrics API server, and HPA by custom metrics. For HPA, the first thing you must truly understand is which part of the application causes the high-load situation, so you can configure the proper scale policy to allow the application to survive during peak times.
In Harness, we have implemented HPA in various places to ensure the smooth running of different microservices. Harness makes things easy and efficient for users by providing autoscaling behind the scenes and tying the metrics into the system as part of the continuous delivery process, taking away all the pain of scaling and monitoring the system.