In the public cloud world, nothing is free. Logging is sometimes viewed as an afterthought, but it is an important part of being able to audit a system and gain insight into it. With the move to the public cloud, logs can get expensive if left unchecked; one reason is that we are used to a certain level of log retention and verbosity on our own systems and expect the same in the public cloud.
At Harness, we use Google Cloud Platform (GCP) to host our production environment, and we are very proactive about optimizing our cloud usage and managing the cloud bill. As our services scaled, we recently noticed that our spend on logging had become a significant contributor to our total GCP bill. Before we took action, logging was costing us about $800 per day.
Understanding Our Logging Cost
With public cloud services, your final cost is usually aggregated across multiple dimensions of service, such as storage and data transfer. With GCP, Cloud Logging is priced by usage: it costs $0.50 per GiB of logs ingested. We started looking into the major sources of log volume. The log ingestion dashboard pointed to the Google Kubernetes Engine (GKE) container and Global (our Delegate ingestion) as the top two sources. The third was Cloud HTTP Load Balancer.
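To make the pricing concrete, here is a minimal sketch of the arithmetic, using only the $0.50 per GiB price and the $800 per day figure quoted in this post. The function names are ours for illustration, and the calculation ignores any free allotment or other pricing dimensions, so treat the result as a rough estimate rather than a bill.

```python
# Ingestion price quoted in this post; check current GCP pricing before relying on it.
PRICE_PER_GIB_USD = 0.50


def ingestion_cost_usd(gib_ingested: float, price_per_gib: float = PRICE_PER_GIB_USD) -> float:
    """Estimate the cost of ingesting a given log volume (no free allotment applied)."""
    return gib_ingested * price_per_gib


def implied_volume_gib(cost_usd: float, price_per_gib: float = PRICE_PER_GIB_USD) -> float:
    """Work backwards from a cost to the log volume that would produce it."""
    return cost_usd / price_per_gib


# About $800 per day at $0.50 per GiB implies roughly this many GiB ingested per day.
print(implied_volume_gib(800.0))  # 1600.0
```

Under these assumptions, our daily spend corresponded to roughly 1,600 GiB of logs ingested per day.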
This made sense, since most Harness workloads run in GKE clusters, which is the first category. The Global category covers the logs we ingest from our delegates, the Harness component that runs in customer environments.
To analyze further, we wanted to understand which workloads within the clusters were the major contributors to this cost. To get this kind of aggregation, we configured a Log Sink to BigQuery. Once we had a few days’ log data in BigQuery, we were able to run rich queries and further break down the log data by source.
The screenshot below shows the table size in GiB for the BigQuery dataset we exported our logs to. Each table corresponds to a particular Logger in Cloud Logging. From this, we could see that the Manager and Delegate loggers were the most expensive.
To drill down further into the logs within these tables, we had to understand which components in our code were writing to the logs. We structure our log data as a JSON payload for each LogEntry, which gives us rich querying capabilities on the logs. In these payloads, we put information about the source code class, the source node, and customer account-specific details. Using this, we were able to quickly attribute a cost to the noisiest areas of code. One example of the cost of logs from particular classes:
With the cost data in hand, we could start being strategic about optimization.
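The per-class attribution described above can be approximated offline with a few lines of Python. This is only a sketch under stated assumptions, not the queries we actually ran in BigQuery: it assumes each exported entry is a dict with a `jsonPayload` object, that the source class lives in a hypothetical `logger` field inside that payload, and that the serialized payload size is a reasonable stand-in for ingested bytes.

```python
import json
from collections import defaultdict

PRICE_PER_GIB_USD = 0.50  # ingestion price quoted in this post
BYTES_PER_GIB = 2 ** 30


def cost_by_logger(entries, field="logger"):
    """Return (source class, estimated USD) pairs, most expensive first.

    `field` is a hypothetical payload key naming the source class; adjust it
    to whatever your own structured payloads use.
    """
    bytes_by_class = defaultdict(int)
    for entry in entries:
        payload = entry.get("jsonPayload", {})
        source = payload.get(field, "unknown")
        # Approximation: size of the serialized payload, not the billed entry size.
        bytes_by_class[source] += len(json.dumps(payload).encode("utf-8"))

    costs = {
        name: size / BYTES_PER_GIB * PRICE_PER_GIB_USD
        for name, size in bytes_by_class.items()
    }
    return sorted(costs.items(), key=lambda item: item[1], reverse=True)


# Made-up sample entries; the class names are illustrative only.
sample = [
    {"jsonPayload": {"logger": "ExampleNoisyClass", "message": "x" * 1000}},
    {"jsonPayload": {"logger": "ExampleQuietClass", "message": "ok"}},
    {"jsonPayload": {"logger": "ExampleNoisyClass", "message": "retrying"}},
]
for name, usd in cost_by_logger(sample):
    print(f"{name}: ${usd:.10f}")
```

Sorting by estimated cost puts the noisiest classes at the top, which is the view that made it easy to decide where to cut first.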
Optimizing Our Logging Cost
We took a few actions to optimize our Cloud Logging spend. The following were the most useful:
- Configured Log Exclusions. We did not know about this gem of a Cloud Logging feature. With it, we have stopped ingesting a major portion of our noisy logs.
- Configured a Log Sink to GCS/BigQuery for the logs we need to keep for audit purposes. This way, we don’t have to pay for them in Cloud Logging.
- Reduced the logs in the particularly noisy components. Some of the source code was producing a disproportionately high log volume. While this was not a problem at a lower scale, the extra output is not useful and, in fact, diminishes our ability to debug effectively as the service has grown. We followed up with individual developers to drop the unnecessary log lines.
The immediate results of this effort are evident. Within a few days, we cut our Cloud Logging spend in half, which amounts to annual savings of more than $140,000.
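The savings figure follows directly from the numbers in this post. As a quick sanity check, assuming the roughly $800 per day starting point and a 50% reduction sustained over a 365-day year:

```python
DAILY_COST_BEFORE_USD = 800.0  # approximate daily logging spend before optimization
REDUCTION_FRACTION = 0.5       # spend was cut to half


def annual_savings_usd(daily_cost: float, reduction_fraction: float, days_per_year: int = 365) -> float:
    """Project yearly savings from a sustained fractional cut in daily spend."""
    return daily_cost * reduction_fraction * days_per_year


print(annual_savings_usd(DAILY_COST_BEFORE_USD, REDUCTION_FRACTION))  # 146000.0
```

That works out to about $146,000 per year, consistent with the "more than $140,000" above.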
We hope our experience helps you optimize your own Cloud Logging spend. A similar approach can be taken with other public cloud vendors. Saving costs is a goal we have for our Continuous Efficiency product, and we live those values every day at Harness. If you have not seen Continuous Efficiency in action, feel free to take a look.
Puneet & Brett