iHerb is a global eCommerce leader selling natural healthcare products. iHerb has been in business for 22 years, and in the past 5 years iHerb has experienced exponential growth. In 2013 iHerb had 2 physical data centers with roughly 80 virtual machines (VMs). Now, five years later, iHerb is operating a cloud eCommerce platform spanning four public cloud providers and multiple on-premise data centers, serving upwards of 1.4 billion page views monthly.
We are now running 30+ Kubernetes clusters on our platform in addition to 800+ cloud VMs across 19 cloud data centers in 13 different regions.
So how did we achieve this massive cloud infrastructure in such a short period of time, while at the same time experiencing explosive growth?
In 2013, we began to see a significant rise in our website traffic. We were adding additional products and features to our website, and our hard work was paying off. We were attracting significant numbers of new customers, and retaining their business. One of our major limiting factors pre-cloud was the fact that we could not easily scale our infrastructure to meet our traffic demand, nor could we achieve high availability with the resources we had. Our data centers had a fixed capacity and it took weeks, sometimes months, to get additional capacity added at a steep cost. We didn’t need all that capacity 24×7, so we were constantly wasting money. iHerb leadership understood that we could never quickly scale to meet the demands of our ever-increasing website traffic if we stayed in our physical data centers, so the decision was made to move to the cloud.
During this time we were primarily a Windows shop with all of our code in .NET. We had a CD tool for our Windows deployments which worked well, but with our highly customized Windows machines it was tedious to create new servers and not easy to scale. We needed a solution that would allow us to scale up and down with traffic quickly and easily. We knew that continuing to run our .NET applications on Windows would never scale well and would always be slower and more expensive than using containers. We knew containers were the future, so we began investigating container management systems. It was at this time, in early 2016, that we began looking at a new technology called Kubernetes.
Adventures in Kubernetes
The first application we moved to our new Kubernetes clusters was our catalog. As customers browse through iherb.com they may move from the homepage to a category page and then to a product detail page. We call all of this content our catalog. It was no small task to rewrite all this code from .NET to .NET Core 2.0. Our catalog team undertook the exceptionally difficult job of making each page or section of our extensive catalog into its own Kubernetes application. Today we have over a dozen different Kubernetes applications that make up our catalog. We broke the catalog down into these individual pieces to allow the catalog pages to scale independently of one another. To accomplish this we use a separate Horizontal Pod Autoscaler (HPA) for each Kubernetes application.
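To make the independent scaling concrete, here is a minimal Python sketch of the replica calculation described in the Kubernetes HPA documentation: desired replicas = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance (0.1 by default). The application names, CPU figures, and replica bounds below are hypothetical and are not iHerb's real settings.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas, tolerance=0.1):
    """Replica count from the documented HPA formula, clamped to the bounds."""
    ratio = current_metric / target_metric
    # Within the tolerance band the autoscaler leaves the count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical catalog applications, each with its own target and bounds.
catalog_apps = {
    "homepage":       {"current": 4,  "cpu": 90, "target": 60, "min": 2, "max": 20},
    "category":       {"current": 6,  "cpu": 62, "target": 60, "min": 2, "max": 20},
    "product-detail": {"current": 10, "cpu": 30, "target": 60, "min": 3, "max": 40},
}


def plan(apps):
    """Compute the desired replica count for every application independently."""
    return {
        name: desired_replicas(a["current"], a["cpu"], a["target"], a["min"], a["max"])
        for name, a in apps.items()
    }


if __name__ == "__main__":
    for name, replicas in plan(catalog_apps).items():
        print(f"{name}: {catalog_apps[name]['current']} -> {replicas}")
```

With these made-up numbers the homepage scales up from 4 to 6 pods, the category page stays at 6, and the product detail page scales down from 10 to 5, each decided without reference to the others.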
We were excited to finally have our first large production applications ready for deployment. But now that we had our applications, how could we deploy a dozen of them, each with its own set of secrets, configmaps, ingresses, HPAs, and services, to 12 different clusters around the world? I would love to tell you that from the start we had an amazing CD tool that automated all of this for us and provided access controls, auditing, and easy pipelining to facilitate the management of all these moving pieces. We did not. As I’m sure you are aware, the pressures of time-to-market along with limited resources usually mean a bumpy road and a more manual process than would be optimal or preferred.
CD in Kubernetes
We were very fortunate to have a talented lead on our Catalog team who was able to jury-rig our legacy Windows deployment tool to handle some parts of the CD deployments to our multiple Kubernetes catalog clusters. Unfortunately, this complicated process could not be extended to other teams’ applications, so it fell to DevOps to manually deploy other applications into these Kubernetes clusters. For a while we used kubectl to get all these newly ported applications up and running in their appropriate Kubernetes clusters. We knew that this manual deployment method would never scale, so we immediately began researching CD tools. We wanted a tool that would provide fully developer-managed deployments as fast as possible with the smallest amount of training. We looked at quite a few, and in the end we chose Harness. Harness had just gone GA while we were in the middle of our CD tooling search. After a few remote presentations, Harness came onsite and gave an excellent presentation and training session for our Leads and Directors. We were hooked. Putting our applications into Harness was fast and easy, and the result was manageable and auditable.
In our first week using Harness we deployed 4 applications across 6 data centers and 10 Kubernetes clusters. Today we have more than 50 applications in Harness, comprising over 80 different services, being deployed across 30 Kubernetes clusters on 4 different cloud providers as well as on premises, and there are still more to come.
Achieving Structure and Standards with Harness
With the introduction of Harness to our CD process the DevOps team was now able to ensure specific structure and standards in both the way the Kubernetes applications were deployed and the actual Kubernetes applications themselves. We have standardized our deployment process by using the Harness implementation of the Canary deployment. Since most of our eCommerce applications are customer facing, we must have 100% uptime even during deployments. Using the Harness Canary deployments across our applications allows us to deploy to production with no risk of downtime.
Using Harness for our CD makes it easy for the DevOps team to audit each new Kubernetes application for our specific set of requirements before it goes to production. Within just a few minutes we can quickly verify that each new application adheres to our standards:
- Appropriate requests and limits for both CPU and memory
- Liveness and readiness probes configured
- All naming conventions followed
- Sensitive data has been correctly secured in secrets
- Canary deployment used
- Appropriate notifications configured for success and failure
- Harness automatic rollback strategy configured, with agreement on how and when to use it
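Some of the checks in this list can be expressed mechanically. The following is a minimal Python sketch, not the audit we run in Harness, that inspects a Deployment manifest (already parsed into a dictionary) for resource requests and limits, liveness and readiness probes, and sensitive-looking environment variables that are not sourced from a Secret. The Kubernetes field names are real; the `SENSITIVE_WORDS` heuristic, the function name, and the example manifest are hypothetical.

```python
# Hypothetical heuristic: env var names containing these words count as sensitive.
SENSITIVE_WORDS = ("PASSWORD", "SECRET", "TOKEN", "KEY")


def audit_deployment(manifest):
    """Return a list of findings; an empty list means every check passed."""
    findings = []
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")

        # CPU and memory must each have a request and a limit.
        resources = container.get("resources", {})
        for section in ("requests", "limits"):
            for resource in ("cpu", "memory"):
                if resource not in resources.get(section, {}):
                    findings.append(f"{name}: missing {section}.{resource}")

        # Both probes must be configured.
        for probe in ("livenessProbe", "readinessProbe"):
            if probe not in container:
                findings.append(f"{name}: missing {probe}")

        # Sensitive-looking values should be read from a Secret.
        for env in container.get("env", []):
            env_name = env.get("name", "")
            looks_sensitive = any(word in env_name.upper() for word in SENSITIVE_WORDS)
            from_secret = "secretKeyRef" in env.get("valueFrom", {})
            if looks_sensitive and not from_secret:
                findings.append(f"{name}: env {env_name} should use secretKeyRef")
    return findings


# Hypothetical manifest with three deliberate problems.
example = {
    "spec": {"template": {"spec": {"containers": [{
        "name": "catalog-home",
        "resources": {
            "requests": {"cpu": "250m", "memory": "256Mi"},
            "limits": {"memory": "512Mi"},
        },
        "livenessProbe": {"httpGet": {"path": "/healthz", "port": 8080}},
        "env": [{"name": "DB_PASSWORD", "value": "plain-text-value"}],
    }]}}}
}

if __name__ == "__main__":
    for finding in audit_deployment(example):
        print(finding)
```

For the example manifest the sketch reports the missing CPU limit, the missing readiness probe, and the password that is set inline rather than read from a Secret. Naming conventions, notification settings, and the deployment strategy live outside the manifest and still need a human or tool-specific review.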
Onboarding new development teams
When we began using Harness, the DevOps team would build each new Kubernetes application into Harness, and then teach the software development teams how to deploy their applications using Harness. The DevOps team then began training some of the software development teams to build their new applications into Harness themselves. With over a dozen software development teams at iHerb, this quickly became too time-consuming for our DevOps team. We brought in Harness professional services, and in just 2 days Harness had trained all of our software development teams to build their applications into Harness and to deploy those applications with Harness. We were finally free... almost.
The final piece to the puzzle
We had evaluated almost half a dozen CD tools before deciding on Harness. In addition to the ease of use and detailed visibility into all deployments, Harness has a unique feature built into its CD tool that no other vendor offered: Continuous Verification. Continuous Verification essentially applies machine learning to our logs and application performance data. It compares and analyzes logs from the new pods in the various phases of a canary deployment against logs from the pods running the previous deployment. If Continuous Verification detects that the new deployment is generating more errors, or new errors, compared with the existing deployment, Harness can be configured to automatically roll back the new deployment, or to pause it until someone can look at the results and choose an action.
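The idea of comparing canary logs against the previous deployment can be illustrated with a deliberately simple Python sketch. This is not Harness's algorithm, which uses machine learning; it only shows the shape of the decision: flag the canary if it produces error messages the baseline never produced, or if its error rate rises by more than a threshold, and then apply whichever action was configured. The log record layout, the `max_rate_increase` threshold, and the function names are all assumptions made for illustration.

```python
from collections import Counter


def error_counts(logs):
    """Count ERROR messages by message text."""
    return Counter(entry["message"] for entry in logs if entry["level"] == "ERROR")


def verify_canary(baseline_logs, canary_logs, on_failure="rollback",
                  max_rate_increase=0.05):
    """Return "continue", or the configured action ("rollback" or "pause")."""
    base_errors = error_counts(baseline_logs)
    canary_errors = error_counts(canary_logs)

    # Error rate = ERROR entries / all entries (guard against empty input).
    base_rate = sum(base_errors.values()) / max(len(baseline_logs), 1)
    canary_rate = sum(canary_errors.values()) / max(len(canary_logs), 1)

    new_errors = set(canary_errors) - set(base_errors)   # never seen before
    more_errors = canary_rate - base_rate > max_rate_increase

    return on_failure if (new_errors or more_errors) else "continue"


def make_logs(total, errors):
    """Hypothetical helper: build `total` entries, some of them ERRORs.

    `errors` maps an error message to how many times it occurs.
    """
    logs = [{"level": "ERROR", "message": msg}
            for msg, count in errors.items() for _ in range(count)]
    logs += [{"level": "INFO", "message": "request served"}] * (total - len(logs))
    return logs


if __name__ == "__main__":
    baseline = make_logs(100, {"upstream timeout": 2})
    healthy = make_logs(20, {"upstream timeout": 1})
    broken = make_logs(20, {"null reference in cart": 1})
    print(verify_canary(baseline, healthy))                    # similar error profile
    print(verify_canary(baseline, broken, on_failure="pause")) # previously unseen error
```

Running the check after each canary phase means a bad release is caught while it serves only a small share of traffic.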
We have configured Harness Continuous Verification to analyze the logs from our Kubernetes applications using our ELK logging stack as well as Dynatrace. By integrating Harness Continuous Verification with Dynatrace in our Kubernetes clusters, we are able to see whether a new deployment has increased latency or reduced throughput, either within our application or in our third-party integrations. Using Continuous Verification in tandem with canary deployments, we are able to quickly see if we have an issue in production without affecting a significant number of customers.
By implementing Harness throughout our company and by leveraging Harness professional services, DevOps has been able to get out of the business of building and managing deployments to Kubernetes. We have also gained much greater visibility into which applications have been deployed into each Kubernetes cluster, by whom, when, and with what result. Our deployments have become more manageable and predictable, taking less than 20 minutes each, and are much safer with Harness Continuous Verification.
The DevOps team at iHerb is no longer babysitting deployments, and we are dealing with fewer production issues because the software development teams will now catch these issues in the first phase of their canary deployments using Continuous Verification. With the number of applications we have in Harness and the number of times our teams deploy daily, the amount of time saved for the DevOps team has been immeasurable. Our software development teams are able to get their features to our customers faster and more reliably than ever before. Using Harness we have been able to achieve frictionless and safe Continuous Deployment.
Team Lead DevOps Engineering at iHerb Inc.