This post was written in collaboration with Srinivas Bandi, Dinesh Garg, and Shivakumar Ningappa.
Harness, like many other companies, used Jenkins for the CI portion of our software delivery process. We had over 300 jobs generating an average load of approximately 4,000 builds per day.
Over time, we were impacted by the common pain points and struggles that other Jenkins users face:
- Jenkins Master Management and Resource Issues: The build team spent a significant amount of time troubleshooting master management issues. There were also many CPU and memory spikes on Jenkins nodes.
- Outages: The build team suffered through weekly outages, which impacted developer productivity and bandwidth.
- Usability: Jenkins offers no ‘single pane’ pipeline view, and it’s cumbersome to navigate to older builds. The lack of an out-of-the-box dashboard is also a serious shortcoming.
- Not Container-Native: Jenkins is over two decades old and was primarily designed for physical build machines; support for containers and K8s is an afterthought.
Building a Modern CI Solution
The primary motivation for Harness building its own CI solution was to create a modern tool that would improve developer build and test time. Harness had realized that the tools on the market at the time, such as Jenkins, were out of date. Jenkins doesn’t offer a Docker-based architecture, cloud and SCM agnosticism, a graphical pipeline view, the ability to define pipeline steps, or a SaaS option. Need I say more? With the pain points listed above in mind, as well as a focus on developer experience, Harness set out to create a better CI solution. The end result was our newest product: Harness CI.
We were excited to test Harness CI on our own applications to see how well it would work for an enterprise-level software application. With 327 Jenkins jobs, we knew that the migration was going to be massive. To start, we conducted a gap analysis to identify and address key functional gaps in CI. We found gaps in JUnit report integration, issue comment trigger support, and abort on re-trigger. We quickly built out these capabilities.
Our Jenkins Migration Strategy
From the gap analysis, we also identified which jobs to migrate in order to instill confidence and prove the success of our product. We decided to focus on our primary backend repo. This repository was the most complex of our Jenkins pipelines and exercised 90% of all Jenkins features. Narrowing down our focus, we were able to reduce our migration efforts to approximately 25 jobs and knew that if we were successful in migrating these, the rest would be very straightforward.
Within this repository, we started with the Release Build Pipeline. This pipeline generates the release artifacts, which are deployed to QA for testing and eventually to prod. Since these artifacts were critical to Harness operations, we made sure to keep Jenkins as a backup. We started the migration with this pipeline because it was critical, but also because it was run in a controlled fashion by the QE team. If we identified any issues with Harness CI, we could just trigger the same pipeline on Jenkins to generate the build.
As part of the migration process, we analysed the job and broke it into small functional components. For each component, we formulated the most optimised way to achieve the same result in Harness CI. For example, in Jenkins we were using pipeline chaining to run parallel isolated steps; in Harness CI we used parallel steps to achieve the same thing, which gave us a much better view of, and control over, pipeline executions.
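To illustrate the difference between chaining separate jobs and running parallel steps inside one pipeline, here is a minimal Python sketch. It is not Harness CI or Jenkins configuration; the `run_parallel` helper and the step names are hypothetical and only model the idea that independent steps run concurrently and report back into a single pipeline result.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(steps):
    """Run independent steps concurrently and collect each result by step name."""
    with ThreadPoolExecutor(max_workers=len(steps)) as pool:
        # Submit every step first so that they all run at the same time.
        futures = {name: pool.submit(fn) for name, fn in steps.items()}
        # Gather the results in one place: a single view of the whole stage.
        return {name: future.result() for name, future in futures.items()}


# Hypothetical isolated steps; real steps would invoke build tools.
steps = {
    "compile": lambda: "ok",
    "lint": lambda: "ok",
    "unit-tests": lambda: "ok",
}

results = run_parallel(steps)
print(results)
```

Because every step reports into one result map, a failure in any branch is visible in the same place as the successes, which is the kind of visibility that chained jobs make harder to get.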
Next, we decided to move the ‘PR checks pipelines.’ This functionality is critical, since PR checks are the backbone for all code being merged into master – and therefore impact each and every one of the 100+ backend engineers at Harness. We decided to take a phased approach. For two weeks, we ran these checks in shadow mode to build confidence and iron out any issues. Once those two weeks had passed and we had confirmed all was well, we moved the rest of the pipelines over to CI.
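A shadow-mode run can be thought of as comparing verdicts from two systems for the same PR while only one of them gates the merge. The sketch below is a hedged illustration in Python: it assumes Jenkins stayed the gating system during the two weeks (the post does not state this explicitly), and the `shadow_report` function and the PR identifiers are hypothetical.

```python
def shadow_report(primary, shadow):
    """Return (pr_id, primary_verdict, shadow_verdict) for every disagreement.

    `primary` holds verdicts from the gating system; `shadow` holds verdicts
    from the system under evaluation, which never blocks a merge.
    """
    mismatches = []
    for pr_id, verdict in sorted(primary.items()):
        # A PR with no shadow result is also worth investigating.
        shadow_verdict = shadow.get(pr_id, "missing")
        if shadow_verdict != verdict:
            mismatches.append((pr_id, verdict, shadow_verdict))
    return mismatches


# Hypothetical verdicts for three PRs.
jenkins = {"PR-1": "pass", "PR-2": "fail", "PR-3": "pass"}
harness_ci = {"PR-1": "pass", "PR-2": "fail", "PR-3": "fail"}

mismatches = shadow_report(jenkins, harness_ci)
print(mismatches)
```

Each mismatch is a lead to investigate before cutover: it may point to a flaky test, an environment difference, or a genuine bug in the new pipeline.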
Throughout the migration, we found and resolved a number of issues, including a few performance bottlenecks, race conditions causing stuck builds, and issues with pod cleanup. This exercise laid the groundwork and paved the way to deprecate Jenkins.
After we migrated our primary backend repo, we didn’t encounter any major surprises. We set a deadline of the first week of June for the complete migration, and each team worked to retire the remaining 300+ jobs in under 3 weeks.
Post Jenkins Migration Results
100k+ Builds on Harness CI
We have successfully run on Harness CI for the past few months. Harness CI has powered over 100k builds – including 1000+ release and feature builds and 90k+ PR checks.
Reduced Developer Frustration
One of the most common complaints in 1:1s across teams was the unreliability of Jenkins. We had several outages as we initially started scaling our engineering team, which impacted developer productivity across the board.
With Harness CI, developer frustration was reduced dramatically as the pain points listed above were resolved. Abhinav Singh, a backend engineer, shared his thoughts: “Running Jenkins in Harness was becoming a nightmare as we were not able to scale it and it went down very frequently. Even looking at its thread dump and figuring out the root cause was very tough as it was a few GBs. Waiting for things to get resolved I had to waste many hours and it created a good amount of frustration. From the moment we moved to Harness CI, most of the issues were resolved and we have had almost zero downtime and scaling issues. Also, to onboard any new developer with Jenkins, it was just as big of a task as looking at log files. Navigating around Jenkins was not very intuitive and someone had to handhold a developer when onboarding. Onboarding new developers to Harness CI is easy and takes a few hours.”
Improved Productivity for the Build Team
A considerable chunk of the build team’s bandwidth was allocated to maintaining Jenkins. With Jenkins out of the way, they are now able to focus their time on addressing developer productivity issues.
A New Feature Jenkins or Any Other CI Product Never Had
Test Intelligence (TI) is unique to Harness CI. Our CI module ships with intelligent test selection and execution capabilities for Java unit tests. Our Test Intelligence algorithm smartly identifies the tests to execute based on the code changes, thereby drastically reducing test cycle times. To put this into perspective, look below at the metrics that we pulled straight from our environment with TI.
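The general idea of change-based test selection can be sketched in a few lines of Python. This is not the Test Intelligence algorithm, whose internals the post does not describe; it assumes a precomputed map from each test to the source files it depends on, and the file names and the `select_tests` function are hypothetical.

```python
def select_tests(changed_files, test_dependencies):
    """Pick the tests that were changed or that depend on a changed file."""
    selected = set()
    for test, deps in test_dependencies.items():
        # Run a test if the test itself changed or any of its dependencies did.
        if test in changed_files or deps & changed_files:
            selected.add(test)
    return sorted(selected)


# Hypothetical dependency map: test file -> source files it exercises.
test_dependencies = {
    "UserServiceTest.java": {"UserService.java", "UserRepo.java"},
    "BillingTest.java": {"Billing.java"},
    "AuthTest.java": {"Auth.java", "UserRepo.java"},
}

changed_files = {"UserRepo.java"}
selected = select_tests(changed_files, test_dependencies)
print(selected)  # ['AuthTest.java', 'UserServiceTest.java']
```

In this toy case one changed file triggers two of the three tests, so the third is skipped. The time saved in practice depends on how precise the dependency map is and on how widely a typical change fans out.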
Test Intelligence Time Savings
To further test out our Test Intelligence functionality, we ran it against open-source projects to see how much faster the build process was. The data is shown below.
| Project Name | Execution Time without TI | Avg Execution Time with TI | % Time Saved using TI | % Test Selection Invoked | Avg Execution Time with Test Selection |
| --- | --- | --- | --- | --- | --- |
| Harness | 43 mins | 32 mins | 26% | 45% | 10 mins |
| Incubator Pinot | 476 mins | 185 mins | 61% | 66% | 55 mins |
| Hudi | 116 mins | 54 mins | 53% | 68% | 28 mins |
| RocketMQ | 9.6 mins | 5.8 mins | 40% | 47% | 2.7 mins |
| Spring Cloud Alibaba | 10.6 mins | 6.35 mins | 40% | 38% | 1.7 mins |
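The "% Time Saved using TI" column follows directly from the two execution-time columns. As a quick check, this short Python sketch recomputes it from the figures in the table; the `percent_saved` helper is just an illustrative name, and rounding to the nearest whole percent is an assumption about how the table was produced.

```python
def percent_saved(without_ti, with_ti):
    """Percentage reduction in execution time, rounded to a whole percent."""
    return round(100 * (without_ti - with_ti) / without_ti)


# (execution time without TI, average execution time with TI), in minutes.
rows = {
    "Harness": (43, 32),
    "Incubator Pinot": (476, 185),
    "Hudi": (116, 54),
    "RocketMQ": (9.6, 5.8),
    "Spring Cloud Alibaba": (10.6, 6.35),
}

saved = {name: percent_saved(before, after) for name, (before, after) in rows.items()}
for name, pct in saved.items():
    print(f"{name}: {pct}%")
```

The recomputed values agree with the percentages reported in the table.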
We were delighted to see the time savings from Test Intelligence in this data, which further demonstrates what differentiates our CI module.
We are currently seeing infrastructure cost savings of 25% due to Test Intelligence. (Details about this, as well as other findings, are coming soon!)
The migration to Harness CI was a huge win for us at Harness. We were able to migrate off of Jenkins and successfully run on our own Continuous Integration tool that we created, knowing the pain points of the current solutions on the market. We don’t recommend that everyone start building their own CI solution. It’s pretty difficult to do it right and we’ve already done all the hard work for you. Instead, you can try Harness CI for yourself today. Take the first step in ending developer frustration and request a demo of Harness CI.