Recently, we released a blog post on why we migrated from a homegrown feature flag solution to Harness Feature Flags. While that post covered many of the reasons we migrated, we want to share the technical journey, including the roadblocks we ran into and what the transition was like.
One of the main driving factors behind developing Harness Feature Flags was to solve problems for our internal engineering team. Our existing feature flag mechanism was very simplistic in its design and functionality.
For the remainder of this blog, I’ll refer to our homegrown solution as Local FeatureFlags and the updated mechanism will be Harness Feature Flags.
One of the goals we wanted to meet when we migrated over to Harness Feature Flags was to make feature flagging at Harness more powerful—but at the same time, maintain the same level of simplicity for the developers.
State of the World With Local FeatureFlags
It’s important to discuss the history of feature flags at Harness. I’m sure that this story resonates with many organizations.
Feature flags were developed as a side project by an engineer about three years ago, over a weekend, as a temporary service to meet some immediate requirements. Before long, it ballooned into a critical service used by almost all of our applications.
We also expanded the user base from just developers to Product Managers, Customer Success, and QA. We had to invest about three months for a software engineer to build a frontend service just to manage feature flags. Even with this investment, the tool was capable of only simple feature flag management.
Local FeatureFlags supported only simple boolean feature flags. Flags could be enabled or disabled globally or per customer. When a developer wanted to add a new feature flag, they would add an entry to a dedicated enum class. On application startup, the FeatureFlagService would sync all new flags to the local database.
Any updates to feature flags would then be done via an API to add customer-specific overrides or enable/disable the flag on a global level.
The FeatureFlagService would refresh its local internal cache every few minutes, and the new updated value of the feature flag would be reflected in the application.
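To make that design concrete, here is a minimal sketch of an enum-backed flag service of this kind. It is not our actual code: the flag names, `LocalFlagState`, and the method names are illustrative assumptions, an in-memory map stands in for the local database, and the periodic cache refresh is left out.

```java
import java.util.EnumMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical flag names; adding a flag means adding an enum entry.
enum FeatureName {
    NEW_DASHBOARD,
    FASTER_SYNC
}

// State of one boolean flag: a global switch plus per-customer overrides.
final class LocalFlagState {
    boolean enabledGlobally;
    final Set<String> enabledAccountIds = new HashSet<>();
}

// Simplified stand-in for the FeatureFlagService described above.
final class LocalFeatureFlagService {
    // Plays the role of the local database and its cache.
    private final Map<FeatureName, LocalFlagState> store = new EnumMap<>(FeatureName.class);

    // On startup, register every enum entry that is not stored yet (disabled by default).
    void syncOnStartup() {
        for (FeatureName name : FeatureName.values()) {
            store.computeIfAbsent(name, n -> new LocalFlagState());
        }
    }

    // Enable or disable a flag for everyone.
    void setGlobal(FeatureName name, boolean enabled) {
        store.computeIfAbsent(name, n -> new LocalFlagState()).enabledGlobally = enabled;
    }

    // Add a customer-specific override that enables the flag for one account.
    void enableForAccount(FeatureName name, String accountId) {
        store.computeIfAbsent(name, n -> new LocalFlagState()).enabledAccountIds.add(accountId);
    }

    // A flag is on if it is enabled globally or overridden for this account.
    boolean isEnabled(FeatureName name, String accountId) {
        LocalFlagState state = store.get(name);
        return state != null
            && (state.enabledGlobally || state.enabledAccountIds.contains(accountId));
    }
}
```

A call site then reads as `flags.isEnabled(FeatureName.NEW_DASHBOARD, accountId)`, which is the level of simplicity we wanted to keep after the migration.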
A few observations:
- It was simple. Feature flags were part of the codebase, and this was intuitive for developers. Adding a new feature flag was as simple as adding an entry to the enum class.
- Every environment needed to be managed independently. We could not manage feature flags for all environments together.
- This service had limited capabilities. We could not add more verbose rules to manage feature flags.
The Migration Plan for Our Technical Journey
Rolling out changes to feature flag management, though it seems simple, is still a very critical change. It’s almost analogous to changing an infrastructure layer in your entire platform. There’s always going to be resistance to such a change—and for good reasons:
- Why change something if it’s working just fine? The old adage of, “If it ain’t broke, don’t fix it!”
- We had to convince SREs/Ops that the change would not cause a disruption.
- We had to convince all developers to adopt the new mechanism.
To address these concerns, we decided to follow a three-step process. This phased approach allowed us to mitigate the risks associated with making this change throughout the platform.
Migrate Local FeatureFlags to Harness Feature Flags:
We created a project in Harness Feature Flags and prepared all the environments that we use for QA, Staging, and Production.
We mapped a mechanism for migrating Local FeatureFlags into Harness Feature Flags using the following steps.
First, we included a Java SDK for Admin APIs (currently in private beta) in our main application.
The Admin SDK allows us to manage feature flags on the Harness Feature Flags server. We mapped each Local FeatureFlags entry to an equivalent representation in Harness Feature Flags. We then wrote a simple migration service in our main application that would apply this mapping to every feature flag on startup.
This allowed us to migrate all existing feature flags into Harness.
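A migration service of this shape could be sketched as follows. Because the Admin SDK is in private beta, the sketch does not use its real interface: `FlagAdminClient`, `RemoteFlag`, and every method name here are hypothetical placeholders, and the mapping (global switch to default value, customer overrides to targets) is an assumption about what an "equivalent representation" might look like.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// A local flag: a global switch plus per-customer overrides.
final class LocalFlag {
    final String name;
    final boolean enabledGlobally;
    final Set<String> enabledAccountIds;

    LocalFlag(String name, boolean enabledGlobally, Set<String> enabledAccountIds) {
        this.name = name;
        this.enabledGlobally = enabledGlobally;
        this.enabledAccountIds = enabledAccountIds;
    }
}

// Hypothetical remote representation: a boolean flag with a default value
// and the targets (customer accounts) that are served "true".
final class RemoteFlag {
    final String identifier;
    final boolean defaultValue;
    final List<String> targetsServedTrue;

    RemoteFlag(String identifier, boolean defaultValue, List<String> targetsServedTrue) {
        this.identifier = identifier;
        this.defaultValue = defaultValue;
        this.targetsServedTrue = targetsServedTrue;
    }
}

// Hypothetical admin client; this is NOT the real Admin SDK interface.
interface FlagAdminClient {
    boolean exists(String identifier);
    void create(RemoteFlag flag);
}

final class FlagMigrationService {
    private final FlagAdminClient admin;

    FlagMigrationService(FlagAdminClient admin) {
        this.admin = admin;
    }

    // Runs on startup; skips flags that already exist so restarts are harmless.
    // Returns the number of flags created in this run.
    int migrate(List<LocalFlag> localFlags) {
        int created = 0;
        for (LocalFlag local : localFlags) {
            if (admin.exists(local.name)) {
                continue;
            }
            List<String> targets = new ArrayList<>(local.enabledAccountIds);
            Collections.sort(targets); // deterministic order for review and logs
            admin.create(new RemoteFlag(local.name, local.enabledGlobally, targets));
            created++;
        }
        return created;
    }
}

// In-memory client for trying the sketch without a server.
final class InMemoryAdminClient implements FlagAdminClient {
    final Map<String, RemoteFlag> flags = new HashMap<>();

    public boolean exists(String identifier) {
        return flags.containsKey(identifier);
    }

    public void create(RemoteFlag flag) {
        flags.put(flag.identifier, flag);
    }
}
```

Checking for existence before creating is what makes it safe to run the migration on every startup rather than as a one-off job.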
Running Harness Feature Flags in Audit Mode
After we successfully imported all feature flags into Harness, we were ready to start using this for our feature flag management; however, as with any new software, we had to prove to our engineering leadership that we were ready for prime time.
For this purpose, we decided to first test out Harness Feature Flags in audit mode.
We ran Harness Feature Flags side-by-side with the Local FeatureFlags system where Harness Feature Flags would only log the final value.
If there was a mismatch in the values between the Local and the Harness Feature Flags system, we would log that as an error.
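In code, audit mode amounts to a thin wrapper around both systems. The sketch below is illustrative only: `FlagEvaluator` and `AuditingFlagEvaluator` are hypothetical names, and swallowing exceptions from the audited system is an assumption about how one would keep it from affecting callers.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Logger;

// Common evaluation contract; both implementations are stand-ins for the real systems.
interface FlagEvaluator {
    boolean isEnabled(String flagName, String accountId);
}

final class AuditingFlagEvaluator implements FlagEvaluator {
    private static final Logger LOG = Logger.getLogger(AuditingFlagEvaluator.class.getName());

    private final FlagEvaluator local;   // source of truth during audit mode
    private final FlagEvaluator audited; // new system, evaluated only for comparison
    private final AtomicLong mismatches = new AtomicLong();

    AuditingFlagEvaluator(FlagEvaluator local, FlagEvaluator audited) {
        this.local = local;
        this.audited = audited;
    }

    @Override
    public boolean isEnabled(String flagName, String accountId) {
        boolean localValue = local.isEnabled(flagName, accountId);
        try {
            boolean auditedValue = audited.isEnabled(flagName, accountId);
            if (auditedValue != localValue) {
                mismatches.incrementAndGet();
                LOG.severe("Flag mismatch for " + flagName + " / " + accountId
                    + ": local=" + localValue + ", audited=" + auditedValue);
            }
        } catch (RuntimeException e) {
            // Assumption: a failure in the audited system must never reach the caller.
            LOG.warning("Audited evaluation failed for " + flagName + ": " + e);
        }
        return localValue; // the application still acts on the local value only
    }

    long mismatchCount() {
        return mismatches.get();
    }
}
```

Because the wrapper always returns the local value, the application behaves exactly as before while the logs accumulate evidence about the new system.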
This allowed us to validate the SDK for:
1) SDK correctness. Any mismatch would point to either a server-side issue or an SDK bug.
2) Performance. We were running >6 environments evaluating >200 feature flags and tracking around 4000 customer accounts. Running this system in audit mode gave us confidence that the SDK was not slowing our target application.
We ran the system in audit mode for more than 4 weeks. The data gave us the confidence to move to the next and final step. In the meantime, in preparation for the next steps, we added a configuration in our codebase to switch between using Harness Feature Flags and using the Local FeatureFlags.
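A switch like this can be as small as one enum and a delegating class. The following is a sketch under assumptions: `FlagBackend`, `FlagSource`, and `SwitchableFlagSource` are hypothetical names, and the configuration value is modeled as a plain string rather than any particular config system.

```java
// Which system answers flag evaluations.
enum FlagBackend {
    LOCAL,
    HARNESS;

    // Anything other than "harness" (case-insensitive) falls back to LOCAL,
    // so a missing or mistyped setting keeps the old behavior.
    static FlagBackend fromConfig(String value) {
        return "harness".equalsIgnoreCase(value) ? HARNESS : LOCAL;
    }
}

interface FlagSource {
    boolean isEnabled(String flagName, String accountId);
}

// Delegates every evaluation to whichever backend is currently selected.
final class SwitchableFlagSource implements FlagSource {
    private final FlagSource local;
    private final FlagSource harness;
    private volatile FlagBackend backend; // volatile so a switch is visible to all threads

    SwitchableFlagSource(FlagSource local, FlagSource harness, FlagBackend initial) {
        this.local = local;
        this.harness = harness;
        this.backend = initial;
    }

    void useBackend(FlagBackend newBackend) {
        this.backend = newBackend;
    }

    @Override
    public boolean isEnabled(String flagName, String accountId) {
        FlagSource active = (backend == FlagBackend.HARNESS) ? harness : local;
        return active.isEnabled(flagName, accountId);
    }
}
```

Defaulting to the local backend means the switch has to be turned on deliberately in each environment, which fits a rollout that goes through QA first.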
We started using Harness Feature Flags in QA. We turned it on for all QA environments and let the QA team manage the feature flags from our admin console. At this point, we had deprecated usage of Local FeatureFlags and we had completely switched over to Harness Feature Flags in all QA environments. We monitored the usage in QA for another week.
Automated Feature Flag Creation Mechanism
To maintain the same simplicity as the Local FeatureFlags mechanism, we added a small enhancement in our code to automatically sync new feature flags to our Harness Feature Flags server. This enhancement leveraged the already integrated admin SDK and the migration service written earlier to sync new flags to the Harness Feature Flags server.
This synchronization is enabled only in our QA environment, ensuring that feature flags are available and ready for management before they open up in production.
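As a rough illustration of the QA-only sync, the sketch below creates any flag that exists in code but not yet on the server, and does nothing outside QA. `RemoteFlagRegistry`, `NewFlagSync`, and the `Environment` values are hypothetical names chosen for this example, not the Admin SDK's interface.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

enum Environment { QA, STAGING, PRODUCTION }

// Hypothetical view of the flag server; not the real Admin SDK interface.
interface RemoteFlagRegistry {
    Set<String> listIdentifiers();
    void createDisabled(String identifier);
}

final class NewFlagSync {
    private final RemoteFlagRegistry registry;

    NewFlagSync(RemoteFlagRegistry registry) {
        this.registry = registry;
    }

    // Creates flags that are declared in code but missing on the server.
    // Only runs in QA; every other environment is a no-op.
    List<String> syncNewFlags(Environment env, Collection<String> flagsInCode) {
        if (env != Environment.QA) {
            return Collections.emptyList();
        }
        Set<String> existing = registry.listIdentifiers();
        List<String> created = new ArrayList<>();
        for (String name : flagsInCode) {
            if (!existing.contains(name)) {
                registry.createDisabled(name); // new flags start disabled
                created.add(name);
            }
        }
        return created;
    }
}

// In-memory registry for trying the sketch without a server.
final class InMemoryFlagRegistry implements RemoteFlagRegistry {
    private final Set<String> identifiers = new HashSet<>();

    public Set<String> listIdentifiers() {
        return new HashSet<>(identifiers); // copy so callers see a stable snapshot
    }

    public void createDisabled(String identifier) {
        identifiers.add(identifier);
    }
}
```

With this in place, a developer still only adds an enum entry; the flag shows up on the server the next time the application starts in QA.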
The overall flowchart for this sync mechanism is captured below:
We decided to bite the bullet and switch over to Harness Feature Flags on a weekend. We turned on the global configuration to start using Harness Feature Flags. Steps 2 and 3 gave us enough confidence that we had not missed anything and that the production rollout would go through without impact.
Harness Feature Flags Today
Today, Harness Feature Flags manages 200+ feature flags across 6 environments and more than 4000 customers.
We have successfully migrated all products in Harness to Harness Feature Flags while maintaining a consistent and simple developer experience.
Currently, we are actively monitoring the usage of Harness Feature Flags by all involved parties (Ops, Product Managers, QA and Developers) and continue to work on improving the feature flag management experience for everyone.
Here are my key takeaways for making such a disruptive change:
- Always plan to make changes in phases.
- Gather enough data to make an objective decision.
- Follow through with the plan once you have started the process and have enough metrics to support the decision.
- Communication is key. Ensure that all stakeholders are in the loop and their concerns are taken into account.
Thank you all for taking this journey with us! Harness Feature Flags is already improving the lives of many engineers, and we’d like to improve yours too. Level up your feature flag game – book a demo today.