A/B testing with feature flags can serve engineering verification or behavioral analytics. While feature flags optimize deployment and quality, A/B tests focus on user experience and business goals. Use specialized tools for each to maximize effectiveness.
We talk to teams every day who have A/B testing on their list of requirements for a feature flag tool. However, what A/B testing actually means - and where you should get it from - is usually a more nuanced conversation than a simple “included” or “not included.”
In this post, we want to look at what people mean when they talk about A/B testing and share how we think about it in relation to feature flags.
There are two pretty different things people can mean when they talk about A/B testing in the context of feature flags:
1. Engineering verification - testing how a code change affects your overall system’s performance and cost compared to the current implementation.
2. Behavioral analytics - measuring how a change affects user behavior and business outcomes like conversion, growth, and revenue.
For the rest of this post, we’ll focus on the confusion around the second point. In a future post, we’ll talk more about the engineering side of A/B testing and the process of verifying the impact of your code changes on your overall system’s performance and cost.
Let’s take this as an assumption - you want to get the absolute most value possible out of both your feature flags and your A/B tests.
To do that, it helps to understand that they serve two different purposes, often for two different people in the organization.
An A/B test is implemented much like a feature flag - it’s essentially just a diff in the code that serves one path vs. another conditionally - but the concerns before and after that implementation are quite different for the two.
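To make that concrete, here’s a minimal sketch of what that conditional looks like in code. The `FlagClient` interface and `isEnabled` call are hypothetical placeholders, not any specific vendor’s SDK:

```typescript
// Minimal sketch: a feature flag gate and an A/B variant assignment have the
// same shape in code - one conditional serving path A or path B.
// `FlagClient` / `isEnabled` are hypothetical placeholders, not a real SDK.
interface FlagClient {
  isEnabled(flagKey: string, userId: string): boolean;
}

function renderCheckout(flags: FlagClient, userId: string): string {
  // Whether this is a release toggle or experiment variant B,
  // the code diff is the same conditional branch.
  if (flags.isEnabled("new-checkout-flow", userId)) {
    return "<NewCheckout />"; // flag on / variant B
  }
  return "<LegacyCheckout />"; // flag off / variant A (control)
}
```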
With feature flags, you want governance, developer experience, release automation, reporting, and lifecycle management. With behavioral analytics A/B testing, you want data science and growth- and revenue-based correlations. That often means digging into topics like true positive rates and false discovery rates, and knowing when a flat result is a genuine null versus a Type II error from an underpowered test.
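As one illustrative example of that kind of analysis - the function and numbers below are ours, purely for illustration - here’s a rough sample-size calculation using the standard two-proportion approximation, the sort of question an experimentation tool answers before you ever look at results:

```typescript
// Rough sketch of a pre-experiment question: how many users per variant do we
// need to detect a given conversion lift without an unacceptable Type II
// error rate? Standard two-proportion approximation; numbers are illustrative.
function sampleSizePerVariant(
  baselineRate: number, // e.g. 10% conversion today
  expectedRate: number, // e.g. 12% conversion after the change
  zAlpha = 1.96,        // two-sided alpha = 0.05
  zBeta = 0.84          // 80% power, i.e. a 20% Type II error rate
): number {
  const variance =
    baselineRate * (1 - baselineRate) + expectedRate * (1 - expectedRate);
  const delta = expectedRate - baselineRate;
  return Math.ceil(((zAlpha + zBeta) ** 2 * variance) / delta ** 2);
}

// Detecting a 10% -> 12% lift at 80% power needs roughly 3,800 users per variant.
console.log(sampleSizePerVariant(0.1, 0.12));
```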
From a user-experience standpoint, it’s very unlikely that a tool providing a great experience for engineering also provides a great experience for marketing. These are significantly different audiences with different lifecycle concerns and different optimizations required.
However, because the implementation is so similar, you do see companies on the market that bolt the two together. The result is either great A/B testing companies with very limited feature flag offerings, or feature flag companies with minimal, hard-to-use A/B testing offerings. We don’t like either approach.
On top of the overall differences between feature flags as an engineering process and A/B testing as a growth and revenue process, we also find that, increasingly, most teams already have world-class analytics tools in place and don’t need more data siloing or more tools to log into.
So, here’s how we see it - you should use the best data analytics tools in the world for A/B testing, and you should be able to implement those tests via feature flags from a tool focused on absolutely maximizing their value in your software delivery process.
Increasingly, best-of-breed analytics tools with A/B testing - such as Amplitude Experiments, Statsig, and GrowthBook - allow for Segment implementations of the data payload needed to run their experiments. We are connecting our Feature Flags to Segment to automate the process of using Harness Feature Flags with any of the best A/B testing analytics vendors on the market, letting you keep all your data in one place and use the best tool for each job. Even before that integration is fully automated, adding a simple Segment call to your flags is straightforward and adds minimal overhead for the value it unlocks.
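As a sketch of what that looks like - assuming Segment’s `@segment/analytics-node` package and the same hypothetical flag client as above; check your own SDKs for exact signatures - tracking a flag evaluation is a single extra call:

```typescript
// Sketch: send a flag-exposure event to Segment so downstream A/B analytics
// tools can join variant exposure with behavioral and revenue data.
// Assumes Segment's @segment/analytics-node package; FlagClient is the same
// hypothetical interface used earlier, not a specific vendor SDK.
import { Analytics } from "@segment/analytics-node";

interface FlagClient {
  isEnabled(flagKey: string, userId: string): boolean;
}

const analytics = new Analytics({ writeKey: process.env.SEGMENT_WRITE_KEY ?? "" });

function evaluateAndTrack(flags: FlagClient, flagKey: string, userId: string): boolean {
  const enabled = flags.isEnabled(flagKey, userId);

  // One track call per evaluation; the event lands in the same place as the
  // rest of your product analytics data.
  analytics.track({
    userId,
    event: "Feature Flag Evaluated",
    properties: { flag: flagKey, variant: enabled ? "on" : "off" },
  });

  return enabled;
}
```

The event name and properties here are illustrative - match them to whatever schema your analytics destination expects.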
At Harness, we are laser focused on solving the problems associated with software delivery. It can be tempting to drift into adjacent areas, but when the problems are far apart, you deliver weak offerings that don’t satisfy the end users – and that’s just not our approach.