About a decade ago, “testing in production” was a not-uncommon joke among software engineers. It was shorthand for something you would definitely never do... until, eventually, some tools started encouraging you to do exactly that. Run a Google search for testing in production, and you’ll quickly get your fill of “testing in production” memes.
Testing in production, in short, means using live data and live users to figure something out. The question can be about performance, user behavior, or anything else that you want to see, learn from, and react to based on what’s really happening when your apps serve live users.
Jokes aside, testing in production is something you should absolutely think about. Test strategies have evolved over time, and while pre-prod tests remain crucial, the importance of testing under actual usage conditions can't be overstated. Consider this: despite doing every manner of pre-production testing, including exhaustive functional testing, you often find issues that weren’t caught in any test environment or staging environment, because test data isn’t perfect. Performance testing and load testing only get you so far without production data. The user experience can differ vastly from one environment to another, making it essential to test in a real-world scenario. Integration tests, unit tests, regression tests, and any other software tests you run often miss some case that only shows up once real production data and production traffic run through the system. You’re running tests, but not the real tests - this creates rework and slows down future feature release velocity.
Writing tests, doing good QA, and having PMs and other folks review changes prior to deployment all go a long way toward getting things right. But some things are hard to learn with mock data, or at a different scale, such as:
You can try to learn some of this on your way to prod, but the most effective way to answer these questions, both technical and user, is taking things for a spin with actual users and actual traffic; you need real production data and real production traffic to be as effective as possible.
Before we dive any further into testing in production, let’s be clear on what testing in production is NOT. This is especially important to clarify when we consider what lives in a production environment and the impact of poor production testing practices.
Testing in production is not constantly doing deployments and rollbacks. And, it is not being cavalier about what you release. When testing in production, you are not using your live production data to make sure things work. That’s still what software testing and QA are for across test environments, staging environments, and any other place you test your code before your production environment.
In your actual production environment, production testing is all about learning, iterating, and helping inform your decisions with the best and most accurate data possible - after code has already gone through the standard deployment process. By serving production traffic and real users, you collect user data, test software variations, and get monitoring signals that you can use to understand whether what you deliver is what you were trying to achieve.
Feature flags are critical to effectively testing in production. We say this because feature flags allow you to turn things on and off instantly against any criteria you want, without the complexity of a rollback or redeployment. And to make sure we beat the dead horse: yes, this is all in a production environment. Let’s take a look at the capabilities feature flags provide in a production system that make them so critical:
If you’re not familiar with feature flags, we’ve got a great blog on How To Get Started With Feature Flags, as well as another blog on 5 Feature Flag Use Cases You May Not Have Thought Of.
Without feature flags, you are limited to blue-green deployments, canaries, and other very valuable but slower (and more expensive!) forms of production testing. Feature flags bring the cost of production testing down to almost nothing. They make testing in production simple and easy to coordinate.
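To make the "on and off instantly" idea concrete, here is a minimal sketch of a flag with a kill switch and a percentage rollout. It is an illustration only, not the API of Harness or any other tool: the `Flag` class, the `bucket` and `is_on` helpers, and the `new-checkout` flag name are all hypothetical, and the flag state lives in memory where a real flagging service would store and update it remotely.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Flag:
    """A hypothetical in-memory flag; real tools keep this state in a remote service."""
    name: str
    enabled: bool = False      # kill switch: False turns the feature off for everyone
    rollout_percent: int = 0   # share of users (0-100) who see the feature when enabled


def bucket(flag_name: str, user_id: str) -> int:
    """Map a user to a stable bucket in [0, 100) so they get a consistent experience."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_on(flag: Flag, user_id: str) -> bool:
    """Serve the feature only if the flag is enabled and the user falls in the rollout."""
    if not flag.enabled:
        return False
    return bucket(flag.name, user_id) < flag.rollout_percent


flag = Flag(name="new-checkout", enabled=True, rollout_percent=10)
users = [f"user-{i}" for i in range(1000)]
exposed = [u for u in users if is_on(flag, u)]
print(f"{len(exposed)} of {len(users)} users see the new checkout")

flag.enabled = False  # flip the kill switch: no redeploy, no rollback
assert not any(is_on(flag, u) for u in users)
```

Hashing the flag name together with the user ID keeps each user in the same bucket across requests, so raising `rollout_percent` only adds users and never reshuffles who already has the feature.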
Below are some examples of tests in production you might want to run. Typically, these tests relate to tangible business outcomes such as customer satisfaction or revenue gains:
By definition, testing in production can - and should - cover a lot of territory. You can test load, behavior, resiliency, and more. Therefore, there’s no one-size-fits-all answer. It all comes down to what you want to do.
Testing in production, when done properly by using feature flags, is by nature already a way to mitigate risk. The concept of testing on live data in a production environment is scary, and the use of feature flags is the most effective risk mitigation tactic! If you implement a good feature flagging solution, then your risk factors are no longer in the technology and its capabilities or limitations, but in the organizational process.
If you are thinking about feature flags and testing in production for the first time, there are a few things to keep in mind that will help you mitigate risk in the long term:
Harness Feature Flags is a complete feature flag management tool that allows users to create and manage flags both in code and through a UI. Harness uses a flexible targeting model that lets you apply your flags any way you want - against users, regions, clusters, accounts on certain billing levels, or anything else you can think of. Take advantage of this to test from a wide variety of angles simultaneously.
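As a rough picture of what attribute-based targeting means, the sketch below evaluates a flag against a context of user attributes. This is not the Harness SDK or its targeting model; `TargetedFlag`, `attribute_in`, and the attribute names `region` and `billing_tier` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]          # attributes describing the current user or request
Rule = Callable[[Context], bool]  # a rule decides whether a context is targeted


def attribute_in(key: str, allowed: set) -> Rule:
    """Build a rule that matches when the context's value for `key` is in `allowed`."""
    return lambda context: context.get(key) in allowed


@dataclass
class TargetedFlag:
    """A hypothetical flag that is on for any context matching at least one rule."""
    name: str
    rules: List[Rule] = field(default_factory=list)

    def is_on(self, context: Context) -> bool:
        return any(rule(context) for rule in self.rules)


flag = TargetedFlag(
    name="new-dashboard",
    rules=[
        attribute_in("region", {"eu-west"}),
        attribute_in("billing_tier", {"enterprise"}),
    ],
)

# Targeted by region, even though the billing tier does not match.
assert flag.is_on({"user": "ana", "region": "eu-west", "billing_tier": "free"})
# Matches neither rule, so this user keeps the existing dashboard.
assert not flag.is_on({"user": "bo", "region": "us-east", "billing_tier": "free"})
```

Because rules are plain functions over a context, the same shape extends to clusters, accounts, or any other attribute you pass in.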
In addition to the ability to set up tests, there’s a layer of automation you can add on top. This can be especially helpful if there are standardized tests in production that you run, or if you know exactly what you want to happen when certain behavior occurs. For example, you could run three different versions of a new UX and then automatically roll out whichever one keeps users on the app the longest to the whole user base, removing the other two. That delivery, verification, and trigger-based rollout can all be automated using Harness.
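The decision logic behind the three-variant example can be sketched in a few lines. This is a simplified illustration under our own assumptions, not how Harness implements verification: `pick_winner`, `rollout_plan`, the variant names, and the sample numbers are all made up, and comparing raw averages stands in for the statistical checks a real experiment would need.

```python
from statistics import mean
from typing import Dict, List, Optional


def pick_winner(session_minutes: Dict[str, List[float]], min_samples: int = 100) -> Optional[str]:
    """Return the variant with the longest average session, or None if data is too thin."""
    if not session_minutes:
        return None
    if any(len(samples) < max(min_samples, 1) for samples in session_minutes.values()):
        return None  # keep the experiment running until every variant has enough data
    return max(session_minutes, key=lambda variant: mean(session_minutes[variant]))


def rollout_plan(variants: List[str], winner: str) -> Dict[str, int]:
    """Serve the winner to 100% of users and retire the other variants."""
    return {variant: (100 if variant == winner else 0) for variant in variants}


# Made-up session lengths (in minutes) for three hypothetical UX variants.
observed = {
    "ux-a": [4.0, 5.5, 6.0],
    "ux-b": [7.5, 8.0, 6.5],
    "ux-c": [3.0, 4.5, 5.0],
}

winner = pick_winner(observed, min_samples=3)
if winner is not None:
    print(rollout_plan(list(observed), winner))
```

Returning `None` until every variant has enough samples is the important design choice here: the automation should only trigger a rollout once the data can support the decision.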
Testing in production has gone from a quip among software engineers to a reality with the spread of modern Continuous Delivery and feature flags.
Feature flags specifically help increase velocity, valuable feedback, and responsiveness while lowering risk and cost. This is critical, because the best tests to learn from are the ones that most match reality. And, reality means production.
While it can initially be difficult to take the leap into production testing with feature flags, there are some questions we can use to get started. From there, it becomes a piece of cake to iterate and expand as your team gets more comfortable.
Have you read our eBook on Feature Flags yet? It’s free and doesn’t require an email address! If you’re looking to learn more, it’s a great resource. Download The Basics: Feature Flags 101 today! We'd also love to point you to our piece on the Best Feature Flag Tools so you can find a solution that works for you.