Two of the scariest things in the world for anyone in an operations or security role are:
- Code that has to run inside your codebase as a dependency.
- Anything your production user experience depends on for uptime.
It’s no surprise that feature flags, both as a concept and as a specific solution we sell at Harness, can draw extra scrutiny: they appear to present risk along both of these vectors. In this post, we’ll look at why feature flags aren’t a security or performance risk, and how they can actually improve your security and performance posture as an organization.
One note before we proceed. In this article, I will reference aspects of Harness Feature Flags as a product more specifically than we often do on these blogs. This is not to exclusively promote our product, but rather to be able to provide clear and specific explanations. You will find that most feature flag products on the market do a respectable job of most of these same points, as the overall category of feature flags has evolved responsibly and thoughtfully over time.
When using feature flags, you will end up targeting your users, regions, or tenants based on criteria exposed to the service via the SDKs (more on those in a bit). The risk here is that some of that information may be sensitive, and sending it could violate your PII policies.
Because of this risk, we have drawn a line on not collecting any data proactively. While there may be some simplification value in looking for common user or framework objects and automatically passing them to Harness, we do not want to collect data without your explicit choice to send it to us.
So, when configuring the data you will use for flag targeting, all data must be explicitly coded on your side to be sent to us. If you don’t send it, we won’t have it.
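To make the "explicit opt-in" model concrete, here is a minimal sketch of what building a targeting payload looks like from the application side. The `Target` class and attribute names here are illustrative stand-ins, not a specific vendor's API; the point is that only fields you deliberately pass ever cross the wire.

```python
class Target:
    """A flag-evaluation target built only from explicitly supplied attributes.

    Illustrative stand-in for an SDK target object: nothing is read from
    request or session objects automatically; the application chooses exactly
    which attributes are sent to the flag service.
    """

    def __init__(self, identifier, attributes=None):
        self.identifier = identifier
        self.attributes = dict(attributes or {})


# The application decides what is safe to share. Here, the user's email
# (potential PII) is deliberately left out; only coarse attributes go in.
user = {"id": "u-123", "email": "jane@example.com", "region": "eu-west", "plan": "pro"}
target = Target(
    identifier=user["id"],
    attributes={"region": user["region"], "plan": user["plan"]},
)
```

If you never add a field to the `attributes` dictionary, the service has no way to obtain it.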
The SDKs are the crux of how any feature flag solution works. You install them into your code; they communicate with the upstream service, send user or server data for targeting, and receive rule configurations that determine which feature states are served.
Because these SDKs run in your code, we (and everyone else who builds them in our industry) treat them with great care. As we already covered, our SDKs do not proactively collect or communicate any data. Additionally, they are all open source so that they can be fully audited at any time.
Every feature flag vendor has slightly different ways their SDKs phone home, but all will follow one of two similar patterns. At Harness, our SDKs support both streaming and polling modes. In streaming mode, Harness proactively pushes server-sent events to provide real-time updates. Polling mode is favored when you want less connectivity: the SDKs fetch updates periodically and don’t require an ongoing connection. Both are commonly used.
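A polling client can be sketched in a few lines. This is an illustrative model, not a vendor implementation: `fetch_rules` stands in for whatever call retrieves the latest rule configurations (e.g., an HTTP GET against the flag service), and the cache keeps serving the last-known-good rules if a fetch fails.

```python
import threading


class PollingFlagCache:
    """Illustrative polling loop: fetch rule configs on an interval, cache them.

    fetch_rules is any callable returning the latest rules; on failure the
    previously cached rules remain in effect.
    """

    def __init__(self, fetch_rules, interval_seconds=60):
        self._fetch_rules = fetch_rules
        self._interval = interval_seconds
        self._rules = {}
        self._stop = threading.Event()

    def poll_once(self):
        try:
            self._rules = self._fetch_rules()
        except Exception:
            pass  # keep serving the last-known-good rules on failure

    def run(self):
        # No long-lived connection: just periodic fetches until stopped.
        while not self._stop.wait(self._interval):
            self.poll_once()

    def rules(self):
        return self._rules

    def stop(self):
        self._stop.set()
```

A streaming client inverts this shape: instead of a timer, it holds a connection open and applies updates as the server pushes them.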
We often get asked: how do we make sure feature flags aren’t an upstream dependency for your application? What if Harness is down? What if Google Cloud or AWS is down? What if communication is experiencing very high latency for a particular user session? The short answer: a well-architected feature flag setup will never result in your users seeing a broken experience.
Feature flag tools are generally architected with all of these concerns in mind. At Harness, we have three levels of resiliency that you can rely on:
- The SDKs themselves will cache evaluations (and cache rules for the server-side SDKs), so they will always have a first-level fallback if there is a connectivity issue.
- When implementing your flag in the code, you will always have a hardcoded default that will be used if there is no cache AND no connectivity.
- We also provide a relay proxy that you can run to sit between Harness and your applications. It provides a full cache, limits necessary connectivity to Harness (only the proxy needs to talk to Harness), and can even be side-loaded.
When you combine these things, your application will never fail to resolve a flag because of an impact to the feature flag service. And there is never a time when you cannot change a mission-critical flag, since the proxy can be side-loaded if upstream Harness cannot be reached.
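The fallback chain described above can be sketched as a simple evaluation order. This is a minimal model of the idea, not a real SDK: `fetch_evaluation` stands in for the live call to the flag service, and the method name `bool_variation` follows the shape such SDKs commonly expose.

```python
class ResilientFlagClient:
    """Illustrative evaluation order: live rules -> cached last-known value ->
    hardcoded in-code default. Names are illustrative, not a vendor API."""

    def __init__(self, fetch_evaluation):
        self._fetch_evaluation = fetch_evaluation  # e.g. a call to the flag service
        self._cache = {}

    def bool_variation(self, flag_key, default):
        try:
            value = self._fetch_evaluation(flag_key)  # 1) live evaluation
            self._cache[flag_key] = value             # refresh the cache
            return value
        except Exception:
            # 2) connectivity failed: fall back to the cached evaluation
            if flag_key in self._cache:
                return self._cache[flag_key]
            # 3) no cache either: the hardcoded default keeps the app working
            return default
```

Every code path returns a usable value, which is why an outage of the flag service degrades to "last-known or default behavior" rather than a broken page.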
Helping, Not Hurting
As you’ve seen, feature flag solutions, including ours, are designed with awareness of the risk they could inadvertently carry. Between OSS SDKs, caches, proxies, in-code defaults, and more, these risks are carefully addressed so that you can flag with confidence.
Given that, it’s worth considering the ways that feature flags actually become a critical part of your security and reliability posture, rather than a risk to it.
- Instant kill switches in production for any changes, without requiring a rollback.
- Audit log of all changes.
- Robust governance and permission controls around who can make changes and how.
- No more devs or ops people running playbooks directly on the resources! Use flags to make critical processes visible and repeatable.
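The kill-switch pattern from the first bullet is worth seeing in code. This sketch uses a stand-in client and hypothetical flag name; real SDKs expose a similar `bool_variation`-style call.

```python
def run_new_checkout():
    return "new"


def run_legacy_checkout():
    return "legacy"


class FlagClient:
    """Stand-in client; real SDKs expose a similar bool_variation call."""

    def __init__(self, flags):
        self._flags = flags

    def bool_variation(self, key, target, default):
        return self._flags.get(key, default)


def checkout(client, target):
    # Kill switch: flip "new-checkout-flow" off and every user instantly
    # falls back to the legacy path, with no rollback or redeploy required.
    if client.bool_variation("new-checkout-flow", target, default=False):
        return run_new_checkout()
    return run_legacy_checkout()
```

Because the guarded path defaults to `False`, turning the flag off (or losing connectivity entirely) lands users on the known-good legacy path.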
There’s a lot more to know about the specifics of how feature flags are built for security and performance; this is only meant to provide an overview. But hopefully you can see the level of care taken to ensure that feature flags make your application more secure and more performant, with potential risks designed out of the service.