Integrating feature flags with OpenTelemetry enhances observability: recording flag states as span attributes lets you trace each flag's impact on application behavior and query traces by specific flag state.
Already using OpenTelemetry? If so, you might be curious about the benefits of integrating feature flags into your application. This article explains how to level up your observability with a simple, code-level integration. Follow along as this guide reviews the benefits of integrating these technologies and offers suggestions for next steps.
This article assumes code-level familiarity with OpenTelemetry, an open source observability framework. OpenTelemetry requires an implementation, and there are many to choose from. For this article and its code demonstration, I chose Honeycomb.io and Java. Honeycomb’s Java agent uses the -javaagent bytecode instrumentation technique to instrument the application.
The sample application is a simple socket server written in Java. Clients connect to the server on the default port, 5009. A client can type a word or phrase, and the Threaded Echo Server responds with the phrase verbatim.
while (true) {
    try {
        line = brinp.readLine(); // buffered reader on the client socket
        if (line == null || line.equalsIgnoreCase("QUIT")) {
            socket.close();
            return;
        }
        out.writeBytes(line + "\r\n"); // echo the phrase back verbatim
        out.flush();

        String treatment = flag2treatments.get("next_step");
        if ("on".equals(treatment)) {
            doSomeWork();
            // Wrap the flagged work in its own span, named after the flag.
            Span span2 = tracer.spanBuilder("next_step").startSpan();
            try {
                Thread.sleep(2000);
            } finally {
                span2.end(); // end the span even if the sleep is interrupted
            }
        } else if ("off".equals(treatment)) {
            doSomeWork();
        } else {
            throw new Exception("unexpected Split treatment: " + treatment);
        }
    } catch (Exception e) {
        e.printStackTrace();
        return;
    }
}
Listing 1: Threaded Echo Server handles client interaction.
The Threaded Echo Server sleeps for two seconds when the feature flag “next_step” is turned on. The sleep is wrapped in a span named “next_step” (the variable span2 in the listing). When the “next_step” flag is off, only the usual doSomeWork is performed.
With “next_step” switched on, we can find this trace in Honeycomb.
In Figure A, the client types four words. Each “next_step” span lasts almost exactly two seconds, which is the time our code sleeps while handling a word.
With “next_step” toggled off, this is the trace.
The feature flag had two impacts.
It introduced a new, nested span to the work of handling a client word. The span was named after the feature flag that creates it, resulting in the green bars that illustrate each word of a single session in a single trace.
It also introduced two seconds of sleep time into handling the word, making it easier to see the new span instances.
Seeing that flags and spans can interact is useful on its own, but there is a further opportunity: recording each flag's treatment on the span itself.
Span span = tracer.spanBuilder("echo").startSpan();
String USER_ID = "dmartin-opentel";
span.setAttribute("userid", USER_ID);

String[] featureFlags = {"next_step", "multivariant_demo", "new_onboarding"};
Map<String, String> flag2treatments = new TreeMap<>();
for (String flag : featureFlags) {
    // Evaluate each flag once and record its treatment on the top-level span.
    String t = client.getTreatment(USER_ID, flag);
    span.setAttribute("split." + flag, t);
    System.out.println("set span attribute: split." + flag + "," + t);
    flag2treatments.put(flag, t);
}
Listing 2: Preparing the Threaded Echo Server’s top-level “echo” span with feature flag treatments
In Listing 2, the program expects three flags to be in use: “next_step”, “multivariant_demo”, and “new_onboarding”. Using Harness FME, all flags are evaluated up front and stored in the flag2treatments map. This means that a dynamic change to a flag treatment will be ignored for the rest of the program's execution. Techniques for avoiding that are a topic for another blog post.
In this example, that's fine. The treatments are cached, but they are also stored as span attributes. Why would you want to put feature flag “impressions” into the span?
If your span has the feature flag name and treatment, you can query for traces that show (or don’t show) a particular flag. This makes it much easier to isolate your trace sessions when you’re looking to handle a problem specific to a feature flag.
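Listing 2 evaluates the flags once, so a treatment that changes later never reaches the span. One possible way around that, offered as a sketch and not as the author's method, is to move the evaluation and tagging into a small helper and call it for each unit of work. The FlagAttributes class below is hypothetical. To stay free of dependencies, it takes the flag lookup and the attribute setter as plain functional interfaces instead of the real SDK and OpenTelemetry types. It keeps the “split.” attribute prefix from Listing 2, so queries on flag state keep working.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;
import java.util.function.BiFunction;

// Hypothetical helper, not part of any SDK: evaluates flags at call time
// and records each treatment as a "split.<flag>" attribute.
final class FlagAttributes {

    static final String PREFIX = "split."; // same prefix as Listing 2

    /**
     * @param evaluator  stand-in (assumption) for client.getTreatment(userId, flag)
     * @param attributes stand-in (assumption) for span.setAttribute(key, value)
     * @return the treatments that were just evaluated, keyed by flag name
     */
    static Map<String, String> record(
            String userId,
            List<String> flags,
            BiFunction<String, String, String> evaluator,
            BiConsumer<String, String> attributes) {
        Map<String, String> treatments = new TreeMap<>();
        for (String flag : flags) {
            String treatment = evaluator.apply(userId, flag); // fresh evaluation
            attributes.accept(PREFIX + flag, treatment);      // tag the span
            treatments.put(flag, treatment);
        }
        return treatments;
    }
}
```

In the echo server, a call such as FlagAttributes.record(USER_ID, List.of(featureFlags), client::getTreatment, span::setAttribute) placed inside the while loop would re-read the treatments for every client line. This assumes the method references resolve against the usual two-argument getTreatment and string-valued setAttribute overloads. The trade-off is one flag evaluation per flag per line instead of one per session.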
Feature flags are not good candidates for bytecode instrumentation. The hard part of introducing a flag is not the SDK, but rather the thought that must go into what you want to supply when the flag is toggled on or off (or set to one of several treatments, for a multivariant flag).
In one vision of the future, a span is synonymous with a flag. Flags would be reverse dependent on the span/flag that includes them. You could turn whole portions of live, running application code on and off by identifying the span or spans that require rollback. The flag toggle interface isn’t well suited to this broad purpose, though, and the complexity could be overkill, depending on how many spans you have used to instrument your app.
In the near term, consider manually wrapping each feature-flagged change in a span named after the flag, and give yourself OpenTelemetry analytics on your flags.
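The manual wrapping suggested above can be sketched as a small helper. This is a minimal illustration under stated assumptions, not a definitive implementation. SpanLike and SpanStarter are hypothetical stand-ins for the small part of the OpenTelemetry Span and Tracer APIs used in Listing 1, so the sketch compiles without any dependencies. FlaggedWork is likewise an invented name.

```java
// Hypothetical stand-in for the part of an OpenTelemetry Span used here.
interface SpanLike {
    void setAttribute(String key, String value);
    void end();
}

// Hypothetical stand-in for tracer.spanBuilder(name).startSpan().
interface SpanStarter {
    SpanLike start(String name);
}

// Hypothetical helper: runs flag-controlled work inside a span named after the flag.
final class FlaggedWork {

    static void run(SpanStarter spans, String flag, String treatment, Runnable work) {
        SpanLike span = spans.start(flag);              // span named after the flag, as in Listing 1
        span.setAttribute("split." + flag, treatment);  // same attribute key as Listing 2
        try {
            work.run();
        } finally {
            span.end();                                 // always end the span, even if work throws
        }
    }
}
```

With the real API, a SpanStarter would be a thin adapter around tracer.spanBuilder(name).startSpan(). Keeping the span handling in one place means every flagged code path gets the same span name and attribute, which keeps trace queries on flag state consistent.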