Harness Feature Management & Experimentation (FME) is a game-changer for AI products like HeySam, giving teams precise control over how preference data is collected and used to fine-tune AI responses. By leveraging traffic types and feature flags, companies can tailor AI interactions to sound more human, making customer experiences more natural and personalized, without the complexity of traditional per-user flagging systems.
“Please hold while I transfer you now.”
“This is Erica, your Bank of America Virtual Financial Assistant. How may I help you today?”
“The current wait time is approximately 20 minutes.”
We’ve lived with natural language assistants for decades. But not for a moment did they fool you into thinking you were talking to a human. In fact, just reading these quotes makes me want to press 0 for a human.
But something changed with ChatGPT. It feels human.
I’ve even caught myself typing “Thank you” to it. Bizarre, right? Why would I thank a program? It has no feelings.
This humanness, the way ChatGPT and modern AI agents communicate, is what makes countless AI use cases possible, from AI companions to AI employees. And it’s big business: Google reportedly paid $2.7B in a licensing and talent deal with Character.ai, a site where users chat with fictional characters.
One technique behind this humanness is Direct Preference Optimization (DPO), which aligns a model’s responses with human preferences. Essentially, it works by showing humans two possible answers to the same prompt and asking which one sounds better. With this data, the model learns to respond more naturally: “more like this, less like that.”
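Concretely, each human judgment becomes a preference pair: the prompt, the answer the human chose, and the answer they rejected. Here’s a sketch of that record in TypeScript; the field names follow the common { prompt, chosen, rejected } shape used by DPO training pipelines such as Hugging Face TRL, and the example values are made up:

```typescript
// One DPO training example: a prompt plus a chosen/rejected answer pair.
interface PreferencePair {
  prompt: string;   // the question the model was asked
  chosen: string;   // the answer the human preferred
  rejected: string; // the answer the human passed over
}

const example: PreferencePair = {
  prompt: 'Does your platform support single sign-on?',
  chosen: 'Yes! We support SAML and OIDC out of the box, and setup takes about ten minutes.',
  rejected: 'AFFIRMATIVE. SSO IS SUPPORTED. CONSULT THE DOCUMENTATION.',
};
```

Thousands of pairs like this teach the model which tone wins.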
But DPO isn’t just for foundation models; it’s also invaluable for tuning AI agents built on top of them.
I’m the founder of HeySam, the AI Sales Engineer & Call Intelligence platform that sales teams use to augment their human Sales Engineers. Naturally, Sam needs to sound like a Sales Engineer, not a bot. So, we use DPO too. But here’s the catch—Sales Engineers at Salesforce sound different from those at Harness. Plus, we can't use one customer’s data to train Sam for another.
The challenge: how do we collect preference data per customer, so we can fine-tune Sam to speak each customer’s language?
That’s where Harness Feature Management & Experimentation came to our rescue.
We already used Harness FME for feature flagging, paywalls, and experimentation. But one feature makes it uniquely valuable for collecting preference data for DPO: traffic types.
Traffic types let us create feature flags that turn on or off per user, per organization, or even per request. Most other feature flagging vendors force you to key flags to individual users, which doesn’t work well for preference data collection.
Why?
Because we don’t want to permanently lock some users into a “which answer do you prefer?” experience. Instead, we want to randomly show it on a small percentage of requests across all users within a customer.
We model this experience with a single feature flag check.
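Here’s a minimal sketch of that check, assuming the Node.js Split SDK that underlies Harness FME; the flag name `collect-preference-pairs`, the `customer` attribute, and the surrounding helper are illustrative, not our production code:

```typescript
import { SplitFactory } from '@splitsoftware/splitio';

const factory = SplitFactory({
  core: { authorizationKey: process.env.SPLIT_SDK_KEY as string },
});
const client = factory.client();

// Key the evaluation by *request* ID, not user ID: the flag's traffic type
// is "request," so the percentage rollout randomizes across requests instead
// of pinning individual users into the preference-collection experience.
async function shouldAskForPreference(
  requestId: string,
  customerId: string
): Promise<boolean> {
  await client.ready();
  const treatment = client.getTreatment(requestId, 'collect-preference-pairs', {
    customer: customerId, // scope the rollout to a single customer's traffic
  });
  return treatment === 'on';
}
```

When the treatment comes back `on`, Sam shows two candidate answers instead of one and records which one the user picks.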
Harness Feature Management & Experimentation gives us the flexibility to precisely control when and how we collect preference data.
And as a bonus, we can create feature flags based on users or customers—all within the same account.
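Because both traffic types live in the same account, the same client from the sketch above can evaluate a user-keyed flag too; again, the flag name is illustrative:

```typescript
// Same factory, different traffic type: key this evaluation by user ID
// instead of request ID, e.g., to gate a paywall per user.
function hasNewPaywall(userId: string): boolean {
  return client.getTreatment(userId, 'new-paywall-experience') === 'on';
}
```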
What do we do with this data? We use it to fine-tune our underlying LLMs, ensuring Sam responds like your company’s Sales Engineers.
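Because one customer’s data must never train another’s, you can think of the export step as writing each pair to that customer’s own dataset. A minimal sketch under that assumption, reusing the `PreferencePair` type from earlier (the `dpo-data/` layout and helper name are illustrative; one JSONL file per customer is a common input format for DPO fine-tuning jobs):

```typescript
import { appendFileSync, mkdirSync } from 'node:fs';

mkdirSync('dpo-data', { recursive: true }); // one folder, one file per customer

// Append a pair to that customer's own JSONL file, keeping training data
// strictly partitioned so one customer's pairs never tune another's model.
function recordPreference(customerId: string, pair: PreferencePair): void {
  appendFileSync(`dpo-data/${customerId}.jsonl`, JSON.stringify(pair) + '\n');
}
```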
So next time you find yourself thanking an AI agent, remember—behind that human touch is AI alignment, powered by tools like Harness Feature Management & Experimentation.