Harness Feature Management & Experimentation (FME) is a game-changer for AI products like HeySam, giving teams precise control over how preference data is collected and used to fine-tune AI responses. By leveraging traffic types and feature flags, companies can tailor AI interactions to sound more human, making customer experiences more natural and personalized, without the complexity of traditional per-user flagging systems.
“Please hold while I transfer you now.”
“This is Erica, your Bank of America Virtual Financial Assistant. How may I help you today?”
“The current wait time is approximately 20 minutes.”
We’ve lived with natural language assistants for decades. But not for a moment did they fool you into thinking you were talking to a human. In fact, just reading these quotes makes me want to press 0 for a human.
But something changed with ChatGPT. It feels human.
I’ve even caught myself typing “Thank you” to it. Bizarre, right? Why would I thank a program? It has no feelings.
This humanness, the way ChatGPT and modern AI agents communicate, is what makes countless AI use cases possible, from AI companions to AI employees. And it’s big business: Google reportedly paid $2.7B in a licensing and talent deal with Character.ai, a site where users chat with fictional characters.
One technique behind this humanness is Direct Preference Optimization (DPO), which aligns a model’s responses with human preferences. Essentially, it works by showing humans two possible answers to the same prompt and asking which one sounds better. With this data, the model learns to respond more naturally: “more like this, less like that.”
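Concretely, each human judgment becomes a preference pair: the prompt, the answer the human chose, and the answer they rejected. Here’s a sketch of that record in TypeScript; the field names follow the common { prompt, chosen, rejected } shape used by DPO training pipelines such as Hugging Face TRL, and the example values are made up:

```typescript
// One DPO training example: a prompt plus a chosen/rejected answer pair.
interface PreferencePair {
  prompt: string;   // the question the model was asked
  chosen: string;   // the answer the human preferred
  rejected: string; // the answer the human passed over
}

const example: PreferencePair = {
  prompt: 'Does your platform support single sign-on?',
  chosen: 'Yes! We support SAML and OIDC out of the box, and setup takes about ten minutes.',
  rejected: 'AFFIRMATIVE. SSO IS SUPPORTED. CONSULT THE DOCUMENTATION.',
};
```

Thousands of pairs like this teach the model which tone wins.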
But DPO isn’t just for foundation models; it’s also invaluable for tuning AI agents built on top of them.
I’m the founder of HeySam, the AI Sales Engineer & Call Intelligence platform that sales teams use to augment their human Sales Engineers. Naturally, Sam needs to sound like a Sales Engineer, not a bot. So, we use DPO too. But here’s the catch—Sales Engineers at Salesforce sound different from those at Harness. Plus, we can't use one customer’s data to train Sam for another.
The challenge: how do we collect preference data per customer, so we can fine-tune Sam to speak each customer’s language?
That’s where Harness Feature Management & Experimentation came to our rescue.
We already used Harness FME for feature flagging, paywalls, and experimentation. But one feature makes it uniquely valuable for collecting preference data for DPO: traffic types.
Traffic types let us create feature flags that turn on or off per user, per organization, or even per request. Most other feature flagging vendors force you to key flags to individual users, which doesn’t work well for preference data collection.
Why?
Because we don’t want to permanently lock some users into a “which answer do you prefer?” experience. Instead, we want to randomly show it on a small percentage of requests across all users within a customer.
We model this experience with a single feature flag check.
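Here’s a minimal sketch of that check, assuming the Node.js Split SDK that underlies Harness FME; the flag name `collect-preference-pairs`, the `customer` attribute, and the surrounding helper are illustrative, not our production code:

```typescript
import { SplitFactory } from '@splitsoftware/splitio';

const factory = SplitFactory({
  core: { authorizationKey: process.env.SPLIT_SDK_KEY as string },
});
const client = factory.client();

// Key the evaluation by *request* ID, not user ID: the flag's traffic type
// is "request," so the percentage rollout randomizes across requests instead
// of pinning individual users into the preference-collection experience.
async function shouldAskForPreference(
  requestId: string,
  customerId: string
): Promise<boolean> {
  await client.ready();
  const treatment = client.getTreatment(requestId, 'collect-preference-pairs', {
    customer: customerId, // scope the rollout to a single customer's traffic
  });
  return treatment === 'on';
}
```

When the treatment comes back `on`, Sam shows two candidate answers instead of one and records which one the user picks.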
Harness Feature Management & Experimentation gives us the flexibility to precisely control when and how we collect preference data.
And as a bonus, we can create feature flags based on users or customers—all within the same account.
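Because both traffic types live in the same account, the same client from the sketch above can evaluate a user-keyed flag too; again, the flag name is illustrative:

```typescript
// Same factory, different traffic type: key this evaluation by user ID
// instead of request ID, e.g., to gate a paywall per user.
function hasNewPaywall(userId: string): boolean {
  return client.getTreatment(userId, 'new-paywall-experience') === 'on';
}
```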
What do we do with this data? We use it to fine-tune our underlying LLMs, ensuring Sam responds like your company’s Sales Engineers.
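Because one customer’s data must never train another’s, you can think of the export step as writing each pair to that customer’s own dataset. A minimal sketch under that assumption, reusing the `PreferencePair` type from earlier (the `dpo-data/` layout and helper name are illustrative; one JSONL file per customer is a common input format for DPO fine-tuning jobs):

```typescript
import { appendFileSync, mkdirSync } from 'node:fs';

mkdirSync('dpo-data', { recursive: true }); // one folder, one file per customer

// Append a pair to that customer's own JSONL file, keeping training data
// strictly partitioned so one customer's pairs never tune another's model.
function recordPreference(customerId: string, pair: PreferencePair): void {
  appendFileSync(`dpo-data/${customerId}.jsonl`, JSON.stringify(pair) + '\n');
}
```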
So next time you find yourself thanking an AI agent, remember—behind that human touch is AI alignment, powered by tools like Harness Feature Management & Experimentation.