We tested our product with 40 synthetic users before talking to a single real one. Here's what that changed

Every UX course, every research handbook, every senior designer who’s ever mentored someone has delivered the same line: talk to real users before you build anything.

They’re right. I’m not going to argue with that.

But here’s what none of those courses cover — what do you do when you’re six weeks from launch, your team is three people, you have a product to validate right now, and traditional user recruitment takes longer than your entire pre-launch window?

We built Articos, an AI-powered user research platform, and in the weeks before our public launch we ran 40 synthetic research sessions before a single real user ever sat down with us. Not because we thought it was the “correct” approach. Because it was the only one that fit our timeline.

What came back changed more than I expected. Some of it held up when we eventually talked to real users. Some of it didn’t. And the gap between those two outcomes taught me something more useful than either type of session could have on its own.

The catch-22 nobody in UX talks about

There’s a version of user research that looks like this: you write a recruitment brief, post it to a panel, screen respondents, schedule six sessions across two weeks, run the interviews, synthesise the notes, and land on insights that actually shape your product.

That process is good. It works. And for most teams building most things, it’s completely inaccessible.

Not because they don’t value research. The founders and PMs and agency leads I talk to almost universally believe in it. They skip it because the economics are broken for where they are. A recruitment cycle typically runs 2–6 weeks. A usability research agency quotes $5K–$15K per study. And by the time insights land, the decision they were supposed to inform has already been made by whoever shouted loudest in the last team meeting.

So teams default to the next best thing: gut feel, internal opinion, and the confident founder who’s “been in the industry 12 years and knows what users want.” That’s not a failure of intention. It’s what happens when the only available tool is too expensive and too slow to be useful.

That was exactly our situation. Six weeks out from launch, we had real design decisions sitting open with no good way to validate them quickly. So we tried a different order.

What 40 synthetic sessions actually looked like

I want to be specific here, because “synthetic users” is one of those phrases that sounds either magical or meaningless depending on who you ask.

In practice: we defined a set of decisions we needed to make before launch. These weren’t abstract — they were things like “which of these two onboarding flows gets a user to their first research session faster” and “does this homepage messaging actually communicate what we do, or does it sound like every other AI tool.” Concrete, bounded questions with real consequences if we got them wrong.

We ran 40 sessions across those questions. AI-generated personas, built from our ideal customer profiles (ICPs) plus demographic and behavioral parameters, worked through the flows and answered the questions we’d have asked real users. The output was structured: what worked, what didn’t, where people dropped off, what they found confusing.

Here’s what came back that surprised us.

The onboarding drop-off point wasn’t where we thought. We’d assumed users would disengage during the persona-building step. It’s the most novel part of the product, and we’d already watched a few beta users slow down there. The synthetic sessions said no. The friction was earlier, on the initial “describe your research goal” screen. The blank text field with no prompt was creating a moment of decision paralysis: users didn’t know how specific or vague to be. We’d been so close to it for so long that we’d stopped seeing it.

Our main headline made no sense to people who hadn’t already bought into synthetic research. We’d written “Research in 30 minutes, not weeks” and thought it was clear. The sessions kept surfacing a different reading: 30 minutes to do what, exactly? The mechanism was missing. People understood speed. They didn’t understand what they were getting at the end of it. We rewrote the subheading three times as a direct result.

Two UX patterns we thought were obvious turned out to be ambiguous. One involved how we displayed confidence scores on research outputs. The other was a navigation pattern we’d borrowed from a tool we liked. Both were flagged consistently across sessions. We changed one before launch and it took four hours to fix. The other we shipped anyway — more on that in a moment.

These weren’t small observations. They changed what we built and how we explained it. And it happened in days, not weeks.

Then we talked to real users. Here’s where it held up — and where it didn’t

Three weeks post-launch, we started running real user sessions. Moderated interviews. Actual humans. The version we should have done from the start, if time had permitted.

Two of the three findings from the synthetic sessions held up completely.

The onboarding friction point was real. Real users slowed down at exactly the same moment. We’d already fixed it before launch based on the synthetic finding, and the post-launch data backed that up — our activation rate improved meaningfully once users had a prompt guiding their first input. The synthetic sessions had identified the right problem.

The headline issue held too. Real users asked the same question: what exactly do I get at the end? The rewrites we’d done hadn’t fully resolved it — we needed one more pass — but the synthetic sessions had pointed at the correct thing.

The confidence score display, though? That’s where things diverged.

The synthetic sessions flagged it as confusing. Real users either didn’t notice it, or found a workaround instinctively. A couple of them actually said they found it useful in a way they couldn’t quite articulate. The navigation pattern we’d decided not to change? Same story — flagged by AI, largely ignored by humans.

What I think happened: synthetic sessions were good at identifying structural friction — the places where a flow breaks down logically, or where language is ambiguous. They were less reliable on the things that only emerge in the messy reality of actual use. The small adaptations humans make. The moments where something is technically confusing but people work around it anyway. The emotional texture of finding something useful even when it’s imperfect.

That distinction matters, because it determines what each kind of session is good for.

When to go synthetic-first — and when not to

I’m not going to pretend this is a universal framework. But based on what we learned, here’s roughly how I think about it now.

Synthetic-first is worth it when:

  • You’re validating structure, not emotion. Flow logic, information hierarchy, whether a headline communicates its intent — these are structural problems. Synthetic sessions surface them reliably.
  • You’re time-constrained and the alternative is making the decision blind. Imperfect signal beats no signal.
  • You’re trying to eliminate obvious problems before investing in real user sessions. Think of it as a filter, not a replacement. You’re reducing the number of issues a real user has to wade through.
  • You’re pre-revenue or pre-launch with a small team. The economics make sense in a way they don’t for teams with research resources.

Real-users-first is non-negotiable when:

  • You’re building for communities with specific lived experience you can’t accurately model. Accessibility. Culture. Specific professional contexts. Synthetic personas have limits and they’re most visible here.
  • Emotional response matters as much as logical response. How someone feels about a pricing decision. Whether they trust a new kind of tool.
  • You’re past the structural problems and into the subtle ones. Once the obvious friction is gone, the remaining issues are usually human and contextual. Synthetic sessions won’t find them.

The sequence that works for us now: synthetic to pressure-test structure and messaging early, real users to validate what’s left once the obvious issues are cleared. Real user time is more valuable when you’ve already removed the easy problems.

The uncomfortable part

The UX field’s insistence on talking to real users first isn’t wrong. It’s just built for a world where teams have the time and budget to do it properly — and a lot of the teams who most need research don’t live in that world.

The honest alternative to synthetic research, for most early-stage teams, isn’t rigorous user recruitment. It’s shipping based on the founder’s instincts and hoping for the best.

I’d rather have imperfect signal than confident assumptions.

The 40 sessions we ran weren’t a replacement for real users. They were the thing that made our eventual real user sessions much more useful — because we showed up with better questions, a tighter product, and a clearer sense of what we still didn’t know.

If you’re sitting on a product decision right now with no way to validate it properly, the choice probably isn’t “synthetic research or real user research.” It’s “synthetic research or nothing.” That changes the calculation.