For Product Managers

How Great Games Actually Get Made — A Raw Fury Producer's Playbook

9 min read

Adrian Campbell

CTO

UX Playbooks

A player finishes your build, fills in the survey, and gives it a 4 out of 5. They write that they really liked it. Then you watch the footage back: the sighing, the tutting, the long quiet stretch where someone has clearly stopped enjoying themselves. Two records of the same session, and they disagree. Which one do you believe?

Paul has spent the best part of twenty years around games, from Blizzard and ZeniMax to nearly six years at indie publisher Raw Fury, where his focus now is player experience. We sat down with him for a fireside chat about what actually makes a great producer. Almost none of his answer was about process or tooling. It was about reading the space between what players say and what they do, and protecting the people making the game while you do it. This playbook turns that conversation into a method you can use around your next test.

Two halves of the same story

Survey feedback is a player's account of how they think the session went. It is filtered through memory, mood, and how the questions were worded. Useful, but partial.

The footage is the other half. It is the record of what actually happened: where someone's eyes and hands stopped agreeing, the moment they went quiet, the point where they stopped playing the game and started trying to work out why nothing was happening. Paul puts the rule plainly:

You have to see people playing your game.

Paul Wilson, Raw Fury, Producer

You can never just take an opinion from someone who has finished and said it was great, and call that your finding.

Why does this matter to a producer specifically? Because the developer is the one making the game, and they will act on whatever feedback reaches them. If the only thing that reaches them is a simplified response or playtest score, they will take that steer. Your job is to make sure the feedback arrives with its context attached, and that the context is read correctly. As Paul describes it, a player is "not telling you always what they're missing. Sometimes they're just telling you what they're feeling." Reading the gap between the feeling and the cause is the work.

What this playbook helps you do

By the end of a playtest cycle run this way, you should be able to:

Walk a developer through results without spooking them off the project.
Tell the difference between a player who is challenged and a player who is leaving.
Turn a hundred-plus surveys and hours of footage into a short list of things worth doing.
Point to the exact moment in a recording where an experience problem starts.
Make a change, retest the same area, and confirm the problem is gone across players.

None of this depends on a particular genre or team size. It depends on watching, distilling, and protecting trust while you do it.

Build the trusting relationship before the first test

Before any of the testing mechanics matter, there is the relationship. Paul is unambiguous about the order:

Building strong, trusting relationships with your devs will always be the first thing that you should do. Everything else you can learn and will change.

Paul Wilson, Raw Fury, Producer

The reason is information. A developer who trusts you will tell you the truth about how the project is really going: their work rate, their fears, where their confidence is, what is quietly slipping. Without that, you only learn a team is struggling when they are forced to tell you, usually too late to do anything useful with marketing, partners, or the schedule. With it, you find out with eight months left, while there is still room to move.

Getting there is less mysterious than it sounds. Learn their game thoroughly, play it back to back, and tell them what you actually think you are experiencing. A developer wants to hear that, because they have been spinning in their own head about how things will land. You become an early mirror for their game. It also means meeting people where they are, literally. Paul's publishing team is spread across time zones and cultures, and he treats that as a producer's problem to solve: take the late call, notice that the person you are talking to has been up since 3am, and adjust. The relationship is built in those small accommodations long before a test report ever lands.

Set an expected playtest score before players see the build

We set up our feedback to collect an overall score from each player, an abstract number to guide later discussions. That score is behind the habit most teams skip, and the one that prevents the most damage: before the test runs, agree with the developer what you expect the result to be.

Paul's example:

We only expect this to score 60 out of 100, maybe even less. And we think that's good.

Paul Wilson, Raw Fury, Producer

Said in advance, a 62 lands as confirmation, not catastrophe. The developer is not hurt by the number, and just as importantly they can see that you, as the publishing partner, are not about to pull the brakes on the whole project over an early score. You decided together what a healthy result looks like at this stage, so nobody is shocked when it arrives.

This works because development should be a growth curve. For a game halfway through production, the strange result is an 85 or 90, not a low score. As Paul notes, if it is already that high you have to ask what is left to do. Set the expectation, and a low early score becomes a place to refocus the next milestone rather than a reason to panic. Skip it, and the same number reads as a crisis. The phrase to keep in mind: don't spook the game team.

Watch people play, don't just read the score and survey feedback

This is the core of the method. There is a class of problem you can only catch over someone's shoulder.

Paul describes it as a UI and UX disassociation: "the gameplay that's happening mechanically, with their eyes and with their hands, that are not lining up, and then causing the player to have to think at a time when they probably shouldn't have to." A player rarely writes "my eyes and hands weren't lining up" in a survey. They might not even be aware of it. But it is right there in the footage, in the half-second hesitation before they find the button, in the small course-corrections that say the interface is fighting them.

A survey cannot surface that, but Paul is careful not to dismiss surveys to make the point. Good surveys are a craft of their own, with real science around leading questions and seeded answers. Each method shows you something the other cannot, and the producer who reads only one is working with half the picture.

Read the gap between the footage and the survey

Once you are watching and reading the surveys, you will hit the contradictions. Paul has seen both directions of it. Someone sighs and tuts their way through a session, sounds thoroughly unimpressed, then scores it 4 out of 5 and writes that they liked it. And the reverse: a player who sounds like they are having a great time, then dunks on the game in the survey.

These gaps are not noise to be averaged away. They are the work.

It is up to us to reverse engineer from that end emotional state, all the way back through to our feature sets, and the UI and UX.

Paul Wilson, Raw Fury, Producer

The player gives you a feeling. You trace it back to a cause.
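One way to find those gaps without relying on memory is to put the two records side by side. The sketch below is illustrative only: it assumes you have a 1-to-5 survey score per player and a hand-tagged `positive_share` (the fraction of moments in the footage where the player looked or sounded positive). The field names and the 0.4 threshold are assumptions for the example, not anything Paul's team prescribes.

```python
def survey_footage_gaps(sessions, min_gap=0.4):
    """Flag players whose survey score and footage disagree.

    Each session is a dict with hypothetical keys:
      player          - any identifier
      survey_score    - 1 to 5, from the survey
      positive_share  - 0.0 to 1.0, tagged by whoever watched the footage
    """
    flagged = []
    for session in sessions:
        # Put the 1-5 survey score on the same 0-1 scale as the footage tag.
        survey = (session["survey_score"] - 1) / 4
        gap = survey - session["positive_share"]
        # Positive gap: kinder on paper than on camera. Negative: the reverse.
        if abs(gap) >= min_gap:
            flagged.append((session["player"], round(gap, 2)))
    return flagged


sessions = [
    {"player": "A", "survey_score": 4, "positive_share": 0.2},  # sighs, scores 4
    {"player": "B", "survey_score": 2, "positive_share": 0.8},  # has fun, scores 2
    {"player": "C", "survey_score": 4, "positive_share": 0.7},  # records agree
]
print(survey_footage_gaps(sessions))
```

The output is a watch list rather than a finding. A flagged player is someone whose recording deserves a closer look before their score goes anywhere near the developer.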

The discipline that makes this reliable is volume. One frustrated player is an anecdote. When you have twenty players and eleven of them stumble in the same window, in roughly the same place, that is a signal your design team can act on. None of them told you the solution, and that is fine. They all told you where the problem surfaces, which is the part only players can give you.
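The "eleven of twenty in the same window" check can be sketched in a few lines. This is a minimal illustration, assuming stumbles have been noted by hand as (player, seconds into session) pairs while watching footage. The data format, the 60-second window, and the 50 percent threshold are all assumptions for the example.

```python
from collections import defaultdict


def shared_stumbles(stumbles, total_players, window_seconds=60, min_share=0.5):
    """Return (window_start_seconds, player_count) for windows where
    at least min_share of all players stumbled."""
    players_by_window = defaultdict(set)
    for player_id, seconds in stumbles:
        window_start = int(seconds // window_seconds) * window_seconds
        # A set means one player stumbling three times still counts once.
        players_by_window[window_start].add(player_id)

    return sorted(
        (start, len(players))
        for start, players in players_by_window.items()
        if len(players) / total_players >= min_share
    )


# Eleven different players stumble between 5:00 and 6:00;
# one player also stumbles twice around the 15-minute mark.
stumbles = [(f"p{i}", 300 + i * 5) for i in range(11)]
stumbles += [("p3", 900), ("p3", 910)]
print(shared_stumbles(stumbles, total_players=20))
```

Counting distinct players rather than raw events is the point: one player failing repeatedly stays an anecdote, while many players failing once in the same place becomes the signal. Fixed windows are a simplification, since a cluster that straddles a window boundary gets split in two, so treat near misses as worth a second look.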

Distil the wall of data into a handful of findings

A real test produces an intimidating amount of material. Paul's arithmetic: 40 players, a survey per session across six sessions, and you are suddenly holding something like 120 surveys plus the footage. To a small team, that does not read as insight. It reads as work. Tasks. Difficulty. A wall of red lights, heavy enough that a solo developer starts disagreeing with the feedback simply because it is too much to hold.

So the producer's job is to compress it without losing context. Take the hundred-plus surveys down to a handful of actionable areas: the things players genuinely enjoyed, and the clear opportunities. Then distil further. Paul often runs two tests around a milestone, one before and one a few weeks after, and overlaps them to see what changed. He will also deliberately test two different groups: a friendly community that is soft on the game, and anonymous playtesters who have no loyalty and treat it purely as a system to master. The truth usually lives in the marriage of the two.
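Overlapping two tests can be as simple as comparing average scores per area. The sketch below assumes each test has been reduced to a mapping from an area of the game to the scores players gave it. The area names and numbers are invented for illustration, and the same comparison works for a community group against an anonymous group.

```python
def score_change_by_area(before, after):
    """Average score change per area, for areas present in both tests.

    before / after: dict mapping an area name to a list of player scores
    (a hypothetical format, not an export from any particular tool).
    """
    changes = {}
    for area in sorted(before.keys() & after.keys()):
        before_avg = sum(before[area]) / len(before[area])
        after_avg = sum(after[area]) / len(after[area])
        changes[area] = round(after_avg - before_avg, 2)
    return changes


pre_milestone = {"tutorial": [2, 3, 2, 3], "combat": [4, 4, 5, 3]}
post_milestone = {"tutorial": [4, 4, 3, 5], "combat": [4, 5, 4, 3]}
print(score_change_by_area(pre_milestone, post_milestone))
```

A movement in an average only says where to look. The footage for that area is still what tells you why it moved.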

What you hand the developer is the distillation, walked through, not the raw dump. You still give them full access to every survey and recording, but you go first as a barrier between them and the cold data. As Paul says, anybody can throw test results at someone. A producer has a responsibility for the impact of the information they hand over, and at what stage they hand it over.

Separate challenge from frustration

Not every negative reaction is a problem to fix, and confusing the two will send a team chasing the wrong things. Paul draws the line precisely.

Challenge is fine. Players can be challenged, and they can even be angry at an outcome, because anger usually means "I lost the exchange." That is the game working. Frustration is the dangerous one:

I'm trying to do something, it is not working, I don't know why it's not working, I'm lacking information, there's things missing here.

Paul Wilson, Raw Fury, Producer

The moment a player crosses from challenge into frustration, you are losing them. And a survey score will happily paper over that crossing.

When you find it, make it undeniable. Paul's team direct-links straight to the exact moment in a recording, "so that there's no doubt about what the player experienced," even when the player in question was a generous 4-out-of-5 reviewer who would never have flagged it themselves. You are not arguing with the developer's taste. You are showing them a timeline where a supportive player hit a wall, so the conversation is about the footage, not opinions.

A faster way to scan many sessions

No producer has 120 hours a week to watch every recording end to end, and neither does the developer. This is where a sentiment timeline earns its place, and where third-party tools can add real value.

Beneath each Go Testify recording sits a tracker of red and green moments across the session: a heat map of where players felt positive and where things turned. Paul uses it to work fast. He can look across six sessions, spot that they all go red in the same spot, jump to that point in the video, and confirm the same thing is happening over and over. The automated summaries are a useful guide too, but he still likes to check his own instincts and earlier assumptions against the footage. It is the bridge between "we have a wall of footage" and "here is the exact thing to fix," and it scales the over-the-shoulder watch from one player to a whole panel.
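If you are working without such a tool, the same scan can be approximated from hand-logged notes. The sketch below assumes each session's red stretches have been written down as (start, end) pairs in seconds. That format, and the 30-second bucket, are assumptions for the example and do not describe how Go Testify stores or exports its timeline.

```python
def common_red_spots(red_spans_by_session, bucket_seconds=30, min_sessions=None):
    """Return start times (in seconds) of buckets that are red in at least
    min_sessions sessions. By default every session must be red there."""
    if min_sessions is None:
        min_sessions = len(red_spans_by_session)

    counts = {}
    for spans in red_spans_by_session.values():
        buckets = set()
        for start, end in spans:
            first = int(start // bucket_seconds)
            last = int(end // bucket_seconds)
            buckets.update(range(first, last + 1))
        # Each session counts at most once per bucket.
        for bucket in buckets:
            counts[bucket] = counts.get(bucket, 0) + 1

    return [b * bucket_seconds for b in sorted(counts) if counts[b] >= min_sessions]


red_spans = {
    "s1": [(245, 280)],
    "s2": [(250, 262), (600, 640)],
    "s3": [(230, 275)],
    "s4": [(241, 300)],
    "s5": [(255, 268)],
    "s6": [(200, 260)],
}
print(common_red_spots(red_spans))                  # red in all six sessions
print(common_red_spots(red_spans, min_sessions=3))  # red in at least three
```

The returned times are jump points. The step Paul describes still follows: open each recording at that moment and confirm it really is the same problem every time.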

Do and don't

Do	Don't
Build the developer relationship before you need it.	Lead with data before there is any trust.
Agree an expected score with the team before the test.	Drop a game playtest score on a developer cold.
Watch the footage for what the survey can't show.	Treat the survey feedback as the whole result.
Trace a player's feeling back to a cause in the build.	Take "it was great" or "it was bad" at face value.
Confirm a problem across many players before acting.	Rebuild around a single frustrated session.
Treat challenge as healthy and frustration as the alarm.	Fix everything a player struggled with.

A pre-test checklist for producers

Before your next playtest goes out, run through this:

You have a real, trusting line of communication with the developer.
You and the team have agreed what score and outcome you expect at this stage.
The test aim is specific enough that the footage will answer a real question.
You have a plan to watch sessions, not just collect surveys.
You know which two player groups you are comparing (for example community vs anonymous).
You have a way to scan many sessions quickly and jump to the moments that matter.
You have decided how you will walk the developer through results, not just send them.

The producer's real job

Strip away the systems and the tooling and Paul keeps returning to one idea: stay fixed to the truth of the project. What is the game you are making, why are you making it, and what are players actually doing when they play it. Surveys and footage are two readings of that same truth, and the producer is the person who holds them against each other and decides what is real.

That is the part that does not change with technology or process. The trust comes first, so the developer tells you what is really happening. The expectations come next, so a low early score doesn't frighten anyone. After that it is mostly watching, closely enough to tell a player who is challenged from one who is quietly walking away. Get those right and a wall of feedback turns into a short list of decisions your team can actually make.

Watch the full fireside chat with Paul on YouTube.