How (Not) To Handle Player Feedback
A personal frustration with the games industry is how often we fail to derive useful lessons from past failures.
I don’t know the best practices for collecting and responding to player feedback, but I’ve seen certain strategies fail loudly and often enough to think twice about them.
This post is a catalogue of those strategies. In some cases these failures are attitudinal — a developer argues with feedback instead of responding to it, for example. That’s a tricky company culture issue, difficult to solve. But other failures are structural — a developer inadvertently authors a process that dictates poor outcomes.
This post is concerned with feedback from closed betas, alpha tests, demos and the like. More intimate testing, like in-person playtest sessions or focus tests, is a well-documented topic; there are plenty of good resources on running playtest sessions.
There aren’t many good resources on running effective broad feedback sessions — this post included! But what I can provide is a catalogue of potential mistakes to avoid.
Listening to the Wrong Players
Being open and responsive to feedback, collecting and categorizing it well, and doing everything else right, matters little when the feedback comes from the wrong people — a garbage in, garbage out problem. The first half of this post delves into who the “wrong people” are, and the various traps developers can fall into when engineering playtest populations.
Self-Selecting Superfans
I first wrote about Overland in 2019, in an article I reprinted in 2022.
In that article I detailed issues with the game, but left out why those issues existed, something I’ll dive into now.
Overland released in 2019 but was available as early as 2016 for “First Access” buyers. Importantly, “First Access” cost $20, as much as or more than the game was expected to sell for at full release. Players could pay a premium to play early.
Here’s what lead developer Adam Saltsman wrote in 2016:
We did two big Overland experiments and were really happy with the results from both of them. The first thing we did is start selling access to the playable alpha in limited quantities through our amazing partners at Itch.io. We even collaborated with them in developing the new Itch.io Refinery toolset. Currently Overland has over 2000 players and we could not have done it without them!
I feel a little weird spelling it out here but it is very important to us — the player reactions to the alpha, even the first round, were very positive. Overwhelmingly positive. Slightly disconcertingly positive.
This comes off as light in tone; the developers should have been more disconcerted.
Players looking to pay extra for an early version of a game are likely fans of the developer’s previous games or enamored with the concept — the people most prone to heap praise upon it.
I’d guess that “First Access” players were misleadingly positive, as first access selects high enthusiasm players.
But I don’t have to guess as I was lurking in the Finji (publisher) Discord at the time.
Many of the members there were superfans — fans of the game, fans of the developer, on a first-name basis with team members, etc. There were two bits of recurring feedback that stood out to me.
The first was that the game was too easy, with players regaling each other with tales of resounding victories, when the game was probably too difficult at first and hard to understand. According to Steam achievements only a third of players made it a third of the way through the game.
A second bit of consistent feedback, in response to the lukewarm reception, was that the game was pearls before swine — too unique and sophisticated for typical “gamers” and critics.
That’s not helpful feedback even if true. If gamers can’t appreciate your game that’s ultimately your problem. But also, it’s not true. There’s no game exactly like Overland but there are hundreds of similar games; it’s a grid-based tactics game, a well-worn genre.
The (Too) Invested
Valve’s Artifact is another game I’ve written about previously, in Why Artifact Failed.[1] As with Overland I examined the game design issues, but not the process that allowed those issues to slide.
Artifact endured a long invite-only test period. As with Overland many testers were superfans. But worse, some testers had vested interests in both Artifact and Valve.
Those testers included card game “content creators” hoping to turn the game into streaming “content”, and tournament participants, organizers and shoutcasters for other card games or other Valve games.
These testers held a financial interest in Artifact’s success. At first blush that sounds great: they’ll deliver the best possible feedback to ensure success.
But these testers also wanted to stay on Valve’s good side, especially given that Valve can be fickle when determining which players or casters to invite to events or partner with.
This is even more pernicious when testers are expected to serve a marketing function. We’ve all seen testers say, with prodding, “I can’t say much about the game, but what I can say is that it’s really special!” (See Apex Legends, for example) In other cases testers are encouraged to share more detailed impressions, with the understanding that future access rests on those impressions. It’s similar to how a cottage industry of geek-movie influencers reliably put out positive early reviews, knowing that they’ll keep getting event access as a result.
After Artifact failed Valve promised a rework, Artifact 2.0. Not only did Valve once again invite content creators to Valve HQ to test and then report positively on the game, but in some cases it invited the same content creators who swore by Artifact 1.0, the people most out of touch with the general audience. In that case the desire to market Artifact subsumed the desire to improve it. Artifact 2.0 did little to change the reception of the game and was arguably worse overall. (I would argue that!)
Convention Attendees
This one is brief: convention attendees are looking to have a good time, and most players won’t be negative to a developer standing beside them in a booth. I can’t count the number of flops that were reportedly hits at conventions.
A total lack of interest is probably more meaningful than excitement, but certain types of games, like those with deep themes or that rely on sound and atmosphere, tend to struggle on a show floor regardless. Immediately fun multiplayer titles are often convention hits. Evolve, anyone?
Huge demand for stations at a convention — as with Resident Evil 2 Remake or Batman: Arkham Asylum — is a good sign. But nebulous convention enthusiasm is worth little.
Pro Players and Experts
Expert and pro players are a valuable resource when used appropriately. If you want to know if your fighting game holds up in tournament play you need top players to test it. If you’re hunting for exploits or imbalances you need people good at spotting and abusing them.
But while it’s tempting to believe that expert players make for good game designers the opposite is often true.
Pro players are by definition outliers. They don’t experience games in the same way typical players do, and they often have different values. They tend to undervalue aesthetics, mood, coherence, story, “fun factor”, intuitiveness, etc, and overvalue skill-testing, skill expression, and fussy mechanics that lean into their personal strengths.
2XKO employs some top fighting game players and it shows in the number of fussy mechanics like “partial charge” moves that exist to inject inorganic skill-testing. It’s overstuffed with systems familiar to experts but daunting for novices.
To use a much older example, one of the key developers of Master of Orion 3 was a Master of Orion 2 superfan and expert player. I love Master of Orion 2. I love the mechanics, sure, but also the aesthetics and tone. I like how weird and silly the aliens are. I love that invading a planet shows your troops marching across the surface melting enemies with lasers.
But those aesthetic touches aren’t “game design” in the strictly mechanical sense.[2] They aren’t rules. You might appreciate them at 10 or 100 hours but not notice or skip over them at hour 1000.
Master of Orion 3 has all that “fun-factor” removed. Master of Orion 2 is a spreadsheet game with heaps of character on top; Master of Orion 3 is a spreadsheet game presented as a spreadsheet.
Ignoring fun-factor and aesthetics is how you get the infamous “characters are just functions” quote from a pro-player-turned-developer on Marvel vs Capcom: Infinite.[3] The X-Men aren’t in MVC:I so there’s no Wolverine, but characters are just functions and MVC:I has a different claw guy, so you know…same difference!
In reality a main draw of these games is the characters. The most common Marvel Tokon topic is roster speculation. “Characters are just functions” is true from a rules standpoint — but only from a rules standpoint.
“Content creators” overlap with expert players and also tend to be extreme outliers. Someone who plays the same video game for 10 hours a day as a job has a fundamentally different perspective than normal players. Trying to please content creators may make some sense, but designing to their whims is probably a mistake.
Bungie tried hard to please streamers with Marathon, but the streamers they catered to couldn’t make the game blow up, and in some cases won’t even stream it because their audience demands Arc Raiders instead. In that case humoring content creators may have been a mistake from a design standpoint and was largely irrelevant from a marketing standpoint.
Being good at video games and being good at designing them are entirely separate skills. This may seem counterintuitive, but a champion racecar driver doesn’t engineer the car.
Reduced Populations
One last category of wrong people to listen to is the result of a certain methodology: long-term playtests where less-interested testers drop out while the most enthusiastic remain. (“Reduced” in the way you reduce a sauce.)
I’ll again use Artifact and its long playtest as an example.
Talking to some testers and reading reports from others revealed a certain dynamic: testers complained about problems, developers made only minor changes, and the most frustrated testers dropped out. From the developer perspective the number and severity of complaints decreased. Success!
A long continuous playtest that doesn’t forcibly rotate testers will almost inevitably end up overly reduced.
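To make the "reduction" effect concrete, here is a toy simulation. It is only a sketch under invented assumptions: every tester has a fixed satisfaction score (the game never changes), and the less satisfied a tester is, the more likely they are to quit each round. The function name, the quit formula, and all the numbers are hypothetical and not taken from any real playtest.

```python
import random
from statistics import mean


def simulate_attrition(num_testers=1000, rounds=6, seed=0):
    """Return (testers_remaining, average_satisfaction) for each round.

    Assumption: satisfaction is a fixed score between 0 and 1 per tester,
    because in this toy model the game itself is never improved.
    """
    rng = random.Random(seed)
    testers = [rng.random() for _ in range(num_testers)]
    history = []
    for _ in range(rounds):
        history.append((len(testers), mean(testers)))
        # Hypothetical quit rule: an unhappy tester (score near 0) has up to
        # a 50% chance of leaving each round; a happy one almost never leaves.
        testers = [s for s in testers if rng.random() > 0.5 * (1 - s)]
    return history


if __name__ == "__main__":
    for round_number, (count, average) in enumerate(simulate_attrition()):
        print(f"round {round_number}: {count} testers, "
              f"average satisfaction {average:.2f}")
```

The average satisfaction climbs round after round even though nothing about the game has changed. Only the makeup of the pool has.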
Another danger inherent in long test periods is one I touched on in On The Value of Fresh Eyes: that over time players grow blind to baseline flaws. In particular it’s easy to forget a rough onboarding process when you’re dozens of hours past it.
It’s good to know how your game plays at 20, 50 or 100 hours, especially in certain genres. But in the age of refunds, huge numbers of releases, and players and influencers leaving reviews after 20 minutes of play, solid first impressions are more important than ever.
When one decent N64 game released every three months those games functioned like feature films with captive audiences; they could get away with slow openings. Today games function more like made-for-TV films that have to hook viewers before the first commercial. It’s a huge detriment if your pool of testers can’t remember — and thus can’t help improve — those first hours.
Listening to the Wrong Players is a Process Problem
“Don’t listen to the wrong players” is trite advice.
More meaningfully, who the wrong players are isn’t always intuitive. And it’s easy to design a process that inadvertently selects for the players with the least useful feedback. A broad cross-section of the potential audience is ideal, but often the process is designed to capture a narrow one.
That’s it for listening to the wrong players. Onto everything else.
Setting Expectations
It’s fairly obvious — just common sense really — that setting appropriate expectations for a work-in-progress game is smart. But I’ll use Marathon as an example of how it can go wrong.
Here’s the expectation-setting for the Marathon alpha:
At first glance this is great. You don’t want graphics feedback if you’re changing the graphics, or to hear about lack of maps when you’re holding back maps.
The problem is that this gives players — and the development team — the impression that the team is sitting on (or planning to make) a better version of the game. It dissuades players from sharing feedback in certain areas, as that feedback may be outdated or be addressing known issues.
The UI in the alpha is partial and unpolished. A player might withhold feedback on the UI, assuming their issues are known, and it’s easy for the developer to dismiss feedback by telling themselves that improvements are already underway. If a player finds the UI flow confusing they may not report it, assuming that’s a known issue, when the team’s plans for improving the UI don’t include flow changes.
The UI in the shipped version of Marathon isn’t particularly polished and has a number of immediately obvious issues — mostly the same problems as in the test builds.
Telling players that the alpha lacks “narrative and storytelling systems” implies that the shipped game includes them, but it also lacks narrative and storytelling systems.
This reads as “while we’ll accept feedback on anything, storytelling feedback is pointless.” Which is, I think, the development team misleading (probably unintentionally) both the players and themselves.
This expectation-setting dissuades feedback. And it tacitly overpromises; it makes ranked play and the Marathon ship sound like major content when they’re fairly minor. I’ve played dozens of hours and I don’t even know what “pinnacle” content refers to.
Setting baseline expectations makes sense. “This alpha only includes half the playable maps and characters. Everything is a work in progress.” That level of disclaimer is fine.
You can give players in-game hints as to which parts are more ripe for criticism. Slay the Spire 2 uses MS Paint placeholder art in places — most gamers will pick up on that and not focus their energy on critiquing those screens. A fighting game test that only offers 4 of 16 characters could render unavailable character squares as greyed out or question marks. You can make unfinished content look clearly unfinished and trust players to pick up on that.
But, most of all, I think you have to be willing to suck it up.
It’s annoying if players keep harping on how trees in the game look bad, when, in the build on your desktop, those trees look better.
But the alternative is far worse: players think the trees look bad but underreport it, then the game ships with those same trees. Rather than encouraging players to filter their feedback you should perform the filtering.
I’ll close out this section by quoting this Reddit post on the Marathon alpha, People don't understand what the Alpha was even for. (I’ve bolded some parts for emphasis)
They mentioned 2 main reasons they had the alpha. Technical capabilities and moment-to-moment gameplay. People were running max settings at over 100 frames. Think of how this compares to their main competitor in this genre: Tarkov. Every content creator that has left their 2 cents worth has mentioned ‘yeah, the moment to moment is great, but...’ and then they go off on some tangent. They were testing the moment to moment here people. So this is GOOD feedback. People claim that they wanted more, that the game didnt have an IT factor. IT WAS A CURATED, SMALL SLICE of the game. This game is looking to hold onto the secrets of Tau Ceti IV for as long as it can. One of the biggest criticisms I have seen everywhere was about how the exteriors looked, to which I would point you to the FIRST thing in the list of things that aren’t included: FINAL VISUALS AND GRAPHICS.
The biggest issues currently imo are prox chat and some kind of solo q/ queueing system that prevents solos from getting steamrolled. Other than that the game looks like a ton of fun the way it currently stands. I still haven’t got a code and probably wont but people need to understand that all this negativity coming from press/youtubers about the game needing something more or something special must not have understood that the something special is being kept from us until later. They weren’t testing the special stuff for this playtest. They told us that going in.
To paraphrase: you shouldn’t critique the alpha, except through narrow avenues, because Bungie is sitting on a much better and richer version of the game.
They were not.
The exteriors were improved substantially, alongside other notable improvements. But crucially the “something special” that was missing in alpha is missing in the shipped game as well. I don’t blame this poster for believing the “something special” was real, as Bungie implied as much, and probably believed it themselves.
Rather than act as though the test version lags far behind the “real” version, which dissuades feedback and allows developers to write that feedback off, I think the healthier approach is to act as though the test version is the real version. “It’s fixed on my machine” is a poor engineering attitude and a poor playtesting attitude as well.
Holding Back Too Much
Developers are reasonably wary of giving out too much content too early during playtests, of giving away the milk for free. I don’t like the framing Bungie used for the Marathon alpha (that the excluded content fundamentally changed the game), but the amount of excluded content was fine.
Holding back content is a problem when that content does fundamentally change the game.
Valve’s Artifact flopped largely due to an onerous monetization scheme. To play Artifact casually players had to buy cards and card packs; there was no way to earn them through play. The competitive draft mode required paid event tickets for each set of matches. Artifact wasn’t free-to-play either. The monetization model was like buying Street Fighter 6 for full price, paying $15 to unlock Ryu, then paying $1 for each online match.
The prolonged Artifact test period included no monetization.
When testers joyfully pronounced that Artifact was amazing, one of the best card games of all time, one that they loved playing, implicit but unstated were two pivotal words: “for free.”
Testing a game has limited value when developers hold back fundamentally warping content or systems. At best they’re testing a context-free slice of the game.
Being Too Conservative with Changes
Star Trek: Voyager – Across the Unknown discourages save scumming, at the cost of a save system that’s obtuse and unpredictable. This was one of the biggest complaints with the demo, the developers pledged to change it, then the game shipped with basically the same system. (To their credit they quickly patched it again after release)
Marathon’s UI is hard to quickly parse and obscures important information. This was a known issue for months, one the game still shipped with and one the team is still trying to address post-release.
Riot Games pledged to address 2XKO issues like awkward controls and overly long combos, but 2XKO shipped with only minor changes that didn’t alleviate those issues.
In these cases, developers got the feedback, acknowledged it, and made moves to address it, but those moves were simply too small.
This Sid Meier quote applies:
One of my big rules has always been, "double it, or cut it in half." Don't waste your time adjusting something by 5 percent, then another 5 percent, then another... just double it, and see if it even had the effect you thought it was going to have at all. If it went too far, now you know you're on the right track, and can drop back down accordingly. But maybe it still didn't go far enough, and you've just saved yourself a dozen iterations inching upward 5 percent at a time.
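The arithmetic behind that rule can be sketched in a few lines. This is a toy model, not anything from Meier: it assumes there is some unknown "right" value for a tuning number, that each playtest tells you only whether the current value is too low or too high, and that landing within 5 percent is good enough. Both function names and all the numbers are hypothetical.

```python
def incremental_steps(start, target, step=0.05):
    """Count iterations when nudging a value upward by `step` each time."""
    value, steps = start, 0
    while value < target:
        value *= 1 + step
        steps += 1
    return steps


def double_then_narrow_steps(start, target, tolerance=0.05):
    """Count iterations when doubling until overshoot, then splitting the gap.

    Only handles tuning upward (0 < start < target) to keep the sketch short.
    """
    if not 0 < start < target:
        raise ValueError("this sketch assumes 0 < start < target")
    low, value, steps = start, start, 0
    # Phase 1: double until the value has gone too far.
    while value < target:
        low, value = value, value * 2
        steps += 1
    high = value
    # Phase 2: drop back by halving the gap between "too low" and "too high".
    while abs(value - target) > tolerance * target:
        value = (low + high) / 2
        steps += 1
        if value < target:
            low = value
        else:
            high = value
    return steps


if __name__ == "__main__":
    # Hypothetical case: a stat starts at 10 and really "wants" to be 35.
    print("5% nudges:", incremental_steps(10, 35))
    print("double, then narrow:", double_then_narrow_steps(10, 35))
```

In this made-up case the 5 percent approach needs 26 iterations and the doubling approach needs 4. The gap widens the further the starting value is from where it needs to be.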
Overreacting to in-person playtests is a real problem, because watching live players get stuck or frustrated for even a couple minutes feels catastrophic. But I’m hard-pressed for examples of games that responded too drastically to non-live playtest feedback.
You can fix issues post-release; fixing up the save system in Star Trek: Voyager is better late than never. But a rough launch can be unrecoverable and delaying until after launch can make certain fixes much tougher.
Riot Games acknowledged, pre-release, that the length of 2XKO combos was turning off some players. After disappointing launch metrics and layoffs the team said they’d make changes to combo length, only to backtrack a few weeks later.
I suspect the team realized the game had already lost the players turned off by long combos, and that changing them would risk losing the remaining players. Riot is left in an awkward state: on-the-record that combo length is a problem they aren’t planning to address.
As I wrote about at length, the control scheme is another big issue. But reworking it is a huge task, isn’t new content (which the game desperately needs), and might fail to recapture disenchanted players.
These issues need changing — 18 months ago. Now these changes aren’t better late than never, but simply too late.
Some developers may fear that being too reactive to feedback could result in design by committee, abandonment of core pillars or lack of vision. I sympathize. If players are better at designing your game than you are then they should be the professional game designers.
I think the key is knowing which aspects are non-negotiable core pillars and which ones are utilitarian supporting features.
A core pillar of 2XKO is “accessible” gameplay. Refusing to budge on accessible gameplay was reasonable. But the team did budge — 2XKO includes plenty of features that make it less accessible. Where the team refused to budge was on a “modern” control scheme more complicated than those in rival games. To me this is backwards: accessibility is the non-negotiable core pillar, while the specifics of the control scheme are merely one possible way to support that pillar.
Deciding on core pillars is easier said than done. (“Easier said than done” applies to this post as a whole, I know) And plenty of games fail because, while well-made, they’re built on the wrong core pillars. As I like to phrase it: “they simply made the wrong game.” To some degree this applies to Marathon: Bungie chose to make a “hardcore” PvP game that minimized PvE content.
But 2XKO strikes me as a game that changed both too much and not enough during testing, with certain problematic systems lingering while others swung wildly. At the highest level it swapped from a 1-on-1 fighter to a tag fighter without much explanation. Maybe “social play” is a key pillar, and a tag fighter supports that via duos play, where each player controls one character on the team. But duos mode, while fun, comes off as a secondary feature, with UI quirks and locked-off content.
I’m not sure they knew which features were core and which tertiary, and thus struggled with how much change to allow.
Viewing Unsatisfied Players as Debate Opponents to be Defeated
I wrote about this at length in Why Artifact Failed so I’ll quote myself here.
I used to read the Star Wars Galaxies message boards, not because I was interested in the game but because it was fun to watch the developers interact with fans. They seemed intent not on understanding player complaints but on proving them incorrect. A player would complain that a weapon was weak, and the response from the devs was “according to our spreadsheet this weapon has above-average DPS - you’re wrong.”
But the more productive approach is to try to determine the root cause of the complaint. Perhaps the weapon really is weak due to a bug. Maybe the spreadsheet has the wrong formula. Maybe the weapon’s effective DPS is very different from theoretical - it has a low rate of fire and high overkill, or works well only at specific ranges. Maybe it has very little knockback or hitstun, so while it has high DPS that may not be the best metric, and a metric like “how much damage it does before the enemy is in melee range” is more relevant. Maybe players don’t understand how to use it correctly, maybe it’s weak against a common enemy type or for a certain group composition. Maybe it just has unsatisfying sound effects. All of those are possible explanations that the typical player is not sophisticated or invested enough to articulate.
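The overkill case is easy to show with made-up numbers. The sketch below assumes a very simple model: every shot hits, each shot takes 1 / rate seconds, and damage past an enemy's remaining health is wasted. The weapons, the enemy health value, and the function names are all hypothetical and are not from Star Wars Galaxies.

```python
import math


def paper_dps(damage_per_shot, shots_per_second):
    """The spreadsheet number: damage multiplied by rate of fire."""
    return damage_per_shot * shots_per_second


def effective_dps(damage_per_shot, shots_per_second, enemy_hp):
    """Enemy health divided by the time actually needed to kill it.

    Shots cannot be fractional, so any damage beyond the enemy's
    remaining health on the final shot is wasted (overkill).
    """
    shots_needed = math.ceil(enemy_hp / damage_per_shot)
    return enemy_hp * shots_per_second / shots_needed


if __name__ == "__main__":
    enemy_hp = 110  # hypothetical common enemy
    weapons = {"slow rifle": (100, 1), "fast pistol": (30, 3)}
    for name, (damage, rate) in weapons.items():
        print(f"{name}: paper {paper_dps(damage, rate)}, "
              f"effective {effective_dps(damage, rate, enemy_hp)}")
```

On paper the slow rifle wins, 100 DPS to 90. Against a 110 HP enemy it needs two full shots and wastes 90 damage on the second, so its effective DPS drops to 55 while the pistol manages 82.5. The spreadsheet and the complaining player can both be right.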
As a developer you have to be charitable in your interpretations and willing to play detective. As I wrote in that piece:
While data can prove that a game is balanced (by some metric) or not overly RNG-reliant, no amount of gesturing to a spreadsheet can prove that a game is fun.
Too many developers respond to “I’m not having fun” with, effectively, “that’s wrong, you are!”
Testing Done Right - Marvel Tokon: Fighting Souls
I don’t know what the testing best practices are, but I have a good idea of what to avoid. So I’ll end with an example of testing done right, not wrong.
Marvel Tokon: Fighting Souls has run two closed beta tests.
Signup required a single website click for anyone with a PSN account — no account creation on a third-party site, no codes, no mandatory Discord participation, no mandatory spend. They sidestepped any initial filtering issues.
Both tests were a single weekend, which is enough time to play a lot, but not enough time to lose a fresh perspective.
The expectation-setting was minimal: the team shared which characters and stages would be available, and before the second test highlighted some changes from the first one. There was no whiff of “we’re sitting on a much-improved version” or “don’t bother with feedback in these areas.” Just “here’s the game in its current state, knock yourselves out.”
The most positive aspect of the tests is how receptive the team has been while maintaining core principles.
Tokon is (roughly) a Marvel vs Capcom style tag fighting game, with teams of four characters. Four characters is a lot, so the key wrinkle in Tokon is that all characters share one health bar, meaning you can pilot one character rather than having to learn four to even get started.
This design has received some pushback, especially from fighting game “influencers” who can quickly learn multiple characters.
So far Arc System Works hasn’t budged.
This is, I think, the right kind of stubborn. They’re making a bet based on the assumption that a Marvel game will attract Marvel fans and not just fighting game sickos — that the average player wants to pick their favorite character and jump in. That having to learn four characters just to get going is too big an ask.
This may prove to be the wrong bet, but it’s an understandable one.
On less central features they’re willing to negotiate, in ways that only improve the experience.
Players complained that wall breaks were too frequent. Beta 2 introduced a new stage with unbreakable walls, and after Beta 2 wall breaks were made less frequent across the board.
Unique assist attacks are a big draw in tag fighters. (Your Doom rocks or Captain Corridor) In the beta versions of Tokon calling an assist in the air triggers a dubiously useful generic air assist, and calling an assist during a combo triggers a generic “strike assist” rather than a unique one. This was my biggest complaint: rather than leaning into the unique aspects of each character the game homogenized them.
After Beta 2 the team announced that both of these had been changed: aerial assists work very differently, and assists done during combos can be character unique assists rather than generic.
There have been other smaller changes as well: tagging is faster and more dynamic, ambiguous high/low mixups during tagging were removed, etc.
The reception has been almost universally positive. They were responsive to feedback and changed many features but didn’t sacrifice the core experience or key pillars. The new version isn’t a different game; it’s a better version of the same game.
Which, assuming you aren’t just making the wrong game, is the ideal outcome of long-term testing: forging the best version of what you were already making.
[1] For what it’s worth (absolutely nothing!) ChatGPT claims this is the single best writeup about the failure of Artifact.
[2] As a side note, someday I might write “game design isn’t rules design” about this distinction. This is also why I believe European-style board game design is vastly overrated as a video game design philosophy.
[3] In fairness, the speaker was in a tough spot. Disney was moving away from the Marvel characters they didn’t hold the movie rights to. The fan-favorite X-Men characters don’t appear in MVC:I for corporate strategy reasons, not game design ones.





