One Score to Rule Them All
Posted on Friday, December 9 2011 @ 11:53:32 PST
This member blog post was promoted to the GameRevolution homepage.
Sit down and grab some popcorn: this is a long one.
Metacritic has Game Revolution’s average score at 64, a full four points below the next closest publication. Of course, when you’re converting a ‘B’ grade to a 75/100 and a ‘C’ to a 50/100, it isn’t hard to see why. In fact, every grade below an ‘A-’ is converted lower than its American academic equivalent; it is no wonder that publishers don’t send these guys review copies. Were Game Revolution to begin covering as much shovelware as other publications (yet another topic for a later day), I would expect that average to plummet well into the mid-fifties.
So, any publication grading on a letter scale is out. But that still leaves 112 review websites and magazines just waiting to be analyzed. Since they all grade on a number scale, there should be no issue with comparing their scores against one another once everything is converted to the same numerical base. Right?
You are not going to believe this.
Among publications that score numerically, there is a wide variety of grading scales. Pictured below are the ones I encountered while poring through the remaining 112 publications:
That’s a lot of different numbers. But what if the numbers themselves didn’t actually matter? We know that Metacritic converts all of its scores to an ‘out of 100’ grade—doing so is the primary mechanism that allows the website to justify its averaging of disparate review grades into a single Metascore. So let’s apply that same conversion directly to the grading scales:
Now that’s a clearer picture. As long as two grading scales have the same number of grade intervals, or what we’ll call ‘degrees of differentiation’ here, the Metacritic conversion treats them as identical. This is because the intervals between consecutive numbers in a given scale are all equal. In fact, we can boil down any numerical grading scale that meets this criterion to just its degrees of differentiation. For instance, out of five in half points and out of ten in whole points both have 10 degrees of differentiation, while out of ten with one decimal place and out of one hundred in whole points both have 100 degrees. Metacritic’s methodology implies that there is no difference between these scales. Ten degree scales are the same as twenty degree scales are the same as one hundred degree scales: they’re all out of 100, so just average them!
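The conversion and the idea of degrees of differentiation can be sketched in a few lines of Python. This is a minimal illustration, assuming the normalization is a simple linear rescale to a 0-100 base and that a scale’s degrees of differentiation are its maximum score divided by its smallest step. The function names and the example scale table are hypothetical, and exact fractions are used so that a 0.1 step doesn’t pick up floating-point error.

```python
from fractions import Fraction

# Hypothetical example scales: (maximum score, smallest step), written as
# strings so Fraction parses decimal steps like "0.1" exactly.
SCALES = {
    "out of 5, half points": ("5", "0.5"),
    "out of 10, whole points": ("10", "1"),
    "out of 10, one decimal place": ("10", "0.1"),
    "out of 100, whole points": ("100", "1"),
}


def to_hundred(score, max_score):
    """Linearly rescale a score to an 'out of 100' base (assumed conversion)."""
    return Fraction(score) * 100 / Fraction(max_score)


def degrees_of_differentiation(max_score, step):
    """Count the equal-sized intervals between zero and a perfect score."""
    return int(Fraction(max_score) / Fraction(step))


def hundred_point_grid(max_score, step):
    """Every grade a scale can express, converted to the 0-100 base."""
    n = degrees_of_differentiation(max_score, step)
    return [to_hundred(Fraction(step) * i, max_score) for i in range(n + 1)]


if __name__ == "__main__":
    for name, (max_score, step) in SCALES.items():
        print(f"{name}: {degrees_of_differentiation(max_score, step)} degrees")
```

Under these assumptions the out-of-five half-point grid and the out-of-ten whole-point grid come out identical (0, 10, 20, ..., 100), as do the out-of-ten decimal grid and the out-of-one-hundred grid, which is exactly why a straight conversion cannot tell those scales apart.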
But what if this wasn’t the case? What if a publication’s average review grade was affected simply by the number of discrete intervals it placed below a perfect score? Imagine that I have five games of varying quality, and I want to express to my readership that each game is worse than the one above it. In order to find five unique points of grade differentiation, I would have to travel farther down a ten degree scale (100, 90, 80, 70, 60) than down a one hundred degree scale (100, 99, 98, 97, 96). Even if I held the exact same relative opinion about how much I enjoyed the games, my nominal grades would change solely based on the scale I was using.
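The five-game hypothetical is simple enough to compute directly. The sketch below assumes each game is scored exactly one interval below the game above it, starting from a perfect 100; `strictly_ranked_grades` is a name made up for this illustration.

```python
from fractions import Fraction
from statistics import mean


def strictly_ranked_grades(n_games, degrees):
    """Grades (out of 100) for n_games, each exactly one interval below the last.

    Hypothetical helper: starts at a perfect 100 and steps down by
    100 / degrees, the smallest move a scale with that many degrees of
    differentiation allows.
    """
    if n_games > degrees + 1:
        raise ValueError("not enough distinct grades on this scale")
    step = Fraction(100, degrees)
    return [100 - step * i for i in range(n_games)]


if __name__ == "__main__":
    for degrees in (10, 100):
        grades = strictly_ranked_grades(5, degrees)
        print(degrees, [int(g) for g in grades], "average:", mean(grades))
```

With the same strict ordering of five games, the ten degree scale averages 80 while the one hundred degree scale averages 98, purely because of how far each step moves the grade.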
An extreme hypothetical case, to be sure, but the question remains: is it possible that some publications score games lower than others solely because of their grading scale, even if everyone is using the same equivalent numerical range?
Not only is it possible; it is the god-damned truth.
Ignoring the lone 5 degree entry, all of the above differences are statistically significant at the 90% confidence level. The difference in score between 20 degrees and 100 degrees is significant at the 95% confidence level. The difference in score between 10 degrees and 100 degrees is significant at the 99.9% confidence level.
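The post reports confidence levels but not which test, sample sizes, or per-scale data produced them. As one plausible way to check a difference like this, here is a hedged sketch of a two-sample comparison using Welch’s t statistic with a large-sample normal approximation. The choice of test and the function names are assumptions for illustration, not the author’s method, and no data from the post is reproduced.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_test_normal_approx(a, b):
    """Two-sided test for a difference in mean score between two groups.

    Computes Welch's t statistic (no equal-variance assumption) and, as a
    simplification, reads the p-value off a standard normal distribution
    instead of Student's t. That is only reasonable when both groups hold
    many reviews; with small groups, use a proper t distribution instead.
    """
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


def significant_at(p, confidence):
    """True if the p-value clears a confidence level such as 0.90 or 0.999."""
    return p < 1 - confidence
```

Fed the converted scores of, say, every ten degree review in one list and every one hundred degree review in another, this returns a t statistic and p-value, and `significant_at` mirrors the 90%, 95% and 99.9% thresholds quoted above.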
What does it all mean? It means that an 80/100 on a one hundred degree scale and an 80/100 on a ten degree scale are DIFFERENT grades. It means that if I am a game reviewer, I will, on average, score the exact same game 4 points out of 100 lower on a ten degree scale than on a one hundred degree scale. It means that by averaging together all review scores for a game and then stacking up each individual publication against that average, Metacritic implies a fair score comparison where none actually exists.
Now, I’ll level with you: In the grand scheme of things, this is more a moral concern than a practical one. Because most major game review publications manage to review most major games, the Metascore is equally skewed for all of them. Generally, Metacritic serves its purpose—the better games get better scores and vice versa. What the above issue demonstrates is actually a portion of a much more fundamental error in the way that we as consumers regard and digest game reviews.
What I am about to tell you next comes with no reams of data or sucker-punch statistics. It is not something I can derive or state to you with any level of mathematical confidence. It is a one hundred percent unsupported opinion, and in lieu of a numerical argument, all I can do is preface it with two questions.
First question: Do these three scores represent the same quality of game?
Second question: What about these three?
A review cannot be objective because it is directly tied to a single human being’s experience at a specific point in time. A review should not be objective because entertainment is inherently a subjective experience. When I play a game, I could not care ****ing less if some evangelist has anointed it with the sweat from Christ’s armpit. I only care about how much fun the game is TO ME.
A review score can help me determine enjoyment to some extent, but it is ultimately supplemental to the actual written review. If some bro who loves hammering his dick into the ground with a cleat rates Dick Stompers 2012 a 9.5/10, I am going to be seriously misled if I purchase on the grade alone. I can’t stand having my dick pulverized by spiky feet! And if the reviewer is worth his salt, he will give me the subjective context I need in the review proper to understand where my tastes overlap with and differ from his own and how those differences will affect my enjoyment of the game in question.
I care if you are a veteran of the genre. I care if you played the prequel. I care if you love the IP. I care if it was the story or the mechanics that made you dock that game a point. Hell, I even care if you had a shitty time. When Daniel published his Warhammer 40k: Space Marine review, broken multiplayer and all, or Colin devoted a quarter of his Half-Life 2 review to telling me how much he hated Steam, I said, “**** yeah, journalism.” Don’t give me the idyllic future; give me the information I need to figure out as best I can what my experience with this game is most likely going to be.
I’ve played modestly-scored games that I’ve loved and skipped highly rated games I knew I would hate, and I owe it all to that pointless garbage dumped between the game title and the score. So for the love of God, get to know your reviewer and read the review. Metacritic might cover the broad strokes, but chances are the review of that one game made for you and you alone is not punctuated by a perfect score.