Menzen Tsumo: I give this rating system three and a half thumbs liked

Rating systems come up constantly in daily life. Hotel ratings, vehicle crash-test ratings, Yelp reviews: all kinds of systems exist for the sole purpose of concisely expressing the quality of something.

I've recently been thinking about different kinds of rating systems and their pros, cons, and trends, so here I'll detail a lot of what I've come up with. Specifically, I'll look at tier lists, five-star ratings, out-of-10 scores, and like/dislike.

Fighting game fans will know tier lists as the primary way characters in those games are rated. In a tier list, elements are placed into roughly equivalent categories based on their perceived quality. Importantly, tier lists usually assign letter grades much like the American school system does, and that is the critical part of the system.

People have an inherent idea of what kind of quality deserves what kind of grade. Characters deserve an A if they are overall very strong with no major weaknesses, exactly as a student would. S-tier characters are beyond even that, having some ridiculous strength that sets them apart. On the lower side, B characters have notable weaknesses and C characters are really just unfortunate.

The major benefit of this system is that it provides a context, common across all fighting games, that doesn't really need to be explained: everyone knows what kind of performance warrants what kind of grade. A game's whole cast can therefore be graded higher or lower than another game's, and the list still accurately expresses that one cast is stronger overall than the other. The system also allows minor but notable variations to be expressed by giving characters +/- grades consistent with their strength.

Notably, this approach doesn't scale particularly well when rating arbitrarily large numbers of things. With only five real grades to consider, you end up with something very similar to a five-star system, which of course needs no introduction: it's the kind of scoring many online retailers and movie critics use.

Depending on what the system is being used to rate, common practice differs on what deserves five stars. Product reviews often use five stars to mean "worked as expected". Movie critics reserve their five stars for the best of the best: genre- and period-defining films. Importantly, there is no real universal context on which a five-star system is based, so people don't have an inherent bias toward what the system should represent. Unfortunately, it often ends up being reduced to a binary like/dislike system by people who want to move the running average score as far as possible in the direction they feel it should go.
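To see why extreme votes are tempting, here is a minimal Python sketch of how a single new rating moves a running average. The function name `mean_after_vote` and the sample numbers are made up for illustration; this is not any site's actual scoring code.

```python
def mean_after_vote(current_mean: float, count: int, vote: int) -> float:
    """Return the average after one more vote joins `count` existing votes."""
    return (current_mean * count + vote) / (count + 1)


# Hypothetical product: 9 existing ratings averaging 4.0 stars.
current_mean, count = 4.0, 9

# A voter who thinks the product deserves 3 stars compares the options.
for vote in range(1, 6):
    new_mean = mean_after_vote(current_mean, count, vote)
    print(f"vote {vote}: average becomes {new_mean:.2f}")
```

With these numbers, an honest 3 moves the average from 4.00 to 3.90, while a 1 moves it to 3.70. Anyone who cares more about the displayed average than about their own score has a reason to vote at the extremes.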

YouTube uses a like/dislike system, and actually moved to it from a five-star system a few years ago. It's very useful for expressing aggregate opinions across many people: it really doesn't matter whether 1000 people rate a video between 1 and 5, because you can get the same information by having them choose whether they felt positive or negative about it overall. In general this system isn't very good at comparing elements, but websites like Reddit have molded it into a pretty good shape to do just that.
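As a rough illustration of that claim, the sketch below collapses a set of star ratings into likes and dislikes. The sample ratings and the rule that 4 stars or more counts as a like are assumptions for the example, not how YouTube actually converted its data.

```python
def to_like_ratio(stars: list[int], like_threshold: int = 4) -> float:
    """Fraction of ratings at or above `like_threshold`, treated as likes."""
    likes = sum(1 for s in stars if s >= like_threshold)
    return likes / len(stars)


# Two hypothetical videos, each with ten ratings, mostly at the extremes.
video_a = [5, 5, 5, 5, 5, 5, 5, 1, 1, 4]
video_b = [5, 5, 1, 1, 1, 1, 5, 1, 2, 1]

for name, stars in (("A", video_a), ("B", video_b)):
    mean = sum(stars) / len(stars)
    print(f"video {name}: mean {mean:.1f} stars, {to_like_ratio(stars):.0%} liked")
```

Both the star average and the like ratio put video A well ahead of video B, so when votes cluster at 1 and 5 the simpler scale loses very little.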

Let's look again at what the previously discussed systems are used for. Tier lists represent an easily tractable dataset across a handful of categories, given a common background context. Five-star systems intentionally remove the context, and are typically used to indicate the quality of a single element compared to a perceived average, rather than against any other set of elements. Like/dislike systems are a way of crowdsourcing that same indication.

So what do we use to represent a single person comparing many things? The out of 10 system works well here. Used on MyAnimeList.net, for example, this system rates elements on an integer scale from 1 to 10, usually not allowing fractional scores. It is extremely good at producing sorted orderings of elements. The five-star system usually assumes a perceived average, but the out of 10 system is better at producing that average from an individual user's own ratings. This gives you a good idea of the person's likes and dislikes, and also provides a lot of freedom in how you want your scale to work.

Many people, influenced again by the American school system, set 7.5 as "average", consistent with a C grade, and rate higher or lower from there. The result is a sort of bell curve centered on 7.5, which renders the lower scores relatively useless. You could also take the more general approach and set the average to 5, for whatever definition of average you'd like to use, and then distribute around it. But many people are averse to this, since 50% is a failing grade.

A friend of mine and I, curiously, take very similar approaches on opposite sides of that center. I make the assumption that the shows I watch and rate on MyAnimeList are, in general, better than the aggregate of all shows. If I watched and rated every single show, my average would be a 5. But since I, in general, watch things I consider better than average, my average score sits at 6.4. Conveniently this also lets me express the categories I like, which are the following:

10 - Otherwise a 9, but significantly influenced my perceptions or opinions in some way
9 - Extremely enjoyable, no significant weaknesses
8 - Great
7 - Pretty good, but not very notably so
6 - Decent to fairly good, OR high-quality but not appealing to me, OR mediocre-quality but highly entertaining
5 - Mediocre, OR good-quality but not appealing to me, OR bad but highly entertaining
4 - Weak, but with a potential redeeming quality
3 - Lacks any redeeming quality whatsoever
2 - Absolutely godawful
1 - Offensive to my sensibilities

Note that I dedicate more ratings to things that I enjoy less than my average. I feel that the things I enjoy fit nicely into the four ratings from 7 to 10, but I need extra divisions to accurately represent how awful some of the things I could watch are.

My friend takes the exact opposite approach, setting 4 as "decent", and anything worse than mediocre gets a 1 or 2. He reserves his 10s for an extremely small number of shows he considers to be the greatest things ever created, and likes to emphasize subtle differences among the things he does like.

That's the beauty of this system: given a single person's set of rated elements, it is easy to determine the strategy and distribution they use, and to adapt your interpretation of the list based on that. A clear weakness, however, is that separate rating sets cannot be directly compared.
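One way to make that adaptation concrete, and to soften the comparison problem, is to re-express each person's scores relative to their own average and spread. The sketch below uses standard scores (z-scores). The two rating lists are invented for illustration, and this is a swapped-in statistical technique, not something MyAnimeList provides.

```python
from statistics import mean, pstdev


def standardize(scores: dict[str, int]) -> dict[str, float]:
    """Map each score to its distance from the rater's mean, in standard deviations."""
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Every score is identical, so nothing stands out for this rater.
        return {title: 0.0 for title in scores}
    return {title: (score - mu) / sigma for title, score in scores.items()}


# Hypothetical raters: one centers on 6.5, the other on 4.
rater_a = {"Show W": 9, "Show X": 7, "Show Y": 6, "Show Z": 4}
rater_b = {"Show W": 7, "Show X": 4, "Show Y": 4, "Show Z": 1}

for name, scores in (("A", rater_a), ("B", rater_b)):
    z = standardize(scores)
    print(name, {title: round(value, 2) for title, value in z.items()})
```

In this made-up data, a 9 from rater A and a 7 from rater B both land roughly 1.4 standard deviations above that rater's own average, so the two scores say about the same thing despite looking different on the raw scale. This only works when each list is long enough for its average and spread to mean something.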

Monday, October 7, 2013
