Raw Scores vs. Ranks


Written and edited by Sherman Charles

Graphic by Cleidson da Silva

Raw Scores: What are they and what do they mean?

The results from adjudicated events (e.g. competitions for showchoir, concert choir, solo voice, small ensemble, large ensemble, dance, acting, concert band, marching band, etc.) are often based on a set of criteria outlined on a scoresheet. These criteria are broken down into detailed categories, each of which is assigned a maximum point value. A judge assigns a score to each of these categories to evaluate the event's competing performances. These awarded points are known as Raw Scores.

Raw Scores quantify a judge's subjective, I repeat, SUBJECTIVE interpretation of a performance based on the categories that are used on a given scoresheet. Competitions that use Raw Scores to determine final standings simply go by the total number of points awarded by the judge or panel of judges: the competitor with the highest score wins, the second highest gets second place, and so on. This method for calculating final results is straightforward and easy to understand, which is why many competitions choose to use Raw Scores.
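To make the arithmetic concrete, here is a minimal sketch of raw-score tabulation in Python. The school names and point values are invented for illustration, not taken from any real scoresheet:

```python
# Hypothetical raw scores: each school's list holds the points
# awarded by Judges 1-3. All numbers are invented for illustration.
raw_scores = {
    "School A": [92, 88, 90],
    "School B": [85, 91, 87],
    "School C": [78, 80, 82],
}

# Total each competitor's points and sort from highest to lowest.
totals = {school: sum(points) for school, points in raw_scores.items()}
standings = sorted(totals.items(), key=lambda item: item[1], reverse=True)

for place, (school, total) in enumerate(standings, start=1):
    print(f"{place}. {school}: {total} points")
```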

As mentioned previously, judges use subjectivity to evaluate competing performances. This means each judge uses their own personal experiences, knowledge, education, expertise, convention, opinions, ethos, pride, feelings, emotions, preferences, et cetera, et cetera, et cetera, to determine whether the individual or group that is performing is meeting the criteria spelled out on the scoresheet. Because each judge comes from a unique background, one person’s standard for a category may be vastly different from the next person’s. To put this in terms of numbers, one judge’s 7 out of 10 may be equivalent to another judge’s 8 out of 10.

This can be explained by the lack of a common standard reference for each category. In other words, there is no concrete ideal for what is 100% perfect tone, execution, technique, overall effect, or most other categories. In addition, the numerical distance between what is perfect and what the judge is currently observing is inexact. Therefore, judges often evaluate performances relative to one another.

This means that judges award scores using preceding performances and the level at which the performers are competing as standard references. To oversimplify the mental process: as a judge watches and scores each performance throughout the day, they are thinking about whether the current performance was better than, the same as, or worse than the previous performances, and in which aspects it was better or worse. They then assign scores based on these observations. In other words, they are scoring each performance relative to all of the other performances. As long as each judge maintains a consistent reference, their Raw Scores will reflect their honest professional opinion of all the performances they evaluate.

So, Raw Scores offer a lot of information to participating performers, and deciding results based on Raw Scores is intuitive and easy to understand. However, this simplicity is also the method's Achilles' heel. It assumes that all judges across all competitions have reasonably similar expectations for what counts as perfect, and most of the time this is not true, even among the best judging panels out there. Additionally, scoring methods that use Raw Scores to determine final results can be manipulated by a single judge. If a judge wants to throw a competition in favor of or against a particular performance, they can inflate or deflate their scores to manipulate the total scores that determine places. So, even if 4 out of 5 judges have Performance A winning, the rogue judge can deflate Performance A's scores and inflate Performance B's scores so much that Performance B wins. This is known to experts on voting theory as Strategic Voting, and it is a serious weakness that can be exploited by a savvy judge.
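To see how little it takes, here is a hypothetical Python illustration (all numbers invented): four of five judges score Performance A higher, yet one rogue judge deflates A and inflates B enough to flip the raw-score totals.

```python
# Judges 1-4 honestly score Performance A a few points above B.
# Judge 5 strategically lowballs A and maxes out B.
scores_a = [90, 91, 89, 92, 40]
scores_b = [85, 86, 84, 87, 100]

print(sum(a > b for a, b in zip(scores_a, scores_b)))  # 4 judges prefer A
print(sum(scores_a))  # 402
print(sum(scores_b))  # 442 -- B wins despite a 4-to-1 judge majority
```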

The good news is that this is easily remedied by converting Raw Scores to Ranks.

Ranks for the Win

Ranks reassign Raw Scores to evenly spaced whole numbers that represent the order in which a judge has placed each performance. In other words, the judge's highest-scoring performance is converted to a value of one (1), the second highest receives a two (2), and so on. Now, numerical distance (also known as margins or point swings) is no longer as strong a factor when calculating results (see the footnote at the bottom of this post for a super nerdy tidbit of info). This helps eliminate the problem of not having a common standard reference for each category. Each judge can have their own standards and judge on whatever scale they like, as long as they remain consistent while evaluating each performance.
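As a sketch, here is one way to convert a single judge's Raw Scores to Ranks in Python. The scores are hypothetical, and ties are left unhandled for simplicity (a real system needs a tie-breaking rule):

```python
def to_ranks(raw_scores: dict[str, float]) -> dict[str, int]:
    """Convert one judge's raw scores to ranks: highest score -> 1."""
    ordered = sorted(raw_scores, key=raw_scores.get, reverse=True)
    return {school: rank for rank, school in enumerate(ordered, start=1)}

judge_scores = {"School A": 88, "School B": 95, "School C": 91}  # hypothetical
print(to_ranks(judge_scores))  # {'School B': 1, 'School C': 2, 'School A': 3}
```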

The following example is from the 2018 Johnston Showzam Director’s Handbook with a few modifications for simplicity:

Raw Scores

Participants are listed in the first column and the judges are listed in the first row. The last two columns correspond to the Sum of all points awarded to each participant and the final Place that they received. This figure shows that if the results are calculated by Raw Scores, School B would win, even though more judges have School A winning by Ranks.

Convert to Ranks

Participants are listed in the first column and the judges are listed in the first row. This figure shows the Rank conversion of the Raw Scores in the above figure. The majority of judges favor School A, therefore, School A should win.

If we calculate this competition's results based on Raw Scores, School B would win even though the majority of judges scored School A higher. This is because Judge 4 gave School A a very low total score. If we convert to Ranks, it becomes clear who the winner should be: more judges have School A ranked higher than any other school. Even though this seems intuitive, it is actually quite difficult to calculate in an objective manner.
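Here is a hypothetical recreation of that situation in Python. These are not the actual numbers from the figures above, just invented scores that reproduce the same pattern: Judge 4 lowballs School A, School B wins on Raw Score totals, but School A wins once we convert to Ranks.

```python
# Invented scores from four judges; Judge 4 (last column) lowballs School A.
raw = {
    "School A": [95, 94, 96, 60],
    "School B": [90, 89, 91, 95],
}

# Raw Score totals: School B wins.
print({school: sum(points) for school, points in raw.items()})
# {'School A': 345, 'School B': 365}

# Rank each judge's column (here, just count who each judge scored highest).
num_judges = len(next(iter(raw.values())))
first_places = {school: 0 for school in raw}
for j in range(num_judges):
    column = {school: points[j] for school, points in raw.items()}
    first_places[max(column, key=column.get)] += 1

print(first_places)  # {'School A': 3, 'School B': 1} -- School A wins on Ranks
```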

A number of methods have been proposed to calculate results based on Ranks, and each of them has its own strengths and weaknesses. This is a very important topic to discuss, but since there is so much to say, I will leave it for a future post.

Which method should I choose?

Some situations require Raw Scores. For example, your competition might have a carefully proportioned scoresheet with the perfect ratio of categories and points in each caption, and judges split evenly between captions. Converting to Ranks would completely ignore your beautifully designed scoresheet and your intention of giving certain value to certain aspects of a performance. Just remember, you always run the risk of allowing a rogue judge to throw your contest. In my opinion, the safer choice is to proportion your judges rather than your scoresheet. This will also be discussed in a future post.

Ultimately, the choice is yours to make. Much of your decision on which method to use depends on how much you trust your panel of judges. However, even if you completely trust your judges, I highly recommend converting to Ranks anyway. The only real negative impact is potentially having to explain how Ranks work to a confused competitor.

And, just to put it out there, if you only have one judge for something like a solo competition, there is no sense in converting to Ranks. Just stick with Raw Scores.

Finally, just to plug our product a bit, the Carmen Scoring System is the most flexible and versatile system available. It allows you to choose whether you would like to calculate results based on Raw Scores or Ranks as well as a variety of other customizable settings. If you would like to learn more about our system, get in touch and we will show you around!

Footnote

For those super nerds out there, Ranks, as they are defined for the purposes described herein, are NOT ordinal. They are, in fact, intervals. There are several types of data out there, three of which are nominal, ordinal, and interval. Nominal data groups data points together based on a common definition, but the groups have no order (e.g. red, blue, green, purple). Ordinal data does the same thing AND has a clear order (e.g. Not Happy, A Little Happy, Happy, Very Happy), but how far apart each category is from the next is unclear. Interval data groups data into categories that have a specific order AND provides information about how far apart the categories are from one another (e.g. the distance between 10º and 11º is equal to the distance between 57º and 58º). The whole point of converting from Raw Scores to Ranks is to put the performances into categories that have a specific order and that are equally spaced between 1st and last. Who knew you would be learning about statistics and data types in the competitive arts!