Predictive Power of Ranking Systems

felix.feroc · Mar 10, 2022

In a recent discussion of ranking systems as used to seed tournaments, I realized that I think the best way to evaluate a ranking system is by its predictive power. I therefore decided to test a set of ranking systems to see which one was best at predicting tournament outcomes.

The ranking systems I have come across are:

Current ranked-ladder elo: used by Wandering Warriors Cup, discussion of which inspired this post
Average of max ranked-ladder elo and current ranked-ladder elo: used by Master of Socotra 2 and Return of the Clans
Tournament elo: maintained by the fine people at aoe-elo, who calculate elo based on games in tournaments rather than from the ranked ladder
Combination of tournament and ranked-ladder elo: used for World Rumble, one-half tournament elo, one quarter max ranked-ladder elo, one quarter current ranked-ladder elo
ATP: a non-elo alternative based on the system used for professional tennis, where points are awarded based on where a player places in a tournament and the prestige of the tournament. @robo developed the prestige algorithm and maintains the spreadsheet with all the data and calculations.
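
For concreteness, the two blended systems in the list above can be written out as a minimal sketch. The function and parameter names are made up for illustration, and the sample ratings are hypothetical, not taken from any real player.

```python
def avg_max_current(current: float, highest: float) -> float:
    """Average of max and current ranked-ladder elo."""
    return (current + highest) / 2


def world_rumble_blend(tournament: float, current: float, highest: float) -> float:
    """One-half tournament elo, one quarter max ladder, one quarter current ladder."""
    return 0.5 * tournament + 0.25 * highest + 0.25 * current


# Hypothetical player: current ladder 2400, max ladder 2500, tournament elo 2100.
ladder_avg = avg_max_current(2400, 2500)
blend = world_rumble_blend(2100, 2400, 2500)
print(ladder_avg, blend)
```

Note that tournament elo and ladder elo live on different scales, so the blend is only meaningful as a way to order players, not as a rating in its own right.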

Before I describe how I tested the ranking systems, I want to show the results:

We see that the tournament elo, mix of tournament and ranked-ladder elo, and ATP all performed best and are within a couple percent of each other. Max + current ranked-ladder was about one-third worse than the average of the top three, and current ranked-ladder elo was about 40% worse.

The way I tested the systems was to have each of them predict how a series of single-elimination tournaments should play out given the player rankings at the time, then compare these predictions to what actually happened.

For tournaments, I chose all the S- and A-Tier 1v1 tournaments from 2021 and 2022 that finished with a standard single-elimination bracket. I used the NCAA.com system for evaluating March Madness brackets to score each prediction. The final scores in the graph are the cumulative totals for all the tournaments.
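
A rough sketch of this kind of test is below. It is not the actual code behind the results: it assumes the bracket size is a power of two, that the higher-ranked player is always predicted to win, that unranked players are predicted to lose, and that the scoring doubles each round (1, 2, 4, ... points per correct pick), which is my understanding of the NCAA.com scheme. All names and the four-player bracket are hypothetical.

```python
from typing import Dict, List, Optional


def predict_bracket(slots: List[str], rating: Dict[str, Optional[float]]) -> List[List[str]]:
    """Predicted winners per round, assuming the higher-rated player always wins."""
    def strength(player: str) -> float:
        value = rating.get(player)
        # Assumption: a player without a rating is predicted to lose.
        return float("-inf") if value is None else value

    rounds = []
    alive = list(slots)  # first-round slots in bracket order
    while len(alive) > 1:
        # Adjacent slots play each other; ties go to the first-listed player.
        winners = [a if strength(a) >= strength(b) else b
                   for a, b in zip(alive[0::2], alive[1::2])]
        rounds.append(winners)
        alive = winners
    return rounds


def score_bracket(predicted: List[List[str]], actual: List[List[str]]) -> int:
    """Assumed scoring: a correct pick in round r (0-based) is worth 2**r points."""
    total = 0
    for rnd, (pred, act) in enumerate(zip(predicted, actual)):
        total += (2 ** rnd) * sum(p == a for p, a in zip(pred, act))
    return total


def max_score(n_players: int) -> int:
    """Points available for a perfect bracket under the same assumed scoring."""
    total, games, points = 0, n_players // 2, 1
    while games >= 1:
        total += games * points
        games //= 2
        points *= 2
    return total


# Hypothetical four-player bracket: A plays B, C plays D.
slots = ["A", "B", "C", "D"]
rating = {"A": 2300, "B": None, "C": 2100, "D": 2150}
actual = [["A", "C"], ["A"]]  # what "really" happened, round by round

predicted = predict_bracket(slots, rating)
score = score_bracket(predicted, actual)
print(predicted, score, max_score(len(slots)))
```

Dividing the score by `max_score` gives the share of available points, which makes brackets of different sizes comparable.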

I ran the same tests with many different parameters: just S-Tier, just A-Tier, only Random Map, just the last three rounds of the tournaments, and using an alternative March Madness scoring system. These other tests make the story more complicated without changing its fundamentals. All the variations had the same three at the top and very close together, and most had the same order for the top three. The average of max + current ranked-ladder elo was always fourth and current ranked-ladder elo always came in last. I therefore chose to ignore these tests for the main discussion, but I wanted to enter this caveat here for those who like caveats.

As the results are based on only twenty tournaments over the course of one year, I have no idea how robust they are. But I hope they provide some assistance to tournament organizers when deciding how to seed tournaments and to the rest of us when second-guessing them.

nimanoe · Mar 10, 2022

Thanks, this really shows how much better these methods are than just using the ladder ELO, even with a range of different tournaments that have a wide range of different players participating.

I wonder if a combination of the ATP rankings with the tournament elo would give an even better result though.

felix.feroc · Mar 10, 2022

nimanoe said:
I wonder if a combination of the ATP rankings with the tournament elo would give an even better result though.

I just ran the numbers, and in fact it does worse than either (492 vs ATP 494, TE 517)!

siestes · Mar 10, 2022

I wonder how it is still possible in 2022 that TOs use purely the ranked ladder to seed their tourneys tbh

Care to run the test for the formula we used for holy cup @felix.feroc ? to see if it was worth the time invested to come up with it or if we should have just stuck to tournament elo

(formula was (CUR+HIG)/2+max(TE;2000)*4+max(ATP/5+1800;2000)/6
CUR: current ladder
HIG: max ladder
TE: tournament elo
ATP: ATP)
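
As a sketch, the formula above can be transcribed into code exactly as typed, using standard operator precedence and reading the spreadsheet-style `max(a;b)` as a two-argument maximum. If the original spreadsheet grouped the terms differently, the numbers would differ. The function name and the sample inputs are hypothetical.

```python
def holy_cup_score(cur: float, hig: float, te: float, atp: float) -> float:
    """Literal transcription of (CUR+HIG)/2+max(TE;2000)*4+max(ATP/5+1800;2000)/6."""
    return (cur + hig) / 2 + max(te, 2000) * 4 + max(atp / 5 + 1800, 2000) / 6


# Hypothetical established player versus a hypothetical newcomer with no ATP points.
established = holy_cup_score(cur=2400, hig=2500, te=2100, atp=1500)
newcomer = holy_cup_score(cur=2000, hig=2100, te=1900, atp=0)
print(established, newcomer)
```

The two `max(..., 2000)` terms act as floors, so a low tournament elo or a low ATP total cannot drag a player below a fixed baseline.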

LowEloNobody · Mar 10, 2022

How are players ranked if they have not yet competed in a tournament (and presumably wouldn't have a tournament elo)?

felix.feroc · Mar 10, 2022

LowEloNoOne said:
How are players ranked if they have not yet competed in a tournament (and presumably wouldn't have a tournament elo)?

I basically assume they lose. And I verified that everyone in reality who made it to the second round actually had a score, so the problem of two unranked players meeting never happened.

robo · Mar 10, 2022

Did you use players' elo/rating/ranking at the time of the event, or retroactively apply today's?

felix.feroc · Mar 10, 2022

hallogallo said:
Care to run the test for the formula we used for holy cup @felix.feroc ? to see if it was worth the time invested to come up with it or if we should have just stuck to tournament elo

So, I got 505, which is better than ATP alone, but worse than Tournament-Elo and the Tournament-Elo/Ranked-Ladder-Elo blend. So it is up there but likely not worth the effort.

felix.feroc · Mar 10, 2022

robo said:
Did you use players' elo/rating/ranking at the time of the event, or retroactively apply today's?

I used it at the beginning of the event. For ATP, I copied the spreadsheet and altered TourneyResults:P1 to look at R1 instead of today() and adjusted TourneyResults:R1 to the day the tournament started to get the ranking points.

LowEloNobody · Mar 10, 2022

felix.feroc said:
I basically assume they lose. And I verified that everyone in reality who made it to the second round actually had a score, so the problem of two unranked players meeting never happened.

I meant, if a tourney host wants to use the superior method of seeding, how do they seed new players or players without a tourney elo?

felix.feroc · Mar 10, 2022

LowEloNoOne said:
I meant, if a tourney host wants to use the superior method of seeding, how do they seed new players or players without a tourney elo?

A couple things.

First, ranking is used for seeding, but you don't have to seed all the way down. In fact, professional tennis sometimes uses something tourney-geek calls "tiered seeding" which he has tested and thinks is awesome. Basically, he finds that except for the top, you probably are better off not seeding.

But if you had to seed, I would probably use the average of max and current elo for those who don't have tournament elo, maybe divided by 10 so it is clear it is below tournament elo.
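
A minimal sketch of that fallback, assuming each player is a dictionary with hypothetical field names and made-up ratings:

```python
def seeding_key(player: dict) -> float:
    """Tournament elo if the player has one, else scaled-down ladder average."""
    te = player.get("tournament_elo")
    if te is not None:
        return te
    # Dividing by 10 keeps every ladder-only player below every tournament-elo player.
    return (player["max_ladder"] + player["current_ladder"]) / 2 / 10


# Hypothetical field of four players.
players = [
    {"name": "P3", "tournament_elo": None, "max_ladder": 2300, "current_ladder": 2250},
    {"name": "P2", "tournament_elo": 2050, "max_ladder": 2400, "current_ladder": 2350},
    {"name": "P4", "tournament_elo": None, "max_ladder": 2100, "current_ladder": 2000},
    {"name": "P1", "tournament_elo": 2150, "max_ladder": 2450, "current_ladder": 2400},
]

seeded = sorted(players, key=seeding_key, reverse=True)
seed_order = [p["name"] for p in seeded]
print(seed_order)
```

Players with a tournament elo are ordered among themselves first, and the ladder-only players follow in order of their ladder average.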

siestes · Mar 10, 2022

felix.feroc said:
So, I got 505, which is better than ATP alone, but worse than Tournament-Elo and the Tournament-Elo/Ranked-Ladder-Elo blend. So it is up there but likely not worth the effort.

thanks! tournament elo proven best seeding method oboi @Tarsiz :tongue:

felix.feroc said:
But if you had to seed, I would probably use the average of max and current elo for those who don't have tournament elo, maybe divided by 10 so it is clear it is below tournament elo.

Below a certain point tournament elo is not really accurate anymore (below 2000 or something probably). Also, the further you go down the ranked ladder the more accurate it is. Below a certain point the ladder can just take over for seeding.

TheCapybara · Mar 10, 2022

My impression, though this may not have been robo and Tarsiz's intention when creating/refining it, is that ATP works beyond just predictive power in that it incentivises tournament participation beyond the S-/A-Tier tournaments. In my head, I'd value it for more than just how strong its predictive capabilities are.

siestes · Mar 10, 2022

squeaker said:
My impression, though this may not have been robo and Tarsiz's intention when creating/refining it, is that ATP works beyond just predictive power in that it incentivises tournament participation beyond the S-/A-Tier tournaments. In my head, I'd value it for more than just how strong its predictive capabilities are.

I don't see how ATP incentivises participating in more tournaments any more than tournament elo does. One of the recurring complaints about tournament elo is that it weights lower tier tournaments equally with A/S tiers, so by that metric it incentivises higher rated players to take lower tier tourneys seriously, if they want to keep a high seed. (which is probably lower in the priority list of top players anyway tbh)

It also seems it's a tiny bit better at predicting results. But well, elo is supposed to be a prediction tool to begin with, so I guess that is logical.

TheCapybara · Mar 10, 2022

hallogallo said:
I don't see how ATP incentivises participating in more tournaments any more than tournament elo does. One of the recurring complaints about tournament elo is that it weights lower tier tournaments equally with A/S tiers, so by that metric it incentivises higher rated players to take lower tier tourneys seriously, if they want to keep a high seed.

Well, the point is that by playing tournaments you can only gain points in the ATP system, whereas on Tournament Elo you can lose points for playing. That's what I mean by incentivising playing. Plus, you are rewarded the same regardless of opponent, whereas TE is weighted by the strength of your opponent, so, in theory, if you're playing lower tier tournaments against weaker opposition, you're putting much more at risk in terms of your ranking compared to ATP.

siestes · Mar 11, 2022

I see what you mean. I guess to simplify we could say you are incentivised to play more lower tier tourneys with ATP, and you are incentivised to play lower tier tourneys more seriously with TE.

This is all hypothetical anyway tho, because I don't think players care that much about tournament ranking (ATP or ELO)... At least not in the current state of things, where even tournament seeding is not influenced that much in general by tournament ranking.

If tournament seeding was exclusively based on tournament elo/atp, or even better if there was prize money or even prestige attached to it then maybe they would start to care?

felix.feroc · Mar 11, 2022

hallogallo said:
It also seems it's a tiny bit better at predicting results.

Yikes! I hope this is not the general takeaway of my post. Although I tried to make the tests as fair as possible, I don't think the differences between the top three are really significant enough to describe one as any better than the others. There are just too many assumptions and too few samples to say anything more definitive than that they are comparable.

hallogallo said:
This is all hypothetical anyway tho, because I don't think players care that much about tournament ranking

I totally agree with this, which is super useful for ranking algorithms, which really don't want players gaming them. In tennis, you get players to care about points by being the only game in town. If you want a guaranteed spot in a tournament, a bye in the first round, or just an easier first few rounds, you have to play the ATP point game. Given the DIY character of tournament organization in AoE II -- which I think is a huge net asset, btw -- this ain't gonna happen.

siestes · Mar 11, 2022

felix.feroc said:
I hope this is not the general takeaway of my post.

don't worry it's not. I was just teasing tarsiz because we had discussions in the past, where he insisted that the whole premise of the ATP system was that TE was **** and we needed a better system and that's why he and robo created ATP. The general takeaway of your post is that in the end it's quite similar and that TE is not actually ****. :smile:

TheCapybara · Mar 11, 2022

hallogallo said:
I see what you mean. I guess to simplify we could say you are incentivised to play more lower tier tourneys with ATP, and you are incentivised to play lower tier tourneys more seriously with TE.

Yeah, I think that's probably about right.

hallogallo said:
This is all hypothetical anyway tho, because I don't think players care that much about tournament ranking (ATP or ELO)... At least not in the current state of things, where even tournament seeding is not influenced that much in general by tournament ranking.

Agreed. And we've seen that a lot of the very top players don't really care about seeding for tournaments from their lack of ladder grind when ladder is used, so I mostly just see this as an interesting exploration of possibilities.

hallogallo said:
If tournament seeding was exclusively based on tournament elo/atp, or even better if there was prize money or even prestige attached to it then maybe they would start to care?

I guess that would be the best case. Or even for qualification to a major tournament (like the ATP Finals for the top 8 tennis players each year).

Tarsiz · Mar 11, 2022

@hallogallo My point was never that the ATP-inspired sheet was better at predicting results than the tournament-elo! Fun fact, I am a big tennis fan, and love nothing more than elo ratings for tennis, which sometimes provide a more interesting "snapshot" of who is in good shape than the 52-week ranking system (the ATP also has a "Race" ranking that only counts points for the running season, so it lets you see who is performing well in the current year).

I have two main gripes with tournament elo that I think the ATP sheet addresses better:
- Slow decay. RiuT was ranked in the top 10 on aoe-elo for the longest time while barely playing any matches, and definitely not getting top 10 results when he did play.
- Initialization. Players who enter the competitive scene are given an arbitrary elo, which is either 2000 or more rarely 1900. This has led to nonsensical situations in tournaments such as Visible Cup IV, where "Ladder 2k3" players would play "Ladder 1k7" players and... earn close to the number of points you would win for a victory against someone of your level on aoe-elo, since:
a) most "top 50" players hover around 2000 on the website
b) newcomers are initialized at 2000
And you need not be a wizard to predict a ladder 1k7 will get absolutely smoked by a ladder 2k3 (I know we've established the ladder is a pretty inaccurate predictor, but still).
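
To put rough numbers on the initialization issue, here is a sketch using the standard Elo expected-score formula. The K-factor of 32 is a hypothetical value, since aoe-elo's exact parameters are not given in this thread, and treating the 600-point ladder gap as the "true" skill gap is purely for illustration (ladder and tournament elo are different scales).

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def points_for_win(rating_a: float, rating_b: float, k: float = 32) -> float:
    """Rating gained by A for beating B; k=32 is an assumed K-factor."""
    return k * (1 - expected_score(rating_a, rating_b))


# Both players sit around 2000 on the tournament list, so the win counts as an even match.
gain_equal = points_for_win(2000, 2000)

# If the ratings reflected a 600-point gap instead, the same win would be worth far less.
gain_gap = points_for_win(2300, 1700)

print(gain_equal, gain_gap)
```

With equal ratings the winner collects half of K, whereas a win that the ratings already expected is worth only a small fraction of that.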

The corollary of the second issue is that we've had situations where some players would "farm" A or B-tier events with very little challenge and rise to a top 15 position on aoe-elo. This was notably the case for ACCM (at the same time as his peak of form on S tier events when he was arguably a top 6 player so that was fine), Vinchester (whose actual peak came a bit later, but has proved consistently good in later tournaments) and Babaorum (who, despite his Visible Cup win, was in my opinion nowhere close to the top 15).

Finally, it all depends on the metrics used by Felix to compare these systems. I don't know how the March Madness system works, but just by thinking how you'd compare the predictive power of the systems, you'll design a metric that might very well benefit one over the other. As with everything Age of Empires related, it depends.

That being said I think we can all agree using ladder only for seeding in tournaments is nonsensical.

Rustyiesty · Mar 11, 2022

hallogallo said:
Below a certain point tournament elo is not really accurate anymore (below 2000 or something probably). Also, the further you go down the ranked ladder the more accurate it is. Below a certain point the ladder can just take over for seeding.

Interested to hear suggestions for an optimal switchover? Top 16 (>2175) or Top 32 (>2100) on aoe-elo? Top 16 would translate to coming back to highest ladder ratings at 2500~, top 32 at 2450~.

I guess it's fair to say that 2450 highest is a rough boundary between pro and semi-pro levels? 2500 for being seeded in S-Tier tournaments? Current Elo seems to be around 50~ Elo lower - which is roughly the level of one decay for everyone.

Spring_ · Mar 11, 2022

Once you get a good seeding consistently, the ranking system actually helps you place better in tournaments. Is this factored into the equation in some way? For example, if I played in a tournament right now I would have a much harder road through the tournament than say viper, who would have the other top players who are his real threat spaced out so that he can place at a higher level. Of course it's good to seed the best players so you can get the right people paid and have exciting finals etc., but I wonder how much this will hurt some of the lower rated players' ability to be that outlier who can go through the whole field even being ranked 200x rating lower or never performing in a tournament to the end stages previously. The algo seems to not only analyze the values of players but also bias the structure towards certain players imo.

felix.feroc · Mar 11, 2022

Spring_ said:
Once you get a good seeding consistently, the ranking system actually helps you place better in tournaments. Is this factored into the equation in some way? For example, if I played in a tournament right now I would have a much harder road through the tournament than say viper, who would have the other top players who are his real threat spaced out so that he can place at a higher level. Of course it's good to seed the best players so you can get the right people paid and have exciting finals etc., but I wonder how much this will hurt some of the lower rated players' ability to be that outlier who can go through the whole field even being ranked 200x rating lower or never performing in a tournament to the end stages previously. The algo seems to not only analyze the values of players but also bias the structure towards certain players imo.

This addresses a super important point. The algorithm itself only tells you its guess of the order of ability of each player. Seeding is how you use this information. The tourney-geek site is really good at talking about how to structure a bracket so it is fair to everyone, so that being in the bottom quartile doesn't almost automatically mean you are out after the first round. He describes the seeding of some professional tennis tournaments that do this really well.

But the algorithms I am testing are just supposed to provide rankings. I agree that TOs should probably not blindly use them to organize the entire bracket.

felix.feroc · Mar 11, 2022

Just out of curiosity, I just compared how successful the ranking has been in March Madness since 2011 compared to our own ranking systems. The NCAA brackets using the pre-tournament rankings get around 55% of the points available, but the top systems here go from 63% to 66%, so I think they should be very proud of themselves.

eC_Gurke · Mar 11, 2022

What a great post at a time when I was multiple times very close to never opening this site again.
Thank you for reminding me of the reasons I still open aoezone.
