Yet another tournament elo suggestion - Third version!

DaisyChain · Aug 10, 2021

First off, if you checked this before, know that I improved it. I'm currently on the third version, and I think it gives excellent results!

Introduction

There have been more and more suggestions these last months to create a consistent and reliable ranking of players that uses competitive results instead of ladder. We all agree that ladder is completely irrelevant (Liereyy is 51st on ladder as I write this, that's enough proof). There are enough competitive games on DE to create granularity without needing ladder, especially at the highest level.

I personally find the work done by aoe-elo fascinating and that's definitely the basis of my own work. I still have some issues with the way they operate and I think there are a few flaws within their system. I've been working on a system I find more accurate, even if you could probably find problems with what I designed.

In the end, it's not so much intended as a pure official way to seed players, and more as a way to discuss elo and rankings, and also to delve into DE history.

What I kept

I use the elo system, as it's universally accepted as the best way to rank players. I think everyone understands elo, but here is a basic summary.

Basically, you gain points for each win, and lose points for each loss. You gain/lose as many points as your opponent loses/gains. That's exactly how it works on ladder.

That number of points won/lost depends on the relative strength of the two players: if you are a heavy favourite, you will gain fewer points for a win and lose more points for a loss.

The exact number of points depends on something called K-factor. A K-factor of 32 means up to 32 points are distributed: two even opponents gain/lose 16 points, and then it scales up to a theoretical +32/-32.
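To make the summary above concrete, here is a minimal sketch of the standard Elo update. The function names are mine, and the 400-point scale is the conventional chess value, which I am assuming here rather than something specific to this ranking.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def game_delta(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0) -> float:
    """Points A gains (positive) or loses (negative) for one game.

    B receives the opposite amount, so the total number of points is unchanged.
    """
    score = 1.0 if a_won else 0.0
    return k * (score - expected_score(rating_a, rating_b))


# Two even opponents with K = 32: the winner gains 16, the loser loses 16.
even = game_delta(2000, 2000, a_won=True)

# A 200-point favourite gains less than 16 for a win and loses more than 16 for a loss.
favourite_win = game_delta(2200, 2000, a_won=True)
favourite_loss = game_delta(2200, 2000, a_won=False)
```

The +32/-32 extreme is only approached when the rating gap becomes very large, which is why it is described as theoretical.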

Just like aoe-elo, I consider games individually, and not series. This way, a 3-0 win is more rewarding than a 3-1 win or a 3-2 win. This makes it more accurate and rewards players who hold their ground against a theoretically stronger opponent. The consequence of that is that, just like on aoe-elo, we sometimes end up with someone winning a set 3-1 or 3-2 but losing points if he was a heavy favourite, because that's treated as 3 wins and 1 or 2 losses. Once you're very high, it becomes extremely hard to keep climbing, even if you win almost every game.
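A quick sketch of that last effect, with made-up numbers: a 400-point favourite, K = 32, and both ratings held at their pre-set values for every game of the set. All three are assumptions for illustration, not values from my sheet.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def per_game_total(rating_a: float, rating_b: float, wins: int, losses: int, k: float = 32.0) -> float:
    """Sum of A's per-game Elo changes over a set, using pre-set ratings throughout."""
    e = expected_score(rating_a, rating_b)
    return k * (wins * (1.0 - e) + losses * (0.0 - e))


# Hypothetical 2400 player beating a 2000 player 3-1.
favourite_net = per_game_total(2400, 2000, wins=3, losses=1)

# The same 3-1 score between two even players.
even_net = per_game_total(2000, 2000, wins=3, losses=1)
```

Here the favourite is expected to win about 91% of games, so winning 75% of them is below expectation and `favourite_net` comes out negative, while `even_net` is positive.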

K-factor and importance of games

An important difference with aoe-elo is that I only used results from DE. There are enough games on DE to have accurate ratings, and the data is also easier to find. I'm also doing that alone ^^

One of the main issues I have with aoe-elo is that all games are weighted equally. That should not be the case: the first round of a qualifier is not as important as the final of an S-tier tournament. Series have no weighting either, so the length of the series actually indirectly decides their importance: a 6-4 set is not necessarily twice as important as a 3-2 set, even if that tends to be the case. When that's the case, that should be because it's a final of a big tournament, not because it has more games.

Thus, my most important change is that I gave a different K-factor to sets based on the round, the number of games of the set, the tier of the tournament, and some other factors in specific circumstances.
That also allows me to consider showmatches, which is not the case in aoe-elo. Their K-factor is kept low, but I think it's obvious that a 1000$ showmatch is treated more seriously by the players than the first round of a 100$ tournament, so including it makes sense.
I obviously don't consider for-fun showmatches and extremely weird settings.

I also take into account every game, but I average the games of a set. This way a 2-1 result and a 4-2 result are equally impactful (but BO7s usually happen in important sets, so they naturally have a higher K-factor).

It's easier to look at an example with the first game of DE:

TaToH and DauT played a BO9 showmatch.
That kind of showmatch without an especially high prizepool has a K-factor of 8 in my ranking.
TaToH and DauT both start at 2000 elo, like every player: this means they each have a 50% chance of winning each game and can gain a maximum of 4 elo.
By winning 5-1, TaToH gains 4-(1/5*4) = 3.2, which rounds to 3 points. As you see, it's not that impactful and it's safe to include showmatches.
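The arithmetic of that example can be sketched as below. This is a reading of the 4-(1/5*4) calculation, not a copy of my spreadsheet: I assume the per-game changes are summed and then divided by the winner's number of wins, and that both ratings stay at their pre-set values during the set. The function names are illustrative.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def set_delta(rating_a: float, rating_b: float, wins_a: int, wins_b: int, k: float) -> float:
    """Points A gains for a whole set: summed per-game changes, averaged over the
    number of games the set winner won (assumed divisor)."""
    e = expected_score(rating_a, rating_b)
    total = k * (wins_a * (1.0 - e) - wins_b * e)
    return total / max(wins_a, wins_b)


# TaToH 5-1 DauT, both at 2000, showmatch K-factor of 8.
tatoh_gain = set_delta(2000, 2000, wins_a=5, wins_b=1, k=8)

# A 2-1 and a 4-2 between even players move the same number of points.
bo3 = set_delta(2000, 2000, wins_a=2, wins_b=1, k=8)
bo7 = set_delta(2000, 2000, wins_a=4, wins_b=2, k=8)
```

With these assumptions `tatoh_gain` is 3.2 and `bo3` equals `bo7`, so the length of a set no longer decides its weight; only the K-factor does.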

Extra points added

There's another potentially controversial change I made. The issue is that top players tend to only face other top players. They don't match up often enough with lower players. There's a real issue with qualifiers: taking part in a qualifier is a huge boost to your aoe-elo ratings, even if you lose. That's something I definitely wanted to change.

The best thing I came up with is that I gave a boost to players who are invited/who autoqualify to an event, based on the K-factor of the qualifier, to compensate for the fact that they did not have the opportunity to gain points by qualifying. That breaks the strict rule of the elo system where the sum of points should not change, but that gives me extremely satisfying results. That compensates for the fact some players almost exclusively play in big tournaments and avoids overevaluating players who take part in numerous smaller tournaments and qualifiers. Before adding that, I had run several months and Max was way too low (he was invited to NAC3 and HC3 but did not really perform, even if he proved he deserved his top 16 spot by qualifying to RBW1) and a player like Vinchester was 6th (invited nowhere, but started qualifying everywhere).
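To show why this breaks the zero-sum rule, here is a purely hypothetical sketch. The bonus formula (a flat `bonus_factor * qualifier_k` per invited player), the player names and the numbers are all placeholders, since the exact amount I inject is not spelled out here.

```python
def apply_invite_bonus(ratings: dict, invited: set, qualifier_k: float, bonus_factor: float = 1.0) -> dict:
    """Return a new ratings dict with a flat bonus added to each invited player.

    The bonus size is a hypothetical rule: bonus_factor times the qualifier's K-factor.
    """
    bonus = bonus_factor * qualifier_k
    return {
        name: (rating + bonus if name in invited else rating)
        for name, rating in ratings.items()
    }


before = {"invited_player": 2000.0, "qualifier_player": 2000.0}
after = apply_invite_bonus(before, invited={"invited_player"}, qualifier_k=16)

# Points were created rather than exchanged, so the pool total goes up.
inflation = sum(after.values()) - sum(before.values())
```

In a strict elo system `inflation` would always be 0; here it equals the injected bonus.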

That also creates ladder inflation over time, which I would argue is a good thing here. With everyone starting at 2000, I aim for the average of the top players to be at least 2100, which takes time and is significantly helped if I inject points before events. I'll probably decrease the number of points I inject after some time, or entirely remove it, but I found that it really helped early ratings.

Rankings

I'm honestly pretty happy with the early results, you can find them in "Archives" just below.

I give my first real ranking right after. There are still flaws to discuss and like any system it has its outliers, but this looks very promising. I think the basis is very good. I'll run it until at least the end of 2020 and see if I'm satisfied with that version.

Here is my first provisional ranking, after NAC3:

I only included the 9 players who took part in NAC3, but I think it's extremely interesting to see how quickly players moved even if everyone started at 2000 one month earlier.
Viper is the first player to break 2100.

Here is the second provisional ranking, with the 16 players qualified for HC3 right when qualifiers end:

Hera was 1 point over Viper for one day, but otherwise Viper was firmly ahead. Hera still has a lot of points; he traded wins with Viper during the many A-B tier events of January 2020 like Bonjwa Fight Club, e-Paradise, Fair Civs Cup or Empires Showdown.
I think everything else looks really accurate. 7 of my top 8 will reach quarters in HC3.

Here is the third provisional ranking, with the same 16 players, but after HC3 (and a few events/showmatches):

Viper completely dominates the competition, he is the first player to break 2200. The top 8 is the top 8 of HC3, with Villese and Nicov being the best players outside of that top 8 which looks pretty accurate. Yo is a bit low because he did not take part in smaller events and had unlucky brackets, but nothing extraordinary.

Here is the fourth provisional ranking, including the 16 players who will take part in RBW1 and the 2 players who took part in HC3 but won't participate in RBW1:

This is extremely provisional as some players will quickly lose points during the first round of RBW1, but it's interesting to remember that BacT's start to DE was terrible. Viper also has an absolutely insane lead of 99 points over Hera, meaning it's almost impossible for him to maintain his rating without extraordinary performances.

Here is the fifth provisional ranking, displaying a change of dynamic:

Viper climbed to 2265 elo after his 3-0 win over Hera in semis, with a 114 points gap to Hera and a 119 points gap to Yo. But that quickly changed in the seismic final of RBW1 where Yo gained 42 points and Viper lost 42 points, the single most impactful set in DE.
That gives an extremely interesting and accurate rating. Viper is still first, Yo is clearly second, Hera clearly third, Liereyy clearly fourth, TaToH clearly fifth, MbL and DauT sixth and seventh but close. This honestly looks very similar to what I remember from that time.
dogao is still eighth from Hidden Cup, LaaaaaN is ninth from reaching top 8 in RBW1, and Nicov is tenth as the strongest player who has not yet made it into a top 8.

After those provisional ratings, I give my first real ranking at the end of Visible Cup, in early June 2020. It includes 32 players who either took part in an S-tier event or played at least 10 competitive games on DE.

There's a lot to unpack.

I talked about the top players in "archives", but overall I think it's hard to argue with that top 8. It's simply HC3 top 8, with Viper still considered the best despite his loss of RBW1, Yo looking like a clear second, Hera like a clear third, Liereyy like a clear fourth, TaToH like a clear fifth, MbL and DauT pretty close at sixth and seventh, and dogao the most questionable inclusion because he reached top 8 in HC3 but not in RBW1.

Things start changing from ninth place, with the rise of Vinchester. It is entirely deserved: he qualified for RBW1 by beating Vivi, he won Visible Cup convincingly (4-1 in the finals) and he even beat TaToH in the finals of a Hun War tournament. You could maybe argue he should be around 12th, but he clearly deserves a solid spot.

If we look at top 20 (all the players above 2k), it's actually almost perfect. It includes all the players who qualified to the first three S-tier events of DE, with one exception. Vinchester only took part in one event but I explained his momentum so ninth is ok-ish, otherwise the top 14 is 13 players who took part in both HC3 and RBW1.
Among them, LaaaaaN ranks first (so 10th) after reaching QFs in RBW1, Nicov is 11th as the strongest player who did not reach a QF, then the others are pretty close to one another. 15th is Vivi who missed RBW1, 16th is Max who qualified to RBW1 but in the second qualifier and who had slightly worse performances overall; those make sense. Slam qualified to RBW1 but got swept first round and barely played, so he deserves to be on the lower end of those players. That leaves Barles as 17th who is the Visible Cup finalist, Belgium as 18th who reached semis, and the other semifinalist was Kasva who only played nine games so isn't included, but is currently at 2004.

The outlier there is repard: he is 19th mostly thanks to RusAOC. I lowered their K-factor over and over to compensate for the uneven games and regional restrictions, but they still have some impact. I don't want to just not include them, that seems ridiculous.
That's one of the many reasons why I think some ladder inflation from S-tier events is good: you'll never gain 100 points from RusAOC alone, so if the average of top players is at 2100, you need to actually beat top players to reach that. I expect that to be fixed after a few more months.
dench being twenty-third comes from RusAOC too.

The one who should probably be in that top 20 instead of repard is BacT. The main issue here is that he only played 9 sets, and he lost a lot of them. He failed to qualify in both RBW1 qualifiers, losing to Luca and to Lyx who are far from the biggest names. Even if he qualified for HC3, he simply can't have a high rating with 2 huge underperformances over only 9 sets. He just needs to play more and slowly regain elo.

I changed quite a lot of things since the first version, so I'm more than happy to receive feedback.

Michaerbse · Aug 10, 2021

Tough to read the entire post but I hope I caught the relevant parts. :cool:

As usual I'm a big fan of statistics and like the attempts to find ratings that are as accurate as possible (whatever this even means).

I like the idea of different K factors for different tournaments, although it will be very difficult to find an appropriate balance.

Some things I'm not a big fan of:

DaisyChain said:
Another issue I have with tournament-elo is that longer series are always more impactful: longer series tend to be more important because they take place in the later stages of a tournament, but this is why they're more impactful, not because they have more games. An S-tier tournament having a BO9 final is not more important than one with a BO5 final.
After calculating the amount of elo gained/lost based on games won, I divide it by the number of games needed to win (so I divide it by 2 for a BO3, by 3 for a BO5, by 4 for a BO7, by 5 for a BO9). This ensures that longer series are not more impactful.

Disagree with that. Longer series should be more impactful because they simply carry more meaning. Beating someone 1-0 doesn't give much more intel than "player A has at least a small chance to beat player B". Beating someone 5-0 tells a very different story and should lead to a bigger change.

DaisyChain said:
I think including showmatches makes sense: they are indirectly less impactful because it's one set instead of several in tournaments, and there's probably more tryharding involved from the top players in a 1000$ showmatch than in an earlier round of a 200$ tournament.

For aoe-elo we discussed this multiple times and always ended up not including showmatches. Apart from the obvious difficulty of even collecting the data - there is an insane amount of showmatches, many of which are not properly announced or documented - we never figured out a way to properly decide on their relevance. Money is an indicator but there are multiple examples where money did not play a significant role. E.g. I recall a Bo21 between DauT and Viper sponsored by MattSalsa, where afterwards he threw in another few thousand dollars to make it a Bo37 including stuff like Forest Nothing etc. Prize money-wise it was a super important match but it was still a full troll match. One reason was of course that it was quite clear Viper would win anyway so you can argue that the result (guess what, Viper won) reflects their actual skill but I still struggle with including matches that both players don't take seriously.

DaisyChain said:
Then came the question of which ratings to start with. I decided from the start to only use DE events for my rankings, but it's obvious that players had to start with different ratings when DE was released. I used tournament results of the previous years, my own memories and videos like Nili's rankings to have an idea of what to do.

I really dislike this tbh. There will never be a "right" way to judge players so with this you always manipulate the ratings the way you'd like them to be. If you develop a proper system, the ratings will settle themselves sooner or later. Ratings should start either at an objectively set initial rating or - like "real" Elo in chess - at the rating performance you play in your first x games. The latter is tough to realize though in AoE2 with many tournaments being single elimination.
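For reference, a performance rating over the first x games can be sketched with the common linear approximation (average opponent rating plus 400 times wins minus losses, divided by games played). This is a simplification and not the table-based method used officially in chess; the function name and the sample numbers are only illustrative.

```python
def performance_rating(opponent_ratings: list, wins: int, losses: int) -> float:
    """Linear approximation of a performance rating over a player's first games.

    Assumes one opponent rating per game and no draws.
    """
    games = wins + losses
    if games == 0 or games != len(opponent_ratings):
        raise ValueError("need exactly one opponent rating per game played")
    average_opponent = sum(opponent_ratings) / games
    return average_opponent + 400.0 * (wins - losses) / games


# Hypothetical newcomer: 4 games against 2000-rated opponents, 3 wins and 1 loss.
initial = performance_rating([2000, 2000, 2000, 2000], wins=3, losses=1)
```

With single elimination a player may only get one or two games in an event, so the estimate stays very noisy, which is the difficulty mentioned above.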

DaisyChain · Aug 10, 2021

That's very welcome feedback on my first model. While I disagree with some points, that gave me food for thought.
I made changes and came up with a second model. First post has been entirely updated.

DaisyChain · Aug 12, 2021

I ran my first model until early July 2020 and I'm extremely happy with the way it works for top players.

For all the good things regarding my top 16/top 24, I'm pretty unhappy with how medium players are rated.

I scrapped the model entirely and rewrote it to make all players start at 2000. There will be things I really dislike and a real luck factor in who you draw, but let's hope that evens out in the long term.

That still left me with the issue I have with everyone starting at 2000: the need to reward top players in a fair and consistent way and show that they are better, even if they don't play enough games vs lower players to prove it.
I came up with a solution that I explained in my initial post. I edited everything there and basically restarted from scratch, and I think I now have a much better approach that will give better ratings, even if it will take more time to adjust than my first model.

DaisyChain · Sep 22, 2021

I had posted the first version and received good feedback, so I changed a lot of things.

This third version of my rankings is the most promising for me. My ranking is still less organic than purely using elo, but I find it extremely accurate.

That takes into account all games until the finals of Visible Cup 2020, in early June 2020. It includes the 32 players who played 10 sets by that point or had played in an S-tier event.

I will obviously complete that with later events, but that's a good starting point for me.

I explained the methodology in the initial post, and also discussed the outliers and why I think they'll be fixed in later ratings. I copy that in spoiler below.

There's a lot to unpack.

Overall I think it's hard to argue with that top 8. It's simply HC3 top 8, with Viper still considered the best despite his loss of RBW1, Yo looking like a clear second, Hera like a clear third, Liereyy like a clear fourth, TaToH like a clear fifth, MbL and DauT pretty close at sixth and seventh, and dogao the most questionable inclusion because he reached top 8 in HC3 but not in RBW1.

Things start changing from ninth place, with the rise of Vinchester. It is entirely deserved: he qualified for RBW1 by beating Vivi, he won Visible Cup convincingly (4-1 in the finals) and he even beat TaToH in the finals of a Hun War tournament. You could maybe argue he should be around 12th, but he clearly deserves a solid spot.

If we look at top 20 (all the players above 2k), it's actually almost perfect. It includes all the players who qualified to the first three S-tier events of DE, with one exception. Vinchester only took part in one event but I explained his momentum so ninth is ok-ish, otherwise the top 14 is 13 players who took part in both HC3 and RBW1.
Among them, LaaaaaN ranks first (so 10th) after reaching QFs in RBW1, Nicov is 11th as the strongest player who did not reach a QF, then the others are pretty close to one another. 15th is Vivi who missed RBW1, 16th is Max who qualified to RBW1 but in the second qualifier and who had slightly worse performances overall; those make sense. Slam qualified to RBW1 but got swept first round and barely played, so he deserves to be on the lower end of those players. That leaves Barles as 17th who is the Visible Cup finalist, Belgium as 18th who reached semis, and the other semifinalist was Kasva who only played nine games so isn't included, but is currently at 2004.

The outlier there is repard: he is 19th mostly thanks to RusAOC. I lowered their K-factor over and over to compensate for the uneven games and regional restrictions, but they still have some impact. I don't want to just not include them, that seems ridiculous.
That's one of the many reasons why I think some ladder inflation from S-tier events is good: you'll never gain 100 points from RusAOC alone, so if the average of top players is at 2100, you need to actually beat top players to reach that. I expect that to be fixed after a few more months.
dench being twenty-third comes from RusAOC too.

The one who should probably be in that top 20 instead of repard is BacT. The main issue here is that he only played 9 sets, and he lost a lot of them. He failed to qualify in both RBW1 qualifiers, losing to Luca and to Lyx who are far from the biggest names in 1v1. Even if he qualified for HC3, he simply can't have a high rating with 2 huge underperformances over only 9 sets. He just needs to play more and slowly regain elo.

Memeluke · Sep 22, 2021

Tg-only player
signs up to SBARRO for fun
wins it
refuses to elaborate further
leaves
#24 best 1v1 player in the world

Tarsiz · Sep 22, 2021

Accurate? Not quite the word I would have used...

Any ranking that doesn't put Liereyy in the top 2 fails to catch the dynamics of the past 12 months. He has been, all across the board, the most consistent player. The only thing to note is that he tends to lose a lot of finals... But he consistently makes finals, unlike Yo, Hera and Viper who all made and won one major final.

DauT is an even more outlying case: he reached and won a major final on the only occasion he's gone past the quarterfinals. Not consistent, but with a really high peak.

Tatoh and MbL seem to benefit from recency bias as they made semis in the latest majors. JorDan is not even in your list?

As we get down there are more and more inconsistencies. How on earth is ACCM 13th behind F1Re when he has made major quarterfinals more often than F1Re qualified for majors? By results alone he should be firmly into the top 10.

Also repard at #19 and dench at #23, lol.

DaisyChain · Sep 22, 2021

I don't think you quite read it was in May 2020 :smile:

I edited to make it slightly clearer. But you had a lot to say for something you did not read.

Tarsiz · Sep 22, 2021

DaisyChain said:
I don't think you quite read it was in May 2020
I edited to make it slightly clearer. But you had a lot to say for something you did not read.

Either you reposted the old version or your rating has not improved since May 2020...

DaisyChain · Sep 22, 2021

Yeah, whatever. I'll let you re-read that as many times as you need to understand.

DaisyChain · Sep 22, 2021

To make it clear, I edited the post, although I think it was pretty obvious what I meant.

I'll obviously post a followup post once I reach another important landmark: probably somewhere in late 2020. And after a few more dozen hours on this I'll obviously try to have it up to current date. I just wanted to share that interesting state.

archxeon · Sep 22, 2021

It was not clear to me either. There was a date saying May 2020 and we were supposed to assume it was the cut-off date, but it was not clear if it was the start date or the end date. Now it's better.

Anyways, even if we assumed correctly, it's hard to judge the accuracy of something that only takes into account a specific range of time more than a year ago.

Clear AND concise might help you get better feedback. :smile:

Also, a bit of humility is a good thing.

DaisyChain · Sep 22, 2021

I guess it was poorly phrased then.
