I decided to start analyzing expert players' recorded games from a statistical and scientific point of view, using games from Voobly. For this purpose I wrote a script that downloads game recordings from Voobly into my own database every day. I keep my own database because Voobly removes game recordings after about 2 months. I started collecting recordings on October 15th 2017, and by the end of May 2018 I had collected about 20,000 games from 30 top players (chosen by me).
To analyze the games I am using the AoC-mgz parser written by happyleaves, which can be found here: https://github.com/happyleavesaoc/aoc-mgz
Big thanks to happyleaves for writing this parser, since it made my work much easier and quicker. Even so, it still takes a lot of time to write the algorithms needed to collect and analyze the data I am interested in. Currently I can only do resource analysis on 1v1 Arabia maps, but I’m planning to extend my analyses to other maps (e.g. Arena) and to more than two players (e.g. for analyzing team games).
Let’s start with the first analysis...
Analysis #1: TheViper’s maphax on Arabia
For my first analysis I decided to test the well-known myth of TheViper’s maphax. The idea was to check whether the distribution and quality of resources on the map were really in TheViper’s favor.
I analysed 212 game recordings of games played by TheViper between October 15th 2017 and May 20th 2018. All of the games were played on the map Arabia between 2 players (1v1).
I decided to test properties of the following resources on the map:
- Gold
- Stone
- Berries
- Deer
- Wood
In addition, I split the gold and stone into primary and secondary clusters. For this analysis I did not take into account the gold and stone clusters with 3 piles, since they are usually located quite far from both players.
Gold, stone and berries
As mentioned before, I split the gold and stone into primary (7 gold or 5 stone piles) and secondary (4 gold or 4 stone piles) clusters.
The properties I checked for these resources were:
- distance from the TC,
- angle with respect to the axis between the player’s and the opponent’s TC,
- free edges ratio (not included in the final analysis),
- number of obstructions on the path between the resource and the TC,
- the presence of badly elevated tiles adjacent to the resource tiles.
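The first two properties in the list can be sketched in a few lines of Python. This is only an illustration, not the code used for the analysis: it assumes positions are (x, y) tile coordinates, that distance is Euclidean, and that the angle is measured at the player's TC. The function names are made up for this example.

```python
import math


def distance(tc, resource):
    """Euclidean distance in tiles between the TC and a resource (assumed metric)."""
    return math.hypot(resource[0] - tc[0], resource[1] - tc[1])


def angle_to_axis(tc, opponent_tc, resource):
    """Angle in degrees (0 to 180) between the TC->opponent axis and the TC->resource line."""
    ax, ay = opponent_tc[0] - tc[0], opponent_tc[1] - tc[1]
    rx, ry = resource[0] - tc[0], resource[1] - tc[1]
    norm = math.hypot(ax, ay) * math.hypot(rx, ry)
    if norm == 0:
        raise ValueError("TC, opponent TC and resource must be at distinct positions")
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, (ax * rx + ay * ry) / norm))
    return math.degrees(math.acos(cos_angle))
```

With this definition a resource directly between the two TCs has an angle close to 0, and a resource directly behind the player's TC has an angle close to 180.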
I will briefly explain these properties with visual aids below.
Image 1: Distance of the resource from the TC.
Image 2: Angle. The straight blue line represents the axis between the player's and the opponent's TC. The angle is defined between 0 and 180 degrees.
Image 3: Free edges ratio - number of blue squares over total number of neighbouring squares.
Image 4: Obstructions: Number of trees on the straight path between the TC and the resource.
Image 5: Bad elevation: Similar to free edges. Non-flat squares are red and flat squares are blue.
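The grid-based properties from Images 3 and 4 can be sketched as below. This is a rough illustration under my own assumptions, not the exact code behind the charts: tiles are (x, y) integer pairs, "neighbouring squares" means all 8 surrounding tiles, and the straight path is traced with Bresenham's line algorithm. All function names are hypothetical.

```python
# All 8 surrounding offsets; whether diagonals count as neighbours is an assumption.
NEIGHBOUR_OFFSETS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def neighbouring_tiles(resource_tiles):
    """Tiles adjacent to the resource cluster that are not part of the cluster."""
    resource_tiles = set(resource_tiles)
    around = {(x + dx, y + dy) for x, y in resource_tiles for dx, dy in NEIGHBOUR_OFFSETS}
    return around - resource_tiles


def free_edges_ratio(resource_tiles, blocked_tiles):
    """Share of neighbouring tiles that are free (not blocked by trees, cliffs, ...)."""
    neighbours = neighbouring_tiles(resource_tiles)
    if not neighbours:
        raise ValueError("resource cluster must contain at least one tile")
    return len(neighbours - set(blocked_tiles)) / len(neighbours)


def line_tiles(start, end):
    """Tiles on the straight line from start to end (Bresenham's algorithm)."""
    x0, y0 = start
    x1, y1 = end
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    tiles = []
    while True:
        tiles.append((x0, y0))
        if (x0, y0) == (x1, y1):
            return tiles
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def count_obstructions(tc, resource, tree_tiles):
    """Number of tree tiles strictly between the TC and the resource."""
    trees = set(tree_tiles)
    return sum(1 for tile in line_tiles(tc, resource)[1:-1] if tile in trees)
```

The bad elevation property could be computed the same way as the free edges ratio, by passing the non-flat tiles instead of the blocked tiles.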
Deer
The location and number of deer on the map are somewhat random. Sometimes deer are so far from the starting TC that it is not viable to lure them to it. For that reason I only took into account deer located within a 30-tile radius of the starting TC. Since deer groups differ in size and are not stationary, I decided not to cluster them and instead treat every deer as a separate resource. A deer has only 2 attributes: its distance to the TC and the number of obstructions on the path to the TC.
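The 30-tile filter described above could look roughly like this. It is a minimal sketch that assumes Euclidean distance and treats a deer at exactly 30 tiles as still in range; the names are made up for illustration.

```python
import math

DEER_RADIUS = 30  # tiles, the cut-off used in this analysis


def deer_near_tc(tc, deer_positions, radius=DEER_RADIUS):
    """Return one record per deer that lies within `radius` tiles of the starting TC."""
    records = []
    for position in deer_positions:
        dist = math.dist(tc, position)
        if dist <= radius:
            records.append({"position": position, "distance": dist})
    return records
```

Each deer stays a separate record, so the number of obstructions on its path to the TC can be added to the same record later.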
Wood
Just like deer, the location and amount of wood in the vicinity of the starting TC are random. Unlike deer, trees do not move, so it makes sense to cluster them into “forests”. Single trees (stragglers) are not considered a forest. The question is who owns a forest, since wood ownership cannot be determined as easily as for gold, stone or berries. I decided to give ownership of a forest to a player whose TC is less than 45 tiles from the center of the forest. In addition to the distance and angle of each cluster, I added a few forest-specific properties:
- size - the number of the trees in the forest
- presence of a pond
- size of the pond
- wood to water ratio
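One way to build forests and assign ownership is sketched below. It is an illustration under several assumptions, not necessarily how the analysis was done: trees are grouped by flood fill over 8-connected tiles, the center of a forest is the mean of its tile coordinates, and any cluster with fewer than `min_size` trees is treated as stragglers (the value 2 is a guess).

```python
import math
from collections import deque


def cluster_forests(tree_tiles, min_size=2):
    """Group adjacent tree tiles into forests using a breadth-first flood fill."""
    remaining = set(tree_tiles)
    forests = []
    while remaining:
        start = remaining.pop()
        forest = {start}
        queue = deque([start])
        while queue:
            x, y = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    neighbour = (x + dx, y + dy)
                    if neighbour in remaining:
                        remaining.remove(neighbour)
                        forest.add(neighbour)
                        queue.append(neighbour)
        if len(forest) >= min_size:  # drop stragglers
            forests.append(forest)
    return forests


def forest_center(forest):
    """Mean tile coordinate of a forest (assumed definition of its center)."""
    return (sum(x for x, _ in forest) / len(forest),
            sum(y for _, y in forest) / len(forest))


def forest_owners(forest, town_centers, max_distance=45):
    """Players whose TC is closer than `max_distance` tiles to the forest center."""
    center = forest_center(forest)
    return [player for player, tc in town_centers.items()
            if math.dist(center, tc) < max_distance]
```

Note that with this rule a forest can have two owners, or none, depending on where it sits between the TCs.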
Results
Once I had the data I did some exploration and created charts to visually compare the values of attributes.
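I will not show every single chart since there are too many (and this forum only allows a maximum of 10 image uploads), but I will show the most interesting ones. You can find all the charts in this Google doc:
https://docs.google.com/document/d/1OR_rcgvL0edPj4K2hjiG9O3RnTCJrwOKI1H9kvkp4Wo/edit?usp=sharing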
On variables that are normally distributed I also performed a statistical t-test. A t-test checks whether the average values of 2 sets of values (e.g. TheViper's primary gold distance and the other players' primary gold distance) are statistically different. Scientists commonly use it to test hypotheses (e.g. is TheViper's primary gold closer to or further from his TC than the other players' primary gold is from their TCs?). The t-test score I report is the p-value of the test; values below 0.05 are (usually) considered statistically significant, and I will only show the t-test score where this occurs.
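For readers who want to see what such a test computes, here is a small standard-library sketch. It is not the exact test used for the numbers in this post: it uses Welch's variant of the t-test (no equal-variance assumption) and approximates the two-sided p-value with the normal distribution, which is only reasonable for large samples. A statistics package that uses the real t distribution would be the better choice for small samples.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t_test(sample_a, sample_b):
    """Return (t statistic, approximate two-sided p-value) for two samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Standard error of the difference between the two sample means.
    std_error = math.sqrt(variance(sample_a) / n_a + variance(sample_b) / n_b)
    t_stat = (mean(sample_a) - mean(sample_b)) / std_error
    # Large-sample approximation: treat the t statistic as standard normal.
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value
```

A call such as `welch_t_test(viper_values, other_values)` (hypothetical variable names) returns a negative t statistic when the first group's average is lower, and a p-value below 0.05 when the difference would be called statistically significant.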
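Image 6: Secondary gold - bad elevation. It seems that TheViper's secondary gold is more often surrounded by a flat surface.
Image 7: The same thing could be said for primary stone.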
Image 8: Deer obstructions on the path to the TC, if there are any. Among deer with an obstructed path, TheViper's deer clearly have fewer obstructions than the other players' deer. The numbers confirm that:
TheViper's deer average obstructions: 4.58
Other's deer average obstructions: 5.54
t-test score: 0.0054
The difference in averages is statistically significant.
Image 9: Another example of a statistically significant difference, this time in the other players' favour. It is probably harder to see at first, but the other players have on average bigger wood patches.
TheViper's wood patch average size: 101.78
Other's wood patch average size: 105.53
t-test score: 0.023
These are, in my opinion, the most interesting charts. You can find the rest of them in the Google doc:
https://docs.google.com/document/d/1OR_rcgvL0edPj4K2hjiG9O3RnTCJrwOKI1H9kvkp4Wo/edit?usp=sharing
To conclude, it does not seem that TheViper is using "maphax", since there is not much evidence for it. Images 6, 7 and 8 do seem to show some properties in TheViper's favor, but others suggest the opposite (image 9 and some other images in the Google doc).
Please let me know what you think about this analysis. If you have any questions or suggestions, write them in the comments or send me a PM.