Trying to make a universal rating scale for Go





I have complained before about how little overlap there is between professional and amateur Go players. This lack of overlap is not just a lack of interaction between the two groups; there isn't even a universal rating system covering all players. Instead there is one system for amateurs and another one for professionals, and the two aren't even comparable. The system for professionals doesn't even measure playing strength. Professional ranks are instead awarded for somewhat arbitrary achievements (sometimes even for achievements unrelated to actual play, as honorary awards) and are usually never taken away.

I would love to have a universal rating scale like FIDE's Elo system. I applaud goratings.org's attempt at creating a numerical rating scale for professionals that measures actual playing strength. But the site doesn't rate the vast majority of amateurs, so it's still not universal. Some of the players on goratings.org also appear in the EGD (European Go Database), so I could try to glue the two scales together using these players.

I searched for a few suitable players and tried to make a linear regression between their goratings.org rating and their EGD rating. First I used the following data points:

player                  goratings.org rating   EGD rating   date
Mateusz Surma           2918                   2667         2015-12-17
                        2909                   2720         2016-10-31
                        2909                   2720         2016-11-04
Pavol Lisy              2906                   2652         2013-12-13
                        2904                   2721         2014-10-23
Ilya Shikshin           2864                   2744         2013-12-16
                        2934                   2769         2013-09-03
                        2941                   2787         2017-06-11
                        2953                   2793         2017-10-13
                        2956                   2797         2017-12-12
                        2966                   2802         2018-06-13
                        2971                   2812         2019-10-10
Hans Pietsch            3048                   2704         1997-06-27
Catalin Taranu          2840                   2816         2002-06-27
                        2838                   2816         2002-10-24
                        2838                   2816         2002-11-14
Alexander Dinerchtein   3085                   2740         2003-04-21
                        3086                   2740         2003-06-17
Ryan Li                 3058                   2771         2016-03-01
                        3069                   2771         2017-06-19
                        3069                   2771         2017-06-21
Ali Jabarin             2913                   2692         2014-10-27
Artem Kachanovskyi      2908                   2718         2016-10-31
                        2908                   2713         2016-10-02
                        2908                   2758         2018-06-16
Benjamin Lockhart       2866                   2693         2016-06-06
Fernando Aguilar        2912                   2696         2002-03-19
                        2911                   2696         2002-09-02
Antti Tormanen          2945                   2707         2016-07-28
                        2945                   2707         2017-06-15
                        2968                   2707         2022-12-19
                        2970                   2707         2023-06-29
Fan Hui                 3021                   2812         2013-12-13
                        3028                   2807         2014-12-13
Andrii Kravets          2828                   2682         2016-06-05
Stanislaw Frejlak       2840                   2704         2021-06-08


The dates are from when these players defeated Asian professionals. I hoped that around those dates the players' goratings.org ratings would be relatively accurate. A rating can only be accurate for players who have both won and lost games, ideally not years apart. Unfortunately, the fit was poor: the correlation is only 0.1886 and the p-value is 0.2707, i.e. the data are not statistically distinguishable from a random cloud of points. The 95%-confidence interval for the slope is [-0.1001,0.3457], which even includes negative slopes and is far too wide to be useful. See the scatter plot here:


I tried again with other data points. I chose just 5 players (Mateusz Surma, Alexandre Dinerchtein, Ilja Shikshin, Ali Jabarin, Artem Kachanovskyi) who have more than one recorded win and loss on goratings.org, spread over many years, and who also have a solidly established EGD rating spanning many years. The hope is that these players' ratings are closer to their true ratings. I used more than a dozen data points for each of those players and calculated the linear regression with the same calculator. The best fit is EGD-rating=1888+0.2919*(goratings.org rating), but the correlation is still only 0.4572. At least the p-value is 2.109 * 10^-15, so we know the correlation is real. The 95%-confidence interval for the slope is [0.2237,0.36], which is still too wide for my taste. At least the best-fit slope is close to 0.333, which is what we would expect from the traditional rule of thumb for professional ranks, namely that a 9dan pro can give a 3-stone handicap to a 1dan pro, i.e. roughly a third of a stone per professional rank. Since one stone corresponds to 100 EGD points, a slope of 1/3 is what we would get if goratings.org used 100 points per professional rank. But that could be coincidence; I don't know whether the goratings.org rating was intended to have 100 points difference between successive professional ranks.
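To illustrate how such a fitted line would be used to translate between the two scales, here is a minimal sketch. The coefficients and the slope interval are the ones quoted above; the function names are hypothetical and only for illustration.

```python
# Coefficients of the second regression quoted in the text (5 players).
INTERCEPT = 1888.0
SLOPE = 0.2919
SLOPE_CI = (0.2237, 0.36)  # 95%-confidence interval for the slope

def goratings_to_egd(gor):
    """Point estimate of the EGD rating for a goratings.org rating."""
    return INTERCEPT + SLOPE * gor

def egd_to_goratings(egd):
    """Algebraic inverse of the same line. With r = 0.4572 this is NOT what
    a separate regression of goratings.org on EGD would give."""
    return (egd - INTERCEPT) / SLOPE

def egd_gap_range(gor_gap):
    """Range of EGD-point gaps a goratings.org gap maps to across the slope CI."""
    return (SLOPE_CI[0] * gor_gap, SLOPE_CI[1] * gor_gap)

print(round(goratings_to_egd(3000), 1))                    # 2763.7
print(tuple(round(v, 2) for v in egd_gap_range(100)))      # (22.37, 36.0)
```

The last line shows why the interval matters: a 100-point gap on goratings.org (one professional rank, if goratings.org does use 100 points per rank) could correspond to anything from about 22 to 36 EGD points. Inverting the line for the other direction is a convenience, not a statistically neutral choice: with a correlation this weak, regressing goratings.org on EGD would give a noticeably different line.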

The low certainty is caused by all the players having very similar ratings and by the goratings.org ratings being based on so few games. A better regression would require some of the weakest Asian pros to play many EGD-rated games, until they have a well-established EGD rating. That is unlikely to happen. The alternative would be for one of the European pros to become strong enough to defeat strong Asian pros regularly. I don't have much hope for that happening either.

There are of course far more Asian pros in the EGD, but their EGD ratings can't be used. The problem is that the EGD rating of strong players changes only very slowly, and only once per event. And the initial EGD rating is essentially random, being based on what the player claims (which sometimes has little relation to the actual playing strength). That means it can take dozens of events for the rating to converge to its true value. Just look at this example: the rating rarely changes by more than 10 points per event. So for the EGD rating to fall by 100 points (the difference between 5dan and 6dan), the player would have to play at least 10 events, all resulting in rating losses. In reality it took Guo Juan almost 80 events to lose just 60 rating points. An Asian pro recorded for just 1 or 2 events can't possibly have an accurate EGD rating because of this slow change.
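The arithmetic of slow convergence can be made concrete with a toy model. This is a hypothetical sketch, not the actual EGD rating formula: each event moves the rating toward the true value by some fraction of the remaining gap, capped at 10 points as in the text. The 5% fraction in the second example is an assumption chosen only for illustration.

```python
import math

def events_to_converge(start, true, max_step=10.0, fraction=1.0, tol=0.5):
    """Count events until a rating starting at `start` is within `tol` of `true`.

    Toy model (hypothetical, NOT the EGD formula): each event moves the rating
    toward the true value by `fraction` of the remaining gap, capped at
    `max_step` points.
    """
    if tol <= 0 or not 0 < fraction <= 1:
        raise ValueError("need tol > 0 and 0 < fraction <= 1")
    rating, events = float(start), 0
    while abs(true - rating) > tol:
        gap = true - rating
        rating += math.copysign(min(max_step, fraction * abs(gap)), gap)
        events += 1
    return events

# Best case from the text: the full 10 points lost at every event.
print(events_to_converge(2600, 2500))                          # 10
# Hypothetical: each event closes only 5% of the gap; stop within 10 points.
print(events_to_converge(2600, 2500, fraction=0.05, tol=10))   # 45
# Guo Juan's actual average drift, from the figures in the text.
print(60 / 80)                                                 # 0.75
```

Even the best case needs 10 events for one amateur rank, and as soon as the per-event change shrinks with the remaining gap, the count climbs into the dozens. Guo Juan's real average of 0.75 points per event is far slower still.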

The improved scatter plot:


One possible alternative would be to use online play. But that requires all players to play on the same platform, and it requires many players to have a known account on that platform. Online platforms don't strictly separate amateurs and pros, and they allow playing a lot of games. But without knowing who is who, an online rating can't be related to any offline rating. And of course online ratings don't fit offline ratings perfectly, so I would have to do at least two linear regressions (online to EGD and online to goratings.org), each of which has its own error term. So far, I don't know the online account name of even a single strong pro (I know of https://senseis.xmp.net/?KGSHighDanPlayers, but that's not useful; I need accounts that still exist and play ranked games regularly), so I have to postpone that idea.
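Why two chained regressions cost precision can be sketched with first-order error propagation (the delta method). The setup is an assumption for illustration: a hypothetical online rating $O$ is fitted against EGD ratings $E$ using amateurs, goratings.org ratings $G$ are mapped onto $O$ using pros with known accounts, and the two slope estimates are treated as independent:

$$
E \approx a_1 + b_1 O,\qquad O \approx a_2 + b_2 G
\quad\Longrightarrow\quad
E \approx (a_1 + b_1 a_2) + b_1 b_2\, G
$$

$$
\left(\frac{\sigma_{b_1 b_2}}{b_1 b_2}\right)^2 \approx \left(\frac{\sigma_{b_1}}{b_1}\right)^2 + \left(\frac{\sigma_{b_2}}{b_2}\right)^2
$$

For example, if each slope were known to within ±15%, the combined slope would only be known to within about ±21% ($\sqrt{0.15^2 + 0.15^2} \approx 0.212$), before even counting the uncertainty in the intercepts.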

Written by the author; Date 11.05.2026; © 2026 spinningsphinx.com

Paralinguistic/connotation key:
  • Mocking
  • Sarcasm, e.g. "Homeopathy fans are a really well-educated bunch"
  • Statement not to be taken literally, e.g. "There is a trillion reasons not to go there"
  • Non-serious/joking statement, e.g. "I'm a meat popsicle"
  • Personal opinion, e.g. "I think Alex Jones is an asshole"
  • Personal taste, e.g. "I like Star Trek"
  • If I remember correctly
  • Hypothesis/hypothetical speech, e.g. "Assuming homo oeconomicus, advertisement doesn't work"
  • Unsure, e.g. "The universe might be infinite"
  • 2 or more synonyms (i.e. not alternatives), e.g. "aubergine or eggplant"
  • 2 or more alternatives (i.e. not synonyms), e.g. "left or right"
  • A proper name, e.g. "Rome"
One always hopes that these wouldn't be necessary, but in the interest of avoiding ambiguity and aiding non-native English speakers, here they are. And to be clear: These are not guesses or suggestions, but rather definite statements made by the author. For example, if you think a certain expression would not usually be taken as a joke, but the author marks it as a joke, the expression shall be understood as a joke, i.e. the paralinguistic/connotation key takes precedence over the literal text. Any disagreement about the correct/incorrect usage of the expression may be ascribed to a lack of education and/or lack of tact on the part of the author if it pleases you.