Pan-atlantic Shogi rating-grade-handicap system

(proposal)

1. Objective

The objective is a unified pan-atlantic (European/American) system that integrates Elo ratings, dan/kyu grades and handicaps. To achieve this, the following issues have to be addressed:

Tournament practices on either side of the Atlantic (section 4).
Correspondence between Elo ratings and dan/kyu grades; correspondence between handicaps and Elo rating differences (section 5).
Assignment of Elo ratings; promotion based on Elo ratings (section 6).

2. Revision history

Version 1 (6 July 1997): first draft.
Version 2 (3 August 1997): second draft; limited distribution to Kaufman, Cheymol, Grimbergen, Secelle, Casters, Fernandez.
Version 3.0 (5 October 1997): third draft; changes in section 6 proposed by Cheymol and Kaufman.
Version 3.1 (26 October 1997): main modification: change of lowest Elo to 1 rather than 500.
Version 3.2 (28 November 1997): formulation issues resolved.
Version 4.0 (26 July 1998): promotion issues and rating calculation for unrated/provisional players resolved. This draft to be presented to FESA and SFA for feedback.

3. Background

Five years ago John Kenney and I made a first pass at establishing a correspondence table between dan/kyu grades and Elo ratings. Our ultimate goal was to produce a promotion system based on Elo ratings, because the point system that I had devised (in use then and now) is inflationary by its very nature. This is one of the reasons Europe has so many 3-dans and relatively few 1- and 2-dans. The European correspondence table also has a serious problem with compression of grades, as will become clear in section 5. In America, over the past years Larry Kaufman has designed an integrated system that relates grades to ratings and rating differences to handicaps. It has proven to be very stable. Statistical analysis of the limited amount of data available seems to indicate that the correspondence between handicap and rating difference is the same for all strengths. The rating officers on both sides of the Atlantic (Larry Kaufman and Eric Cheymol) and I think the time is ripe to establish an integrated system so that a player's rating and grade mean exactly the same no matter on which continent he plays. I am involved in this matter because I have knowledge of and experience with both the European and the American systems, and because unifying the two is a long-standing wish of mine. Considering that calculation of Elo ratings on both sides of the ocean is done almost identically, the potentially highest hurdle to unification has already been cleared. Larry, Eric and I (as well as Hans Secelle, Reijer Grimbergen and George Fernandez) have discussed the issues extensively, and this document is the result of these discussions.

4. Tournament practices

One requirement for arriving at a unified system is that tournament practices on either side of the Atlantic are accepted by the other side. This proposal does not aim to change these practices or impose new rules. There are differences in the minimum time allotment that makes games ratable (i.e., valid for rating calculations): in Europe this is 45 minutes plus byoyomi, in America 20 minutes plus byoyomi. This proposal does not intend to change these regulations; both sides need only accept that these differences exist and agree that they do not interfere with establishing reliable ratings. Of course, FESA might decide to regard games as ratable if the time limits are 20 minutes + 30 seconds, while recommending 45+30 in general and imposing 60+30 for Grand Prix tournaments. In America handicap games are an integral part of the rating system. Again, the intention of this proposal is not that FESA accept handicap games as ratable in European tournaments, but only that regarding handicap games as ratable in America does not preclude a unified rating system. It would be preferable if these differences disappeared over time, but this is not a requirement for the present purpose of establishing a single rating system. A unified rating system would also enable us to have a single pan-atlantic rating list, but again this is not a necessity. For all practical purposes it is easier to maintain two separate lists, as long as the ratings of the few players that appear on both lists are the same on both.

5. Relationship between Elo ratings, dan/kyu grades and handicaps

The Elo rating is the primary indicator of current actual strength, while the dan and kyu grades are titles based on historic peak performances (much like Grand Master, Master and FIDE Master titles in chess). When unifying the current rating systems, establishing the relationship between Elo ratings and dan/kyu grades is of crucial importance. I will focus on the European situation as the American system is already being brought in line with the system proposed here (see Table I). In Europe in 1992, the Elo difference between average 3- and 4-dans was 145 points; it was 125 on average for all dan grades. Nowadays, the "theoretical" Elo difference between two successive dan grades is 100 points. This indicates that the European grades are too close together in terms of the number of Elo points that separate them. Based on the following observations, Eric Cheymol and I have come to the conclusion that this is indeed the case.

A) In Japan some clubs distinguish between weak and strong 1-dans, 2-dans, 3-dans and 4-dans. With an Elo difference of only 100 between grades, it is virtually impossible to distinguish between weak and strong players of the same dan grade. Also, European 1-dans and 2-dans have good chances to beat a 3-dan. In Japan, a 1-dan would hardly ever beat a 3-dan if they both received their grades in the same club. Therefore, it seems logical to widen the ranges of Elo ratings that correspond to each dan grade.

B) European kyu and lower dan grades are probably at least 1 grade "harder" than the corresponding Japanese grades on average. Assuming that 4-dans/top 3-dans in Europe and Japan are approximately of equal strength, this indicates that the Elo range for each grade below 4-dan is too narrow.

C) Based on his personal experience with playing in Japan, Larry Kaufman believes that 200 Elo points per dan grade is on the conservative side for Japan: the range is probably about 225-250 there and certainly larger than the 100 points now used in Europe. Since America uses 200 points, it seems reasonable to compromise and use 150 points for the lower dan grades.

D) The 2-dan and 1-dan populations in Europe (9 and 11, respectively) each are about half the size of the 3-dan population (17). Also, the actual average ratings for the lower dan grades are systematically lower than the "theoretical" ratings (see Table I). These two observations again indicate that the Elo ranges for the lower dan grades are too narrow.

Taking the above considerations into account and using the divider between 3-dan and 4-dan (2120 in Europe; 2100 in America) as an anchoring point, the pan-atlantic correspondence detailed in Table I is proposed. It follows the European system in that going from 3-dan to 15-kyu the Elo range per grade becomes progressively narrower (pan-atlantic: 200, 150, 100; European: 100, 80, 60, 40). It remedies the problems caused by the European ranges being too narrow overall (as exemplified by items A-D above). Compared to the present European system, it doubles the width of the ranges for the higher dan grades (4-dan and above). This brings them more in line with Japan and avoids inflation of these higher grades by making it more difficult to break through into the 5/6-dan grades (which are not awarded in Europe (yet?)). With 150 rather than 200 Elo points for the lower dan grade ranges, the system should still lead to grades that are "harder" than the Japanese ones. Promotion to 1-dan and 1-kyu should be more difficult than promotion within the other kyu grades; the proposed system addresses this by making the 1- and 2-kyu Elo ranges appropriately wide (150 points).

Table I. Elo rating - dan/kyu grade correspondence
Grade	Europe "theoretical"	Europe average *	America	Proposed pan-atlantic range	Proposed pan-atlantic width
6-dan	2320 - 2419	--	2500 - 2699	2500 - 2699	200
5-dan	2220 - 2319	--	2300 - 2499	2300 - 2499	200
4-dan	2120 - 2219	2159 (8)	2100 - 2299	2100 - 2299	200
3-dan	2020 - 2119	1978 (17)	1900 - 2099	1900 - 2099	200
2-dan	1920 - 2019	1869 (9)	1700 - 1899	1750 - 1899	150
1-dan	1820 - 1919	1794 (11)	1500 - 1699	1600 - 1749	150
1-kyu	1740 - 1819	1694 (16)	1400 - 1499	1450 - 1599	150
2-kyu	1680 - 1739	1575 (9)	1300 - 1399	1300 - 1449	150
3-kyu	1620 - 1679	--	1200 - 1299	1200 - 1299	100
4-kyu	1560 - 1619	1435 (8)	1100 - 1199	1100 - 1199	100
5-kyu	1500 - 1559	1353 (4)	1000 - 1099	1000 - 1099	100
6-kyu	1440 - 1499	1375 (6)	900 - 999	900 - 999	100
7-kyu	1380 - 1439	1392 (2)	800 - 899	800 - 899	100
8-kyu	1320 - 1379	--	700 - 799	700 - 799	100
9-kyu	1260 - 1319	--	600 - 699	600 - 699	100
10-kyu	1200 - 1259	1182 (7)	500 - 599	500 - 599	100
11-kyu	1160 - 1199	--	400 - 499	400 - 499	100
12-kyu	1120 - 1159	1212 (3)	300 - 399	300 - 399	100
13-kyu	1080 - 1119	1173 (2)	200 - 299	200 - 299	100
14-kyu	1040 - 1079	--	100 - 199	100 - 199	100
15-kyu	1000 - 1039	1224 (3)	1 - 99	1 - 99	100
* Average of the July 1997 Elo list (numbers of players in parentheses). -- indicates 0 or 1 players.
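
As an illustration only, the proposed pan-atlantic column of Table I can be expressed as a lookup from a rating to the grade bracket that contains it. The names below (PAN_ATLANTIC_LOWER_BOUNDS, grade_bracket) are hypothetical, and treating ratings of 2700 and above as falling in the 6-dan bracket is an assumption, since Table I does not go higher. The result is the bracket a rating falls in, not the grade a player actually holds.

```python
# Lower bounds of the proposed pan-atlantic ranges from Table I, highest first.
PAN_ATLANTIC_LOWER_BOUNDS = [
    ("6-dan", 2500), ("5-dan", 2300), ("4-dan", 2100), ("3-dan", 1900),
    ("2-dan", 1750), ("1-dan", 1600), ("1-kyu", 1450), ("2-kyu", 1300),
    ("3-kyu", 1200), ("4-kyu", 1100), ("5-kyu", 1000), ("6-kyu", 900),
    ("7-kyu", 800), ("8-kyu", 700), ("9-kyu", 600), ("10-kyu", 500),
    ("11-kyu", 400), ("12-kyu", 300), ("13-kyu", 200), ("14-kyu", 100),
    ("15-kyu", 1),
]


def grade_bracket(elo):
    """Return the grade whose proposed Elo range contains `elo`.

    Assumption: anything at or above 2500 is reported as 6-dan.
    """
    for grade, lower_bound in PAN_ATLANTIC_LOWER_BOUNDS:
        if elo >= lower_bound:
            return grade
    raise ValueError("Table I defines no bracket for ratings below 1")


print(grade_bracket(1750))  # 2-dan
print(grade_bracket(1749))  # 1-dan
```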

Since overall the Elo ranges are widened in the proposed system, a one-time adjustment of ratings is appropriate. If this were not done, the rating of many lower kyu players would correspond to a grade that is 4 or 5 levels higher than their present one. The following adjustment addresses that issue adequately:

Elo ratings higher than 1900 are not changed;
Ratings between 1900 and 1500 are changed as follows: Elo(new) = 1900 - 1.5 * (1900 - Elo(old));
Ratings between 1500 and 1300 are changed as follows: Elo(new) = 1300 - 2.5 * (1500 - Elo(old));
Ratings between 1300 and 1100 are changed as follows: Elo(new) = 800 - 4.0 * (1300 - Elo(old));
Elo ratings lower than 1100 are set to 1.
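
The one-time adjustment above is a piecewise-linear map, and a minimal sketch of it follows. The function name adjust_rating is hypothetical. Two details are assumptions not stated in the rules: results are rounded to whole points, and the result is never allowed to drop below 1 (the last formula gives 0 at exactly 1100, while the lowest rating in Table I is 1).

```python
def adjust_rating(old):
    """One-time conversion of an old European rating to the proposed scale."""
    if old >= 1900:
        return old                            # unchanged
    if old >= 1500:
        new = 1900 - 1.5 * (1900 - old)       # 1500..1900 -> 1300..1900
    elif old >= 1300:
        new = 1300 - 2.5 * (1500 - old)       # 1300..1500 -> 800..1300
    elif old >= 1100:
        new = 800 - 4.0 * (1300 - old)        # 1100..1300 -> 0..800
    else:
        return 1                              # everything below 1100
    # Assumption: round to whole points and keep the floor of 1.
    return max(1, round(new))


for old in (2000, 1700, 1400, 1200, 1000):
    print(old, "->", adjust_rating(old))
```

The segments join up at their boundaries (1500 maps to 1300 and 1300 maps to 800 under either neighbouring formula), so the ordering of players is preserved.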

The consequences of these adjustments for the present population would be that:

Several 1-kyus and a few 1-dans have good chances for promotion in the foreseeable future.
4 and 3-dan grades have become "harder" in that promotion to these grades seems more difficult within the proposed system than with the FESA point system. Within the point system players could obtain 3-dan promotion by scoring points against 3-dans, who according to their Elo rating perform below that grade. In fact, the inflationary nature of the point system is the main reason for the relatively very large 3-dan population. This indicates very strongly that the point system as a mechanism for promotion should be replaced for dan players if not for the entire population.
On average all kyu players better than 9-kyu will end up in an Elo bracket one level higher than their present grade. Also, the entire range of kyu-grades from 1 to 15-kyu will be populated.
Although the total Elo range has widened (the lowest rating now being 1 rather than 1000), new players will rise through the ranks (=grades) faster as the rating differences between them and their opponents are typically three times as large as in the present system, while the Elo range of each kyu grade has only doubled. This will prove quite stimulating for newcomers.

Table II. Handicap - Elo rating difference (DElo) correspondence
handicap	DElo	handicap	DElo
sente	25	3 p (right Lance)	675
Lance	75	3 p (left Lance)	700
Bishop	225	4 p	750
Rook	300	5 p (right kNight)	900
Rook + Lance	400	5 p (left kNight)	1050
2 p	600	6 p	1200

When computing the rating difference between opponents in a handicap game for rating calculations, the above handicap values are added to the rating of the handicap receiver first. Thus, if a player rated 1800 gives Bishop (225 points) to a player rated 1600, the game will be rated as if the 1600 player were 25 points higher rated (1600+225=1825) than the 1800 player.
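
The bookkeeping for a handicap game can be sketched as follows, using the values of Table II. The names HANDICAP_DELO and effective_receiver_rating are hypothetical; the sketch only computes the rating at which the handicap receiver is treated for the purposes of the rating calculation.

```python
# Handicap values (DElo) copied from Table II.
HANDICAP_DELO = {
    "sente": 25,
    "Lance": 75,
    "Bishop": 225,
    "Rook": 300,
    "Rook + Lance": 400,
    "2 p": 600,
    "3 p (right Lance)": 675,
    "3 p (left Lance)": 700,
    "4 p": 750,
    "5 p (right kNight)": 900,
    "5 p (left kNight)": 1050,
    "6 p": 1200,
}


def effective_receiver_rating(receiver_elo, handicap):
    """Rating used for the handicap receiver when the game is rated."""
    return receiver_elo + HANDICAP_DELO[handicap]


giver, receiver = 1800, 1600
effective = effective_receiver_rating(receiver, "Bishop")
print(effective, effective - giver)  # 1825 25
```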

Larry Kaufman has carried out a careful statistical analysis of a large number of games in terms of actual Elo rating differences, handicaps used and the outcome of the games. On the basis of that analysis and the grade-rating correspondences of Table I, he proposes the system detailed in Table II for relating handicaps to Elo rating differences. As said before, this handicap system does not need to be adopted by FESA, but it might be advantageous to introduce it, e.g., in clubs where handicap games are often played. This would help establish fairly reliable ratings quickly even for players who do not often play in tournaments, and also in cases where large differences in strength exist between players within a club.

In order to avoid drift of the population as a whole, every year the ratings of all active players with a rating of 1900 or above and at least fifty rated games are recorded, and one year from that moment the average of those of them that are still active is compared to their previous year's average. If any year's average is significantly lower than the previous year's average, the rating officers by unanimous agreement may raise the average rating of all players by an amount not to exceed the calculated difference. These calculations are carried out on the basis of ratings, not of grades.

6. Elo calculation and promotion based on Elo ratings

The following consensus rules and regulations governing Elo rating determination and promotions based on these ratings are being proposed.

6A) Gaining and losing Elo points

The FIDE "Logistic" Elo rating calculation formula (explained in the "Official Laws of Chess"), which is currently used to calculate European ratings, will be adopted without any changes. It relates a player's rating gain or loss (DElo in the formula below) upon completion of a tournament to the results of his games and the difference in ratings between him and his opponents:

DElo = K * SUM(i) [ V(i) - 1 / ( 1 + 10^( ( Elo(i) - Elo ) / 400 ) ) ]

where the summation extends over all games this player played in the tournament, V(i) is the result of the game against his i-th opponent (1 for a win, 0.5 for a draw, 0 for a loss), Elo(i) the rating of his i-th opponent, Elo his own rating before the tournament, and K a coefficient that, when divided by 2, indicates how many points a player gains (loses) when winning (losing) against a player of the same rating. A K-value of 20 is adequate for the higher dan grades, but lower graded players that can progress rapidly should be able to move faster, which the value of 40 enables them to do.
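
A minimal sketch of this calculation is given below. It assumes the standard logistic expected-score curve with a 400-point scale; the function names expected_score and rating_change are hypothetical. With K = 20, a win against an equally rated opponent yields K/2 = 10 points, in line with the description of K above.

```python
def expected_score(own_elo, opponent_elo):
    """Expected result (0..1) against one opponent, logistic curve, scale 400."""
    return 1.0 / (1.0 + 10 ** ((opponent_elo - own_elo) / 400.0))


def rating_change(own_elo, k, games):
    """Rating gain or loss (DElo) for one tournament.

    `games` is a list of (opponent_elo, result) pairs, where result is
    1 for a win, 0.5 for a draw and 0 for a loss.
    """
    return k * sum(result - expected_score(own_elo, opp) for opp, result in games)


# A 1500 player with K = 20 beats a 1500 player and loses to a 1900 player.
print(rating_change(1500, 20, [(1500, 1), (1900, 0)]))
```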

Table III. K-values and numbers of games required for promotion
grade range	Elo range	K-value	#games*
4-dan ... 6-dan	2100 - 2699	20	16 (8)
1-dan ... 3-dan	1600 - 2099	24	14 (7)
3-kyu ... 1-kyu	1200 - 1599	28	12 (6)
7-kyu ... 4-kyu	800 - 1199	32	10 (5)
11-kyu ... 8-kyu	400 - 799	36	8 (4)
15-kyu ... 12-kyu	1 - 399	40	6 (3)
* number of games above the lower bound required for promotion (number of games above the midpoint are given in parentheses).

If a player with a rating below 1200 gains points in a "tournament", his gain shall be doubled unless such doubling brings him over 1200. In that case his new rating shall be 1200 or the rating he would have obtained if his gain were not doubled (whichever is higher).
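
The doubling rule can be sketched as below. The function name apply_tournament_gain is hypothetical, and `gain` stands for the DElo computed for the whole "tournament". Leaving losses undoubled follows from the rule speaking only of gains.

```python
def apply_tournament_gain(rating, gain):
    """New rating after a tournament, with gains doubled below 1200."""
    if rating >= 1200 or gain <= 0:
        return rating + gain              # no doubling for losses or 1200+
    doubled = rating + 2 * gain
    if doubled <= 1200:
        return doubled                    # doubling stays at or below 1200
    # Doubling would overshoot: 1200, or the undoubled result if higher.
    return max(1200, rating + gain)


print(apply_tournament_gain(1000, 50))   # 1100
print(apply_tournament_gain(1150, 40))   # 1200
print(apply_tournament_gain(1190, 30))   # 1220
```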

6B) Time table for calculation of Elo ratings on the basis of tournament results

Events are to be rated in order of their finishing date if at all possible. Therefore, an event should not be rated if the results of a prior event have not yet been received.

For purposes of the rating system, a "tournament" is defined as all games which are to be rated at one time, using the same starting rating. If multiple events are held in a short time, they may all be rated as if they were part of one large tournament.
No "tournament" can include more than twenty games by any one player. If necessary to avoid this, an event will be broken into two or more parts for rating purposes.

Club events are rated based on the date of the last game of the event. It is recommended that any event that takes a long time (e.g. more than 4 months), at the discretion of the club running the event, be split into two or more portions for rating purposes.

A tournament is ratable if at least two (provisional or established, see section 6C below) rated players participate. Once a "tournament" is rated, the new ratings will be used for subsequent events. The present European custom of using the semi-annual rating as the basis for all events over the next six months will be discontinued.

6C) Elo categories and procedure for rating tournaments

A player with fewer than 4 games is unrated. A player gets a provisional rating after 4 games and an established rating after 15 games. A rating remains in effect for an inactive player even if it is no longer published, unless unusual circumstances indicate that it would be more accurate to treat the player as a newcomer. A player who earns a club grade (either in a western club or in Japan) from unrated play that is more than a grade above the grade corresponding to his current rating and who has not played a rated game for at least one year may have his rating redetermined at the discretion of the rating officer. In that case, he will be regarded as an unrated player with two games assumed at his new grade (see section 6C1 below).

When rating a tournament, first the unrated players are rated, then the players with provisional ratings and finally the established players:

The initial performance rating of unrated players is determined on the basis of the PRE-event ratings of provisional and established players (ignoring any games with other unrated players). This is done using the Logistic formula of section 6A, with two stipulations:
* if the player has no club grade, an assumed win and an assumed loss are added to his score (example: 5-0 becomes 6-1). These "results" are only added to a player's actual results to avoid the problem of perfect scores having no defined value. They are removed from the record once a player has a provisional rating and do not count towards getting an established rating.
* if the player has a club grade, he is assumed to have scored one win and one loss against players rated in the middle of that grade. These "results" are added to his actual results, remain on the record and count towards getting an established rating.
Subsequently, the performance ratings of these unrated players are iteratively recalculated with PRE-event ratings of provisional and established players and current performance ratings of other unrated players until no change occurs.
Using the Logistic formula of section 6A, the POST-event rating of each provisional player is determined on the basis of his own PRE-event rating, the PRE-event ratings of his established and provisional opponents and the POST-event performance ratings of his unrated opponents. If N, the number of rated games he has played (including the event being rated), is less than twenty, the net gain or loss for the event is multiplied by (20/N). This is not an iterative procedure.
Using the Logistic formula of section 6A, the POST-event rating of each established player is determined on the basis of his own PRE-event rating, the POST-event ratings of his unrated and provisional opponents and the PRE-event ratings of his established opponents. Again, this is not an iterative procedure.

This procedure gives provisional players fair POST-event ratings, but more importantly it uses ratings that are as reliable as possible for the calculation of POST-event ratings of established players.
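
Two building blocks of this procedure are sketched below under stated assumptions. The text does not spell out how a performance rating is obtained from the Logistic formula; the sketch assumes it is the rating at which the expected total score equals the actual score, found here by bisection. The function names and the search interval are hypothetical. The example shows the club-grade case, where one assumed win and one assumed loss against the middle of the grade are added.

```python
def expected_score(own_elo, opponent_elo):
    """Logistic expected result against one opponent (scale 400)."""
    return 1.0 / (1.0 + 10 ** ((opponent_elo - own_elo) / 400.0))


def performance_rating(games, low=-2000.0, high=5000.0, tolerance=0.01):
    """Rating at which the expected total score matches the actual score.

    `games` is a list of (opponent_elo, result) pairs. Assumption: the
    score is neither 0% nor 100%, which the assumed win and loss ensure.
    """
    actual = sum(result for _, result in games)
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if sum(expected_score(mid, opp) for opp, _ in games) < actual:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


def provisional_change(raw_change, games_played):
    """Scale a provisional player's net gain or loss by 20/N when N < 20."""
    if games_played < 20:
        return raw_change * 20.0 / games_played
    return raw_change


# Unrated player with a 1-kyu club grade (middle of 1450 - 1599 taken as 1525)
# who scored 2-1 against rated opponents.
games = [(1500, 1), (1600, 1), (1700, 0)]
games += [(1525, 1), (1525, 0)]          # assumed win and loss at the grade
print(round(performance_rating(games)))
print(provisional_change(12.0, 8))       # 30.0
```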

6D) Determination of a promotion system based on Elo ratings

Promotion is an important issue: as demotions cannot occur, one must be very careful when promoting players. As indicated in sections 3 and 5 above (and known even when it was introduced), the point system is by definition inflationary. For example, a 1-dan with a 1923 rating is much stronger than one with a 1678 rating, and this should be taken into account in any promotion system. An Elo-based promotion system satisfies this condition.

Considering Elo ratings of European kyu players, it is clear that Elo ratings and kyu grades hardly correlate. In fact, Dutch kyu grades used to be "hard," but based on the current Elo list they seem "softer" than most others. The practice of kyu promotions being awarded by individual associations without any guiding principles has not worked and has led to the present discrepancies between ratings and grades. Replacing the current system of unregulated kyu grade assignments by an Elo-rating-based system would solve that problem.

Based on these observations, we discourage use of the point system for promotions and propose an Elo-based promotion system (for dan and kyu players alike) instead. In this system, promotion to a certain grade requires that a player satisfies one of the following three conditions:

1) he has had a rating at or above the lower bound rating corresponding to that grade over the number of consecutive games shown in Table III. This requirement is based on the system currently used in Europe. Example: if a 2-kyu has maintained a rating of 1450 or higher over 12 games, he will be promoted to 1-kyu.
2) he has had a rating at or above the mid-point corresponding to that grade over half the number of consecutive games that are required by condition 1 (shown in parentheses in Table III). This requirement is based on the system currently used in America. Example: if an 11-kyu has maintained a rating of 550 (500+100/2) or higher over 4 games, he will be promoted to 10-kyu.
3) upon completion of any tournament, he has attained a rating corresponding to the grade one above that grade. Example: a 2-kyu with a rating of 1750 (lower bound of 2-dan) or higher will be promoted to 1-dan (one can indeed skip grades this way).

The numbers in Table III are chosen as a compromise between European and American practices. Refer to section 6E for promotions to 5 and 6-dan.
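
The three promotion conditions can be sketched as a single check. This is an illustration under assumptions: the names trailing_streak and qualifies_for_promotion are hypothetical, `ratings` is taken to be the list of a player's ratings after each game in playing order, "over N games" is read as N consecutive post-game ratings, and the last entry is treated as the rating upon completion of a tournament.

```python
def trailing_streak(ratings, threshold):
    """Number of most recent post-game ratings that are all >= threshold."""
    streak = 0
    for rating in reversed(ratings):
        if rating < threshold:
            break
        streak += 1
    return streak


def qualifies_for_promotion(ratings, lower_bound, next_lower_bound, games_needed):
    """Check the three conditions for promotion to one target grade.

    lower_bound      -- lower bound of the target grade (Table I)
    next_lower_bound -- lower bound of the grade above the target grade
    games_needed     -- games above the lower bound (Table III)
    """
    midpoint = lower_bound + (next_lower_bound - lower_bound) / 2
    by_lower_bound = trailing_streak(ratings, lower_bound) >= games_needed
    by_midpoint = trailing_streak(ratings, midpoint) >= games_needed // 2
    by_next_grade = bool(ratings) and ratings[-1] >= next_lower_bound
    return by_lower_bound or by_midpoint or by_next_grade


# 2-kyu holding 1450 or more over 12 games: promotion to 1-kyu (1450 - 1599).
print(qualifies_for_promotion([1460] * 12, 1450, 1600, 12))
# 11-kyu holding 550 or more over 4 games: promotion to 10-kyu (500 - 599).
print(qualifies_for_promotion([480, 560, 555, 570, 565], 500, 600, 8))
```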

Although new Elo ratings are calculated upon completion of an "event", for purposes of promotion the ratings officers will keep track of exactly after which game a player

A) exceeded the lower-bound and mid-point ratings of the grades above his current grade, and

B) has maintained those ratings during the number of consecutive games listed in Table III (which earns him promotion).

For this reason it is very important that results of an event be reported to the ratings officers in the actual playing order. If the actual order in which games took place cannot possibly be determined, only then will it be assumed that rating points were gained at a constant rate, and an interpolation scheme will be used to determine after which game a player crossed the promotion threshold for the first time (A) or when he earned promotion by maintaining a rating above that threshold for enough consecutive games (B). The game 'thr' after which the threshold Elo(thr) was passed is given by:

thr = #games * ( Elo(thr) - Elo(0) ) / ( Elo(end) - Elo(0) ), rounded off to the nearest integer

where Elo(0) and Elo(end) are the Elo ratings at the beginning and end of the event, respectively, while #games is the number of games making up the event. This interpolation scheme should be avoided whenever possible, though.
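
A small sketch of the interpolation follows. The function name threshold_game is hypothetical, and rounding exact halves upward is an assumption, since the text only says "nearest integer".

```python
import math


def threshold_game(n_games, elo_start, elo_end, elo_threshold):
    """Game number after which the threshold is assumed to have been passed.

    Assumes rating points were gained at a constant rate during the event.
    """
    if elo_end == elo_start:
        raise ValueError("no rating change during the event")
    exact = n_games * (elo_threshold - elo_start) / (elo_end - elo_start)
    return math.floor(exact + 0.5)        # assumption: halves round up


# Ten games, rating rose from 1400 to 1500; the 1-kyu bound of 1450 is
# taken to have been passed after game 5.
print(threshold_game(10, 1400, 1500, 1450))  # 5
```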

As mentioned in section 6A, new Elo ratings can and should be calculated after each "event" or (mostly relevant for lower kyu players) after each club night. This makes it convenient for national associations and clubs to base promotions of their own players on Elo ratings. For an Elo system and an Elo-based promotion system to be conveniently used in tournaments and clubs alike, it would be advantageous if a pan-atlantic shogi pairing/Elo-calculation program became available for both PC and Macintosh. It is proposed that this option be considered seriously by FESA and SFA.

6E) Promotion to 5 and 6-dan

Nihon Shogi Renmei (NSR) had asked foreign shogi organizations not to award 5 or 6-dan promotions unless they were obtained in Japan. It is proposed that neither European nor American organizations will promote a player to 5 or 6-dan at this point in time, unless the promotion is ratified by NSR.

* If a player has had a rating of at least 2300 (5-dan lower bound) for 16 consecutive games and his rating was above 2400 (5-dan mid-point) at least once, NSR will be requested to ratify promotion to 5-dan.

* If a player has had a rating of at least 2500 (6-dan lower bound) for 16 consecutive games and his rating was above 2600 (6-dan mid-point) at least once, NSR will be requested to ratify promotion to 6-dan.