In a recent paper in the Proceedings, Bilalić et al. (2009) argued that different participation rates of men and women represent the key factor that has to be taken into account when the comparatively small number of women at the top level of certain intellectually demanding activities needs to be explained. Their conclusion was based on the results of an analysis of ratings of German chess players. According to Bilalić et al. (2009), 96 per cent of the observed differences in performance between the top 100 pairs of male and female players could be attributed to differential participation rates. The first purpose of this comment is to argue that their conclusion was premature and caused by an inappropriate statistical approach. The second purpose is to propose a more adequate method of analysis and to show that participation rates only explain two-thirds of the observed differences.
Bilalić et al. (2009) assumed that the ratings of German chess players are realizations of normally distributed random variables. They then approximated the expected ratings of the kth best male and the kth best female player. Fig. 2 of their paper shows the differences between these two values for k = 1, …, 100. What these authors did not mention, however, is that their model predicts a rating of 3031 for the best male German player and a rating above 2700 for the 16 best male German players. Currently, there are only 33 players in the world with a rating above 2700, and no German belongs to this elite group. The highest rating ever achieved by a human player is 2851, which is well below the rating of 3031 that the model of Bilalić et al. (2009) predicts for the best German player. Therefore, this model seems inadequate to describe the upper tail of the distribution of ratings of German chess players.
I will now describe an analytical approach that does not rely on the questionable assumption of a normal distribution for the rating. Assume there are $n_f$ female players and $n_m$ male players, and let $R_k$ denote the rank of the $k$th best female player in the ordered combined list of male and female players. Under the assumption that gender has no effect on rating performance, it follows that the distribution of $R_k$ is a negative hypergeometric distribution (Johnson & Kotz 1969), i.e.

$$P(R_k = s) = \frac{\binom{s-1}{k-1}\,\binom{n_m+n_f-s}{n_f-k}}{\binom{n_m+n_f}{n_f}} \qquad (1)$$

for $k \le s \le n_m + k$. The expected value is $\mathrm{E}R_k = k \cdot (n_m + n_f + 1)/(n_f + 1)$, and it is straightforward to calculate the 0.05 per cent quantile $r_{k,0.0005}$ and the 99.95 per cent quantile $r_{k,0.9995}$ of this distribution. If there is no gender effect on rating performance, then with probability of at least 99.9 per cent, the rank $R_k$ of the $k$th best female player would lie between $r_{k,0.0005}$ and $r_{k,0.9995}$. Figure 1 compares the observed rank $r_k$ of the $k$th best female player with its expected value and with the quantiles $r_{k,0.0005}$ and $r_{k,0.9995}$. The discrepancy is evident. With the exception of the best female German player, the observed ranks of the best 100 female players lie above their 99.9 per cent confidence intervals. Even for the best female player, the observed rank $r_1 = 87$ is considerably above her expected rank of $\mathrm{E}R_1 = 21$, although it is at least within the interval $[r_{1,0.0005}, r_{1,0.9995}] = [1, 157]$. For the 100th best female player, for example, the observed rank is 5505, whereas her expected rank is only 2116 and, assuming no gender effect, her rank lies between $r_{100,0.0005} = 1510$ and $r_{100,0.9995} = 2849$ with a probability of at least 99.9 per cent.
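The rank distribution described above can be evaluated exactly with integer arithmetic. The following is a minimal sketch, not the code used for this comment: the function names are illustrative, the probability mass function is the standard negative hypergeometric form implied by the stated support and expected value, and the population sizes are hypothetical placeholders rather than the German rating data.

```python
from fractions import Fraction
from math import comb


def rank_pmf(s, k, n_f, n_m):
    """P(R_k = s): chance that the k-th best woman has overall rank s
    when gender has no effect on rating performance."""
    if not (k <= s <= n_m + k):
        return Fraction(0)
    # k-1 women among the first s-1 ranks, a woman at rank s,
    # and the remaining n_f-k women among the lower ranks.
    favourable = comb(s - 1, k - 1) * comb(n_f + n_m - s, n_f - k)
    return Fraction(favourable, comb(n_f + n_m, n_f))


def expected_rank(k, n_f, n_m):
    """Closed-form expected value k (n_m + n_f + 1) / (n_f + 1)."""
    return Fraction(k * (n_f + n_m + 1), n_f + 1)


def rank_quantile(p, k, n_f, n_m):
    """Smallest rank s with P(R_k <= s) >= p."""
    cdf = Fraction(0)
    for s in range(k, n_m + k + 1):
        cdf += rank_pmf(s, k, n_f, n_m)
        if cdf >= p:
            return s
    return n_m + k


# Hypothetical population sizes, chosen only to keep the example fast.
n_f, n_m, k = 50, 1000, 5
support = range(k, n_m + k + 1)

total = sum(rank_pmf(s, k, n_f, n_m) for s in support)
mean = sum(s * rank_pmf(s, k, n_f, n_m) for s in support)
lower = rank_quantile(Fraction(5, 10000), k, n_f, n_m)
upper = rank_quantile(Fraction(9995, 10000), k, n_f, n_m)

print("probabilities sum to 1:", total == 1)
print("mean matches closed form:", mean == expected_rank(k, n_f, n_m))
print("99.9 per cent interval:", (lower, upper))
```

Exact fractions avoid rounding problems in the extreme tails, where the 0.05 and 99.95 per cent quantiles sit. For populations as large as a national rating list, a log-space or library implementation would probably be faster.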
Perfect agreement between the observed rank and the rank expected under the assumption that gender has no effect on rating performance would occur for the $k$th best female player if she possessed the rating $f_k$ of the round($\mathrm{E}R_k$)th best player in the combined list (where round($\mathrm{E}R_k$) denotes rounding $\mathrm{E}R_k$ to the nearest integer). Analogously, let $\mathrm{E}R^*_k$ denote the expected rank of the $k$th best male player and $m_k$ the rating of the round($\mathrm{E}R^*_k$)th best player in the combined list. Then, $d_k = m_k - f_k$ may be considered to represent that part of the rating difference between the $k$th best male and the $k$th best female player that can be attributed to differential participation rates of men and women. Figure 2 compares these differences $d_k$ with the differences between the actual ratings of the best 100 female and male players. Only between 41 and 71.1 per cent (mean value: 66.9 per cent) of the actual rating differences are explained by different participation rates of men and women, which is substantially lower than the 96 per cent obtained by Bilalić et al. (2009). The unexplained gap between the two curves varies between 99 and 170 rating points (mean value over 100 pairs: 124.5). If two players with a rating difference of 124.5 points compete in a match over 100 games, the expected result is 67 : 33 in favour of the higher rated player. Therefore, the conclusion of Bilalić et al. (2009) that ‘there is little left for biological or cultural explanations to account for’ appears to be premature.
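The two calculations in this paragraph can be sketched as follows. This is an illustration under stated assumptions, not the original analysis: the rating list is synthetic, the function names are hypothetical, and the expected match score uses the common logistic form of the Elo formula, which this comment does not explicitly name.

```python
from math import floor


def expected_rank(k, n_own, n_other):
    """Expected combined-list rank of the k-th best player of a group of
    size n_own, when the other group has n_other players and group
    membership has no effect on rating."""
    return k * (n_own + n_other + 1) / (n_own + 1)


def nearest_int(x):
    """Round half up (Python's built-in round() rounds halves to even)."""
    return floor(x + 0.5)


def participation_gap(ratings_desc, n_f, n_m, k):
    """d_k = m_k - f_k, read off the combined list sorted from best to worst."""
    f_k = ratings_desc[nearest_int(expected_rank(k, n_f, n_m)) - 1]
    m_k = ratings_desc[nearest_int(expected_rank(k, n_m, n_f)) - 1]
    return m_k - f_k


def elo_expected_score(rating_diff):
    """Expected score per game of the higher rated player
    (logistic Elo formula, assumed here)."""
    return 1 / (1 + 10 ** (-rating_diff / 400))


# Synthetic combined list: 1100 players rated 2600, 2599, ..., 1501.
ratings_desc = list(range(2600, 1500, -1))
n_f, n_m = 100, 1000  # hypothetical group sizes

d_1 = participation_gap(ratings_desc, n_f, n_m, k=1)
print("rating gap attributable to participation, k = 1:", d_1)

# Expected result of a 100-game match at a 124.5-point difference.
score = 100 * elo_expected_score(124.5)
print(f"expected match result: {score:.0f} : {100 - score:.0f}")
```

Under the logistic assumption, a difference of 124.5 points gives the higher rated player an expected score of roughly 0.67 per game, consistent with the 67 : 33 result quoted above.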
I am grateful to Karen Hirschmann for drawing my attention to the work of Bilalić et al. (2009).
- Received December 9, 2009.
- Accepted January 22, 2010.
- © 2010 The Royal Society