//
Uncategorized

# From Zermelo to Zuckerberg: the Method of Paired Comparisions

Calculating the strength of players in a competition has a long and interesting history. The renowned mathematician Ernst Zermelo introduced a simple probability model for game outcomes as a function of the players’ unknown strengths. The model assumes that games (such as chess) result in only wins and losses. Letting Ar and As be competitors r and s, the model is given by

P(Ar defeats As) = gr / (gr + gs)

where gr and gs are the unknown player strengths.

This model is often referred in the statistics literature as the Bradley-Terry model due to a well-cited paper by those authors. Computationally,  it is more convenient and common to work with a re-parameterized version of this model using logarithms.

Setting vr = ln gr for player Ar, the model can be rewritten as

P(Ar defeats As) = exp(vr)/ exp(vr) + exp(vs) = 1/ 1 + exp(vs – vr))

Of course the base of the exp/log functions that is used is not important, and the final probability function in this form is recognizable as the standard logistic cumulative distribution function evaluated at (vs – vr).

Viewed in this way, and in a modern context of having a large data set of pairwise comparisons, we can see that to apply the model we should fit data by a logistic regression, that is finding values for the differences (vs-vr) that best fits. Best fit in this context is often associated with maximum likelihood or least-squares fit. A least-squares fit can be obtained using an inverse power-method (see, JP Keener (1993)). Today, using the maximum likelihood estimate is often preferred, and was the original fitting described by Zermelo.

An interesting aside: a contemporary of Zermelo named Louis Leon Thurstone, a pioneer in the field of  psychometrics, derived an alternative method of paired comparison analysis called  the “law of comparative judgment” in which player strength differences were modeled by a cumulative normal distribution rather than a logistic distribution. Although conceptually different, it appears that in practice this difference is negligible.

Without loss of much generality we can focuses attention on estimating strengths in data sets (tournaments) that are irreducible (see note on irreducibility in last post). A data set is irreducible in this context if in every possible partition of players into two non-empty subsets, some player in the second set has defeated at least one player in the first set. This is just another way of stating a strong connectivity condition. Zermelo showed that with irreducible data that there is a unique solution (up to constant multipliers) and it is effectively computable. Today, statistical software solves this problem very quickly. Here is a very simple implementation of Zermelo’s maximum likelihood regression.

```
# Pairwise Comparison Strength Estimation based on Zermelo
# Estimates strength values gamma_i where model is prob(i beats j) = gamma_i / gamma_i + gamma_j
# Zermelo(1929) shows that the gamma sequence is monotone increasing function of the likelihood.

from numpy import *

def loglikelihood(W,gamma):
k = len(gamma);li=0
for i in range(k):
for j in range(k):
li+=W[i][j]*(log(gamma[i])-log(gamma[i]+gamma[j]))
return li

def normalize(v):
vmag = sum(v)
return array([ v[i]/vmag  for i in range(len(v)) ])

def EZUpdate(oldgam,Wins):
oldgam=normalize(oldgam)
k= len(oldgam)
newgam= zeros(k)
for i in range(k):
sumnbi=0
for j in range(k):
if i != j:
sumnbi += (Wins[i][j]+Wins[j][i]) / ( oldgam[i] + oldgam[j])
newgam[i]= sum(Wins[i])/sumnbi
return newgam

H =array([ [0,30,0],[1,0,1],[ 1,0,0 ]])
gamma=1./len(H) *ones(len(H))
print gamma, loglikelihood(H,gamma)
for k in range(10):
gamma=EZUpdate(gamma,H)
print gamma, loglikelihood(H,gamma)
```

Here is a run of the above implementation of Zermelo’s method on the graph H.

[ 0.33333333  0.33333333  0.33333333] -22.8738569585
[ 0.625       0.04166667  0.33333333] -7.96202160795
[ 0.63100137  0.04067797  0.26953125] -7.91614734819
[ 0.67438705  0.04303501  0.24514091] -7.89303807414
[ 0.70357453  0.04451092  0.22794655] -7.8815401825
[ 0.72300156  0.0454254   0.21597754] -7.87590861589
[ 0.73599084  0.04599938  0.20771761] -7.87318042316
[ 0.74472777  0.04636577  0.20203746] -7.8718688727
[ 0.7506368   0.04660359  0.19813622] -7.87124173068
[ 0.75465101  0.04676018  0.19545754] -7.87094296786
[ 0.75738721  0.04686449  0.19361814] -7.8708010086

We can explain the calculation of the likelihood as follows: given the parameters gamma[i], the probability that a graph H is realized in an experiment is (from binomial distribution) simply the product of the probability of each edge appearing with its correct orientation. This value is Prob(H | gamma[i]) = Product over all (i<j) (
(W[i][j] + W[j][i] choose W[i][j]) *
(gamma[i]/gamma[i] + gamma[j])^W[i][j] *
(gamma[j]/gamma[i] + gamma[j])^W[j][i]))

Again, the goal of Zermelos method is to find the gamma that maximizes this value. Note that the first term does not involve gammas, so it can be ignored. And since the log is a monotoniclly increasing function we can simplify the maximization by taking logs.
Thus we consider the problem of maximizing the Sum over all pairs (i,j) of W[i][j]*(log(gamma[i])-log(gamma[i]+gamma[j])).

The Social Network of Zuckerburg
There is a scene in the movie “The Social Network” in which the actors portray the building of a website that that is used to compare the attractiveness of pairs of pictures. The actors discuss how to calculate the overall strength or attractiveness of the competitors and refer to a formula that is the base 10 logistic function that comes from the Zermelo model we have examined above.