//
Uncategorized

# Demystifying the PageRank and HITS Algorithms

Here is just enough linear algebra to master network-based ranking methods.
The algorithms known as PageRank and HITS are the two most prominent examples of network-based ranking methods. These methods require only a little bit of mathematics to understand well.  The main underlying model is that the rank of any page is dependent on the number and location of backlinks (or in-links) to that page, and the importance or quality of the pages that contain those backlinks. Most explanations of how this is precisely computed uses the language of linear algebra and “eigenvalues”, which often obscures the relatively simple ideas underlying them.
Here is an attempt to make these algorithms more accessible by boiling down the math to the essential details. Let G be a directed graph. We think of the nodes of G as webpages and the directed arcs as hyperlinks. Let A(G)=A be the adjacency matrix where A[i,j] = 1 if (i,j) is an arc, is 0 otherwise. The ith row of A is the (characteristic) vector representing all the out-neighbors of the ith node, and the jth column of A is the vector representing the in-neighbors of the jth node. The transpose matrix A’ has the roles reversed, where the ith row of A’ is the vector representing all the in-neighbors of the ith node, and the jth column of A’ is the vector representing the out-neighbors of the jth node.Matrix multiplication can give simple algebraic representations of graph structures. For example, the square of the adjacency matrix A^2 has as its [i,j]th entry the number of 2-hop paths from ith node to jth node. And the product A’A has as its [i,j]th entry the number of nodes that are common in-neighbors to both the ith and jth node. Note how A’A is a symmetric matrix, likewise AA’ is also symmetric. Symmetric matrices have the property that the [i,j]th entry matches the [j,i]th entry for all pairs i,j.

Stochastic Matrices.
Let us place a probability distribution on the out-neighborhood of each node – that is, we give a positive weight (between 0 and 1) to each arc, with the property that the sum of weights out of each node is exactly 1. We call such a matrix a stochastic matrix. Here are two example graphs that we will use as running examples in this post. The smaller graph G1 has 4 nodes with edges (1,3), (2,3)(3,4) (4,1)(4,2),(4,3).
The larger graph G2 has 6 nodes and is drawn as follows.
Figure 1. Stochastic matrix obtained from a digraph where a uniform distribution is used at every node.

Markov Chains and PageRank.
Given a stochastic matrix P we consider each entry as a transition probability of a random walk, or equivalently, the transition probability matrix of the Markov chain. A stationary distribution of a Markov chain is a probability distribution d on the nodes, so that P’d =d, i.e., d is a fixed point under the action of the transition matrix. For such matrices we can apply the fundamental theorem of Markov chains, which says that, under certain conditions, stationary distributions always exist and are unique.
PageRank associates this unique stationary distribution with the ranking of the nodes of a web-graph. The proof of the existence of the stationary distribution or fixed point,  relies on a result called the Perron-Frobenius Theorem for non-negative, irreducible matrices, which essentially states that any such Markov chains has a left eigenvector of P with eigenvalue 1. We can simplify matters by considering the restriction to positive matrices, which indeed what PageRank does in practice.

The Perron-Frobenius Theorem for non-negative matrices:

Let A = [a(i,j)] be an n × n  matrix. If A has nonnegative entries (a(i,j) >= 0 for 1 ≤ i, j ≤ n), then there exists  positive real number r, such that r is an eigenvalue of A associated with an eigenvector with nonnegative entries. Futher, if A is irreducible (see note below), then there is an eigenvector with strictly positive entries associated with the largest eigenvalue r, and r is largest in absolute value among all eigenvalues (i.e, all other eigenvalues r’ are such that  |r’| < r).

An important consequence is that the dominant eigenvalue r cannot be a multiple root, and therefore in applications we can avoid any situation of having more roots than corresponding linearly independent eigenvectors.

Irreducibility in linear algebra is a fundamental notion related to underlying digraph being strongly connected (meaning there are paths between any pair of nodes).  A matrix is irreducible if it is not similar via a permutation to a block upper triangular matrix (that has more than one block of positive size. Replacing non-zero entries in the matrix by one, and viewing the matrix as the adjacency matrix of a directed graph, the matrix is irreducible if and only if the digraph is strongly connected. Similarly, a Markov chain is irreducible if there is a non-zero probability of transitioning (even if in more than one step) from any state to any other state. Informally speaking, we say a Markov chain is periodic if positive probability paths only occur in cyclic pattern. More formally, a Markov chain P is aperiodic if for all x, y we have gcd{t : P^t
(x, y) > 0} = 1. All becomes workable if the chain is irreducible and aperiodic. We can summarize the discussion above as follows.

Fundamental Theorem of Markov Chains: If a Markov chain P is irreducible and aperiodic then it has a unique stationary distribution π. This is the unique (normalized such that the entries sum to 1) left eigenvector of P with eigenvalue 1. Moreover, P^t(x, y) → π(y) as t → ∞ for all x, y.
The Algorithm for PageRank.

A simple consequence of the Perron-Frobenius Theorem for positive matrices and the Fundamental Theorem of Markov Chains  gives us an algorithm for computing the PageRank (the unique stationary distribution) for graphs to which it applies. Namely, by simply powering up or iteratively updating a distribution vector, starting from an arbitrary initial state (usually taken as the uniform distribution). Each iteration of this algorithm takes time O(|V|^2) but can be improved to O(|V| + |E|), and will often converge very quickly depending on the structure of the network.
```from numpy import *

def pagerank(H):
n= len(H)
rho = 1./n * ones(n)  # initial vector of pageranks
theta=0.85            # damping factor
G = (theta * H) + ((1-theta) * outer(1./n * ones(n), ones(n)))
print rho
for j in range(10):
rho = dot(rho,G)
print rho

#Testing
H1 = array([[0, 0, 1, 0], [0, 0, 1, 0],[0,0,0,1],[1./3,1./3,1./3,0]]);
pagerank(H1);

H2= array([[0, 1./2, 0, 0,1./2, 0], [0, 0, 1, 0, 0, 0],[ 1./3,1./3,0, 1./3, 0,0],
[ 1./2, 0, 0, 0, 1./2, 0], [1./2, 0, 0, 1./2, 0, 0],[ 0, 1./2,1./2,0, 0, 0]])
pagerank(H2)
```

Output:
[ 0.25 0.25 0.25 0.25]
[ 0.10833333 0.10833333 0.53333333 0.25 ]
[ 0.10833333 0.10833333 0.2925 0.49083333]
[ 0.17656944 0.17656944 0.36073611 0.286125 ]
[ 0.11856875 0.11856875 0.41873681 0.34412569]
[ 0.13500228 0.13500228 0.33656916 0.39342628]
[ 0.14897078 0.14897078 0.37847466 0.32358378]
[ 0.12918207 0.12918207 0.3824324 0.35920346]
[ 0.13927431 0.13927431 0.35888383 0.36256754]
[ 0.14022747 0.14022747 0.3769938 0.34255126]
[ 0.13455619 0.13455619 0.37294289 0.35794473]
[ 0.13891767 0.13891767 0.3676632 0.35450145]
[ 0.13794208 0.13794208 0.37410212 0.35001372]
[ 0.13667055 0.13667055 0.37117209 0.35548681]
[ 0.13822126 0.13822126 0.3705612 0.35299627]
[ 0.13751561 0.13751561 0.37249176 0.35247702]
[ 0.13736849 0.13736849 0.37114503 0.35411799]
[ 0.13783343 0.13783343 0.37135986 0.35297327]
[ 0.13750909 0.13750909 0.37182593 0.35315588]
[ 0.13756083 0.13756083 0.37132629 0.35355204]
[ 0.13767308 0.13767308 0.37152649 0.35312735]
[ 0.16666667 0.16666667 0.16666667 0.16666667 0.16666667 0.16666667]
[ 0.21388889 0.21388889 0.2375 0.14305556 0.16666667 0.025 ]
[ 0.22392361 0.19381944 0.21743056 0.163125 0.17670139 0.025 ]
[ 0.23103154 0.19239786 0.20037153 0.16170341 0.18949566 0.025 ]
[ 0.23103154 0.19058534 0.19916318 0.16230759 0.19191236 0.025 ]
[ 0.23197304 0.19024297 0.19762254 0.16299232 0.19216913 0.025 ]
[ 0.23193667 0.1902066 0.19733153 0.16266493 0.19286028 0.025 ]
[ 0.23200881 0.19010868 0.19730061 0.16287622 0.19270568 0.025 ]
[ 0.23202414 0.19013058 0.19721738 0.16280175 0.19282614 0.025 ]
[ 0.23202011 0.19011352 0.197236 0.16282937 0.19280101 0.025 ]
[ 0.23202644 0.19011708 0.19722149 0.16282396 0.19281103 0.025 ]
[ 0.23202429 0.19011566 0.19722452 0.16282411 0.19281142 0.025 ]
[ 0.23202538 0.1901156 0.19722331 0.16282513 0.19281057 0.025 ]
[ 0.23202511 0.19011572 0.19722326 0.16282443 0.19281147 0.025 ]
[ 0.23202518 0.1901156 0.19722337 0.1628248 0.19281106 0.025 ]
[ 0.23202519 0.19011566 0.19722326 0.16282465 0.19281124 0.025 ]
[ 0.23202518 0.19011563 0.19722331 0.1628247 0.19281118 0.025 ]
[ 0.23202519 0.19011564 0.19722329 0.16282469 0.1928112 0.025 ]
[ 0.23202518 0.19011564 0.19722329 0.16282469 0.1928112 0.025 ]
[ 0.23202519 0.19011564 0.19722329 0.16282469 0.1928112 0.025 ]
[ 0.23202518 0.19011564 0.19722329 0.16282469 0.1928112 0.025 ]

So the iteration gives about 2 decimal place accuracy after 10 iteration steps.
As it turns out the larger graph has faster convergence properties compared to the smaller graph.

Determinants and Eigenvalues of Matrices

A determinant is a function that associates a scalar value to a square matrix. The value of the determinant can tell you if the matrix has an inverse or not. A matrix with a nonzero determinant is called regular or an invertible matrix (and, we can calculate its inverse matrix). If the determinant is zero the matrix is called a singular matrix and it does not have an inverse.Formally, a determinant is computed by a sum of products over all permutations of numbers 1 to n, and therefore has n! terms in the sum. Determinants over 2 by 2 matrices, that is where n = 2, have only two terms (a11 * a22) – (a12 * a21) and are simple to calculate by hand. For larger values n>2, the number of terms start to get out of hand.
All regular, square matrices can be transformed into singular matrices by subtracting common value from the diagonal entries of the matrix. The problem of transforming a regular matrix into a singular matrix is referred to as the eigenvalue problem. The Eigenvalue Problem is determined by subtracting c*I from A and setting the determinant to 0, and then solving for c. Given a solution c to the eigenvalue problem we can find a vector (a class of vectors) X such that AX=cX. Any such X is called the associated eigenvector for eigenvalue c.
One of the simplest methods for finding the largest magnitude eigenvalue and an associated eigenvector is called the Power Method. Given a matrix A and an initial vector b0 that is a linear combination (that is, a weighted sum) of eigenvectors, we can see that by repeatedly multiplying each vector bi by A, we are reinforcing (the direction) of the eigenvector with the largest magnitude eigenvalue (relative to the direction of the other eigenvectors in the sum). Hence, the power method is applicable in cases where the the following technical conditions hold:
1) A has an eigenvalue that is strictly greater in magnitude than its other eigenvalues, and
2) The starting vector b0 has a nonzero component in the direction of an eigenvector associated with the dominant eigenvalue, i.e., b0 is not orthogonal to the eigenvector.

The Power Method for Principal Eigenvector
.
The power method algorithm starts with an initial vector b0, and is described by the following iteration to compute successive vectors:
for k = 0,1,2,3,…
b(k+1) = A(b(k)) / || A(b(k))||
Convergence of the power method is guaranteed when above conditions 1)  and 2) hold.
The HITS Algorithm for Ranking Webpages.
The HITS algorithm is another conceptually simply link-based ranking algorithm based on counting votes and computing iteratively using the power method, which posits two parameters for each webpage: an authority value a_i and a hub value h_i. Authority values are simply the sum of the in-neighbor hub values, and similarly the hub values are the sum of the out-neighbor authority values. So hubs and authority values are mutually dependent. In matrix form we have the following two matrix-vector equations:
a= A’h and h = Aa
Plugging back in for h and a leads to the following equations yielding an pair of eigenvector problems:
a= A’Aa  and  h = AA’h

Looking at these equations we can see that we seek eigenvectors associated with eigenvalue 1 for each of the pair of matrices A’A and AA’. We cannot in general assume that such eigenvalue/eigenvector pairs exists, but we can find other eigenvectors associated with the largest eigenvalue using the normalized Power Method described above, so long as technical conditions are satisfied. To wit, as noted both A’A and AA’ are positive, symmetric matrices. Such matrices have the property that all the eigenvalues are positive reals, and the associated eigenvectors form a basis, that is every vector is a linear combination of eigenvectors. Hence, we can be assured that the HITS algorithm will converge to the (normalized) sum of eigenvectors associated with the largest eigenvalue (for which the starting vectors a_0 and h_0 have non-zero component). We are in good shape if these vectors are real-valued, so that we can interpret the components as ranking values. The theory says that as long as the matrices in question are “irreducible” then we are in business.
Here is Python code for HITS algorithm:
```
def hits(A):
n= len(A)
Au= dot(transpose(A),A)
Hu = dot(A,transpose(A))
a = ones(n); h = ones(n)
print a,h
for j in range(5):
a = dot(a,Au)
a= a/sum(a)
h = dot(h,Hu)
h = h/ sum(h)
print a,h

```
Here are the results of HITS algorithm with the small graph G1:
>>>A1=array([[0, 0, 1, 0], [0, 0, 1, 0],[0,0,0,1],[1,1,1,0]]);
>>>hits(A1)

[ 1.  1.  1.  1.] [ 1.  1.  1.  1.]
[ 0.25        0.25        0.41666667  0.08333333] [ 0.25        0.25        0.08333333  0.41666667]
[ 0.25        0.25        0.47727273  0.02272727] [ 0.25        0.25        0.02272727  0.47727273]
[ 0.25        0.25        0.49418605  0.00581395] [ 0.25        0.25        0.00581395  0.49418605]
[ 0.25        0.25        0.49853801  0.00146199] [ 0.25        0.25        0.00146199  0.49853801]

Here are the results of HITS algorithm with the larger graph G2:
>>>A2 = array([[0, 1, 0, 0,1, 0], [0, 0, 1, 0, 0, 0],[ 1,1,0, 1, 0,0],  [1,0,0,0,1,0], [1,0, 0,1, 0, 0],[0,1,1,0,0,0]]);
>>>hits(A2)

[ 1.  1.  1.  1.  1.  1.] [ 1.  1.  1.  1.  1.  1.]
[ 0.26923077  0.26923077  0.11538462  0.19230769  0.15384615  0. ] [ 0.16666667  0.06666667  0.26666667  0.16666667  0.16666667  0.16666667]
[ 0.28378378  0.27027027  0.08783784  0.20945946  0.14864865  0. ] [ 0.16666667  0.04166667  0.29166667  0.16666667  0.18452381  0.14880952]
[ 0.29205607  0.26635514  0.0771028   0.21728972  0.14719626  0. ] [ 0.16356108  0.03312629  0.30020704  0.16977226  0.19461698  0.13871636]
[ 0.29650462  0.26355966  0.0723182   0.22097228  0.14664524  0. ] [ 0.16131335  0.0296217   0.30371163  0.17201999  0.19985724  0.13347609]
[ 0.29880066  0.26199338  0.07003033  0.22277364  0.14640199  0.  ] [ 0.16004659  0.02801275  0.30532058  0.17328675  0.20252544  0.1308079 ]