## Description

Optimization for Machine Learning

EPFL

Martin Jaggi & Nicolas Flammarion github.com/epfml/OptML course

(Proximal Gradient and Subgradient Descent)

Proximal Gradient and Subgradient Descent

Solve Exercises 26, 27, 28, 29 from the lecture notes.

Random Walks

Gradient descent turns up in a surprising number of situations which apriori have nothing to do with optimization. In this exercise, we will see how performing a random walk on a graph can be seen as a special case of gradient descent.

We are given an undirected graph G(V,E) with vertices V = [n] labelled 1 through n, and edges E ⊆ [n]2 such that if (i,j) ∈ E, then (j,i) ∈ E. Further, we assume that the graph is regular in the sense that every edge has the same degree. Let d be the degree of each node such that if we denote N(i) = {j : (i,j) ∈ E} to be the neighbors of i, then |N(i)| = d. We assume that every node is connected to itself and so (i,i) ∈ N(i).

Now we start our random walk from node 1, jumping randomly from a node to its neighbor. More precisely, suppose at time step t we are at node it. Then it+1 is picked uniformly at random from N(i). If we run this random walk for a large enough T steps, we expect that Pr(iT = j) = 1/n for any j ∈ [n]. This is called the stationary distribution.

Problem A. Let us represent the position at time step t in the graph with eit ∈ Rn where the itth coordinate is 1 and all others are 0. Then, the vector xt = E[eit] denotes the probability distribtion over the n nodes of the graph. Further, let us denote G ∈ Rn×n be the transition probability matrix such that

if (i,j) ∈ E Gi,j =

otherwise.

Show that

xt+1 = Gxt (1)

Problem B. Simulate the random walk above over a torus and confirm that we indeed converge to a uniform distribution over the nodes. What is the rate at which this convergence occurs?

Follow the Python notebook provided here: colab.research.google.com/github/epfml/OptML course/blob/master/labs/ex04/template/notebook lab04.ipynb

Problem C. Define µ := be a vector of all 1/n, and a objective function f : S → R as

f(x) = (x − µ)>(I − G)(x − µ),

defined over the subspace S ⊆ Rn where .

1. Show that f defined above is convex and compute its smoothness constant.

2. Show that running gradient descent on f with the correct step-size is equivalent to the random walk step

(1).

3. Prove that xt converges to the distribution µ at a linear rate i.e. for the random walk on a torus with n nodes,

.

Hint: Use that the two largest eigenvalues of G are 1 and 1− n1 . Also Gµ = µ and so µ is the eigenvector corresponding to eigenvalue 1.

2

## Reviews

There are no reviews yet.