CS439 – Labs (Solution)

Optimization for Machine Learning
EPFL
Martin Jaggi & Nicolas Flammarion, github.com/epfml/OptML_course

Proximal Gradient and Subgradient Descent
Solve Exercises 26, 27, 28, 29 from the lecture notes.
Random Walks
Gradient descent turns up in a surprising number of situations which a priori have nothing to do with optimization. In this exercise, we will see how performing a random walk on a graph can be seen as a special case of gradient descent.
We are given an undirected graph G(V, E) with vertices V = [n] labelled 1 through n, and edges E ⊆ [n] × [n] such that if (i, j) ∈ E, then (j, i) ∈ E. Further, we assume that the graph is regular in the sense that every node has the same degree. Let d be the degree of each node, i.e. if we denote by N(i) = {j : (i, j) ∈ E} the neighbors of i, then |N(i)| = d. We assume that every node is connected to itself, so (i, i) ∈ E and hence i ∈ N(i).
Now we start our random walk from node 1, jumping randomly from a node to one of its neighbors. More precisely, suppose at time step t we are at node i_t. Then i_{t+1} is picked uniformly at random from N(i_t). If we run this random walk for a large enough number of steps T, we expect that Pr(i_T = j) = 1/n for any j ∈ [n]. This is called the stationary distribution.
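As a quick illustration of this step rule, here is a minimal Python sketch. The toy graph (a 4-cycle with self-loops) and all variable names are assumptions made for illustration, not part of the lab; the sketch runs the walk and checks that the empirical visit frequencies approach 1/n.

import numpy as np

# Toy 4-cycle with self-loops: every node has degree d = 3 (itself + 2 neighbours).
neighbors = {0: [0, 1, 3], 1: [0, 1, 2], 2: [1, 2, 3], 3: [0, 2, 3]}
n, T = 4, 100_000
rng = np.random.default_rng(0)

i = 0                              # start the walk at node 1 (0-based index 0)
visits = np.zeros(n)
for _ in range(T):
    i = rng.choice(neighbors[i])   # i_{t+1} is uniform over N(i_t)
    visits[i] += 1
print(visits / T)                  # each entry should be close to 1/n = 0.25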
Problem A. Let us represent the position at time step t in the graph by e_{i_t} ∈ R^n, the vector whose i_t-th coordinate is 1 and all others are 0. Then the vector x_t = E[e_{i_t}] denotes the probability distribution over the n nodes of the graph. Further, let G ∈ R^{n×n} denote the transition probability matrix such that
G_{i,j} = 1/d if (i, j) ∈ E, and G_{i,j} = 0 otherwise.
Show that
x_{t+1} = G x_t.    (1)
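A quick numerical sanity check of this recursion, again on the illustrative 4-cycle with self-loops (none of these names come from the lab): compare G x_0 for x_0 = e_1 against a Monte Carlo estimate of the distribution of i_1.

import numpy as np

# Edge set of the 4-cycle with self-loops, and the transition matrix
# G_{i,j} = 1/d if (i,j) in E and 0 otherwise.
n, d = 4, 3
E = {(i, j) for i in range(n) for j in (i, (i - 1) % n, (i + 1) % n)}
G = np.array([[1.0 / d if (i, j) in E else 0.0 for j in range(n)] for i in range(n)])

x0 = np.zeros(n); x0[0] = 1.0          # x_0 = e_1: the walk starts at node 1
x1 = G @ x0                            # the distribution of i_1 claimed by (1)

# Monte Carlo estimate of Pr(i_1 = j) for comparison.
rng = np.random.default_rng(0)
nbrs = [j for j in range(n) if (0, j) in E]     # N(1) in 0-based indexing
samples = rng.choice(nbrs, size=100_000)
empirical = np.bincount(samples, minlength=n) / len(samples)
print(np.round(x1, 3))                 # approximately [1/3, 1/3, 0, 1/3]
print(np.round(empirical, 3))          # should roughly agree with x1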
Problem B. Simulate the random walk above over a torus and confirm that we indeed converge to a uniform distribution over the nodes. What is the rate at which this convergence occurs?
Follow the Python notebook provided here: colab.research.google.com/github/epfml/OptML_course/blob/master/labs/ex04/template/notebook_lab04.ipynb
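Independently of that template, a rough self-contained sketch of the experiment could look like the following. The torus size k, the self-loop convention (d = 5), and all names are assumptions of this sketch.

import numpy as np

# Transition matrix of a k x k torus where each node is linked to its four grid
# neighbours (with wrap-around) and to itself, so d = 5.
k = 10
n = k * k
d = 5

def node(r, c):
    return (r % k) * k + (c % k)

G = np.zeros((n, n))
for r in range(k):
    for c in range(k):
        i = node(r, c)
        for j in (i, node(r + 1, c), node(r - 1, c), node(r, c + 1), node(r, c - 1)):
            G[i, j] = 1.0 / d

mu = np.full(n, 1.0 / n)               # uniform distribution over the nodes
x = np.zeros(n); x[0] = 1.0            # start the walk at node 1
for t in range(1, 501):
    x = G @ x                          # one step of the recursion (1)
    if t % 100 == 0:
        print(f"t = {t:3d}   ||x_t - mu|| = {np.linalg.norm(x - mu):.3e}")
# The error shrinks geometrically, i.e. the convergence rate is linear.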
Problem C. Define µ := (1/n)·1 ∈ R^n, the vector with all entries equal to 1/n, and an objective function f : S → R as
f(x) = (x − µ)^T (I − G)(x − µ),
defined over the subspace S ⊆ R^n where S := {x ∈ R^n : Σ_i x_i = 1}.
1. Show that f defined above is convex and compute its smoothness constant.
2. Show that running gradient descent on f with the correct step-size is equivalent to the random walk step (1).
3. Prove that x_t converges to the distribution µ at a linear rate, i.e. for the random walk on a torus with n nodes,
‖x_t − µ‖ ≤ (1 − 1/n)^t ‖x_0 − µ‖.
Hint: Use that the two largest eigenvalues of G are 1 and 1 − 1/n. Also Gµ = µ, so µ is the eigenvector corresponding to eigenvalue 1.
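As a rough numerical check of parts 2 and 3, here is a sketch on a cycle with self-loops (a stand-in for the torus). The step size γ = 1/2 is implied by the gradient ∇f(x) = 2(I − G)(x − µ) for symmetric G; the graph and all names are illustrative assumptions rather than part of the lab statement.

import numpy as np

# Cycle with self-loops: a small regular graph on which G is symmetric and
# doubly stochastic, so grad f(x) = 2 (I - G)(x - mu) and a gradient step with
# gamma = 1/2 reduces to x_{t+1} = G x_t because G mu = mu.
n, d = 20, 3
G = np.zeros((n, n))
for i in range(n):
    for j in (i, (i - 1) % n, (i + 1) % n):
        G[i, j] = 1.0 / d
mu = np.full(n, 1.0 / n)

def gd_step(x, gamma=0.5):
    grad = 2.0 * (np.eye(n) - G) @ (x - mu)
    return x - gamma * grad

x0 = np.zeros(n); x0[0] = 1.0
x_gd, x_rw = x0.copy(), x0.copy()
for _ in range(50):
    x_gd = gd_step(x_gd)               # gradient descent on f
    x_rw = G @ x_rw                    # random-walk update (1)
print("max |GD - random walk|:", np.abs(x_gd - x_rw).max())   # ~ 0 up to round-off

# Linear rate: asymptotically ||x_t - mu|| contracts by the second largest
# eigenvalue of G per step.
x, errs = x0.copy(), []
for _ in range(300):
    x = G @ x
    errs.append(np.linalg.norm(x - mu))
print("empirical contraction  :", errs[-1] / errs[-2])
print("second eigenvalue of G :", np.sort(np.linalg.eigvalsh(G))[-2])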
