

Profs. Martin Jaggi and Nicolas Flammarion

For your examination, preferably print documents compiled from automultiple-choice.


First part, multiple choice

There is exactly one correct answer per question.

Convexity

Question 1 Definition: for a (not necessarily convex) function f : R² → R, an α-level curve (also known as a contour line) is the set {x ∈ R² : f(x) = α}, where α ∈ R. We can represent a function by drawing its level curves for several values of α. In Figure 1 (a), for example, each heart is an α-level curve of a certain function f : R² → R, for different values of α ∈ R. The two perpendicular lines with arrows at the ends are the axes of the plot, not level curves.

Which of the four plots in Figure 1 could correspond to the level curves of a convex function f : R² → R?

B and C.

A and D.

A and C.

A and B.

B and D.

C and D.

(a) Level lines A (b) Level lines B (c) Level lines C (d) Level lines D

Figure 1: Several level lines for four different functions f : R2 →R

Question 2 Assume we perform constant step-size stochastic gradient descent on f := (f1 + f2)/2, where f1(x) = (x − 1)² and f2(x) = (x + 1)² for x ∈ R, i.e. x_{t+1} = x_t − γ ∇f_{i_t}(x_t), where at each iteration i_t is chosen uniformly at random in {1, 2}. Which of the following statements is false:

For γ = 1, we cannot guarantee that the iterates stay in a bounded set.

x = 0 is the global minimum of f.

Whatever the choice of constant step-size γ > 0, the iterates cannot converge as t goes to infinity.

For γ = 2, for any starting point x_0 and after the first iteration, the iterates will belong to {−1, +1}.
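Though not part of the exam, this setup is easy to probe numerically. The sketch below is a reader's illustration, assuming (as the statements suggest) that f = (f1 + f2)/2 and that SGD samples i_t uniformly at each step:

```python
import random

def sgd(x0, gamma, steps, seed=0):
    """Constant step-size SGD on f = (f1 + f2)/2 with f1(x) = (x-1)^2, f2(x) = (x+1)^2."""
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        sign = rng.choice([-1.0, 1.0])   # sign = -1 picks f1, sign = +1 picks f2
        grad = 2.0 * (x + sign)          # grad f1(x) = 2(x-1), grad f2(x) = 2(x+1)
        x = x - gamma * grad
    return x
```

One can check, for instance, that with γ = 1/2 a single update gives x − (x ∓ 1) = ±1 from any starting point, while with γ = 2 the update map is x ↦ −3x ± 4; running the sketch for various γ makes the four statements easy to test empirically.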

Question 3 Considering the same setup as in the question above, but now assuming f1(x) = x² and f2(x) = e^x for x ∈ R. After running T steps of SGD, we find that ∇f_{i_T}(x_T) = 0. Which of the following statements is true:

x_T is a local minimum but not a global minimum.

x_T = 0.

x_T is a global (and local) minimum.

None of the other choices.
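The key observation can be written out as a short derivation (a reader's note, not part of the exam; it assumes, as in the previous question, that f = (f1 + f2)/2):

```latex
\nabla f_2(x) = e^x > 0 \ \ \forall x \in \mathbb{R}
\quad\Longrightarrow\quad i_T = 1 \ \text{ and } \ \nabla f_1(x_T) = 2 x_T = 0
\quad\Longrightarrow\quad x_T = 0 .
```

Yet ∇f(0) = (0 + e⁰)/2 = 1/2 ≠ 0, so x_T = 0 is not even a stationary point of f.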


Question 4 We are given a function f : R^d → R that we want to minimize. We assume that at each iteration t, a stochastic oracle provides us with a stochastic gradient g(x_t) to run our SGD algorithm: x_{t+1} = x_t − γ g(x_t). We consider the following two stochastic oracles:

g_A(x) := 3∇f(x) with probability …, or ∇f(x) with probability …;

g_B(x) := ε ∼ N(0, 1) with probability …, or −0.5∇f(x) with probability ….

Which statement is true?

Oracle A and B are both biased.

Oracle A and B are both unbiased.
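An oracle is unbiased exactly when E[g(x)] = ∇f(x). The exam's probabilities were lost in extraction, so the sketch below uses hypothetical 0.5 / 0.5 probabilities purely to illustrate the check (a reader's note, not the official values):

```python
def expected_multiplier(outcomes):
    """E[g(x)] expressed as a multiple of grad f(x), for outcomes = [(prob, multiplier), ...]."""
    assert abs(sum(p for p, _ in outcomes) - 1.0) < 1e-12  # probabilities must sum to 1
    return sum(p * m for p, m in outcomes)

# Hypothetical probabilities (0.5 / 0.5) -- the exam's actual values are lost.
m_A = expected_multiplier([(0.5, 3.0), (0.5, 1.0)])    # E[g_A] = 2.0 * grad f(x)
m_B = expected_multiplier([(0.5, 0.0), (0.5, -0.5)])   # N(0,1) noise has mean 0
```

The oracle is unbiased iff the resulting multiplier equals 1; with these illustrative probabilities, both multipliers differ from 1 (2.0 and −0.25), so both oracles would be biased.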

Question 5 The sequence of iterates (a_t, b_t)_{t∈N}, started from (a_0, b_0), will converge to:

(a⋆,b⋆) = (0,1).

(a⋆,b⋆) = (1,0).

(a⋆,b⋆) = (0.5,0.5).

(a⋆,b⋆) = (−1,2).

Hint: make a drawing.

Smoothness and gradient descent

Question 6


Question 7 Define f(x) := x4 with domain Df := [−2,2]. Assume we want to find a point xT with f(xT) ≤ ε starting from x0 ∈ Df. Among the following statements, which is true and provides the tightest bound?

Using Nesterov acceleration and an appropriate step size, we have …, since f is smooth and convex over D_f.

…, since f is smooth and convex over D_f.

…, since f is smooth and strongly convex over D_f.
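For context, the standard convergence rates in this regime are the following (a reference sketch; constants vary slightly between sources and these are not the exam's exact bounds):

```latex
\text{GD, } \gamma = \tfrac{1}{L}: \quad f(x_T) - f^\star \;\le\; \frac{L\,\|x_0 - x^\star\|^2}{2T},
\qquad
\text{Nesterov}: \quad f(x_T) - f^\star \;\le\; \frac{2L\,\|x_0 - x^\star\|^2}{(T+1)^2}.
```

On D_f = [−2, 2], f(x) = x⁴ is L-smooth with L = max |f''| = 48 and convex, but not strongly convex, since f''(0) = 0.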

Question 8

A: To get a target error of ε = 10^…, …

B: To get a target error of ε = 10^…, …

Only A is true.

Both A and B are true.

Neither A nor B is true.

Only B is true.

Projected Gradient Descent

Question 9

Which of the following properties is false?

None of the other choices.


Subgradient Descent

Question 10 Consider the function f(x) = 4x − x² for x ∈ R. The gradient and subgradient at x = 2 are, respectively:

0, 0.

0, doesn't exist.

2, 2.
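A worked check (reader's note): the derivative at x = 2 is f'(2) = 4 − 2·2 = 0, but f is concave, so no subgradient exists there. Indeed, a subgradient g would need f(y) ≥ f(2) + g(y − 2) for all y, i.e.

```latex
f(y) - f(2) \;=\; 4y - y^2 - 4 \;=\; -(y-2)^2
\quad\Longrightarrow\quad
-(y-2)^2 \;\ge\; g\,(y-2) \quad \forall y \in \mathbb{R},
```

which fails for every g ∈ R: the left-hand side decreases quadratically while the right-hand side is linear.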

Frank-Wolfe

Question 11


Newton’s Method and Quasi-Newton

Question 12


Newton’s method converges to 0 on f(x) faster than on g(x).

Newton’s method converges to 0 on g(x) faster than on f(x).


Second part, true/false questions

Question 13 (Convexity) If a function is strictly convex, then it is also strongly convex.
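A standard counterexample to keep in mind (reader's note): f(x) = x⁴ is strictly convex, yet

```latex
f''(x) = 12x^2, \qquad f''(0) = 0,
```

so no μ > 0 can lower-bound the curvature everywhere, and strict convexity therefore does not imply strong convexity.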

Question 14

Question 15 and running gradient descent with stepsize γ :=

Question 16 … if and only if ∇²f(x) is diagonal for all x ∈ R^d.

Question 17

Question 18 d is an identity mapping.

Question 19 (Lasso) For any vector v ∈ R^d, the projection of v onto the ℓ1-ball B, i.e. B = {x : ‖x‖₁ ≤ 1}, always lies on the boundary of the ℓ1-ball.

TRUE FALSE
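This claim is easy to probe numerically with a standard sort-based ℓ1 projection (a sketch of the algorithm of Duchi et al.; a reader's illustration, not part of the exam). Any v with ‖v‖₁ ≤ 1 is its own projection and thus need not lie on the boundary:

```python
import numpy as np

def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto {x : ||x||_1 <= radius} via sorting."""
    v = np.asarray(v, dtype=float)
    if np.abs(v).sum() <= radius:
        return v.copy()                        # interior points are fixed points
    u = np.sort(np.abs(v))[::-1]               # magnitudes in descending order
    cumsum = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = np.max(k[u * k > cumsum - radius])   # largest index still active
    theta = (cumsum[rho - 1] - radius) / rho   # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)
```

Points outside the ball land exactly on the boundary (‖·‖₁ = 1), while interior points are returned unchanged.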


Question 20 (Frank-Wolfe) Consider the two constrained optimization problems min … with initial iterate x_0 = [1, 1]⊤, and min … with initial iterate x_0 = [10, 1]⊤. After any number of iterations of the Frank-Wolfe algorithm, the optimization error for these two problems will be the same.


Third part, open questions

Answer in the space provided! Your answer must be justified with all steps. Do not cross any checkboxes; they are reserved for grading.

In the whole exercise, we consider a convex and differentiable function f : R^d → R and denote by x⋆ ∈ R^d a minimizer of f.


Question 24: 1 point. Prove finally the following inequalities for x ∈ R^d:



The Polyak stepsize rule

In this section, we consider the iterates of the gradient descent algorithm on the function f,

x_{t+1} = x_t − γ_t ∇f(x_t),    (1)

with a stepsize sequence (γ_t)_{t≥0} to be defined.


This stepsize is called the Polyak stepsize. From now on, we consider the iterates of gradient descent defined in Eq. (1), where the sequence (γ_t) is defined in the previous question.

In the following sections, we study the rates of convergence of the gradient descent algorithm with this stepsize. We denote by x̄_T the iterate which satisfies f(x̄_T) = min_{0 ≤ t ≤ T−1} f(x_t).
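A minimal sketch of gradient descent with the Polyak stepsize, γ_t = (f(x_t) − f(x⋆))/‖∇f(x_t)‖² (its standard form, which assumes f(x⋆) is known), run on a simple quadratic as a sanity check; a reader's illustration, not part of the exam:

```python
import numpy as np

def polyak_gd(f, grad, f_star, x0, steps):
    """Gradient descent with Polyak stepsize gamma_t = (f(x_t) - f*) / ||grad f(x_t)||^2."""
    x = np.asarray(x0, dtype=float)
    best = f(x)                          # tracks min_{0 <= t <= T-1} f(x_t)
    for _ in range(steps):
        g = grad(x)
        gap = f(x) - f_star
        if gap <= 0.0 or not np.any(g):
            break                        # already optimal
        gamma = gap / float(g @ g)       # Polyak stepsize
        x = x - gamma * g
        best = min(best, f(x))
    return x, best

# Example: f(x) = 0.5 ||x||^2 with f* = 0; the Polyak step is then exactly 1/2.
x_final, f_best = polyak_gd(lambda x: 0.5 * x @ x, lambda x: x, 0.0,
                            np.array([4.0, -2.0]), steps=20)
```

On this quadratic each update halves the iterate, so f(x_t) decays geometrically.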

Question 27: 1 point. Show that the iterates (x_t) defined with the Polyak stepsize satisfy, for t ≥ 0:

‖x_{t+1} − x⋆‖² ≤ ‖x_t − x⋆‖² − (f(x_t) − f(x⋆))² / ‖∇f(x_t)‖².


Analysis under bounded gradients assumption

We assume in this section that the function f has bounded gradients, i.e., there exists B ≥ 0 such that ‖∇f(x)‖ ≤ B for all x ∈ R^d.

Question 28: 3 points. Let T ≥ 1. Show the following inequality:


For t ≥ 0, let us denote by β_t := … . We have then proven that:

β_{t+1} ≤ β_t (1 − β_t).
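This recursion contracts at a 1/t rate: since 1/(b(1−b)) = 1/b + 1/(1−b) ≥ 1/b + 1 for b ∈ (0, 1), induction gives β_t ≤ 1/(1/β_0 + t). A numeric sanity check of the worst (equality) case, as a reader's note:

```python
def beta_sequence(beta0, steps):
    """Iterate the worst case beta_{t+1} = beta_t * (1 - beta_t)."""
    betas = [beta0]
    for _ in range(steps):
        b = betas[-1]
        betas.append(b * (1.0 - b))
    return betas

bs = beta_sequence(0.9, 100)
# Every beta_t stays below the 1/(1/beta_0 + t) envelope derived above.
ok = all(b <= 1.0 / (1.0 / 0.9 + t) + 1e-12 for t, b in enumerate(bs))
```

The same 1/T envelope is what yields the sublinear convergence rate in the questions that follow.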


Question 31: 2 points. Let T ≥ 2 be an even number. Show that:

Although not proven here, a bound similar to the one proved in Question 32 also holds for any odd number T.


Analysis under smoothness assumption

We assume in this section that the function f is L-smooth.

Question 33: 2 points. Let T ≥ 1. Show that

Conclusion

Your results imply that

