CS439 – (Solution)


Profs. Martin Jaggi and Nicolas Flammarion
First part, multiple choice
There is exactly one correct answer per question.
Smoothness and gradient descent
Question 1 Let us define f : x ∈ R ↦ cos(x). We consider x_t ∈ R and x_{t+1} = x_t − ∇f(x_t). Assume that x_t is not a critical point of f. Which one of the following statements is true:
There exists an x_t such that f(x_{t+1}) > f(x_t)
None of the above
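
As a quick numerical illustration (not part of the original exam), the sketch below runs the update x_{t+1} = x_t − ∇f(x_t) for f(x) = cos(x) from a few non-critical starting points and checks that the function value never increases; since |cos″(x)| ≤ 1, f is 1-smooth and the unit step size equals 1/L.

```python
import math

def f(x):
    return math.cos(x)

def grad_f(x):
    return -math.sin(x)

# Run x_{t+1} = x_t - grad f(x_t) from several non-critical starting points
# and verify that cos(x_t) is non-increasing (1-smoothness + step size 1/L).
for x0 in [0.1, 1.0, 2.5, -3.0]:
    x = x0
    prev = f(x)
    for t in range(50):
        x = x - grad_f(x)
        assert f(x) <= prev + 1e-12, "descent lemma violated?"
        prev = f(x)
    print(f"x0 = {x0:5.2f}  ->  x_50 = {x:8.5f},  f(x_50) = {f(x):.6f}")
```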
Question 2 Assume you want to minimize a function f(x) = (1/n) ∑_{i=1}^n f_i(x), where for each i, f_i is convex and L-smooth over R^d. Which of the following statements is false:
If I use a constant step-size γ < 1/L, then GD will converge but not SGD.
If n = 1, then SGD and GD correspond to the same recursion.
If n is very big then gradient descent can be computationally infeasible.
SGD corresponds to x_{t+1} = x_t − γ_t ∇f_{i_t}(x_t) where i_t is the remainder of t divided by n.
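
To make the finite-sum setting concrete, here is a toy sketch (the data and f_i below are my own choice, not the exam's): each f_i(x) = ½(a_i⊤x − b_i)², GD uses the full gradient of f while SGD uses the gradient of a single uniformly sampled component per iteration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

# Finite sum f(x) = (1/n) * sum_i f_i(x) with f_i(x) = 0.5 * (a_i^T x - b_i)^2.
def full_grad(x):                       # gradient of f (used by GD)
    return A.T @ (A @ x - b) / n

def stoch_grad(x, i):                   # gradient of a single component f_i (used by SGD)
    return A[i] * (A[i] @ x - b[i])

L = np.linalg.eigvalsh(A.T @ A / n).max()   # smoothness constant of the full objective
x_gd = np.zeros(d)
x_sgd = np.zeros(d)
for t in range(2000):
    x_gd -= (1.0 / L) * full_grad(x_gd)                     # GD: full gradient, step 1/L
    i = rng.integers(n)                                      # SGD: uniformly random index
    x_sgd -= 0.05 / (1 + 0.01 * t) * stoch_grad(x_sgd, i)    # decreasing step for SGD

x_star = np.linalg.lstsq(A, b, rcond=None)[0]                # least-squares minimizer of f
print("GD  distance to x*:", np.linalg.norm(x_gd - x_star))
print("SGD distance to x*:", np.linalg.norm(x_sgd - x_star))
```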
Question 3 For a > 0 and b ∈ R, consider f(x) = a·x^4 + b, x ∈ R. Assume you perform gradient descent on f with a constant step-size γ. Which one of the following statements is true:
If |x_0| ≤ 1 and 0 < γ ≤ 1 then the iterates converge to 0.
Depending on my starting point x_0 and my step size, either my iterates x_t converge to 0, or diverge: |x_t| → +∞ as t → ∞.
For the iterates to converge, my step size must depend on b.
For a starting point then the iterates converge to 0.
For a starting point x_0, whatever step size I pick, the iterates will never converge to 0.
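
The following toy run (my own constants, purely illustrative) applies gradient descent to f(x) = a·x^4 + b; the update is x_{t+1} = x_t − 4aγ·x_t^3, and whether the iterates shrink toward 0 or blow up depends on both the starting point and the step size.

```python
# Gradient descent on f(x) = a*x**4 + b, whose gradient is 4*a*x**3.
a, b = 1.0, 0.0          # hypothetical constants for illustration
gamma = 0.1              # constant step size

def run_gd(x0, steps=100):
    x = x0
    for _ in range(steps):
        x = x - gamma * 4 * a * x ** 3
        if abs(x) > 1e6:          # stop once the iterates clearly diverge
            return float("inf")
    return x

for x0 in [0.5, 1.0, 2.0, 5.0]:
    print(f"x0 = {x0:4.1f}  ->  x_100 = {run_gd(x0)}")
```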
Newton’s Method and Quasi-Newton
Question 4 How many steps does Newton's method require to reach an error smaller than ε > 0 when minimizing a strictly convex quadratic function:
It depends on the step size.
It depends on the condition number of the quadratic function.
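
For reference, a hedged numerical check (my own example) of the behaviour behind this question: on a strictly convex quadratic f(x) = ½ x⊤Qx − c⊤x, a single Newton step x_1 = x_0 − (∇²f)⁻¹∇f(x_0) lands exactly on the minimizer x⋆ = Q⁻¹c, regardless of x_0 and without any step-size tuning.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
M = rng.standard_normal((d, d))
Q = M @ M.T + d * np.eye(d)          # random positive-definite matrix
c = rng.standard_normal(d)

grad = lambda x: Q @ x - c           # gradient of 0.5*x^T Q x - c^T x
hess = Q                             # Hessian is constant for a quadratic

x0 = rng.standard_normal(d)
x1 = x0 - np.linalg.solve(hess, grad(x0))   # one full Newton step

x_star = np.linalg.solve(Q, c)
print("distance to minimizer after one Newton step:", np.linalg.norm(x1 - x_star))
```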
Question 5 We apply Newton's method to a function f with a critical point x⋆, starting from iterate x_0.
Assume that f has bounded inverse Hessians and Lipschitz continuous Hessians. Among the following propositions, what is the extra assumption which allows one to show that ∥x_T − x⋆∥ < ε after T = O(log log(1/ε)) steps?
Convexity.
Smoothness.
Taking the average iterate.
Decreasing step size.
Strong convexity.
∥x_0 − x⋆∥ should be small.
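
A small illustration (my own, not from the exam) of the local quadratic convergence behind the O(log log(1/ε)) count: for f(x) = cosh(x), whose Hessian is nonsingular and Lipschitz near the minimizer x⋆ = 0, the Newton error roughly squares at each iteration once x_0 is close enough to x⋆.

```python
import math

# Newton's method on f(x) = cosh(x): f'(x) = sinh(x), f''(x) = cosh(x),
# so the update is x <- x - tanh(x).  Minimizer: x* = 0.
x = 1.0                       # close enough to x* for local quadratic convergence
for t in range(6):
    print(f"iteration {t}: |x_t - x*| = {abs(x):.3e}")
    x = x - math.tanh(x)
```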
Function properties
Consider the function d : D → R with D ⊆ R^2 defined as , where x_1 and x_2 are the coordinates of x. Let us consider three cases: (A) when D = R^2, (B) when D = {x ∈ R^2 : ∥x∥ ≤ 1}, and (C) when D = {x ∈ R^2 : x_2 = 3}.
Question 6
C only.
A, B and C.
A and C only.
A only.
A and B only.
B and C only.
B only.
None of them.
Question 7
C only.
B and C only.
A and B only.
A and C only.
A only.
B only.
A, B and C.
None of them.
Coordinate descent
Question 8 Compared to gradient descent, coordinate descent with gradient-based updates can speed up optimization when coordinate-wise gradients are cheap to compute, and when the coordinates i (i = 1,…,d) have varying smoothness constants L_i. We use coordinate-dependent step sizes η_i, and make a gradient step on coordinate i with probability p_i. To obtain a convergence rate that depends on the average L̄ = (1/d) ∑_i L_i instead of max_i L_i, you would use
η_i < η_j and p_i < p_j
η_i > η_j and p_i < p_j
η_i < η_j and p_i > p_j
η_i > η_j and p_i > p_j
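
A hedged sketch (toy quadratic of my own, not the exam's setting) of coordinate descent with the sampling discussed above: coordinate i is picked with probability p_i ∝ L_i and updated with step size η_i = 1/L_i, on a quadratic whose coordinate-wise smoothness constants L_i span three orders of magnitude.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 10
M = rng.standard_normal((d, d))
Q = M @ M.T + np.diag(np.logspace(0, 3, d))   # PSD matrix with widely varying diagonal
L = np.diag(Q).copy()                          # coordinate-wise smoothness: L_i = Q_ii

p = L / L.sum()                                # importance sampling: p_i proportional to L_i
eta = 1.0 / L                                  # coordinate step sizes: eta_i = 1 / L_i

# Minimize f(x) = 0.5 * x^T Q x (minimizer x* = 0) by randomized coordinate descent.
x = rng.standard_normal(d)
for t in range(20000):
    i = rng.choice(d, p=p)                     # sample coordinate i with probability p_i
    x[i] -= eta[i] * (Q[i] @ x)                # partial derivative of f along coordinate i

print("f(x) after coordinate descent:", 0.5 * x @ Q @ x)
```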
Subgradient descent
Question 9
Constrained optimization
Consider the Lasso regression
Question 10 Compute the linear minimization oracle LMO(∇f(x)) where x
[−1,0]⊤
[0,0]⊤
[1,0]⊤
[0,1]⊤
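
The feasible set is not visible in this extraction; assuming the constraint is the ℓ1-ball {x : ∥x∥_1 ≤ 1}, which is the natural set for the Lasso formulation above and matches the candidate answers, the LMO returns a signed coordinate vertex, as in the sketch below.

```python
import numpy as np

def lmo_l1_ball(g, radius=1.0):
    """Linear minimization oracle over the l1-ball of the given radius:
    argmin_{||s||_1 <= radius} <g, s> is a signed vertex of the ball."""
    i = int(np.argmax(np.abs(g)))          # coordinate with largest |gradient|
    s = np.zeros_like(g, dtype=float)
    s[i] = -radius * np.sign(g[i])
    return s

# Hypothetical gradient, only for illustration (the exam's f and x are not shown here).
g = np.array([3.0, -1.0])
print(lmo_l1_ball(g))                       # -> [-1.  0.]
```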
Question 11
0
Proximal gradient descent
Question 12 For h(x) = |x|, the soft-thresholding operator is defined by the proximal operator prox_{h,t}(u).
Then for u ≥ t > 0, prox_{h,t}(u) can be written as which of the following?
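
For context, a standard derivation sketch (mine, with the convention prox_{h,t}(u) = argmin_x { (1/(2t))(x − u)² + |x| }; conventions differ, but the u ≥ t > 0 case comes out the same):

```latex
% Soft thresholding, positive branch (u >= t > 0):
\operatorname{prox}_{h,t}(u) = \arg\min_{x \in \mathbb{R}} \Big\{ \tfrac{1}{2t}(x-u)^2 + |x| \Big\}.
% For u >= t the minimizer is nonnegative, so set the derivative of (1/(2t))(x-u)^2 + x to zero:
\tfrac{1}{t}(x - u) + 1 = 0 \quad\Longrightarrow\quad x = u - t \;\; (\ge 0).
```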

Second part, true/false questions
Question 13 (Convexity) A function f(x) is convex if and only if g(x) = −f(x) is non-convex.
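
A quick counterexample sketch (not from the printed solution) showing why such a statement must be checked carefully: affine functions are both convex and concave.

```latex
% f(x) = x is convex, yet g(x) = -f(x) = -x is affine and therefore also convex,
% so convexity of f does not force -f to be non-convex.
f(x) = x \ \text{convex}, \qquad g(x) = -x \ \text{convex as well.}
```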
Question 14
Question 15
Question 16
Question 17
2
Question 18 gradient descent with step-size 0 < γ <
Question 19 (Frank-Wolfe) Consider , if we apply the Frank-Wolfe algorithm with stepsize

Third part, open questions
Answer in the space provided! Your answer must be justified with all steps. Do not cross any checkboxes; they are reserved for correction.

Question 23: 2 points. Is D_h symmetric, i.e., D_h(x,y) = D_h(y,x)? Prove your answer.
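
The definition of D_h does not survive in this extraction; assuming it is the Bregman divergence D_h(x,y) = h(x) − h(y) − ⟨∇h(y), x − y⟩ (consistent with the mirror-descent material below), the quick numerical check below evaluates both orderings for a non-quadratic potential and shows they generally differ.

```python
import numpy as np

# Assumed definition (Bregman divergence of a differentiable potential h):
#   D_h(x, y) = h(x) - h(y) - <grad h(y), x - y>
def bregman(h, grad_h, x, y):
    return h(x) - h(y) - grad_h(y) @ (x - y)

# Example potential: negative entropy h(x) = sum_i x_i * log(x_i)  (on positive vectors)
h = lambda x: np.sum(x * np.log(x))
grad_h = lambda x: np.log(x) + 1.0

x = np.array([0.2, 0.8])
y = np.array([0.6, 0.4])
print("D_h(x, y) =", bregman(h, grad_h, x, y))   # KL(x || y)
print("D_h(y, x) =", bregman(h, grad_h, y, x))   # KL(y || x) -- generally different
```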


Question 26: 3 points. Assume condition (S). Show that for any y, x, z ∈ R^d we have

We consider that the function h additionally satisfies the following properties:
• The gradient of h takes all possible values, i.e., ∇h(R^d) = R^d.
We consider the optimization algorithm defined by x_0 ∈ R^d and the iteration:
x_{t+1} := T_γ(x_t) for t ∈ N.  (MD)
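
The map T_γ is defined earlier in the exam and is not reproduced here; assuming the standard mirror-descent update T_γ(x) = (∇h)⁻¹(∇h(x) − γ∇f(x)) (well defined because ∇h is assumed to be onto R^d), a minimal sketch with a hypothetical entropy potential looks as follows.

```python
import numpy as np

# Assumed mirror map: T_gamma(x) = (grad h)^{-1}( grad h(x) - gamma * grad f(x) ).
# With the (hypothetical) potential h(x) = sum_i x_i*log(x_i) on the positive orthant,
# grad h(x) = log(x) + 1 and its inverse is z -> exp(z - 1), so the update becomes
# exponentiated gradient descent: x_{t+1} = x_t * exp(-gamma * grad f(x_t)).

def mirror_descent(grad_f, x0, gamma, steps):
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x * np.exp(-gamma * grad_f(x))   # T_gamma(x) for the entropy potential
    return x

# Toy objective f(x) = 0.5 * ||x - c||^2 with a positive target c (for illustration only).
c = np.array([0.5, 2.0, 1.0])
grad_f = lambda x: x - c

print(mirror_descent(grad_f, x0=[1.0, 1.0, 1.0], gamma=0.1, steps=500))
```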

Question 30: 2 points. Let u ∈ R^d and consider the iterates defined in equation (MD). We denote the average of the iterates by x̄. Show the following inequality:

Question 32: 2 points. Does the inequality proved in the previous question imply convergence f(x_t) → f(u)? Prove your answer.


Application to Poisson Linear Inverse Problems
Let us denote by R_+ = {x ∈ R : x ≥ 0} and R_+^* = {x ∈ R : x > 0}. Given a matrix A and a vector b, the goal is to reconstruct a signal x such that

Let us denote by a_i the i-th row of the matrix A. We assume that a_i ≠ 0 and for all j. Let us consider Burg's entropy defined by h(x) = −∑_{j=1}^d log(x_j).
Question 36: 5 points. Show that for any L satisfying L ≥ ∥b∥_1, the function Lh − f is convex on (R_+^*)^d.


(HINT: you can compute the Hessian and show that it is positive semi-definite.)
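
The objective f is not visible in the portion reproduced here; assuming the usual Poisson linear inverse objective f(x) = ∑_i (⟨a_i, x⟩ − b_i log⟨a_i, x⟩) (an assumption on my part) together with the Burg entropy above, the ingredients for the hint would be the following Hessians.

```latex
% Hessians under the assumed objective (the exam's exact f is not shown in this extract):
\nabla^2 h(x) = \operatorname{diag}\!\Big(\tfrac{1}{x_1^2}, \dots, \tfrac{1}{x_d^2}\Big),
\qquad
\nabla^2 f(x) = \sum_i \frac{b_i}{\langle a_i, x\rangle^{2}} \, a_i a_i^{\top}.
% The hint then amounts to showing  L * Hess(h)(x) - Hess(f)(x) >= 0  (PSD)
% for every x with positive coordinates whenever L >= ||b||_1.
```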
We will minimize the function f using the Mirror Descent algorithm with the potential h, whose update rule is defined similarly by

Question 38: 3 points. Assuming that you can apply the results derived in Question 34, which convergence rate do you obtain with this algorithm on this problem? Why is it surprising?

