Description
5. Duality
• Lagrange dual problem
• weak and strong duality
• geometric interpretation
• optimality conditions
• perturbation and sensitivity analysis
• examples
• generalized inequalities
Lagrangian
standard form problem (not necessarily convex)
minimize f0(x)
subject to fi(x) ≤ 0, i = 1,…,m
hi(x) = 0, i = 1,…,p
variable x ∈ Rn, domain D, optimal value p⋆
Lagrangian: L : Rn × Rm × Rp → R, with domL = D × Rm × Rp,
• weighted sum of objective and constraint functions
• λi is Lagrange multiplier associated with fi(x) ≤ 0
• νi is Lagrange multiplier associated with hi(x) = 0
Lagrange dual function
Lagrange dual function: g : Rm × Rp → R,
g(λ,ν) = inf L(x,λ,ν)
x∈D
m p
= inf f0(x) + Xλifi(x) + Xνihi(x)!
x∈D
i=1 i=1
g is concave, can be −∞ for some λ, ν lower bound property: if , then g(λ,ν) ≤ p⋆ proof: if x˜ is feasible and , then
minimizing over all feasible x˜ gives p⋆ ≥ g(λ,ν)
Least-norm solution of linear equations
minimize xTx subject to Ax = b
dual function
• Lagrangian is L(x,ν) = xTx + νT(Ax − b)
• to minimize L over x, set gradient equal to zero:
∇xL(x,ν) = 2x + ATν = 0 =⇒ x = −(1/2)ATν
• plug in in L to obtain g:
a concave function of ν lower bound property: p⋆ ≥ −(1/4)νTAATν − bTν for all ν
Standard form LP
minimize cTx
subject to
dual function
• Lagrangian is
L(x,λ,ν) = cTx + νT(Ax − b) − λTx
= −bTν + (c + ATν − λ)Tx
• L is affine in x, hence
Tν ATν − λ + c = 0 otherwise
g is linear on affine domain {(λ,ν) | ATν − λ + c = 0}, hence concave lower bound property:
Equality constrained norm minimization
minimize kxk subject to Ax = b
dual function
kATνk∗ ≤ 1 otherwise
where kvk∗ = supkuk≤1 uTv is dual norm of k · k proof: follows from infx(kxk − yTx) = 0 if kyk∗ ≤ 1, −∞ otherwise
• if kyk∗ ≤ 1, then kxk − yTx ≥ 0 for all x, with equality if x = 0
• if kyk∗ > 1, choose x = tu where kuk ≤ 1, uTy = kyk∗ > 1:
kxk − yTx = t(kuk − kyk∗) → −∞ as t → ∞
lower bound property: p⋆ ≥ bTν if kATνk∗ ≤ 1
Two-way partitioning
minimize xTWx subject to x2i = 1, i = 1,…,n
• a nonconvex problem; feasible set contains 2n discrete points
• interpretation: partition {1,…,n} in two sets; Wij is cost of assigning i, j to the same set; −Wij is cost of assigning to different sets dual function
diag(ν))x − 1Tν −1Tν W + diag
=
−∞ otherwise
lower bound property: p⋆ ≥ −1Tν if W + diag example: ν = −λmin(W)1 gives bound p⋆ ≥ nλmin(W)
Lagrange dual and conjugate function
minimize f0(x) subject to
dual function
• recall definition of conjugate f∗(y) = supx∈domf(yTx − f(x))
• simplifies derivation of dual if conjugate of f0 is kown example: entropy maximization
The dual problem
Lagrange dual problem
maximize g(λ,ν) subject to
• finds best lower bound on p⋆, obtained from Lagrange dual function
• a convex optimization problem; optimal value denoted d⋆
• λ, ν are dual feasible if domg
• often simplified by making implicit constraint (λ,ν) ∈ domg explicit example: standard form LP and its dual (page 1–5)
minimize cTx maximize −bTν subject to subject to
Weak and strong duality
weak duality: d⋆ ≤ p⋆
• always holds (for convex and nonconvex problems)
• can be used to find nontrivial lower bounds for difficult problems for example, solving the SDP
maximize −1Tν subject to W + diag
gives a lower bound for the two-way partitioning problem on page 1–7 strong duality: d⋆ = p⋆
• does not hold in general
• (usually) holds for convex problems
• conditions that guarantee strong duality in convex problems are called constraint qualifications
Slater’s constraint qualification
strong duality holds for a convex problem
minimize f0(x)
subject to fi(x) ≤ 0,
Ax = b i = 1,…,m
if it is strictly feasible, i.e.,
∃x ∈ intD : fi(x) < 0, i = 1,…,m, Ax = b
• also guarantees that the dual optimum is attained (if p⋆ > −∞)
• can be sharpened: e.g., can replace intD with relintD (interior relative to affine hull); linear inequalities do not need to hold with strict inequality, . . .
there exist many other types of constraint qualifications
Inequality form LP
primal problem
minimize cTx
subject to
dual function
Tλ ATλ + c = 0 otherwise
dual problem
maximize −bTλ
subject to
• from Slater’s condition: p⋆ = d⋆ if Ax˜ ≺ b for some x˜ in fact, p⋆ = d⋆ except when primal and dual are infeasible
Quadratic program
primal problem (assume
minimize xTPx
subject to
dual function
dual problem
maximize −(1/4)λTAP−1ATλ − bTλ
subject to
• from Slater’s condition: p⋆ = d⋆ if Ax˜ ≺ b for some x˜ in fact, p⋆ = d⋆ always
A nonconvex problem with strong duality
minimize xTAx + 2bTx subject to xTx ≤ 1 , hence nonconvex dual function: g(λ) = infx(xT(A + λI)x + 2bTx − λ)
• unbounded below if or if and b 6∈ R(A + λI)
• minimized by x = −(A + λI)†b otherwise: g(λ) = −bT(A + λI)†b − λ dual problem and equivalent SDP:
maximize −bT(A + λI)†b − λ maximize −t − λ
subject toA + λI b
subject to
strong duality although primal problem is not convex (not easy to show)
Geometric interpretation
for simplicity, consider problem with one constraint f1(x) ≤ 0 interpretation of dual function:
g(λ) = inf (t + λu), where G = {(f1(x),f0(x)) | x ∈ D}
(u,t)∈G
t t
• λu + t = g(λ) is (non-vertical) supporting hyperplane to G
• hyperplane intersects t-axis at t = g(λ)
epigraph variation: same interpretation if G is replaced with
A = {(u,t) | f1(x) ≤ u,f0(x) ≤ t for some x ∈ D}
strong duality
• holds if there is a non-vertical supporting hyperplane to A at (0,p⋆)
• for convex problem, A is convex, hence has supp. hyperplane at (0,p⋆)
• Slater’s condition: if there exist (u,˜ t˜) ∈ A with u <˜ 0, then supporting hyperplanes at (0,p⋆) must be non-vertical
Complementary slackness
assume strong duality holds, x⋆ is primal optimal, (λ⋆,ν⋆) is dual optimal !
hence, the two inequalities hold with equality
• x⋆ minimizes L(x,λ⋆,ν⋆)
(known as complementary slackness):
Karush-Kuhn-Tucker (KKT) conditions
the following four conditions are called KKT conditions (for a problem with differentiable fi, hi):
1. primal constraints: fi(x) ≤ 0, i = 1,…,m, hi(x) = 0, i = 1,…,p
2. dual constraints:
3. complementary slackness: λifi(x) = 0, i = 1,…,m
4. gradient of Lagrangian with respect to x vanishes:
from page 1–17: if strong duality holds and x, λ, ν are optimal, then they must satisfy the KKT conditions
KKT conditions for convex problem
if x˜, λ˜, ν˜ satisfy KKT for a convex problem, then they are optimal:
• from complementary slackness: f0(x˜) = L(x,˜ λ,˜ ν˜)
• from 4th condition (and convexity): g(λ,˜ ν˜) = L(x,˜ λ,˜ ν˜) hence, f0(x˜) = g(λ,˜ ν˜)
if Slater’s condition is satisfied:
x is optimal if and only if there exist λ, ν that satisfy KKT conditions
• recall that Slater implies strong duality, and dual optimum is attained
• generalizes optimality condition ∇f0(x) = 0 for unconstrained problem example: water-filling (assume αi > 0)
minimize subject to
x is optimal iff , and there exist λ ∈ Rn, ν ∈ R such that
,
• if ν < 1/αi: λi = 0 and xi = 1/ν − αi
• if ν ≥ 1/αi: λi = ν − 1/αi and xi = 0
• determine ν from
interpretation
• n patches; level of patch i is at height αi ⋆
1/ν
• resulting level is 1/ν⋆
i
Perturbation and sensitivity analysis
(unperturbed) optimization problem and its dual
minimize f0(x) maximize g(λ,ν)
subject to fi(x) ≤ 0, i = 1,…,m subject to hi(x) = 0, i = 1,…,p
perturbed problem and its dual
min. f0(x) max. g(λ,ν) − uTλ − vTν
s.t. fi(x) ≤ ui, i = 1,…,m s.t. hi(x) = vi, i = 1,…,p
• x is primal variable; u, v are parameters
• p⋆(u,v) is optimal value as a function of u, v
• we are interested in information about p⋆(u,v) that we can obtain from the solution of the unperturbed problem and its dual
global sensitivity result
assume strong duality holds for unperturbed problem, and that λ⋆, ν⋆ are dual optimal for unperturbed problem apply weak duality to perturbed problem:
p⋆(u,v) ≥ g(λ⋆,ν⋆) − uTλ⋆ − vTν⋆
= p⋆(0,0) − uTλ⋆ − vTν⋆
sensitivity interpretation large: p⋆ increases greatly if we tighten constraint i (ui < 0)
small: p⋆ does not decrease much if we loosen constraint i (ui > 0)
• if large and positive: p⋆ increases greatly if we take vi < 0; if large and negative: p⋆ increases greatly if we take vi > 0
• if small and positive: p⋆ does not decrease much if we take vi > 0; if small and negative: p⋆ does not decrease much if we take vi < 0 local sensitivity: if (in addition) p⋆(u,v) is differentiable at (0,0), then
proof (for ): from global sensitivity result,
hence, equality
p⋆(u) for a problem with one (inequality) constraint:
p⋆(0)−λ⋆u
Duality and problem reformulations
• equivalent formulations of a problem can lead to very different duals
• reformulating the primal problem can be useful when the dual is difficult to derive, or uninteresting
common reformulations
• introduce new variables and equality constraints
• make explicit constraints implicit or vice-versa
• transform objective or constraint functions
e.g., replace f0(x) by φ(f0(x)) with φ convex, increasing
Introducing new variables and equality constraints
minimize f0(Ax + b)
• dual function is constant: g = infx L(x) = infx f0(Ax + b) = p⋆
• we have strong duality, but dual is quite useless reformulated problem and its dual
minimize f0(y) maximize subject to Ax + b − y = 0 subject to ATν = 0
dual function follows from
norm approximation problem: minimize kAx − bk
minimize kyk subject to y = Ax − b
can look up conjugate of k · k, or derive dual directly
otherwise
(see page 1–4)
dual of norm approximation problem
maximize bTν subject to ATν = 0, kνk∗ ≤ 1
Implicit constraints
LP with box constraints: primal and dual problem
minimize cTx maximize −bTν − 1Tλ1 − 1Tλ2 subject to subject to
reformulation with box constraints made implicit
T
minimize
subject to Ax = b
dual function
dual problem: maximize −bTν − kATν + ck1
Problems with generalized inequalities
minimize f0(x) subject to
is generalized inequality on Rki
definitions are parallel to scalar case:
• Lagrange multiplier for is vector λi ∈ Rki
• Lagrangian L : Rn × Rk1 × ··· × Rkm × Rp → R, is defined as
• dual function g : Rk1 × ··· × Rkm × Rp → R, is defined as
lower bound property: if , then g(λ1,…,λm,ν) ≤ p⋆ proof: if x˜ is feasible and , then
minimizing over all feasible x˜ gives p⋆ ≥ g(λ1,…,λm,ν) dual problem
maximize g(λ1,…,λm,ν) subject to
• weak duality: p⋆ ≥ d⋆ always
• strong duality: p⋆ = d⋆ for convex problem with constraint qualification (for example, Slater’s: primal problem is strictly feasible)
Semidefinite program
primal SDP (Fi,G ∈ Sk)
minimize cTx
subject to
• Lagrange multiplier is matrix Z ∈ Sk
• Lagrangian L(x,Z) = cTx + tr(Z(x1F1 + ··· + xnFn − G))
• dual function
dual SDP maximize −tr(GZ) subject to , tr(FiZ) + ci = 0, i = 1,…,n
p⋆ = d⋆ if primal SDP is strictly feasible (∃x with x1F1 + ··· + xnFn ≺ G)
Reviews
There are no reviews yet.