
Supplementary Info

## Lucas-Kanade Tracker

A Lucas-Kanade tracker uses a warp function W(x;p) to align an image I to a template T. The optimal parameters p∗ of the warp W(x;p) are found by minimizing a loss L, defined as the pixel-wise sum of squared differences between the warped image I(W(x;p)) and the template T (see Eqs. 1 and 2).

Note: We denote pixel locations by x, so I(x) is the pixel value (e.g. grayscale or RGB) at location x in image I. For simplicity, I and T are treated as column vectors rather than matrices (i.e. the image matrices are flattened). W(x;p) is the point obtained by warping x, and I(W(x;p)) is the pixel value at that warped location.
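As a quick, self-contained illustration of this notation (a toy example; the array values and names are ours, not the assignment's):

```python
import numpy as np

# Toy 3x4 grayscale "image" with arbitrary pixel values.
I = np.arange(12.0).reshape(3, 4)

x = (2, 1)        # a pixel location x = (row, col)
value = I[x]      # I(x): the pixel value at location x

# Treating I as a flattened (column) vector, as described above:
I_vec = I.ravel()
assert I_vec[x[0] * I.shape[1] + x[1]] == value
print(value)      # 9.0
```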

$$L = \sum_x \left[ T(x) - I(W(x;p)) \right]^2 \tag{1}$$

$$p^* = \operatorname*{arg\,min}_{p} L \tag{2}$$
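Concretely, the loss in Eq. 1 is just a sum of squared pixel differences. A minimal sketch (the function name and toy arrays are ours):

```python
import numpy as np

def ssd_loss(T, I_warped):
    """Eq. 1: pixel-wise sum of squared differences between the
    template T and the warped image I(W(x;p))."""
    diff = T.astype(np.float64) - I_warped.astype(np.float64)
    return float(np.sum(diff ** 2))

T = np.array([[1.0, 2.0],
              [3.0, 4.0]])
print(ssd_loss(T, T))        # 0.0  (perfect alignment)
print(ssd_loss(T, T + 1.0))  # 4.0  (four pixels, each off by 1)
```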

In general, this is a difficult non-linear optimization of L in p even for simple warps like affine warps, but the Lucas-Kanade tracker circumvents this with a key idea: forward additive alignment. Assume we already have a close estimate p0 of the correct warp p∗; then a small linear change ∆p is enough to reach the best alignment, i.e. p∗ = p0 + ∆p. This is the forward additive form of the warp. The loss L can then be written as:

$$L = \sum_x \left[ T(x) - I(W(x;p_0 + \Delta p)) \right]^2 \tag{3}$$

$$\Delta p^* = \operatorname*{arg\,min}_{\Delta p} L \tag{4}$$

$$p^* = p_0 + \Delta p^* \tag{5}$$

Expanding the loss to first order with a Taylor series:

$$L \approx \sum_x \left[ T(x) - I(W(x;p_0)) - \nabla I \,\frac{\partial W}{\partial p}\, \Delta p \right]^2 \tag{6}$$

Here ∇I, the gradient of I, is the vector containing the horizontal and vertical image gradients at pixel location x, and ∂W/∂p is the Jacobian of the warp. Rearranging the Taylor expansion, the problem can be rewritten as a standard least-squares approximation ∆p∗ = argmin ||A∆p − b||²; note that we can pull out a negative sign thanks to the squaring:

$$\Delta p^* = \operatorname*{arg\,min}_{\Delta p} \sum_x \left\| \nabla I \,\frac{\partial W}{\partial p}\, \Delta p - \left[ T(x) - I(W(x;p_0)) \right] \right\|^2 \tag{7}$$
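To make ∇I ∂W/∂p concrete: for an affine warp, one common parameterization (an assumption here; the assignment may order p differently) is W(x;p) = [[1+p1, p3, p5], [p2, 1+p4, p6]] [x, y, 1]ᵀ, for which ∂W/∂p at pixel (x, y) is a 2×6 matrix:

```python
import numpy as np

def warp_jacobian_affine(x, y):
    """dW/dp for an affine warp, evaluated at pixel (x, y).
    Assumes p = (p1,...,p6) with W(x;p) = [[1+p1, p3, p5],
                                           [p2, 1+p4, p6]] @ [x, y, 1]^T."""
    return np.array([[x, 0.0, y, 0.0, 1.0, 0.0],
                     [0.0, x, 0.0, y, 0.0, 1.0]])

# The steepest-descent row for one pixel: grad(I) @ dW/dp, a 1x6 vector.
grad = np.array([0.5, -0.2])   # [Ix, Iy] at this pixel (made-up values)
row = grad @ warp_jacobian_affine(2.0, 3.0)
print(row)   # [ 1.  -0.4  1.5 -0.6  0.5 -0.2]
```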

This can be solved with ∆p∗ = (AᵀA)⁻¹Aᵀb, where AᵀA is the Hessian H:

$$H = A^T A = \sum_x \left[ \nabla I \frac{\partial W}{\partial p} \right]^T \left[ \nabla I \frac{\partial W}{\partial p} \right] \tag{8}$$

$$A = \nabla I \,\frac{\partial W}{\partial p} \tag{9}$$

$$b = T(x) - I(W(x;p_0)) \tag{10}$$
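In code, solving for ∆p∗ amounts to forming the normal equations. The sketch below uses random stand-in values for A (the stacked steepest-descent rows ∇I ∂W/∂p, one per pixel) and b (the flattened error image), since the point is only the linear algebra:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 6))   # stand-in steepest-descent rows, one per pixel
b = rng.standard_normal(100)        # stand-in error image, flattened

H = A.T @ A                          # 6x6 Hessian (Eq. 8)
dp = np.linalg.solve(H, A.T @ b)     # dp* = H^{-1} A^T b

# Equivalent to directly solving the least-squares problem min ||A dp - b||^2:
dp_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(dp, dp_lstsq))     # True
```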

Once ∆p∗ is computed, the best estimate of the warp p∗ can be obtained from Eq. 5. However, in practice our initial estimate p0 can be far from the optimal p∗, violating the forward additive alignment assumption. As a fix, we perform the optimization of L iteratively, using an error threshold ϵ, as described in Algorithm 1 (adapted from [?]).

Algorithm 1 The Lucas-Kanade Algorithm

(0) p ← p0

(1) Iterate until ||∆p|| ≤ ϵ:

(2) Warp I with W(x;p) to compute I(W(x;p))

(3) Compute the error image E(x) = T(x) − I(W(x;p))

(4) Warp the gradient ∇I with W(x;p)

(5) Evaluate the Jacobian ∂W/∂p at (x;p)

(6) Compute the steepest descent image ∇I ∂W/∂p

(7) Compute the Hessian matrix H = Σ_x [∇I ∂W/∂p]^T [∇I ∂W/∂p]

(8) Compute ∆p = H^{-1} Σ_x [∇I ∂W/∂p]^T E(x)

(9) Update the parameters p ← p + ∆p
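The steps above can be sketched end-to-end in NumPy for the simplest warp, a pure translation W(x;p) = x + p, for which the warp Jacobian is the 2×2 identity and the steepest descent image is just the warped gradient. This is a minimal sketch under that assumption; the helper names and the synthetic quadratic test image are ours:

```python
import numpy as np

def bilinear(img, xs, ys):
    """Bilinearly sample img at float coordinates (xs, ys); assumes in-bounds."""
    x0, y0 = np.floor(xs).astype(int), np.floor(ys).astype(int)
    ax, ay = xs - x0, ys - y0
    return ((1 - ax) * (1 - ay) * img[y0, x0] + ax * (1 - ay) * img[y0, x0 + 1]
            + (1 - ax) * ay * img[y0 + 1, x0] + ax * ay * img[y0 + 1, x0 + 1])

def lk_translation(I, T, xs, ys, p0, eps=1e-6, max_iter=50):
    """Algorithm 1 for a pure-translation warp W(x;p) = x + p.
    xs, ys are the template's pixel coordinates in I's frame."""
    p = np.asarray(p0, dtype=np.float64)
    gy, gx = np.gradient(I.astype(np.float64))          # image gradients
    for _ in range(max_iter):
        Iw = bilinear(I, xs + p[0], ys + p[1])          # (2) warp I
        E = T - Iw                                      # (3) error image
        Gx = bilinear(gx, xs + p[0], ys + p[1])         # (4) warped gradient
        Gy = bilinear(gy, xs + p[0], ys + p[1])
        A = np.stack([Gx.ravel(), Gy.ravel()], axis=1)  # (5)-(6) dW/dp = identity
        H = A.T @ A                                     # (7) Hessian
        dp = np.linalg.solve(H, A.T @ E.ravel())        # (8) solve for dp
        p += dp                                         # (9) update
        if np.linalg.norm(dp) <= eps:                   # (1) convergence test
            break
    return p

# Synthetic check: recover a known shift on a smooth quadratic image.
Y, X = np.mgrid[0:25, 0:25].astype(float)
I = 0.01 * (X ** 2 + 2 * Y ** 2)
xs, ys = np.meshgrid(np.arange(5.0, 16.0), np.arange(5.0, 16.0))
T = bilinear(I, xs + 1.5, ys + 0.7)   # template = shifted crop of I
p = lk_translation(I, T, xs, ys, [0.0, 0.0])
print(p)                              # approximately [1.5, 0.7]
```

For an affine warp the structure is the same, but A gains six columns (∇I ∂W/∂p per pixel) and the warp steps use the 2×3 affine matrix.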

## Matthews-Baker Tracker

The key to the Matthews-Baker tracker's efficiency is switching the roles of the image I and the template T in the algorithm. In contrast to the Lucas-Kanade tracker (which warps the image I to the template T), the Matthews-Baker tracker warps the template T to the image I. This difference leads to computational gains because, unlike the image I, which changes with each frame of the video, the template T remains fixed throughout the video. If we can write the Hessian and Jacobian involved in the optimization of L in terms of the template T instead of the image I, then we only need to compute them once at the start of tracking.

Here is the inverse compositional form of the Matthews-Baker tracker, given without proof of its equivalence to the forward additive form of the Lucas-Kanade tracker; please refer to [?] for the proof. Given an initial estimate p0, we want to find the ∆p∗ that minimizes L:

$$L = \sum_x \left[ T(W(x;\Delta p)) - I(W(x;p_0)) \right]^2 \tag{11}$$

$$\Delta p^* = \operatorname*{arg\,min}_{\Delta p} L \tag{12}$$

This completes the role switch of the template T and the image I. Note that ∆p parameterizes the warp applied to T, while p0 parameterizes the warp applied to I.

Another key difference between the Matthews-Baker tracker and the Lucas-Kanade tracker is how the two warps W(x;p0) and W(x;∆p) are combined. In the Lucas-Kanade tracker we combined them by simply adding the parameter update ∆p to p0, producing the new warp W(x;p0 + ∆p); specifically, p∗ = p0 + ∆p∗ and W(x;p∗) = W(x;p0 + ∆p∗). In the Matthews-Baker tracker, the warps are instead combined by composition:

$$W(x;p^*) = W(x;p_0) \circ W(x;\Delta p^*)^{-1} \tag{13}$$

If we are using affine warps, each warp can be implemented as a matrix multiplication, and Eq. 13 simplifies to

$$W(x;p^*) = W(W(x;\Delta p^*)^{-1};\,p_0) = W(p_0)\,W(\Delta p^*)^{-1}\,x$$

where W(p) denotes the matrix form of the affine warp with parameters p.
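With homogeneous 3×3 matrices this composition is one matrix product and one matrix inverse. A small sketch (the parameter ordering of p and all numeric values are our assumptions):

```python
import numpy as np

def affine_matrix(p):
    """3x3 homogeneous matrix of the affine warp W(x;p), assuming
    p = (p1,...,p6) with W(x;p) = [[1+p1, p3, p5],
                                   [p2, 1+p4, p6]] @ [x, y, 1]^T."""
    p1, p2, p3, p4, p5, p6 = p
    return np.array([[1 + p1, p3, p5],
                     [p2, 1 + p4, p6],
                     [0.0, 0.0, 1.0]])

p0 = [0.1, 0.0, 0.0, -0.05, 2.0, 1.0]   # current warp estimate (made up)
dp = [0.01, 0.0, 0.0, 0.0, 0.2, -0.1]   # increment dp* (made up)

# Eq. 13 as matrices: M(p*) = M(p0) @ M(dp*)^{-1}
M_new = affine_matrix(p0) @ np.linalg.inv(affine_matrix(dp))
print(M_new[2])   # bottom row stays [0. 0. 1.]
```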

Let us simplify L as before using a first-order linear approximation of T(W(x;∆p)). Please note that, for tracking a template T, the summation over x is performed only over the pixels lying inside T.

$$L \approx \sum_x \left[ T(x) + \nabla T\,\frac{\partial W}{\partial p}\,\Delta p - I(W(x;p_0)) \right]^2 \tag{14}$$

where ∂W/∂p is evaluated at (x;0).

We now proceed to find a closed-form solution for ∆p∗ by setting the derivative ∂L/∂∆p to zero:

$$\frac{\partial L}{\partial \Delta p} = 2 \sum_x \left[ \nabla T \frac{\partial W}{\partial p} \right]^T \left[ T(x) + \nabla T \frac{\partial W}{\partial p} \Delta p - I(W(x;p_0)) \right] \tag{15}$$

Setting this to zero, switching from summation to vector notation, and solving for ∆p, we get

$$\Delta p = H^{-1} J^T \left[ I(W(x;p_0)) - T \right] \tag{16}$$

where J is the Jacobian of T(W(x;∆p)), J = ∇T ∂W/∂p evaluated at (x;0), H is the approximated Hessian H = JᵀJ, and I(W(x;p0)) is the warped image. Note that, for a given template, the Jacobian J and Hessian H are independent of p0: they only need to be computed once and can then be reused during the entire tracking sequence.
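This precomputation can be sketched directly. For an affine warp parameterized as W(x;p) = [[1+p1, p3, p5], [p2, 1+p4, p6]] [x, y, 1]ᵀ (our assumption), ∂W/∂p at (x;0) has rows [x, 0, y, 0, 1, 0] and [0, x, 0, y, 0, 1], so each template pixel contributes one 6-vector row to J (helper names are ours):

```python
import numpy as np

def precompute_mb(T_img):
    """Precompute the Matthews-Baker quantities for an affine warp:
    steepest-descent rows J (one 1x6 row per template pixel) and H = J^T J."""
    Ty, Tx = np.gradient(T_img.astype(np.float64))   # grad T (np.gradient: axis 0 first)
    rows, cols = T_img.shape
    ys, xs = np.mgrid[0:rows, 0:cols]
    # J[i] = grad T(x_i) @ dW/dp(x_i; 0) = [Tx*x, Ty*x, Tx*y, Ty*y, Tx, Ty]
    J = np.stack([Tx * xs, Ty * xs, Tx * ys, Ty * ys, Tx, Ty], axis=-1).reshape(-1, 6)
    H = J.T @ J
    return J, H

T_img = np.arange(20.0).reshape(4, 5)   # toy 4x5 template (a linear ramp)
J, H = precompute_mb(T_img)
print(J.shape, H.shape)                 # (20, 6) (6, 6)
```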

Once ∆p∗ has been solved for, it needs to be inverted and composed with p0 to get the new warp parameters for the next iteration.

$$W(x;p_0) \leftarrow W(x;p_0) \circ W(x;\Delta p^*)^{-1} \tag{17}$$

The next iteration solves Equation 16 starting with the new value of p0. Possible termination criteria include the absolute value of ∆p∗ falling below some value or running for some fixed number of iterations.

Algorithm 2 The Matthews-Baker Algorithm

(0) p ← p0

(1) Pre-compute:

(2) Evaluate the gradient ∇T of the template T(x)

(3) Evaluate the Jacobian ∂W/∂p at (x;0)

(4) Compute the steepest descent images ∇T ∂W/∂p

(5) Compute the Hessian matrix H = J^T J

(6) Iterate until ||∆p|| ≤ ϵ:

(7) Warp I with W(x;p) to compute I(W(x;p))

(8) Compute the error image E(x) = I(W(x;p)) − T(x)

(9) Compute ∆p = H^{-1} J^T E

(10) Update the parameters p ← p ◦ (∆p)^{-1}
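The inverse compositional loop can be sketched in NumPy for the pure-translation warp W(x;p) = x + p, where ∂W/∂p is the identity and composing with the inverted increment reduces to p ← p − ∆p. A minimal sketch; the helper names and the synthetic image are ours:

```python
import numpy as np

def bilinear(img, xs, ys):
    """Bilinearly sample img at float coordinates (xs, ys); assumes in-bounds."""
    x0, y0 = np.floor(xs).astype(int), np.floor(ys).astype(int)
    ax, ay = xs - x0, ys - y0
    return ((1 - ax) * (1 - ay) * img[y0, x0] + ax * (1 - ay) * img[y0, x0 + 1]
            + (1 - ax) * ay * img[y0 + 1, x0] + ax * ay * img[y0 + 1, x0 + 1])

def mb_translation(I, T, xs, ys, p0, eps=1e-6, max_iter=50):
    """Algorithm 2 for a pure-translation warp W(x;p) = x + p."""
    p = np.asarray(p0, dtype=np.float64)
    # Pre-compute once: grad T, J, and H = J^T J (independent of p).
    Ty, Tx = np.gradient(T.astype(np.float64))
    J = np.stack([Tx.ravel(), Ty.ravel()], axis=1)   # dW/dp = identity
    H = J.T @ J
    solve_step = np.linalg.solve(H, J.T)             # H^{-1} J^T, reused every iteration
    for _ in range(max_iter):
        Iw = bilinear(I, xs + p[0], ys + p[1])       # (7) warp I
        E = Iw - T                                   # (8) error image
        dp = solve_step @ E.ravel()                  # (9) dp = H^{-1} J^T E
        p -= dp                                      # (10) p o (dp)^{-1} for translation
        if np.linalg.norm(dp) <= eps:
            break
    return p

# Synthetic check: recover a known shift on a smooth quadratic image.
Y, X = np.mgrid[0:25, 0:25].astype(float)
I = 0.01 * (X ** 2 + 2 * Y ** 2)
xs, ys = np.meshgrid(np.arange(5.0, 16.0), np.arange(5.0, 16.0))
T = bilinear(I, xs + 1.5, ys + 0.7)
p = mb_translation(I, T, xs, ys, [0.0, 0.0])
print(p)   # approximately [1.5, 0.7]
```

Note that J, H, and H⁻¹Jᵀ are computed once before the loop, which is exactly the efficiency gain described above.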
