16-720 – Homework 6: Photometric Stereo


As we discussed in class, photometric stereo is a computational imaging method for determining the shape of an object from its appearance under a set of lighting directions. In this homework, we will solve a simplified version of the photometric stereo problem under certain assumptions: the object is Lambertian and is imaged with an orthographic camera under a set of directional lights. We will first solve calibrated photometric stereo, in which the lighting directions are given; then uncalibrated photometric stereo, where we must infer the lighting directions and the shape of the object jointly from the given images. In the process, we will learn about an ambiguity in the resulting shape introduced by this joint estimation.
Even though the homework is fairly self-contained, we strongly recommend going over the papers in the references before starting. As always, start early.
Instructions
1. Feedback: This is a new homework created just this year. We have made every effort to ensure that it is as correct, clear, and complete as possible, but there are bound to be some mistakes and places where the explanation could be better. Make sure to point these out in your write-up (for extra credit; see the last part) or on Piazza. Please be patient if there is confusion; we will clear it up as soon as possible.
4. Write-up: Please type out your answers to the theory questions and include experimental results and discussion in a document <andrew-id>.pdf (preferably in LaTeX).
Please note that we DO NOT accept handwritten scans for your write-up in this homework.
5. Code: Running your code is expected to produce the figures asked for in the questions. Expect your code to be auto-graded; this requires you to stick to the function prototypes in the starter code. Read the docstrings in the code to understand the inputs and expected outputs. Remember to comment your code. In case the autograder fails, we will go through your code to see if we can give you partial credit, and it helps us if we can at least roughly understand what each block of code does. If you need to create helper functions, create them in the file where you need them. Make sure you use relative paths in your code, not absolute paths.
6. Submission: Create the following directory structure:
• <andrew-id>/
– <andrew-id>.pdf
– src/
∗ q1.py
∗ q2.py
∗ q3.py
– extra/
∗ entropy.py
Zip up the folder into <andrew-id>.zip such that unzipping recreates <andrew-id>/. Upload the zip to Canvas and the <andrew-id>.pdf to Gradescope.
1 Calibrated photometric stereo (80 points)

Figure 1: 3D reconstruction result from calibrated photometric stereo
All code in this part goes in src/q1.py. Outside the function definitions, write code that chains these functions together and generates the results asked for in this question.
Image formation. We will now look at image formation under our specific assumptions. We will understand and implement the process that happens in the scene when we take a picture with an orthographic camera.

Figure 2: Geometry of photometric stereo
(a, 5 points) Understanding n-dot-l lighting. In your write-up, explain the geometry of the n-dot-l lighting model from Fig. 2. Where does the dot product come from? Where does projected area come into the equation? Why does the viewing direction not matter?
(b, 10 points) Rendering n-dot-l lighting. Consider a uniform, fully reflective Lambertian sphere with its center at the origin and a radius of 0.75 cm. An orthographic camera located at (0, 0, 10) cm looks towards the negative z-axis with its sensor axes aligned with and centered on the x- and y-axes. The pixels on the camera are squares 7 µm in size, and the resolution of the camera is 3840 × 2160 pixels. Simulate the appearance of the sphere under the n-dot-l model with directional light sources with incoming lighting directions (1, 1, 1)/√3, (1, −1, 1)/√3 and (−1, −1, 1)/√3 in the function renderNDotLSphere. Note that your rendering isn't required to be absolutely radiometrically accurate: we need only evaluate the n-dot-l model. Include the rendering in your write-up.
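A minimal sketch of how renderNDotLSphere might look, assuming numpy and an argument order of (center, radius, lighting direction, pixel size, resolution); the authoritative prototype is in the starter code. Note that 7 µm is 7 × 10⁻⁴ cm.

```python
import numpy as np

def renderNDotLSphere(center, rad, light, pxSize, res):
    """
    Sketch: render a Lambertian sphere under the n-dot-l model.
    center (cm), rad (cm), light: unit lighting direction,
    pxSize (cm), res: (width, height) in pixels.
    """
    w, h = res
    # Scene coordinates of the pixel centers (camera centered on the z-axis)
    x = (np.arange(w) - w / 2) * pxSize
    y = (np.arange(h) - h / 2) * pxSize
    X, Y = np.meshgrid(x, y)

    # Visible sphere surface for an orthographic camera looking down -z
    Z2 = rad**2 - (X - center[0])**2 - (Y - center[1])**2
    mask = Z2 >= 0                          # pixels that see the sphere
    Z = np.sqrt(np.maximum(Z2, 0))

    # Unit normals, then n-dot-l shading clamped at zero
    N = np.stack([X - center[0], Y - center[1], Z], axis=-1) / rad
    return np.clip(N @ light, 0, None) * mask

# Example usage with the numbers from this question:
light = np.array([1, 1, 1]) / np.sqrt(3)
img = renderNDotLSphere(np.zeros(3), 0.75, light, 7e-4, (3840, 2160))
```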
Inverting the image formation model. Armed with this knowledge, we will now invert the image formation process, given the lighting directions. Seven images of a face lit from different directions are given to us, along with the ground-truth directions of the light sources. These images are taken under lighting directions chosen so that there is little occlusion and few normals pointing away from the lighting, so we will ignore these effects. The images live in the data/ directory, named input_n.tif for the n-th image. The source directions are given in the file data/sources.mat.
(c, 5 points) Loading data. In the function loadData, read the images into Python. Convert the images into the XYZ color space and extract the luminance channel. Vectorize these luminance images and stack them in a 7×P matrix, where P is the number of pixels in each image. This is the matrix I, which is given to us by the camera. Next, load the sources file and convert it to a 3 × 7 matrix L.
Note that the images are 16-bit .tifs, and it’s important to preserve that bit depth while reading the images for good results. Therefore, make sure that the datatype of your images is uint16 after loading them.
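A possible shape for loadData, assuming scikit-image for reading and XYZ conversion and scipy.io.loadmat for the sources file; the 'sources' key and the orientation of the loaded matrix are assumptions to verify against the actual data.

```python
import numpy as np
from scipy.io import loadmat
from skimage.color import rgb2xyz
from skimage.io import imread

def loadData(path="../data/"):
    """Sketch: read the 7 images, keep the luminance (Y of XYZ),
    and stack into a 7 x P matrix I; load the 3 x 7 matrix L."""
    luminances = []
    for n in range(1, 8):
        im = imread(path + "input_%d.tif" % n)
        assert im.dtype == np.uint16            # preserve the 16-bit depth
        xyz = rgb2xyz(im)                       # float XYZ from 16-bit RGB
        luminances.append(xyz[:, :, 1].ravel()) # Y channel, vectorized
    I = np.stack(luminances, axis=0)            # 7 x P

    sources = loadmat(path + "sources.mat")["sources"]   # key assumed
    L = sources.T if sources.shape[0] == 7 else sources  # ensure 3 x 7
    s = im.shape[:2]                            # image shape, needed later
    return I, L, s
```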
(d, 10 points) Initials. Recall that in general, we also need to consider the reflectance, or albedo, of the surface we're reconstructing. We find it convenient to group the normals and albedos (both of which are properties of only the surface) into a pseudonormal b = a · n, where a is the scalar albedo. We then have the relation I = LᵀB, where the 3 × P matrix B is the set of per-pixel pseudonormals. With this model, explain why the rank of I should be 3. Perform a singular value decomposition of I and look at the singular values. Do the singular values agree with the rank-3 requirement? Explain this result in your write-up.
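A quick check, using the I returned by loadData (a sketch):

```python
import numpy as np

# Singular values of I: the rank-3 model predicts only three should be
# non-zero; with real images, expect three dominant values and a tail of
# small but non-zero ones (noise, shadows, deviations from the model).
U, S, Vt = np.linalg.svd(I, full_matrices=False)
print(S)
```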
(e, 20 points) Estimating pseudonormals. Since we have more measurements (7 per pixel) than variables (3 per pixel), we will estimate the pseudonormals in a least-squares sense. Note that there is a linear relation between I and B through L: therefore, we can write a linear system of the form Ax = y from the relation I = LᵀB and solve it to get B. Solve this linear system in the function estimatePseudonormalsCalibrated. Estimate per-pixel albedos and normals from this matrix in estimateAlbedosNormals. In your write-up, mention how you constructed the matrix A and the vector y.
Note that the matrices you end up creating might be huge and might not fit in your computer's memory. In that case, to prevent your computer from freezing or crashing, make sure to use the sparse module from scipy. You might also want to use the sparse linear-system solver in the same module.
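One possible construction (a sketch, not necessarily the intended one): stacking the columns of I = LᵀB into a single vector y gives vec(I) = (I_P ⊗ Lᵀ) vec(B), so A = I_P ⊗ Lᵀ is block-diagonal and very sparse. The albedo/normal split then follows from the definition b = a · n.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import lsqr

def estimatePseudonormalsCalibrated(I, L):
    """Sketch: solve A vec(B) = vec(I) with A = I_P kron L^T, sparse."""
    P = I.shape[1]
    A = sparse.kron(sparse.eye(P), L.T, format="csr")   # (7P) x (3P)
    y = I.flatten(order="F")                            # stack columns of I
    x = lsqr(A, y)[0]                                   # sparse least squares
    return x.reshape((3, P), order="F")                 # B, 3 x P

def estimateAlbedosNormals(B):
    """Albedo is the magnitude of each pseudonormal; the normal is the
    corresponding unit vector."""
    albedos = np.linalg.norm(B, axis=0)
    normals = B / (albedos + 1e-12)     # guard against zero albedo
    return albedos, normals
```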
The per-pixel normals can be viewed as an RGB image. Reshape the estimated normals into an image with 3 channels and display it in the function displayAlbedoNormals. Include this image in the write-up. Do the normals match your expectation of the curvature of the face? Make sure to display in the rainbow colormap.
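A sketch of the display step, using the image shape s from loadData; the normal components lie in [−1, 1], so map them to [0, 1] before display (matplotlib ignores cmap for a 3-channel image, but we pass it per the instruction above):

```python
import matplotlib.pyplot as plt

def displayAlbedoNormals(albedos, normals, s):
    """Sketch: reshape the per-pixel vectors back into images."""
    albedoIm = albedos.reshape(s)
    normalIm = ((normals + 1) / 2).T.reshape(*s, 3)   # [-1,1] -> [0,1]
    plt.imshow(albedoIm, cmap="gray")
    plt.show()
    plt.imshow(normalIm, cmap="rainbow")
    plt.show()
    return albedoIm, normalIm
```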
Normal integration. Given that the normals represent the derivatives of the depth map, we will integrate the normals obtained in (g). We will use a special case of the Frankot-Chellappa algorithm for normal integration, as given in [1]. You can read the paper for the general version of the normal integration algorithm.
(h, 5 points) Understanding integrability of gradients. Consider the 2D discrete function g given by the matrix below. Find its x and y gradients, given that the gradients are calculated as g_x(x_i, y_j) = g(x_{i+1}, y_j) − g(x_i, y_j) for all i, j (and similarly for y). Let us define (0, 0) as the top left, with x going in the horizontal direction and y in the vertical.

g = [matrix not reproduced in this copy]    (1)
Note that we can reconstruct all of g given the values at its boundaries using gx and gy. Given that g(0,0) = 1, perform these two procedures:
1. Use gx to construct the first row of g, then use gy to construct the rest of g;
2. Use gy to construct the first column of g, then use gx to construct the rest of g.
Are these the same?
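Since the matrix of Eq. (1) is not reproduced in this copy, the sketch below uses a hypothetical g (clearly not the one from the handout) to illustrate the two procedures; substitute the actual matrix.

```python
import numpy as np

# Hypothetical example; replace with the matrix from Eq. (1).
g = np.array([[1., 2., 4.],
              [3., 5., 8.],
              [6., 9., 13.]])

# Forward differences: x runs along columns, y along rows.
gx = g[:, 1:] - g[:, :-1]
gy = g[1:, :] - g[:-1, :]

# Procedure 1: first row from gx, then the rest from gy.
g1 = np.zeros_like(g)
g1[0, 0] = g[0, 0]
g1[0, 1:] = g1[0, 0] + np.cumsum(gx[0])
g1[1:, :] = g1[0, :] + np.cumsum(gy, axis=0)

# Procedure 2: first column from gy, then the rest from gx.
g2 = np.zeros_like(g)
g2[0, 0] = g[0, 0]
g2[1:, 0] = g2[0, 0] + np.cumsum(gy[:, 0])
g2[:, 1:] = g2[:, [0]] + np.cumsum(gx, axis=1)

# Gradients measured from a genuine function are integrable, so the two
# reconstructions agree; non-integrable gradients would not.
print(np.allclose(g1, g2))
```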
The Frankot-Chellappa algorithm for estimating a shape from its gradients first projects the (possibly non-integrable) gradients onto the set of all integrable gradients, in the sense discussed in (h) above. The resulting (integrable) projection can then be used to solve for the depth map. The function integrateFrankot in utils.py implements this algorithm given the two surface gradients.
(i, 10 points) Shape estimation. Write a function estimateShape to apply the Frankot-Chellappa algorithm to your estimated normals. Once you have the depth map f(x, y), plot it as a surface in the function plotSurface and include some significant viewpoints in your write-up. The 3D projection from mpl_toolkits.mplot3d.Axes3D along with the function plot_surface might be of help. Make sure to plot this in the coolwarm colormap.
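A sketch of both functions, assuming the gradient convention z_x = −n₁/n₃, z_y = −n₂/n₃ (check the sign conventions in the starter code) and that integrateFrankot takes the two gradient images:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D   # registers the 3d projection
from utils import integrateFrankot

def estimateShape(normals, s):
    """Sketch: convert normals to depth gradients and integrate."""
    zx = (-normals[0] / normals[2]).reshape(s)
    zy = (-normals[1] / normals[2]).reshape(s)
    return integrateFrankot(zx, zy)

def plotSurface(surface):
    """Plot the depth map as a 3D surface in the coolwarm colormap."""
    h, w = surface.shape
    X, Y = np.meshgrid(np.arange(w), np.arange(h))
    fig = plt.figure()
    ax = fig.add_subplot(111, projection="3d")
    ax.plot_surface(X, Y, surface, cmap=cm.coolwarm)
    plt.show()
```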
2 Uncalibrated photometric stereo (50 points)

Figure 3: 3D reconstruction result from uncalibrated photometric stereo
We will now put aside the lighting directions and estimate the shape directly from the images of the face. As before, load the lights into L0 and the images into the matrix I. We will not use the matrix L0 for anything except to compare our estimate against in this part. All code for this question goes in src/q2.py.
(a, 10 points) Uncalibrated normal estimation. Recall the relation I = LᵀB. Here, we know neither L nor B. Therefore, this is a matrix factorization problem with the constraints that, with the estimated L̂ and B̂, the rank of Î = L̂ᵀB̂ be (as you answered in the previous question) 3, and that the estimated L̂ and B̂ have appropriate dimensions.
It is well known that the best rank-k approximation to an m × n matrix M, where k ≤ min{m, n}, is calculated as follows: perform a singular value decomposition (SVD) M = UΣVᵀ, set all singular values except the top k in Σ to 0 to get the matrix Σ̂, and reconstitute M̂ = UΣ̂Vᵀ. Explain in your write-up how this can be used to construct a factorization of the form detailed above satisfying the required constraints.
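A sketch of one such factorization; how the singular values are split between L̂ and B̂ is a free choice (relevant to part (c) below):

```python
import numpy as np

def estimatePseudonormalsUncalibrated(I):
    """Sketch: truncate the SVD of I to rank 3 and split the factors.
    Here all of Sigma is absorbed into B_hat; other splits are valid."""
    U, S, Vt = np.linalg.svd(I, full_matrices=False)
    L_hat = U[:, :3].T                    # 3 x 7
    B_hat = np.diag(S[:3]) @ Vt[:3, :]    # 3 x P
    return B_hat, L_hat
```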
(b, 10 points) Calculation and visualization. With your method, estimate the pseudonormals Bˆ in estimatePseudonormalsUncalibrated and visualize the resultant albedos and normals in the gray and rainbow colormaps respectively. Include these in the write-up.
(c, 5 points) Comparing to ground-truth lighting. In your write-up, compare the L̂ estimated by the factorization above to the ground-truth lighting directions given in L0. Are they similar? Unless a special choice of factorization is made, they will be different. Demonstrate a simple change to the procedure in (a) that changes L̂ and B̂ but keeps the images rendered using them the same, using only the matrices you calculated during the singular value decomposition.
(d, 5 points) Reconstructing the shape, attempt 1. Use your implementation of the Frankot-Chellappa algorithm from the previous question to reconstruct a 3D depth map and visualize it as a surface in the ‘coolwarm’ colormap as in the previous question. Does this look like a face?
Enforcing integrability explicitly. The ambiguity you demonstrated in (c) can be (partially) resolved by explicitly imposing integrability on the pseudonormals recovered from the factorization. We will follow the approach in [2] to transform our estimated pseudonormals into a set of pseudonormals that are integrable. The function enforceIntegrability in utils.py implements this process.
(e, 5 points) Reconstructing the shape, attempt 2. Input your pseudonormals into the enforceIntegrability function, use the output pseudonormals to estimate the shape with the Frankot-Chellappa algorithm and plot a surface as in the previous questions. Does this surface look like the one output by calibrated photometric stereo? Include at least three viewpoints of the surface and your answers in your write-up.
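A sketch of the pipeline for this part, reusing the functions sketched in Question 1 and assuming enforceIntegrability takes the pseudonormals and the image shape (check the docstring in utils.py for the actual signature):

```python
from utils import enforceIntegrability

B_int = enforceIntegrability(B_hat, s)            # signature assumed
albedos, normals = estimateAlbedosNormals(B_int)
plotSurface(estimateShape(normals, s))
```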
The generalized bas-relief ambiguity. Unfortunately, the procedure in (e) resolves the ambiguity only up to a matrix of the form
        [ 1  0  0 ]
    G = [ 0  1  0 ]                                  (2)
        [ µ  ν  λ ]
where λ > 0, as shown in [3]. This means that for any µ, ν, and λ > 0, if B is a set of integrable pseudonormals producing a given appearance, G⁻ᵀB is another set of integrable pseudonormals producing the same appearance. The ambiguity introduced in uncalibrated photometric stereo by matrices of this kind is called the generalized bas-relief ambiguity (bas-relief is French for 'low relief', pronounced 'bah ree-leef').
(f, 5 points) Why low relief? Vary the parameters µ, ν and λ in the bas-relief transformation and visualize the corresponding surfaces. Include at least six (two with each parameter varied) of the significant ones in your write-up. Looking at these, what is your guess for why the bas-relief ambiguity is so named? In your write-up, describe how the three parameters affect the surface.
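A sketch of applying the transformation, assuming the form of G in Eq. (2); the parameter values in the loop are arbitrary illustrations:

```python
import numpy as np

def applyGBR(B, mu, nu, lam):
    """Replace the pseudonormals B with G^{-T} B (lam must be > 0)."""
    G = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [mu,  nu,  lam]])
    return np.linalg.inv(G).T @ B

for lam in (0.5, 1.0, 2.0):                # vary one parameter at a time
    B_gbr = applyGBR(B_int, 0.0, 0.0, lam)
    # ...integrate and plot as before to compare the surfaces
```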
(g, 5 points) Flattest surface possible. With the bas-relief ambiguity, in Eq. 2, how would you go about designing a transformation that makes the estimated surface as flat as possible?
(h, 5 points) More measurements. We solved the problem with 7 pictures of the face. Will acquiring more pictures from more lighting directions help resolve the ambiguity?
3 Homework feedback (extra credit, 10 points)
(a, 10 points) Feedback. Please mention in your write-up anything you found unclear or incorrect in the homework or the starter code, and any way in which it can be improved.
Credits
This homework was, in large part, inspired by the photometric stereo homework in Computational Photography (15-463) offered by Prof. Ioannis Gkioulekas at CMU. The data comes from the course on Computer Vision offered by Prof. Todd Zickler at Harvard.
References
[1] Robert T. Frankot and Rama Chellappa. A method for enforcing integrability in shape from shading algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 10(4):439–451, 1988.
[2] Alan Yuille and Daniel Snow. Shape and albedo from multiple images using integrability. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 158–164. IEEE, 1997.
[3] Peter N. Belhumeur, David J. Kriegman, and Alan L. Yuille. The bas-relief ambiguity. International Journal of Computer Vision, 35(1):33–44, 1999.
