In this paper, we present a technique for estimating the geometry and reflectance of objects using only a camera, a flashlight, and optionally a tripod. We propose a simple data capture procedure in which the user moves around the object, illuminating it with a flashlight and capturing only a few images. Our main technical contribution is a recursive neural architecture, termed RecNet, which predicts geometry and reflectance at 2^k × 2^k resolution given an input image at 2^k × 2^k and the geometry and reflectance estimated in the previous step at 2^(k-1) × 2^(k-1). RecNet is trained at 256 × 256 resolution but can readily operate on 1024 × 1024 images during inference.
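To make the resolution arithmetic concrete, the short Python sketch below lists the resolutions a recursion of this kind visits when it doubles the resolution at every step. The coarsest starting resolution is not given here, so `BASE = 32` and the helper `recursion_steps` are hypothetical. The sketch only illustrates that inference at 1024 × 1024 applies the same per-step network two more times than training at 256 × 256.

```python
import math

def recursion_steps(base_res: int, target_res: int) -> list[int]:
    """Resolutions visited when doubling from base_res up to target_res.

    Both values must be powers of two with base_res <= target_res.
    base_res is a hypothetical coarsest level, not a value from the paper.
    """
    k0, k = int(math.log2(base_res)), int(math.log2(target_res))
    if 2 ** k0 != base_res or 2 ** k != target_res or k0 > k:
        raise ValueError("resolutions must be powers of two with base <= target")
    return [2 ** i for i in range(k0, k + 1)]

BASE = 32  # hypothetical coarsest resolution
print(recursion_steps(BASE, 256))   # [32, 64, 128, 256]
print(recursion_steps(BASE, 1024))  # [32, 64, 128, 256, 512, 1024]
```

Because each step only needs the image at the current scale and the estimate from the scale below, the number of steps, not the network itself, is what changes between training and test resolution.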
We show that, given three or fewer input images, our method produces more accurate surface normals and albedo than previous approaches, especially in regions with specular highlights and cast shadows.
Capture Process
We introduce a weakly calibrated capture procedure in which a user shines a flashlight at the object from up to six approximate directions: right, front-right, front (co-located with the camera), front-left, left, and above.
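As an illustration of what "approximate directions" could look like numerically, the sketch below assigns each of the six named flashlight positions an idealized unit light vector and evaluates simple Lambertian shading. The coordinate convention (x to the camera's right, y up, z toward the camera), the exact angles, the dictionary `LIGHT_DIRS`, and the helper `lambertian` are all assumptions for illustration; the Lambertian model is a deliberately simplified stand-in and is not the paper's reflectance model, which also has to cope with specular highlights and cast shadows.

```python
import math

# Hypothetical idealized light directions (unit vectors from the object toward
# the flashlight). Assumed convention: x = camera right, y = up, z = toward camera.
# In a weakly calibrated capture the real directions only roughly match these.
_S = 1 / math.sqrt(2)
LIGHT_DIRS = {
    "right":       (1.0, 0.0, 0.0),
    "front-right": (_S, 0.0, _S),
    "front":       (0.0, 0.0, 1.0),  # co-located with the camera
    "front-left":  (-_S, 0.0, _S),
    "left":        (-1.0, 0.0, 0.0),
    "above":       (0.0, 1.0, 0.0),
}

def lambertian(normal, albedo, light_name):
    """Diffuse intensity albedo * max(0, n . l) for a unit surface normal n."""
    light = LIGHT_DIRS[light_name]
    n_dot_l = sum(a * b for a, b in zip(normal, light))
    return albedo * max(0.0, n_dot_l)

# A patch facing the camera is fully lit by the front light, dimmer under the
# front-right light, and receives no direct light at grazing incidence.
facing_camera = (0.0, 0.0, 1.0)
print(lambertian(facing_camera, 0.8, "front"))  # 0.8
print(lambertian(facing_camera, 0.8, "left"))   # 0.0
```

The side and overhead positions produce shading that varies strongly with the surface orientation, which is what makes a few such images informative about geometry.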
Another example of the capture process and high-resolution results.
Recursive Network Architecture
Our recursive network architecture learns to predict normals at resolution 2^k × 2^k given an image at 2^k × 2^k and normals at 2^(k-1) × 2^(k-1). Because the same network is applied once per doubling of resolution, we can train on low-resolution (256 × 256) data and generalize to high-resolution (1024 × 1024) data at test time.
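The coarse-to-fine recursion can be sketched in NumPy as follows. This is a minimal illustration under several assumptions, not RecNet itself: the image pyramid uses 2 × 2 average pooling, the previous normals are upsampled with nearest-neighbour repetition, the coarsest level starts from normals facing the camera, and `step_fn` is a hypothetical stand-in for the trained per-step network (`toy_step` below is a placeholder with no learned weights). Only normals are tracked; the paper's network also estimates reflectance.

```python
import numpy as np

def downsample2x(img):
    """2x2 average pooling: (H, W, C) -> (H/2, W/2, C)."""
    h, w, c = img.shape
    return img.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample2x(x):
    """Nearest-neighbour upsampling: (H, W, C) -> (2H, 2W, C)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def normalize(n, eps=1e-8):
    """Rescale each pixel's 3-vector to unit length."""
    return n / np.maximum(np.linalg.norm(n, axis=-1, keepdims=True), eps)

def recursive_normals(image, step_fn, base_res):
    """At each scale 2^k, step_fn receives the image at 2^k x 2^k and the
    normals from 2^(k-1) x 2^(k-1), upsampled to 2^k x 2^k."""
    pyramid = [image]
    while pyramid[-1].shape[0] > base_res:
        pyramid.append(downsample2x(pyramid[-1]))
    pyramid.reverse()  # coarsest level first
    # Hypothetical initialization: every normal points toward the camera (+z).
    h, w = pyramid[0].shape[:2]
    normals = np.zeros((h // 2, w // 2, 3))
    normals[..., 2] = 1.0
    for img_k in pyramid:
        normals = normalize(step_fn(img_k, upsample2x(normals)))
    return normals

def toy_step(img_k, prev_normals):
    """Placeholder for a trained network: nudges normals by image brightness."""
    return prev_normals + 0.1 * (img_k.mean(axis=-1, keepdims=True) - 0.5)

image = np.random.default_rng(0).random((64, 64, 3))
normals = recursive_normals(image, toy_step, base_res=8)
print(normals.shape)  # (64, 64, 3)
```

Since `step_fn` only ever sees one scale and the estimate from the scale below, the same function handles a 256 × 256 or a 1024 × 1024 input; the larger input simply adds more loop iterations.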