Abstract

In this paper, we present a technique for estimating the geometry and reflectance of objects using only a camera, a flashlight, and optionally a tripod. We propose a simple data capture technique in which the user walks around the object, illuminating it with a flashlight and capturing only a few images. Our main technical contribution is a recursive neural architecture that predicts geometry and reflectance at 2^k x 2^k resolution given an input image at 2^k x 2^k and the geometry and reflectance estimated in the previous step at 2^(k-1) x 2^(k-1). This recursive architecture, termed RecNet, is trained at 256x256 resolution but can easily operate on 1024x1024 images during inference. We show that, given three or fewer input images, our method produces more accurate surface normals and albedo than previous approaches, especially in regions of specular highlights and cast shadows.

Capture Process

We introduce a weakly calibrated capture procedure where a user shines a flashlight at an object from (up to) six approximate directions: right, front-right, front (co-located with the camera), front-left, left, and above.
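As a rough illustration of what "approximate directions" could look like numerically, the sketch below maps the six named flashlight positions to unit vectors. The coordinate convention (x to the camera's right, y up, z from the object toward the camera), the 45-degree angle chosen for the diagonal positions, and the names `unit` and `LIGHT_DIRECTIONS` are all assumptions made for this example and are not specified in the text above.

```python
import numpy as np

def unit(v):
    """Return v scaled to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

# Assumed convention: x = camera right, y = up, z = from object toward camera.
# Each vector points from the object toward the flashlight.
# These are nominal values only; the capture is weakly calibrated, so the
# true directions will deviate from them.
LIGHT_DIRECTIONS = {
    "right":       unit([1, 0, 0]),
    "front-right": unit([1, 0, 1]),   # assumed 45 degrees between right and front
    "front":       unit([0, 0, 1]),   # co-located with the camera
    "front-left":  unit([-1, 0, 1]),  # assumed 45 degrees between left and front
    "left":        unit([-1, 0, 0]),
    "above":       unit([0, 1, 0]),
}

for name, d in LIGHT_DIRECTIONS.items():
    print(f"{name:12s} {np.round(d, 3)}")
```

Because the user only approximates these positions by hand, such vectors would at best serve as a coarse initialization or as labels for the input images.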

Another example of the capture process and high-resolution results.

Recursive Network Architecture

Our recursive network architecture learns to predict normals at resolution 2^k x 2^k given an image at 2^k x 2^k and normals at 2^(k-1) x 2^(k-1). This enables us to train on low-resolution (256x256) data and generalize to high-resolution (1024x1024) data at test time.
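The coarse-to-fine recursion described above can be sketched in a few lines of NumPy. This is a minimal illustration of the control flow only, not the actual RecNet model: `placeholder_step` is a hypothetical stand-in for the learned network, the 2x average-pooling downsampler is an assumed choice, and seeding the coarsest level with flat, camera-facing normals is also an assumption, since the text does not say how the recursion is initialized.

```python
import numpy as np

def downsample2x(img):
    """Average-pool an (H, W, C) array by 2 in each spatial dimension."""
    h, w, c = img.shape
    return img.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample2x(x):
    """Nearest-neighbour upsampling of an (H, W, C) array by 2."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def normalize(n, eps=1e-8):
    """Rescale each per-pixel vector to (approximately) unit length."""
    return n / (np.linalg.norm(n, axis=-1, keepdims=True) + eps)

def placeholder_step(image, coarse_normals):
    """Hypothetical stand-in for one pass of the learned network.

    Takes an image at 2^k x 2^k and normals at 2^(k-1) x 2^(k-1), and
    returns normals at 2^k x 2^k. Here it only upsamples the coarse
    estimate; a trained network would refine it using the image.
    """
    assert image.shape[0] == 2 * coarse_normals.shape[0]
    return normalize(upsample2x(coarse_normals))

def recursive_predict(image, min_res=256, step=placeholder_step):
    """Predict normals for a square power-of-two image by recursion."""
    res = image.shape[0]
    if res <= min_res:
        # Assumed base case: flat normals facing the camera at half resolution.
        coarse = np.zeros((res // 2, res // 2, 3))
        coarse[..., 2] = 1.0
    else:
        # Solve the half-resolution problem first, then refine.
        coarse = recursive_predict(downsample2x(image), min_res, step)
    return step(image, coarse)

if __name__ == "__main__":
    image = np.random.rand(1024, 1024, 3)
    normals = recursive_predict(image, min_res=256)
    print(normals.shape)
```

Because the same `step` is applied at every level, a model trained only on 256x256 inputs can in principle be reapplied at 512x512 and 1024x1024, which is the property the architecture relies on at test time.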