3-D Scene Reconstruction
Hany Farid
Monday, April 9, 2012 at 6:01AM
[Photo credit: John Caputo (wolf); Flickr user Robobobobo (rabbit)]
Predators (lions, wolves, humans) typically have two forward-facing eyes, while prey (gazelles, rabbits, chickens) typically have one eye on each side of their head. The eyes of animals that are preyed upon are ideally positioned to allow them to see in front of and behind them and therefore detect danger from any direction. The eyes of animals that hunt are ideally positioned to allow them to triangulate the 3-D location of their prey, even if it means that they have a narrower field of view. The same mechanism that allows animals to hunt efficiently (stereopsis) underlies how 3-D information can be extracted from a pair of images.
A photograph is the result of projecting a 3-D scene onto a 2-D sensor. This image formation process results in a loss of information — the distance to objects in the 3-D scene generally cannot be recovered from a single image. More specifically, there is an inherent size/distance ambiguity: objects in the scene may be large and far away, or small and nearby.
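The size/distance ambiguity can be illustrated with a minimal sketch, assuming an idealized pinhole camera in which an object's size in the image is its true size scaled by focal length over distance. The function name and the numbers are hypothetical and chosen only for illustration.

```python
def projected_size(object_size, distance, focal_length=1.0):
    """Size of an object in the image under an ideal pinhole camera model.

    All quantities are in the same (arbitrary) units; the focal length
    of 1.0 is an assumption made for illustration.
    """
    return focal_length * object_size / distance


# A small rabbit nearby and a huge rabbit ten times farther away.
small_near = projected_size(object_size=0.5, distance=2.0)
large_far = projected_size(object_size=5.0, distance=20.0)

# Both occupy the same extent in the image, so a single photograph
# cannot tell the two cases apart.
print(small_near, large_far)
```

Scaling an object's size and its distance by the same factor leaves the projection unchanged, which is exactly the information a single image loses.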
This ambiguity would obviously be a problem for a wolf hunting its prey (is that a huge rabbit in the distance or a small rabbit nearby?). Of course, wolves and humans have two eyes, which eliminates this ambiguity.
The projection of a 3-D scene onto each of our eyes is slightly different, and this difference shrinks as the distance to an object grows (it is roughly inversely proportional to that distance). Close one eye and then the other and you will notice that nearby objects shift quite a bit as compared to distant objects. These differences can be used to determine how far an object is from us.
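For the simplest case of two identical, parallel cameras (or eyes) separated by a known baseline, the shift of a point between the two views (its disparity) relates to its depth by depth = focal length × baseline / disparity. The sketch below assumes this idealized geometry; the function name, the focal length in pixels, and the 6.5 cm baseline are illustrative assumptions, not values from the post.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth (meters) of a point seen by two parallel, identical cameras.

    disparity_px    horizontal shift of the point between the two images
    focal_length_px focal length expressed in pixels (assumed known)
    baseline_m      distance between the two camera centers in meters
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_length_px * baseline_m / disparity_px


FOCAL_LENGTH_PX = 1000.0  # assumed value, for illustration only
BASELINE_M = 0.065        # assumed value, for illustration only

# Large shifts correspond to nearby points, small shifts to distant ones.
for disparity in (65.0, 13.0, 6.5):
    depth = depth_from_disparity(disparity, FOCAL_LENGTH_PX, BASELINE_M)
    print(f"disparity {disparity:5.1f} px -> depth {depth:.2f} m")
```

Halving the disparity doubles the estimated depth, which is why small measurement errors matter most for distant objects.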
Shown below, for example, is a pair of images taken from slightly different locations. Roll your mouse onto and off the image to flip between the two images. Note that nearby objects shift quite a bit as compared to distant objects. Notice also how flipping back and forth gives a strong sense of depth in the photo (there are, of course, other monocular cues that contribute to a sense of 3-D).

[Photo credit: Alexander Savin]
In an image forensic setting, it is often useful to measure certain 3-D properties of a scene: how large is an object, what is the distance between two objects, etc. Commercially available stereo cameras (such as the FinePix Real 3D) bundle two spatially offset cameras into one casing and simultaneously record two images from slightly different locations. As described above, these images can be used to calculate the otherwise lost 3-D dimension (depth). Although a bit more involved, 3-D information can also be recovered from two or more images taken from arbitrary locations.
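As a rough sketch of such a measurement, the code below recovers the 3-D position of two image points from a stereo pair and computes the distance between them. It assumes an idealized stereo camera (two identical, parallel pinhole cameras with known focal length, baseline, and image center); all names and numbers are hypothetical, and images taken from arbitrary locations would require a more general triangulation than shown here.

```python
import math


def backproject(x_px, y_px, disparity_px, focal_length_px, baseline_m, cx, cy):
    """3-D position (meters) of a pixel in the left image of a stereo pair.

    (cx, cy) is the assumed image center; disparity_px is the horizontal
    shift of the same point between the left and right images.
    """
    z = focal_length_px * baseline_m / disparity_px   # depth from disparity
    x = (x_px - cx) * z / focal_length_px             # undo the perspective scaling
    y = (y_px - cy) * z / focal_length_px
    return (x, y, z)


def distance_3d(p, q):
    """Euclidean distance between two 3-D points."""
    return math.dist(p, q)


# Assumed camera parameters, for illustration only.
camera = dict(focal_length_px=1000.0, baseline_m=0.1, cx=640.0, cy=360.0)

# Two points selected in the left image, with their measured disparities.
point_a = backproject(540.0, 360.0, disparity_px=50.0, **camera)
point_b = backproject(740.0, 360.0, disparity_px=50.0, **camera)

print(f"distance between the two points: {distance_3d(point_a, point_b):.2f} m")
```

Because each point is placed in metric 3-D space before measuring, the result does not depend on how large the object happens to appear in either image.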
In contrast to our earlier posts (3-D Facial Reconstruction and Photogrammetry 101), in which I described how to estimate 3-D information from a single image, the technique described here works for arbitrary scenes and not just for planar surfaces.
Photographing a scene leads to a significant loss of information. Much of this useful information can be recovered by simply photographing the same scene from slightly different locations. The recovery of this 3-D information can be useful to a forensic analyst.

