In The Photorealistic Effect… we saw how textures from photos could be overlaid onto 3D digital models, and how digital models could be animated by human puppeteers: motion capture tracks the movement of articulation points on a human actor, and this information is then used to actuate similarly located points on the digital character's mesh. In 3D Models from Photos, we saw how textured 3D models could be “extruded” from a single photograph by associating points in the image with a mesh and then deforming the mesh in 3D space. In this post, we’ll explore further how the digital models themselves can be captured, both by scanning actual physical objects and by constructing models from photographic imagery.
We have already seen how markerless motion capture can be used to capture the motion of actors and objects in the real world in real time, and how video compositing techniques can be used to change the pictorial content of a digitally captured visual scene. But we can also use reality capture technologies to scan physical objects, or otherwise generate three-dimensional digital models of them.
Generating 3D Models from Photos
One way of generating a three-dimensional model is to take a base three-dimensional mesh model and map it onto appropriate points in a photograph.
The following example shows an application called Faceworx in which textures from a front-facing portrait and a side-facing portrait are mapped onto a morphable mesh. The Smoothie-3d application described in 3D Models from Photos uses a related approach.
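Faceworx's own fitting method isn't described here, so the following is a swapped-in, simplified technique to show the basic idea of mapping a base mesh onto a photo: a handful of landmark vertices on the mesh (eyes, nose, mouth corners) are paired with the same features marked on the photo, a least-squares 2D affine transform is fitted between them, and every mesh vertex is then pushed through that transform to get a texture coordinate. The function names and landmark values are hypothetical; a real morphable-mesh tool would also deform the mesh shape itself and combine the front and side views to recover depth.

```python
import numpy as np


def fit_affine_2d(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Least-squares 2D affine transform A (2 x 3) such that dst ≈ A @ [x, y, 1]."""
    homog = np.hstack([src, np.ones((len(src), 1))])  # N x 3
    # Solve homog @ X ≈ dst for X (3 x 2); needs at least 3 non-collinear points.
    X, *_ = np.linalg.lstsq(homog, dst, rcond=None)
    return X.T


def apply_affine_2d(A: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Apply a 2 x 3 affine transform to an N x 2 array of points."""
    return np.hstack([pts, np.ones((len(pts), 1))]) @ A.T


def texture_coords(mesh_xy, template_landmarks, photo_landmarks, width, height):
    """Map every (frontal x, y) mesh vertex into the photo; return normalised (u, v) coordinates."""
    A = fit_affine_2d(template_landmarks, photo_landmarks)
    pixels = apply_affine_2d(A, mesh_xy)
    return pixels / np.array([width, height], dtype=float)
```

For example, if the five template landmarks are at `(-30, 40)`, `(30, 40)`, `(0, 0)`, `(-20, -30)` and `(20, -30)` and the photo happens to show the face doubled in size, flipped vertically (image rows grow downwards) and centred in a 640 x 480 image, the fitted transform recovers exactly that scaling, flip and offset, and the mesh vertex at the origin lands at texture coordinate `(0.5, 0.5)`.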
3D Models from Multiple Photos
Another way in which photographic imagery can be used to generate 3D models is to use techniques from photogrammetry, defined by Wikipedia as “the science of making measurements from photographs, especially for recovering the exact positions of surface points”. Several photographs of the same object are taken from different positions, the same features are identified in each of them, and the photographs are then aligned; the differences in where each feature appears from one photograph to the next are used to model the three-dimensional shape of the original object.
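To make the geometry concrete, here is a minimal sketch of one standard building block, linear (DLT) triangulation: given the same feature's pixel position in two photographs, and assuming the 3 x 4 projection matrix of each camera is already known, it recovers the feature's 3D position. This is not claimed to be how any particular photogrammetry package works; in practice the camera positions usually have to be estimated as well, from the same matched features. The camera values below are hypothetical.

```python
import numpy as np


def project(P: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Project a 3D point X through a 3 x 4 camera matrix P to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def triangulate_dlt(P1, P2, x1, x2) -> np.ndarray:
    """Linear (DLT) triangulation of one feature seen at pixel x1 in view 1 and x2 in view 2."""
    # Each view contributes two linear equations in the homogeneous point X,
    # from u * (p3 . X) - (p1 . X) = 0 and v * (p3 . X) - (p2 . X) = 0.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The least-squares solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]


# Hypothetical setup: two identical cameras, the second shifted 0.5 units to the right.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.5], [0.0], [0.0]])])
```

With this setup, a point at `(0.2, -0.1, 4.0)` appears at pixel `(360, 220)` in the first photo and `(260, 220)` in the second; that 100-pixel shift between views is exactly what lets `triangulate_dlt` recover the point's depth.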
DO: read the description of how the PhotoModeler application works: PhotoModeler – how it works. Similar mathematical techniques (triangulation and trilateration) can also be used to calculate distances in a wide variety of other contexts, such as finding the location of a mobile phone based on the signal strengths of three or more cell towers with known locations.
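As a sketch of the mobile phone example, assume each signal strength is first converted into a distance using a log-distance path-loss model (the reference power `p0_dbm` and exponent `path_loss_exponent` below are hypothetical values; real networks calibrate these, and real measurements are noisy). Trilateration then finds the position whose distances to the known tower locations best match those estimates. Subtracting one circle equation from the others turns the problem into a small linear least-squares system, which also copes with more than three towers.

```python
import numpy as np


def distance_from_rssi(rssi_dbm, p0_dbm=-40.0, path_loss_exponent=3.0, d0=1.0):
    """Invert a log-distance path-loss model: rssi = p0 - 10 * n * log10(d / d0)."""
    return d0 * 10 ** ((p0_dbm - rssi_dbm) / (10 * path_loss_exponent))


def trilaterate_2d(anchors, distances) -> np.ndarray:
    """Estimate (x, y) from three or more known anchor positions and distances to each."""
    anchors = np.asarray(anchors, dtype=float)
    d = np.asarray(distances, dtype=float)
    # Subtracting the first circle equation from the others removes x^2 + y^2,
    # leaving a linear system A @ [x, y] = b that we solve by least squares.
    A = 2 * (anchors[1:] - anchors[0])
    b = (d[0] ** 2 - d[1:] ** 2
         + np.sum(anchors[1:] ** 2, axis=1) - np.sum(anchors[0] ** 2))
    position, *_ = np.linalg.lstsq(A, b, rcond=None)
    return position
```

For example, with towers at `(0, 0)`, `(1000, 0)` and `(0, 1000)` and exact distances to a phone at `(300, 400)`, `trilaterate_2d` returns `(300, 400)`. With noisy distances it returns the best compromise rather than an exact intersection.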
In the case of Intel RealSense devices, three separate camera components work together to capture the imagery (a traditional optical camera) and the distance to objects in the field of view (an infra-red camera and a small infra-red laser projector).
With their ability to capture distance-to-object measures as well as imagery, depth-perceiving cameras are an enabling technology that opens up a range of possibilities for application developers. For example, itseez3d is a tablet-based application that works with the Structure Sensor to act as a simple 3D scanner, capturing a physical object as both a digital model and a corresponding texture.
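The core step that turns a depth camera into a scanner can be sketched with the pinhole camera model: each pixel's depth reading is back-projected into a 3D point, and the matching pixel of the colour image gives that point its colour. This sketch assumes depth is measured along the camera's optical axis, that the colour image has already been aligned (registered) with the depth image, and that the intrinsics `fx`, `fy`, `cx`, `cy` are known; the function name is hypothetical and no particular device SDK is used. A full scanner would also need to combine clouds captured from many viewpoints into a single mesh, which this sketch does not attempt.

```python
import numpy as np


def depth_to_coloured_points(depth, colour, fx, fy, cx, cy):
    """Back-project a depth image into a coloured 3D point cloud using a pinhole camera model.

    depth:  (H, W) distances along the optical axis, 0 where the sensor got no reading.
    colour: (H, W, 3) image assumed to be already aligned (registered) with the depth image.
    fx, fy, cx, cy: camera intrinsics in pixels (focal lengths and principal point).
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column and row indices, each (H, W)
    z = depth.astype(float)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    colours = colour.reshape(-1, 3)
    valid = points[:, 2] > 0  # drop pixels with no depth reading
    return points[valid], colours[valid]
```

Each returned point carries the colour of the pixel it came from, which is the raw material for the “digital model plus corresponding texture” pairing described above.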
Depth-Perceiving Cameras and Markerless Mocap
Depth perceiving cameras can also be used to capture facial models, as the FaceShift markerless motion capture studio shows.
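FaceShift's own solver isn't described here, but one widely used way to represent a captured facial expression is as a set of weights on “blendshapes”, expression targets such as a smile or an open jaw. The minimal sketch below assumes the tracked face is already available as 3D vertex positions in correspondence with a neutral template (the hard part, which is not shown) and estimates the weights by least squares, clipping them to the range 0 to 1 afterwards as a crude stand-in for a properly constrained solver. The function names are hypothetical.

```python
import numpy as np


def fit_blendshape_weights(neutral, blendshapes, observed):
    """Estimate how strongly each blendshape is active in an observed face.

    neutral:     (V, 3) vertex positions of the neutral face.
    blendshapes: (K, V, 3) vertex positions of K expression targets.
    observed:    (V, 3) tracked vertex positions for the current frame.
    """
    k = len(blendshapes)
    # Each column holds one blendshape's offset from the neutral face, flattened to length 3V.
    deltas = (blendshapes - neutral).reshape(k, -1).T
    residual = (observed - neutral).reshape(-1)
    weights, *_ = np.linalg.lstsq(deltas, residual, rcond=None)
    # Crude constraint: clip to [0, 1] after an unconstrained solve.
    return np.clip(weights, 0.0, 1.0)


def apply_blendshape_weights(neutral, blendshapes, weights):
    """Pose a (possibly different) character that has a matching set of blendshapes."""
    return neutral + np.tensordot(weights, blendshapes - neutral, axes=1)
```

Because the output is just a short vector of weights, the same weights can be applied to a different digital character with its own matching blendshapes, which is one way a performer's captured expressions can drive a character.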
Activity: according to the FAQ for the FaceShift Studio application shown in the video below, what cameras can be used to provide inputs to the FaceShift application?
Exercise: try to find one or two recent examples of augmented or mixed reality applications that make use of depth-sensing cameras and share links to them in the comments below. To what extent do the examples depend on depth information in order to work?
Interactive Dynamic Video
Another approach to using video captures to create interactive models is interactive dynamic video, a new technique developed by researchers Abe Davis, Justin G. Chen and Fredo Durand at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL). In this technique, a few seconds (or minutes) of video are analysed to study the way a foreground object vibrates, either naturally or when gently perturbed.
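To illustrate the idea of finding how an object “naturally vibrates” from video, the sketch below takes a motion signal, assumed here to be the displacement of one tracked point on the object in each frame (the tracking step is not shown), and picks out its strongest vibration frequencies from the Fourier spectrum. This is a deliberately simplified stand-in: the CSAIL technique analyses motion across the whole image using a more sophisticated method, and the function name and sample values are hypothetical.

```python
import numpy as np


def dominant_frequencies(signal, fps, k=2):
    """Return the k strongest peak frequencies (Hz) in a motion signal sampled at fps."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # remove the resting position so it doesn't dominate the spectrum
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    # Local maxima of the magnitude spectrum, ignoring the DC bin and the last bin.
    peaks = [i for i in range(1, len(spectrum) - 1)
             if spectrum[i] > spectrum[i - 1] and spectrum[i] >= spectrum[i + 1]]
    strongest = sorted(peaks, key=lambda i: spectrum[i], reverse=True)[:k]
    return sorted(float(freqs[i]) for i in strongest)
```

For example, 10 seconds of 60 fps footage of a point oscillating at 2 Hz and 5.5 Hz gives spectral peaks at those two frequencies. Longer recordings give finer frequency resolution, which is one reason why analysing more video can help.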
Rather than extracting a three-dimensional model of the perturbed object and then rendering that as a digital object, the object in the interactive video is perturbed by constructing a “pyramid” mesh over the pixels of the video image itself (Davis, A., Chen, J.G. and Durand, F., 2015. Image-space modal bases for plausible manipulation of objects in video. ACM Transactions on Graphics (TOG), 34(6), p.239). That is, there is no “freestanding” 3D model of the object that can be perturbed. Instead, the object is modelled dynamically and interactively within the visual scene in which it is situated. (For a full list of related papers, see the Interactive Dynamic Video website.)
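The way an image-space model can respond to a user's “poke” can be sketched with modal superposition, a standard technique from structural dynamics: each vibration mode is a per-pixel displacement pattern with its own frequency and damping, a poke excites each mode in proportion to how much that mode moves at the poked pixel, and the resulting displacement field is the sum of the modes, each oscillating and decaying over time. This is a simplified, real-valued sketch under stated assumptions (unit modal mass, mode shapes, frequencies and damping ratios already extracted from the video), not the authors' implementation; the function name is hypothetical, and the step that warps the video frame by this displacement field is not shown.

```python
import numpy as np


def modal_displacement(mode_shapes, freqs_hz, damping, poke_pixel, poke_force, t):
    """Displacement field at time t after an impulsive poke, by superposing vibration modes.

    mode_shapes: (K, H, W, 2) per-pixel (dx, dy) shape of each mode.
    freqs_hz:    K natural frequencies; damping: K damping ratios (each < 1).
    poke_pixel:  (row, col) where the user pokes; poke_force: (fx, fy) impulse.
    Assumes unit modal mass, so each mode responds like a simple damped oscillator.
    """
    row, col = poke_pixel
    displacement = np.zeros(mode_shapes.shape[1:])
    for phi, f, zeta in zip(mode_shapes, freqs_hz, damping):
        omega = 2 * np.pi * f
        omega_d = omega * np.sqrt(1.0 - zeta ** 2)  # damped natural frequency
        # How strongly the poke excites this mode: its shape at the poked pixel, dotted with the force.
        excitation = np.dot(phi[row, col], poke_force)
        # Impulse response of an underdamped oscillator with unit mass.
        q = excitation / omega_d * np.exp(-zeta * omega * t) * np.sin(omega_d * t)
        displacement += q * phi
    return displacement
```

Calling this for successive values of `t` gives a displacement field per frame; a poke in a direction the mode cannot move (force perpendicular to the mode shape at that pixel) leaves that mode at rest.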
SAQ: to what extent, if any, is interactive dynamic video an example of an augmented reality technique? Explain your reasoning.
With this technique added to our toolbox, alongside the ability to generate simple videos from still photographs as described in Hyper-reality Offline – Creating Videos from Photos, we can see how it is increasingly possible to bring imagery to life simply by manipulating pixels, mapped as textures onto underlying structural meshes.