
3D Models from Photos

In Hyper-reality Offline – Creating Videos from Photos we saw how a 3D parallax-style effect could be used to generate the illusion of depth from a single, static photograph, and in Even if the Camera Never Lies, the Retouched Photo Might… we saw how the puppet warp tool could be used to manipulate photographic meshes and reshape photographed items within a digital photograph. But can we go further than that, and generate an actual 3D model from a photo?

In 2007, a Carnegie Mellon research project released an application called Fotowoosh that allowed users to generate a three-dimensional model from a single photograph:

Another approach to generating three-dimensional perspectives that appeared in 2007 built up a three-dimensional model by stitching together multiple photographs of the same scene taken from different viewpoints. The original Photosynth application from Microsoft allowed users to navigate a compound scene made up of a series of overlaid but distinct photographs.

The current version generates a photorealistic 3D model that can be smoothly navigated. Here’s an example:

karen long neck hill tribe by sonataroundtheworld on photosynth

Photosynth is very impressive, but it is more concerned with generating navigable 3D panoramas from multiple photographs than with constructing digital models from photographs – models that can then be manipulated as animated digital objects.

In 2013, researchers from Tsinghua University and Tel Aviv University demonstrated a more comprehensive modelling tool, 3-Sweep, for generating 3D models from a single photo.

The Fotowoosh website has long since disappeared, and the 3-Sweep software is no longer available, but other applications in a similar vein have come along to replace them. For example, Smoothie-3D allows you to upload a photo and apply a mesh to it that can then be morphed into a 3D model.

So why not grab a coffee, find an appropriate photo, and see if you can create your own 3D model from it?

Smart Hearing

As we have already seen, there are several enabling technologies that need to be in place in order to put together an effective mediated reality system. In a visual augmented reality system, this includes having some sort of device for recording the visual scene, tracking objects within it, rendering augmented features in the scene, and some means of displaying the scene to the user. We reviewed a range of approaches for rendering augmented visual scenes in the post Taxonomies for Describing Mixed and Alternate Reality Systems, but how might we go about implementing an audio-based mediated reality?

In Noise Cancellation – An Example of Mediated Audio Reality?, we saw how headphone-based systems could be used to present a processed audio signal to a subject directly – a proximal form of mediation, akin to a head-mounted display – or how a speaker could be used to provide a more environmental form of mediation, rather more akin to a projection-based system in the visual sense.

Whilst the enabling technologies for video-based proximal AR systems are still, at best, at the clunky prototype stage, discreet solutions for real-time, everyday audio-based mediation already exist in the form of hearing aids, complemented in recent years by advanced digital signal processing techniques.

The following promotional video shows how far hearing aids have developed in recent years, moving from simple amplifiers to complex devices combining digital signal processing of the audio environment with integration with other audio-generating devices, such as phones, radios and televisions.

The range of features offered by such devices is managed through full-featured remote control apps that allow the user to control what they hear, as well as how they hear it – audio hyper-reality:

The following video review of the Here “Active Listening” earbuds further demonstrates how “audio wearables” can provide a range of audio effects – and capabilities – that can augment the hearing of a wearer who does not necessarily suffer from hearing loss or some other form of hearing impairment. (If you’d rather read a review of the same device, the Vice Motherboard blog has one – These Earbuds Are Like Instagram Filters for Live Music.)

SAQ: What practical challenges face designers of in-ear, wirelessly connected audio devices?
Answer: I can think of two immediately: how is the wireless signal received (what sort of antenna is required?) and how is the device powered?

Customised frequency response profiles are also supported on some mobile phones. For example, top-end Samsung Android phones include a feature known as Adapt Sound that allows users to calibrate their phone’s headphone output based on a frequency-based hearing test (video example).

Hearing aids typically comprise several elements: an earpiece that transmits sound into the ear; a microphone that receives the sound; an amplifier that amplifies the sound; and a battery pack that powers the device. Digital hearing aids may also include remote control circuitry to allow the hearing aid to be controlled remotely; circuitry to support digital signal processing of the received sound; and even a wireless receiver capable of receiving and then replaying sound files or streams from a mobile phone or computer.
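
To make the digital signal processing element a little more concrete, here is a minimal, illustrative Python sketch (using numpy and scipy; the band edges and gains are invented for the example) of the sort of per-band amplification such a device might apply:

```python
import numpy as np
from scipy import signal

def apply_band_gains(x, sr, bands_hz, gains_db):
    """Split a signal into frequency bands and apply a separate gain
    to each, mimicking how a digital hearing aid boosts only the
    frequencies its user struggles to hear."""
    out = np.zeros_like(x)
    for (lo, hi), g_db in zip(bands_hz, gains_db):
        # 4th-order Butterworth band-pass filter for this band
        sos = signal.butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
        out += signal.sosfilt(sos, x) * 10 ** (g_db / 20)  # dB -> amplitude
    return out

# Example: boost the higher frequencies, leaving the low end alone
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 4000 * t)
boosted = apply_band_gains(x, sr,
                           bands_hz=[(100, 1000), (1000, 4000), (4000, 7900)],
                           gains_db=[0, 3, 12])  # hypothetical prescription
```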

Digital hearing aids can be configured to tune the frequency response of the device to suit the needs of each individual user, as the following video demonstrates.

Hearing aids come in a range of form factors – NHS Direct describes the following:

  • Behind-the-ear (BTE): rests behind the ear and sends sound into the ear through either an earmould or a small, soft tip (called an open fitting)
  • In-the-ear (ITE): sits in the ear canal and the shell of the ear
  • In-the-canal (ITC): working parts in the earmould, so the whole hearing aid fits inside the ear canal
  • Completely-in-the-canal (CIC): fits further into your ear canal than an ITC aid

Age UK further identify two forms of spectacle hearing aid systems – bone conduction devices and air conduction devices – that are suited to different forms of hearing loss:

With a conductive hearing loss there is some physical obstruction to conducting the sound through the outer ear, eardrum or middle ear (such as a wax blockage, or perforated eardrum). This can mean that the inner ear or nerve centre on that ear is in good shape, and by sending sound straight through the bone behind a patient’s ear the hearing loss can effectively be bypassed. Bone Conduction or “BC” spectacle hearing aids are ideal for this because a transducer is mounted in the arm of the glasses behind the ear that will transmit the sound through the bone to the inner ear instead of along the ear canal.

Sensorineural hearing loss occurs when the anatomical site responsible for the deficiency is the inner ear or further along the auditory pathway (such as age related loss or noise induced hearing loss). Delivering the sound via a route other than the ear canal will not help in these cases, so Air Conduction “AC” spectacle hearing aids are utilised with a traditional form of hearing aid discreetly mounted in the arm of the glasses and either an earmould or receiver with a soft dome in the ear canal.

The following video shows how the frames of digital hearing glasses can be used to package the components required to implement the hearing aid.

And the following promotional video shows in a little more detail how the glasses are put together – and how they are used in everyday life (with a full range of digital features included!).

EXERCISE: Read the following article from The Atlantic – “What My Hearing Aid Taught Me About the Future of Wearables”. What does the author think wearable devices need to offer to make the user want to wear them? How does the author’s experience of wearing a hearing aid colour his view of how wearable devices might develop in the near future?

Many people wear spectacles and/or hearing aids as part of their everyday life, “boosting” their perception of the reality around them in particular ways in order to compensate for less than perfect eyesight or hearing. Advances in hearing aids suggest that many hearing aid users may already be benefiting from reality augmentations that people without hearing difficulties may also value. And whilst wearing spectacles to correct for poor vision is commonplace, it is also possible to wear eyewear without a corrective function as a fashion item or accessory. Devices such as hearing spectacles already provide a means of combining battery-powered, wifi-connected audio with “passive” visual enhancements (corrective lenses). So might we start to see those sorts of devices evolving into augmented reality headwear?

Interlude – Animated Colouring Books as an AR Jumping-Off Point

Demonstrations such as the Augmented Reality Browser Demo show how browser-based technologies can implement simple augmented reality applications. By building on a browser’s ability to access connected camera feeds, we can reuse third-party libraries to detect and track registration images contained within the video feed, and 3D plotting libraries to render and overlay 3D objects on the tracked image in real time.
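
By way of illustration, the sketch below shows the same detect-and-overlay loop written in Python using OpenCV’s ArUco marker module rather than a browser library (it assumes OpenCV 4.7 or later and an attached webcam; the marker dictionary is just an example choice):

```python
import cv2

# Detect and track square fiducial markers in a live camera feed and
# draw an overlay on them -- the same detect/track/render loop that
# browser-based AR libraries implement on top of the camera API.
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary)  # requires OpenCV >= 4.7

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    corners, ids, _rejected = detector.detectMarkers(frame)
    if ids is not None:
        # Stand-in for a rendered 3D object: outline the tracked markers
        cv2.aruco.drawDetectedMarkers(frame, corners, ids)
    cv2.imshow("AR marker tracking", frame)
    if cv2.waitKey(1) == 27:  # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```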

But what if we could also capture information from a modified registration image and use that as part of the rendered 3D model?

A research paper from Disney Research – Live Texturing of Augmented Reality Characters from Colored Drawings [PDF] – presented at the International Symposium on Mixed and Augmented Reality (ISMAR 2015) describes “an augmented reality coloring book App in which children color characters in a printed coloring book and inspect their work using a mobile device”, since released as the Disney Color and Play app.

But Disney is not the only company exploring augmented reality colouring books…

Another app in a similar vein is produced by QuiverVision (coloring packs) and is available for iOS and Android devices.

And as you might expect, crayon companies are also keen on finding new ways to sell more crayons and have also been looking at augmented reality colouring books, as in the case of Crayola and their ColorALive app.

DO: grab a coffee and some coloured pens or pencils, install an augmented reality colouring book app, print off an AR colouring pack, then colour in your own 3D model. Magic! :-)

Now compare and contrast augmented reality applications in which a registration image, once captured, can be used to trigger a video effect or free-running augmented reality animation with those in which a registration image or environmental feature must be detected and tracked continually. Compare them in terms of the technology required to implement them, the extent to which they transform a visual scene, and the uses to which each approach might be put. Try to think of one or two examples where one technique might be appropriate but the other would not, when trying to achieve some sort of effect or meet some particular purpose.

Can You Really Believe Your Ears?

In Even if the Camera Never Lies, the Retouched Photo Might… we saw how photos could be retouched to provide an improved version of a visual reality, and in the interlude activity on Cleaning Audio Tracks With Audacity we saw how a simple audio processing tool could be used to clean up a slightly noisy audio track. In this post, we’ll see how particular audio signals can be modified in real time, if we have access to them individually.

Audio tracks recorded for music, film, television or radio are typically multi-track affairs, with each audio source having its own microphone and its own recording track. This allows each track to be processed separately, and then mixed with the other tracks to produce the final mix. In a similar way, many graphic designs, as well as traditional animations, are constructed from multiple independent, overlaid layers.
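
As a toy sketch of that idea in Python (with sine waves standing in for real recorded stems), mixing separately recorded tracks is just a matter of scaling each one and summing:

```python
import numpy as np

def mix(tracks, gains):
    """Scale each separately recorded track by its own gain and sum
    them sample-by-sample to produce the final mix. Because each
    source has its own track, it can be processed independently
    before this step without touching the others."""
    return sum(g * t for g, t in zip(gains, tracks))

sr = 44100
t = np.arange(sr) / sr
vocal = np.sin(2 * np.pi * 220 * t)   # sine waves standing in for stems
guitar = np.sin(2 * np.pi * 330 * t)
final_mix = mix([vocal, guitar], gains=[0.8, 0.5])
```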

Conceptually, the challenge of augmented reality may be likened to adding an additional layer to the visual or audio scene. In order to achieve an augmented reality effect, we might need to separate a “flat” source, such as a mixed audio track or a video image, into separate layers, one for each item of interest. The layer(s) corresponding to the item(s) of interest may then be augmented through the addition of an overlay layer onto each as required.

One way of thinking about visual augmented reality is to consider it in terms of inserting objects into the visual field, for example adding an animated monster into a scene, overlaying objects in some way, such as re-coloring or re-texturing them, or transforming them, for example by changing their shape.

EXERCISE: How might you modify an audio / sound based perceptual environment in each of these ways?

ANSWER: Inserted – add a new sound into the audio track, perhaps trying to locate it spatially in the stereo field. Overlaid – if you think of this in terms of texture, this might be like adding echo or reverb to a sound, although this is actually more like changing how we perceive the space the sound is located in. Transformed might be something like pitch-shifting the voice in real time, to make it sound higher pitched, or deeper. I’m not sure if things like noise cancellation would count as a “negative insertion” or a “cancelling overlay”?!
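
As a rough illustration of the “overlay” and “transform” cases, here is a short Python/numpy sketch (this is a naive illustration of the idea, not how any particular product implements it):

```python
import numpy as np

def overlay_echo(x, sr, delay_s=0.25, decay=0.4):
    """'Overlay' a sound with a delayed, attenuated copy of itself:
    a simple echo effect."""
    d = int(delay_s * sr)
    y = x.copy()
    y[d:] += decay * x[:-d]
    return y

def transform_pitch_naive(x, factor=1.5):
    """Crudely 'transform' a sound by resampling it: the pitch rises
    by `factor`, but the duration shrinks too. Real-time voice
    changers use cleverer methods that preserve duration."""
    idx = np.arange(0, len(x), factor).astype(int)
    return x[idx]
```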

When audio sources are recorded using separate tracks, adding additional effects to them becomes a simple matter. It also provides us with an opportunity to “improve” the sound of the audio track just as we might “improve” a photograph by retouching it.

Consider, for example, the problem of a singer who can’t sing in tune (much like the model with a bad complexion that needs “fixing” to meet the demands of a fashion magazine…). Can we fix that?

Indeed we can – part of the toolbox in any recording studio will be something that can correct for pitch and help retune an out-of-tune vocal performance.

But vocal performances can also be transformed in other ways: an actor’s voice performance, for example, can be transformed so that it sounds like a different person. The MorphBox Voice Changer application, for instance, allows you to create voice profiles that transform your voice into a range of other voice types.
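
A minimal sketch of the underlying effect, using the librosa library’s pitch shifter on a recorded sample (the file names are placeholders, and real voice changers do considerably more than this):

```python
import librosa
import soundfile as sf

# Shift the pitch of a recorded voice up or down by four semitones
# without changing its duration.
y, sr = librosa.load("voice_sample.wav", sr=None, mono=True)
higher = librosa.effects.pitch_shift(y, sr=sr, n_steps=4)   # "chipmunk"
deeper = librosa.effects.pitch_shift(y, sr=sr, n_steps=-4)  # "movie villain"
sf.write("voice_deeper.wav", deeper, sr)
```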

Not surprisingly, as the computational power of smartphones increases, this sort of effect has made its way into novelty app form. Once again, it seems as if augmented reality novelty items are starting to appear all around us, even if we don’t necessarily think of them as such at first.

DO: if you have a smartphone, see if you can find a voice-modifying application for it. What features does it offer? To what extent might you class it as an augmented reality application, and why?

Even if the Camera Never Lies, the Retouched Photo Might…

In Hyper-reality Offline – Creating Videos from Photos, we saw how a single flat image could be transformed in order to provide a range of video effects. In this post, we’ll review some of the other ways in which photographs of real objects may be transformed into slightly less real – or should that be hyper-real? – objects, and consider some of the questions such manipulations raise about the authenticity of photographic images.

In 2014, an unsuccessful bill was introduced to the US House of Representatives that sought to introduce controls around “photoshopping”, based on the principle that “altered images [of models’ faces and bodies] can create distorted and unrealistic expectations and understandings of appropriate and healthy weight and body image” (Slate: Legislating Realism). The Bill reappeared in the 114th session of Congress in 2016 as H.R.4445 – Truth in Advertising Act of 2016:

This bill directs the Federal Trade Commission (FTC) to submit a report to Congress assessing the prevalence, in advertisements and other media for the promotion of commercial products and services in the United States, of images that have been altered to materially change the appearance and physical characteristics of the faces and bodies of the individuals depicted. The report must contain: (1) an evaluation of the degree to which such use of altered images may constitute an unfair or deceptive act or practice, (2) guidelines for advertisers regarding how the FTC determines whether such use constitutes an unfair or deceptive act or practice, and (3) recommendations reflecting a consensus of stakeholders and experts to reduce consumer harm arising from such use.

OPTIONAL READING: Forsey, Logan A., “Towards a Workable Rubric for Assessing Photoshop Liability” (2013). Law School Student Scholarship. Paper 222.

Photoshopping – the use of digital photograph editors such as Adobe Photoshop – is a form of reality mediation in which a photograph is transformed to “improve it”. In many cases, photoshopped images may be viewed as examples of “hyper-reality” in that no additional information over and above the scene captured in the photograph is introduced, but elements of the scene may be highlighted, or “tidied up”.

As you might expect, the intention behind many advertising campaigns is to present a product in the best possible light whilst at the same time not misrepresenting it, which makes the prospect of using photo-manipulation attractive. The following promotional video from the marketers at McDonald’s Canada – Behind the scenes at a McDonald’s photo shoot – shows how set dressing and post-production photo manipulation are used to that end, whilst still maintaining some claims about the “truthfulness” of the final image.

Undoubtedly, many food photographers manipulate reality whilst at the same time being able to argue that the final photograph is a “fair” representation of it. In a similar way, fashion photography relies on the manipulation of “real” models prior to the digital manipulation of their captured likenesses. A behind the scenes video – Dove Evolution – from beauty product manufacturer, Dove, shows just how much transformation of the human model is applied prior to a photo-shoot, as well as how much digital manipulation is applied after it.

Let’s see in a little more detail how the images can be transformed. One common way of retouching photos is to map a photographed object, which may be a person, onto a two-dimensional mesh. Nodes in the mesh map on to points of interest in the image. As the mesh is transformed by dragging its nodes around, so too is the image: points of interest track the repositioned nodes, and the image content in each cell, or grid element, of the mesh is transformed appropriately.
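
The scikit-image library makes it easy to experiment with this idea. The following sketch (the 5×5 control grid and the 40-pixel offset are invented for the example) maps a test image onto a mesh of control points, nudges one of them, and re-renders the image through the distorted mesh:

```python
import numpy as np
from skimage import data, transform

image = data.astronaut()
rows, cols = image.shape[:2]

# Lay a sparse 5x5 grid of control points over the image
src_cols, src_rows = np.meshgrid(np.linspace(0, cols - 1, 5),
                                 np.linspace(0, rows - 1, 5))
src = np.dstack([src_cols.ravel(), src_rows.ravel()])[0]

# Nudge the central control point 40 pixels along the x-axis,
# locally distorting the image content around that node
dst = src.copy()
dst[12, 0] += 40

tform = transform.PiecewiseAffineTransform()
tform.estimate(src, dst)
warped = transform.warp(image, tform)  # re-render through the new mesh
```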

As well as transforming body shapes, faces can be retouched in a similar way.

Surrounded as we are each day by commercially produced images, it’s important to consider the range of ways in which reality might be manipulated before it is presented to us.

EXERCISE: even if we may be a little sceptical about claims of truth in advertising, we typically expect factual documentaries to present a true view of the world, for some definition of truth. In 2015, the BBC was called to account over a documentary that appeared to depict a particular sort of volcano eruption, but that was actually a sequence composited from more than one eruption. See if you can track down one or two reports of the episode. What charges did the critics lay, and how did the documentary makers respond? What sort of editorial guidelines does the BBC follow in the production of natural history film-making?


EXERCISE: As well as advertising and documentary making, news journalists may also be tempted to photoshop images for narrative or impact effect. Watch the following video, pausing it where appropriate to note the different ways in which the photographs reviewed have been manipulated. To what extent, if any, do you think manipulations of that sort would be justifiable in a news reporting context?

A post on the Hacker Factor blog – Body By Victoria – describes how a photographic image may be looked at forensically in order to find evidence of photoshopping.

As the Photoshop tools demonstrate, by mapping a photograph onto a mesh, transforming the mesh, and then stretching pixel values or in-painting in a context-sensitive way, photographic images can be reshaped whilst still maintaining some sort of integrity in the background, at least to the casual observer.
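
Context-sensitive in-painting of this general sort can also be sketched in a few lines with OpenCV (the file names and mask region below are placeholders, and Photoshop’s own algorithms are rather more sophisticated):

```python
import cv2
import numpy as np

image = cv2.imread("photo.jpg")
# White pixels in the mask mark the region to "remove", e.g. a blemish
mask = np.zeros(image.shape[:2], dtype=np.uint8)
mask[100:150, 200:260] = 255
# Fill the masked region from its surroundings
retouched = cv2.inpaint(image, mask, 3, cv2.INPAINT_TELEA)
cv2.imwrite("retouched.jpg", retouched)
```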

Increasingly, there are similarities between the tools used to create digital objects from scratch, and the tools used to manipulate real world objects captured into digital form. The computer uses a similar form of representation in each case – a mesh – and supports similar sorts of manipulation on each sort of object. As you will see in several other posts, the ability to create photorealistic digital objects from scratch on the one hand, and the ability to capture the form, likeness and behaviour of a physical object into digital form on the other, mean that we are truly starting to blur the edges of reality around any image viewed through a screen.

Diminished Audio Reality – Removing a Vocal from a Musical Jingle

In the post Noise Cancellation – An Example of Mediated Audio Reality? we saw how background or intrusive environmental noise could be removed using noise cancelling headphones. In this post, you’ll learn a simple trick for diminishing an audio reality by removing a vocal track from a musical jingle.

Noise cancellation may be thought of as adding the complement – the phase-inverted copy – of everything that is not the desired signal component to an audio feed in order to remove the unwanted noise component. This same idea can be used as the basis of a crude attempt to remove a mono vocal signal from a stereo audio track, by creating our own inverse of the vocal track and then subtracting it from the original mix.
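
In the special case where the vocal is panned dead centre of a stereo mix, we don’t even need to construct the inverse explicitly: subtracting one channel from the other cancels anything common to both. A minimal sketch in Python (using the soundfile library; the file names are placeholders):

```python
import soundfile as sf

audio, sr = sf.read("song.wav")  # shape (n_samples, 2) for stereo
left, right = audio[:, 0], audio[:, 1]
# A centre-panned vocal is identical in both channels, so it cancels;
# instruments panned off-centre survive (in mono)
karaoke = left - right
sf.write("karaoke.wav", karaoke, sr)
```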

SAQ: Describe an algorithm corresponding to the first part of the method suggested in the How to Remove Vocals from a Song Using Audacity video for removing a vocal track from a stereo music track. How does the algorithm compare to the algorithm you described for the noise cancelling system?

SAQ: The technique described in the video relies on the track having a mono vocal signal and stereo backing track. The simple technique also lost some of the bass when the vocals were removed. How was the algorithm modified to try to preserve the bass component? How does the modification preserve the bass component? 
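
One common refinement of the channel-subtraction trick, sketched below in Python (the 120 Hz cutoff is an illustrative value, and this is not necessarily exactly the modification used in the video), is to low-pass filter the original mix and add the bass back into the difference signal:

```python
import soundfile as sf
from scipy import signal

audio, sr = sf.read("song.wav")
left, right = audio[:, 0], audio[:, 1]
no_centre = left - right  # removes centre-panned vocal *and* bass

# Recover everything below ~120 Hz from the original (mono) mix...
sos = signal.butter(4, 120, btype="lowpass", fs=sr, output="sos")
bass = signal.sosfilt(sos, (left + right) / 2)

# ...and add it back into the difference signal
sf.write("karaoke_with_bass.wav", no_centre + bass, sr)
```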

Recovering Audio from Video – But Not How You Might Expect…

In The Art of Sound – Algorithmic Foley Artists?, we saw how researchers from MIT’s CSAIL were able to train a system to recreate the sound of a silently videoed object being hit by a drumstick, using a model built from video-plus-sound recordings of many different sorts of objects being struck. In this post, we’ll see another way of recovering audio information from a purely visual capture of a scene, also developed at CSAIL.

Fans of Hollywood thrillers or surveillance-themed TV series may be familiar with the idea of laser microphones, in which laser light projected onto and reflected from a window can be used to track the vibrations of the window pane and record the audio of people talking behind the window.

Once the preserve of surveillance agencies, such devices can today be cobbled together in your garage using components retrieved from commodity electronics devices.

The technique used by the laser microphone is based on measuring the vibrations caused by the sound you want to record. This suggests that if you can find other ways of tracking those vibrations, you should similarly be able to retrieve the audio. That is exactly what the MIT CSAIL researchers did: by analysing video footage of objects that vibrated in sympathy (albeit minutely) with sounds in their environment, they were able to generate a recovered audio signal.
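
As a toy illustration of the principle only (the real “visual microphone” work recovers sub-pixel motions with far more sophisticated processing), we might track the average brightness of one small patch of a high frame-rate video and treat the resulting trace as a crude, heavily band-limited audio signal; the file name and patch location below are placeholders:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("vibrating_object.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)

trace = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Average brightness of one small patch, frame by frame
    trace.append(gray[200:240, 300:340].mean())
cap.release()

# Remove the DC offset and normalise: a crude "audio" signal whose
# bandwidth is limited to half the video frame rate
sig = np.array(trace)
sig = (sig - sig.mean()) / (np.abs(sig).max() + 1e-9)
print(f"recovered {len(sig)} samples at {fps:.0f} Hz")
```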

As the video shows, in the case of capturing a background musical track, whilst the recovered audio was not necessarily of the highest fidelity, by feeding it into another application – such as Shazam, an application capable of recognising music tracks – the researchers were at least able to identify the music automatically.

So not only can we create videos from still photographs, as described in Hyper-reality Offline – Creating Videos from Photos, we can also recover audio from otherwise silent videos.