Can You Really Believe Your Ears?

In Even if the Camera Never Lies, the Retouched Photo Might… we saw how photos can be retouched to present an “improved” version of visual reality, and in the interlude activity on Cleaning Audio Tracks With Audacity we saw how a simple audio processing tool can be used to clean up a slightly noisy audio track. In this post, we’ll see how particular audio signals can be modified in real time, if we have access to them individually.

Audio tracks recorded for music, film, television or radio are typically multi-track affairs, with each audio source having its own microphone and its own recording track. This allows each track to be processed separately, and then mixed with the other tracks to produce the final audio track. In a similar way, many graphic designs, as well as traditional animations, are constructed of multiple independent, overlaid layers.
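At its simplest, mixing those separate tracks down to a final track just means scaling each one by its own gain and summing the results, sample by sample. Here is a minimal sketch of that idea; the track names, sample values, and gain settings are made-up examples:

```python
# A minimal sketch of multi-track mixing: each source is an independent
# list of samples; a per-track gain is applied before summing into the
# final mix. The sample values and gains below are illustrative only.

def mix_tracks(tracks, gains):
    """Sum several equal-length sample lists into one final track,
    scaling each track by its own gain first."""
    n = len(tracks[0])
    mix = [0.0] * n
    for track, gain in zip(tracks, gains):
        for i, sample in enumerate(track):
            mix[i] += gain * sample
    return mix

vocals = [0.5, -0.5, 0.25, 0.0]   # hypothetical vocal samples
guitar = [0.1, 0.2, -0.1, 0.3]    # hypothetical guitar samples

# Vocals at full level, guitar at half level:
final = mix_tracks([vocals, guitar], gains=[1.0, 0.5])
print(final)
```

Because each source stays separate right up until this summing step, any effect (reverb, pitch correction, and so on) can be applied to one track without touching the others.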

Conceptually, the challenge of augmented reality may be likened to adding an additional layer to the visual or audio scene. In order to achieve an augmented reality effect, we might need to separate out a “flat” source, such as a mixed audio track or a video image, into separate layers, one for each item of interest. The layer(s) corresponding to the item(s) of interest may then be augmented through the addition of an overlay layer onto each as required.

One way of thinking about visual augmented reality is to consider it in terms of inserting objects into the visual field (for example, adding an animated monster into a scene); overlaying objects in some way (such as re-coloring or re-texturing them); or transforming them (for example, by changing their shape).

EXERCISE: How might you modify an audio / sound based perceptual environment in each of these ways?

ANSWER: Inserted – add a new sound into the audio track, perhaps trying to locate it spatially in the stereo field. Overlaid – if you think of this in terms of texture, this might be like adding echo or reverb to a sound, although this is actually more like changing how we perceive the space the sound is located in. Transformed might be something like pitch-shifting the voice in real time, to make it sound higher pitched, or deeper. I’m not sure if things like noise cancellation would count as a “negative insertion” or a “cancelling overlay”?!
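The “transformed” case above, pitch-shifting a voice, can be illustrated very crudely by resampling: playing the same samples back at a different rate raises or lowers the pitch. This toy sketch (with a made-up waveform) shows the idea; note that naive resampling also changes the duration, which is why real-time voice changers use more sophisticated techniques such as phase vocoders:

```python
# A toy illustration of pitch transformation by resampling. A factor of
# 2.0 keeps every other sample, halving the period of the waveform and
# so doubling its frequency (an octave up) -- at the cost of halving
# the duration. The waveform below is an illustrative triangle wave.

def resample(samples, factor):
    """Crude nearest-neighbour resampling. factor > 1 shortens the
    signal (higher pitch on playback); factor < 1 lengthens it."""
    out_len = int(len(samples) / factor)
    return [samples[min(int(i * factor), len(samples) - 1)]
            for i in range(out_len)]

tone = [0, 2, 4, 2, 0, -2, -4, -2] * 2   # triangle wave, period 8
octave_up = resample(tone, 2.0)          # period 4: frequency doubled
print(octave_up)  # [0, 4, 0, -4, 0, 4, 0, -4]
```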

When audio sources are recorded using separate tracks, adding effects to them becomes a simple matter. It also provides us with an opportunity to “improve” the sound of the audio track, just as we might “improve” the appearance of a photograph by retouching it.

Consider, for example, the problem of a singer who can’t sing in tune (much like the model whose complexion needs “fixing” to meet the demands of a fashion magazine…). Can we fix that?

Indeed we can – part of the toolbox in any recording studio will be a tool that can correct pitch and help retune an out-of-tune vocal performance.

For an interesting read on Auto-Tune, an industry standard pitch correction tool, see The Mathematical Genius of Auto-Tune.
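The core idea behind this kind of pitch correction is simple to state: detect the frequency the singer actually produced, then pull it towards the nearest note of the equal-tempered scale. The MIDI note arithmetic below is standard; the rest of the sketch is illustrative, and real tools do far more (pitch detection, smooth glides between notes, formant preservation):

```python
# A sketch of the snapping step at the heart of pitch correction:
# convert a frequency to a continuous MIDI note number, round it to the
# nearest semitone, then convert back to a frequency.
import math

def snap_to_semitone(freq_hz, a4=440.0):
    """Return the equal-tempered note frequency nearest to freq_hz."""
    midi = 69 + 12 * math.log2(freq_hz / a4)   # continuous MIDI number
    nearest = round(midi)                      # nearest semitone
    return a4 * 2 ** ((nearest - 69) / 12)

# A slightly flat A4 (437 Hz) gets pulled up to 440 Hz:
print(round(snap_to_semitone(437.0), 1))  # 440.0
```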

But vocal performances can also be transformed in other ways, with an actor providing a voice performance, for example, that can then be transformed so that it sounds like a different person. For example, the MorphBox Voice Changer application allows you to create a range of voice profiles that can transform your voice into a range of other voice types.

Not surprisingly, as the computational power of smartphones increases, this sort of effect has made its way into novelty app form. Once again, it seems as if augmented reality novelty items are starting to appear all around us, even if we don’t necessarily think of them as such at first.

DO: if you have a smartphone, see if you can find a voice-modifying application for it. What features does it offer? To what extent might you class it as an augmented reality application, and why?
