Archive for the 'Audio' Category

Smart Hearing

As we have already seen, there are several enabling technologies that need to be in place in order to put together an effective mediated reality system. In a visual augmented reality system, this includes having some sort of device for recording the visual scene, tracking objects within it, rendering augmented features in the scene, and some means of displaying the scene to the user. We reviewed a range of approaches for rendering augmented visual scenes in the post Taxonomies for Describing Mixed and Alternate Reality Systems, but how might we go about implementing an audio based mediated reality?

In Noise Cancellation – An Example of Mediated Audio Reality?, we saw how headphone based systems could be used to present a processed audio signal directly to a subject – a proximal form of mediation, much like a head mounted display – or how a speaker could be used to provide a more environmental form of mediation, rather more akin to a projection based system in the visual sense.

Whilst the enabling technologies for video based proximal AR systems are still, at best, at the clunky prototype stage, discreet solutions for realtime, everyday audio based mediation already exist in the form of hearing aids, complemented in recent years by advanced digital signal processing techniques.

The following promotional video shows how far hearing aids have developed in recent years, moving from simple amplifiers to complex devices that combine digital signal processing of the audio environment with integration with other audio generating devices, such as phones, radios and televisions.

To manage the range of features offered by such devices, they are complemented by full featured remote control apps that allow the user to control what they hear, as well as how they hear it – audio hyper-reality:

The following video review of the Here Active Listening earbuds further demonstrates how “audio wearables” can provide a range of audio effects – and capabilities – that can augment the hearing of a wearer who does not necessarily suffer from hearing loss or some other form of hearing impairment. (If you’d rather read a review of the same device, the Vice Motherboard blog has one – These Earbuds Are Like Instagram Filters for Live Music.)

SAQ: What practical challenges face designers of in-ear, wirelessly connected audio devices?
Answer: I can think of two immediately: how is the wireless signal received (what sort of antenna is required?) and how is the device powered?

Customised frequency response profiles are also supported in some mobile phones. For example, top-end Samsung Android phones include a feature known as Adapt Sound that allows a user to calibrate their phone’s headphone output based on a frequency based hearing test (video example).
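Samsung doesn’t publish the details of how Adapt Sound shapes the output, but conceptually the result is a personalised equalisation curve. The following sketch (the function name, test frequencies and gain values are all hypothetical) shows one minimal way that per-frequency boosts derived from a hearing test might be applied to an audio signal:

```python
import numpy as np

def apply_hearing_profile(signal, sample_rate, test_freqs_hz, gains_db):
    """Shape a mono signal's frequency response using per-band gains,
    e.g. boosts derived from a listener's hearing-test results."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    # Interpolate the coarse gain curve across all FFT bins.
    gains = np.interp(freqs, test_freqs_hz, gains_db)
    spectrum *= 10.0 ** (gains / 20.0)  # dB -> linear amplitude
    return np.fft.irfft(spectrum, n=len(signal))

# Hypothetical profile: boost the high frequencies the test suggested are weak.
# equalised = apply_hearing_profile(audio, 44100,
#                                   test_freqs_hz=[250, 1000, 4000, 8000],
#                                   gains_db=[0, 0, 6, 12])
```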

Hearing aids typically comprise several elements: an earpiece that transmits sound into the ear; a microphone that receives the sound; an amplifier that amplifies the sound; and a battery pack that powers the device. Digital hearing aids may also include remote control circuitry to allow the hearing aid to be controlled remotely; circuitry to support digital signal processing of the received sound; and even a wireless receiver capable of receiving and then replaying sound files or streams from a mobile phone or computer.

Digital hearing aids can be configured to tune the frequency response of the device to suit the needs of each individual user as the following video demonstrates.

Hearing aids come in a range of form factors – NHS Direct describes the following:

  • Behind-the-ear (BTE): rests behind the ear and sends sound into the ear through either an earmould or a small, soft tip (called an open fitting)
  • In-the-ear (ITE): sits in the ear canal and the shell of the ear
  • In-the-canal (ITC): working parts in the earmould, so the whole hearing aid fits inside the ear canal
  • Completely-in-the-canal (CIC): fits further into your ear canal than an ITC aid

Age UK further identify two forms of spectacle hearing aid systems – bone conduction devices and air conduction devices – that are suited to different forms of hearing loss:

With a conductive hearing loss there is some physical obstruction to conducting the sound through the outer ear, eardrum or middle ear (such as a wax blockage, or perforated eardrum). This can mean that the inner ear or nerve centre on that ear is in good shape, and by sending sound straight through the bone behind a patient’s ear the hearing loss can effectively be bypassed. Bone Conduction or “BC” spectacle hearing aids are ideal for this because a transducer is mounted in the arm of the glasses behind the ear that will transmit the sound through the bone to the inner ear instead of along the ear canal.

Sensorineural hearing loss occurs when the anatomical site responsible for the deficiency is the inner ear or further along the auditory pathway (such as age related loss or noise induced hearing loss). Delivering the sound via a route other than the ear canal will not help in these cases, so Air Conduction “AC” spectacle hearing aids are utilised with a traditional form of hearing aid discreetly mounted in the arm of the glasses and either an earmould or receiver with a soft dome in the ear canal.

The following video shows how the frames of digital hearing glasses can be used to package the components required to implement the hearing aid.

And the following promotional video shows in a little more detail how the glasses are put together – and how they are used in everyday life (with a full range of digital features included!).

EXERCISE: Read the following article from The Atlantic – “What My Hearing Aid Taught Me About the Future of Wearables”. What does the author think wearable devices need to offer to make the user want to wear them? How does the author’s experience of wearing a hearing aid colour his view of how wearable devices might develop in the near future?

Many people wear spectacles and/or hearing aids as part of their everyday life, “boosting” the perception of reality around them in particular ways in order to compensate for less than perfect eyesight or hearing. Advances in hearing aids suggest that many hearing aid users may already be benefiting from reality augmentations that people without hearing difficulties may also value. And whilst wearing spectacles to correct for poor vision is commonplace, it is also possible to wear eyewear without a corrective function as a fashion item or accessory. Devices such as hearing spectacles already provide a means of combining battery powered, wifi connected audio with “passive” visual enhancements (corrective lenses). So might we start to see those sorts of device evolving into augmented reality headwear?

Can You Really Believe Your Ears?

In Even if the Camera Never Lies, the Retouched Photo Might… we saw how photos could be retouched to provide an improved version of a visual reality, and in the interlude activity on Cleaning Audio Tracks With Audacity we saw how a simple audio processing tool could be used to clean up a slightly noisy audio track. In this post, we’ll see how particular audio signals can be modified in real time, if we have access to them individually.

Audio tracks recorded for music, film, television or radio are typically multi-track affairs, with each audio source having its own microphone and its own recording track. This allows each track to be processed separately, and then mixed with the other tracks to produce the final audio track. In a similar way, many graphic designs, as well as traditional animations, are constructed of multiple independent, overlaid layers.
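To make the “layers” analogy concrete, the following toy sketch (the function and track names are purely illustrative) treats the final mixdown as a gain-weighted sum of separately recorded tracks; because each track exists independently, any effect can be applied to a single layer before the sum is taken:

```python
import numpy as np

def mix_tracks(tracks, gains):
    """Mix separately recorded tracks (NumPy arrays) into a single master
    track, applying an independent gain to each 'layer' before summing."""
    length = max(len(t) for t in tracks)
    mix = np.zeros(length)
    for track, gain in zip(tracks, gains):
        track = np.asarray(track, dtype=float)
        mix[:len(track)] += gain * track  # process each layer, then overlay it
    return mix

# e.g. master = mix_tracks([vocal, guitar, drums], gains=[1.0, 0.7, 0.5])
```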

Conceptually, the challenge of augmented reality may be likened to adding an additional layer to the visual or audio scene. In order to achieve an augmented reality effect, we might need to separate out a “flat” source, such as a mixed audio track or a video image, into separate layers, one for each item of interest. The layer(s) corresponding to the item(s) of interest may then be augmented through the addition of an overlay layer onto each as required.

One way of thinking about visual augmented reality is to consider it in terms of inserting objects into the visual field (for example, adding an animated monster into a scene); overlaying objects in some way, such as re-colouring or re-texturing them; or transforming them, for example by changing their shape.

EXERCISE: How might you modify an audio / sound based perceptual environment in each of these ways?

ANSWER: Inserted – add a new sound into the audio track, perhaps trying to locate it spatially in the stereo field. Overlaid – if you think of this in terms of texture, this might be like adding echo or reverb to a sound, although this is actually more like changing how we perceive the space the sound is located in. Transformed might be something like pitch-shifting the voice in real time, to make it sound higher pitched, or deeper. I’m not sure if things like noise cancellation would count as a “negative insertion” or a “cancelling overlay”?!
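By way of illustration, here is a rough sketch of the first and third of those ideas – inserting a spatially panned sound, and pitch-shifting a voice. It assumes the librosa library is available for the pitch shift, and all the function names and parameter values are just illustrative:

```python
import numpy as np
import librosa  # assumed available; used only for the pitch-shift example

def insert_panned(scene_stereo, new_sound, start, pan=0.5):
    """'Insert': mix a mono sound into a stereo scene (shape (N, 2)) at a
    given sample offset, panned between left (pan=0.0) and right (pan=1.0)."""
    out = scene_stereo.copy()
    end = start + len(new_sound)
    out[start:end, 0] += (1.0 - pan) * new_sound
    out[start:end, 1] += pan * new_sound
    return out

def deepen_voice(voice, sample_rate, semitones=-4):
    """'Transform': pitch-shift a voice down a few semitones to deepen it."""
    return librosa.effects.pitch_shift(voice, sr=sample_rate, n_steps=semitones)
```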

When audio sources are recorded using separate tracks, adding additional effects to them becomes a simple matter. It also provides us with an opportunity to “improve” the sound of the audio track, just as we might “improve” a photograph by retouching it.

Consider, for example, the problem of a singer who can’t sing in tune (much like the model with a bad complexion that needs “fixing” to meet the demands of a fashion magazine…). Can we fix that?

Indeed we can – part of the toolbox in any recording studio will be something that can correct for pitch and help retune an out-of-tune vocal performance.

But vocal performances can also be transformed in other ways: an actor might provide a voice performance, for example, that can then be transformed so that it sounds like a different person. The MorphBox Voice Changer application, for instance, allows you to create a range of voice profiles that can transform your voice into a range of other voice types.

Not surprisingly, as the computational power of smartphones increases, this sort of effect has made its way into novelty app form. Once again, it seems as if augmented reality novelty items are starting to appear all around us, even if we don’t necessarily think of them as such at first.

DO: if you have a smartphone, see if you can find a voice modifying application for it. What features does it offer? To what extent might you class it as an augmented reality application, and why?

Diminished Audio Reality – Removing a Vocal from a Musical Jingle

In the post Noise Cancellation – An Example of Mediated Audio Reality? we saw how background or intrusive environmental noise could be removed using noise cancelling headphones. In this post, you’ll learn a simple trick for diminishing an audio reality by removing a vocal track from a musical jingle.

Noise cancellation may be thought of as adding the complement of everything that is not the desired signal component to an audio feed in order to remove the unwanted noise component. This same idea can be used as the basis of a crude attempt to remove a mono vocal signal from a stereo audio track by creating our own inverse of the vocal track and then subtracting it from the original mix.

SAQ: Describe an algorithm corresponding to the first part of the method suggested in the How to Remove Vocals from a Song Using Audacity video for removing a vocal track from a stereo music track. How does the algorithm compare to the algorithm you described for the noise cancelling system?

SAQ: The technique described in the video relies on the track having a mono vocal signal and stereo backing track. The simple technique also lost some of the bass when the vocals were removed. How was the algorithm modified to try to preserve the bass component? How does the modification preserve the bass component? 
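The video’s exact recipe may differ in detail, but the following sketch captures the gist of the channel subtraction trick and the bass preserving variant (the cutoff frequency and function name are illustrative, not taken from the video):

```python
import numpy as np
from scipy.signal import butter, sosfilt

def remove_centre_vocal(left, right, sample_rate, bass_cutoff_hz=120.0):
    """Crude vocal removal: a mono vocal appears identically in both channels
    of a stereo mix, so subtracting one channel from the other cancels it.
    The (also centre-panned) bass is recovered by low-pass filtering the
    original mix and adding it back to the difference signal."""
    difference = left - right                    # centre-panned vocal cancels
    sos = butter(4, bass_cutoff_hz, btype="low", fs=sample_rate, output="sos")
    bass = sosfilt(sos, 0.5 * (left + right))    # keep only the low end of the mix
    mono = difference + bass
    return np.column_stack([mono, mono])         # the result is effectively mono
```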

Recovering Audio from Video – But Not How You Might Expect…

In The Art of Sound – Algorithmic Foley Artists?, we saw how researchers from MIT’s CSAIL were able to train a system to try to recreate the sound of a silently videoed object being hit by a drumstick, using a model based on video+sound recordings of lots of different sorts of objects being hit by a drumstick. In this post, we’ll see another way of recovering audio information from a purely visual capture of a scene, also developed at CSAIL.

Fans of Hollywood thrillers or surveillance-themed TV series may be familiar with the idea of laser microphones, in which laser light projected onto and reflected from a window can be used to track the vibrations of the window pane and record the audio of people talking behind the window.

Once the preserve of surveillance agencies, such devices can today be cobbled together in your garage using components retrieved from commodity electronics devices.

The technique used by the laser microphone is based on measuring vibrations caused by the sound waves you want to record. Which suggests that if you can find other ways of tracking those vibrations, you should similarly be able to retrieve the audio. Which is exactly what the MIT CSAIL researchers did: by analysing video footage of objects that vibrated in sympathy (albeit minutely) with sounds in their environment, they were able to generate a recovered audio signal.

As the video shows, in the case of capturing a background musical track, whilst the recovered audio was not necessarily of the highest fidelity, by feeding it into another application – such as Shazam, an application capable of recognising music tracks – the researchers were at least able to identify the track automatically.

So not only can we create videos from still photographs, as described in Hyper-reality Offline – Creating Videos from Photos, we can also recover audio from otherwise silent videos.

Interlude – Cleaning Audio Tracks With Audacity

Noise cancelling headphones remove background noise by comparing a desired signal to a perceived signal and removing the unwanted components. So for noisy situations where we don’t have access to the clean signal, are we stuck with just the noisy signal?

Not necessarily.

Audio editing tools like Audacity can also be used to remove constant background noise from an audio track by building a simple model of the noise component and then removing it from the audio track.

The following tutorial shows how a low level of background noise may be attenuated by generating a model of the baseline noise on a supposedly quiet part of an audio track and then removing it from the whole of the track. (The effect referred to as Noise Removal in the following video has been renamed Noise Reduction in more recent versions of Audacity.)
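Audacity’s Noise Reduction effect is rather more sophisticated than this (it applies gating and smoothing across time and frequency), but a minimal spectral subtraction sketch along the same general lines might look as follows (the function names and the strength parameter are illustrative):

```python
import numpy as np
from scipy.signal import stft, istft

def reduce_noise(signal, sample_rate, noise_sample, strength=1.0):
    """Rough spectral subtraction: estimate the noise magnitude from a
    'quiet' section of the recording, then subtract that profile from the
    magnitude spectrum of every frame in the full track."""
    _, _, noise_spec = stft(noise_sample, fs=sample_rate)
    noise_profile = np.mean(np.abs(noise_spec), axis=1, keepdims=True)

    _, _, spec = stft(signal, fs=sample_rate)
    magnitude, phase = np.abs(spec), np.angle(spec)
    cleaned = np.maximum(magnitude - strength * noise_profile, 0.0)
    _, restored = istft(cleaned * np.exp(1j * phase), fs=sample_rate)
    return restored
```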

SAQ: As the speaker records his test audio track, we see Audacity visualising the waveform in real time. To what extent might we consider this a form of augmented reality?

Other filters can be used to remove noise components with a different frequency profile such as the “pops” and “clicks” you might hear on a recording made from a vinyl record.
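Audacity’s own Click Removal effect works differently under the hood, but a crude illustration of the idea of repairing short, impulsive defects is sketched below (the threshold and kernel size are arbitrary choices):

```python
import numpy as np
from scipy.signal import medfilt

def remove_clicks(signal, kernel=5, threshold=4.0):
    """Crude click/pop repair: samples that jump far away from a local
    median are treated as impulsive clicks and replaced by that median."""
    smoothed = medfilt(signal, kernel_size=kernel)
    deviation = np.abs(signal - smoothed)
    clicks = deviation > threshold * (np.median(deviation) + 1e-12)
    repaired = signal.copy()
    repaired[clicks] = smoothed[clicks]
    return repaired
```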

In each of the above examples, Audacity provides a visual representation of the audio waveform, creating a visual reality from an audio one. This reinforces, through visualisation, what the original problems were with the audio signals and what the consequences are of applying a particular audio effect when trying to clean them up.

DO: if you have a noisy audio file to hand and fancy trying to clean it up, why not try out the techniques shown in the videos above – or see if you can find any more related tutorials.

Noise Cancellation – An Example of Mediated Audio Reality?

Whilst it is tempting to focus on the realtime processing of visual imagery when considering augmented reality, notwithstanding the tricky problem of inserting a transparent display between the viewer and the physical scene when using magic lens approaches, it may be that the real benefits of augmented reality will arise from the augmentation or realtime manipulation of another modality such as sound.

EXERCISE: describe two or three examples of how audio may be used, or transformed, to alter a user’s perception or understanding of their current environment.

ANSWER: Car navigation systems augment spatial location with audio messages describing when to turn, and audio guides in heritage settings let you listen to a story that “augments” a particular location. Noise cancelling earphones transform the environment by subtracting, or tuning out, background noise, and modern digital hearing aids process the audio environment at a personal level in increasingly rich ways.

Noise Cancellation

As briefly described in Blurred Edges – Dual Reality, mediated reality is a general term for systems in which information may be added to or subtracted from a real world scene. In many industrial and everyday settings, intrusive environmental noise may lead to an unpleasant work environment, or act as an obstacle to audio communication. In such situations, it might be convenient to remove the background noise and expose the subjects within the environment to a mediated audio reality.

Noise cancellation provides one such form of mediated reality, in which the audio environment is actively “cleaned” of an otherwise intrusive noise component. Noise cancellation technology can be used to cancel out intrusive noise in noisy environments, such as cars or aircraft. By removing noisy components from the real world audio, noise cancellation may be thought of as producing a form of diminished reality, in the sense that environmental components have been removed, rather than added, even though the overall salient signal to noise ratio may have increased.
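In idealised terms, the “cleaning” signal is simply an inverted copy of the noise picked up by an external microphone. The sketch below illustrates that principle only; it ignores the adaptive filtering and latency compensation a real noise cancelling system needs, and the function names are purely illustrative:

```python
import numpy as np

def anti_noise(ambient_mic, gain=1.0):
    """Idealised active noise cancellation: invert the noise picked up by an
    external microphone so that, when played through the earpiece, it sums
    with the real noise arriving at the ear and (ideally) cancels it out."""
    return -gain * np.asarray(ambient_mic, dtype=float)

def earpiece_output(desired_signal, ambient_mic):
    """What the earpiece plays: the wanted audio plus the anti-phase noise."""
    return desired_signal + anti_noise(ambient_mic)

# At the ear, the listener hears roughly:
#   earpiece_output(music, noise) + noise  ≈  music
```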

Noise cancelled environments might also be considered as a form of hyper-reality, in the sense that no information other than that contained within, or derived from, the original signal is presented as part of the “augmented” experience.

EXERCISE: watch the following videos that demonstrate the effect of noise cancelling headphones and that describe how they work, then answer the following questions:

  • how does “active” noise cancellation differ from passive noise cancellation?
  • what sorts of noise are active noise cancellation systems most effective at removing, and why?
  • what sort of system can be used to test or demonstrate the effectiveness of noise cancelling headphones?

Finally, write down an algorithm that describes, in simple terms, the steps involved in a simple noise cancelling system.

EXERCISE: Increasingly, top end cars may include some sort of noise cancellation system to reduce the effects of road noise. How might noise cancellation be used, or modified, to cancel noise in an enclosed environment where headphones are not typically worn, such as when sat inside a car?

Rather than presenting the mixed audio signal to a listener via headphones, under some circumstances speakers may be used to cancel the noise as experienced within a more open environment.

As well as improving the experience of someone listening to music in a noisy environment, noise cancellation techniques can also be useful as part of a hearing aid for hard of hearing users. One of the major aims of hearing aid manufacturers is to improve the audibility of speech – can noise cancellation help here?

EXERCISE: read the articles – and watch/listen to the associated videos – Noise Reduction Systems and Reverb Reduction produced by hearing aid manufacturer Sonic. What sorts of audio reality mediation are described?

It may seem strange to you to think of hearing aids as augmented, or more generally, mediated, reality devices, but their realtime processing and representation of the user’s current environment suggests this is exactly what they are!

In the next post on this theme, we will explore what sorts of physical device or apparatus can be used to mediate audio realities. But for now, let’s go back to the visual domain…

The Art of Sound – Algorithmic Foley Artists?

As well as being a visual medium, films also rely on a rich audio environment to communicate emotion and affect (sic). In some cases, it may not be possible to capture the sound associated with a particular action, either because of noise in the environment (literally), or because the props themselves do not have the physical properties of the thing they portray. For example, two wooden swords used in a sword fight that are painted to look like metal would not sound like metal swords when coming into contact with each other. When a film is dubbed, and the original speech recording replaced by a post-production recording, any original sound effects also need to be replaced.

Foley artists add sounds to a film in post-production (that is, after the film has been shot). As foley artist John Roesch describes it, “whatever we see on that screen, we are making the most honest representation thereof, sonically” (“Where the Sounds From the World’s Favorite Movies Are Born“, Wired, 0m42s).

One of the aims of the foley artist is to represent the sounds that the viewer expects to hear when watching a particular scene. As Roesch says of his approach, “when I look at a scene, I hear the sounds in my head” (0m48s). So can a visual analysis of the scene be used to identify material interactions and then automatically generate sounds corresponding to our expectations of what those interactions should sound like?

This question was recently asked by a group of MIT researchers (Owens, Andrew, Phillip Isola, Josh McDermott, Antonio Torralba, Edward H. Adelson, and William T. Freeman. “Visually Indicated Sounds.” arXiv preprint arXiv:1512.08512 [PDF] (2015)) and summarised in the MIT News article “Artificial intelligence produces realistic sounds that fool humans“.

“On many occasions, … sounds are not just statistically associated with the content of the images – the way, for example, that the sounds of unseen seagulls are associated with a view of a beach – but instead are directly caused by the physical interaction being depicted: you see what is making the sound. We call these events visually indicated sounds, and we propose the task of predicting sound from videos as a way to study physical interactions within a visual scene. To accurately predict a video’s held-out soundtrack, an algorithm has to know about the physical properties of what it is seeing and the actions that are being performed. This task implicitly requires material recognition…”

In their study, the team trained an algorithm using thousands of videos of a drumstick interacting with a wide variety of material objects, in an attempt to associate particular sounds with different materials, as well as with the mode of interaction (hitting, scraping, and so on).

The next step was then to show the algorithm a silent video, and see if it could generate an appropriate soundtrack, in effect acting as a synthetic foley artist (Visually-Indicated Sounds, MITCSAIL).

 

SAQ: to what extent do you think foley artists like John Roesch might be replaced by algorithms?

Answer: whilst the MIT demo is an interesting one, it is currently limited to a known object – the drumstick – interacting with an arbitrary object. The video showed how even then, the algorithm occasionally misinterpreted the sort of interaction being demonstrated (e.g. mistaking a hit). For a complete system, the algorithm would have to identify both materials involved in the interaction, as well as the sort of interaction, and synthesize an appropriate sound. If the same sort of training method was used for this more general sort of system, I think it would be unlikely that a large enough corpus of training videos could be created (material X interacts with material Y in interaction Z) to provide a reliable training set. In addition, as foley artist John Roesch pointed out, “what you see is not necessarily what you get” (1m31s)…!

