Archive for the 'Audio' Category

Photoshopping Audio…

By now, we’re all familiar with the idea that images can be manipulated – “photoshopped” – to modify a depicted scene in some way (see, for example, Even if the Camera Never Lies, the Retouched Photo Might…). Vocal representations can be modified using audio processing techniques such as pitch-shifting, and traditional audio editing techniques such as cutting and splicing can be used to edit audio files and create “spoken” sentences that have never been uttered before by reordering separately cut words.

But what if we could identify both the words spoken by an actor, and model their voice, so that we could edit out their mistakes, or literally put our own words in their mouths, by changing a written text that is then used to generate the soundtrack?

In late 2016, Adobe demonstrated a new technique for editing audio that does exactly that. An audio track is used to generate both a speech generating model and a speech-to-text transcript. This allows the text to be edited, not just by rearranging the order of the originally spoken words, but also by inserting new words.

Not surprisingly, the technique could raise concern about the “evidential” quality of recorded speech.

EXERCISE: Read the contemporaneous report of the Adobe VoCo demonstration from the BBC News website “Adobe Voco ‘Photoshop-for-voice’ causes concern”. What concerns are raised in the report? What other concerns, if any, do you think this sort of technology raises?

The technique was reported in more detail in a SIGGRAPH 2017 paper – Zeyu Jin, Gautham J. Mysore, Stephen DiVerdi, Jingwan Lu, and Adam Finkelstein. VoCo: Text-based Insertion and Replacement in Audio Narration. ACM Transactions on Graphics 36(4): 96, 13 pages, July 2017 – which describes the technique as follows:

Editing audio narration using conventional software typically involves many painstaking low-level manipulations. Some state of the art systems allow the editor to work in a text transcript of the narration, and perform select, cut, copy and paste operations directly in the transcript; these operations are then automatically applied to the waveform in a straightforward manner. However, an obvious gap in the text-based interface is the ability to type new words not appearing in the transcript, for example inserting a new word for emphasis or replacing a misspoken word. While high-quality voice synthesizers exist today, the challenge is to synthesize the new word in a voice that matches the rest of the narration. This paper presents a system that can synthesize a new word or short phrase such that it blends seamlessly in the context of the existing narration. Our approach is to use a text to speech synthesizer to say the word in a generic voice, and then use voice conversion to convert it into a voice that matches the narration. Offering a range of degrees of control to the editor, our interface supports fully automatic synthesis, selection among a candidate set of alternative pronunciations, fine control over edit placements and pitch profiles, and even guidance by the editor’s own voice. The paper presents studies showing that the output of our method is preferred over baseline methods and often indistinguishable from the original voice.
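Once each transcript word has been aligned with the stretch of audio in which it is spoken, the select, cut, copy and paste operations mentioned in the abstract really are fairly straightforward. The following minimal Python sketch assumes such a word-level alignment already exists (the `Word` record is a hypothetical format for illustration, not VoCo's) and shows how deleting and reordering words in the text translates into splicing samples in the waveform. It covers only this editing step; synthesising a brand new word in the speaker's voice, the problem the paper actually tackles, is far beyond a few lines of code.

```python
from dataclasses import dataclass

@dataclass
class Word:
    """One transcript word and the span of samples where it is spoken.
    (A hypothetical alignment format, for illustration only.)"""
    text: str
    start: int  # index of the first sample (inclusive)
    end: int    # index one past the last sample (exclusive)

def render(edited_words, samples):
    """Rebuild a waveform by splicing together each word's samples in transcript order."""
    out = []
    for word in edited_words:
        out.extend(samples[word.start:word.end])
    return out

# A toy "waveform" in which each word occupies three samples, labelled 1, 2, 3
samples = [1, 1, 1, 2, 2, 2, 3, 3, 3]
words = [Word("the", 0, 3), Word("cat", 3, 6), Word("sat", 6, 9)]

# Edit the transcript: delete "the" and swap the order of the remaining words
edited = [words[2], words[1]]
print(" ".join(w.text for w in edited))  # sat cat
print(render(edited, samples))           # [3, 3, 3, 2, 2, 2]
```

A practical editor would also smooth each join, for example with a short cross-fade, to avoid audible clicks at the splice points.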

Voice Capture and Modelling

A key part of the Adobe VoCo approach is the creation of a voice model that can be used to generate utterances that sound like the spoken words of the person whose voice has been modelled, a technique we might think of in terms of “voice capture and modelling”. As the algorithms improve, the technique is likely to become more widely available, as suggested by other companies developing demonstrations in this area.

For example, start-up company Lyrebird have already demonstrated a service that will model a human voice from one minute’s worth of voice capture, and then allow you to create arbitrary utterances, spoken in that voice, from text.

Read more about Lyrebird in the Scientific American article New AI Tech Can Mimic Any Voice by Bahar Gholipour.

Lip Synching Video – When Did You Say That?

The ability to use captured voice models to generate narrated tracks works fine for radio, but what if you wanted to actually see the actor “speak” those words? By generating a facial model of a speaker, it is possible to use a video representation of an individual as a puppet whose facial movements are acted out by someone else, a technique described as facial re-enactment (Thies, Justus, Michael Zollhöfer, Matthias Nießner, Levi Valgaerts, Marc Stamminger, and Christian Theobalt. “Real-time expression transfer for facial reenactment”, ACM Trans. Graph. 34, no. 6 (2015): 183).

Facial re-enactment involves morphing features or areas from one face onto corresponding elements of another, and then driving a view of the second face from motion capture of the first.

But what if we could generate a model of the face that allowed facial gestures, such as lip movements, to be captured at the same time as an audio track, and then use the audio (and lip capture) from one recording to “lipsync” the same actor speaking those same words in another setting?

The technique is described in Suwajanakorn, Supasorn, Steven M. Seitz, and Ira Kemelmacher-Shlizerman. “Synthesizing Obama: learning lip sync from audio,” ACM Transactions on Graphics (TOG) 36.4 (2017): 95. In outline, audio and sparse mouth shape features from one video are associated using a neural network. The sparse mouth shape is then used to synthesize a texture for the mouth and lower region of the face that can be blended onto a second, stock video of the same person, with the jaw shapes aligned.

For now, the approach is limited to transposing the words spoken in one video recording of a person onto a second video of the same person. As one of the researchers, Steven Seitz, is quoted in Lip-syncing Obama: New tools turn audio clips into realistic video: “[y]ou can’t just take anyone’s voice and turn it into an Obama video. We very consciously decided against going down the path of putting other people’s words into someone’s mouth. We’re simply taking real words that someone spoke and turning them into realistic video of that individual.”


Smart Hearing

As we have already seen, there are several enabling technologies that need to be in place in order to put together an effective mediated reality system. In a visual augmented reality system, these include some sort of device for recording the visual scene, a way of tracking objects within it, a means of rendering augmented features in the scene, and some means of displaying the scene to the user. We reviewed a range of approaches for rendering augmented visual scenes in the post Taxonomies for Describing Mixed and Alternate Reality Systems, but how might we go about implementing an audio based mediated reality?

In Noise Cancellation – An Example of Mediated Audio Reality?, we saw how headphone based systems could be used to present a processed audio signal to a subject directly – a proximal form of mediation akin to a head mounted display – or how a speaker could be used to provide a more environmental form of mediation, more akin to a projection based system in the visual sense.

Whilst enabling technologies for video based proximal AR systems are still, at best, at the clunky prototype stage, discreet solutions for realtime, everyday audio based mediation already exist in the form of hearing aids, complemented in recent years by advanced digital signal processing techniques.

The following promotional video shows how far hearing aids have developed in recent years, moving from simple amplifiers to complex devices that combine digital signal processing of the audio environment with connections to other audio generating devices, such as phones, radios and televisions.

To manage the range of features offered by such devices, they are complemented by full featured remote control apps that allow the user to control what they hear, as well as how they hear it – audio hyper-reality:

The following video review of the Here “Active Listening” earbuds further demonstrates how “audio wearables” can provide a range of audio effects – and capabilities – that can augment the hearing of a wearer who does not necessarily suffer from hearing loss or some other form of hearing impairment. (If you’d rather read a review of the same device, the Vice Motherboard blog has one – These Earbuds Are Like Instagram Filters for Live Music.)

SAQ: What practical challenges face designers of in-ear, wirelessly connected audio devices?
Answer: I can think of two immediately: how is the wireless signal received (what sort of antenna is required?) and how is the device powered?

Customised frequency response profiles are also supported in some mobile phones. For example, top-end Samsung Android phones include a feature known as Adapt Sound that allows a user to calibrate their phone’s headphone output based on a frequency based hearing test (video example).

Hearing aids typically comprise several elements: an earpiece that transmits sound into the ear; a microphone that receives the sound; an amplifier that amplifies the sound; and a battery pack that powers the device. Digital hearing aids may also include remote control circuitry to allow the hearing aid to be controlled remotely; circuitry to support digital signal processing of the received sound; and even a wireless receiver capable of receiving and then replaying sound files or streams from a mobile phone or computer.

Digital hearing aids can be configured to tune the frequency response of the device to suit the needs of each individual user as the following video demonstrates.
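To give a feel for what tuning the frequency response involves, here is a minimal Python sketch, assuming a user's fitting can be expressed as a list of frequency bands with a gain in decibels for each (the band values below are invented for illustration, not taken from any real fitting). It transforms a short signal into the frequency domain, scales each band, and transforms back. Real hearing aids do this continuously on short blocks of audio and do much more besides, so treat this as an illustration of the principle only.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (fine for short illustrative signals)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse of dft(), keeping only the real part of the result."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def apply_band_gains(x, rate, bands):
    """Boost or cut frequency bands; bands is a list of (low_hz, high_hz, gain_db)."""
    n = len(x)
    X = dft(x)
    for k in range(n):
        # Bins above n/2 mirror the negative frequencies, so fold them back
        freq = min(k, n - k) * rate / n
        for low, high, gain_db in bands:
            if low <= freq < high:
                X[k] *= 10 ** (gain_db / 20)  # convert dB to an amplitude factor
    return idft(X)

rate, n = 8000, 64
# Equal-amplitude 500 Hz and 2000 Hz tones (these fall exactly on DFT bins 4 and 16)
x = [math.sin(2 * math.pi * 500 * t / rate) + math.sin(2 * math.pi * 2000 * t / rate)
     for t in range(n)]
# Hypothetical fitting: +6 dB for everything above 1 kHz
y = apply_band_gains(x, rate, [(1000, rate / 2 + 1, 6.0)])
Y = dft(y)
print(round(abs(Y[4]) * 2 / n, 2), round(abs(Y[16]) * 2 / n, 2))  # 1.0 2.0
```

A 6 dB boost roughly doubles the amplitude of the higher tone while leaving the lower one untouched, which is the kind of adjustment a hearing test might suggest for high-frequency loss.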

Hearing aids come in a range of form factors – NHS Direct describes the following:

  • Behind-the-ear (BTE): rests behind the ear and sends sound into the ear through either an earmould or a small, soft tip (called an open fitting)
  • In-the-ear (ITE): sits in the ear canal and the shell of the ear
  • In-the-canal (ITC): working parts in the earmould, so the whole hearing aid fits inside the ear canal
  • Completely-in-the-canal (CIC): fits further into your ear canal than an ITC aid

Age UK further identify two forms of spectacle hearing aid systems – bone conduction devices and air conduction devices – that are suited to different forms of hearing loss:

With a conductive hearing loss there is some physical obstruction to conducting the sound through the outer ear, eardrum or middle ear (such as a wax blockage, or perforated eardrum). This can mean that the inner ear or nerve centre on that ear is in good shape, and by sending sound straight through the bone behind a patient’s ear the hearing loss can effectively be bypassed. Bone Conduction or “BC” spectacle hearing aids are ideal for this because a transducer is mounted in the arm of the glasses behind the ear that will transmit the sound through the bone to the inner ear instead of along the ear canal.

Sensorineural hearing loss occurs when the anatomical site responsible for the deficiency is the inner ear or further along the auditory pathway (such as age related loss or noise induced hearing loss). Delivering the sound via a route other than the ear canal will not help in these cases, so Air Conduction “AC” spectacle hearing aids are utilised with a traditional form of hearing aid discreetly mounted in the arm of the glasses and either an earmould or receiver with a soft dome in the ear canal.

The following video shows how the frames of digital hearing glasses can be used to package the components required to implement a hearing aid.

And the following promotional video shows in a little more detail how the glasses are put together – and how they are used in everyday life (with a full range of digital features included!).

EXERCISE: Read the following article from The Atlantic – “What My Hearing Aid Taught Me About the Future of Wearables”. What does the author think wearable devices need to offer to make the user want to wear them? How does the author’s experience of wearing a hearing aid colour his view of how wearable devices might develop in the near future?

Many people wear spectacles and/or hearing aids as part of their everyday life, “boosting” the perception of reality around them in particular ways in order to compensate for less than perfect eyesight or hearing. Advances in hearing aids suggest that many hearing aid users may already be benefiting from reality augmentations that people without hearing difficulties may also value. And whilst wearing spectacles to correct for poor vision is commonplace, it is possible to wear eyewear without a corrective function as a fashion item or accessory. Devices such as hearing spectacles already provide a means of combining battery powered, wirelessly connected audio with “passive” visual enhancements (corrective lenses). So might we start to see those sorts of device evolving into augmented reality headwear?

Can You Really Believe Your Ears?

In Even if the Camera Never Lies, the Retouched Photo Might… we saw how photos could be retouched to provide an improved version of a visual reality, and in the interlude activity on Cleaning Audio Tracks With Audacity we saw how a simple audio processing tool could be used to clean up a slightly noisy audio track. In this post, we’ll see how particular audio signals can be modified in real time, if we have access to them individually.

Audio tracks recorded for music, film, television or radio are typically multi-track affairs, with each audio source having its own microphone and its own recording track. This allows each track to be processed separately, and then mixed with the other tracks to produce the final audio track. In a similar way, many graphic designs, as well as traditional animations, are constructed of multiple independent, overlaid layers.

Conceptually, the challenge of augmented reality may be likened to adding an additional layer to the visual or audio scene. In order to achieve an augmented reality effect, we might need to separate out a “flat” source, such as a mixed audio track or a video image, into separate layers, one for each item of interest. The layer(s) corresponding to the item(s) of interest may then be augmented through the addition of an overlay layer onto each as required.

One way of thinking about visual augmented reality is to consider it in terms of inserting objects into the visual field, for example adding an animated monster into a scene, overlaying objects in some way, such as re-coloring or re-texturing them, or transforming them, for example by changing their shape.

EXERCISE: How might you modify an audio / sound based perceptual environment in each of these ways?

ANSWER: Inserted – add a new sound into the audio track, perhaps trying to locate it spatially in the stereo field. Overlaid – if you think of this in terms of texture, this might be like adding echo or reverb to a sound, although this is actually more like changing how we perceive the space the sound is located in. Transformed might be something like pitch-shifting the voice in real time, to make it sound higher pitched, or deeper. I’m not sure if things like noise cancellation would count as a “negative insertion” or a “cancelling overlay”?!
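Two of these modifications are easy to sketch if we treat sound as lists of samples. The following Python sketch, with illustrative function names of our own rather than any particular tool's, shows insertion (placing a new mono sound in the stereo field using a constant-power pan law) and overlay (adding an echo), with a simple mixer to combine tracks. Real-time pitch-shifting, the transformation case, takes considerably more code and is not attempted here.

```python
import math

def pan(mono, position):
    """Insert: place a mono sound in the stereo field with a constant-power pan law.
    position runs from -1.0 (hard left) through 0.0 (centre) to +1.0 (hard right)."""
    angle = (position + 1) * math.pi / 4  # map -1..1 onto 0..pi/2
    left_gain, right_gain = math.cos(angle), math.sin(angle)
    return [s * left_gain for s in mono], [s * right_gain for s in mono]

def echo(signal, delay, decay):
    """Overlay: add a single delayed, attenuated copy of the signal to itself."""
    out = list(signal) + [0.0] * delay
    for i, s in enumerate(signal):
        out[i + delay] += s * decay
    return out

def mix(*tracks):
    """Sum tracks sample by sample, padding shorter tracks with silence."""
    length = max(len(t) for t in tracks)
    return [sum(t[i] for t in tracks if i < len(t)) for i in range(length)]

# Insert a click, panned hard right, into a quiet stereo "scene"
scene_left, scene_right = [0.1] * 4, [0.1] * 4
click_left, click_right = pan([1.0, 0.0, 0.0, 0.0], 1.0)
left, right = mix(scene_left, click_left), mix(scene_right, click_right)

# Overlay a half-strength echo two samples after the original sound
print(echo([1.0, 0.0, 0.0], delay=2, decay=0.5))  # [1.0, 0.0, 0.5, 0.0, 0.0]
```

Because the left and right gains satisfy cos² + sin² = 1, a sound keeps roughly the same loudness wherever it is placed across the stereo field.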

When audio sources are recorded using separate tracks, adding additional effects to them becomes a simple matter. It also provides us with an opportunity to “improve” the appearance of the audio track just as we might “improve” a photograph by retouching it.

Consider, for example, the problem of a singer who can’t sing in tune (much like the model with a bad complexion that needs “fixing” to meet the demands of a fashion magazine…). Can we fix that?

Indeed we can – part of the toolbox in any recording studio will be something that can correct for pitch and help retune an out-of-tune vocal performance.

For an interesting read on Auto-Tune, an industry standard pitch correction tool, see The Mathematical Genius of Auto-Tune.
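Pitch correction broadly involves two steps: estimating the pitch actually being sung, then shifting it towards the nearest “correct” note. Here is a minimal Python sketch of the estimation and note-snapping steps, assuming a clean, single-voice signal. The detector uses a squared-difference measure closely related to autocorrelation (similar in spirit to the published YIN method); it is not Auto-Tune's proprietary algorithm, and the final pitch shift itself is not implemented. The function names and the `threshold` value are illustrative choices.

```python
import math

def estimate_pitch(x, rate, fmin=80.0, fmax=1000.0, threshold=0.1):
    """Estimate the fundamental frequency of x from its repetition period.

    For each candidate lag (period in samples) we measure how different the
    signal is from a copy of itself delayed by that lag; the first deep dip
    in that difference marks one period."""
    power = sum(s * s for s in x) / len(x)

    def diff(lag):
        n = len(x) - lag
        return sum((x[i] - x[i + lag]) ** 2 for i in range(n)) / n

    lag, max_lag = int(rate / fmax), int(rate / fmin)
    # Step forward until the delayed copy closely matches the original...
    while lag < max_lag and diff(lag) > threshold * 2 * power:
        lag += 1
    # ...then slide down to the bottom of that dip
    while lag < max_lag and diff(lag + 1) < diff(lag):
        lag += 1
    return rate / lag

def nearest_note(freq, a4=440.0):
    """Snap a frequency to the nearest note of the equal-tempered scale."""
    semitones = round(12 * math.log2(freq / a4))
    return a4 * 2 ** (semitones / 12)

rate = 44100
# A slightly sharp "A": 450 Hz rather than 440 Hz
sung = [math.sin(2 * math.pi * 450 * t / rate) for t in range(1024)]
estimate = estimate_pitch(sung, rate)
target = nearest_note(estimate)
print(estimate, target)  # 450.0 440.0
# A pitch corrector would now shift the voice down by this ratio (about 39 cents)
ratio = target / estimate
```

Searching for the first dip, rather than the deepest one, avoids mistaking two or three periods for one, which would otherwise report the note an octave or more too low.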

But vocal performances can also be transformed in other ways, with an actor providing a voice performance that can then be transformed so that it sounds like a different person. For example, the MorphBox Voice Changer application allows you to create voice profiles that transform your voice into a range of other voice types.

Not surprisingly, as the computational power of smartphones increases, this sort of effect has made its way into novelty app form. Once again, it seems as if augmented reality novelty items are starting to appear all around us, even if we don’t necessarily think of them as such at first.

DO: if you have a smart-phone, see if you can find a voice modifying application for it. What features does it offer? To what extent might you class it as an augmented reality application, and why?

Diminished Audio Reality – Removing a Vocal from a Musical Jingle

In the post Noise Cancellation – An Example of Mediated Audio Reality? we saw how background or intrusive environmental noise could be removed using noise cancelling headphones. In this post, you’ll learn a simple trick for diminishing an audio reality by removing a vocal track from a musical jingle.

Noise cancellation may be thought of as adding an inverted copy of everything that is not the desired signal component to an audio feed in order to remove the unwanted noise component. The same idea can be used as the basis of a crude attempt to remove a mono vocal signal from a stereo audio track, by creating our own inverse of the vocal track and then adding it to the original mix so that the two cancel out.

SAQ: Describe an algorithm corresponding to the first part of the method suggested in the How to Remove Vocals from a Song Using Audacity video for removing a vocal track from a stereo music track. How does the algorithm compare to the algorithm you described for the noise cancelling system?

SAQ: The technique described in the video relies on the track having a mono vocal signal and a stereo backing track. The simple technique also lost some of the bass when the vocals were removed. How was the algorithm modified to try to preserve the bass component? How does the modification preserve the bass component?
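For comparison with your own answers, one common way of implementing centre-channel vocal removal, plus a bass-preserving variation, is sketched below in Python. Treat it as an illustration that may differ in detail from the steps shown in the video; the function names and the filter coefficient `alpha` are our own illustrative choices.

```python
def remove_centre(left, right):
    """Invert the right channel and add it to the left (i.e. left - right):
    anything identical in both channels, such as a centre-panned mono vocal, cancels."""
    return [l - r for l, r in zip(left, right)]

def low_pass(x, alpha):
    """A one-pole low-pass filter; a smaller alpha lets through less treble."""
    y, prev = [], 0.0
    for s in x:
        prev += alpha * (s - prev)
        y.append(prev)
    return y

def remove_vocal_keep_bass(left, right, alpha=0.05):
    """Cancel the centre, then add back a low-passed mono mix to restore the bass
    (which is often panned centre too). Low-frequency vocal content also returns."""
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    return [d + b for d, b in zip(remove_centre(left, right), low_pass(mono, alpha))]

vocal = [0.3, -0.2, 0.4, -0.1]    # mono vocal: identical in both channels
guitar = [0.1, 0.1, -0.1, -0.1]   # panned hard left
left = [v + g for v, g in zip(vocal, guitar)]
right = list(vocal)
print([round(s, 6) for s in remove_centre(left, right)])  # [0.1, 0.1, -0.1, -0.1]
```

The vocal vanishes and only the off-centre guitar survives, but so would any centred bass line; the second function trades a little vocal leakage in the low frequencies for keeping that bass.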

Recovering Audio from Video – But Not How You Might Expect…

In The Art of Sound – Algorithmic Foley Artists?, we saw how researchers from MIT’s CSAIL Lab were able to train a system to recreate the sound of a silently videoed object being hit by a drumstick, using a model based on video+sound recordings of lots of different sorts of objects being hit by a drumstick. In this post, we’ll see another way of recovering audio information from a purely visual recording of a scene, also developed at CSAIL.

Fans of Hollywood thrillers or surveillance-themed TV series may be familiar with the idea of laser microphones, in which laser light projected onto and reflected from a window can be used to track the vibrations of the window pane and record the audio of people talking behind the window.

Once the preserve of surveillance agencies, such devices can today be cobbled together in your garage using components retrieved from commodity electronics devices.

The technique used by the laser microphone is based on measuring the vibrations caused by the sound waves you want to record. This suggests that if you can find other ways of tracking the vibrations, you should similarly be able to retrieve the audio. That is exactly what the MIT CSAIL researchers did: by analysing video footage of objects that vibrated in sympathy (albeit minutely) with sounds in their environment, they were able to generate a recovered audio signal.

As the video shows, in the case of capturing a background musical track, whilst the recovered audio was not necessarily of the highest fidelity, by feeding it into another application – such as Shazam, an application capable of recognising music tracks – the researchers were at least able to identify the track automatically.

So not only can we create videos from still photographs, as described in Hyper-reality Offline – Creating Videos from Photos, we can also recover audio from otherwise silent videos.

Interlude – Cleaning Audio Tracks With Audacity

Noise cancelling headphones remove background noise by comparing a desired signal to a perceived signal and removing the unwanted components. So for noisy situations where we don’t have access to the clean signal, are we stuck with just the noisy signal?

Not necessarily.

Audio editing tools like Audacity can also be used to remove constant background noise from an audio track by building a simple model of the noise component and then removing it from the audio track.

The following tutorial shows how a low level of background noise may be attenuated by generating a model of the baseline noise on a supposedly quiet part of an audio track and then removing it from the whole of the track. (The effect referred to as Noise Removal in the following video has been renamed Noise Reduction in more recent versions of Audacity.)
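The idea of a noise profile can be sketched in a few lines. The following minimal Python illustration of spectral subtraction, a standard textbook approach and not Audacity's actual implementation (which is more sophisticated), averages the magnitude spectrum of a quiet, noise-only stretch of audio and then subtracts that average from every short frame of the track. The synthetic “hiss” and “speech” (a plain tone) are invented test signals.

```python
import cmath
import math
import random

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def frames(x, size):
    """Split x into consecutive, non-overlapping frames (any remainder is dropped)."""
    return [x[i:i + size] for i in range(0, len(x) - size + 1, size)]

def noise_profile(quiet, size):
    """Average magnitude spectrum of a stretch of audio containing only noise."""
    spectra = [dft(f) for f in frames(quiet, size)]
    return [sum(abs(s[k]) for s in spectra) / len(spectra) for k in range(size)]

def reduce_noise(x, profile, size):
    """Spectral subtraction: shrink each frequency bin's magnitude by the noise
    estimate (never below zero), keep its phase, and transform back."""
    out = []
    for f in frames(x, size):
        cleaned = [cmath.rect(max(abs(X_k) - n_k, 0.0), cmath.phase(X_k))
                   for X_k, n_k in zip(dft(f), profile)]
        out.extend(idft(cleaned))
    return out

def energy(x):
    return sum(s * s for s in x)

rng = random.Random(0)
size, rate = 32, 8000
quiet = [rng.gauss(0, 0.05) for _ in range(size * 10)]      # hiss only
tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(size * 20)]
track = quiet + [s + rng.gauss(0, 0.05) for s in tone]      # hiss plus "speech"

profile = noise_profile(track[:size * 10], size)  # model the noise from the quiet part
cleaned = reduce_noise(track, profile, size)      # ...and remove it everywhere
print(energy(cleaned[:size * 10]) < energy(track[:size * 10]))  # True
```

Subtracting a fixed average like this tends to leave behind faint, randomly fluctuating tones (often called “musical noise”), which is one reason practical tools apply the reduction more gently and smooth it over time.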

SAQ: As the speaker records his test audio track, we see Audacity visualising the waveform in real time. To what extent might we consider this a form of augmented reality?

Other filters can be used to remove noise components with a different frequency profile such as the “pops” and “clicks” you might hear on a recording made from a vinyl record.
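As a rough illustration of how clicks might be tackled, the sketch below (which is not Audacity's Click Removal algorithm) treats a click as a short spike that stands out sharply from its immediate neighbours, and replaces it with the local median. The `width` and `threshold` parameters are arbitrary illustrative values.

```python
import statistics

def remove_clicks(x, width=5, threshold=0.3):
    """Replace isolated spikes with the median of their neighbourhood.

    A sample counts as part of a click if it differs from the median of the
    surrounding `width` samples by more than `threshold`. A median is used
    because, unlike an average, it is barely affected by a single spike."""
    half = width // 2
    out = list(x)
    for i in range(len(x)):
        window = x[max(0, i - half): i + half + 1]
        med = statistics.median(window)
        if abs(x[i] - med) > threshold:
            out[i] = med
    return out

# A smooth ramp with a single "click" at index 4
ramp = [0.0, 0.1, 0.2, 0.3, 1.5, 0.5, 0.6, 0.7]
print(remove_clicks(ramp))  # [0.0, 0.1, 0.2, 0.3, 0.5, 0.5, 0.6, 0.7]
```

Only the spike is altered; the smooth parts of the signal, where each sample sits close to its local median, pass through unchanged.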

In each of the above examples, Audacity provides a visual representation of the audio waveform, creating a visual reality from an audio one. This visualisation reinforces what the original problems with the audio signals were, and the consequences of applying a particular audio effect when trying to clean them.

DO: if you have a noisy audio file to hand and fancy trying to clean it up, why not try out the techniques shown in the videos above – or see if you can find any more related tutorials.

Noise Cancellation – An Example of Mediated Audio Reality?

Whilst it is tempting to focus on the realtime processing of visual imagery when considering augmented reality, notwithstanding the tricky problem of inserting a transparent display between the viewer and the physical scene when using magic lens approaches, it may be that the real benefits of augmented reality will arise from the augmentation or realtime manipulation of another modality such as sound.

EXERCISE: describe two or three examples of how audio may be used, or transformed, to alter a user’s perception or understanding of their current environment.

ANSWER: car navigation systems augment spatial location with audio messages describing when to turn, and audio guides in heritage settings let you listen to a story that “augments” a particular location. Noise cancelling earphones transform the environment by subtracting, or tuning out, background noise, and modern digital hearing aids process the audio environment at a personal level in increasingly rich ways.

Noise Cancellation

As briefly described in Blurred Edges – Dual Reality, mediated reality is a general term in which information may be added to or subtracted from a real world scene. In many industrial and everyday settings, intrusive environmental noise may lead to an unpleasant work environment, or act as an obstacle to audio communication. In such situations, it might be convenient to remove the background noise and expose the subjects within it to a mediated audio reality.

Noise cancellation provides one such form of mediated reality, where the audio environment is actively “cleaned” of an otherwise intrusive noise component. Noise cancellation technology can be used to cancel out intrusive noise in noisy environments, such as cars or aircraft. By removing noisy components from the real world audio, noise cancellation may be thought of as producing a form of diminished reality, in the sense that environmental components have been lost, rather than added to, even though the overall salient signal to noise ratio may have increased.

Noise cancelled environments might also be considered as a form of hyper-reality, in the sense that no information other than that contained within, or derived from, the original signal is presented as part of the “augmented” experience.

EXERCISE: watch the following videos that demonstrate the effect of noise cancelling headphones and that describe how they work, then answer the following questions:

  • how does “active” noise cancellation differ from passive noise cancellation?
  • what sorts of noise are active noise cancellation systems most effective at removing, and why?
  • what sort of system can be used to test or demonstrate the effectiveness of noise cancelling headphones?

Finally, write down an algorithm that describes, in simple terms, the steps involved in a simple noise cancelling system.
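For comparison with your own algorithm, here is one possible answer sketched in Python under simplifying assumptions: a reference microphone hears the noise before it reaches the ear, and an adaptive filter learns the (hypothetical) path between the two using the least-mean-squares (LMS) algorithm, generating inverted “anti-noise” from its prediction. Real headphones typically use a variant known as filtered-x LMS, which also accounts for the path from the speaker to the ear; that detail is left out here, as is any music playback.

```python
import math
import random

def lms_cancel(reference, at_ear, taps=4, mu=0.05):
    """Adaptive noise cancellation using the least-mean-squares (LMS) algorithm.

    reference: noise picked up by an outward-facing microphone
    at_ear:    the same noise as it actually arrives at the ear
    Each step, a short filter predicts the noise at the ear from recent reference
    samples; the inverted prediction (the anti-noise) is added to what the ear
    hears, and the filter weights are nudged to shrink whatever is left over."""
    w = [0.0] * taps            # filter weights, learned on the fly
    history = [0.0] * taps      # most recent reference samples, newest first
    residual = []
    for x, d in zip(reference, at_ear):
        history = [x] + history[:-1]
        prediction = sum(wi * hi for wi, hi in zip(w, history))
        error = d - prediction  # noise + anti-noise: what the listener hears
        w = [wi + mu * error * hi for wi, hi in zip(w, history)]
        residual.append(error)
    return residual, w

def power(x):
    return sum(s * s for s in x) / len(x)

rng = random.Random(1)
steps = 5000
noise = [rng.gauss(0, 1) for _ in range(steps)]
# Hypothetical path from outside to the ear (a gain plus a one-sample delay), plus
# a little uncorrelated hiss that the reference microphone cannot observe
at_ear = [0.6 * noise[t] + 0.3 * (noise[t - 1] if t > 0 else 0.0) + rng.gauss(0, 0.01)
          for t in range(steps)]
residual, w = lms_cancel(noise, at_ear)
reduction_db = 10 * math.log10(power(at_ear[-1000:]) / power(residual[-1000:]))
print([round(wi, 2) for wi in w], round(reduction_db))
```

The learned weights should end up close to the hidden path (0.6, 0.3, 0, 0), and the noise reduction, expressed in decibels, is one way of quantifying the improved signal to noise ratio mentioned earlier. The uncorrelated hiss sets a floor: noise the reference microphone never hears cannot be predicted, and so cannot be cancelled.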

EXERCISE: Increasingly, top end cars may include some sort of noise cancellation system to reduce the effects of road noise. How might noise cancellation be used, or modified, to cancel noise in an enclosed environment where headphones are not typically worn, such as when sat inside a car?

Rather than presenting the mixed audio signal to a listener via headphones, under some circumstances speakers may be used to cancel the noise as experienced within a more open environment.

As well as improving the experience of someone listening to music in a noisy environment, noise cancellation techniques can also be useful as part of a hearing aid for hard of hearing users. One of the major aims of hearing aid manufacturers is to improve the audibility of speech – can noise cancellation help here?

EXERCISE: read the articles – and watch/listen to the associated videos – Noise Reduction Systems and Reverb Reduction produced by hearing aid manufacturer Sonic. What sorts of audio reality mediation are described?

It may seem strange to you to think of hearing aids as augmented, or more generally, mediated, reality devices, but their realtime processing and representation of the user’s current environment suggests this is exactly what they are!

In the next post on this theme, we will explore what sorts of physical device or apparatus can be used to mediate audio realities. But for now, let’s go back to the visual domain…

The Art of Sound – Algorithmic Foley Artists?

As well as being a visual medium, films also rely on a rich audio environment to communicate emotion and affect (sic). In some cases, it may not be possible to capture the sound associated with a particular action, either because of noise in the environment (literally), or because the props themselves do not have the physical properties of the thing they portray. For example, two wooden swords used in a sword fight that are painted to look like metal would not sound like metal swords when coming into contact with each other. When a film is dubbed, and the original speech recording replaced by a post-production recording, any original sound effects also need to be replaced.

Foley artists add sounds to a film in post-production (that is, after the film has been shot). As foley artist John Roesch describes it, “whatever we see on that screen, we are making the most honest representation thereof, sonically” (“Where the Sounds From the World’s Favorite Movies Are Born”, Wired, 0m42s).

One of the aims of the foley artist is to represent the sounds that the viewer expects to hear when watching a particular scene. As Roesch says of his approach, “when I look at a scene, I hear the sounds in my head” (0m48s). So can a visual analysis of the scene be used to identify material interactions and then automatically generate sounds corresponding to our expectations of what those interactions should sound like?

This question was recently asked by a group of MIT researchers (Owens, Andrew, Phillip Isola, Josh McDermott, Antonio Torralba, Edward H. Adelson, and William T. Freeman. “Visually Indicated Sounds.” arXiv preprint arXiv:1512.08512 (2015)) and summarised in the MIT News article “Artificial intelligence produces realistic sounds that fool humans”.

“On many occasions, … sounds are not just statistically associated with the content of the images – the way, for example, that the sounds of unseen seagulls are associated with a view of a beach – but instead are directly caused by the physical interaction being depicted: you see what is making the sound. We call these events visually indicated sounds, and we propose the task of predicting sound from videos as a way to study physical interactions within a visual scene. To accurately predict a video’s held-out soundtrack, an algorithm has to know about the physical properties of what it is seeing and the actions that are being performed. This task implicitly requires material recognition…”

In their study, the team trained an algorithm using thousands of videos of a drumstick interacting with a wide variety of material objects, in an attempt to associate particular sounds with different materials, as well as with the mode of interaction (hitting, scraping, and so on).

The next step was then to show the algorithm a silent video, and see if it could generate an appropriate soundtrack, in effect acting as a synthetic foley artist (Visually-Indicated Sounds, MITCSAIL).


SAQ: to what extent do you think foley artists like John Roesch might be replaced by algorithms?

Answer: whilst the MIT demo is an interesting one, it is currently limited to a known object – the drumstick – interacting with an arbitrary object. The video showed how even then, the algorithm occasionally misinterpreted the sort of interaction being demonstrated (e.g. mistaking a hit). For a complete system, the algorithm would have to identify both materials involved in the interaction, as well as the sort of interaction, and synthesize an appropriate sound. If the same sort of training method was used for this more general sort of system, I think it would be unlikely that a large enough corpus of training videos could be created (material X interacts with material Y in interaction Z) to provide a reliable training set. In addition, as foley artist John Roesch pointed out, “what you see is not necessarily what you get” (1m31s)…!

Accessible Gaming

One of the things that a great many games have in common is that they are visually rich and actually require a keen visual sense in order to play them. In this post, I’ll briefly review the idea of accessible gaming in the sense of accessible video games, hopefully as a springboard for a series of posts that explore some of the design principles around accessible games, and maybe even a short accessible game tutorial.

So what do I mean by an accessible game? A quick survey of web sites that claim to cover accessible gaming shows that most focus on the notion of visual accessibility, or the extent to which an unsighted person or a person with poor vision will be able to engage with a game. However, creating accessible games also extends to games that are appropriate for gamers who are hard of hearing (audio cues are okay, but they should not be the sole way of communicating something important to the player); gamers who have a physical disability that makes it hard for them to use a particular input device (whether that’s a keyboard and mouse, gamepad, Wiimote controller, or whatever); and gamers who have a learning disability, or an age or trauma related cognitive impairment.

The Game Accessibility website provides the following breakdown of accessible games and the broad strategies for making them accessible:

Gaming with a visual disability: “In the early days of video gaming visually disabled gamers hardly encountered any accessibility problems. Games consisted primarily of text and therefore very accessible for assistive technologies. When the graphical capabilities in games grew, the use of text was reduced and ‘computer games’ transformed into ‘video games’, eventually making the majority of mainstream computer games completely inaccessible. The games played nowadays by gamers with a visual disability can be categorized by 1) games not specifically designed to be accessible (text-based games and video games) and 2) games specifically designed to be accessible (audio games, video games that are accessible by original design and video games made accessible by modification).” Accessible games in this category include text based games and audio games, “that consists of sound and have only auditory (so no visual) output. Audio games are not specifically “games for the blind”. But since one does not need vision to be able to play audio games, most audio games are developed by and for the blind community.”
Gaming with a hearing disability: “In the early days of video gaming, auditory disabled gamers hardly encountered any accessibility problems. Games consisted primarily of text and graphics and had very limited audio capabilities. While the audio capabilities in games grew, the use of text was reduced. … The easiest way to provide accessibility is to add so-called “closed-captions” for all auditory information. This allows deaf gamers to obtain the information and meaning of, for instance, dialog and sound effects.”
Gaming with a physical disability: “There are several games that can be played by people with a physical disability. … For gamers with a severe physical disability the number of controls might be limited to just one or two buttons. There are games specifically designed to be played with just one button. These games are often referred to as “one-switch”-games or “single-switch”-games.”
Gaming with a learning disability: “In order to get a good understanding of the needs of gamers with a learning disability, it is important to identify the many different types of learning disabilities [and] know that learning disabilities come in many degrees of severeness. … Learning disabilities include (but are not limited to): literacy difficulty (Dyslexia), Developmental Co-ordination Disorder (DCD) or Dyspraxia, handwriting difficulty (sometimes known as Dysgraphia), specific difficulty with mathematics (sometimes known as Dyscalculia), speech language and communication difficulty (Specific Language Impairment), Central Auditory Processing Disorder(CAPD), Autism or Aspergers syndrome, Attention Deficit (Hyperactivity) Disorder (ADD or ADHD) and memory difficulties. … The majority of mainstream video games are playable by gamers with learning disabilities. … Due to the limited controls one switch games are not only very accessible for gamers with limited physical abilities, but often very easy to understand and play for gamers with a learning disability.”

Generally, then, accessible games may either rely on modifications or extensions to a particular game that offer players alternative ways of engaging with it (for example, closed captions as an alternative to spoken instructions), or they may have been designed with a particular constituency or modality in mind (for example, an audio game, or a game that responds well to a one-click control). Accessible games can also be designed to suit a range of accessibility requirements at once (for example, an audio, text-based game with a simple or one-click control).

In the next post, I’ll focus on one class of games in particular – audio games.

Skillset Industry Standards – Creating Music and Sound Effects for Interactive Media Products

If the recent Digital Worlds posts on the topic of audio in computer games interested you, and you would like to learn more about working in this area, there are two sets of Skillset Standards that you may find useful: Skillset Standards relating to “Create Sound Effects For Interactive Media Products (IM27)” and Skillset Standards relating to “Create music for interactive media products (IM28)”.

Please note that the Digital Worlds blog only covered audio in the most rudimentary sense; a full treatment would require one or more dedicated uncourse blogs to do the subject justice!

Here’s a brief summary of the standards for sound effects creation:

Example job titles: Sound Effects Designer, Audio Engineer

This unit is about your ability to create sound effects that work in an interactive context.

Knowledge and Understanding
This is what you must know
a. How to interpret and follow specifications or other briefs;
b. How, and to whom, to ask questions to clarify requirements or raise issues in response to the specification or brief;
c. Principles of sound design, sound effects and acoustics;
d. How to locate sources of audio material suitable for meeting the creative brief;
e. The types of audio effects that are available and their suitability for different products and contexts;
f. Ways in which sound effects can be used to enhance the user’s experience and/or give feedback on user interactions;
g. Appropriate file formats for saving sound effects;
h. The effect of audio sampling-rates and bit-depth on file-size and data-transfer rates;
i. When and why a sound effect might be cut off prematurely, and how to minimise the risk of this adversely affecting the product;
j. The various types of data compression and their relative merits and demerits;
k. How to layer sounds to achieve a combined audio effect or to produce a complex replay of elements with logical replay rules;
l. The various techniques for synchronising sounds to moving images;
m. How to use and work within bespoke 3D geometry rules;
n. The recording, editing and post-production of dialogue.
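Item h is easy to quantify for uncompressed audio. The following is a rough sketch that assumes linear PCM data, such as the sample data stored in a WAV or AIFF file, and ignores file headers and any compression. Under those assumptions, size in bits is sample rate × bit depth × channels × duration, and dividing by eight gives bytes. The function names are hypothetical, chosen for illustration:

```python
def pcm_size_bytes(sample_rate_hz: int, bit_depth: int, channels: int, duration_s: float) -> int:
    """Approximate size in bytes of uncompressed PCM audio data (file header not included)."""
    bits = sample_rate_hz * bit_depth * channels * duration_s
    return round(bits / 8)


def pcm_data_rate_kbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Data rate in kilobits per second needed to stream the same audio uncompressed."""
    return sample_rate_hz * bit_depth * channels / 1000


if __name__ == "__main__":
    # Compare a 10-second sound effect saved at a few common settings.
    for rate, depth, chans in [(44100, 16, 2), (22050, 16, 1), (22050, 8, 1)]:
        size = pcm_size_bytes(rate, depth, chans, 10)
        kbps = pcm_data_rate_kbps(rate, depth, chans)
        print(f"{rate} Hz, {depth}-bit, {chans} ch: {size:,} bytes, {kbps:.1f} kbit/s")
```

A 10-second, 44.1 kHz, 16-bit stereo clip comes to 1,764,000 bytes. Halving the sample rate, the bit depth or the number of channels halves both the file size and the data rate.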

This is what you must be aware of
i. Project parameters and constraints including target platforms and their capabilities, especially relating to audio playback;
ii. Any other audio, such as background music, that the sound effects you create will need to fit with;
iii. The events or user interactions that will trigger sound effects in the product;
iv. How each sound effect will be used in the product (for example, whether it will play once, loop several times or indefinitely etc.);
v. Compatibility issues between mono, stereo, multi-channel and surround sound;
vi. When permission is needed to use material created by others;
vii. The limits of what you may legally do with material created by others before permission is needed;
viii. Any naming conventions, standards, guidelines or specifications that you need to follow;
ix. The requirements and expectations of other team members who will use the sound effects you create.

Performance Statements
This is what you must be able to do
1. Generate original sound effects to meet a brief or specification;
2. Systematically assess the implementation of your work in iterative versions and specify changes in effects, volume, pitch and panning;
3. Edit existing audio material to create sound effects to meet a brief or specification;
4. Save sound effects in an appropriate format for different target platforms;
5. Organise sound effects using appropriate filing and naming conventions so that they can be located easily by others;
6. Provide clear documentation and audio demonstration clips as necessary for others to incorporate your sound effects into the product;
7. Liaise with colleagues, such as designers and developers, to ensure your sound effects are appropriate and meet requirements;
8. Liaise with the relevant authority to obtain approval for your work.

Here’s a brief summary of the standards for music composition:

Example job titles: Composer, Musician, Music Writer

This unit is about your ability to compose and record music for use in interactive products. It assumes you already know how to compose music generally and now need to apply this skill in an interactive media context.

You might need to save your compositions as:
• MIDI files
• AIFF sound files
• WAV sound files
• AC3 files

Knowledge and Understanding
a. How to interpret and follow specifications or other briefs;
b. Leading the process of assessing and specifying music requirements as necessary;
c. How, and to whom, to ask questions to clarify requirements or raise issues in response to the specification or brief;
d. The different technologies used in a computer-based music studio, including samplers, sequencers, MIDI devices, ‘outboard’ recording studio hardware and mixing desks;
e. How to sample audio from legitimate sources and use sound samples in your composition;
f. How to use appropriate software to record, sequence and mix audio;
g. Different formats in which music can be output, and when it would be appropriate to use them;
h. The effect of audio sampling-rates and bit-depth on file-size and data-transfer rates;
i. How to address the challenges of scoring music for a non-linear medium with scenes of indeterminate length by employing techniques such as branching segments and music layers mixed dynamically at run-time;
j. How to articulate designs for bespoke development tools to enable auditioning of your work out of context.
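Item i mentions music layers mixed dynamically at run-time. The sketch below illustrates that idea (often called vertical layering) under some simplifying assumptions. Each layer is a pre-rendered stem of the same length, and the game exposes a single hypothetical "intensity" value between 0 and 1. The `Layer` class, the thresholds and the stem names are invented for illustration, and branching segments are not covered:

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """A hypothetical music stem that fades in as game intensity rises."""
    name: str
    fade_in_start: float  # intensity at which the layer starts to become audible
    fade_in_end: float    # intensity at which the layer reaches full volume

    def gain(self, intensity: float) -> float:
        # Linear fade between the two thresholds, silent below, full volume above.
        if intensity <= self.fade_in_start:
            return 0.0
        if intensity >= self.fade_in_end:
            return 1.0
        return (intensity - self.fade_in_start) / (self.fade_in_end - self.fade_in_start)


def mix(layers: list[Layer], stems: dict[str, list[float]], intensity: float) -> list[float]:
    """Sum equal-length sample buffers, each weighted by its layer's current gain."""
    length = len(next(iter(stems.values())))
    out = [0.0] * length
    for layer in layers:
        g = layer.gain(intensity)
        for i, sample in enumerate(stems[layer.name]):
            out[i] += g * sample
    # Hard-limit to full scale [-1, 1] so loud combinations do not wrap or distort further.
    return [max(-1.0, min(1.0, x)) for x in out]


if __name__ == "__main__":
    layers = [
        Layer("pads", -1.0, 0.0),        # always fully audible
        Layer("percussion", 0.3, 0.6),   # joins as the action builds
        Layer("brass", 0.7, 1.0),        # only at peak intensity
    ]
    stems = {"pads": [0.2, 0.2], "percussion": [0.4, -0.4], "brass": [0.9, 0.9]}
    for intensity in (0.0, 0.45, 1.0):
        gains = {layer.name: round(layer.gain(intensity), 2) for layer in layers}
        print(intensity, gains, mix(layers, stems, intensity))
```

In a real game the gains would normally be smoothed over time and applied by the audio engine or middleware rather than sample by sample in Python. The sketch only shows how one game parameter can drive the volumes of several layers at once.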

This is what you must be aware of
i. Project parameters and constraints including target platforms and their capabilities, especially relating to audio playback and data-transfer rates;
ii. How the music will be used in the product (for example, whether it will play once, loop several times or indefinitely, whether it needs to sync with specific parts of the product, etc.);
iii. How the music content will work in conjunction with sound effects and dialogue;
iv. Any requirement for the music to change in response to events or user interactions (for example by changing key or tempo, or by segueing into another piece);
v. When permission is needed to sample or use material created by others;
vi. The limits of what you may legally do with material created by others before permission is needed;
vii. The overall purpose and mood of the product and its intended user experience;
viii. How music has been used to enhance comparable products including competitor products.

Performance Statements
This is what you must be able to do
1. Compose music that is appropriate for the purpose and mood of the product;
2. Record music in an appropriate format that can be reproduced within the capabilities of the target platforms;
3. Mix and edit music in an appropriate format that can be reproduced within the capabilities of the target platforms;
4. Create music that can respond to events and user interactions as required;
5. Organise your work using appropriate filing and naming conventions so that it can be located easily by others;
6. Provide clear documentation as necessary for others to incorporate your work into the product;
7. Liaise with colleagues, such as designers and developers, to ensure your work is appropriate and meets requirements;
8. Liaise with the relevant authority to obtain approval for your work.

You might also find the following Open University course of interest: TA212 The technology of music. Here’s a summary: This joint technology/arts course starts with an introduction to music theory and notation and the technological techniques needed in a study of music technology. The principles of sound and acoustics are studied and the course relates musical terms and fundamentals to their physical equivalents. You will study the operation and characteristics of various musical instruments, how music can be represented and stored, the fundamentals of recording, manipulation and transmission of sound, MIDI, current developments and some associated legal/commercial issues.

You can find sample material from the course in the following OpenLearn Units: