Archive for the 'Audio' Category

Noise Cancellation – An Example of Mediated Audio Reality?

Whilst it is tempting to focus on the realtime processing of visual imagery when considering augmented reality (notwithstanding the tricky problem of inserting a transparent display between the viewer and the physical scene when using magic lens approaches), it may be that the real benefits of augmented reality will arise from the augmentation, or realtime manipulation, of another modality, such as sound.

EXERCISE: describe two or three examples of how audio may be used, or transformed, to alter a user’s perception or understanding of their current environment.

ANSWER: car navigation systems augment spatial location with audio messages describing when to turn; audio guides in heritage settings let you listen to a story that “augments” a particular location; noise cancelling earphones transform the environment by subtracting, or tuning out, background noise; and modern digital hearing aids process the audio environment at a personal level in increasingly rich ways.

Noise Cancellation

As briefly described in Blurred Edges – Dual Reality, mediated reality is a general term for systems in which information may be added to, or subtracted from, a real world scene. In many industrial and everyday settings, intrusive environmental noise may lead to an unpleasant work environment, or act as an obstacle to audio communication. In such situations, it might be convenient to remove the background noise and expose the subjects within it to a mediated audio reality.

Noise cancellation provides one such form of mediated reality, in which the audio environment is actively “cleaned” of an otherwise intrusive noise component. Noise cancellation technology can be used to suppress intrusive noise in environments such as cars or aircraft.

Noise cancelled environments might also be considered as a form of hyper-reality, in the sense that no information other than that contained within, or derived from, the original signal is presented as part of the “augmented” experience.

EXERCISE: watch the following videos that demonstrate the effect of noise cancelling headphones and that describe how they work, then answer the following questions:

  • how does “active” noise cancellation differ from passive noise cancellation?
  • what sorts of noise are active noise cancellation systems most effective at removing, and why?
  • what sort of system can be used to test or demonstrate the effectiveness of noise cancelling headphones?

Finally, write down an algorithm that describes, in simple terms, the steps involved in a simple noise cancelling system.
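One possible answer, sketched in Python (the signal shapes and frequencies below are purely illustrative, not taken from any of the videos): sample the ambient noise with a reference microphone, invert its phase, and mix the inverted copy in with the incoming audio so that the noise component sums to (near) zero.

```python
import math

def invert(noise_samples):
    """Phase-invert the reference noise: the 'anti-noise' signal."""
    return [-x for x in noise_samples]

# Illustrative signals: a 2 Hz "noise" tone overlaid on a quieter 5 Hz "wanted" tone
t = [i / 100 for i in range(100)]
noise = [math.sin(2 * math.pi * 2 * x) for x in t]
wanted = [0.5 * math.sin(2 * math.pi * 5 * x) for x in t]
mic = [n + w for n, w in zip(noise, wanted)]   # what the ear would otherwise hear

anti = invert(noise)                           # step 2: invert the sampled noise
output = [m + a for m, a in zip(mic, anti)]    # step 3: mix; noise cancels, wanted signal survives
```

In a real system the reference noise must be captured and inverted continuously with minimal latency, which is one reason active cancellation works best on steady, predictable noise.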

EXERCISE: Increasingly, top end cars may include some sort of noise cancellation system to reduce the effects of road noise. How might noise cancellation be used, or modified, to cancel noise in an enclosed environment where headphones are not typically worn, such as when sitting inside a car?

Rather than presenting the mixed audio signal to a listener via headphones, under some circumstances speakers may be used to cancel the noise as experienced within a more open environment.

As well as improving the experience of someone listening to music in a noisy environment, noise cancellation techniques can also be useful as part of a hearing aid for hard of hearing users. One of the major aims of hearing aid manufacturers is to improve the audibility of speech – can noise cancellation help here?

EXERCISE: read the articles – and watch/listen to the associated videos – Noise Reduction Systems and Reverb Reduction produced by hearing aid manufacturer Sonic. What sorts of audio reality mediation are described?

It may seem strange to you to think of hearing aids as augmented, or more generally, mediated, reality devices, but their realtime processing and representation of the user’s current environment suggests this is exactly what they are!

In the next post on this theme, we will explore what sorts of physical device or apparatus can be used to mediate audio realities. But for now, let’s go back to the visual domain…

The Art of Sound – Algorithmic Foley Artists?

As well as being a visual medium, films also rely on a rich audio environment to communicate emotion and affect (sic). In some cases, it may not be possible to capture the sound associated with a particular action, either because of noise in the environment (literally), or because the props themselves do not have the physical properties of the thing they portray. For example, two wooden swords used in a sword fight that are painted to look like metal would not sound like metal swords when coming into contact with each other. When a film is dubbed, and the original speech recording replaced by a post-production recording, any original sound effects also need to be replaced.

Foley artists add sounds to a film in post-production (that is, after the film has been shot). As foley artist John Roesch describes, “whatever we see on that screen, we are making the most honest representation thereof, sonically” (“Where the Sounds From the World’s Favorite Movies Are Born“, Wired, 0m42s).

One of the aims of the foley artist is to represent the sounds that the viewer expects to hear when watching a particular scene. As Roesch says of his approach, “when I look at a scene, I hear the sounds in my head” (0m48s). So can a visual analysis of the scene be used to identify material interactions and then automatically generate sounds corresponding to our expectations of what those interactions should sound like?

This question was recently asked by a group of MIT researchers (Owens, Andrew, Phillip Isola, Josh McDermott, Antonio Torralba, Edward H. Adelson, and William T. Freeman. “Visually Indicated Sounds.” arXiv preprint arXiv:1512.08512 [PDF] (2015)) and summarised in the MIT News article “Artificial intelligence produces realistic sounds that fool humans“.

“On many occasions, … sounds are not just statistically associated with the content of the images – the way, for example, that the sounds of unseen seagulls are associated with a view of a beach – but instead are directly caused by the physical interaction being depicted: you see what is making the sound. We call these events visually indicated sounds, and we propose the task of predicting sound from videos as a way to study physical interactions within a visual scene. To accurately predict a video’s held-out soundtrack, an algorithm has to know about the physical properties of what it is seeing and the actions that are being performed. This task implicitly requires material recognition…”

In their study, the team trained an algorithm using thousands of videos of a drum stick interacting with a wide variety of material objects in an attempt to associate particular sounds with different materials, as well as with the mode of interaction (hitting, scraping, and so on).

The next step was then to show the algorithm a silent video, and see if it could generate an appropriate soundtrack, in effect acting as a synthetic foley artist (Visually-Indicated Sounds, MITCSAIL).


SAQ: to what extent do you think foley artists like John Roesch might be replaced by algorithms?

Answer: whilst the MIT demo is an interesting one, it is currently limited to a known object – the drumstick – interacting with an arbitrary object. The video showed how even then, the algorithm occasionally misinterpreted the sort of interaction being demonstrated (e.g. mistaking a hit). For a complete system, the algorithm would have to identify both materials involved in the interaction, as well as the sort of interaction, and synthesize an appropriate sound. If the same sort of training method was used for this more general sort of system, I think it would be unlikely that a large enough corpus of training videos could be created (material X interacts with material Y in interaction Z) to provide a reliable training set. In addition, as foley artist John Roesch pointed out, “what you see is not necessarily what you get” (1m31s)…!

Accessible Gaming

One of the things that a great many games have in common is that they are visually rich and actually require a keen visual sense in order to play them. In this post, I’ll briefly review the idea of accessible gaming in the sense of accessible video games, hopefully as a springboard for a series of posts that explore some of the design principles around accessible games, and maybe even a short accessible game tutorial.

So what do I mean by an accessible game? A quick survey of web sites that claim to cover accessible gaming shows that they focus on the notion of visual accessibility, or the extent to which an unsighted person, or a person with poor vision, will be able to engage with a game. However, creating accessible games also extends to games that are appropriate for gamers who are hard of hearing (audio cues are okay, but they should not be the sole way of communicating something important to the player); gamers who have a physical disability that makes it hard to use a particular input device (whether that’s a keyboard and mouse, gamepad, Wiimote controller, or whatever); and gamers who have a learning disability, or an age- or trauma-related cognitive impairment.

The Game Accessibility website provides the following breakdown of accessible games and the broad strategies for making them accessible:

Gaming with a visual disability: “In the early days of video gaming visually disabled gamers hardly encountered any accessibility problems. Games consisted primarily of text and therefore very accessible for assistive technologies. When the graphical capabilities in games grew, the use of text was reduced and ‘computer games’ transformed into ‘video games’, eventually making the majority of mainstream computer games completely inaccessible. The games played nowadays by gamers with a visual disability can be categorized by 1) games not specifically designed to be accessible (text-based games and video games) and 2) games specifically designed to be accessible (audio games, video games that are accessible by original design and video games made accessible by modification).” Accessible games in this category include text based games and audio games, “that consists of sound and have only auditory (so no visual) output. Audio games are not specifically “games for the blind”. But since one does not need vision to be able to play audio games, most audio games are developed by and for the blind community.”.
Gaming with a hearing disability: “In the early days of video gaming, auditory disabled gamers hardly encountered any accessibility problems. Games consisted primarily of text and graphics and had very limited audio capabilities. While the audio capabilities in games grew, the use of text was reduced. … The easiest way to provide accessibility is to add so-called “closed-captions” for all auditory information. This allows deaf gamers to obtain the information and meaning of, for instance, dialog and sound effects.”
Gaming with a physical disability: “There are several games that can be played by people with a physical disability. … For gamers with a severe physical disability the number of controls might be limited to just one or two buttons. There are games specifically designed to be played with just one button. These games are often referred to as “one-switch”-games or “single-switch”-games.”
Gaming with a learning disability: “In order to get a good understanding of the needs of gamers with a learning disability, it is important to identify the many different types of learning disabilities [and] know that learning disabilities come in many degrees of severeness. … Learning disabilities include (but are not limited to): literacy difficulty (Dyslexia), Developmental Co-ordination Disorder (DCD) or Dyspraxia, handwriting difficulty (sometimes known as Dysgraphia), specific difficulty with mathematics (sometimes known as Dyscalculia), speech language and communication difficulty (Specific Language Impairment), Central Auditory Processing Disorder(CAPD), Autism or Aspergers syndrome, Attention Deficit (Hyperactivity) Disorder (ADD or ADHD) and memory difficulties. … The majority of mainstream video games are playable by gamers with learning disabilities. … Due to the limited controls one switch games are not only very accessible for gamers with limited physical abilities, but often very easy to understand and play for gamers with a learning disability.”

Generally then, accessible games may either rely on modifications or extensions to a particular game that offer players alternative ways of engaging with it (for example, closed captions providing an alternative to spoken word instructions), or they may have been designed with a particular constituency or modality in mind (for example, an audio game, or a game that responds well to a one-click control). It might also be that accessible games can be designed to suit a range of accessibility requirements (for example, an audio, text-based game with a simple or one-click control).

In the next post, I’ll focus on one class of games in particular – audio games.

Skillset Industry Standards – Creating Music and Sound Effects for Interactive Media Products

If the recent Digital Worlds posts on the topic of audio in computer games interested you, and you would like to learn more about working in this area, there are two sets of Skillset Standards that you may find useful: Skillset Standards relating to “Create Sound Effects For Interactive Media Products (IM27)” and Skillset Standards relating to “Create music for interactive media products (IM28)”.

Please note that the Digital Worlds blog only covered audio in the most rudimentary sense; a full treatment would require one or more dedicated uncourse blogs on the subject to do it justice!

Here’s a brief summary of the standards for sound effects creation:

Example job titles: Sound Effects Designer, Audio Engineer

This unit is about your ability to create sound effects that work in an interactive context.

Knowledge and Understanding
This is what you must know
a. How to interpret and follow specifications or other briefs;
b. How, and to whom, to ask questions to clarify requirements or raise issues in response to the specification or brief;
c. Principles of sound design, sound effects and acoustics;
d. How to locate sources of audio material suitable for meeting the creative brief;
e. The types of audio effects that are available and their suitability for different products and contexts;
f. Ways in which sound effects can be used to enhance the user’s experience and/or give feedback on user interactions;
g. Appropriate file formats for saving sound effects;
h. The effect of audio sampling-rates and bit-depth on file-size and data-transfer rates;
i. When and why a sound effect might be cut off prematurely, and how to minimise the risk of this adversely affecting the product;
j. The various types of data compression and their relative merits and demerits;
k. How to layer sounds to achieve a combined audio effect, or to produce a complex replay of elements with logical replay rules;
l. The various techniques for synchronising sounds to moving images;
m. How to use and work within bespoke 3D geometry rules;
n. The recording, editing and post-production of dialogue.

This is what you must be aware of
i. Project parameters and constraints including target platforms and their capabilities, especially relating to audio playback;
ii. Any other audio, such as background music, that the sound effects you create will need to fit with;
iii. The events or user interactions that will trigger sound effects in the product;
iv. How each sound effect will be used in the product (for example, whether it will play once, loop several times or indefinitely etc.);
v. Compatibility issues between mono, stereo, multi-channel and surround sound;
vi. When permission is needed to use material created by others;
vii. The limits of what you may legally do with material created by others before permission is needed;
viii. Any naming conventions, standards, guidelines or specifications that you need to follow;
ix. The requirements and expectations of other team members who will use the sound effects you create.

Performance Statements
This is what you must be able to do
1. Generate original sound effects to meet a brief or specification;
2. Systematically assess the implementation of your work in iterative versions, and specify changes in effects, volume, pitch and panning;
3. Edit existing audio material to create sound effects to meet a brief or specification;
4. Save sound effects in an appropriate format for different target platforms;
5. Organise sound effects using appropriate filing and naming conventions so that they can be located easily by others;
6. Provide clear documentation and audio demonstration clips as necessary for others to incorporate your sound effects into the product;
7. Liaise with colleagues, such as designers and developers, to ensure your sound effects are appropriate and meet requirements;
8. Liaise with the relevant authority to obtain approval for your work.

Here’s a brief summary of the standards for music composition:

Example job titles: Composer, Musician, Music Writer

This unit is about your ability to compose and record music for use in interactive products. It assumes you already know how to compose music generally and now need to apply this skill in an interactive media context.

You might need to save your compositions as:
• MIDI files
• AIFF sound files
• WAV sound files
• AC3 files

Knowledge and Understanding
a. How to interpret and follow specifications or other briefs;
b. Leading the process of assessing and specifying music requirements as necessary;
c. How, and to whom, to ask questions to clarify requirements or raise issues in response to the specification or brief;
d. The different technologies used in a computer-based music studio, including samplers, sequencers, MIDI devices, ‘outboard’ recording studio hardware and mixing desks;
e. How to sample audio from legitimate sources and use sound samples in your composition;
f. How to use appropriate software to record, sequence and mix audio;
g. Different formats in which music can be output, and when it would be appropriate to use them;
h. The effect of audio sampling-rates and bit-depth on file-size and data-transfer rates;
i. How to address the challenges of scoring music for a non-linear medium with scenes of indeterminate length, by employing techniques like branching segments and the use of music layers mixed dynamically at run-time;
j. How to articulate designs for bespoke development tools to enable auditioning of your work out of context.

This is what you must be aware of
i. Project parameters and constraints including target platforms and their capabilities, especially relating to audio playback and data-transfer rates;
ii. How the music will be used in the product (for example, whether it will play once, loop several times or indefinitely, whether it needs to sync with specific parts of the product, etc.);
iii. How the music content will work in conjunction with sound effects and dialogue;
iv. Any requirement for the music to change in response to events or user interactions (for example by changing key or tempo, or by segueing into another piece);
v. When permission is needed to sample or use material created by others;
vi. The limits of what you may legally do with material created by others before permission is needed;
vii. The overall purpose and mood of the product and its intended user experience;
viii. How music has been used to enhance comparable products including competitor products.

Performance Statements
This is what you must be able to do
1. Compose music that is appropriate for the purpose and mood of the product;
2. Record music in an appropriate format that can be reproduced within the capabilities of the target platforms;
3. Mix and edit music in an appropriate format that can be reproduced within the capabilities of the target platforms;
4. Create music that can respond to events and user interactions as required;
5. Organise your work using appropriate filing and naming conventions so that it can be located easily by others;
6. Provide clear documentation as necessary for others to incorporate your work into the product;
7. Liaise with colleagues, such as designers and developers, to ensure your work is appropriate and meets requirements;
8. Liaise with the relevant authority to obtain approval for your work.

You might also find the following Open University course of interest: TA212 The technology of music. Here’s a summary: This joint technology/arts course starts with an introduction to music theory and notation and the technological techniques needed in a study of music technology. The principles of sound and acoustics are studied and the course relates musical terms and fundamentals to their physical equivalents. You will study the operation and characteristics of various musical instruments, how music can be represented and stored, the fundamentals of recording, manipulation and transmission of sound, MIDI, current developments and some associated legal/commercial issues.

You can find sample material from the course in the following OpenLearn Units:

Making Game Music the MIDI Way

As well as playing back digitised recordings of ‘real’ analogue music, many games make use of MIDI recorded soundtracks to “play” the music back in real time via a music synthesiser.

Originally designed as a protocol for connecting physical keyboard-based synthesisers to each other and to rack-mounted, expander synthesiser boxes, MIDI – the Musical Instrument Digital Interface – soon came to be used more widely as a standard for recording musical sequences as a series of control signals that could be used to ‘play’ a computer musical instrument. Today, MIDI sequences can be used to control a wide variety of peripherals, including the onboard synthesiser on many computer sound cards, using a specification known as General MIDI (General MIDI 1 specification).

Following its introduction in 1991, General MIDI gained widespread support amongst manufacturers, enabling game publishers to use it to play their game soundtracks ‘live’. (More recently, in 2007, GM1 was superseded by GM2. However, as GM1 remains the de facto standard, I shall not consider GM2 here.)

So what exactly is General MIDI?

General MIDI (GM)

The General MIDI specification describes a minimum specification synthesiser that can be guaranteed to play back a General MIDI sequence. GM synthesisers are supported by many sound cards, and by music players such as Quicktime and various Windows media players.

The General MIDI Level 1 specification required, among other things:

  • a minimum of either 24 fully dynamically allocated voices available simultaneously for both melodic and percussive sounds, or 16 dynamically allocated voices for melody plus 8 for percussion;
  • support for all 16 MIDI channels, each capable of playing a variable number of voices (polyphony), and each able to play a different instrument (sound/patch/timbre);
  • a minimum of 16 simultaneous and different timbres playing various instruments;
  • a minimum of 128 preset instruments (MIDI program numbers) conforming to the GM1 Instrument Patch Map, and 47 percussion sounds conforming to the GM1 Percussion Key Map.

To see what instruments and percussive sounds are supported by a GM1 synthesiser, see the General MIDI Level 1 Sound Set.

EXERCISE: what GM1 MIDI instrument patch number corresponds to “Acoustic Guitar (steel)”? How many instruments are in each GM1 instrument family? What instruments are available in the “Brass” instrument family? What GM1 percussion key number corresponds to a “Closed Hi Hat”?

If you would like to listen to any – or even all – of the sounds supported by GM1, why not try putting together your own GM1 composition, for example using the Online Midi Sequencer.

To create a MIDI track, select one or more instruments or percussion instruments, tick the left most checkbox to turn the instrument on, and click in the sequence checkboxes to select the beats on which each instrument will play. Click on create MIDI and the embedded player will play your sequence back to you.

Experiment with creating some simple sequences using the Online MIDI Sequencer. If you come up with a sequence you are particularly proud of, or think may go well with a particular game, why not take a screenshot of the settings, upload it to your blog (or maybe to an image sharing site such as flickr) and post a link to it back as a comment here, explaining how you think the sequence could be used ;-)

If you save your composition as a MIDI file, I think it should play in Game Maker…(?!)
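To give a flavour of what a MIDI file actually contains, here is a sketch (my own illustrative code, not tied to the Online MIDI Sequencer or Game Maker) that assembles a minimal Type-0 MIDI file byte by byte using only the Python standard library. The program-change value 25 is the zero-indexed equivalent of GM1 patch number 26, Acoustic Guitar (steel):

```python
import struct

def vlq(n):
    """Encode n as a MIDI variable-length quantity."""
    out = [n & 0x7F]
    n >>= 7
    while n:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(out))

def simple_midi(notes, program=25, ticks_per_beat=480):
    """Build a minimal Type-0 MIDI file playing each note for one beat."""
    track = bytearray()
    track += vlq(0) + bytes([0xC0, program])                   # program change, channel 0
    for note in notes:
        track += vlq(0) + bytes([0x90, note, 100])             # note on, velocity 100
        track += vlq(ticks_per_beat) + bytes([0x80, note, 0])  # note off one beat later
    track += vlq(0) + bytes([0xFF, 0x2F, 0x00])                # end-of-track meta event
    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, ticks_per_beat)
    return header + b"MTrk" + struct.pack(">I", len(track)) + bytes(track)

data = simple_midi([60, 64, 67])  # middle C, E, G: a C major arpeggio
with open("arpeggio.mid", "wb") as f:
    f.write(data)
```

Opening arpeggio.mid in any GM-capable player should play a three-note C major arpeggio on a steel-string guitar voice.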

Want to know more about MIDI?

To see how MIDI can be used in a digital music studio to interconnect a variety of computers and digital music synthesisers, read the Tutorial on MIDI and Music Synthesis on the MIDI Manufacturers’ Association website.

If you are interested in learning more about the actual format of MIDI sequences, this article on The MIDI File Format provides a good introduction.

Representing Analogue Sound Files in a Digital Way

In the post Finishing the Maze – Adding Background Music, I mentioned there were two sorts of sound file that Game Maker could play: sound files (like WAV files, or compressed MP3 files) or MIDI files.

In this aside post, I just want to briefly review the principle of how analogue (continuously varying) sound recordings can be stored as digital files using material sourced in part from the OpenLearn units “Crossing the boundary – analogue universe, digital worlds” (in particular the section Crossing the boundary – Sound and music) and “Representing and manipulating data in computers” (in particular the section Representing sound).

Sound and music

Second only to vision, we rely on sound. Music delights us, noises warn us of impending danger, and communication through speech is at the centre of our human lives. We have countless reasons for wanting computers to reach out and take sounds across the boundary.

Sound is another analogue feature of the world. If you cry out, hit a piano key or drop a plate, then you set particles of air shaking – and any ears in the vicinity will interpret this tremor as sound. At first glance, the problem of capturing something as intangible as a vibration and taking it across the boundary seems even more intractable than capturing images. But we all know it can be done – so how is it done?

The best way into the problem is to consider in a little more detail what sound is. Probably the purest sound you can make is by vibrating a tuning fork. As the prongs of the fork vibrate backwards and forwards, particles of air move in sympathy with them. One way to visualise this movement is to draw a graph of how far an air particle moves backwards and forwards (we call this its displacement) as time passes. The graph (showing a typical wave form) will look like this:

An image showing the pattern of peaks and troughs in air particle displacement over time by vibrating a tuning fork
Displacement of air particles over time by vibrating a tuning fork

Our particle of air moves backwards and forwards in the direction the sound is travelling. As shown in the previous figure, a cycle represents the time between adjacent peaks (or troughs), and the number of cycles completed in a fixed time (usually a second) is known as the frequency. The amplitude of the wave (i.e. the maximum displacement of the line in the graph) determines how loud the sound is, while the frequency determines how low- or high-pitched the note sounds to us. Note, though, that the diagram is theoretical; in reality, the amplitude will decrease as the sound fades away.

A sound of high frequency is one that people hear as a high-pitched sound; a sound of low frequency is one that people hear as a low-pitched sound. Sound consists of air vibrations, and it is the rate at which the air vibrates that determines the frequency: a higher vibration rate is a higher frequency. So if the air vibrates at, say, 100 cycles per second then the frequency of the sound is said to be 100 cycles per second. The unit of 1 cycle per second is given the name ‘hertz’, abbreviated to ‘Hz’. Hence a frequency of 100 cycles per second is normally referred to as a frequency of 100 Hz.

Of course, a tuning fork is a very simple instrument, and so makes a very pure sound. Real instruments and real noises are much more complicated than this. An instrument like a clarinet would have a complex waveform, perhaps like the left-hand graph (a) below, while the dropped plate would produce a formless nightmare like the right-hand one (b).

Typical waveforms from a clarinet playing a note and a plate being dropped
Typical waveforms


EXERCISE: write down a few ideas about how we might go about transforming a waveform into numbers. This is a difficult question, so as a clue, why not look at how numbers may be used to encode images: Subsection 4.3 of the OpenLearn Unit Crossing the boundary – analogue universe, digital worlds.


In a way the answer is similar to the question on how to transform a picture into numbers (see Subsection 4.3 of the OpenLearn Unit Crossing the boundary – analogue universe, digital worlds). We have to find some way to split up the waveform. We split up images by dividing them into very small spaces (pixels). We can split a sound wave up by dividing it into very small time intervals.

What we can do is record what the sound wave is doing at small time intervals. Taking readings like this at time intervals is called sampling. The number of times per second we take a sample is called the sampling rate.
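The idea can be sketched in a few lines of Python (the 0.75 Hz test tone and amplitude of 10 are my own stand-ins for the tuning fork wave):

```python
import math

def sample(wave, rate, duration):
    """Read the waveform's amplitude `rate` times per second for `duration` seconds."""
    return [wave(i / rate) for i in range(int(duration * rate))]

# A pure tone standing in for the tuning fork: amplitude 10, frequency 0.75 Hz
tone = lambda t: 10 * math.sin(2 * math.pi * 0.75 * t)

coarse = sample(tone, 2, 3)   # sampling interval 0.5 s: just 6 readings
fine = sample(tone, 10, 3)    # sampling interval 0.1 s: 30 readings
```

The higher the sampling rate, the more closely the list of readings traces the original waveform, exactly as the graphs that follow illustrate.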

I’ll take the tuning fork example, set an interval of, say, 0.5 seconds, and look at the state of the wave every 0.5 seconds, as shown below.

An image showing the sampling rate for the tuning fork at an interval of 0.5 seconds
Sampling a sound wave

Reading off the amplitude of the wave at every sampling point (marked with dots), gives the following set of numbers:

+9.3, −3.1, −4.1, +8.2, −10.0, +4.0, +4.5

as far as I can judge. Now, if we plot a new graph of the waveform, using just these figures, we get the graph below.

The sample from the previous image shown as a graph
Image of a waveform

The plateaux at each sample point represent the intervals between samples, where we have no information, and so assume that nothing happens. It looks pretty hopeless, but we’re on the right track.

Self-Assessment Question (SAQ)

How can we improve on the blocky figure shown directly above?

The problem here is similar to one that may be encountered with a digitised bitmapped (pixelated) image. In that case we refined our spatial division of the image by making the pixel size smaller. In this case we can refine our temporal splitting up of the waveform by making the sampling interval smaller.

So, let’s decrease the sampling interval by taking a reading of the amplitude every 0.1 second.

Image showing the amplitude every 0.1 second
Sampling every 0.1 second

Once again, I’ll read the amplitude at each sampling point and plot them to a new graph, which is already starting to look a little bit more like the original waveform.

Graph of the amplitude from the previous image which looks more like the original waveform because it is more detailed
Waveform using higher sampling rate

So how often must the sound be sampled? There is a rule called the sampling theorem which says that if the frequencies in the sound range from 0 to B Hz then, for a faithful representation, the sound must be sampled at a rate greater than 2B samples per second.
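A quick Python illustration of why the rate matters (the 1 kHz tone is just an example): sampling a tone at exactly its own frequency, far below the 2B rate the theorem demands, lands on the same point of every cycle, so the samples miss the vibration entirely.

```python
import math

tone = lambda t: math.sin(2 * math.pi * 1000 * t)  # a 1 kHz tone

# Undersampling: 1000 samples per second hits the same phase of every cycle
samples = [tone(i / 1000) for i in range(5)]
# Every reading is (near) zero, so the tone is indistinguishable from silence
```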


The human ear can detect frequencies in music up to around 20 kHz (that is, 20 000 Hz). What sampling rate is needed for a faithful digital representation of music? What is the time interval between successive samples?

20 kHz is 20 000 Hz, and so the B in the text above the question is 20 000. The sampling theorem therefore says that the music must be sampled at more than 2 × 20 000 samples per second, which is more than 40 000 samples per second.

If 40 000 samples are being taken each second, they must be 1/40 000 seconds apart. This is 0.000025 seconds, which is 0.025 milliseconds (thousandths of a second) or 25 microseconds (millionths of a second).

The answer shows the demands made on a computer if music is to be faithfully represented. Samples of the music must be taken at intervals of less than 25 microseconds. And each of those samples must be stored by the computer.
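The arithmetic in that answer can be checked with a short Python sketch (the theorem strictly requires a rate greater than 2B, so 40 000 is the figure to be exceeded):

```python
def min_sampling_rate(top_frequency_hz):
    """Sampling rate (samples per second) that must be exceeded: 2 x B."""
    return 2 * top_frequency_hz

B = 20_000  # approximate upper limit of human hearing, in Hz
rate = min_sampling_rate(B)
interval_us = 1_000_000 / rate  # interval between successive samples, in microseconds

print(rate)         # 40000 samples per second (the rate to be exceeded)
print(interval_us)  # 25.0 microseconds between samples
```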

If speech is to be represented then the demands can be less stringent, first because the frequency range of the human voice is smaller than that of music (up to only about 12 kHz) and second because speech is recognisable even when its frequency range is quite severely restricted. (For example, some digital telephone systems sample at only 8000 samples per second, thereby cutting out most of the higher-frequency components of the human voice, yet we can make sense of what the speaker on the other end of the phone says, and even recognise their voice.)


Five minutes of music is sampled at 40 000 samples per second, and each sample is encoded into 16 bits (2 bytes). How big will the resulting music file be?
Five minutes of speech is sampled at 8000 samples per second, and each sample is encoded into 16 bits (2 bytes). How big will the resulting speech file be?

5 minutes = 300 seconds. So there are 300 × 40 000 samples. Each sample occupies 2 bytes, making a file size of 300 × 40 000 × 2 bytes, which is 24 000 000 bytes – some 24 megabytes!

A sampling rate of 8000 per second will generate a fifth as many samples as a rate of 40 000 per second. So the speech file will ‘only’ be 4 800 000 bytes.
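The same file-size sums can be expressed as a small Python function (a sketch assuming a single, uncompressed channel of audio, as in the question):

```python
def audio_file_size(duration_s, rate_hz, bytes_per_sample=2):
    """Uncompressed file size in bytes: duration x rate x bytes per sample."""
    return duration_s * rate_hz * bytes_per_sample

music = audio_file_size(5 * 60, 40_000)   # five minutes of music at 40 000 samples/s
speech = audio_file_size(5 * 60, 8_000)   # five minutes of speech at 8000 samples/s

print(music)   # 24000000 bytes, i.e. some 24 megabytes
print(speech)  # 4800000 bytes
```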

This process of sampling the waveform is very similar to the breaking up of a picture into pixels, except that, whereas we split the picture into tiny units of area, we are now breaking the waveform into units of time. In the case of the picture, making our pixels smaller increased the quality of the result; similarly, making the time intervals at which we sample the waveform smaller will bring our encoding closer to the original sound. And just as it is impossible to make a perfect digital coding of an analogue picture, because we will always lose information between the pixels, so we will always lose information between the times we sample a waveform. We can never make a perfect digital representation of an analogue quantity.


Now we’ve sampled the waveform, what do we need to do next to encode the audio signal?


Remember that after we had divided an image into pixels, we then mapped each pixel to a number. We need to carry out the same process in the case of the waveform.

This mapping of samples (or pixels) to numbers is known as quantisation. Again, the faithfulness of the digital copy to the analogue original will depend on how large a range of numbers we make available. (If “8-bit” sampling is used, 256 different amplitudes can be measured; that is, 2 × 2 × 2 × 2 × 2 × 2 × 2 × 2 = 256 different levels.)
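Quantisation can be sketched as mapping an amplitude onto one of 2^bits equally spaced levels. The ±10 full-scale range here is an assumption carried over from the earlier graphs:

```python
def quantise(amplitude, bits=8, full_scale=10.0):
    """Map an amplitude in [-full_scale, +full_scale] to one of 2**bits levels."""
    levels = 2 ** bits  # e.g. 256 levels for 8-bit quantisation
    # Scale the amplitude to the range 0..levels-1 and keep the whole level number
    fraction = (amplitude + full_scale) / (2 * full_scale)
    return min(int(fraction * levels), levels - 1)

print(quantise(-10.0))  # 0 (the lowest level)
print(quantise(0.0))    # 128 (mid-scale)
print(quantise(10.0))   # 255 (the highest of the 256 levels)
```

Any two amplitudes that land on the same level become indistinguishable after quantisation, which is where the second loss of information (after sampling) comes from.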

The human eye is an immensely discriminating instrument; the ear is less so. We are not generally able to detect pitch differences of less than a few hertz (1 hertz (Hz) is a frequency of one cycle per second). So sound wave samples are generally mapped to 16-bit numbers.

Copyright OpenLearn/The Open University, licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.0 Licence.

The wav encoding that Game Maker can play back is based on the above principles. wav files can be recorded at 8-bit, 16-bit or 24-bit resolution, using a sampling rate set between 8000 Hz and 48 000 Hz. If you calculate some file sizes for different length audio clips at a variety of sampling rates and quantisation levels, you will see that wav audio files can be quite big, even for short audio clips.
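To get a feel for those sizes, here is a quick sweep over a few rate/resolution combinations from the range wav supports (a sketch assuming a single mono channel and ignoring the small file header):

```python
def wav_data_size(seconds, rate_hz, bits):
    """Approximate uncompressed wav data size in bytes (mono, header ignored)."""
    return seconds * rate_hz * (bits // 8)

# A ten-second clip at a few sampling rates and quantisation levels:
for rate in (8_000, 22_050, 48_000):
    for bits in (8, 16, 24):
        print(rate, "Hz,", bits, "bit:", wav_data_size(10, rate, bits), "bytes")
```

Even at the top of the range, ten seconds runs to well over a megabyte, which is why short clips are usually the practical limit for uncompressed wav assets.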

The MP3 format uses a similar approach to digitise the sound file in the first instance, but then reduces the size of the digital file by using a compression technique to encode the file again in another way. Compression effectively squashes the file size down so that it becomes smaller, but we shall not consider that here.

Creating a Game Soundtrack: Interactive and Adaptive Audio

Just as many films now feature a soundtrack music album, typically a compilation by the prolific ‘Various Artists’ ;-), game soundtracks are already starting to be released as compilation albums, or as ‘composer’ works. For example, soundtracks for several releases of the Final Fantasy game franchise have been released as orchestral works: Music of the Final Fantasy Series. Game soundtracks also seem to be establishing themselves as a bona fide musical genre. For example, the Rhapsody download service already features Video Game Soundtracks as a subgenre of its soundtracks area.

Where the music track provides backing for an inevitable story point, perhaps as the soundtrack to a cutscene, then it can be scored much like a score for a film sequence. In the case of a cutscene, the length of the sequence is known, the action is fixed, and the lead in and lead out from the scene are known in advance.

But if the music is tied to the action, and the action is interactive, maybe even helping drive the creation of an emergent story, things are maybe a little more difficult…?

Skim read through the four-page article Defining Adaptive Music and find out how the author defines “adaptive music” (by “skim reading”, I mean: do not read every word – glance through the article looking for appropriate keywords and headings…). How does “adaptive music” compare to a more traditional musical composition?

Now look at this second page of the article Design With Music In Mind: A Guide to Adaptive Audio for Game Designers. To what extent does the design of adaptive audio resemble the design of an emergent narrative structure? What additional constraints must the designer of the adaptive audio track contend with compared to the narrative designer?

If you want to keep tabs on the world of video game music and interactive audio, and maybe find out more, the music4games website and the Interactive Audio Special Interest Group are both worth a visit.

To see a wide variety of examples of game audio, the prettyuglygamesound blog has a growing collection of game audio critiques, with embedded video examples courtesy of Youtube.

From the blog’s About page:

PrettyUglyGameSoundStudy (or PUGS) is an experiment to gather as many examples of audio in games that people consider either to be ‘good’ (or ‘pretty’) and ‘bad’ (or ‘ugly’). On one hand we wish to get a better understanding of game audio that people consider to work well in games and on the other we would like to get an overview of (typical) game audio blunders, from which the field can benefit. We hope that eventually this archive can grow out to be an inspiration (as well as the occasional good laugh) for those working in the field of game audio.

We are Sander Huiberts and Richard van Tol and we are currently doing PhD research on game audio. For the past three years we have taught a course Game Audio Design at the Utrecht School of the Arts (Netherlands), in which we gave our students an assignment similar to the idea behind (“gather 1 minute of footage of what you consider to be ‘good’ game audio and 1 minute of footage of what you consider to be ‘bad’ game audio”). We ended up with lots of interesting footage as well as discussion points. Through this website we wish to share this footage.

The project is a work in progress: “Please feel free to contribute to this website either by uploading your favourite example of good or bad game audio, or by commenting the uploaded examples of others!”

If you do join in, let us know via a comment back here ;-)


