Understanding tools for designing interactive sounds
Web Bonus: Interested in digging deeper? Links to interactive learning materials, and a special exclusive offer for Electronic Musician readers, follow the main text.
WE ALL know that videogames are a huge business, which brings new opportunities
for composers and sound designers. Maybe you’re a complete newcomer to sound
for games, or perhaps you are already creating audio for traditional, linear media
like film or television; but if you’re considering taking the leap into game sound,
first you need to understand the process and tools. In this feature, we’ll break
down middleware—how it works and what options are available—but let’s start
with the notion of designing audio for an interactive medium.
Imagine a linear medium, such as a film: If a character goes into the dark,
spooky castle and opens the door at 13 minutes and 22 seconds, you can easily
create or obtain a sound of a creaking door and place it on the timeline in your
DAW at that exact point. Once you synchronize it with the film, it will always play
at the same time.
Now imagine that each time you watch
this film, the character goes into the house at
a different time. This is a prime example of
the unpredictability of the game environment.
How could you successfully create sounds
for a medium when you don’t know when a
particular action is going to happen? You need
to move away from using time as a basis for
organizing sounds and concentrate on the
actions themselves. Let’s think of our spooky
house situation from the perspective of the
action: At some point, the character is going to
come up to the house and open the door. Let’s
list it as an action, like this:
Action #001 Spooky Door Opening > Play ‘spookydoor.wav’
Now we’ve defined our action. How do we
trigger it? In a movie, we may be out of luck,
but fortunately in the game, something—most
likely animation code—is going to cause the
door to move. Hook up the code that triggers
the door animation with the sound of a creaky
door and voila! Instant sync—whenever the
character opens the door, the sound will play.
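In code terms, this action-to-sound hookup amounts to a simple lookup. Here is a minimal Python sketch of the idea; the names and the play_sound() stand-in are hypothetical, not any real engine’s API:

```python
# Illustrative sketch: map named game actions to sound files, then let the
# game code trigger sounds by action rather than by timeline position.

SOUND_ACTIONS = {
    "spooky_door_opening": "spookydoor.wav",
    "footstep_stone": "footstep01.wav",
}

triggered = []  # stand-in for an audio engine's play queue


def play_sound(filename):
    """Stand-in for the engine call that actually plays a file."""
    triggered.append(filename)


def on_action(action_id):
    """Called by game code whenever a named action occurs."""
    sound = SOUND_ACTIONS.get(action_id)
    if sound is not None:
        play_sound(sound)


# Whenever the door-opening animation fires, the sound fires with it:
on_action("spooky_door_opening")
```

Because the trigger is the action itself, sync is automatic no matter when the player reaches the door.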
This shift in thinking requires that each
sound in a game exists as a unique element.
We can’t use a mix of sound effects and
music anymore, except in certain situations.
Everything has to be mixed and mastered
separately. Furthermore, we have to be really
organized with all of these audio files (huge
AAA adventure games can contain hundreds
of thousands of files!) so that the programmer
knows what to do with these assets in the game.
It also means that the way the audio is
triggered in a game is intimately tied up with
the way the game is designed, and each game
is a complete universe unto itself: It has its
own rules and conventions governing emergent behavior and interactivity, and any change in game design can significantly affect the way the sounds are triggered.
[Figure: Middleware, such as Tazman Audio's Fabric, is a bridge between the game sound designer and the programmer.]
Meet the Middleman: Middleware
A typical game engine has an awful lot of things
to do. In essence, it has to run an entire virtual
world, complete with animations, shading
and rendering, visual effects, and of course,
a lot of audio. It must coordinate the overall
interactive logic of a game. It needs to know
both the location of rich media assets (including
music, sound effects, and voiceover files) as
well as when (and when not) to call them up.
Even in a small game, this can add up to a large
amount of data that needs to be coordinated.
In a large game the scope can extend into tens
or hundreds of thousands of assets. Game
engines are software packages that make games
possible, and it takes a talented and dedicated
group of designers to program the engine to do
all this work efficiently.
Back in the day, a sound designer made
noises and delivered those noises to a
programmer in some file format or another,
and that programmer put those files in the
game in some way or another. Since this was a
one-to-one relationship, it was usually a pretty
efficient system.
As games got more complex and people
were often far apart, it became standard
practice to make those noises and deliver them
to a programmer, maybe over the Internet,
along with a text document with file names
and instructions about where these sounds
were supposed to go and what they were
supposed to do. (This process is still in use
today.) But over time, software has emerged to handle an increasing amount of the heavy lifting among the game engine, the programmer, and the sound designer. This software is called middleware.
Audio middleware is a type of engine that
works along with the core game engine and
sits, in a sense, in between the sound designer
and the programmer. Its main job is to allow
the designer, who may not be a programmer, to
have more control over how, when, and where
his or her sounds are triggered in a game.
Let’s look at a car-racing game. A car has a
motor and when the car goes faster, the sound
of the motor changes in different ways—RPMs
increase, gears shift, and the engine sound
changes pitch and intensity. Let’s say the
designer makes three sounds: car sound slow, car
sound medium, and car sound fast. In the past,
the programmer or integrator would take these
three sounds and write lines of code that would
trigger them, in real time, in the game as the car
got faster. The programmer would need to take
the time to program the sounds, adjust them,
and make sure they worked correctly during
gameplay, and while they were doing that, they
could not be doing other programming tasks.
Now let’s look at how this same car sound
could be developed using audio middleware.
The biggest difference would be that this
whole task can be accomplished by you,
the mighty sound designer! Most audio
middleware packages provide a graphical user
interface, which looks similar to Pro Tools or
other DAW applications; the sound designer
would work in this interface and deliver a
finished sound setup to the programmer. The
audio designer and the programmer only have
to know and agree on certain parameters,
called hooks, that are used in the game.
Let’s get back to our three files: car fast,
medium, and slow. The sound designer will
create these sounds and import them into the
middleware program of choice. Once they
are in the program, the designer will attach
necessary elements such as crossfades or
map game parameters to the audio. In this
example, it is quite common that the game
engine will be keeping track of the speed of
the car, probably in miles/km per hour, and
the programmer will create a hook called
car_speed inside their code and give this hook to the audio designer. As the car speeds up, this
parameter value will increase.
First, however, we need to create what is often referred to as an event. Remember when we talked about focusing on the action itself rather than the time at which it takes place? Middleware extends that concept. Think of an event as
a “container” for some kind of action or state
in the game; an event can be tied to anything
in the game, and it can consist of multiple
parameters driven by the game’s logic.
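As a rough mental model, an event-as-container might look like the following Python sketch. The class and method names are invented for illustration and do not correspond to any middleware’s actual API:

```python
# Illustrative model of a middleware "event": a container bundling the
# sounds attached to a game action with the game-driven parameters
# that shape how those sounds play.

class AudioEvent:
    def __init__(self, name):
        self.name = name
        self.sounds = []        # audio files attached to this event
        self.parameters = {}    # game-driven values, e.g. {"car_speed": 0.0}

    def set_parameter(self, key, value):
        """Called by game code as the hooked value changes."""
        self.parameters[key] = value


# The car-engine example from the text, modeled as one event:
engine = AudioEvent("car_engine")
engine.sounds += ["car_slow.wav", "car_medium.wav", "car_fast.wav"]
engine.set_parameter("car_speed", 0.0)
```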
Now that we have the hook of car speed,
we can create a custom parameter in our
middleware editor or GUI that is mapped
to the different engine sounds we have. The
value of the parameter will then control the
timing and crossfades as well as the selection
and balance of our low-, medium-, and high-speed
car engine sounds. It may even control
the speed/pitch of the sounds. It is then a
relatively simple matter to export sound files,
along with a text or configuration file, and
deliver to the programmer. All the programmer
has to do is tie the value of the parameter to
the game’s event using lines of code, and voilà!
The car’s visual speed will now match its
perceived audio intensity.
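To make this concrete, here is a hedged Python sketch of how a car_speed value might drive the crossfade between the three engine loops. The breakpoint speeds (0, 60, 120) are invented for illustration; in real middleware you would draw these curves in the GUI rather than code them:

```python
# Illustrative sketch: a game-driven parameter (car_speed) mapped to
# crossfade gains for three engine-sound layers.

def engine_mix(car_speed, lo=0.0, mid=60.0, hi=120.0):
    """Return (slow, medium, fast) gains in 0..1 for a given speed."""
    def ramp(x, start, end):
        # Linear 0-to-1 ramp between start and end, clamped at both ends.
        if end == start:
            return 1.0 if x >= end else 0.0
        t = (x - start) / (end - start)
        return max(0.0, min(1.0, t))

    fade_up_mid = ramp(car_speed, lo, mid)    # medium layer fades in lo..mid
    fade_up_fast = ramp(car_speed, mid, hi)   # fast layer fades in mid..hi
    slow = 1.0 - fade_up_mid
    medium = fade_up_mid * (1.0 - fade_up_fast)
    fast = fade_up_fast
    return slow, medium, fast
```

At a standstill only the slow loop plays; halfway to the mid breakpoint the slow and medium loops blend equally; past the high breakpoint only the fast loop remains, and the gains always sum to one so the engine never drops out of the mix.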
This is a very generalized example that
should give you an overview of how the
middleware process works. The details
surrounding the building of complex audio
events tied to game states and parameters
can get quite involved. The main point to
take away here, however, is that middleware
creates an all-round win-win situation: Sound
designers have more control over their work,
confident that the material they deliver is
actually what they will hear in the game, and
the programmer doesn’t have to spend as
much time on the audio in the game.
Audio Middleware Structure
Most middleware consists of a programmed codebase that is split into two major sections or areas. The most important to us is the GUI. This section, usually a separate application, is designed for the audio designer to import, configure, and test audio. Its main purpose is to allow the non-code-savvy composer or sound designer to become a sound implementer.
Once audio configuration and testing are
complete, the implementer can then build one or
more sound banks with configuration information
that the programmer can use to integrate the
audio system with the rest of the game code.
Commonly these banks and files can then be
placed inside the game by the implementers
in order to test them out within the game
environment. In some cases, the game itself can be
hooked to the middleware in real time.
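The deliverable from this build step is essentially the banks plus machine-readable configuration. Here is an illustrative Python sketch of the kind of manifest a bank build might emit for the programmer; the schema is invented, not any real middleware’s format:

```python
# Illustrative sketch: a sound-bank manifest tying event names, agreed-upon
# parameter hooks, and audio layers together for the programmer.
import json

bank = {
    "bank_name": "vehicle_sounds",
    "events": [
        {
            "event": "car_engine",
            "parameter": "car_speed",  # the hook agreed on with the programmer
            "layers": ["car_slow.wav", "car_medium.wav", "car_fast.wav"],
        }
    ],
}

manifest = json.dumps(bank, indent=2)
```

The programmer’s integration code then only needs to read the manifest and feed the named parameter, rather than hand-coding each sound’s behavior.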
The Other Side of the Fence: The API
Although the audio middleware system is
constructed around a codebase, the game
programmer will rarely deal with it at that
level. It’s much more common to use an API,
or Application Programming Interface. This is
a set of instructions and script-based standards
that allows access to the codebase without
having to delve too deeply into the main code,
which in some cases may not be accessible—in
other words, it may be what’s called closed
source. By contrast, web technologies such as HTML5, CSS, and jQuery are referred to as open source, which means their code is public.
A middleware company, such as an audio
tool development company, usually releases
its API to the public so that other software
developers can design products that are
powered by its service. In some cases, the
company lets people use it for free—for
example, for students and individual users.
[Figure: Firelight FMOD Studio offers sample-accurate audio triggering and a mixing system that allows buses, sends, and returns.]
Middleware Tools
Let’s take a brief look at
the commercial middleware products available
today. These are used mainly by small to midsize
independent game developers.
• Firelight FMOD Studio Firelight
Technologies introduced FMOD in 2002 as
a cross-platform audio runtime library for
playing back sound for video games. Since its
inception, FMOD branched into a low-level
audio engine with an abstracted Event API,
and a designer tool that set the standard for
middleware editing/configuration reminiscent
of DAWs. Firelight has since continued its
innovation, releasing a brand-new audio
engine in 2013 called FMOD Studio, which
offers significant improvements over the older
FMOD Ex engine, such as sample-accurate
audio triggering, better file management, and
an advanced audio mixing system that allows
buses, sends, and returns.
Within the FMOD toolset, a sound
designer/implementer can define the basic
3D/2D parameters for a sound or event, in
addition to effectively mocking up complex
parametric relationships between different
sounds using intuitive crossfading and the
ability to draw in automation curves and use
effects and third-party plugins to change
the audio. Music can be configured within
FMOD Studio using tempo-based markers and timed triggers. Due to its flexible licensing
pricing structure, FMOD is now a solid and
widely adopted audio middleware choice and
continues to be a major player in today’s game
development environment.
[Figure: Audiokinetic Wwise's Event System has become standard for many development houses worldwide.]
• Audiokinetic Wwise Introduced in 2006,
the Wwise (Wave Works Interactive Sound
Engine) toolset provides access to features
of its engine from within a staggeringly
comprehensive content-management UI. Its
abstracted and modular Event system, which
allows extremely complex and intricate results
from simple operations that can be nested within
each other, has become a standard for many
development houses worldwide. Wwise can
configure events via typical audio parameters,
with finely controlled randomization of volume,
pitch, surround placement, and effects, as well as
logic, switch/state changes, attenuation profiles,
and more. Its Interactive Music Engine enables
the generation of unique and unpredictable
soundtracks from a small amount of existing
music material.
Profiling is also a strong feature in Wwise; its ability to mock up every aspect of the engine’s behavior brings the toolset close to a full prototype simulation outside of the game engine.
Recently Audiokinetic announced a Mac-compatible
public beta version of the Wwise
Editor, plus upcoming MIDI integration of
Wwise with support for downloadable sound
banks of virtual instruments. Look for more
developments on this front in the fall.
• Tazman Audio Fabric The Fabric toolset
from Tazman Audio is another example of the
changing dynamics of the game audio market.
The Unity3D game engine has become a prominent force; although Unity is built around the FMOD engine and API, it offers very few features that let sound designers obtain middleware-like functionality without having to learn code in the process. Fabric was
created to address this situation at a very high
and sophisticated level.
[Figure: Dark Tonic's Master Audio is an example of Unity-based middleware.]
With an event-based system for triggering sounds, plus randomized and sequenced
backgrounds, parameter-based game mixing,
and a modular-based effect building system,
Fabric is becoming more and more a tool of
choice for developers desiring a more high-level
integrated tool inside Unity. Recently,
however, Tazman Audio signaled its intent to release versions of the engine for a number of other game engines, which should be available sometime in 2014.
• Miles Sound System The Miles Sound
System is one of the most popular pieces of
middleware. It has been licensed for more
than 5,000 games on 14 platforms. Miles is
a sophisticated, robust, and fully featured
sound system that has been around for quite
some time—John Miles first released MSS in
1991 in the early days of PC gaming. Today,
Miles features a toolset that integrates high-level
sound authoring with 2D and 3D digital
audio, featuring streaming, environmental
reverb, multichannel mixing, and highly
optimized audio decoders. Along with all the
usual encoding and decoding options, Miles
uses Bink audio compression for its sound
banks. Bink Audio is close to MP3 and Ogg in
compression size, but is said to use around 30
percent less CPU than those formats.
Other Unity-Based Middleware
Along with Fabric, a slew of audio middleware toolsets for Unity has become increasingly available. Though
perhaps not as sophisticated in certain ways as
the larger middleware options, these toolsets
provide a lot of variety and flexibility in
helping non-programmer types to implement
interactive audio behavior in games that don't
require a lot of adaptability.
Chief among these is Master Audio from
Dark Tonic Software. Master Audio features
easily configurable drag-and-drop creation of
sound events with instancing, probabilities, and
randomized pitch and volume. Preset objects
can be set up to trigger sounds on a variety of
basic game conditions, like collision, trigger
messages, etc. Notably, it also features grouping
of sound events into buses (similar to Groups
on a traditional DAW mixer window), as well
as ducking of buses or sounds via other sounds
or buses. This is convenient to create balanced
mixes when lots of loud explosions happen, for
example. The playlist section is also very versatile.
Multiple playlists are supported and accurate
crossfading between tracks of the same length is
easily configured using the new Syncro feature.
Master Audio’s documentation is relatively clear and concise as well. Other tools with roughly equivalent features include the Clockstone Audio Toolkit, SectrAudio, and SoundManager Pro.
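The bus-ducking behavior described for Master Audio can be sketched in a few lines of Python. This is a simplified illustration only; the class, the names, and the 0.3 duck amount are invented, not Master Audio’s API:

```python
# Illustrative sketch of bus ducking: while a priority bus (e.g. explosions)
# has voices playing, another bus (e.g. music) is attenuated.

class Bus:
    def __init__(self, name, volume=1.0):
        self.name = name
        self.volume = volume
        self.active_voices = 0   # sounds currently playing on this bus


def ducked_volume(target, trigger, duck_to=0.3):
    """Return target's effective volume, ducked while trigger is playing."""
    if trigger.active_voices > 0:
        return target.volume * duck_to
    return target.volume


music = Bus("music", volume=0.8)
sfx = Bus("explosions")

sfx.active_voices = 2                # two explosions playing
quiet = ducked_volume(music, sfx)    # music attenuated while they play
```

In a real tool the duck amount and the attack/release of the gain change are set in the GUI; the point is simply that one bus’s activity drives another bus’s gain.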
This new field of Unity-based middleware
is also producing innovations that some of the
bigger-name developers don't even have yet.
Sectr Audio provides volumetric and spline-based
audio sources. (That's an audio source confined to a non-spherical space or curve.)
Generative audio is provided by the new
G-Audio toolset from Gregzo, as well as the
more retro Sound Generator from DarkArts
Studios, and much more. Do you want a plugin
in your game that will bitcrush any audio
source that is playing? You can get it here, as
well. Look for this field to be expanding. It's an
incredibly fertile place at the moment with lots
of different approaches.
Last but certainly not least, Unity announced updated audio mixing features for the upcoming Version 5 at the 2014 Game Developers Conference. Featuring in-game mixing, busing, ducking, and automation of mix parameters, it’s sure to be a welcome addition for game audio folks working in Unity when it arrives (hopefully by fall 2014).
Take Control
As we have seen here, middleware engines give today’s sound designer an amazing degree of control over the way audio behaves and develops within a game environment. The first audio middleware programs were developed in-house and were proprietary to the companies that built them, for specific uses within specific titles. Over the years, third-party
companies have come along to provide audio
engines and code bases for all sorts of platforms
that did not have access to them before. Along
the way, the sophistication and power of these
tools has significantly increased. Nowadays,
middleware puts a lot of this power directly into
our hands, so we can make sure that the sound we
are hearing in the game is exactly what’s intended.
Probably the best news for all of you
budding game audio experts is that most of the
software discussed here can be downloaded
for free, which makes exploring and bringing
yourself up to speed on the latest middleware
programs easy. Make no mistake about it,
understanding audio middleware programs is
a skill that all game audio professionals should
have readily available in their bag of tricks.
This feature was excerpted from The
Essential Guide to Game Audio by Steve
Horowitz and Scott R. Looney, ©2014 Taylor &
Francis Group. All Rights Reserved.
Interactive
learning materials, including the free level outlined in this feature and a companion iOS
app, are available at gameaudioinstitute.com.
BONUS FOR ELECTRONIC MUSICIAN READERS!
For one month only, receive a 25% discount on anything on the site.
Offer is good through September 31, and good for single use only.
Use promo code 'gai-em-mag-25'
Steve Horowitz composed the soundtrack
to the Academy Award-nominated film
Super Size Me, has worked on hundreds of
game titles, and engineered the Grammy-winning
True Life Blues: The Songs of Bill
Monroe. Scott Looney pioneered interactive
online audio courses for the Academy of Art
University, and has taught at Ex’pression
College, Cogswell College, and Pyramind
Training. He is currently researching
procedural and generative sound applications in games, and mastering the art of code.