Making Waves

Jun 1, 2009 3:58 PM, By Peter Hamlin

GET UP CLOSE AND PERSONAL WITH SOUND FILE FORMATS

Everyone who works with digital audio soon encounters a wide variety of sound-file formats: WAV, AIFF, SND, Sound Designer I and II, and MP3, to name just a few. In most cases, the different formats present few problems. Software simply opens, plays, edits, and saves the audio files, sparing you from knowing the details of exactly how each format is constructed. But how does a program know what type of data the file contains? And what exactly is in an audio file?

A digital sound file is basically a long list of numbers representing the momentary values of an analog waveform measured (sampled) at a periodic rate. A file containing just those numbers is called a raw-data sound file. Usually, however, more information must be embedded in a file for it to be read and played back properly. Besides the sampling rate, a program needs to know the resolution (the number of binary digits, or bits, used to represent each sample) and the number of channels (mono or stereo). A file can also carry optional information, such as looping information and cue points, a title, the name of the engineer or composer, a copyright notice, or other similar text.
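To make the "long list of numbers" concrete, here is a minimal Python sketch that samples one cycle of a sine wave and rounds each value to a signed 16-bit integer, using the 2,205 Hz tone and 44,100 Hz rate of the test file described below. The function name sample_sine and the full-scale amplitude of 32,767 are illustrative assumptions, not taken from any particular audio editor. Packing the numbers into bytes with nothing around them gives what the article calls a raw-data (headerless) sound file.

```python
import math
import struct

def sample_sine(freq_hz, rate_hz, num_samples, amplitude=32767):
    """Sample a sine wave and quantize each value to a signed 16-bit integer.

    amplitude=32767 (full scale for 16 bits) is an illustrative assumption.
    """
    samples = []
    for n in range(num_samples):
        t = n / rate_hz  # time of the nth sample, in seconds
        value = amplitude * math.sin(2 * math.pi * freq_hz * t)
        samples.append(int(round(value)))  # round to the nearest integer step
    return samples

# One cycle of 2,205 Hz at 44,100 Hz lasts 44100 / 2205 = 20 samples.
cycle = sample_sine(2205, 44100, 44100 // 2205)

# A raw-data file is just these numbers as bytes: here, 16-bit little-endian.
raw = struct.pack(f"<{len(cycle)}h", *cycle)
print(len(cycle), "samples,", len(raw), "bytes")  # prints: 20 samples, 40 bytes
```

Nothing in those 40 bytes says they are 16-bit, mono, or sampled at 44,100 Hz; a program reading them would have to be told, which is exactly the job of a header.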

That kind of information is included in a header, typically found at the beginning of a file. Different types of files have headers that are configured in distinct ways. (A raw-data sound file is also known as a headerless file.) For an application to read or write sound in a particular format, it must understand how the data is organized in that format.

PLAYING DIGITAL RIFFS

As an example, here's a close look at the familiar WAV sound-file format. A WAV file is a type of Resource Interchange File Format (RIFF) file, a format developed by Microsoft and IBM for multimedia files. (The familiar AVI video format is another type of RIFF file.) WAV files have been in use since Windows 3.1 and so are very widespread.

A WAV file is divided into sections, or chunks, that contain certain prescribed information. It's a more flexible arrangement than having just a single header. At the beginning of the file, a RIFF chunk defines the data as a WAV file and also reports its total length. Embedded within the RIFF chunk are two other chunks: a format chunk with information about sampling rate, resolution, number of channels, type of coding, and so on; and a data chunk, in which the actual sample values are stored.
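The chunk layout lends itself to a simple loop. The Python sketch below walks the chunks nested inside the RIFF chunk and reports each one's four-character name and size. It assumes the standard RIFF conventions: each chunk starts with a 4-byte ASCII name and a 4-byte little-endian length that does not count those 8 bytes, and a chunk with an odd length is followed by one pad byte. The function name list_chunks is hypothetical, and the code makes no attempt to recover from damaged files.

```python
import struct

def list_chunks(data: bytes):
    """Return (name, size) pairs for the chunks nested in a RIFF WAVE file."""
    riff_id, riff_size, form_type = struct.unpack_from("<4sI4s", data, 0)
    if riff_id != b"RIFF" or form_type != b"WAVE":
        raise ValueError("not a RIFF WAVE file")
    chunks = []
    offset = 12                          # nested chunks start after the 12-byte RIFF header
    end = min(len(data), 8 + riff_size)  # riff_size counts the bytes after the first 8
    while offset + 8 <= end:
        name, size = struct.unpack_from("<4sI", data, offset)
        chunks.append((name.decode("ascii"), size))
        offset += 8 + size + (size & 1)  # skip name, length, data, and any pad byte
    return chunks

# Build a tiny mono, 16-bit, 44,100 Hz file in memory with 20 silent samples.
# PCM format fields: tag, channels, rate, bytes/sec, block align, bits/sample.
fmt_body = struct.pack("<HHIIHH", 1, 1, 44100, 88200, 2, 16)
pcm = bytes(40)
body = (b"WAVE"
        + b"fmt " + struct.pack("<I", len(fmt_body)) + fmt_body
        + b"data" + struct.pack("<I", len(pcm)) + pcm)
wav = b"RIFF" + struct.pack("<I", len(body)) + body
print(list_chunks(wav))  # [('fmt ', 16), ('data', 40)]
```

Because the loop skips any chunk by its declared length, a reader built this way can step over chunks it doesn't understand, which is the flexibility the chunk design is meant to provide.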

FIG. 1: This is a binary-file view of a simple WAV file containing a single cycle of a sine wave. The color coding indicates the different chunks of data in the file. All of the numbers on the left are in hexadecimal, and each 8-bit byte of data is represented as two hexadecimal digits. On the far right, the same data is reproduced in ASCII code, so you can see any text embedded in the file. When the data is not text, you just see gibberish in that column.

To examine the format, I created a simple WAV file with my audio editor. The file contains a single cycle of a mono 2,205 Hz sine wave, synthesized at a 44,100 Hz sampling rate with 16-bit resolution. After saving the file, I displayed its contents in the standard binary file-viewing format shown in Fig. 1. (There are many binary file-viewing utilities, including one called debug that is part of DOS. For the display in Fig. 1, I used Helios Software Solutions' TextPad, which lets you view files in many formats, including binary.) Color coding has been added to distinguish the chunks.
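For readers without a hex viewer at hand, a few lines of Python produce a similar layout: an offset column, the bytes as pairs of hex digits, and an ASCII column in which non-printable bytes appear as dots. This is a sketch of the general format, not a reproduction of TextPad's exact output; the 16-bytes-per-row width and the function name hex_dump are assumptions. The sample input is the first 16 bytes of the file in Fig. 1, as spelled out in the text that follows.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as rows of offset, hex pairs, and printable ASCII."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Printable ASCII (space through tilde) is shown as-is; other bytes become dots.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        rows.append(f"{offset:08X}  {hex_part:<{width * 3 - 1}}  {text}")
    return "\n".join(rows)

print(hex_dump(bytes.fromhex("52 49 46 46 4E 00 00 00 57 41 56 45 66 6D 74 20")))
```

To dump a whole file, pass hex_dump the contents of the file read in binary mode, for example open(path, "rb").read().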

GOOD TO THE LAST BYTE

Notice in Fig. 1 that the RIFF-chunk header information occupies the first 12 bytes of the file (highlighted in pink). Each pair of hex digits represents one byte; in the table "Interpretation of Data in a WAV File," spaces separate the bytes to show how the data is organized, and the meaning of each group of bytes is listed alongside it. Note also that the bytes are displayed in hexadecimal (hex) notation. (If you're not familiar with hex, see the sidebar, "All About Numbering Systems.")

The first four bytes in the file (the hexadecimal numbers 52, 49, 46, and 46) are the ASCII codes for the characters "RIFF," which denote the format type. (ASCII codes are numbers that represent letters and other characters. See the Value column at the far right of the table.) The next four bytes (4E 00 00 00) give the number of bytes in the file that follow these first eight bytes. This four-byte integer is stored in a format called little-endian, which means that the least significant byte comes first when the number is listed byte by byte. That takes some getting used to, because the bytes appear in the opposite order from the one you'd expect. In other words, the four bytes 4E 00 00 00 signify the hexadecimal number 0x0000004E, which can be shortened to 4E or 0x4E. (The 0x prefix is often used to indicate that a number is in hexadecimal format.)
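Python's standard library can decode the length field both ways, which makes the byte-order difference easy to see. The sketch below uses int.from_bytes and the struct module, whose "<I" format code means a little-endian unsigned 32-bit integer; the variable names are only for illustration.

```python
import struct

size_field = bytes.fromhex("4E 00 00 00")  # the second group of four bytes in Fig. 1

little = int.from_bytes(size_field, "little")    # least significant byte first: correct for RIFF
big = int.from_bytes(size_field, "big")          # the same bytes misread as big-endian
(via_struct,) = struct.unpack("<I", size_field)  # "<" = little-endian, "I" = unsigned 32-bit

print(hex(little), little)  # 0x4e 78
print(hex(big), big)        # 0x4e000000 1308622848
print("total file length:", little + 8, "bytes")  # the field excludes the first 8 bytes
```

Reading the field with the wrong byte order turns a 78-byte count into more than a billion, which is why a program must know which convention a format uses.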

Interpretation of Data in a WAV File
NUMBER OF BYTES | FILE DATA (IN HEX) | INTERPRETATION | VALUE
4 | 52 49 46 46 | ASCII characters identifying the file as a RIFF file | "RIFF"
4 | 4E 00 00 00 | total size of the file minus the first 8 bytes | 0x4E (hex) or 78 (decimal)
4 | 57 41 56 45 | ASCII characters identifying the file as a WAV file | "WAVE"

(The term little-endian in computer lingo is taken from Jonathan Swift's Gulliver's Travels. At one point in the story, the Lilliputians are divided into two warring political camps: the Little-Endians, who believe you should first crack a soft-boiled egg on the little end; and the Big-Endians, who believe the opposite. The computer term big-endian, as you would guess, means that numbers are stored with the most significant byte first.)

The next four bytes (57 41 56 45) are the ASCII characters “WAVE”; they tell any application reading the file that this is WAV-audio format and not one of the other possible RIFF multimedia file types.
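Putting the three fields of the table together, a program can recognize a WAV file from its first 12 bytes alone. This Python sketch (the function name read_riff_header is hypothetical) unpacks the header and returns the form type and the declared size, leaving it to the caller to decide what to do with RIFF files of other types, such as AVI.

```python
import struct

def read_riff_header(header: bytes):
    """Return (form_type, size) from the first 12 bytes of a RIFF file."""
    if len(header) < 12:
        raise ValueError("need at least 12 bytes")
    riff_id, size, form_type = struct.unpack_from("<4sI4s", header, 0)
    if riff_id != b"RIFF":
        raise ValueError("not a RIFF file")
    return form_type.decode("ascii", errors="replace"), size

# The 12 header bytes listed in the table above.
form_type, size = read_riff_header(bytes.fromhex("52 49 46 46 4E 00 00 00 57 41 56 45"))
print(form_type, size)  # WAVE 78
if form_type != "WAVE":
    print("RIFF file, but not WAV audio")
```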

The next 24 bytes of data in Fig. 1 (shown in blue) make up the format chunk, where several of the file's important characteristics are coded. This segment begins with the bytes 66 6D 74 20. The first three are the ASCII codes for "fmt," and 20 is the code for a space, which pads the chunk's name out to a full four bytes. The next four bytes (10 00 00 00) give the length of the rest of the format chunk, that is, the number of bytes that follow this length field. That value is hex 0x10, or decimal 16; together with the 8 bytes of name and length, it accounts for the chunk's 24 bytes. The table "Format-Chunk Data" shows how the format data is arranged. As before, the second column shows the exact sequence of numbers in hex as they appear in the file.
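The format fields can be read with the same struct module. Because the "Format-Chunk Data" table isn't reproduced here, this Python sketch assumes the standard 16-byte PCM layout from the RIFF specification: format tag, channel count, sampling rate, average bytes per second, block alignment (bytes per sample frame), and bits per sample. The example bytes were built by hand for a file with the parameters described above (mono, 44,100 Hz, 16-bit PCM); they are not copied from Fig. 1, and the names FormatChunk and parse_fmt_chunk are hypothetical.

```python
import struct
from typing import NamedTuple

class FormatChunk(NamedTuple):
    format_tag: int       # type of coding; 1 means PCM
    channels: int         # 1 = mono, 2 = stereo
    sample_rate: int      # samples per second, per channel
    byte_rate: int        # average bytes per second = sample_rate * block_align
    block_align: int      # bytes per sample frame = channels * bits_per_sample / 8
    bits_per_sample: int  # resolution

def parse_fmt_chunk(chunk: bytes) -> FormatChunk:
    """Parse a format chunk, starting at its 'fmt ' name."""
    name, size = struct.unpack_from("<4sI", chunk, 0)
    if name != b"fmt " or size < 16:
        raise ValueError("not a usable PCM format chunk")
    return FormatChunk(*struct.unpack_from("<HHIIHH", chunk, 8))

fmt = bytes.fromhex(
    "66 6D 74 20"   # 'fmt '
    " 10 00 00 00"  # 16 bytes follow
    " 01 00"        # PCM
    " 01 00"        # mono
    " 44 AC 00 00"  # 44,100 Hz
    " 88 58 01 00"  # 88,200 bytes per second
    " 02 00"        # 2 bytes per sample frame
    " 10 00"        # 16 bits per sample
)
print(parse_fmt_chunk(fmt))
```

The byte rate and block alignment are redundant with the other fields, so a cautious reader can recompute them as a sanity check on the file.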

Notice that the Type of Coding is PCM (Pulse Code Modulation). PCM is a common uncompressed audio data format. Other possibilities for coding include µ-law (pronounced mu-law, designated by the number 0x0101) and a-law (0x0102). Both are companding schemes: they encode samples on a roughly logarithmic rather than linear scale, so that small sample values get finer steps than large ones, which keeps quantization noise less audible on quiet signals for a given number of bits. (Quantization noise is a rounding error that occurs when you translate analog audio into the more limited realm of digital numbers. If you use large enough digital data words, the quantization noise can be made so small that it does not cause audible problems.)
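To see why a logarithmic scale helps, the Python sketch below uses the textbook continuous µ-law curve with µ = 255 (the value used in North American and Japanese telephony) and compares it with plain linear rounding when a quiet sample is stored with about 8 bits. It illustrates the companding idea only; actual µ-law data uses a segmented, bit-level approximation of this curve, and the helper names are hypothetical.

```python
import math

MU = 255  # mu value used by telephone-standard mu-law

def compress(x: float) -> float:
    """Continuous mu-law curve: maps [-1, 1] onto [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def expand(y: float) -> float:
    """Inverse of compress()."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)

def quantize(v: float, levels: int = 127) -> float:
    """Round v in [-1, 1] to the nearest of 2 * levels + 1 evenly spaced steps (about 8 bits)."""
    return round(v * levels) / levels

quiet = 0.01  # a quiet sample, 1% of full scale
linear = quantize(quiet)
mu_law = expand(quantize(compress(quiet)))
print(f"linear 8-bit error: {abs(linear - quiet):.6f}")
print(f"mu-law 8-bit error: {abs(mu_law - quiet):.6f}")
```

The price is coarser steps near full scale, where the louder signal tends to mask the added noise.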

Another coding technique you may encounter is ADPCM (Adaptive Differential Pulse Code Modulation), which is designated by the number 0x0103. Interested programmers can find the exact formulas for those coding techniques on the Internet. (I'll deal only with uncompressed PCM data here.)
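A reader that, like this article, handles only PCM can check the format tag before touching the sample data. In the Python sketch below, the tag values are the ones given above plus 0x0001, the standard code for PCM; files from other software may use different tags for the same codings, so treat the table as illustrative. The sketch also assumes 16-bit samples, as in the example file, and the function names are hypothetical.

```python
import struct

# Format tags as given in the text, plus 0x0001 for PCM.
FORMAT_TAGS = {
    0x0001: "PCM",
    0x0101: "mu-law",
    0x0102: "a-law",
    0x0103: "ADPCM",
}

def format_name(tag: int) -> str:
    return FORMAT_TAGS.get(tag, f"unknown (0x{tag:04X})")

def decode_pcm16(tag: int, data: bytes) -> list:
    """Decode 16-bit little-endian PCM sample data; refuse other codings."""
    if tag != 0x0001:
        raise NotImplementedError(f"{format_name(tag)} data is not handled here")
    count = len(data) // 2  # two bytes per 16-bit sample
    return list(struct.unpack(f"<{count}h", data[:count * 2]))

print(decode_pcm16(0x0001, bytes.fromhex("00 00 FF 7F 00 80")))  # [0, 32767, -32768]
```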

© 2010 Penton Media, Inc.