The general approach to storing digital audio is to sample the audio voltage (which corresponds to a certain position of the membrane of a speaker) at regular intervals (e.g. 44,100 times per second for CDDA, or 48,000 or 96,000 times per second for DVD video) and to store each value with a certain resolution (e.g. 16 bits per sample in CDDA). Sample rate, resolution and number of channels (e.g. 2 for stereo) are therefore key parameters of audio file formats.
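As a rough illustration of how these three parameters determine the size of uncompressed audio, the following sketch computes the data rate of CDDA from the figures given above. The function name is made up for this example.

```python
def pcm_bytes_per_second(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bytes needed for one second of uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels // 8


# CDDA: 44,100 samples per second, 16 bits per sample, 2 channels (stereo)
cdda_rate = pcm_bytes_per_second(44_100, 16, 2)
cdda_74_min = cdda_rate * 74 * 60

print(cdda_rate)    # 176400 bytes per second
print(cdda_74_min)  # 783216000 bytes for 74 minutes
```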
Types of formats
It is important to distinguish between a file format and a codec. Even though most audio file formats support only one audio codec, a file format may support multiple codecs, as AVI does.
There are three major groups of audio file formats:
- uncompressed formats, such as WAV, AIFF and AU.
- formats with lossless compression, such as FLAC, Monkey's Audio (filename extension APE), WavPack, Shorten, TTA, Apple Lossless and lossless Windows Media Audio (WMA).
- formats with lossy compression, such as MP3, Vorbis (filename extension OGG), lossy Windows Media Audio (WMA) and AAC.
Lossy audio formats
Lossy file formats are based on psychoacoustic models that leave out sounds that humans cannot hear or can hardly hear, e.g. a quiet sound immediately following a loud one. MP3 is one such format.
As of 2002, one of the most popular audio file formats was MP3, which uses the MPEG-1 audio layer 3 codec to provide acceptable lossy compression for music files. The compression is about 10:1 compared with uncompressed WAV files (in a standard compression scheme), so a CD filled with MP3 files can store about 11 hours of music, compared with the 74 minutes of a standard CDDA disc, which uses uncompressed PCM.
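The figures above can be checked with a little arithmetic. This is a sketch under two assumptions that the text does not state: that the "standard compression scheme" is a 128 kbit/s MP3, and that the data CD holds 650 MiB.

```python
CDDA_BITRATE = 44_100 * 16 * 2       # 1,411,200 bit/s of uncompressed PCM
MP3_BITRATE = 128_000                # assumption: 128 kbit/s MP3
DATA_CD_BYTES = 650 * 1024 * 1024    # assumption: 650 MiB data CD

ratio = CDDA_BITRATE / MP3_BITRATE
hours = DATA_CD_BYTES * 8 / MP3_BITRATE / 3600

print(f"{ratio:.1f}:1")   # 11.0:1
print(f"{hours:.1f} h")   # 11.8 h
```

Under these assumptions the ratio comes out slightly above 10:1 and the playing time between 11 and 12 hours; other bit rates or disc capacities shift both numbers.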
There are many newer audio formats and codecs claiming to achieve improved compression and quality over MP3. Vorbis is an unpatented, free codec. Microsoft has its Windows Media Audio format, which is closed source and proprietary. Apple uses Advanced Audio Coding (AAC), which is an MPEG standard rather than an Apple format, although its use is subject to patent licensing.
Lossless audio formats
Lossless audio formats (such as TTA and FLAC) provide a compression ratio of about 2:1, sometimes more. In exchange for their lower compression ratio, these codecs do not destroy any of the original data. This means that when the audio data is decompressed for playing, the sound produced is identical to that of the original sample. Taking the free TTA lossless audio codec as an example, one can store up to 20 audio CDs on a single DVD-R without any loss of quality. The downside is that such a DVD will only play on a device that can both read DVDs and decode the chosen codec, which will most likely be a home computer. Although these codecs are available for free, one important aspect of choosing a lossless audio codec is hardware support. It is in this area that FLAC is ahead of the competition: FLAC is supported by a wide variety of portable audio playback devices.
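How many CDs fit on one DVD-R depends on the length of each CD and on the ratio actually achieved. The sketch below assumes a 4.7 GB (4,700,000,000 byte) DVD-R and exactly 2:1 compression; the function name and both defaults are illustrative only.

```python
def albums_per_disc(album_minutes: int,
                    compression_ratio: float = 2.0,
                    disc_bytes: int = 4_700_000_000) -> int:
    """Whole albums that fit on a disc after lossless compression."""
    pcm_bytes = 44_100 * 2 * 2 * album_minutes * 60   # CDDA: 16-bit stereo
    compressed_bytes = pcm_bytes / compression_ratio
    return int(disc_bytes // compressed_bytes)


print(albums_per_disc(74))  # 12 completely full 74-minute CDs
print(albums_per_disc(45))  # 19 albums of 45 minutes each
```

Under these assumptions, a figure near 20 is reached with albums well short of 74 minutes, or with material that compresses better than 2:1.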
Another important consideration is DRM: the FLAC project is very clear about being against copy prevention features of any kind. This means that you, the owner of the computer, will not be restricted in how you use your FLAC files.
Lossless compression of sound is not nearly as widely used outside of professional applications, because lossy compression can provide a much greater compression ratio with nearly the same apparent quality. Usually, the difference in quality between lossless and lossy audio compression is masked by the limitations of the playback hardware, such as headphones, cables, connectors or loudspeakers.
Uncompressed audio formats
There are many uncompressed data formats. The most popular of them is WAV, probably because it is the default uncompressed format for the Microsoft Windows operating systems. WAV is a flexible file format designed to store more or less any combination of sampling rates and sample resolutions. This makes it an adequate file format for storing and archiving an original recording. A lossless compressed format would require more processing for the same length of recording, but would be more efficient in terms of space used. WAV, like any uncompressed format, encodes all sounds, whether they are complex sounds or absolute silence, with the same number of bits per unit of time.
Let's take an example. A file contains a minute of a symphonic orchestra playing beautifully, followed by a minute of silence. If the sound were stored in an uncompressed audio format, like WAV, the same amount of data would be used for each half. If the data were encoded with a lossless audio format, like TTA, the first minute would be somewhat smaller than in the WAV file, and the silent half would take almost no disk space at all. On the other hand, recording in the TTA format would require a lot more processing than recording in WAV.
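The effect can be demonstrated on a small scale. The sketch below uses Python's general-purpose zlib compressor as a stand-in for a lossless audio codec, which is an assumption made purely for illustration: zlib is not TTA or FLAC and compresses real music far less well than they do. A tone with added noise stands in for the orchestra, and each half lasts one second instead of one minute.

```python
import math
import random
import struct
import zlib

RATE = 44_100  # samples per second, mono, 16 bits per sample


def busy_second() -> bytes:
    """One second of a 440 Hz tone plus noise, a crude stand-in for music."""
    rng = random.Random(0)
    samples = [
        int(12_000 * math.sin(2 * math.pi * 440 * n / RATE)) + rng.randint(-2_000, 2_000)
        for n in range(RATE)
    ]
    return struct.pack(f"<{RATE}h", *samples)


music = busy_second()
silence = bytes(2 * RATE)          # one second of digital silence (all zero)

packed_music = zlib.compress(music, 9)
packed_silence = zlib.compress(silence, 9)

# Uncompressed, both halves take exactly the same space.
print(len(music), len(silence))
# Compressed, the silent half shrinks to a tiny fraction of its size.
print(len(packed_music), len(packed_silence))
# Lossless: decompression restores the original bytes exactly.
print(zlib.decompress(packed_music) == music)
```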
The WAV format is based on the RIFF file format, which is similar to the IFF format.
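The RIFF structure is visible in the first twelve bytes of any WAV file: the identifier "RIFF", a little-endian 32-bit size of the rest of the file, and the form type "WAVE". The following sketch writes a tiny WAV file into memory with Python's standard wave module, inspects that header, and reads the key parameters back; the audio content (100 silent stereo frames) is arbitrary.

```python
import io
import struct
import wave

# Write a very small 16-bit stereo WAV file into memory.
buf = io.BytesIO()
with wave.open(buf, "wb") as writer:
    writer.setnchannels(2)
    writer.setsampwidth(2)          # bytes per sample, i.e. 16 bits
    writer.setframerate(44_100)
    writer.writeframes(bytes(400))  # 100 silent frames of 4 bytes each

data = buf.getvalue()

# The RIFF header: chunk id, size of everything after these 8 bytes, form type.
chunk_id, chunk_size, form_type = struct.unpack("<4sI4s", data[:12])
print(chunk_id, chunk_size, form_type)

# Reading the file back recovers channels, resolution, sample rate and length.
with wave.open(io.BytesIO(data), "rb") as reader:
    params = (reader.getnchannels(), reader.getsampwidth(),
              reader.getframerate(), reader.getnframes())
print(params)
```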
BWF (Broadcast Wave Format) is a standard audio format created by the European Broadcasting Union as a successor to WAV. BWF allows metadata to be stored in the file. See: European Broadcasting Union: Specification of the Broadcast Wave Format - A format for audio data files in broadcasting. EBU Technical document 3285, July 1997. BWF is the primary recording format of many professional audio workstations used in the television and film industry. Stand-alone file-based multi-track recorders from Sound Devices, Zaxcom, HHB USA, Fostex, and Aaton all use BWF as their preferred file format for recording multi-track audio files with SMPTE time code reference. The standardized time stamp in a Broadcast Wave file allows for easy synchronization with a separate picture element.
Multiple channels
Since the 1990s, movie theatres have upgraded their sound systems to surround sound systems that carry more than two channels. The most popular multi-channel codecs are Advanced Audio Coding or AAC (used by Apple's iTunes) and Dolby Digital, also known as AC-3. Both codecs are patented, and encoders/decoders cannot be offered without paying a licence fee. Less common are Vorbis and the recent MP3-Surround codec. The most popular multi-channel format is called 5.1, with 5 normal channels (front left, front centre, front right, surround left, surround right) and a subwoofer channel that carries low frequencies only (the human ear cannot easily tell where low frequencies come from).
It is a common misconception that 5.1 surround sound includes 2 rear speakers. In fact, a 5.1 setup includes what Dolby calls surround speakers, which are placed at the sides of the listener. [1] "6.1" setups do, however, include a single speaker placed at the rear centre, behind the listener - Dolby calls this setup Dolby Digital EX. [2] A 7.1 setup has the usual front 3 (front left, front centre, front right), 2 surround speakers situated to the left and right of the listener, and 2 rear speakers (rear left and rear right), plus the usual subwoofer for bass. Dolby calls this .1 channel LFE (Low-Frequency Effects).
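The "x.y" naming can be read as x full-range channels plus y low-frequency channels. As a small sketch, the helper below (a made-up name) turns a layout label into a total channel count and, assuming uncompressed 16-bit samples at the 48,000 Hz rate of DVD video, the resulting data rate.

```python
def channel_count(layout: str) -> int:
    """Total channels in a layout label such as "5.1" (5 main + 1 LFE)."""
    main, _, lfe = layout.partition(".")
    return int(main) + int(lfe or 0)


for layout in ("5.1", "6.1", "7.1"):
    channels = channel_count(layout)
    # Assumption: 48,000 samples per second, 16 bits per sample, uncompressed
    bytes_per_second = 48_000 * 16 * channels // 8
    print(layout, channels, bytes_per_second)
```

In practice, multi-channel audio is usually stored with a codec such as AC-3 rather than uncompressed, so actual data rates are much lower than these.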
External links
- Audio File Types Definitions of audio file extensions
- libsndfile, an LGPLd library that can read and write many audio file formats
- iTunes file format AAC
- BWF-Widget Pro Utility for working with Broadcast Wave Files. Metadata reader/editor and BWF Playback with SMPTE Time Code.