The Science of Audio in 2005 – Spatial Perception

James Johnston


In the course of time, we have moved from one audio channel (mono) to two audio channels (usually called “stereo”, although that is not what was originally intended by the term stereophonic, which meant “solid sound” rather than “two channels”). Owing both to market pressures relating to the delivery of more than two channels and to marketing efforts of the “we only have two ears” variety, two-channel audio has reigned supreme for many years, and in fact many audio enthusiasts insist that 2 channels is the way to go.

It has been shown, however, as far back as Fletcher and Snow, that in fact 3 front channels have a substantial advantage over two channels in terms of the subject’s ability to perceive depth of the soundfield in the center of the stereo soundstage. What’s more, it has been shown that 2 side channels can enhance the sense of envelopment in a space, and that 2 back channels can provide the sensation of depth behind the listener.

In this talk, we will briefly review the operation of the human ear, then discuss what it can, and what it cannot, detect in terms of spatial and distance perception, and move on to discuss some potential for research.


Presently, JJ is working on new and interesting things in the Codecs group at Microsoft Corporation.

He was the primary researcher and algorithm inventor/designer for AT&T’s audio coding (bitrate reduction) effort. His principal research efforts involved perceptual modelling of audio (and video, currently inactive), audio coding, audio soundfield perception and presentation, and standards and ancillary mathematics and science related to audio issues. His last audio coding product at AT&T is better known as the MPEG-2 AAC (Advanced Audio Coding) standard, developed in collaboration with Fraunhofer IIS and other experts in the field of audio compression.

His current interests include filter design, speech coding, audio and speech testing methodology and execution, and implementation concerns in audio processing. His most recent work at AT&T was in designing a perceptual soundfield reconstruction system to capture the “sound” of an actual performance venue and reconstruct the perceptual cues of the venue in a fashion that can be conveyed in a small number (presently 5) of conventional, independent audio channels.

