
3D sound synthesis has attracted growing attention in recent years because of its expanding range of applications in fields such as games, home theater, and assistive systems. 3D sound surrounds us in everyday life, which can make synthesizing it seem easy, but it is in fact a difficult task. Understanding how 3D sound is composed also gives us a better understanding of 3D sound itself.

Motivation

3D sound is present everywhere in daily life and carries rich cues about the environment. It can tell us where something is located and, under certain conditions, even whether a material is soft or hard. Being able to synthesize 3D sound therefore helps us understand and exploit the information it carries.

Applications

There are many applications of 3D sound synthesis, such as teleconferencing systems, tele-ensemble systems, and the production of more realistic environments and sensations in conventional applications such as video telephony. Knowing how to compose 3D sound can also improve the performance of other 3D sound applications, such as 3D sound localization.

Problem Statement & Basics

Humans use auditory localization cues to locate the position of a sound source in space. There are eight sources of localization cues: interaural time difference, head shadow, pinna response, shoulder echo, head motion, early echo response, reverberation, and vision. The first four cues are considered static and the others dynamic. Dynamic cues involve movement of the subject's body, which affects how sound enters and interacts with the ear.
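As a rough illustration of the first of these cues (not taken from the source), the interaural time difference for a distant source can be estimated with the classic Woodworth spherical-head approximation; the head radius and speed of sound below are assumed typical values.

```python
import math

def interaural_time_difference(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Estimate the interaural time difference (in seconds) with the
    Woodworth spherical-head approximation: ITD = (a / c) * (theta + sin theta),
    where theta is the source azimuth (0 = straight ahead, 90 = to one side).
    The head radius and speed of sound are illustrative defaults."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (theta + math.sin(theta))

# A source 45 degrees to the side reaches the far ear roughly 0.4 ms later.
print(interaural_time_difference(45.0))  # about 3.8e-4 s
```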

Methods

Many different methods have been proposed for 3D sound synthesis. Three of them are introduced in the following sections:

  • The head-related transfer function for 3D sound synthesis
  • Sound rendering as a method for 3D sound synthesis
  • Synthesizing 3D sound with speaker location


The head-related transfer function for 3D sound synthesis

Figure: synthesis structure combining PCA and BMT.

In synthesizing accurate 3D sound, attempts to model the human acoustic system have taken binaural recordings one step further by recording sounds with tiny probe microphones placed in the ears of a real person. These recordings are then compared with the original sounds to compute the person's head-related transfer function (HRTF). The HRTF is a linear function of the sound source's position and takes into account many of the cues humans use to localize sounds, as discussed in the previous section. The HRTF is then used to derive pairs of finite impulse response (FIR) filters for specific sound positions; each sound position requires two filters, one for the left ear and one for the right. Thus, to place a sound at a certain position in virtual space, the pair of FIR filters corresponding to that position is applied to the incoming sound, yielding spatial sound. The computations involved in convolving the sound signal for a particular point in space are demanding; refer to [BURGESS92] for details. The key point is that these computations are so demanding that they cannot currently be performed in real time without special hardware. To meet this need, Crystal River Engineering has implemented these convolution operations on a digital signal processing chip called the Convolvotron.
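A minimal sketch of the filtering step described above, assuming a pair of head-related impulse responses (the time-domain form of the HRTF) is already available for the desired source position; the filter data below is fabricated purely for illustration.

```python
import numpy as np

def spatialize(mono_signal, hrir_left, hrir_right):
    """Apply the position-specific pair of FIR filters (left- and right-ear
    head-related impulse responses) to a mono signal, producing a
    two-channel binaural signal: each channel is the convolution of the
    input with one of the filters."""
    left = np.convolve(mono_signal, hrir_left)
    right = np.convolve(mono_signal, hrir_right)
    return np.stack([left, right])

# Illustration only: real HRIRs come from probe-microphone measurements.
# Here we fake a pair of short filters for a source off to the right
# (earlier, stronger arrival at the right ear; delayed, weaker at the left).
fs = 44100
source = np.random.randn(fs)                    # one second of noise as the source
hrir_right = np.zeros(128); hrir_right[0] = 1.0
hrir_left = np.zeros(128); hrir_left[30] = 0.5
binaural = spatialize(source, hrir_left, hrir_right)   # shape (2, fs + 127)
```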

Sound rendering as a method for 3D sound synthesis

Sound rendering is a technique for generating a synchronized soundtrack for animations. This method of 3D sound synthesis creates a sound world by attaching a characteristic sound to each object in the scene. Sound sources can come from sampling or from artificial synthesis. The sound rendering technique operates in two distinct passes. The first pass calculates the propagation paths from every object in the space to each microphone; this data is then used to calculate the geometric transformations of the sound sources as they interact with the acoustic environment. Each transformation is made up of two parameters, delay and attenuation. In the second pass, the sound objects are instantiated and then modulated and summed to generate the final soundtrack. Synchronization is inherent in the use of convolutions that correspond to an object's position with respect to the listener [TAKALA92].

The rendering technique treats a sound source as a one-dimensional signal with an intensity over time. This is a simpler representation than the more traditional Fourier-transform representation used in HRTF generation. The technique exploits the similarity of light and sound to provide the necessary convolutions. A sound source in space propagates sound waves in all directions, just as a light source does. As with light, sound waves can be reflected and refracted by the acoustic environment. A sound wave interacts with many objects in the environment as it makes its way to the listener. The final sound the listener hears is the integral of the signals arriving over the multiple simultaneous paths between the sound source and the listener. The rendering algorithm cannot evaluate this function continuously and therefore breaks it up into discrete calculations to compute the sound transformations.
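A minimal sketch of the two transformation parameters mentioned above, delay and attenuation, applied to a sampled source and summed over several propagation paths. The inverse-distance attenuation law, speed of sound, and path lengths are assumptions for illustration, not details from [TAKALA92].

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

def apply_path(source, distance_m, sample_rate=44100):
    """First-pass result for one propagation path: a delay proportional to
    the path length and an attenuation that falls off with distance
    (1/r here, an assumed law used only for illustration)."""
    delay_samples = int(round(distance_m / SPEED_OF_SOUND * sample_rate))
    attenuation = 1.0 / max(distance_m, 1.0)
    return attenuation * np.concatenate([np.zeros(delay_samples), source])

def mix_paths(source, path_lengths_m, sample_rate=44100):
    """Second pass: instantiate the source once per path, apply the delay
    and attenuation for that path, and sum the results into the final track."""
    parts = [apply_path(source, d, sample_rate) for d in path_lengths_m]
    track = np.zeros(max(len(p) for p in parts))
    for p in parts:
        track[:len(p)] += p
    return track

# Direct path plus two reflections at increasing distances.
final_track = mix_paths(np.random.randn(44100), path_lengths_m=[3.0, 7.5, 12.0])
```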

The actual sound rendering process is a pipeline made up of four stages. The first stage is the generation of each object's characteristic sound (recorded, synthesized, or derived from modal analysis of collisions). The second stage is sound instantiation and attachment to moving objects within the scene. The third stage is the calculation of the convolutions needed to describe the sound source's interaction with the acoustic environment. In the last stage, the convolutions are applied to the attached, instantiated sound sources. This pipeline is illustrated in the sound rendering figure in [TAKALA92].

The convolution calculation stage of this pipeline also deals with the effect of reverberation, an auditory cue that can lead to better spatial perception. Mathematically, reverberation is a convolution with a continuous weighting function; in effect it is simply multiple echoes within the sound environment. The sound rendering technique approximates this by exploiting the fact that the wavelength of sound is comparable to the size of the objects in the scene, so reflections are diffuse. Sound diffraction allows sound to propagate around an object, which has a "smoothing" effect on the sound. These observations allow the technique to use a simplified sound tracing algorithm, which is beyond the scope of this article; for more information, consult [TAKALA92].
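As a minimal illustration of the idea that reverberation can be treated as convolution with a weighting function made of multiple echoes, the sketch below convolves a dry signal with a sparse impulse response; the echo times and decay factor are arbitrary values, not parameters from [TAKALA92].

```python
import numpy as np

def simple_reverb(dry, sample_rate=44100, echo_times=(0.03, 0.07, 0.11, 0.19), decay=0.5):
    """Approximate reverberation as convolution with a sparse weighting
    function: a unit direct path plus a handful of echoes whose amplitudes
    decay geometrically. All parameter values are illustrative."""
    impulse_response = np.zeros(int(max(echo_times) * sample_rate) + 1)
    impulse_response[0] = 1.0  # the direct sound
    gain = 1.0
    for t in echo_times:
        gain *= decay
        impulse_response[int(t * sample_rate)] += gain
    return np.convolve(dry, impulse_response)

wet = simple_reverb(np.random.randn(44100))  # one second of noise, reverberated
```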

This method was designed for animated worlds that are not necessarily rendered in real time; it is unclear how it would perform in a real-time virtual reality application. However, its similarity to ray tracing and its distinctive approach to handling reverberation are noteworthy.

Synthesizing 3D sound with speaker location

Figure: sound field reproduction.

Still other efforts at meeting the real-time challenges of 3D sound synthesis have involved using strategically placed speakers to simulate spatial sound. This model does not attempt to simulate many of the human localization cues; instead it focuses on attaching sampled sounds to objects in 3D space. Visual Synthesis Incorporated's Audio Image Sound Cube uses this approach with eight speakers to simulate spatial sound. The speakers can be arranged to form a cube of any size, with two speakers at each corner of the cube's footprint, one placed high and one low. The pitch and volume of the sampled sounds are used to simulate sound location; volume is distributed across the speakers to give the perception of a sound source's spatial location. This solution gives up the accuracy achieved by convolving sound, as in the previous two approaches, but avoids their computational demands, allowing much less expensive real-time spatial sound.
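A minimal sketch of one way to distribute volume across eight corner speakers according to a virtual source position. The unit-cube speaker layout and the normalized inverse-distance gain law are assumptions for illustration; they are not Visual Synthesis Incorporated's actual panning method.

```python
import numpy as np

# Eight speakers at the corners of a unit cube (layout assumed for illustration).
SPEAKERS = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)], dtype=float)

def speaker_gains(source_position):
    """Distribute volume so that speakers nearer the virtual source play
    louder: inverse-distance weights, normalized to sum to one."""
    distances = np.linalg.norm(SPEAKERS - np.asarray(source_position, dtype=float), axis=1)
    weights = 1.0 / np.maximum(distances, 1e-3)
    return weights / weights.sum()

# A source near the top front-right corner draws most of its energy
# from the nearest speakers.
print(speaker_gains([0.9, 0.9, 0.8]).round(3))
```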



References

[BEGAULT90]: Begault, Durand R. "Challenges to the Successful Implementation of 3-D Sound", NASA-Ames Research Center, Moffett Field, CA, 1990.

[BEGAULT92]: Begault, Durand R. "An Introduction to 3-D Sound for Virtual Reality", NASA-Ames Research Center, Moffett Field, CA, 1992.

[BURGESS92]: Burgess, David A. "Techniques for Low Cost Spatial Audio", UIST 1992.

[FOSTER92]: Foster, Wenzel, and Taylor. "Real-Time Synthesis of Complex Acoustic Environments" Crystal River Engineering, Groveland, CA.

[SMITH93]: Stuart Smith. "Auditory Representation of Scientific Data", Focus on Scientific Visualization, H. Hagen, H. Muller, G.M. Nielson, eds. Springer-Verlag. 1993.

[STUART92]: Stuart, Rory. "Virtual Auditory Worlds: An Overview", VR Becomes a Business, Proceedings of Virtual Reality 92, San Jose, CA, 1992.

[TAKALA92]: Takala, Tapio and James Hahn. "Sound Rendering". Computer Graphics, 26, 2, July 1992.