2  Sound Propagation in Virtual Acoustics

Lab Handbook - Assignment 1

This handbook provides the theoretical foundation for understanding sound propagation in virtual acoustics, with a focus on free-field conditions. It serves as an introduction to the fundamental concepts that will be explored throughout the course.

2.1 Overview

In this first assignment, we focus on the simplest case: free-field propagation, where sound travels directly from source to listener without any reflections or obstacles. The rendered scene in this assignment is an emergency vehicle driving past a static listener.

2.2 Fundamentals of Acoustics

2.2.1 Sound Waves and Propagation

Sound is a mechanical wave that propagates through a medium (typically air) as a series of compressions and rarefactions. These pressure variations travel as longitudinal waves, where particle motion is parallel to the direction of wave propagation.

The fundamental equation describing sound propagation is the wave equation:

\frac{\partial^2 p}{\partial t^2} = c^2 \nabla^2 p

where:

  • p is the acoustic pressure
  • c is the speed of sound
  • \nabla^2 is the Laplacian operator

For a point source in free space, the solution to the wave equation yields a spherical wave that propagates outward from the source.

2.2.2 Speed of Sound

The speed of sound in air depends primarily on temperature. A commonly used approximation is:

c = 343.2 \sqrt{\frac{T + 273.15}{293.15}} \text{ m/s}

where T is the temperature in degrees Celsius. At room temperature (20°C), the speed of sound is approximately 343 m/s.

The speed of sound also depends slightly on humidity and atmospheric pressure, but temperature is the dominant factor for most practical applications.
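As a quick sanity check, the temperature formula above can be evaluated directly (a minimal sketch; the function name is our own):

```python
import math

def speed_of_sound(temp_c: float) -> float:
    """Speed of sound in air (m/s) from temperature in degrees Celsius,
    using the approximation c = 343.2 * sqrt((T + 273.15) / 293.15)."""
    return 343.2 * math.sqrt((temp_c + 273.15) / 293.15)
```

At 20°C the square root evaluates to exactly 1, recovering 343.2 m/s; at 0°C the result drops to roughly 331 m/s.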

2.2.3 Free Field vs. Reverberant Field

Free field conditions occur when sound propagates without any reflections from surfaces. This is an idealized condition that approximates:

  • Outdoor environments with no nearby obstacles
  • Anechoic chambers
  • Very large spaces where reflections arrive much later than the direct sound

In free field, the direct sound from source to listener is the only component. The sound pressure level decreases with distance according to the inverse distance law (discussed in detail in the section on Sound Propagation in Free Field).

Reverberant field conditions occur when sound reflects off surfaces, creating:

  • Early reflections (discrete echoes)
  • Late reverberation (dense, diffuse sound field)

Real-world environments typically contain both direct sound and reflections, but for Assignment 1, we focus exclusively on free-field propagation to establish the fundamental concepts.

2.3 Sound Propagation in Free Field

In free-field conditions, several physical phenomena affect how sound propagates from a source to a listener. Understanding these effects is crucial for realistic virtual acoustics rendering.

2.3.1 Geometric Spreading (Distance Attenuation)

As sound propagates from a point source, it spreads spherically outward. The acoustic energy is distributed over an increasing surface area, causing the sound pressure level to decrease with distance.

For a point source in free space, the sound pressure p at distance r follows the inverse distance law (also called the 1/r law):

p(r) \propto \frac{1}{r}

In terms of amplitude (which is what we typically work with in audio processing):

A(r) = A_0 \frac{r_0}{r}

where:

  • A(r) is the amplitude at distance r
  • A_0 is the amplitude at reference distance r_0
  • r is the distance from source to listener

In decibels, defining the level as L_p(r) = 20 \log_{10}(A(r)), the level at distance r is:

L_p(r) = L_p(r_0) - 20 \log_{10}\left(\frac{r}{r_0}\right)

This means that doubling the distance results in a 6 dB decrease in sound pressure level.

Important note: In practice, we often apply a minimum distance constraint (e.g., r \geq 1 m) to prevent the gain from becoming infinite when the source is very close to the listener.
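The inverse distance law together with the minimum-distance clamp can be sketched as follows (the function name and the 1 m default are our own choices):

```python
def distance_gain(r: float, r_min: float = 1.0) -> float:
    """Inverse-distance amplitude gain with a minimum-distance clamp.

    Below r_min the gain is held at 1/r_min, so it never blows up
    as the source approaches the listener."""
    return 1.0 / max(r, r_min)
```

Note that doubling the distance halves the amplitude, which is exactly the 6 dB drop stated above.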

2.3.2 Propagation Delay

Sound travels at a finite speed, so there is a time delay between when sound is emitted at the source and when it arrives at the listener. This propagation delay (also called time-of-flight delay) is given by:

\tau = \frac{r}{c}

where:

  • \tau is the delay in seconds
  • r is the distance in meters
  • c is the speed of sound in m/s

For a distance of 1 meter at room temperature, the delay is approximately 2.9 milliseconds.

In digital signal processing, we convert this to samples:

n = \frac{r}{c} \cdot f_s = \frac{r \cdot f_s}{c}

where f_s is the sampling rate in Hz.
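The delay-to-samples conversion is a one-liner; a small sketch (the function name is hypothetical, with c defaulting to 343 m/s):

```python
def propagation_delay_samples(r: float, fs: float, c: float = 343.0) -> float:
    """Time-of-flight delay in (fractional) samples for distance r in metres,
    sampling rate fs in Hz, and speed of sound c in m/s."""
    return r * fs / c
```

At 48 kHz, one metre of distance corresponds to roughly 140 samples of delay. Note that the result is generally fractional, which is why fractional-delay interpolation is needed later.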

2.3.3 Air Absorption

As sound propagates through air, energy is absorbed due to:

  • Viscous losses: Friction between air molecules
  • Molecular relaxation: Energy transfer between translational and internal (rotational/vibrational) modes of air molecules

Air absorption is frequency-dependent: higher frequencies are absorbed more strongly than lower frequencies. This is why distant sounds often sound muffled or “dull.”

The attenuation coefficient \alpha(f) (in dB/m) depends on:

  • Frequency f: Higher frequencies attenuate more
  • Temperature T: Affects molecular relaxation processes
  • Relative humidity h_r: Critical for molecular relaxation
  • Atmospheric pressure p: Affects the relaxation frequencies

The standard for calculating air absorption is ISO 9613-1, which provides formulas for computing \alpha(f) based on these atmospheric parameters.

For a sound traveling distance r, the total attenuation at frequency f is:

A_{\text{air}}(f, r) = 10^{-\alpha(f) \cdot r / 20}

This means the amplitude is multiplied by A_{\text{air}}(f, r) at each frequency.
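Converting a dB/m absorption coefficient into a linear amplitude gain for a given distance follows directly from the formula above (a minimal sketch; the function name is our own):

```python
def air_absorption_gain(alpha_db_per_m: float, r: float) -> float:
    """Linear amplitude gain after travelling r metres through air,
    given the (frequency-dependent) absorption coefficient in dB/m."""
    return 10.0 ** (-alpha_db_per_m * r / 20.0)
```

For example, 0.1 dB/m over 100 m gives 10 dB of attenuation, i.e. a linear gain of about 0.316. In a full renderer this is evaluated per frequency band with \alpha(f) from ISO 9613-1.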

Key characteristics of air absorption:

  • Very low frequencies (< 100 Hz): Minimal absorption
  • Mid frequencies (1-4 kHz): Moderate absorption, depends on humidity
  • High frequencies (> 8 kHz): Strong absorption, especially in dry air
  • The absorption curve typically shows peaks around relaxation frequencies of oxygen and nitrogen molecules

2.3.4 Doppler Effect (Conceptual)

When a source moves relative to the listener, the perceived frequency shifts due to the Doppler effect. However, in Assignment 1, we focus on static listeners with moving sources, and the Doppler effect manifests as:

  • Frequency shift: Perceived frequency changes as the source approaches or recedes
  • Time compression/expansion: The waveform appears compressed or stretched

For a source moving at velocity v_s relative to the medium:

f' = f \frac{c}{c - v_s \cos(\theta)}

where \theta is the angle between the source velocity vector and the line connecting source to listener.

In our block-based processing approach, the Doppler effect emerges naturally from the time-varying delay, though we don’t explicitly model frequency shifting in this assignment.

2.4 Spatial Audio Fundamentals

To create convincing virtual acoustics, we must not only simulate how sound propagates but also how it is perceived spatially by the listener.

2.4.1 Coordinate Systems

In virtual acoustics, we typically use a right-handed Cartesian coordinate system:

  • X-axis: Forward direction (in front of the listener)
  • Y-axis: Left direction
  • Z-axis: Up direction

For spatial localization, we also use spherical coordinates:

  • Azimuth \phi: Horizontal angle, typically measured from the forward direction (X-axis), positive toward the left (Y-axis)
    • Range: -\pi to \pi (or -180° to 180°)
    • 0°: Forward
    • 90°: Left
    • -90°: Right
    • \pm 180°: Behind
  • Elevation \theta: Vertical angle from the horizontal plane
    • Range: -\pi/2 to \pi/2 (or -90° to 90°)
    • 0°: Horizontal plane
    • 90°: Above
    • -90°: Below
  • Distance r: Radial distance from origin

The conversion from Cartesian (x, y, z) to spherical (r, \phi, \theta) coordinates is:

r = \sqrt{x^2 + y^2 + z^2}

\phi = \arctan2(y, x)

\theta = \arcsin\left(\frac{z}{r}\right)
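These conversion formulas translate directly into code. The following sketch assumes the x-forward, y-left, z-up convention introduced above and returns angles in radians (the function name is our own):

```python
import math

def cartesian_to_spherical(x: float, y: float, z: float):
    """Convert listener-relative Cartesian (x, y, z) to spherical
    (r, azimuth, elevation): x forward, y left, z up, angles in radians."""
    r = math.sqrt(x * x + y * y + z * z)
    azimuth = math.atan2(y, x)
    elevation = math.asin(z / r) if r > 0.0 else 0.0
    return r, azimuth, elevation
```

For instance, a point one metre to the listener's left, (0, 1, 0), yields azimuth \pi/2 and zero elevation.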

2.4.2 Direction of Arrival (DOA)

The Direction of Arrival (DOA) describes the direction from which sound arrives at the listener. For a source at position \mathbf{p}_s and listener at position \mathbf{p}_l, the relative position vector is:

\mathbf{r} = \mathbf{p}_s - \mathbf{p}_l

The azimuth of arrival is:

\phi = \arctan2(r_y, r_x)

where r_x and r_y are the X and Y components of the relative position vector.

2.4.3 Stereo Panning

Stereo panning is a technique for positioning a monaural sound source between two loudspeakers (or in a stereo headphone mix) to create the illusion of spatial position.

2.4.3.1 Linear Panning

The simplest approach is linear panning, where the gain for left and right channels varies linearly with azimuth:

g_L(\phi) = \frac{\phi_{\max} - \phi}{\phi_{\max} - \phi_{\min}}

g_R(\phi) = \frac{\phi - \phi_{\min}}{\phi_{\max} - \phi_{\min}}

However, linear panning has a significant problem: it does not preserve power, leading to perceived loudness changes as the source moves.

2.4.3.2 Power-Preserving Panning

Power-preserving panning (also called constant-power panning) maintains constant total power regardless of pan position. This ensures consistent perceived loudness.

For a source at azimuth \phi, the power-preserving panning gains are:

g_L(\phi) = \sin\left(\frac{\phi + \pi/2}{2}\right)

g_R(\phi) = \cos\left(\frac{\phi + \pi/2}{2}\right)

These gains satisfy:

g_L^2(\phi) + g_R^2(\phi) = 1

This ensures that the total power remains constant as the source pans from left (\phi = \pi/2) to right (\phi = -\pi/2).

Verification: For \phi = 0 (center), we get g_L = g_R = \sqrt{2}/2, and g_L^2 + g_R^2 = 1.
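The panning law above can be sketched and checked numerically as follows (the function name is our own; \phi is assumed to lie in [-\pi/2, \pi/2]):

```python
import math

def constant_power_pan(phi: float):
    """Left/right gains for azimuth phi (radians) in [-pi/2, pi/2],
    using the sine/cosine constant-power law above."""
    g_l = math.sin((phi + math.pi / 2.0) / 2.0)
    g_r = math.cos((phi + math.pi / 2.0) / 2.0)
    return g_l, g_r
```

By the identity sin² + cos² = 1, the summed power is exactly 1 for every pan position, and at \phi = \pi/2 (hard left) the right channel vanishes.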

2.4.4 Beyond Stereo

While Assignment 1 focuses on stereo panning, more advanced spatial audio techniques include:

  • Binaural rendering: Using Head-Related Transfer Functions (HRTFs) for headphone playback
  • Ambisonics: Spherical harmonic representation for flexible spatial audio
  • Vector Base Amplitude Panning (VBAP): Panning across arbitrary loudspeaker arrays
  • Wave Field Synthesis: Physically accurate sound field reproduction

Some of these topics will be explored in later assignments.

2.5 Digital Signal Processing for Virtual Acoustics

Implementing virtual acoustics requires careful consideration of digital signal processing techniques, especially when dealing with time-varying scenes where source and listener positions change over time.

2.5.1 Block-Based Processing

In real-time audio systems and time-varying virtual acoustics, audio is typically processed in blocks (also called buffers or frames). Common block sizes range from 64 to 1024 samples, with 256 samples being a typical choice.

Advantages of block-based processing:

  • Efficient computation: Process multiple samples at once
  • Allows parameter updates between blocks
  • Matches typical audio hardware buffer sizes
  • Enables parallel processing

Challenges:

  • Parameters may change between blocks, requiring smooth transitions
  • Delay and filter coefficients must be interpolated within blocks
  • State must be maintained across block boundaries
  • Choosing the right buffer size: A shorter buffer size increases computational load but reduces latency, while a longer buffer size reduces computational demand but increases latency.

2.5.2 Time-Varying Delays

When a source moves relative to the listener, the propagation delay changes continuously. However, in block-based processing, we can only update the delay value at block boundaries.

Solution: Interpolate the delay within each block to avoid discontinuities.

For a block of size N, if the delay at the start of the block is \tau_0 and at the end is \tau_1, we linearly interpolate:

\tau[n] = \tau_0 + \frac{\tau_1 - \tau_0}{N-1} \cdot n, \quad n = 0, 1, \ldots, N-1

This creates a smooth delay transition that prevents audible clicks or artifacts.

Implementation considerations:

  • Delay buffers must be large enough to accommodate the maximum expected delay
  • Fractional delays require interpolation (linear, cubic, or sinc-based)
  • Delay changes must be smooth to avoid phase discontinuities
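The per-block linear delay ramp from the equation above might be sketched like this (the function name is hypothetical):

```python
def interpolated_delays(tau0: float, tau1: float, n_samples: int):
    """Per-sample delay values ramping linearly from tau0 at the start
    of the block to tau1 at its last sample, as in the equation above."""
    if n_samples == 1:
        return [tau0]
    step = (tau1 - tau0) / (n_samples - 1)
    return [tau0 + step * n for n in range(n_samples)]
```

Each of these per-sample delay values is then used for a (typically fractional) delay-line read, so the delay never jumps at a block boundary.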

2.5.3 Time-Varying Filters

Air absorption filters depend on distance, which changes as sources move. Like delays, filter coefficients must be updated smoothly.

Crossfading approach: Process the signal through two filters simultaneously:

  • Filter A: Uses the previous block’s coefficients
  • Filter B: Uses the new block’s coefficients

The outputs are crossfaded:

y[n] = y_A[n] \cdot w_{\text{out}}[n] + y_B[n] \cdot w_{\text{in}}[n]

where w_{\text{out}}[n] fades from 1 to 0 and w_{\text{in}}[n] fades from 0 to 1 over the block.

Alternative approach: Interpolate filter coefficients directly, though this is more complex and may not preserve filter stability.
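The crossfading approach can be sketched with a pair of one-pole filters and linear fade windows (a simplified illustration, not a production implementation; the function names, the one-pole filter choice, and the state handling are our own):

```python
def one_pole(x, b0, a1, state):
    """First-order IIR y[n] = b0*x[n] - a1*y[n-1], i.e. H(z) = b0 / (1 + a1 z^-1).
    Returns the output block and the final filter state."""
    y = []
    prev = state
    for s in x:
        prev = b0 * s - a1 * prev
        y.append(prev)
    return y, prev

def crossfade_filters(x, coeffs_old, coeffs_new, state_old, state_new):
    """Filter one block through both coefficient sets and crossfade
    linearly from the old filter's output to the new filter's output."""
    n = len(x)
    y_old, _ = one_pole(x, coeffs_old[0], coeffs_old[1], state_old)
    y_new, new_state = one_pole(x, coeffs_new[0], coeffs_new[1], state_new)
    out = []
    for i in range(n):
        w_in = i / (n - 1) if n > 1 else 1.0  # fades 0 -> 1 over the block
        out.append(y_old[i] * (1.0 - w_in) + y_new[i] * w_in)
    return out, new_state
```

With identical coefficients in both branches the crossfade reduces to plain filtering, which is a useful sanity check; the new filter's state is the one carried into the next block.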

2.5.4 Filter Design for Air Absorption

Air absorption creates a frequency-dependent attenuation that must be approximated using digital filters. The ISO 9613-1 standard provides the desired frequency response, but we need to design filters that approximate this response.

One-pole filter: A simple first-order IIR filter can approximate air absorption reasonably well for short distances. The filter has the form:

H(z) = \frac{b_0}{1 + a_1 z^{-1}}

The coefficients are designed to match the desired gain at DC (0 Hz) and Nyquist frequency (f_s/2).
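Matching this filter to prescribed linear gains at DC and Nyquist admits a closed-form design, since H(e^{j0}) = b_0/(1 + a_1) and H(e^{j\pi}) = b_0/(1 - a_1); solving these two equations gives the sketch below (the function name is our own):

```python
def design_one_pole(gain_dc: float, gain_nyquist: float):
    """Coefficients (b0, a1) of H(z) = b0 / (1 + a1 z^-1) matching the
    requested linear amplitude gains at DC and Nyquist.

    From b0/(1+a1) = g_dc and b0/(1-a1) = g_ny:
        a1 = (g_ny - g_dc) / (g_ny + g_dc),  b0 = g_dc * (1 + a1)."""
    a1 = (gain_nyquist - gain_dc) / (gain_nyquist + gain_dc)
    b0 = gain_dc * (1.0 + a1)
    return b0, a1
```

For any two positive gains, |a_1| < 1, so the resulting filter is stable; a lowpass (air-absorption-like) response corresponds to gain_dc > gain_nyquist.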

Limitations:

  • One-pole filters are only accurate for short distances
  • For longer distances, higher-order filters (2nd or 3rd order) provide better accuracy
  • More sophisticated filter design methods can improve accuracy

Second-Order Sections (SOS): Filters are often represented as cascades of second-order sections, which provides:

  • Numerical stability
  • Modular design
  • Easy parameter updates

2.5.5 Interpolation Techniques

Smooth parameter transitions are crucial for avoiding artifacts. Common interpolation methods include:

  • Linear interpolation: Simple and efficient, but may cause slight discontinuities in derivatives
  • Cubic interpolation: Smoother transitions, better for audio quality
  • Windowed sinc interpolation: Highest quality for delay interpolation, but computationally expensive

For Assignment 1, linear interpolation is sufficient and provides a good balance between quality and computational efficiency.
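A linearly interpolated read of a fractional delay from a buffer can be sketched as follows (the function name and the buffer convention, with index 0 holding the most recent sample, are our own):

```python
def read_fractional(buf, delay):
    """Linearly interpolated read at a fractional delay (in samples)
    from a delay buffer, where buf[0] is the most recent sample.
    Reads past the end of the buffer return silence."""
    i = int(delay)
    frac = delay - i
    a = buf[i]
    b = buf[i + 1] if i + 1 < len(buf) else 0.0
    return a * (1.0 - frac) + b * frac
```

A delay of 0.5 samples between neighbouring samples 1.0 and 3.0 yields their midpoint, 2.0, which is exactly the "slight discontinuity in derivatives" trade-off mentioned above: cheap, but only piecewise linear.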

2.6 Summary and Outlook

2.6.1 Key Concepts Covered

In this handbook, we have introduced the fundamental concepts of sound propagation in virtual acoustics:

  1. Free-field propagation: Sound travels directly from source to listener without reflections
  2. Distance attenuation: The 1/r law describes how amplitude decreases with distance
  3. Propagation delay: Time delay due to finite speed of sound
  4. Air absorption: Frequency-dependent attenuation that increases with distance
  5. Spatial audio: Power-preserving stereo panning for directional perception
  6. Block-based processing: Efficient handling of time-varying scenes

2.6.2 The Complete Free-Field Rendering Pipeline

For a moving source in free field, the rendering process involves:

  1. Compute distance r from source to listener
  2. Apply distance gain: g_{\text{dist}} = \min(1/r, 1)
  3. Apply propagation delay: \tau = r/c
  4. Design air absorption filter based on distance r
  5. Apply air absorption filter to the delayed signal
  6. Compute direction of arrival \phi from relative position
  7. Apply stereo panning gains g_L(\phi) and g_R(\phi)

All of these steps must be updated smoothly as the source moves, using block-based processing with interpolation.
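Steps 1-3 and 6-7 of the pipeline above can be condensed into a single per-block parameter update (a sketch that omits air absorption, steps 4-5; the function name and the tuple-based positions are our own):

```python
import math

C = 343.0  # speed of sound in m/s (room temperature)

def render_free_field_params(src, lst, fs):
    """One free-field parameter update for source/listener positions
    given as (x, y, z) tuples and sampling rate fs in Hz.
    Returns (distance gain, delay in samples, left gain, right gain)."""
    rx, ry, rz = (s - l for s, l in zip(src, lst))
    r = math.sqrt(rx * rx + ry * ry + rz * rz)        # step 1: distance
    gain = min(1.0 / r, 1.0) if r > 0.0 else 1.0      # step 2: clamped 1/r gain
    delay_samples = r * fs / C                         # step 3: time of flight
    phi = math.atan2(ry, rx)                           # step 6: azimuth of arrival
    g_l = math.sin((phi + math.pi / 2.0) / 2.0)        # step 7: constant-power pan
    g_r = math.cos((phi + math.pi / 2.0) / 2.0)
    return gain, delay_samples, g_l, g_r
```

These parameters would then be interpolated across the block, as described in the section on block-based processing.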

2.6.3 Limitations of Free-Field Model

The free-field model is an idealization. Real-world acoustic environments include:

  • Early reflections: Discrete echoes from walls and objects
  • Late reverberation: Dense, diffuse sound field from many reflections
  • Diffraction: Sound bending around obstacles
  • Scattering: Sound interacting with rough surfaces
  • Doppler effect: Frequency shifts from moving sources (partially addressed)

These phenomena will be explored in later assignments.

2.6.4 Next Steps

Assignment 1 will guide you through implementing the free-field rendering pipeline. You will:

  • Implement power-preserving stereo panning
  • Apply distance attenuation and propagation delay
  • Design and apply air absorption filters
  • Process a dynamic scene with a moving source

Future assignments will build upon these fundamentals to include:

  • Assignment 2: Binaural rendering, early reflections, and reverberation
  • Later assignments: Advanced spatial audio techniques, room acoustics modeling, and more sophisticated propagation models