Immersive Yapping - A Spectral Convolution Plugin for Proximity Chat

Overview

This is an academic capstone project built over the 2025 academic year that implements a convolution reverb plugin targeting 3D game engines like Unreal Engine and Unity. The plugin’s primary use case is applying believable, environment-specific reverb to dynamic audio sources — particularly player voice in VoIP proximity chat systems — where traditional baked-in reverb on static audio files falls short. Recent game titles have begun shipping with systems that apply reverb effects to inherently non-static sounds like player voice chat, and this project explores the DSP and software engineering required to make that possible.

Motivation

Historically, spatialized audio in games relied on simple volume attenuation and reverb pre-baked into source audio files. This works for static environmental sounds but breaks down for dynamic sources like a player speaking into a microphone. Convolution reverb — where an audio signal is mathematically convolved with an impulse response (IR) captured from a real acoustic space — produces far more realistic results, but is computationally expensive. The challenge is making it fast enough to run in real time on streaming audio without introducing perceptible latency.

Technical Approach

The project implements two convolution algorithms that coexist within the plugin, allowing real-time A/B comparison:

Time-Domain Convolution serves as the baseline implementation. It uses a circular delay buffer to compute the direct convolution sum sample-by-sample. While straightforward to implement and verify, its cost scales with the product of block size and IR length (O(B·N) per B-sample block against an N-tap IR), which makes it impractical for longer impulse responses.

Frequency-Domain Convolution is the production algorithm, built on the Overlap-Add (OLA) method described in Frank Wefers’s Partitioned Convolution Algorithms for Real-Time Auralization. The input audio block is zero-padded and transformed via FFT, multiplied element-wise with the pre-computed IR spectrum, then inverse-transformed back to the time domain. The overlap tail from each block is carried forward and summed with the next block’s output to produce seamless, artifact-free audio. The key constraint enforced in the implementation is that the FFT size K must satisfy K >= B + N - 1 (where B is the audio block size and N is the IR length), which prevents time-aliasing. With an audio block size of B = 512 samples, an IR buffer of N = 128 samples, and an FFT size of K = 1024, the forward and inverse transforms run in O(K log K) while the frequency-domain multiply is O(K), a substantial improvement over the time-domain approach.

Implementation

The plugin is written in C++ using the JUCE audio framework and builds as a VST3, Audio Unit, and standalone application via CMake. Impulse response WAV files (captured from real acoustic spaces like an amphitheater and a bedroom) are compiled directly into the plugin binary as embedded resources, eliminating external file dependencies at runtime.

The architecture follows the standard juce::AudioProcessor pattern. Both convolver engines are instantiated per audio channel at initialization, and the active algorithm is toggled via an atomic enum — enabling zero-latency switching between frequency- and time-domain processing. Thread safety on the audio thread is handled with a spin lock using a try-lock pattern: if the lock is contended during an IR rebuild, the audio thread passes through cleanly rather than blocking. Additional care is taken to suppress denormalized floating-point values that can cause severe performance penalties on x86 processors.

The GUI provides controls for selecting between the two algorithms and choosing from the embedded impulse responses, with real-time status display of the active configuration. Plugin state (algorithm selection, IR index, and mix parameters) is serialized so that DAW sessions restore correctly.

Results and Future Work

The plugin successfully processes audio in real time within a DAW environment (tested in Reaper), with the frequency-domain convolver delivering the expected performance gains over the time-domain baseline. Integration with Unreal Engine — using a multi-room template with a playable character to demonstrate per-environment reverb — represents the next milestone.

An aspirational stretch goal is offloading the convolution core to the GPU using the GPU Audio API. While audio has traditionally been computed on the CPU due to latency sensitivity, modern GPU scheduling tools are beginning to make GPU-based audio processing viable. This would open the door to future extensions including multichannel and ambisonics support, more sophisticated room modeling, and broader game engine integration.

Source Code

The source code for this project is available on GitHub.