Simple Custom Audio Engine

Overview

This semester-long project, SmolAudio, is a lightweight, real-time audio engine written in C++ for game applications. The engine handles the full audio pipeline, from loading WAV files off disk to mixing and rendering multiple simultaneous sounds through the system's audio hardware. Built on top of the miniaudio library for cross-platform device access, SmolAudio provides a clean public API for loading sounds, managing voices, controlling playback parameters, and spatializing audio in 3D space. The project targets macOS, Windows, and Linux, and is built with CMake using C++17.

WAV File Loading and Decoding

The engine includes a custom WAV file parser that reads standard RIFF/WAVE files from disk. The loader parses the RIFF header, extracts format metadata from the fmt chunk (sample rate, channel count, bit depth), and reads raw audio from the data chunk. It supports both 8-bit and 16-bit PCM formats and handles mono and stereo files, converting all audio to normalized floating-point samples for internal use. Loaded audio is stored in a Sound resource object that the rest of the engine references during playback. The loader includes error handling for malformed headers, unsupported formats, and missing files.
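The sample-conversion step described above can be sketched as follows. This is a minimal illustration, not the engine's actual loader: the struct and function names are assumptions, and the real parser also walks the RIFF chunk headers before reaching this point.

```cpp
#include <cstdint>
#include <cstddef>
#include <vector>

// Format metadata as extracted from the fmt chunk (illustrative layout).
struct WavFormat {
    uint16_t channels;       // 1 = mono, 2 = stereo
    uint32_t sampleRate;     // e.g. 44100
    uint16_t bitsPerSample;  // 8 or 16
};

// Convert interleaved 16-bit signed PCM to normalized floats in [-1, 1].
std::vector<float> pcm16ToFloat(const int16_t* src, size_t count) {
    std::vector<float> out(count);
    for (size_t i = 0; i < count; ++i)
        out[i] = static_cast<float>(src[i]) / 32768.0f;
    return out;
}

// Convert unsigned 8-bit PCM (silence at 128) to floats in [-1, 1].
std::vector<float> pcm8ToFloat(const uint8_t* src, size_t count) {
    std::vector<float> out(count);
    for (size_t i = 0; i < count; ++i)
        out[i] = (static_cast<float>(src[i]) - 128.0f) / 128.0f;
    return out;
}
```

Normalizing to float at load time means the mixer never needs to branch on bit depth during playback.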

Voice Pool and Playback

Playback is managed through a pool of 32 pre-allocated voice objects. When the engine's playSound() method is called, a voice is claimed from the pool and assigned the requested sound. Each voice independently tracks its playback position within the sound's sample data, along with per-voice parameters like volume, stereo panning, and loop state. When a sound finishes or is explicitly stopped, its voice is returned to the pool. When the pool is exhausted, the engine applies a voice-stealing strategy, reclaiming the oldest or quietest active voice so that new sounds can always be triggered.
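A minimal sketch of the pool with oldest-voice stealing, assuming a claim sequence counter to track voice age. The class and member names are illustrative; the engine's real Voice also carries volume, pan, and loop state.

```cpp
#include <array>
#include <cstdint>
#include <cstddef>

struct Voice {
    bool active = false;
    uint64_t startSequence = 0;  // when this voice was claimed (higher = newer)
    size_t position = 0;         // playback cursor in samples
};

class VoicePool {
public:
    static constexpr size_t kMaxVoices = 32;

    // Claim a free voice; if none is free, steal the oldest active one.
    Voice* claim() {
        for (auto& v : voices_)
            if (!v.active) return activate(v);
        Voice* oldest = &voices_[0];
        for (auto& v : voices_)
            if (v.startSequence < oldest->startSequence) oldest = &v;
        return activate(*oldest);
    }

    void release(Voice& v) { v.active = false; }

private:
    Voice* activate(Voice& v) {
        v.active = true;
        v.position = 0;
        v.startSequence = nextSequence_++;
        return &v;
    }

    std::array<Voice, kMaxVoices> voices_{};
    uint64_t nextSequence_ = 1;
};
```

Pre-allocating the pool up front avoids heap allocation on the audio path, which matters because the mixer runs on a real-time thread.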

Mixer and Audio Callback

The mixer is the engine’s central rendering component. Each audio callback cycle, the mixer iterates over all active voices, calls each voice’s render() method to produce a block of samples, and sums the results into an interleaved stereo output buffer with a master volume applied. The output buffer is then delivered to the platform audio device. Voices that reach the end of their sound data are automatically deactivated, while looping voices wrap their playback position back to the start. The callback runs on a dedicated audio thread provided by the miniaudio backend, keeping mixing off the application's main thread and latency low.
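The mixing loop described above can be sketched as below. This is a simplified stand-in for the real render path: the flat MixVoice struct, the linear pan law, and mono source data are all assumptions made to keep the example short.

```cpp
#include <cstddef>
#include <vector>

struct MixVoice {
    const float* samples;  // mono source data, normalized floats
    size_t length;         // total samples in the source
    size_t position;       // playback cursor
    float volume;
    float pan;             // -1 = full left, 0 = center, +1 = full right
    bool active;
    bool looping;
};

// Sum all active voices into an interleaved stereo buffer of `frames`
// frames, applying per-voice gain/pan and a master volume at the end.
void mix(std::vector<MixVoice>& voices, float* out, size_t frames,
         float masterVolume) {
    for (size_t i = 0; i < frames * 2; ++i) out[i] = 0.0f;
    for (auto& v : voices) {
        if (!v.active) continue;
        // Simple linear pan law (illustrative).
        float leftGain  = v.volume * 0.5f * (1.0f - v.pan);
        float rightGain = v.volume * 0.5f * (1.0f + v.pan);
        for (size_t f = 0; f < frames; ++f) {
            if (v.position >= v.length) {
                if (v.looping) v.position = 0;       // wrap looping voices
                else { v.active = false; break; }    // deactivate finished ones
            }
            float s = v.samples[v.position++];
            out[f * 2]     += s * leftGain;
            out[f * 2 + 1] += s * rightGain;
        }
    }
    for (size_t i = 0; i < frames * 2; ++i) out[i] *= masterVolume;
}
```

Summing into the output buffer rather than writing lets any number of voices share one buffer pass; the master volume is applied once at the end instead of per voice.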

3D Positional Audio

The engine includes a spatial audio system for positioning sounds in a 3D environment. A listener struct defines the player’s position and orientation in world space, while each voice can be assigned a source position. The spatializer computes distance-based attenuation so that farther sounds are quieter, and calculates angle-based stereo panning so that sounds to the left or right of the listener are placed accordingly in the stereo field. This spatial processing is integrated directly into the voice rendering path, making it straightforward to create immersive audio scenes with directional sound cues.
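The two spatial computations can be sketched as follows. The inverse-distance attenuation curve, the reference-distance parameter, and the right-vector projection are assumptions for illustration; the engine's actual falloff model may differ.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

struct Listener {
    Vec3 position{0, 0, 0};
    Vec3 right{1, 0, 0};  // listener's right direction in world space
};

// Distance attenuation: full volume within refDistance, then 1/d falloff.
float distanceGain(const Vec3& src, const Vec3& listenerPos,
                   float refDistance) {
    float dx = src.x - listenerPos.x;
    float dy = src.y - listenerPos.y;
    float dz = src.z - listenerPos.z;
    float d = std::sqrt(dx * dx + dy * dy + dz * dz);
    return (d <= refDistance) ? 1.0f : refDistance / d;
}

// Stereo pan in [-1, 1]: projection of the normalized direction to the
// source onto the listener's right vector.
float stereoPan(const Vec3& src, const Listener& l) {
    float dx = src.x - l.position.x;
    float dy = src.y - l.position.y;
    float dz = src.z - l.position.z;
    float len = std::sqrt(dx * dx + dy * dy + dz * dz);
    if (len < 1e-6f) return 0.0f;  // source at the listener: center it
    return (dx * l.right.x + dy * l.right.y + dz * l.right.z) / len;
}
```

Because both functions reduce to a per-voice gain and pan value, they slot directly into the voice rendering path without changing the mixer itself.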

Interactive Demo Application

An interactive demo application showcases the engine’s capabilities in a practical context. The demo initializes the audio engine, loads a set of WAV files, and provides a simple interface for triggering sounds. It demonstrates core features including simultaneous playback of multiple sounds, real-time volume and panning adjustments, looping ambient audio, and 3D positional audio with moving sound sources. The demo serves both as a validation tool during development and as a reference for how to integrate the engine into a larger application.