Wavetable Synthesizer

Overview

This project is a real-time, MIDI-controlled wavetable synthesizer built in C++. It takes a pre-recorded waveform stored in a WAV file and plays it back at different speeds to produce pitched musical notes, a technique known as wavetable synthesis. A MIDI controller (such as a keyboard) drives the instrument in real time: pressing keys triggers notes, the pitch wheel bends them, and the modulation wheel adds vibrato. Audio output is rendered sample-by-sample through a PortAudio callback, while PortMIDI handles incoming MIDI events on a separate thread. The synthesizer supports two-voice polyphony with ADSR amplitude envelopes, giving each note a natural attack-decay-sustain-release shape. My contribution was the WavetableSynth class and a supporting library of classes (AudioData, Resample, ADSR, MidiIn), which provide the synthesis engine, MIDI event handling, and voice management logic; the driver program was provided with the course material.

Wavetable Playback and Pitch Control

At the heart of the synthesizer is a single-cycle waveform that is provided by the user. Rather than generating waveforms mathematically, the synth reads this recorded sample and loops through a specific region of it (samples 10070–17439) to sustain a tone. To produce different pitches, the playback rate is scaled relative to A4 (440 Hz) using the standard equal-temperament formula:

frequency = 440 * 2^((note - 69) / 12)
speedup   = (frequency / 440) * (44100 / sample_rate)
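The two formulas translate almost directly into C++. The sketch below is illustrative rather than the project's actual code: the function names noteToFrequency and noteToSpeedup are hypothetical, and sample_rate is simply the quantity named in the formula. The one trap worth showing is that the exponent must be computed in floating point, since (note - 69) / 12 with integer operands would truncate.

```cpp
#include <cassert>
#include <cmath>

// Frequency in Hz of a MIDI note in 12-tone equal temperament,
// with A4 (MIDI note 69) tuned to 440 Hz.
double noteToFrequency(int note) {
    // Divide by 12.0, not 12: integer division would truncate the exponent.
    return 440.0 * std::pow(2.0, (note - 69) / 12.0);
}

// Playback-rate multiplier for a table recorded at A4, following the
// speedup formula above. sample_rate must be positive.
double noteToSpeedup(int note, double sample_rate) {
    assert(sample_rate > 0.0);
    return (noteToFrequency(note) / 440.0) * (44100.0 / sample_rate);
}
```

With a sample_rate of 44100, note 69 gives a speedup of 1 (the table plays back unchanged), and every octave up or down doubles or halves it.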

A Resample object handles the variable-rate playback, interpolating between samples so that non-integer step sizes still produce smooth output. This approach means that a single stored waveform can cover the entire MIDI note range simply by reading through it faster or slower.
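To make the idea of variable-rate playback concrete, here is a minimal sketch of a looping reader that uses linear interpolation. It is a hypothetical stand-in, not the project's Resample class: the name LoopingReader, its interface, and the choice of linear (rather than higher-order) interpolation are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a variable-rate reader (not the real Resample).
// Plays the table from index 0, then loops over [loopStart, loopEnd] inclusive.
class LoopingReader {
public:
    LoopingReader(std::vector<float> table, std::size_t loopStart, std::size_t loopEnd)
        : table_(std::move(table)), loopStart_(loopStart), loopEnd_(loopEnd), pos_(0.0) {
        assert(loopStart_ < loopEnd_ && loopEnd_ < table_.size());
    }

    // Returns the interpolated sample at the current fractional position,
    // then advances the position by `speedup` table samples.
    float next(double speedup) {
        assert(speedup >= 0.0);
        const std::size_t i = static_cast<std::size_t>(pos_);
        const double frac = pos_ - static_cast<double>(i);
        // At the last loop sample, interpolate toward the loop start.
        const std::size_t j = (i == loopEnd_) ? loopStart_ : i + 1;
        const double out = (1.0 - frac) * table_[i] + frac * table_[j];

        pos_ += speedup;
        const double loopLen = static_cast<double>(loopEnd_ - loopStart_ + 1);
        // Wrap back into the loop region; a while loop covers large steps.
        while (pos_ >= static_cast<double>(loopEnd_ + 1)) pos_ -= loopLen;
        return static_cast<float>(out);
    }

private:
    std::vector<float> table_;
    std::size_t loopStart_;
    std::size_t loopEnd_;
    double pos_;  // fractional read position into table_
};
```

Under these assumptions, the synthesizer's loop region would correspond to constructing the reader with loopStart = 10070 and loopEnd = 17439, and calling next() once per output sample with the current speedup.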

Voice Allocation and Polyphony

The synthesizer supports two simultaneous voices, each backed by its own Resample reader and ADSR envelope. When a MIDI Note On event arrives, the allocator searches for an open slot. If both slots are occupied, it attempts note reclamation: a voice whose envelope has decayed below -48 dB is considered silent enough to steal. As a last resort, the voice playing the oldest (lowest-numbered) note is stolen outright. Each voice tracks its assigned MIDI note number so that the corresponding Note Off event can correctly release it by transitioning its envelope into the release phase. This keeps the output clean—notes fade out naturally rather than cutting off abruptly, even when the player exceeds the two-voice limit.
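The three-step allocation policy can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the Voice struct and the functions allocateVoice and findVoiceForNote are hypothetical, the envelope is assumed to be a linear amplitude so that -48 dB corresponds to 10^(-48/20), or about 0.004, and the last-resort rule is read here as "steal the voice holding the lowest MIDI note number".

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Hypothetical per-voice bookkeeping (the real classes may differ).
struct Voice {
    bool active = false;    // slot currently assigned to a note
    int note = -1;          // MIDI note number, -1 when unused
    double envelope = 0.0;  // current envelope value, linear amplitude
};

// -48 dB expressed as a linear amplitude: 10^(-48/20), about 0.00398.
inline double silenceThreshold() { return std::pow(10.0, -48.0 / 20.0); }

// Picks a slot for a new note and returns its index.
int allocateVoice(std::array<Voice, 2>& voices, int note) {
    assert(note >= 0 && note <= 127);
    int slot = -1;
    // 1. Prefer an open slot.
    for (int i = 0; i < 2 && slot < 0; ++i)
        if (!voices[i].active) slot = i;
    // 2. Otherwise reclaim a voice that has decayed below -48 dB.
    for (int i = 0; i < 2 && slot < 0; ++i)
        if (voices[i].envelope < silenceThreshold()) slot = i;
    // 3. Last resort (assumed reading): steal the lowest-numbered note.
    if (slot < 0) slot = (voices[1].note < voices[0].note) ? 1 : 0;

    voices[slot].active = true;
    voices[slot].note = note;
    voices[slot].envelope = 0.0;  // envelope restarts on Note On
    return slot;
}

// Finds the voice a Note Off should release, or -1 if none matches.
int findVoiceForNote(const std::array<Voice, 2>& voices, int note) {
    for (int i = 0; i < 2; ++i)
        if (voices[i].active && voices[i].note == note) return i;
    return -1;
}
```

A Note Off handler would call findVoiceForNote and, if it returns a valid index, move that voice's envelope into its release phase rather than clearing the slot immediately.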

ADSR Envelope

Every voice is shaped by an ADSR (Attack, Decay, Sustain, Release) envelope generator that modulates amplitude over time. The envelope is configured with an attack time of 0.1 seconds, decay and release times of 0.69 seconds, and a sustain level of 0.9. On a Note On event the envelope resets and ramps up through attack and decay to the sustain level, where it holds until the key is released. On Note Off, the envelope enters its release phase and fades to silence. The envelope’s current value is multiplied into the output each sample, giving notes an organic volume contour rather than a flat, gated sound. The -48 dB threshold used during voice reclamation ties directly into the envelope: a voice is only considered available for stealing once its release tail has become essentially inaudible.
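The envelope's stage-by-stage behavior can be sketched as a small per-sample state machine. The version below assumes linear segments and a hypothetical class name and interface (Adsr, noteOn, noteOff, next); the project's ADSR class may use different curves or method names.

```cpp
#include <cassert>

// Minimal linear-segment ADSR sketch (hypothetical interface).
class Adsr {
public:
    enum class Stage { Idle, Attack, Decay, Sustain, Release };

    Adsr(double attackSec, double decaySec, double sustainLevel,
         double releaseSec, double sampleRate)
        : sustain_(sustainLevel),
          attackStep_(1.0 / (attackSec * sampleRate)),
          decayStep_((1.0 - sustainLevel) / (decaySec * sampleRate)),
          releaseSamples_(releaseSec * sampleRate) {
        assert(attackSec > 0.0 && decaySec > 0.0 && releaseSec > 0.0);
        assert(sampleRate > 0.0 && sustainLevel >= 0.0 && sustainLevel <= 1.0);
    }

    void noteOn() { level_ = 0.0; stage_ = Stage::Attack; }

    void noteOff() {
        if (stage_ == Stage::Idle) return;
        // Fade from the current level to zero over the release time.
        releaseStep_ = level_ / releaseSamples_;
        stage_ = Stage::Release;
    }

    // Advances one sample and returns the envelope value in [0, 1].
    double next() {
        switch (stage_) {
        case Stage::Attack:
            level_ += attackStep_;
            if (level_ >= 1.0) { level_ = 1.0; stage_ = Stage::Decay; }
            break;
        case Stage::Decay:
            level_ -= decayStep_;
            if (level_ <= sustain_) { level_ = sustain_; stage_ = Stage::Sustain; }
            break;
        case Stage::Release:
            level_ -= releaseStep_;
            if (level_ <= 0.0) { level_ = 0.0; stage_ = Stage::Idle; }
            break;
        case Stage::Sustain:
        case Stage::Idle:
            break;  // hold the current level
        }
        return level_;
    }

    Stage stage() const { return stage_; }

private:
    double sustain_;
    double attackStep_;
    double decayStep_;
    double releaseSamples_;
    double releaseStep_ = 0.0;
    double level_ = 0.0;
    Stage stage_ = Stage::Idle;
};
```

With the settings described above, such an envelope would be constructed as Adsr(0.1, 0.69, 0.9, 0.69, sampleRate), and each voice's output sample would be multiplied by the value returned from next().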

Real-Time MIDI Expression

Beyond basic note playback, the synthesizer responds to three continuous MIDI controllers to add expressiveness. The pitch wheel maps its full range to +/-200 cents (two semitones), allowing smooth pitch bends between notes. The modulation wheel (CC#1) controls vibrato depth, also scaled to a maximum of +/-200 cents, applied through a 1 Hz sine-wave LFO that continuously modulates pitch. These two offsets are summed and fed into each voice’s Resample object every sample, so bends and vibrato interact naturally. Finally, volume (CC#7) scales the master output level from 0 to unity. All three controllers update shared state that affects every active voice simultaneously, and a fixed 0.3 mix factor at the output stage prevents clipping when both voices sound at full amplitude.
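The controller mappings reduce to a handful of small conversions, sketched below. All function names are hypothetical, and a few details are assumptions rather than facts about the project: the pitch wheel is taken as the standard 14-bit MIDI value (0 to 16383, centered at 8192), the CC values are taken as 7-bit (0 to 127), and the summed cents offset is assumed to be applied as a multiplier on the playback speedup.

```cpp
#include <cassert>
#include <cmath>

// 14-bit pitch wheel value (0..16383, center 8192) to a bend in cents.
// The top of the range lands just under +200 because the center is 8192.
double pitchWheelToCents(int value14) {
    assert(value14 >= 0 && value14 <= 16383);
    return (value14 - 8192) / 8192.0 * 200.0;
}

// Modulation wheel (CC#1, 0..127) to a vibrato depth of 0..200 cents.
double modWheelToDepthCents(int cc) {
    assert(cc >= 0 && cc <= 127);
    return cc / 127.0 * 200.0;
}

// Instantaneous vibrato offset from a 1 Hz sine LFO at time tSec.
double vibratoCents(double depthCents, double tSec) {
    const double pi = std::acos(-1.0);
    return depthCents * std::sin(2.0 * pi * 1.0 * tSec);
}

// Cents offset to a frequency ratio: 1200 cents is one octave.
double centsToRatio(double cents) {
    return std::pow(2.0, cents / 1200.0);
}

// Volume (CC#7, 0..127) to a master gain of 0..1.
double volumeToGain(int cc) {
    assert(cc >= 0 && cc <= 127);
    return cc / 127.0;
}

// Output stage: fixed 0.3 mix factor, then master gain.
double mixVoices(double voice0, double voice1, double gain) {
    return 0.3 * (voice0 + voice1) * gain;
}
```

Under these assumptions, each sample's playback rate would be the note's speedup multiplied by centsToRatio(bendCents + vibratoCents(depth, t)). Two voices at full amplitude mix to at most 0.6, which leaves headroom below full scale.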