How it works
VX-2026 · DSP v1.0

The science behind every reading.

Voxavia's audio engine is the moat. Every parameter is calibrated against gold-standard references — Praat, MDVP, and published clinical norms — so the numbers stand up to scrutiny.

[App preview, app.voxavia.com / scope: live spectrum from 200 to 4000 Hz with F1, F2, and F3 markers and a 2870 Hz readout. Range A2 to F♯5 · On target 86% · Stability Good · Take 0:08]

Capture pipeline

From microphone to measurement.

Voxavia's pipeline is decoupled from the UI so the same primitives power every drill. Every frame is gated for clarity before it counts.

01

Permission

Microphone capture via getUserMedia. Browser DSP — echo cancellation, noise suppression, AGC — is deliberately disabled because those filters distort the time-domain signal that YIN-based pitch detection relies on.

02

Pipeline

MediaStreamAudioSourceNode → AnalyserNode (fftSize 2048) → per-frame Float32Array → DSP module → emitted sample. Same plumbing across every feature.
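
A minimal sketch of that chain, assuming a standard Web Audio setup. Only the node types and fftSize 2048 come from the description above; `createFrameSource`, `frameDurationMs`, and the shape of the returned object are hypothetical names used for illustration, not Voxavia's actual module API.

```typescript
const FFT_SIZE = 2048; // value quoted in the pipeline description

// Hypothetical helper: wires a microphone stream into an AnalyserNode and
// returns a reader that refills one reusable Float32Array per call.
function createFrameSource(ctx: AudioContext, stream: MediaStream) {
  const source = ctx.createMediaStreamSource(stream); // MediaStreamAudioSourceNode
  const analyser = ctx.createAnalyser();
  analyser.fftSize = FFT_SIZE;
  source.connect(analyser); // never connected to ctx.destination, so nothing is played back

  const frame = new Float32Array(analyser.fftSize);
  return {
    sampleRate: ctx.sampleRate,
    // Copies the most recent fftSize time-domain samples into `frame`.
    read(): Float32Array {
      analyser.getFloatTimeDomainData(frame);
      return frame;
    },
  };
}

// Length of one analysis frame in milliseconds at a given sample rate.
function frameDurationMs(sampleRate: number, fftSize: number = FFT_SIZE): number {
  return (fftSize / sampleRate) * 1000;
}
```

At 48 kHz a 2048-sample frame spans about 42.7 ms, roughly two and a half periods of a 60 Hz tone, so even the lowest pitch the pipeline accepts fits inside one frame more than twice.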

03

Validity gate

A pitch frame counts only when clarity ≥ 0.9 and 60 Hz ≤ F0 ≤ 1500 Hz. A frame that fails either test is recorded as a gap, which keeps glissando and vibrato traces honest.
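
The gate can be sketched as a pure function. The three thresholds are the ones quoted above; the `PitchFrame` and `GatedSample` shapes and the function names are assumptions made for this example.

```typescript
// Thresholds quoted in the text above.
const MIN_CLARITY = 0.9;
const MIN_F0_HZ = 60;
const MAX_F0_HZ = 1500;

// Hypothetical shapes for a detector result and the emitted sample.
interface PitchFrame {
  timeMs: number;
  f0Hz: number;
  clarity: number;
}
interface GatedSample {
  timeMs: number;
  f0Hz: number | null; // null marks a gap
}

// True only when the frame passes both the clarity and the range test.
function isValidPitchFrame(frame: PitchFrame): boolean {
  return (
    isFinite(frame.f0Hz) &&
    frame.clarity >= MIN_CLARITY &&
    frame.f0Hz >= MIN_F0_HZ &&
    frame.f0Hz <= MAX_F0_HZ
  );
}

// Keeps the timestamp either way; an invalid frame becomes a gap.
function gatePitchFrame(frame: PitchFrame): GatedSample {
  return {
    timeMs: frame.timeMs,
    f0Hz: isValidPitchFrame(frame) ? frame.f0Hz : null,
  };
}
```

Keeping the timestamp on a rejected frame means a trace renderer can break the line at the gap instead of drawing a straight segment across it.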

04

Throttling

React state updates at ~10 fps for live readouts. The graph canvas redraws at 60 fps independently, reading directly from a ref-backed sample buffer so React state never enters the hot path.
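
One way to get that split, sketched without React so it runs on its own. `SampleRing`, `makeThrottle`, and `startLoop` are hypothetical names, and the 100 ms interval and 600-sample capacity are illustrative choices. In a React component the ring would typically live in a `useRef` and `onReadout` would be a state setter.

```typescript
// Fixed-size ring buffer; gaps can be stored as NaN.
class SampleRing {
  private readonly data: Float32Array;
  private writeIndex = 0;
  private count = 0;

  constructor(capacity: number) {
    this.data = new Float32Array(capacity);
  }

  push(value: number): void {
    this.data[this.writeIndex] = value;
    this.writeIndex = (this.writeIndex + 1) % this.data.length;
    if (this.count < this.data.length) this.count++;
  }

  // Oldest-to-newest copy for the canvas draw call.
  snapshot(): number[] {
    const out: number[] = [];
    const start = (this.writeIndex - this.count + this.data.length) % this.data.length;
    for (let i = 0; i < this.count; i++) {
      out.push(this.data[(start + i) % this.data.length]);
    }
    return out;
  }
}

// Returns a predicate that is true at most once per minIntervalMs.
function makeThrottle(minIntervalMs: number): (nowMs: number) => boolean {
  let last = -Infinity;
  return (nowMs: number): boolean => {
    if (nowMs - last < minIntervalMs) return false;
    last = nowMs;
    return true;
  };
}

// Draws on every animation frame, but reports to the UI about 10 times a second.
function startLoop(
  readSample: () => number,
  draw: (samples: number[]) => void,
  onReadout: (latest: number) => void,
): () => void {
  const ring = new SampleRing(600);
  const readoutDue = makeThrottle(100);
  let handle = 0;
  const tick = (nowMs: number): void => {
    const sample = readSample();
    ring.push(sample);
    draw(ring.snapshot()); // hot path: no framework state involved
    if (readoutDue(nowMs)) onReadout(sample); // slow path: e.g. a state setter
    handle = requestAnimationFrame(tick);
  };
  handle = requestAnimationFrame(tick);
  return () => cancelAnimationFrame(handle); // call to stop the loop
}
```

The draw call only ever reads the ring, so a slow or batched UI update cannot stall the graph.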

Design decision

Why we turn off AGC, AEC, and noise suppression.

By default, most browsers apply automatic gain control, echo cancellation, and noise suppression to microphone capture. They're great for video calls, but they smear the time-domain signal that pitch detection and voice-quality DSP rely on.

Re-enabling them would silently corrupt every measurement Voxavia produces. So we disable them at the getUserMedia constraints level and lean on signal-design choices — calibration, SNR gating, validity windows — to keep noisy rooms honest.
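
A minimal sketch of what that looks like at the constraints level. The three constraint keys are standard `MediaTrackConstraints` members; the helper names (`rawMicConstraints`, `listActiveProcessing`, `openRawMic`) are hypothetical. Constraints are requests rather than guarantees, so the sketch also reads back the settings the browser actually applied.

```typescript
// Builds getUserMedia constraints with browser-side processing switched off.
function rawMicConstraints(deviceId?: string): MediaStreamConstraints {
  return {
    audio: {
      echoCancellation: false,
      noiseSuppression: false,
      autoGainControl: false,
      ...(deviceId ? { deviceId: { exact: deviceId } } : {}),
    },
    video: false,
  };
}

// Lists any processing stage the browser reports as still enabled.
function listActiveProcessing(settings: MediaTrackSettings): string[] {
  const active: string[] = [];
  if (settings.echoCancellation) active.push("echoCancellation");
  if (settings.noiseSuppression) active.push("noiseSuppression");
  if (settings.autoGainControl) active.push("autoGainControl");
  return active;
}

// Opens the microphone and warns if the request was not fully honoured.
async function openRawMic(deviceId?: string): Promise<MediaStream> {
  const stream = await navigator.mediaDevices.getUserMedia(rawMicConstraints(deviceId));
  const [track] = stream.getAudioTracks();
  if (!track) throw new Error("No audio track in the captured stream");
  const stillOn = listActiveProcessing(track.getSettings());
  if (stillOn.length > 0) {
    console.warn("Browser kept audio processing enabled: " + stillOn.join(", "));
  }
  return stream;
}
```

Whether to warn, block, or flag the session when a stage stays enabled is a product decision this sketch leaves open.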

DSP modules

The library underneath.

Built once, used everywhere. Every module emits a stream of measurements that feed drills, reports, and personal-best tracking.

Pitch (Pitchy / YIN)

DSP

F0, clarity. The foundation of every pitch-driven feature.

CPPS

DSP

Smoothed cepstral peak prominence, a widely used voice-quality scalar.

Jitter

DSP

Cycle-to-cycle variation in pitch period.

Shimmer

DSP

Cycle-to-cycle variation in amplitude.

HNR

DSP

Harmonics-to-noise ratio.

Formants (LPC)

DSP

F1 / F2 / F3 every ~200 ms. Drives vowel-space plotting and singer's-formant scoring.

Vibrato

DSP

Rate (Hz), extent (cents), regularity.

Stability

DSP

Tremor, drift, deviation on sustained pitches.

Messa-di-voce shape fit

DSP

How well a crescendo→decrescendo amplitude curve matches the target shape.

Passaggio detection

DSP

Break-zone clustering across glide segments.

LTAS

DSP

Long-term average spectrum slope.

MFCC

DSP

Mel-frequency cepstral coefficients (used by accent and vowel-shape work).

GNE

DSP

Glottal-to-noise excitation ratio.

Subharmonic ratio

DSP

Diplophonia / period-doubling detector.

Spectral tilt

DSP

High-frequency rolloff; pairs with mic-distance work.

Singer's formant

DSP

FFT energy ratio in 2.5–3.6 kHz vs the broader band.

Sibilance

DSP

4–10 kHz band ratio + spectral centroid.

Speaking rate

DSP

Approximate words/syllables per minute via onset detection.

Filler detection (beta)

DSP

Detects um, uh, and similar fillers in continuous speech.

Mic calibration

DSP

Per-device dB offset; pink-noise reference.

Tone player

DSP

Reference-tone synthesis for matching, harmony, and ear-training drills.

Onset detection

DSP

RMS-envelope onset times — feeds DDK, melodic dictation, and rate.

Key suggestion

DSP

Maps a stored vocal range to comfortable song keys.

Fatigue index

DSP

Composite scalar over recent vs baseline jitter, HNR, range, and load.
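
To make two of the definitions above concrete, here is a sketch of jitter and shimmer as cycle-to-cycle variation. It assumes the common "local" form used by tools such as Praat: the mean absolute difference between consecutive cycles divided by the mean, as a percentage. Which variant Voxavia computes is not stated here, and the function names are hypothetical. Extracting the per-cycle periods and peak amplitudes from audio is out of scope for the sketch.

```typescript
// Mean absolute difference between neighbours, relative to the mean, in percent.
function localPerturbationPercent(values: number[]): number {
  if (values.length < 2) return NaN; // need at least one pair of cycles
  let sumAbsDiff = 0;
  let sum = values[0];
  for (let i = 1; i < values.length; i++) {
    sumAbsDiff += Math.abs(values[i] - values[i - 1]);
    sum += values[i];
  }
  const meanAbsDiff = sumAbsDiff / (values.length - 1);
  const mean = sum / values.length;
  if (mean === 0) return NaN;
  return (meanAbsDiff / mean) * 100;
}

// Jitter: variation in the duration of consecutive pitch periods (seconds).
function jitterLocalPercent(periodsSec: number[]): number {
  return localPerturbationPercent(periodsSec);
}

// Shimmer: variation in the peak amplitude of consecutive cycles (linear units).
function shimmerLocalPercent(peakAmplitudes: number[]): number {
  return localPerturbationPercent(peakAmplitudes);
}
```

For example, periods alternating between 10 ms and 11 ms give a mean difference of 1 ms over a mean period of 10.5 ms, so `jitterLocalPercent([0.010, 0.011, 0.010, 0.011])` is about 9.52%, far above what a steady sustained vowel would show.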

Reference standards

Whose shoulders we stand on.

The literature and tools we calibrate against. The list is short on purpose — we choose published, widely cited references over novel methods.

YIN

De Cheveigné & Kawahara, 2002. Time-domain autocorrelation pitch detector — Voxavia uses the Pitchy implementation.

Praat

Boersma & Weenink. The open-source phonetics tool we calibrate against.

MDVP

Multi-Dimensional Voice Program. Long-standing clinical norms for jitter, shimmer, HNR.

GRBAS — Hirano 1981

Perceptual self-rating: Grade · Roughness · Breathiness · Asthenia · Strain.

RSI — Belafsky 2002

9-item Reflux Symptom Index for laryngopharyngeal reflux.

VHI — Jacobson 1997

Voice Handicap Index — patient-reported impact.

Voxavia · 2026
app.voxavia.com

Trust, but verify.

Open the app, sing a vowel for ten seconds, and see the numbers. If you're a researcher who'd like to compare them against your own toolchain, get in touch.

No account · No install
DSP v1.0 · 2026