A Streamlit web application for frequency-domain analysis of audio signals, built from scratch using NumPy, SciPy, Matplotlib, and Plotly. The project covers the full pipeline from signal framing through spectral feature extraction, spectrogram generation, formant analysis, and fundamental frequency estimation via cepstrum.
The audio signal is divided into short overlapping frames (typically 20–40 ms), within which the signal can be treated as approximately stationary. Each frame is then processed independently.
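The framing step can be sketched as follows. This is a minimal illustration, not the application's actual code: the function name `frame_signal`, the 25 ms frame length, and the 10 ms hop are assumptions chosen to fall inside the 20–40 ms range mentioned above.

```python
import numpy as np

def frame_signal(x, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Split a 1-D signal into overlapping frames (one frame per row).

    frame_ms and hop_ms are illustrative defaults, not values from the project.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if len(x) < frame_len:
        # Zero-pad very short signals so that at least one frame exists.
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop_len
    # Index matrix: row i holds the sample indices belonging to frame i.
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return x[idx]

# Usage: one second of a 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440 * t)
frames = frame_signal(signal, sr)
```

Trailing samples that do not fill a complete frame are dropped in this sketch; padding the last frame instead would be an equally reasonable choice.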
Before the FFT is computed, each frame is multiplied by a window function to reduce spectral leakage, the smearing of energy across frequency bins that arises because the DFT implicitly treats the finite frame as one period of a periodic signal. Five windows are implemented: rectangular, triangular, Hamming, Hann, and Blackman.
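A sketch of window selection using NumPy's built-in window functions is shown below. The helper names `make_window` and `far_leakage_db` are hypothetical, and mapping "triangular" to `np.bartlett` (a triangle with zero endpoints) is an assumption; the application may define its windows differently.

```python
import numpy as np

def make_window(name, n):
    """Return one of the five windows named in the text, with length n."""
    windows = {
        "rectangular": np.ones,
        "triangular": np.bartlett,   # assumption: triangle with zero endpoints
        "hamming": np.hamming,
        "hann": np.hanning,
        "blackman": np.blackman,
    }
    return windows[name](n)

# Usage: a 1010 Hz tone that does not fall on an exact FFT bin, so it leaks.
sr, n = 8000, 256
t = np.arange(n) / sr
frame = np.sin(2 * np.pi * 1010.0 * t)

def far_leakage_db(window_name, bin_index=100):
    """Level of a bin far from the tone, relative to the spectral peak, in dB."""
    spec = np.abs(np.fft.rfft(frame * make_window(window_name, n)))
    return 20 * np.log10(spec[bin_index] / spec.max())

leak_rect = far_leakage_db("rectangular")
leak_hann = far_leakage_db("hann")
```

Comparing `leak_rect` with `leak_hann` gives a rough feel for how much a tapered window suppresses energy far from the true tone frequency.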
The frequency spectrum of each frame is computed via the Fast Fourier Transform. The magnitude spectrum can be displayed on a linear or logarithmic (dB) scale.
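The magnitude spectrum of one frame, on either scale, might be computed as in the sketch below. The function name `magnitude_spectrum` and the `floor` value used to avoid taking the log of zero are assumptions for illustration.

```python
import numpy as np

def magnitude_spectrum(frame, sample_rate, in_db=False, floor=1e-10):
    """One-sided magnitude spectrum of a real-valued frame.

    Returns (freqs_hz, magnitude). With in_db=True the magnitude is
    converted to decibels as 20*log10(|X|), clipped below at `floor`.
    """
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    if in_db:
        mag = 20.0 * np.log10(np.maximum(mag, floor))
    return freqs, mag

# Usage: a 1000 Hz tone that falls exactly on an FFT bin (8000 / 256 = 31.25 Hz per bin).
sr, n = 8000, 256
tone = np.sin(2 * np.pi * 1000.0 * np.arange(n) / sr)
freqs, mag = magnitude_spectrum(tone, sr)
peak_hz = freqs[np.argmax(mag)]
```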
Five frame-level features are extracted from the magnitude spectrum:
Frequency Centroid: the "center of gravity" of the spectrum, related to perceived brightness.
Effective Bandwidth: the weighted standard deviation of the spectrum around the centroid.
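The centroid and bandwidth described above can be sketched as weighted moments of the spectrum. Weighting by the power spectrum (squared magnitude) is an assumption here; weighting by the magnitude itself is also common, and the application may use either. The function name is hypothetical.

```python
import numpy as np

def centroid_and_bandwidth(freqs, mag):
    """Frequency centroid and effective bandwidth, both in Hz.

    Assumption: bins are weighted by power (mag**2).
    """
    power = mag ** 2
    total = power.sum()
    if total == 0:
        return 0.0, 0.0  # silent frame: no meaningful centroid
    fc = (freqs * power).sum() / total
    bw = np.sqrt((((freqs - fc) ** 2) * power).sum() / total)
    return fc, bw

# Usage: two equal-strength components at 100 Hz and 300 Hz.
freqs = np.array([0.0, 100.0, 200.0, 300.0])
mag = np.array([0.0, 1.0, 0.0, 1.0])
fc, bw = centroid_and_bandwidth(freqs, mag)  # centroid midway, spread of 100 Hz
```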
Spectral Flatness Measure (SFM): the ratio of the geometric mean to the arithmetic mean of the power spectrum. Values close to 1 indicate noise-like signals; values close to 0 indicate tonal signals.
Spectral Crest Factor (SCF): the ratio of the peak to the mean of the power spectrum, measuring how "spiky" the spectrum is.
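Both ratios follow directly from their definitions, as in this sketch. The function name and the small `eps` added to keep the logarithm finite for empty bins are assumptions; computing the measures over the whole spectrum rather than per subband is also an assumption.

```python
import numpy as np

def flatness_and_crest(mag, eps=1e-12):
    """Spectral flatness (SFM) and spectral crest factor (SCF) of one frame."""
    power = mag ** 2 + eps                      # eps avoids log(0)
    geometric_mean = np.exp(np.mean(np.log(power)))
    arithmetic_mean = np.mean(power)
    sfm = geometric_mean / arithmetic_mean
    scf = power.max() / arithmetic_mean
    return sfm, scf

# Usage: a perfectly flat spectrum versus a single spectral line.
flat_sfm, flat_scf = flatness_and_crest(np.ones(64))
line = np.zeros(64)
line[10] = 1.0
tone_sfm, tone_scf = flatness_and_crest(line)
```

The flat spectrum should give both measures near 1, while the single line should drive the flatness toward 0 and the crest factor toward the number of bins.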
Energy Ratio in Subbands (ERSB): the fraction of total spectral energy contained in each of four frequency bands (0–630 Hz, 630–1720 Hz, 1720–4400 Hz, and above 4400 Hz).
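A sketch of the subband energy ratios, using the band edges listed above. Treating energy as the squared magnitude and making each band include its lower edge but not its upper edge are assumptions; the names `BAND_EDGES_HZ` and `energy_ratio_subbands` are hypothetical.

```python
import numpy as np

# Band edges from the text; the last band is open-ended.
BAND_EDGES_HZ = (0.0, 630.0, 1720.0, 4400.0, np.inf)

def energy_ratio_subbands(freqs, mag, edges=BAND_EDGES_HZ):
    """Fraction of total spectral energy in each band [lo, hi)."""
    power = mag ** 2
    total = power.sum()
    if total == 0:
        return np.zeros(len(edges) - 1)  # silent frame
    return np.array([
        power[(freqs >= lo) & (freqs < hi)].sum() / total
        for lo, hi in zip(edges[:-1], edges[1:])
    ])

# Usage: one equal-strength component in each of the four bands.
freqs = np.array([100.0, 1000.0, 2000.0, 5000.0])
ersb = energy_ratio_subbands(freqs, np.ones(4))
```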
The spectrogram is computed by applying the FFT to each overlapping frame and stacking the resulting magnitude spectra into a 2D time-frequency representation. It is rendered interactively using Plotly, allowing zoom and hover inspection of exact frequency values.
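The stacking of per-frame spectra can be sketched with NumPy alone. The function name, the 512-sample frame, the 128-sample hop, and the Hann window are illustrative assumptions, and the sketch expects a signal at least one frame long.

```python
import numpy as np

def spectrogram(x, sample_rate, frame_len=512, hop_len=128):
    """Magnitude spectrogram in dB with shape (n_freq_bins, n_frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop_len
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    frames = x[idx] * window                     # one windowed frame per row
    mag = np.abs(np.fft.rfft(frames, axis=1))    # FFT of every frame at once
    spec_db = 20.0 * np.log10(np.maximum(mag, 1e-10)).T
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    # Time stamp of each column: the center of the corresponding frame.
    times = (np.arange(n_frames) * hop_len + frame_len / 2) / sample_rate
    return freqs, times, spec_db

# Usage: one second of a 1000 Hz tone at 8 kHz.
sr = 8000
tone = np.sin(2 * np.pi * 1000.0 * np.arange(sr) / sr)
freqs, times, spec_db = spectrogram(tone, sr)
```

The returned arrays could then be handed to a heatmap trace for interactive display, for example Plotly's `go.Heatmap(z=spec_db, x=times, y=freqs)`.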
Formants are the resonant frequencies of the vocal tract, visible as peaks in the smoothed magnitude spectrum of voiced speech. The application detects them by smoothing the spectrum with a Savitzky-Golay filter and finding peaks below 5000 Hz.
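The smoothing and peak-picking steps might look like the sketch below, which uses `scipy.signal.savgol_filter` and `scipy.signal.find_peaks`. The smoothing window of 11 bins, the cubic polynomial, the 3 dB prominence threshold, and the limit of four formants are all assumed values, not settings taken from the application.

```python
import numpy as np
from scipy.signal import savgol_filter, find_peaks

def find_formants(freqs, mag_db, max_freq=5000.0, smooth_bins=11,
                  polyorder=3, prominence_db=3.0, n_formants=4):
    """Return candidate formant frequencies (Hz) from a dB magnitude spectrum."""
    # Smooth the spectrum so that individual harmonics do not register as peaks.
    smoothed = savgol_filter(mag_db, window_length=smooth_bins, polyorder=polyorder)
    keep = freqs < max_freq                      # only search below 5000 Hz
    peaks, _ = find_peaks(smoothed[keep], prominence=prominence_db)
    return freqs[keep][peaks][:n_formants]

# Usage: a synthetic spectral envelope with resonances near 700, 1500 and 2600 Hz.
freqs = np.linspace(0.0, 8000.0, 513)
centers = (700.0, 1500.0, 2600.0)
envelope_db = sum(20.0 * np.exp(-((freqs - c) / 200.0) ** 2) for c in centers)
formants = find_formants(freqs, envelope_db)
```

On real speech the smoothing width usually needs tuning against the pitch of the speaker, since too narrow a window leaves harmonic peaks in place.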
The real cepstrum is defined as the inverse FFT of the log magnitude spectrum.
The fundamental frequency F0 is estimated by locating the peak of the cepstrum within the quefrency range corresponding to a plausible F0 range (50–400 Hz); a peak at quefrency q samples corresponds to F0 = fs / q, where fs is the sampling rate.
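A compact sketch of cepstral F0 estimation follows. The function names, the log floor of `1e-10`, and the use of the natural logarithm are assumptions; the 50–400 Hz search range is the one given above.

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum: inverse FFT of the log magnitude spectrum."""
    log_mag = np.log(np.maximum(np.abs(np.fft.fft(frame)), 1e-10))
    return np.real(np.fft.ifft(log_mag))

def f0_from_cepstrum(frame, sample_rate, f0_min=50.0, f0_max=400.0):
    """Estimate F0 (Hz) from the largest cepstral peak in the allowed range."""
    cep = real_cepstrum(frame)
    q_min = int(sample_rate / f0_max)                        # shortest period, in samples
    q_max = min(int(sample_rate / f0_min), len(cep) // 2)    # longest period
    peak = q_min + np.argmax(cep[q_min:q_max + 1])
    return sample_rate / peak

# Usage: a Hamming-windowed pulse train with a period of 80 samples at 8 kHz.
sr = 8000
pulses = np.zeros(1024)
pulses[::80] = 1.0
f0 = f0_from_cepstrum(pulses * np.hamming(1024), sr)
```

Because the peak is located on an integer quefrency grid, the resolution of the estimate gets coarser as F0 rises; interpolating around the peak is a common refinement that this sketch omits.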
The application also compares this estimate against two time-domain methods: autocorrelation and the average magnitude difference function (AMDF).
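For comparison, the two time-domain estimators can be sketched as below. These are generic textbook forms, not the application's code: the function names are hypothetical, and picking the global maximum (autocorrelation) or global minimum (AMDF) within the 50–400 Hz lag range is a simplifying assumption that can produce octave errors on difficult frames.

```python
import numpy as np

def _lag_range(n, sample_rate, f0_min, f0_max):
    """Lags (in samples) corresponding to the allowed F0 range."""
    return int(sample_rate / f0_max), min(int(sample_rate / f0_min), n - 1)

def f0_autocorrelation(frame, sample_rate, f0_min=50.0, f0_max=400.0):
    """F0 from the lag that maximizes the autocorrelation."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags >= 0
    lo, hi = _lag_range(len(frame), sample_rate, f0_min, f0_max)
    return sample_rate / (lo + np.argmax(ac[lo:hi + 1]))

def f0_amdf(frame, sample_rate, f0_min=50.0, f0_max=400.0):
    """F0 from the lag that minimizes the average magnitude difference."""
    lo, hi = _lag_range(len(frame), sample_rate, f0_min, f0_max)
    lags = np.arange(lo, hi + 1)
    amdf = np.array([np.mean(np.abs(frame[lag:] - frame[:-lag])) for lag in lags])
    return sample_rate / lags[np.argmin(amdf)]

# Usage: a slightly decaying harmonic tone with a 125 Hz fundamental at 8 kHz.
sr = 8000
t = np.arange(1024) / sr
tone = sum(a * np.sin(2 * np.pi * 125.0 * k * t) for k, a in ((1, 1.0), (2, 0.5), (3, 0.25)))
tone = tone * np.exp(-2.0 * t)
f0_ac = f0_autocorrelation(tone, sr)
f0_md = f0_amdf(tone, sr)
```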
- NumPy — vectorized signal processing
- SciPy — WAV file loading, peak detection, filtering
- Matplotlib — static visualizations
- Plotly — interactive spectrogram
- Streamlit — web interface
pip install -r requirements.txt
streamlit run app.py

Upload a WAV file and explore the analysis tabs: Frame Analysis, Frequency Analysis, Spectrogram, Cepstrum Analysis, and Statistics & Export.