`ser` is a Python package and CLI for speech emotion recognition from audio.
Core capabilities:
- Emotion prediction from audio files.
- Profile-based inference lanes: `fast` (default), `medium`, `accurate`, `accurate-research`.
- Optional transcript extraction and timeline-style output.
```mermaid
graph TD;
    A[Audio Input] --> B[Preprocessing and Features];
    B --> C[Profile Runtime Backend];
    C --> D[Emotion Prediction];
    A --> E[Transcript Extraction];
    D --> F[Timeline Integration];
    E --> F;
    F --> G[Output];
```
```mermaid
sequenceDiagram
    participant U as User
    participant C as CLI
    participant R as Runtime Profile
    participant O as Output
    U->>C: ser --file audio.wav --profile medium
    C->>R: Load matching artifact and backend
    R->>O: Predict labels and timestamps
    O-->>U: Emotion result (+ transcript if enabled)
```
From PyPI:

```shell
python -m pip install ser
```

From source:

```shell
git clone https://github.com/jsugg/ser/
cd ser
./scripts/setup_compatible_env.sh
```

Requirements:
- Python 3.12 or 3.13
- `ffmpeg` on `PATH`
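Because `ffmpeg` must be discoverable on `PATH`, a quick preflight check can save a confusing failure later. This is a minimal sketch using only the standard library; `check_ffmpeg` is an illustrative helper, not part of the `ser` API:

```python
import shutil
import sys


def check_ffmpeg() -> str:
    """Return the resolved ffmpeg path, or exit with a clear message."""
    path = shutil.which("ffmpeg")  # searches PATH like the shell would
    if path is None:
        sys.exit("ffmpeg was not found on PATH; install it before running ser")
    return path
```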
Optional dependency groups:
- `python -m pip install "ser[medium]"` for the `medium` and `accurate` profiles.
- `python -m pip install "ser[full]"` for `accurate-research`. `ser[full]` is the superset extra and installs the dependencies required to run all profiles (`fast`, `medium`, `accurate`, `accurate-research`) on supported platform/version combinations.
`fast` is the default profile; `medium`, `accurate`, and `accurate-research` are opt-in. The `medium` and `accurate` profiles require the `transformers` dependencies (`ser[medium]` or `ser[full]`). `accurate-research` requires `ser[full]` and restricted-backend consent.
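The profile-to-extra relationship can be sketched as a small lookup. This is illustrative only: the extras mirror the ones documented above (showing the minimal extra per profile), but `PROFILE_EXTRAS` and `required_install` are hypothetical names, not part of the `ser` package:

```python
# Minimal pip extra each profile needs; None means the base install is enough.
PROFILE_EXTRAS = {
    "fast": None,                      # default profile, base install
    "medium": "ser[medium]",           # transformers-backed
    "accurate": "ser[medium]",         # transformers-backed
    "accurate-research": "ser[full]",  # superset extra; also needs restricted-backend consent
}


def required_install(profile: str) -> str:
    """Return the pip command that satisfies the given profile."""
    extra = PROFILE_EXTRAS[profile]
    return f'python -m pip install "{extra}"' if extra else "python -m pip install ser"
```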
Darwin Intel policy shorthand:
- `darwin-x86_64-macos13-python3.12` -> full-profile support.
- `darwin-x86_64-macos13-python3.13` -> partial support (fast profile only).
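The shorthand above can be expressed as a tiny lookup, shown here purely for illustration; the project's real compatibility logic does not live in this helper, and the names are assumptions:

```python
# Profiles supported on Darwin Intel (macos13, x86_64), keyed by Python version.
DARWIN_X86_64_MACOS13 = {
    "3.12": {"fast", "medium", "accurate", "accurate-research"},  # full-profile support
    "3.13": {"fast"},                                             # partial: fast only
}


def supported_profiles(python_version: str) -> set[str]:
    """Profiles usable on darwin-x86_64-macos13 for a given Python version."""
    return DARWIN_X86_64_MACOS13.get(python_version, set())
```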
GitHub-hosted workflows use macos-15 because macos-13 hosted runners are not available.
```shell
ser --file sample.wav
ser --file sample.wav --profile medium
ser --file sample.wav --profile accurate
ser --file sample.wav --profile accurate-research
```

```shell
ser --train
ser --train --profile medium
ser --train --profile accurate
ser --train --profile accurate-research
```

Profile selection during predict is strict: use an artifact trained for the same profile/backend.
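The strict matching rule can be illustrated with a guard like the following. This is a hypothetical helper: `ser` performs its own validation, and the function name and message format here are assumptions:

```python
def check_artifact_profile(artifact_profile: str, requested_profile: str) -> None:
    """Refuse prediction when the artifact's training profile does not match."""
    if artifact_profile != requested_profile:
        raise ValueError(
            f"artifact was trained for profile {artifact_profile!r}, "
            f"not {requested_profile!r}; retrain or select the matching profile"
        )
```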
When running from a source checkout without activating an environment, prefix commands with `uv run`.
If your change touches `ser/api.py`, `ser/_internal/api/*`, or `ser/__main__.py`, run:

```shell
make import-lint
uv run pytest -q tests/test_import_lint_policy.py tests/test_api_import_boundary.py tests/test_api.py tests/test_cli.py
```

- Libraries and Frameworks: Special thanks to the developers and maintainers of `librosa`, `openai-whisper`, `stable-whisper`, `numpy`, `scikit-learn`, `soundfile`, and `tqdm` for their invaluable tools that made this project possible.
- Datasets: Gratitude to the creators of the RAVDESS and Emo-DB datasets for providing high-quality audio data essential for training the models.
- Inspirational Sources: Inspired by "Models-based representations for speech emotion recognition".
- Architecture guide: docs/architecture.md
- Contributor guide: CONTRIBUTING.md
- Compatibility details: docs/compatibility-matrix.md
- Hardware validation workflows: docs/ci/hardware-validation.md
- Architecture decisions: docs/adr
- License: LICENSE
