Speech AI
Extracting text from audio and generating audio from text.
STT (Speech-to-Text)
OpenAI Whisper is a widely used, state-of-the-art open-source speech recognition model. It handles both transcription and translation (from the spoken language into English) across dozens of languages.
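The following is a minimal sketch of transcription and translation with the `openai-whisper` Python package. It assumes the package is installed (`pip install openai-whisper`) along with `ffmpeg`, which Whisper uses to decode audio. `whisper.load_model` and `model.transcribe` are the package's own entry points. `whisper_options` and `transcribe_file` are hypothetical helper names used only for illustration.

```python
from typing import Any, Optional


def whisper_options(translate: bool = False, language: Optional[str] = None) -> dict[str, Any]:
    """Build keyword arguments for whisper's model.transcribe() (hypothetical helper)."""
    # task="translate" makes Whisper emit English text; "transcribe" keeps the spoken language.
    opts: dict[str, Any] = {"task": "translate" if translate else "transcribe"}
    # Leaving language unset lets Whisper auto-detect it from the start of the audio.
    if language is not None:
        opts["language"] = language
    return opts


def transcribe_file(path: str, model_name: str = "base", **kwargs: Any) -> str:
    """Transcribe (or translate) one audio file and return the text (hypothetical helper)."""
    # Imported lazily so whisper_options() works even without the package installed.
    import whisper

    model = whisper.load_model(model_name)  # downloads the weights on first use
    result = model.transcribe(path, **whisper_options(**kwargs))
    # result also carries "segments" (with timestamps) and the detected "language".
    return result["text"]
```

For example, `transcribe_file("interview.mp3", model_name="small", translate=True)` would return an English translation of a placeholder file `interview.mp3`. Larger checkpoints (`medium`, `large`) trade speed for accuracy.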
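- Tip: Use `faster-whisper`, a reimplementation of Whisper on the CTranslate2 inference engine, for large speedups on both CPU and GPU. Its authors report up to 4x faster inference than the reference implementation at the same accuracy, with lower memory use.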
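The following is a minimal sketch of the faster-whisper route, assuming `pip install faster-whisper`. `WhisperModel`, its `transcribe` method, and the `language` / `language_probability` fields follow the project's README. `pick_compute_type` and `transcribe_fast` are hypothetical helpers. Using int8 on CPU and float16 on CUDA GPUs is a common choice, not a requirement.

```python
def pick_compute_type(device: str) -> str:
    """Choose a CTranslate2 compute type for the target device (hypothetical helper)."""
    # float16 is a typical choice on CUDA GPUs; int8 quantization keeps CPU inference fast.
    return "float16" if device == "cuda" else "int8"


def transcribe_fast(path: str, model_size: str = "small", device: str = "cpu") -> list[tuple[float, float, str]]:
    """Return (start, end, text) segments for one audio file (hypothetical helper)."""
    # Lazy import so pick_compute_type() is usable without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device=device, compute_type=pick_compute_type(device))
    segments, info = model.transcribe(path, beam_size=5)
    print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
    # segments is a lazy generator: decoding only happens while we iterate over it.
    return [(seg.start, seg.end, seg.text.strip()) for seg in segments]
```

Because `segments` is a generator, forgetting to iterate over it means no transcription actually runs. The list comprehension forces the work.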
TTS (Text-to-Speech)
- ElevenLabs: Highly realistic output, with support for voice cloning. Paid API.
- OpenAI TTS: Fast, high-quality output from a set of preset voices (six at launch).
- Parakeet / Piper: Open-source, run locally, and very fast. Good for basic voice generation.
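The following sketch contrasts a hosted and a local option from the list above.

- **OpenAI TTS:** `client.audio.speech.create` is the OpenAI Python SDK (v1) call. This path assumes `pip install openai` and an `OPENAI_API_KEY` environment variable.
- **Piper:** The CLI flags follow the original rhasspy/piper README. Newer releases may use different flags, so treat them as an assumption. Piper also expects the model's `.onnx.json` config file next to the `.onnx` file.

`piper_command`, `speak_local`, and `speak_openai` are hypothetical helper names.

```python
import subprocess
from pathlib import Path


def piper_command(model_path: str, out_path: str) -> list[str]:
    """Argument list for the Piper CLI; the text itself is fed on stdin (hypothetical helper)."""
    # Flags as documented in the rhasspy/piper README; other versions may differ.
    return ["piper", "--model", model_path, "--output_file", out_path]


def speak_local(text: str, model_path: str, out_path: str = "out.wav") -> None:
    """Synthesize speech offline with Piper (assumes the piper binary is on PATH)."""
    subprocess.run(piper_command(model_path, out_path), input=text, text=True, check=True)


def speak_openai(text: str, out_path: str = "out.mp3", voice: str = "alloy") -> None:
    """Synthesize speech with OpenAI's hosted TTS (assumes OPENAI_API_KEY is set)."""
    from openai import OpenAI  # lazy import: only needed for the hosted path

    client = OpenAI()
    response = client.audio.speech.create(model="tts-1", voice=voice, input=text)
    # The response body is the encoded audio (MP3 unless response_format says otherwise).
    Path(out_path).write_bytes(response.content)
```

The local path costs nothing per request and works offline. The hosted path trades per-request cost and a network round trip for higher-quality voices.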