Speech AI

Extracting text from audio (speech-to-text) and generating audio from text (text-to-speech).

STT (Speech-to-Text)

OpenAI Whisper is one of the strongest open-source speech recognition models. It transcribes speech in dozens of languages and can also translate non-English speech into English.
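A minimal sketch of transcription with the `openai-whisper` package, assuming it is installed (`pip install openai-whisper`) together with ffmpeg. `whisper.load_model` and `model.transcribe(..., task="translate")` are Whisper's own calls. The helper names `transcribe`, `format_timestamp` and `to_srt` are hypothetical, added here to show how the returned segments (each with `start`, `end` and `text`) map onto SRT subtitles.

```python
"""Sketch: transcribe or translate audio with openai-whisper and export SRT.

Assumes `pip install openai-whisper` and ffmpeg on PATH. The helper names
below are illustrative, not part of the Whisper package.
"""


def transcribe(path: str, model_name: str = "base", translate: bool = False) -> dict:
    """Run Whisper on an audio file; task="translate" produces English text."""
    import whisper  # imported lazily so the pure helpers below work without it

    model = whisper.load_model(model_name)
    task = "translate" if translate else "transcribe"
    # The result dict holds "text", "segments" and the detected "language".
    return model.transcribe(path, task=task)


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[dict]) -> str:
    """Convert Whisper segments (dicts with start/end/text) into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive SRT cues.
    return "\n".join(blocks)
```

For example, `result = transcribe("meeting.mp3")` followed by writing `to_srt(result["segments"])` to a `.srt` file. The `whisper` command-line tool can also write subtitles directly with `--output_format srt`; the helper here only makes the segment structure explicit.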

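  • Tip: Use faster-whisper, a reimplementation of Whisper on the CTranslate2 inference engine, for large speedups on both CPU and GPU. Its README reports up to 4x faster inference than openai/whisper at the same accuracy, with lower memory use.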
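A comparable sketch with faster-whisper, assuming `pip install faster-whisper`. `WhisperModel(size, device=..., compute_type=...)` and `model.transcribe(path, beam_size=5)` follow the project's README. `pick_compute_type`, `join_segments` and `transcribe_fast` are hypothetical helpers, and choosing FP16 on GPU versus int8 on CPU is one reasonable default rather than a requirement.

```python
"""Sketch: transcription with faster-whisper (CTranslate2 backend).

Assumes `pip install faster-whisper`. The helper names are illustrative,
not part of the library.
"""
from typing import Iterable


def pick_compute_type(device: str) -> str:
    """Hypothetical heuristic: FP16 on CUDA GPUs, 8-bit integers on CPU."""
    return "float16" if device == "cuda" else "int8"


def join_segments(segments: Iterable) -> str:
    """Concatenate segment texts; works with any objects exposing `.text`."""
    return " ".join(seg.text.strip() for seg in segments)


def transcribe_fast(path: str, model_size: str = "small", device: str = "cpu") -> str:
    """Transcribe an audio file and return the full text."""
    from faster_whisper import WhisperModel  # lazy import keeps helpers testable

    model = WhisperModel(model_size, device=device, compute_type=pick_compute_type(device))
    # `segments` is a generator: decoding happens while it is iterated.
    segments, info = model.transcribe(path, beam_size=5)
    print(f"Detected language {info.language} (p={info.language_probability:.2f})")
    return join_segments(segments)
```

Because `segments` is a lazy generator, no audio is decoded until `join_segments` iterates over it; consume it once, or wrap it in `list(...)` if it needs to be traversed twice.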

TTS (Text-to-Speech)

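  • ElevenLabs: Highly realistic output with voice-cloning support. The API is paid.
  • OpenAI TTS: Fast and high quality, with a small set of built-in voices (six at launch: alloy, echo, fable, onyx, nova, shimmer).
  • Piper: Open-source, runs locally, very fast. Good for basic voice generation.
  • Parakeet: NVIDIA's open Parakeet models are speech-to-text (transcription) models rather than TTS. They also run locally and are very fast, which makes them a lightweight alternative to Whisper.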
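A sketch covering the two TTS routes above, under these assumptions: the `openai` Python SDK (v1 or later) with `OPENAI_API_KEY` set, and the `piper` executable on PATH with a downloaded `.onnx` voice model. The speech endpoint accepts at most 4096 characters per request, so the hypothetical `chunk_text` helper splits longer text at sentence boundaries first. The Piper flags (`--model`, `--output_file`, text on stdin) follow the rhasspy/piper README and may differ in newer releases; `synthesize_openai` and `synthesize_piper` are illustrative names.

```python
"""Sketch: synthesize speech with OpenAI TTS or a local Piper voice.

Assumes the `openai` SDK (v1+) with OPENAI_API_KEY set, and the `piper` binary
on PATH. `chunk_text` and the `synthesize_*` helpers are illustrative names.
"""
import re
import subprocess

OPENAI_TTS_MAX_CHARS = 4096  # documented per-request input limit


def chunk_text(text: str, max_chars: int = OPENAI_TTS_MAX_CHARS) -> list[str]:
    """Split text at sentence boundaries so each chunk fits within max_chars.

    A single sentence longer than max_chars is hard-split as a fallback.
    """
    chunks: list[str] = []
    current = ""
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        # Hard-split an over-long sentence into max_chars slices.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def synthesize_openai(text: str, out_prefix: str, voice: str = "alloy") -> list[str]:
    """Write one MP3 per chunk via OpenAI's speech endpoint; return the paths."""
    from openai import OpenAI  # lazy import; reads OPENAI_API_KEY from the environment

    client = OpenAI()
    paths = []
    for i, chunk in enumerate(chunk_text(text)):
        path = f"{out_prefix}_{i:03d}.mp3"
        with client.audio.speech.with_streaming_response.create(
            model="tts-1", voice=voice, input=chunk
        ) as response:
            response.stream_to_file(path)
        paths.append(path)
    return paths


def synthesize_piper(text: str, model_path: str, out_wav: str) -> None:
    """Run the Piper CLI locally, feeding the text on stdin."""
    # Assumption: Piper reads stdin line by line, so newlines are collapsed
    # to keep the whole text in a single utterance and output file.
    single_line = " ".join(text.split())
    subprocess.run(
        ["piper", "--model", model_path, "--output_file", out_wav],
        input=single_line.encode("utf-8"),
        check=True,
    )
```

For instance, `synthesize_openai(article_text, "article")` writes `article_000.mp3`, `article_001.mp3` and so on, which can be concatenated afterwards, while `synthesize_piper("Hello there.", "en_US-lessac-medium.onnx", "hello.wav")` runs entirely offline. Splitting on sentence boundaries rather than at a fixed character offset avoids cutting words mid-utterance, which would be audible at chunk joins.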