Building an open source voice assistant

This is a tutorial on how to build an open-source voice assistant from scratch, for educational purposes, with Rhasspy on a Raspberry Pi.


Tutorial: How to build your own offline speech recognition system for Mandarin on a Raspberry Pi

This is a tutorial for an offline, open-source speech recognition system for Mandarin Chinese that consumes speech and produces subtitle-like transcriptions, live and faster than real time. The system is built on Mozilla’s DeepSpeech 0.6 open-source ASR engine. The guide is based in large part on two existing tutorials: Dmitry Maslov’s guide to live DeepSpeech on the Raspberry Pi and elpimous_robot’s guide to running custom language models on DeepSpeech.

We are building a speech-to-text (STT) module for use in voice bot speech processing pipelines. Within a typical voice bot ecosystem, the job of this module is to consume speech and compute subtitle-like transcripts from it. Our module will first convert raw audio into feature representations and then turn those into transcribed text.
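As a rough illustration of that first step, the sketch below slices raw 16 kHz PCM audio into overlapping windows, the kind of representation a feature extractor (and ultimately the ASR engine) operates on. The 25 ms frame and 10 ms stride are common ASR defaults used here as assumptions, not values taken from the tutorial, and the audio is synthetic.

```python
# Sketch of the audio front end: slice raw 16 kHz PCM samples into
# overlapping frames. Real feature extraction (e.g. inside DeepSpeech)
# works on windows like these. All values are illustrative assumptions.

SAMPLE_RATE = 16000          # DeepSpeech 0.6 expects 16 kHz mono audio
FRAME_MS = 25                # 25 ms window, a common ASR default (assumption)
STRIDE_MS = 10               # 10 ms hop between windows (assumption)

def frame_audio(samples, sample_rate=SAMPLE_RATE,
                frame_ms=FRAME_MS, stride_ms=STRIDE_MS):
    """Split a sequence of PCM samples into overlapping fixed-size frames."""
    frame_len = sample_rate * frame_ms // 1000    # 400 samples per frame
    stride = sample_rate * stride_ms // 1000      # 160 samples between frames
    frames = []
    for start in range(0, len(samples) - frame_len + 1, stride):
        frames.append(samples[start:start + frame_len])
    return frames

# One second of silence stands in for microphone input.
audio = [0] * SAMPLE_RATE
frames = frame_audio(audio)
print(len(frames), len(frames[0]))   # frame count and samples per frame
```

Downstream, each frame would be turned into acoustic features before the network produces characters; this sketch only covers the framing.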

         !      +---+   +----------------+   +---+   +---+   +---+
     ( ͡° ͜ʖ ͡°) < |Mic|-->|Audio Processing|-->|KWS|-->|STT|-->|NLU|
                +---+   +----------------+   +---+   +---+   +-+-+
         ?      +-------+   +---+   +----------------------+   |
     ( ͡° ͜ʖ ͡°) > |Speaker|<--|TTS|<--|Knowledge/Skill/Action|<--+
                +-------+   +---+   +----------------------+
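To make the diagram concrete, here is a toy sketch of that pipeline as a chain of Python functions. Every stage is a stub with hypothetical names (none of this is Rhasspy’s or DeepSpeech’s actual API); it only shows how data flows from audio capture through keyword spotting and STT to an NLU intent.

```python
# Toy sketch of the voice bot pipeline from the diagram above.
# Each stage is a stub standing in for a real component; only the
# data flow Mic -> Audio Processing -> KWS -> STT -> NLU is the point.

def mic():
    """Pretend to capture raw audio from the microphone."""
    return b"\x00\x01" * 8000            # fake PCM bytes

def audio_processing(raw):
    """Pretend to denoise/normalize; here it just passes audio through."""
    return raw

def keyword_spotter(audio):
    """Pretend wake-word detection; always 'hears' the wake word."""
    return True, audio

def stt(audio):
    """Stand-in for the DeepSpeech STT module this tutorial builds."""
    return "ni hao shi jie"              # fake pinyin transcript

def nlu(transcript):
    """Pretend intent parsing on the transcript."""
    return {"intent": "greet", "text": transcript}

raw = mic()
clean = audio_processing(raw)
awake, audio = keyword_spotter(clean)
if awake:
    intent = nlu(stt(audio))
    print(intent)
```

The TTS and knowledge/skill stages on the return path are omitted; this tutorial only covers the STT box.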


Andreas Liesenfeld
Postdoc in language technology

I am a social scientist working in Conversational AI. My work focuses on understanding how humans interact with voice technologies in the real world. I aspire to produce useful insights for anyone who designs, builds, uses or regulates talking machines. Late naturalist under pressure. “You must collect things for reasons you don’t yet understand.” — Daniel J. Boorstin