tada!
Video demo 🙂
You can see the kind of mistakes it makes:
The translation package (argostranslate) really wants proper grammar, while the transcription package (vosk) does not care about it at all. I did try a vosk model that preserves casing, but it ran so slowly that it would miss audio entirely, which was worse.
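A cheap thing to try here, as a rough sketch rather than a real fix: tidy each transcription result before handing it to the translator by capitalizing the first letter and adding a closing period. `tidy_for_translation` is a made-up helper name, and the assumption is that each vosk result is roughly one sentence of lowercase, unpunctuated text. It will not fix proper nouns or mid-sentence punctuation.

```python
def tidy_for_translation(text: str) -> str:
    """Hypothetical helper: make raw ASR text look a bit more like a sentence."""
    # Collapse stray whitespace and trim the ends.
    text = " ".join(text.split())
    if not text:
        return text
    # Uppercase only the first character; leave the rest untouched.
    text = text[0].upper() + text[1:]
    # Add a period unless the text already ends with sentence punctuation.
    if text[-1] not in ".!?":
        text += "."
    return text


print(tidy_for_translation("where is the train station"))
# prints: Where is the train station.
```

Whether this actually improves the translations is untested here; it only makes the input look more like what a translation model usually sees.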
Also, I couldn’t get Chinese or any other language to work with argostranslate, only Spanish. For Chinese, it would just output the input text instead of any Chinese characters. I wonder if I messed something up with my install.
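One thing worth ruling out (a guess, not a confirmed diagnosis) is that the English to Chinese language package was never installed, since argostranslate installs each language pair as a separate package. The sketch below uses argostranslate's package functions to check for a pair and install it if it is missing. `pick_package` and `ensure_pair` are hypothetical helper names, and `zh` is assumed to be the code used for the Chinese package.

```python
def pick_package(packages, from_code, to_code):
    """Hypothetical helper: first package matching the language pair, or None."""
    for pkg in packages:
        if pkg.from_code == from_code and pkg.to_code == to_code:
            return pkg
    return None


def ensure_pair(from_code="en", to_code="zh"):
    """Install the from_code -> to_code package if missing (needs network access)."""
    # Imported here so pick_package can be used without argostranslate installed.
    import argostranslate.package

    installed = argostranslate.package.get_installed_packages()
    if pick_package(installed, from_code, to_code) is not None:
        return True  # pair is already installed

    # Refresh the remote index, then look for the pair among available packages.
    argostranslate.package.update_package_index()
    available = argostranslate.package.get_available_packages()
    pkg = pick_package(available, from_code, to_code)
    if pkg is None:
        return False  # the index has no package for this pair
    argostranslate.package.install_from_path(pkg.download())
    return True
```

Calling `ensure_pair("en", "zh")` once before `argostranslate.translate.translate(text, "en", "zh")` would at least show whether the pair is installed. If it returns `True` and the output is still the untranslated input, the problem is somewhere else.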