Made a quick NPC announcer for this virtual pokemon style gathering tech (aka Online Town)
Online Town was used for a previous conference, ICLR, where people even “went to the beach” (there’s several environments, I chose the conference hall one).
ICLR Town: Pokemon-esque environment to wander around and bump into people, which syncs almost seamlessly with video-chatting capabilities. — maithra_raghu
However there is no chat history, and no way to set a description or announcements. In order to do so, Ondrej Biza suggested I actually just have a repeating audio announcement character: the Town Announcer.
After some googling, I quickly whipped together a set of bash scripts to do so. This is is Ubuntu 18.04.
1. Create recording of the announcement
I used `festival`, since I already had that installed for my terminal timer (blinks the terminal red, plays a loud sound, and speaks “n minutes are up”).
sudo apt install festival text2wave onlinetown.txt | lame - text.mp3
Inside the text file I had
$ vi onlinetown.txt Welcome! The next scheduled event is: Happy Hour at 5pm Eastern. Again that is at 5pm Eastern. Move away to stop hearing this voiceover. To add to this voiceover email me at a b c at m i t dot edu
(Side note: Somehow festival is really bad at pronouncing email, oh well).
2. Create “virtual microphone”
I can then select this as my mic input for the videoconference
As per sebpiq on stackoverflow
pactl load-module module-pipe-source source_name=virtmic file=/tmp/virtmic format=s16le rate=16000 channels=1
3. Play speed recording over virtual mic
As per sebpiq on stackoverflow
ffmpeg -re -i text.mp3 -f s16le -ar 16000 -ac 1 - > /tmp/virtmic
4. Create bash script to continuously do so
$ vi loop_mic.sh #!/bin/bash while true do echo "Press [CTRL+C] to stop.." ffmpeg -re -i text.mp3 -f s16le -ar 16000 -ac 1 - > /tmp/virtmic sleep 0.5 done
5. Run it
chmod +x loop_mic.sh sh loop_mic.sh
Check that it’s in the settings
Then open online.town, and when it asks for which microphone input to select, select the Unix FIFO.
To check that it’s all working, I joined from another computer & could hear the announcer. Sometimes it does seem like I can’t hear the announcer when joining, but I can’t reliably reproduce it.
Another sanity check is to go back to Sound settings and check that the “input level” bars are going up and down.
I did try speeding up the audio, but it did introduce new audio artifacts (squeaks). If you want to do so, add a setting when generating the mp3. Below it speeds up to 2x.
text2wave -eval "(Parameter.set 'Duration_Stretch 0.5)" onlinetown.txt -o text.mp3
( I also tried speeding it up in the `ffmpeg` command, it sounded horrible).
Better text to speech
This was a pain… ! Not sure why. Very confusing instructions.
Helped in part As per. https://levelup.gitconnected.com/installing-mozilla-tts-on-a-raspberry-pi-4-e6af16459ab9
These are final records of what worked.
2022 apt-get install -y espeak libsndfile1 python3-venv 0000 python3 -m venv env 0000 source env/bin/activate 2029 pip3 install -U pip setuptools wheel 2050 pip install https://github.com/reuben/TTS/releases/download/ljspeech-fwd-attn-pwgan/TTS-0.0.1+92aea2a-py3-none-any.whl 2001 git clone email@example.com:mozilla/TTS.git 0000 cd TTS 2053 pip3 install packaging 2052 python -m TTS.server.server 0000 firefox localhost:5002 # FAILS unless inside TTS git folder!
Traceback from failure:
# FAILURE CASE $ python -m TTS.server.server > Loading TTS model ... | > model config: None | > checkpoint file: None Traceback (most recent call last): File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main "__main__", mod_spec) File "/usr/lib/python3.6/runpy.py", line 85, in _run_code exec(code, run_globals) File "/home/chai/projects/TTS/server/server.py", line 62, in <module> synthesizer = Synthesizer(args) File "/home/chai/projects/TTS/server/synthesizer.py", line 36, in __init__ self.config.use_cuda) File "/home/chai/projects/TTS/server/synthesizer.py", line 52, in load_tts self.tts_config = load_config(tts_config) File "/home/chai/projects/TTS/utils/io.py", line 16, in load_config with open(config_path, "r") as f: TypeError: expected str, bytes or os.PathLike object, not NoneType
Some other commands I tried (didn’t work)
https://github.com/mozilla/TTS/wiki/Released-Models#simple-packaging---self-contained-package-that-runs-an-http-api-for-a-pre-trained-tts-model sudo apt-get install -y espeak libsndfile1 python3-venv python3 -m venv env source env/bin/activate && which python3 pip install -U https//example.com/url/to/python/package.whl pip3 install -U pip setuptools wheel $ python -m TTS.server.server --help
It will say synthesizing, After a minute or two it will finish and start playing the audio. Right click and “save audio”
Output is a wav file.
What successful TTS.server.server looks like:
(env) 15:53:25 chai@W530:~/projects/TTS (master %)$ python -m TTS.server.server /home/chai/projects/env/lib/python3.6/site-packages/librosa/util/decorators.py:9: NumbaDeprecationWarning: An import was requested from a module that has moved location. Import of 'jit' requested from: 'numba.decorators', please update to use 'numba.core.decorators' or pin to Numba version 0.48.0. This alias will not be present in Numba version 0.50.0. from numba.decorators import jit as optional_jit > Loading TTS model ... | > model config: /home/chai/projects/env/lib/python3.6/site-packages/TTS/server/model/tts/config.json | > checkpoint file: /home/chai/projects/env/lib/python3.6/site-packages/TTS/server/model/tts/checkpoint.pth.tar > Setting up Audio Processor... | > sample_rate:22050 | > num_mels:80 | > min_level_db:-100 | > frame_shift_ms:12.5 | > frame_length_ms:50 | > ref_level_db:20 | > num_freq:1025 | > power:1.5 | > preemphasis:0.98 | > griffin_lim_iters:60 | > signal_norm:True | > symmetric_norm:True | > mel_fmin:0 | > mel_fmax:8000.0 | > max_norm:4.0 | > clip_norm:True | > do_trim_silence:True | > trim_db:60 | > sound_norm:False | > n_fft:2048 | > hop_length:275 | > win_length:1100 > Using model: Tacotron2 > Loading PWGAN model ... | > model config: /home/chai/projects/env/lib/python3.6/site-packages/TTS/server/model/pwgan/config.yml | > model file: /home/chai/projects/env/lib/python3.6/site-packages/TTS/server/model/pwgan/checkpoint.pkl > Setting up Audio Processor... | > sample_rate:22050 | > num_mels:80 | > min_level_db:-100 | > frame_shift_ms:12.5 | > frame_length_ms:50 | > ref_level_db:20 | > num_freq:1025 | > power:None | > preemphasis:0.98 | > griffin_lim_iters:None | > signal_norm:True | > symmetric_norm:True | > mel_fmin:0 | > mel_fmax:8000.0 | > max_norm:4.0 | > clip_norm:True | > do_trim_silence:True | > trim_db:60 | > sound_norm:False | > n_fft:2048 | > hop_length:275 | > win_length:1100 * Serving Flask app "server" (lazy loading) * Environment: production WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead. * Debug mode: off [INFO] * Running on http://0.0.0.0:5002/ (Press CTRL+C to quit) > Model input: Welcome! The next scheduled event is: Happy Hour at 5pm Eastern. Again that is at 5pm Eastern. Move away to stop hearing this voiceover. ['Welcome!', 'The next scheduled event is: Happy Hour at 5pm Eastern.', 'Again that is at 5pm Eastern.', 'Move away to stop hearing this voiceover.'] [INFO] 127.0.0.1 - - [01/Jun/2020 15:54:03] "GET /api/tts?text=Welcome!%20The%20next%20scheduled%20event%20is%3A%20Happy%20Hour%20at%205pm%20Eastern.%20Again%20that%20is%20at%205pm%20Eastern.%20Move%20away%20to%20stop%20hearing%20this%20voiceover. HTTP/1.1" 200 -
Hurray. ! The output sounds much nicer.
pactl unload-module module-pipe-source