one morning later… english speech to text, then translated to spanish — in realtime & offline

we live in the future!! this is some crazy stuff

here’s a short/random video demo (needs audio on)

XD;

The translation engine really wants proper grammar (e.g. capitalization and punctuation), and the transcription engine really doesn’t care about all that nonsense.

So now I’m working through grammar checkers.
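Before reaching for a full grammar checker, a crude baseline is to patch up just the two things mentioned above (capitalization and end punctuation) by hand. Here is a minimal sketch of that. `quick_normalize` is a made-up helper name, not part of any library, and it only handles the first letter, the standalone word "i", and a missing final period. It won't fix missing apostrophes ("hows") or anything else mid-sentence.

```python
def quick_normalize(text):
    """Crude cleanup of raw transcription text before translating it."""
    # Splitting and re-joining collapses repeated whitespace and trims the ends
    words = text.split()
    if not words:
        return ""
    # The pronoun "i" should always be capitalized
    words = ["I" if w == "i" else w for w in words]
    sentence = " ".join(words)
    # Capitalize the first character without lowercasing the rest
    sentence = sentence[0].upper() + sentence[1:]
    # Add a period if there is no sentence-ending punctuation
    if sentence[-1] not in ".!?":
        sentence += "."
    return sentence


print(quick_normalize("i have a cat in my pants"))  # I have a cat in my pants.
print(quick_normalize("hi hows it going"))          # Hi hows it going.
```

This is only a baseline to compare the real grammar fixers against, not a replacement for them.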

attempts at grammar fixing

I found a promising library, gingerit, that works quite well but unfortunately turned out to just be a wrapper around an API service, a.k.a. it doesn’t function offline T^T
The other two options: language_tool_python, which is a wrapper around LanguageTool, the grammar/spell check that LibreOffice uses. It’s more rules-based. And then some random transformers model.

Here are the results. The random transformers model doesn’t perform as well as gingerit’s fix (gingerit being the one backed by a company) and is rather … unpredictable. language_tool_python doesn’t perform great either, but it’s relatively fast. And then we have gingerit, which is slow purely b/c it’s an online model (and in that case it’s not a fair comparison for accuracy either).

I also show how fixing the grammar/spelling matters when the text is then put through argostranslate.

grammar fix tool comparison

Timing Code for above



from funcy import print_durations
from gingerit.gingerit import GingerIt
import language_tool_python
from happytransformer import HappyTextToText, TTSettings

import argostranslate.package
import argostranslate.translate

def trans(txt):
    # Translate English -> Spanish (the en->es package must already be installed)
    return argostranslate.translate.translate(txt, 'en', 'es')

# The three grammar fixers being compared
parser = GingerIt()  # wrapper around an online API
tool = language_tool_python.LanguageTool('en-US')  # rules-based
happy_tt = HappyTextToText("T5", "prithivida/grammar_error_correcter_v1")
settings = TTSettings(do_sample=True, top_k=10, temperature=0.5, min_length=1, max_length=100)

txt1 = 'i have a cat in my pants'
txt2 = 'hi hows it going'

for i, txt in enumerate([txt1, txt2]):
    print('!----')
    print('Text to fix: ', txt)
    print('Translation w/o grammar fix: ', trans(txt))
    print('!----')

    # Run each fixer once and reuse the result. The T5 model samples
    # (do_sample=True), so a second call could return different text.
    fixes = {
        'gingerit\t\t': parser.parse(txt)['result'],
        'language_tool_python\t': tool.correct(txt),
        't5 transformer\t\t': happy_tt.generate_text(txt, args=settings).text,
    }
    for name, fixed in fixes.items():
        # The translation is only shown for the second text (i == 1)
        print('\t' + name, fixed, '\t', trans(fixed) if i == 1 else '')

    print()
    # Use _ for the inner loops so they don't clobber the outer loop's i
    with print_durations('Timing gingerit'):
        for _ in range(20):
            parser.parse(txt)['result']

    with print_durations('Timing language_tool_python'):
        for _ in range(20):
            tool.correct(txt)

    with print_durations('Timing t5 transformer'):
        for _ in range(20):
            happy_tt.generate_text(txt, args=settings).text
    print()
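
`print_durations` above reports the total for all 20 calls. If a per-call number is more useful (e.g. to judge whether a fixer fits inside a real-time loop), something like the following sketch works with only the standard library. `time_per_call` is a hypothetical helper, and the lambda at the bottom is just a stand-in so the snippet runs on its own; swap in `tool.correct`, `parser.parse`, etc.

```python
import statistics
import time


def time_per_call(fn, arg, repeats=20):
    """Return (mean, worst) wall-clock seconds per call of fn(arg)."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(arg)
        durations.append(time.perf_counter() - start)
    return statistics.mean(durations), max(durations)


# Stand-in "fixer" so the sketch runs without any of the libraries above
mean_s, worst_s = time_per_call(lambda s: s.capitalize(), "hi hows it going")
print(f"mean {mean_s * 1000:.3f} ms, worst {worst_s * 1000:.3f} ms per call")
```

The worst case is probably the more relevant number for anything interactive, since one slow call is what you'd actually notice.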

code for the video demo above (just mashed up from the argostranslate and vosk-api readmes)

# 12 May 2023
# nrobot

import sys
import argostranslate.package
import argostranslate.translate
import queue
import json

import sounddevice as sd
import wave
from vosk import Model, KaldiRecognizer, SetLogLevel

from_code = "en"
to_code = "es"

def setup_trans():
    # Download and install Argos Translate package
    argostranslate.package.update_package_index()
    available_packages = argostranslate.package.get_available_packages()
    package_to_install = next(
        filter(
            lambda x: x.from_code == from_code and x.to_code == to_code, available_packages
        )
    )
    argostranslate.package.install_from_path(package_to_install.download())

def transl(phrase):
    return argostranslate.translate.translate(phrase, from_code, to_code)

def setup_transcribe():
    # You can set log level to -1 to disable debug messages
    SetLogLevel(-1)
    #SetLogLevel(0)

if __name__ == '__main__': 
    setup_trans()
    setup_transcribe()
    '''
    wf = wave.open('test.wav', "rb")
    if wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != "NONE":
        print("Audio file must be WAV format mono PCM.")
        sys.exit(1)

    while True:
        data = wf.readframes(4000)
        if len(data) == 0:
            break
        if rec.AcceptWaveform(data):
            print(rec.Result())
        else:
            print(rec.PartialResult())

    print('final result' , rec.FinalResult())
    res = json.loads(result)
    print(res['text'])

    '''
    model = Model(lang="en-us")

    q = queue.Queue()

    def callback(indata, frames, time, status):
        """This is called (from a separate thread) for each audio block."""
        if status:
            print(status, file=sys.stderr)
        q.put(bytes(indata))


    device = None
    device_info = sd.query_devices(device, 'input')
    # soundfile expects an int, sounddevice provides a float:
    samplerate = int(device_info['default_samplerate'])

    try:
        with sd.RawInputStream(samplerate=samplerate, blocksize=8000, device=device,
                               dtype='int16', channels=1, callback=callback):
            print('#' * 80)
            print('Press Ctrl+C to stop the recording')
            print('#' * 80)
            print(f'Samplerate: {samplerate}, device: {device}')

            rec = KaldiRecognizer(model, samplerate)
            rec.SetWords(True)
            rec.SetPartialWords(True)

            #translating = False
            while True:
                data = q.get()
                if rec.AcceptWaveform(data):
                    sentence = json.loads(rec.Result())['text']
                    print('\t !------ \n')
                    print('sentence: ', sentence)
                    print('translation: ', transl(sentence))
                    print('listening for input again')
                else:
                    #print('waiting for a full sentence')
                    pass
                    #print('partial result', rec.PartialResult())


    except KeyboardInterrupt:
        print('\nDone')
    except Exception as e:
        print('Exception: ', e)
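
One thing the loop above doesn't guard against: Vosk can hand back a finished result whose `text` is empty (silence or noise), and that empty string then gets printed and sent to the translator. A small sketch of a guard, assuming the result JSON has the same `text` key used above. `extract_sentence` is a hypothetical helper name, not part of vosk.

```python
import json


def extract_sentence(result_json):
    """Return the recognized text from a Vosk result string, or None if empty."""
    try:
        text = json.loads(result_json).get("text", "").strip()
    except (json.JSONDecodeError, AttributeError):
        # Not valid JSON, or not shaped like {"text": "..."}
        return None
    return text or None


print(extract_sentence('{"text": "hi hows it going"}'))  # hi hows it going
print(extract_sentence('{"text": ""}'))                  # None

# In the loop above this would look roughly like:
#     sentence = extract_sentence(rec.Result())
#     if sentence:
#         print('translation: ', transl(sentence))
```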

quick experiment: python library for real-time offline translation (english to spanish)

wow, pip install argostranslate, and the offline translation just works

and it works fast

Seems plenty fast to me to actually use in real time!

The library is: argos-translate

The code is straight from their github.

https://github.com/argosopentech/argos-translate

code

import argostranslate.package
import argostranslate.translate

from_code = "en"
to_code = "es"

# Download and install Argos Translate package
argostranslate.package.update_package_index()
available_packages = argostranslate.package.get_available_packages()
package_to_install = next(
    filter(
        lambda x: x.from_code == from_code and x.to_code == to_code, available_packages
    )
)
argostranslate.package.install_from_path(package_to_install.download())

# Translate
translatedText = argostranslate.translate.translate("Hello World", from_code, to_code)
print(translatedText)

def trans(phrase):
    return argostranslate.translate.translate(phrase, from_code, to_code)

trans('I think it will be more than fast enough for real-time use')
>>> 'Creo que será más que lo suficientemente rápido para uso en tiempo real'
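
One caveat for truly offline use: `update_package_index()` and `download()` need a network connection, so the install step above should only run when the language pair is missing. A rough sketch of that check follows. It assumes `argostranslate.package.get_installed_packages()` returns objects with the same `from_code` / `to_code` attributes as the available packages above. `has_language_pair` and `ensure_pair_installed` are names made up for this example.

```python
def has_language_pair(packages, from_code, to_code):
    """True if any package in the list translates from_code -> to_code."""
    return any(
        p.from_code == from_code and p.to_code == to_code for p in packages
    )


def ensure_pair_installed(from_code="en", to_code="es"):
    """Install the language pair only if missing. Returns True if it installed."""
    # Imported here so has_language_pair can be used without the library
    import argostranslate.package

    installed = argostranslate.package.get_installed_packages()
    if has_language_pair(installed, from_code, to_code):
        return False  # already installed, no network needed

    # Same download steps as above (this part does need network access)
    argostranslate.package.update_package_index()
    available = argostranslate.package.get_available_packages()
    match = next(
        p for p in available
        if p.from_code == from_code and p.to_code == to_code
    )
    argostranslate.package.install_from_path(match.download())
    return True
```

Calling `ensure_pair_installed()` once at startup, instead of always re-downloading, should let everything after the first run work with no connection at all.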

other tools to check out

(haven’t bothered yet)

https://github.com/OpenNMT/CTranslate2

according to https://skeptric.com/python-offline-translation “Argos Translate is a more complete solution, is easier to get set up, and is substantially faster. However Marian Machine Translation gives better translations, supports more languages, and better supports batch translations.”

Note, however: the translation tool will fail if the punctuation and capitalization are not correct.

back story

The idea is to create a translation t-shirt. So take the following project one step further:

The code / instructions are here: https://github.com/ZackFreedman/DeepgramSubtitleHoodie

(It looks like he used a $75 touch screen (“stretched bar display”), I guess he had it lying around ?? I’m not sure that’s where I want to be swiping around on myself lol) (but maybe that is the cheapest?)

Basically just pipe the output of the real-time speech-to-text tool to the translation tool.

Then wear an apron with the screen built-in. So I can speak and have the apron-tablet auto-translate.

Actually, it could have a fold-out mirror. So if there were also vision problems, I could look down at the mirror and just speak the translated text out loud XD;

i would feel like such a nerd

actually though this would be a neat tool for traveling to another country,

waiting for google translate lag (both in terms of phone processor and 4g availability) is a pain

if I am just using stuff I already have, maybe the next step is installing linux on a switch — that’s the closest to a portable screen I have at the moment… (I was playing FFXII really intensely but have since lost interest, I think due to not being sure what direction to take the game)

ALTERNATIVELY. A youtube commenter suggested — I could make a persistence of vision hat, and that would be pretty darn cheap haha.

And… need an anti-glare / matte screen apparently. See 13:07.

rambles

I’m not sure how I feel about this, it might reduce my motivation to learn languages. I guess once I make real moneys I would like to pay for Spanish lessons or something similar. @__@ (And Chinese lessons for that matter). I’ve also discovered though that maybe I’m not the greatest at learning languages? I have no idea.

Life feels weirdly short and long. I was too busy recovering from crisis in my mid-20s to have too much of a quarter life crisis I guess. But now that I am emerging out of one crisis, I feel a bit adrift in terms of identity and what I’m trying to do with my life.

custom search engine for python documentation

recently i started using devdocs.io and I actually enjoy the focus on documentation — usually I just try to do whatever and get by on stackoverflow, but it’s been refreshing to dive into documentation.

(I guess chatgpt has also made me realize how much time i spend avoiding ads and skimming text to find what I want)

unfortunately devdocs only works on a subset of the python libraries i use

i looked into the source and it looks a bit annoying to run on my own (a lot of node, ruby, etc., requires indexing script)

anyway halfway to a terrible start up idea i found a “good enough” solution that scratches my itch

it turns out you can customize google to create a search box that only searches a subset of websites

programmablesearchengine.google.com/

Hurray!

That engine I set up above is at https://cse.google.com/cse?cx=22b7ea18490504163#gsc.tab=0 if you’re curious.

projects blog (nouyang)