Smart-E Robot Arm Assistant: Helps you pick up cubes (WIP Post #1: Voice Interaction with Mozilla TTS and Vosk, LewanSoul 6DoF Arm python library, ArucoTags, OpenCV)

spent christmas weekend making this demo of a robot picking up cubes (it doesn’t know which color cube is where beforehand)

Wellll okay this could be a long post but I’m pretty sleepy so I’ll just “liveblog” it and not worry too much about making it a tutorial / step-by-step instructions / details. (it’s a two day hackathon project anyway)

two out of three of my roommates are also roboticists, so we (the three of us) started on a house project: a robot arm.

one of us wants to focus on designing the arm from scratch (started on cable driven, now focused on cycloidal gearboxes), and two of us wanted to get started on the software side (divide and conquer). so we got a commercial robot arm for $200: the lewansoul 6dof xarm.

i decided over christmas that it was time to give up on work and do side projects. so i spent two days and hacked together a bare minimum demo of a “helper robot.” we eventually want something similar to dum-E robot in iron man, a robot that can hand you tools etc.

the minimum demo turned out to be verbally asking the robot to pick up a cube of a specific color

the name

it felt mean to call a defenseless robot “hey dummy” all the time, so I went with the suggestion to use “hey smarty” instead …

probably the dum-e reference will be lost on everyone, since I only remembered jarvis in iron man before starting this project

first print some cubes

T^T the inevitable thing happens when you get a 3d printer: you crush your meche soul and print cubes. then cover them in tape to get different colors

lewansoul robot arm python library

there are several python libraries out there; we worked with https://github.com/dmklee/nuro-arm/

which already implements the IK (inverse kinematics), so we can give x,y,z coordinates instead of joint angles for each of the six joints. it runs smoothly (after fixing a udev permissions issue). the library’s calibration required us to move a servo horn a little from how we initially put the robot arm together.

quickstart

$ git clone https://github.com/dmklee/nuro-arm/
$ cd nuro-arm
$ ipython
>>> from nuro_arm.robot.robot_arm import RobotArm
>>> import numpy as np

>>> robot = RobotArm()
>>> robot.passive_mode()
>>> # now robot servos do not fight you. physically move robot arm to desired pose

>>> test_pose_xyz = np.round(robot.get_hand_pose()[0], decimals=4)
>>> print('---> ', test_pose_xyz, ' <---')

>>> POS_MID = [0.200, -0.010]
>>> Z_SUPER_HI = 0.10
>>> OPEN = 0.5
>>> xyz = np.append(POS_MID, Z_SUPER_HI)
>>> robot.move_hand_to(xyz)
>>> robot.set_gripper_state(OPEN)
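
putting those pieces together, the pick motion for the hackdemo is basically: hover over a cube position, drop down, close the gripper, lift back up. a rough sketch of that sequence (the Z_LO grasp height and the gripper CLOSE value here are guesses, tune them for your cubes):

import numpy as np
from nuro_arm.robot.robot_arm import RobotArm

Z_SUPER_HI = 0.10        # hover height, meters
Z_LO = 0.03              # grasp height (a guess, tune for your cube size)
OPEN, CLOSE = 0.5, 0.0   # gripper states (the CLOSE value is a guess)

def pick_up_at(robot, xy):
    robot.set_gripper_state(OPEN)
    robot.move_hand_to(np.append(xy, Z_SUPER_HI))  # hover above the cube
    robot.move_hand_to(np.append(xy, Z_LO))        # drop down around it
    robot.set_gripper_state(CLOSE)                 # grab
    robot.move_hand_to(np.append(xy, Z_SUPER_HI))  # lift back up

robot = RobotArm()
pick_up_at(robot, [0.200, -0.010])  # POS_MID from above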

additional things to try: the library also provides GUIs

python -m nuro_arm.robot.record_movements
python -m nuro_arm.robot.move_arm_with_gui

NOTES: if you have difficulty seeing the robot on ubuntu (the udev permissions issue), on 20.04:

sudo vi /usr/lib/udev/rules.d/99-xarm.rules
# add this line to the rules file:
SUBSYSTEM=="hidraw", ATTRS{product}=="LOBOT", GROUP="dialout", MODE="0666"
sudo usermod -a -G dialout $USER
sudo udevadm control --reload-rules && sudo udevadm trigger

pip install easyhid
sudo apt-get install libhidapi-hidraw0 libhidapi-libusb0 # maybe needed?

(on 18.04 it is /etc/udev/rules.d)

troubleshooting:

import easyhid
en = easyhid.Enumeration()
devices = en.find(vid=1155, pid=22352)
print([dev.description() for dev in devices])

# should print something like
# devices ['HIDDevice:\n    /dev/hidraw2 | 483:5750 | MyUSB_HID | LOBOT | 496D626C3331\n    release_number: 513\n    usage_page: 26740\n    usage: 8293\n    interface_number: 0']

make sure the robot is plugged into the laptop

voice interaction

the other part of the demo is voice interaction. two parts: text to speech (generating smart-E’s voice), and speech to text (recognizing the trigger phrase and the commands after it). this is modelled after the amazon alexa, apple siri, google home style of voice interaction with “AI”, where you say a trigger phrase.

now based on our inspiration the trigger phrase would be “hey dum-e”, but this felt pretty mean, so it was suggested to use “hey smart-e” instead 🙂

voice interaction part A: generating smart-E’s “voice”

from an earlier project (creating a welcome announcer for the gather.town lobby for ICRA 2020) I knew I could use mozilla TTS to generate a nice-sounding voice from text.

https://github.com/mozilla/TTS

(as opposed to the default “espeak” which sounds like horrible electronic garble)

it was a rather annoying install at the time (a year ago?) but seems to be a cleaner install this time around. however!! all the instructions are for training your own model, whereas I just wanted to use it to generate audio.

So here’s a quickstart to using Mozilla TTS.

~$ python3 -m venv env && source ./env/bin/activate && which python && pip3 install --upgrade pip && pip3 install wheel && echo done

~$ pip install tts

~$ MODEL="tts_models/en/ljspeech/glow-tts"
~$ tts --text "What would you like me to do?" --model_name $MODEL
~$ tts --text "What would you like me to do?" --model_name $MODEL --out_path "./whatwouldyoulikemetodo.wav"
 > tts_models/en/ljspeech/glow-tts is already downloaded.
 > Downloading model to /home/nrw/.local/share/tts/vocoder_models--en--ljspeech--multiband-melgan
 > Using model: glow_tts
 > Vocoder Model: multiband_melgan
 > Generator Model: multiband_melgan_generator
 > Discriminator Model: melgan_multiscale_discriminator
 > Text: What would you like me to do?
 > Text splitted to sentences.
['What would you like me to do?']
 > Processing time: 0.7528579235076904
 > Real-time factor: 0.3542274926029484
 > Saving output to whatwouldyoulikemetodo.wav
$ play whatwouldyoulikemetodo.wav
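
in the hackdemo I just pre-generate the handful of phrases smart-E needs as wav files (the speech_output folder further down), then play them back when needed. a minimal sketch of that playback helper (the say() function and folder layout are just how I’d wire it up, assuming sox’s play is installed):

import subprocess

def say(wav_filename):
    # play a pre-generated mozilla-tts wav; swap 'play' (sox) for 'aplay' if you prefer
    subprocess.run(['play', '-q', './speech_output/' + wav_filename])

say('whatwouldyoulikemetodo.wav')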

voice interaction part B: giving smart-E “intelligence”

hot stars, in the last year or two we gained a realllllly nice speech2text library. it used to be that if you wanted reasonable real-time-ish text transcription, you needed to get a google API key and use the google API.

now you can just pip install the Vosk library, download a 50 mb file, and get real-time offline transcription that actually works! and if you want Chinese transcription instead of English, just download another 50 mb model file.

https://alphacephei.com/vosk/install

i’m liking this future 🙂

quickstart

$ git clone https://github.com/alphacep/vosk-api
$ cd vosk-api/python/example
$ wget https://alphacephei.com/kaldi/models/vosk-model-small-en-us-0.15.zip
$ unzip vosk-model-small-en-us-0.15.zip
$ mv vosk-model-small-en-us-0.15 model
$ pip install vosk sounddevice
$ python test_microphone.py

now to make the keyword recognition happen.

def find_keyword(voice_result):
    # accept a few of the ways vosk tends to hear "smart-e"
    keywords = ['marty', 'smarty', 'smart']
    return any(kw in voice_result for kw in keywords)

# in a while True loop -- see test_microphone.py in the vosk repo
    if rec.AcceptWaveform(data):
        sentence = rec.Result()
        if not state_triggered:
            found_keyword = find_keyword(sentence)
            if found_keyword:
                state_triggered = True
# if state_triggered, then move on to listening for colors
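
the color-listening step is basically the same pattern again. a rough sketch of how I’d wire it (names like pick_up_cube are placeholders, not the actual demo code):

COLORS = ['yellow', 'blue', 'black']

def find_color(voice_result):
    for color in COLORS:
        if color in voice_result:
            return color
    return None

# inside the same while True loop, once triggered:
#    if state_triggered and rec.AcceptWaveform(data):
#        color = find_color(rec.Result())
#        if color is not None:
#            say(...)             # "okay picking up the {color} cube" -- careful, the mic hears this too (see funny bugs)
#            pick_up_cube(color)  # placeholder: look up which position has that color and move the arm
#            state_triggered = False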

arucotags in python

how to locate the cubes? okay, actually i simplified this step in the interest of finishing something in a weekend (before leaving home for the holidays), and only had three fixed positions. instead the issue was finding which color cube was where.

how to distinguish a yellow cube from a yellow stickynote? or a black cube from the checkerboard? i solved it by putting arucotags on the cubes and sampling (in 2d image space) the color slightly above each tag to determine cube color.

I made a quickstart here for working with arucotags in python: https://gist.github.com/nouyang/c08202f97c607e2e6b7728577ffcb47f

note that the gist shows how to get 3d poses out, whereas for the hackdemo i only used the 2d positions.
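
for reference, the 2d detection boils down to something like this (a minimal sketch; the tag dictionary, webcam index, and sample offset are assumptions, and the aruco function names shift a bit between opencv versions; see the gist for the full version):

import cv2
import numpy as np

aruco_dict = cv2.aruco.Dictionary_get(cv2.aruco.DICT_4X4_50)  # whichever tag family you printed
params = cv2.aruco.DetectorParameters_create()

cap = cv2.VideoCapture(0)
ok, image = cap.read()
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

corners, ids, _rejected = cv2.aruco.detectMarkers(gray, aruco_dict, parameters=params)
if ids is not None:
    for tag_corners, tag_id in zip(corners, ids.flatten()):
        cx, cy = tag_corners[0].mean(axis=0)  # 2d tag center, in pixels
        sample_y = int(cy) - 30               # sample a patch slightly above the tag (offset is a guess)
        print('tag', tag_id, 'center', int(cx), int(cy), 'color sample point', int(cx), sample_y)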

what color is the cube

this was actually annoying. in the end i assume there are only three colors and do a hardcoded HSV filter. Specifically, use hue to separate out the yellow, and then value to separate black from blue. (this is a hack – note that the logic is flawed, in that black can be any hue)

import colorsys
import numpy as np

# (x, y) is the sample point slightly above the tag; rad is the half-width of the patch
crop = image[y-rad:y+rad, x-rad:x+rad]
color_sample = crop.mean(axis=0).mean(axis=0) # average BGR over the patch
rgb = color_sample[::-1] # reverse, color is in BGR from opencv
rgb_pct = rgb / 255
hsv_pct = colorsys.rgb_to_hsv(*rgb_pct) # fxn takes 3 separate values, so unpack with *
hsv = np.array(hsv_pct) * 255 # scale back to 0-255 to match the thresholds below
print('rgb', rgb, 'hsv', hsv)

hue = hsv[0]
val = hsv[2]

closest_color = None

if 10 < hue < 40:
    closest_color = 'yellow'
if hue >= 50:
    if val > 90:
        closest_color = 'blue'
    else:
        closest_color = 'black'

additional notes

i’ll think about making a public snapshot of the git repo for the demo, perhaps. for the above code samples, my file directory looks like so:

$ ls
env
nuro_arm
speech_output # wav files from mozilla tts
voice_model
hackdemo.py
$ python hackdemo.py

Note

The pop culture reference for smart-E

this is the voice interaction, from 38 sec to 62 sec

for longer context of the dum-e scenes, see

creepy voice output

it’s a deep learning model, so it gives unpredictably different results for hey! vs hey!! vs hey!!! and seems to do poorly on short phrases in general. (also you can’t force any specific intonations, like a cheerful “hi there!” vs a sullen “hi there”)

here’s the creepy output of the default model with just “hey”

funny bugs

i got the voice recognition to recognize the colors, but it seemed like the program would get caught in a loop where it would reparse the same audio again and again. i spent a while going “??? how do python queues work, do they not remove items with get()???” and clearing queues and such. eventually my roommie pointed out that i have it give the response “okay picking up the yellow cube”, so it was literally reacting to its own voice in an infinite loop 0:

it was fun to think about the intricacies of all the voice assistants we have nowadays

next steps

on the meche side, build an actual arm that can e.g. lift a drill (or at least a drill battery)

on the cs side, we will continue working with the mini arm. we bought a set of toy miniature tools (meant for kids). we’ll probably use reinforcement learning to learn 1. which pixels correspond to a tool, 2. which tool it is, 3. where to grasp the tool, and then 4. how to add new tools to the workspace

decisions: do we match cad models, or pixels, to determine which tool?

on the voice side we also have to do some thought about how people will interact with it. “hand me the quarter-inch allen wrench” or “hex wrench” or “hex key” or “t-handle hex” or “ball-end hex” etc., asking for clarification, …

we also just figured out how to go from the webcam image (x,y pixels) to the robot frame (x,y,z millimeters), so you can click on the camera image and the robot will move to that point in real life
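
here’s a minimal sketch of one way to do that pixel-to-robot mapping, assuming the cubes all sit on a flat table (the calibration points and pick height below are made up; the actual numbers come from measuring a few points in both frames):

import cv2
import numpy as np

# >= 4 points measured in both frames: (u, v) pixels and (x, y) robot meters
pixel_pts = np.array([[102, 388], [525, 392], [518, 101], [110, 95]], dtype=np.float32)
robot_pts = np.array([[0.12, -0.10], [0.12, 0.10], [0.28, 0.10], [0.28, -0.10]], dtype=np.float32)

H, _ = cv2.findHomography(pixel_pts, robot_pts)

def pixel_to_robot_xy(u, v):
    uvw = H @ np.array([u, v, 1.0])
    return uvw[:2] / uvw[2]  # de-homogenize

TABLE_Z = 0.03  # fixed pick height in meters (assumption)
x, y = pixel_to_robot_xy(320, 240)
# robot.move_hand_to(np.array([x, y, TABLE_Z]))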

Pandemic Diary #73 – travel, omicron, classes (2 Jan 2022)

it’s 2022! here’s hoping it’ll be a great year compared to the bletch of 2020 (oh god has it been nearly two years already?!) and the meh of 2021

jeez haven’t diaried in a while

am in GA, flew back. only half-heartedly attempted to quarantine around parents. i did get a proper n95 mask though, and (mouthwash? according to parents? i guess omicron is more upper respiratory but ???), stripped off a layer of clothes, and hot shower when i got back. kinda wore a mask the first two days but still had meals together so,

i guess since we are all roughly at max strength from our boosters (2-3 weeks ago), and omicron is the most transmissible but possibly less deadly, i’ve been laissez-faire. i guess if you want to only spend a week at home, then it’s not really compatible with an actual quarantine. my parents both work remotely, so the risk of starting a community cluster is relatively low.

took a quickvue test ~ day 3 (~84 hrs in). at-home antigen test, analog readout – you read the stripes, if just blue shows up you are tentatively negative, if blue and pink show up you almost certainly have covid. got a negative result. will test again pre-flight tuesday.

at-home tests are hard to find online but still pretty available in stores. appointments for professional tests are scheduled a week out.

got guilted/dragged into indoor lunch… -___-;; sketchy, GA vax rate is ~50% (MA is ~75%, malta was ~90%, for two shots). mom did not go b/c immunocompromised; the new years before last she got pneumonia and sepsis and a stay in the ICU from a holiday dinner. the place was fairly empty at least, 2-3 empty tables between occupied ones. stressful. i realized my main concern now with catching covid, post-vaccination, is having to wait two extra weeks to get better before i can catch a flight back to boston. i can’t keep slipping two weeks of progress everywhere, and being home is definitely unproductive in a stressful way.

parents have made changes: they no longer shower after getting back from stores, and no longer decontaminate groceries.

risk levels are changing. i will take a class in the spring, and have to teach two classes in that case. unlikely it’ll be remote. tested weekly. but still, classes with maybe a hundred+ students… (? i think? no idea how it works since i haven’t been to class since 2019). very different environment than, as a senior grad student, going in to lab and interacting with at most the same 10 people. don’t feel great about it. i think i’m going to really hate the first quarter of 2022. need to figure out something to look forward to. maybe life will surprise me.

ankle is doing better. maybe there was a mini-fracture and it healed, who knows, doesn’t matter. 6.5 weeks in, it’s still achy-painful to walk up and down the stairs more than a few times, but not enough to stop me from going downstairs and snacking lol.

(went home for new years, landing at the wrong end of terminal A & having to walk 26 gates felt like a marathon, stopped multiple times)

more new cases in a week in the US than at any time during the pandemic. ICUs on divert status. a little tired of the added friction to see people, to consider people’s circles, masks, check if people are okay with one thing or the other, just tired and want to not care about the future, and savings goals, and career ladders, and life trajectories, and how people perceive me, and whether i’m spending my time in an optimized way

well, also it’s late so i’m tired

happy 2022

https://www.npr.org/sections/health-shots/2021/01/28/960901166/how-is-the-covid-19-vaccination-campaign-going-in-your-state

As of 3 Feb 2022

PoV Yoyo Project Rebooted: CircuitPython, 3D Printed Yoyos, and Popsicle Sticks (WIP Post #1)

IAP started, which I decided means it’s time to give up on work and just do side projects. Thanks to my earlier blog post listing, I had a concrete set of projects to work through. And thanks to the largess of my roommate, we had a nice working 3d printer (ender 6). so finally i could 3d print the yoyo shell (as opposed to 2.008, where you get a cnc mill to cut an aluminum mold for thermoforming and injection molding; 3d printing is much easier). in fact the entire yoyo is apparently 3d printable, so i didn’t even need to buy more things from the hardware store, removing almost all friction from re-starting this project. (additional friction removal supplied by stealing supplies from ee friends)

specifically this design: https://www.thingiverse.com/thing:1766385

the adafruit circuit playground pcb in that design is so close to what I want that i wanted to just buy the board directly and build with it. sadly what i want for the PoV display are radial LEDs, not ones arranged in a circle.

mechanical structure

what i finally understood after printing it (re: how to do it without any hardware, since most 3d prints ask you to get at least some nuts or bolts): you can print threads directly into the plastic, which are enough to screw into one side of the yoyo, and the other side of the nut can be glued in, with a hex design to retain it so it doesn’t unscrew itself. the nut itself can be hollow so you can pass a battery cable from one side to the other.

circuitpython

ok so that was the mechanical structure. now for the electronics and programming. my friend supplied an rp2040 (adafruit version) to play with and convinced me to try circuit python.

it’s so great. admittedly at first it was very frustrating: i accidentally sent one board into safe mode so it wouldn’t show up as a usb drive, and then i downloaded the wrong circuitpython build (the naming is sooo confusing), so i would drag the file onto the “usb drive”, the usb drive would disappear, but it would never show up again as the “circuitpython” drive. so i would have to hold down boot-select-whatever again to get it to show up as a drive again. i tried nuking the flash too. eventually i realized that

“adafruit-circuitpython-raspberry_pi_pico-en_US-7.0.0.uf2”

is not the one i wanted but rather

“adafruit-circuitpython-adafruit_feather_rp2040-en_US-7.0.0.uf2”

which, what. =____= such confusing naming.

nicely enough the adafruit rp2040 board takes lipo input (has jst connector) directly, and programs over usb-c. whoooo

breadboard and throughhole

so to start with, a breadboard and some standard throughhole LEDs. i used quick clips directly on the rp2040, connected to some male-male headers, and put 5 LEDs on the breadboard (with some 1k resistors).

I used the code from the lucky cat PoV and it … just worked! Yay!

I did run into some confusion (i think from syntax differences between micropython, which the code I stole uses, and circuitpython?). in the end I made modifications like so:

import time
import board
import digitalio

p1 = digitalio.DigitalInOut(board.D13)
p2 = digitalio.DigitalInOut(board.D11)
p3 = digitalio.DigitalInOut(board.D10)
p4 = digitalio.DigitalInOut(board.D6)
p5 = digitalio.DigitalInOut(board.D5)

LEDS = [p1,p2,p3,p4,p5]

for LED in LEDS:
    LED.switch_to_output()

[... most lines same as lucky cat pov code ... ]

def display(message, duration=150, duty=585, length=101):
    #motor.duty(duty)
    dispm = build_message_display(message, length)
    for n in range(duration):
        for line in dispm:
            for p, v in zip(LEDS, line):
                if v:
                    p.value = True
                else:
                    p.value = False
                #p.on() if v else p.off()
            time.sleep(0.001)

    #motor.duty(0)
while True:
    display('dood')

popsicle stick and smd

My original plan was to mimic the yoyo spin with one of those tiny brushed dc motors, but after this success i decided to move straight to the yoyo.

fortunately i had eaten some boba ice cream (thanks costco) and saved the popsicle stick just in case.

the pressure of hacking together a demo removed all my normal overthinking / decision-making issues (what headers to put on my micro? what will my cool friends think of the way i implemented this?) in favor of “what is the dumbest way i can get this done”. so … don’t question what i did too much lol

i snapped the popsicle stick, drilled the holes, stuck the SMD LEDs on with superglue, then put in resistors. i used hot glue to hold the throughhole resistors so that they wouldn’t move around and rip out the pads from the surface mount LEDs.

soldering was a small disaster.

rip

i had no lipo, so i went to microcenter (!) to get one in person (!) unfortunately the smallest size they had was still quite large.

apparently i managed to get some actual solder joints in there, and the code just worked

then i hot glued the battery in. the feather was kind of a “press-fit” already, and the popsicle stick rested on the resistor legs.

the heavy battery on one side made the yoyo spin pretty wildly, so it was not possible to see what was going on.

to try to see what was going on, i stuck it on a drill (with some nice double stick tape) and spun it around. this double stick tape is amazing, i keep finding uses for it, it can almost entirely replace hot glue and look better.

hmm. not very legible.

thoughts on next steps

my conjecture is that normally on a rotating object the words go around the circle, such that the base of each letter faces the center of the circle. but since i adapted the lucky cat code, the sides of the letters are facing the center instead.

i’ll save the details for a future post, but since the code was in python, it was super easy to figure out what was going on with the data structures. the lucky cat code actually has extra logic to flip the letters sideways, so it was straightforward to just take that logic out.

with a bit of simple python code YAY NOT ARDUINO C we can see what’s going on. on the left is the output from the original code, and then i did some minor mods to get the right.

def convert_to_stdout(list_onoffs):
    # print a text preview of the LED on/off pattern (▪ = on, ▫ = off)
    result = []
    for line in list_onoffs:
        result.append(['▫ ' if c==0 else '▪ ' for c in line])
    printout = [' '.join(line) for line in result]
    for line in printout:
        print(line)

the upshot of this though is that if i want to try the second method, i will need 7 LEDs, not 5 LEDs 0: because the letters are each 5×7, and now they’re rotated

i found some batteries. the mini drone batteries have … mini jst connectors. fortunately i dug around a bit and found a jst connector, so now to solder, and try not to cut across both battery leads with scissors and short-circuit the battery.

why do none of my small batteries have the reasonable connector

so, next steps; smaller battery, more LEDs, (maybe one day RGB LEDs?), new code, and maybe new shell when i’m ready.

not bad for 2-ish days of work. hopefully i can stick with this and turn it into a nice looking/working project instead of the hackish state most of my projects end in.

other thoughts

A few questions arise. I did some back-of-envelope math while sitting on a plane (i did, in fact, fly despite the omicron). I don’t think I ever installed MathJax on this wordpress instance, but eh, plain text for now.

Q: What rpm does a yoyo spin at?

If we think of it just as a falling object (ignoring conservation of angular momentum etc.) to get a back-of-envelope number: with no rotation, an object starts at 0 m/s and after 1 second has fallen 4.9 m.

>>> d = v0*t + (1/2)*a*t^2 = 0 + (1/2)(9.8)(1)^2 = 4.9 m
>>> where v0 = 0, a = g = 9.8 m/s^2, t = 1 s

Now for the yoyo: if the axle the string winds around is about 1 cm in diameter, that’s around 3 cm of circumference.

>>> C = pi * d

Thus when the yoyo falls ~3 cm, it’s made 1 revolution in order to unwind that much string. So to fall 4.9 m, it’s made ~160 revolutions.

>>> 4.9 m * (1/0.03) rev/m ~= 160 revs

This took 1 second to fall 4.9 m. So the yoyo has averaged ~160 revs / second. This is about 10,000 rpm.

>>> 160 rev/sec * 60 sec/min ~= 10,000 rev/min
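
(quick sanity check of that arithmetic in python:)

circumference = 0.03           # ~3 cm, from a 1 cm diameter
revs = 4.9 / circumference     # revolutions needed to unwind 4.9 m of string
rpm = revs * 60                # it fell in ~1 second, so revs/sec * 60
print(round(revs), round(rpm)) # ~163 revs, ~9800 rpm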

I guess that if a human is throwing it, it will spin faster though.

Note: The other way is to measure it empirically. I have a Samsung Galaxy S9, which has a super-slow-mo mode that according to the internet runs at 960 fps. This should be fast enough: using the above numbers, I should get around 6 frames of the yoyo spinning before it completes a circle.

> Note: I started on this; it does appear the yoyo falls a lot slower than a normal object, because it has to spin up a large rotational inertia in order to fall. TBD then; I’ll take a slo-mo video of a yoyo and another object falling side-by-side.

Q: Is an Arduino fast enough?

I had a friend who worked on a persistence-of-vision project and ended up needing to switch processors, which stalled out the project. Is an Arduino fast enough?!

The answer is yes: I’m driving like 6 LEDs, while they were driving like a couple hundred RGB LEDs lol. I looked a bit into individually addressable RGB LEDs & IMUs: the LEDs seem to run around 8 kHz, the usual MPU-6050 IMU can be read over 400 kHz I2C, and the basic Atmega328 runs at 16 MHz. IDK why I was concerned, when I wrote v0.1 on an attiny. And for the yoyo itself, even ~10k rpm is only ~170 Hz, so that should be plenty fine too.

here are some other notes on stuff i learned from my knowledgeable ee friends (who nicely don’t make fun of me for my basic questions):

so the ws2811 uses a 800khz clock, takes 24 bits for one led
so assuming naive bitbanging of the protocol without timers (not true with lib but conservative)
so 30us to address one led
150us for 5 leds
80hz refresh rate gives a time budget of 12,500us
so you have a very large margin even if you manually bitbang the protocol
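
(the same budget, as a quick python check of those numbers:)

bit_time_us = 1e6 / 800_000       # one bit at an 800 kHz clock = 1.25 us
led_time_us = 24 * bit_time_us    # 24 bits per led -> 30 us
frame_time_us = 5 * led_time_us   # 5 leds -> 150 us
budget_us = 1e6 / 80              # 80 Hz refresh -> 12,500 us per frame
print(led_time_us, frame_time_us, budget_us)  # 30.0 150.0 12500.0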

somehow I had assumed that bitbanging was super fast, like in my head it felt like “let me drop from C to assembly to get max speed” but it’s not analogous at all

I’m assuming the most naive bitbanging implementation that looks like:
write(1)
sleep(high_time)
write(0)
sleep(low_time)
the sleeps can be interleaved with your code by using timers and such, but the sleep way is definitely slowest and easiest to analyze
