IAP 2025 (advanced machining, intro to dl); 2025 goals (be a rich d*bag, anger as an identity), head scratchers (lifelong friendships, supportive workplace)

well! i keep meaning to write detailed posts about all the cool stuff i’m learning, and not getting around to it. alas

for now, I took intro to deep learning this past week with my roommate. it was a really great time, i solidified what i knew and filled in some gaps. i typed up a few quick notes below, primarily from the two guest lectures, which they won’t post slides for.

i also am taking the “advanced machining” class. https://www.youtube.com/watch?v=qkjA94URV3k
I am enjoying it so far.

mostly i enjoy attending classes and it’s motivating to take the class and discuss with someone I know. the atmosphere feels great to be in a class with lots of other smart nerds thinking about nerdy things. the outside world can be blissfully ignored. (see section: 2025 goals)

advanced machining

class #1 (i missed this and am watching the video now): it feels really wholesome to hear from someone who seems quite happy now but who struggled in undergrad (took 7 years) and also felt shame about their startup (not VC funded, machining in house), plus some takeaways (over-engineered heh).

last wed (class #2) was a blitz on stress, strain, etc. and there was also a video tour of two machinists who just started their own shop. cool to me since in China I saw many banks of machines but it’s not like I would ever know where to start in terms of talking to the machinists.

yesterday (class #3) i only caught the latter part (I walked in late because, well, i forgot about the class, but also the T was delayed 15-20 mins T^T during which time i got to hear two high schoolers discuss extensively the latest dating gossip, which was a whirlwind of people and events over presumably ~6 months haha)

intro to deep learning

http://introtodeeplearning.com/ The recordings will go up presumably in a week or so. The labs are public online and can be run for free on google colab (although make sure to not just leave the tab open: I think you have maybe 4 hrs/week). https://github.com/MITDeepLearning/introtodeeplearning/tree/master The colab links are at the top of each notebook. Pick either pytorch or tensorflow as desired. note: I found some of the syntax tricky; fortunately they also provide working solution notebooks. for the last lab I did sink in $5 (they didn’t allow for less), which technically can be reimbursed.

e.g. day 1: i thought batches were for parallel processing, but they are also for making gradient updates more stable. the “ada” in optimizer names stands for adaptive, as in adaptive learning rates: adapting to how fast the gradient is changing, etc.
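a rough sketch of the "more stable" part, assuming a toy one-parameter linear model and made-up data (none of this is from the lecture; the function names are just for illustration). the gradient estimated from a batch of 32 samples bounces around much less than the gradient estimated from a single sample:

```python
import random
import statistics

def batch_gradient(w, batch):
    """Gradient of 0.5 * (w*x - y)**2 with respect to w, averaged over the batch."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)

def gradient_spread(data, batch_size, w=0.0, trials=2000, seed=0):
    """Standard deviation of the gradient estimate across many random batches."""
    rng = random.Random(seed)
    grads = [batch_gradient(w, rng.sample(data, batch_size)) for _ in range(trials)]
    return statistics.stdev(grads)

# Hypothetical dataset: y = 3x plus a little noise.
rng = random.Random(1)
data = [(x, 3.0 * x + rng.gauss(0, 1)) for x in (rng.uniform(-1, 1) for _ in range(1000))]

spread_1 = gradient_spread(data, batch_size=1)
spread_32 = gradient_spread(data, batch_size=32)
print(f"gradient std, batch size 1:  {spread_1:.3f}")
print(f"gradient std, batch size 32: {spread_32:.3f}")
```

the spread should shrink roughly with the square root of the batch size, which is why the updates feel smoother.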

day 2? 3?: i will skip over some of the CNN and RL, Q learning etc. stuff as that stuff I’m more familiar with. it was still good to ground my understanding but i had fewer “aha” moments. I could also see where an audience member got lost: filters do downsize (turn 2×2 into 1 pixel) so you might imagine it quarters the image resolution, but actually you slide over by one pixel and repeat the convolution, so in the end the image is only “downsized” by maybe a pixel on each edge or something.
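to make the sliding point concrete, here is the standard output-size formula for a convolution along one dimension. the 28-pixel image size is just an assumed example, not something from the lecture:

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Number of output pixels along one dimension of a convolution."""
    return (n + 2 * padding - kernel) // stride + 1

image_size = 28  # assumed example size

# Sliding a 2x2 filter one pixel at a time barely shrinks the image.
slide_by_one = conv_output_size(image_size, kernel=2, stride=1)

# Jumping a full filter width each time is what would halve each side.
jump_by_two = conv_output_size(image_size, kernel=2, stride=2)

# A 3x3 filter with stride 1 loses one pixel on each edge.
three_by_three = conv_output_size(image_size, kernel=3, stride=1)

print(slide_by_one, jump_by_two, three_by_three)
```

so the "quartering" intuition only holds when the stride equals the filter width (which is how pooling layers are usually set up).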

day 2? 3?: VAEs/GANs: this was basically all new content to me. the general idea of moving from deterministic model outputs to probabilistic outputs (hence the loss using KL divergence) was important to allow the model to flex outside of the training samples. it made the amorphous idea of modeling a “probability distribution” clearer: predict the mu and sigma of a normal distribution.
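a minimal sketch of the "predict mu and sigma" idea, assuming the usual VAE setup where the encoder outputs one mu and one sigma per latent dimension and the KL term compares that against a standard normal. the function names here are mine, not from the course labs:

```python
import math
import random

def sample_latent(mu, sigma, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), one value per latent dimension."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

def kl_to_standard_normal(mu, sigma):
    """Closed-form KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions."""
    return sum(0.5 * (m * m + s * s - 1.0 - math.log(s * s)) for m, s in zip(mu, sigma))

rng = random.Random(0)
z = sample_latent([0.3, -1.2], [0.5, 0.1], rng)

kl_matching = kl_to_standard_normal([0.0, 0.0], [1.0, 1.0])  # already a standard normal
kl_shifted = kl_to_standard_normal([1.0], [1.0])             # mean moved away from 0
print(z, kl_matching, kl_shifted)
```

the KL term is zero only when the predicted distribution already is a standard normal, so it acts as a pull toward a shared, smooth latent space.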

day 2? 3?: diffusion model: i haven’t followed diffusion models at all. basically trained to recover data from noise. so take an image, make it noisy, then noisier, etc. all the way to random noise. train the model to recover the image at each step. then you have a self-supervised model that can generate stuff.
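a sketch of the "make it noisy, then noisier" half, assuming a DDPM-style linear noise schedule (the schedule values and step count are common defaults i'm assuming, not numbers from the lecture). the training target would then be to undo the noise at a randomly chosen step:

```python
import math
import random

def noise_schedule(steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative fraction of the original signal that survives after each step."""
    betas = [beta_start + (beta_end - beta_start) * t / (steps - 1) for t in range(steps)]
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, alpha_bar, rng):
    """Jump straight to a noisy version of x0 at the step with this alpha_bar."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    noisy = [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
             for x, e in zip(x0, noise)]
    return noisy, noise

alpha_bars = noise_schedule(1000)
pixels = [0.5, -0.2, 0.9]  # stand-in for a tiny "image"
noisy, noise = add_noise(pixels, alpha_bars[499], random.Random(0))
print(alpha_bars[0], alpha_bars[-1], noisy)
```

early steps keep almost all of the image and the last step keeps almost none, which is why generation can start from pure noise.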

day 4 pt 1: literally people are using .split() on llm inputs and outputs

intuition why put in prompt “you are an mit mathematician” produces better results: on average, people on the internet are bad at math. the LLM is a statistical engine. this simply biases the output toward the training data that includes (probably) better math

intuition why chain of thought prompting aka just prepending “think through this step-by-step” helps: all ML is error driven, and shorter output means there’s relatively little “surface area” for a model to make or correct a mistake.

can get performance boost from training data but eventually will tank. takeaway: evaluation is really important

day 4 pt 2. people are serious about using LLMs as judges of other LLM output

the mixture of weights ((A + B)/2) started as a joke and is now used by every LLM company. in fact there are crazy family trees of mixtures of the mixtures themselves
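the plain averaging version of a merge, (A + B)/2 per parameter, sketched with ordinary dicts of lists standing in for model weights. real merges work on full checkpoints and often use fancier schemes; the names and the `mix` parameter here are hypothetical:

```python
def merge_weights(weights_a, weights_b, mix=0.5):
    """Blend two models parameter by parameter: mix * A + (1 - mix) * B."""
    if weights_a.keys() != weights_b.keys():
        raise ValueError("both models need the same parameter names")
    return {
        name: [mix * a + (1.0 - mix) * b
               for a, b in zip(weights_a[name], weights_b[name])]
        for name in weights_a
    }

model_a = {"layer.weight": [1.0, 3.0], "layer.bias": [0.0]}
model_b = {"layer.weight": [3.0, 5.0], "layer.bias": [2.0]}
merged = merge_weights(model_a, model_b)
print(merged)
```

this only makes sense when both models share the same architecture (typically two finetunes of the same base model).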

some parameters are pretty well known now (adamw, 3-5 epochs, flashattention2). learning rate usually 1e-6 to 1e-3. batch size 8 or 16 determined by how much VRAM needed (with an accumulated update, can have a different “effective” batch size).
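a toy check of the "effective" batch size idea, assuming a one-parameter model and equal-sized micro-batches (all names and numbers here are illustrative). averaging the gradients of 4 micro-batches of 8 gives the same update as one batch of 32, while only 8 samples need to fit in memory at a time:

```python
import math

def mean_gradient(w, batch):
    """Gradient of 0.5 * (w*x - y)**2 with respect to w, averaged over the batch."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)

def accumulated_gradient(w, batch, micro_batch_size):
    """Average the gradients of equal-sized micro-batches before updating."""
    chunks = [batch[i:i + micro_batch_size]
              for i in range(0, len(batch), micro_batch_size)]
    return sum(mean_gradient(w, chunk) for chunk in chunks) / len(chunks)

micro_batch_size = 8      # what fits in VRAM at once
accumulation_steps = 4    # micro-batches per optimizer step
effective_batch_size = micro_batch_size * accumulation_steps

batch = [(float(i), 2.0 * i + 1.0) for i in range(effective_batch_size)]
full = mean_gradient(0.5, batch)
accumulated = accumulated_gradient(0.5, batch, micro_batch_size)
matches = math.isclose(full, accumulated)
print(effective_batch_size, full, accumulated, matches)
```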

post-training is the term for what people work on nowadays. we don’t bother e.g. re-tokenizing for a different language, but just fiddle with the weights after.

for instance, LoRA (low-rank adaptation): keep the LLM weights frozen, add small separate adapter matrices on top, and just modify those weights for your task.
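a bare-bones sketch of the LoRA forward pass with plain lists, assuming the usual formulation where the frozen weight W gets a low-rank update B·A added on top. the matrix sizes and the `scale` parameter are illustrative assumptions, and real libraries handle this for you:

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def lora_forward(x, frozen_w, lora_a, lora_b, scale=1.0):
    """Compute W x + scale * B (A x); only A and B would be trained."""
    base = matvec(frozen_w, x)
    update = matvec(lora_b, matvec(lora_a, x))
    return [b + scale * u for b, u in zip(base, update)]

def trainable_params(d_in, d_out, rank):
    """Parameter counts for a full update versus a rank-r adapter."""
    return d_in * d_out, rank * (d_in + d_out)

frozen_w = [[1.0, 0.0], [0.0, 1.0]]   # 2x2 frozen weight
lora_a = [[1.0, 1.0]]                 # rank 1: shape (1, 2)
lora_b = [[0.5], [0.0]]               # shape (2, 1)
x = [2.0, 3.0]

adapted = lora_forward(x, frozen_w, lora_a, lora_b)
untouched = lora_forward(x, frozen_w, lora_a, [[0.0], [0.0]])  # B starts at zero
full_count, lora_count = trainable_params(4096, 4096, rank=8)
print(adapted, untouched, full_count, lora_count)
```

with B initialized to zero the adapted model starts out identical to the base model, and the adapter is a tiny fraction of the full weight count.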

train/test split closer to 1-2% of samples, not 80/20 like in traditional ML

example of finetuning: to create a finnish language model: train a model that is good in the language but bad at overall tasks, and a model that is excellent overall but bad at the target task (here, finnish), then merge.

evaluation: it doesn’t work well and we don’t really know what we’re doing, but it’s really really important! (XD)

A lot of evaluation is actually for finding holes in dataset and then fixing those / adding more samples.

future trends: test time compute is stuff like, at inference, ask for several solutions and take the most common answer (majority vote)
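the majority-vote version of test time compute is short enough to sketch. this assumes you already have several sampled answers as strings (the sampling itself is left out, and the answers below are made up):

```python
from collections import Counter

def majority_vote(answers):
    """Return the most common answer after light normalization."""
    if not answers:
        raise ValueError("need at least one answer to vote on")
    normalized = [answer.strip().lower() for answer in answers]
    return Counter(normalized).most_common(1)[0][0]

# Hypothetical final answers from five samples of the same prompt.
samples = ["42", " 42", "41", "42 ", "7"]
winner = majority_vote(samples)
print(winner)
```

the normalization step matters more than it looks: without it, "42" and " 42" would count as different answers.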

recommended libraries: for finetuning, TRL from hugging face, axolotl (user friendly on top of TRL – easy to share and “spy” on other configs haha), unsloth (single gpu).

for supervised fine tuning: usually overkill to fully train (very high VRAM use), LoRA – high VRAM but recommended, QLoRA not recommended due to performance degradation

Pre train: trillions of samples, post train: > 1M (eg general purpose chatbot), fine tune: 100k-1M domain specific(eg medical llm), 10k-100k task specific (eg spell checker)

2025 goals

re: blissfully ignoring the outside world, that is my goal for 2025: be a rich d*bag. something about earning lots of money and donating it. turn a blind eye to … misery … or something …

… okay i don’t think i could go so far as to work on promoting tobacco but there’s probably stuff in between like “study illicit massage parlor industry and be depressed about humanity” and “figure out how to circumvent climate change regulations to expand oil and gas drilling”. otherwise i don’t think i can be useful in 2025-2029. thanks sexism. on the other hand still super proud i got to vote for two different women presidential candidates in my life time !!!! one day it will happen. even if i have to do it myself heh.

running

i guess that will be a goal. first though, finish my app, “have you run as much as my hamster”.com. my hamster ursula probably runs 1-3 miles every night heh

anger as an identity

I think a lot of my identity was built around anger. it made me angry as a kid to travel and see someone younger than me, missing limbs, in rags traveling around on essentially a furniture moving dolly begging for money. meanwhile i had flown across the world.

Why would the world be so cruel and unjust? it really made me mad. this anger drove me past any insecurity and anxiety and self-hatred to keep going. i didn’t believe i could get into MIT, but i applied in part because yes, i wanted to change the world.

I didn’t believe i could get into grad school, but i applied because — okay actually i just applied because i’d be paid the same but actually get health insurance. it didn’t have much to do with anger lol

Anyway, anger in various forms has driven me through life. In some sense, anger is part of my identity, and I’m afraid to let go of it. I fear that if I stop being angry, I’ll stop trying to change the world.

But having anger at the world as a part of my identity, makes coming to terms with my inability to change it rather painful. Or makes it harder to see the small bits I do change and the change that happens over time.

I want to give myself permission to be happy, to be confident that I won’t ever stop trying to change the world.

My wild implementation plan is to go to the opposite extreme and focus on being a rich douchebag and/or having tech bro optimism (that tech will fix the world), idk lol it’ll be a fun year

reflections on undergrad me: pep talk needed

i rewatched my hexapods video https://www.youtube.com/watch?app=desktop&v=qTh-OGA_LeM

(context: for the 2.007 class which is a robot competition, i elected to go do my own thing and build a hexapod, because … idk i wanted a dancing hexapod)

and WOW i can hear all the lack of confidence and downplaying and a little bit of the misery (that’s probably more my own memory though) in the (rather mumbled) voiceover. and reading my old instructables. i actually did a ton. it’s not as much as i wanted but it’s still a ton. it helps me hear it in my own voice right now.

when i watch a video of 2025 me in ten years, i want to come away with a sense of this person is super competent, confident, articulate, and rightfully proud of their own achievements and technical skills

head scratchers

friendships

i’ve never really considered the possibility of lifelong friendships before, so when a friend brought up the idea (in the context of finding emotional fulfillment) i was really stunned.

it really feels like i’m just starting to exit the unstable crisis mode i’ve been in for oh, the past decade. i mean way more stable than other people’s lives. but i haven’t really felt stable before.

a longer article another time. but essentially since my second hospitalization was such a miserably formative experience i always thought of friendships as a support network. the primary purpose is a safety net and my goal is the robustness of the network as a whole rather than individual links. it’s imperative that the graph is well-connected so that, even if/when edges fail (people move away, people start families, work gets busy, there’s a falling out, etc.), each node is still well-connected overall.

i would be a poor friend if i didn’t make sure my friends could rely on each other and didn’t need me.

but for her the focus is more on sharing the ups, not just the downs.

i never considered having a lifelong friendship as a goal or even a possibility.

still not sure how i feel about all this. it’s in some sense the opposite of my goal. in her framework, each deep friendship is special and irreplaceable. in my framework, having any individual link matter so much threatens the stability of the safety net.

i suppose that someone could have multiple deep friendships, grieve the loss of one, while still remaining well supported. tl;dr still scratching my head about this

supportive workplace

another head scratcher. i keep being mildly shocked each time my manager(s) are responsive and want me to succeed. i can only think of the misery of pushing my paper through on my own (no coauthors, no lab) into ICRA, and then instead of my committee celebrating that, feels like i got thrown under the bus at quals. that was, well, not helpful …

so yea, still constantly surprised and wrapping my head around this

toblog: boat rudders, lasercutters, and video games

well, i find myself telling the same stories and looking for similar pictures, so going on the todo list!

then i can spend more time developing more hobbies and stories hehe

this is a teaser for an upcoming post about lasercutters (you can get hobbyist grade ones that cut metal now! and … print in color?!)

the boat rudder project — wow, that epoxy rash was a nightmare

video games: the past two years have been a blur. ring fit, portal, portal 2, undertale, it takes two, unraveled, elden ring, animal crossing, zelda tears of the kingdom, ogame, ffvii, ffx, and ffxv. i think that’s the full list?

more about three dee printers soon (i have joined the dark side) in addition to the lasercutting

adventures in making a mini shop

and some thoughts about streamlit/stlite

anyway — off to sleep. perchance to dream …

i made a knitted hat on a knitting machine! (some cursing involved)

I made a hat!
(note: this post is not instructions, just a note on new tech i found. hasten ye to youtube for instructions if you want those)

the machine

I used my friend’s knitting machine. It exclusively makes circles/cylinders. I didn’t know anything about knitting, and I was able to make a hat in … well okay probably two hours. But there are people online making one in 10 minutes! I probably could make one in ten minutes with some practice. (also not including buying yarns).

crank away!

The yarn is threaded onto this machine and you crank the handle. There’s a little counter and you just count — 60 loops. Put on the next color (literally just snip the yarn with a long tail, put it inside, then thread on a circle with the second yarn).  Crank away again another 40 loops. Realize you don’t have enough yarn and change the colors again. Here is a video.

casting off (and cursing)

At the end, you cut the yarn and spin the machine one loop. This starts the removal process. Stop after one loop! In fact, go really slowly at the end since if you go further the loops pop off entirely, and we need to catch them before they pop off (at this stage they can completely undo).

You take a needle and thread yarn through all the final knits at the top and pull them off the machine one loop at a time. This is where the cursing comes in, because if you accidentally pop one off the machine because you’re a clumsy ape, then it can start to slip through multiple rows of yarn. You have to stop this process and then carefully re-knit them one row at a time.

It’s surprisingly hard to follow individual yarn threads and find the tiny loop that you dropped. If my friend hadn’t been there I think multiple hours of youtube videos and a general disillusionment would have resulted from the casting off process.

cinch and add a pompom

Once the hat is cast off, you have a long tube. You cinch it tight at the top. At this point I also made a pom pom (see previous posts) on the spot by wrapping yarn around four fingers, taking that tail from the cinched off hat, and using that to cinch off the pom pom also.

Then, tie a square knot or two.

As a final act, we hide the thread. To do so, you scrunch the hat and thread the yarn through a few loops (in this specific pattern which follows existing yarns) and then when you un-scrunch you can poke the thread through into the center of the hat.

Hat!

finally you push one end of the tube up into the other half and you get a hat! (I also rolled up the brim in the pic below)

the end.

appendix

Notes on yarn

There’s a bit of trickery with the yarn: it has to be yarn 4 (?) and fit in the needles and also slide off of them well. This Sentro machine has 43 needles I think, which come up one at a time, grab the thread being fed in, and then go back down. Here is the thread I used.

other more complex inspiration from youtube

We also looked up how people make patterns on their hats (other than just solid swathes of color). Seems like they just manually do so — loop one thread, then the next, then another color, etc.

Also, here’s an example of fixing a stitch, which is what I did with a lot of cursing: https://youtu.be/VhtOs-5lwI4?feature=shared&t=341

Manually knitting over the original to put in a design (“duplicate stitch”)

Or you can stitch, then cut the yarn, then stitch the new color, then cut the yarn, etc. It’s kind of intense: https://youtu.be/JMV49F45xuQ?feature=shared&t=1278

Weezer – Undone — The Sweater Song

oh, it’s true, you can pull a single yarn and undo the whole thing. my friend provided this extremely sophisticated proof, the lyrics from this song:

“♪ If you want to destroy my sweater ♪ Pull this thread as I walk away ♪ As I walk away! ♪”

 

Future work — DIY automated ugly christmas sweater?

in order to really make an ugly christmas sweater though, we can’t just be doing tubes all day e’ery day. So to do that you need to have two linear machines that interleave with each other. And to do that you first have to have one linear machine…

So some research (from above friend) on this.

The commercial linear ones are around $250 — $500. www.amazon.com/Knitting-Machine-Stitches-Domestic-Accessories/dp/B09KG8X6XT

In terms of DIY — This is a circular one (perhaps among the best?)

https://www.printables.com/model/355228-circular-sock-knitting-machine-for-my-mom-and-you

But no linear DIY / OSHW ones exist. So, maybe tbd?