AMIA 2024 – It’s a house of cards 0: the LLMs are supervising the LLMs (W05) Empowering Healthcare with Knowledge-Augmented LLMs – Innovations and Applications

Hullo! Long time no see, actual blog posts on career / tech oriented topics.

Today I’m attending the American Medical Informatics Association 2024 Annual Symposium in San Francisco. My first fully-funded work trip, from flight to food (per-diem) to conference registration and hotels 🙂

For now, since time is short between sessions, I’ll focus on my own reflections. Later I can go back and (especially with the slides as reference) fill in a synopsis of (what I learned) from the workshop.

The first session of the day was a workshop on LLMs. I thought this might be more of a “work” shop but it was actually mostly presentations of recent papers by several speakers, and a panel. This is for the best since there were several dozen attendees. I’m actually really enjoying myself regardless. I am getting dragged from the dark ages to the new frontier of LLMs, and it’s exciting to see an entirely new field of research as well as application. Here I’m entirely anonymous, a blank slate. Mentally, I can come in with few preconceptions or emotional ties.

Note to self: get to conference early in order to avoid badge line, though it did move quick. Also exit early to get the snacks (nutri-grain bars and such).

So what does the frontier of LLMs look like? From my outsider perspective, it looks like a crazy house of cards to be honest, with LLMs on LLMs on LLMs. To be fair, I walked in (late, I got absorbed into ironing lol) and the first presentation included self-verification, with LLM-LLM checking compared to human-LLM checking. Cue visions of runaway AIs merrily herd-guiding themselves into complete ethical chaos. A lot of my work I doubted as being stuff I made up hackishly as a feral coding child that didn’t have the engineering chops to implement a better solution. Surely the experts out there have far better methods. But nope! Indeed even for RAG (retrieval-augmented generation, like a librarian retrieving specific references for you, the LLM, to consider) the presenters are relying on LLMs (LangChain) to create the chunks to feed into the LLM.

Wat.

Maybe this is a solved problem and I don’t know it? Hopefully later workshops will reduce my skepticism since many people seem to just take it as part of the cog. I mean, if it gets you presentations and workshops … well … I guess you don’t have to innovate on every single part of a system to get a publication and move the field forward. Makes sense.

Also: I do have to consider that humans make plenty of mistakes too. Can we compare to self-driving cars, which (last I checked) have lower accident rates in several categories than humans? (Though I need to check if this is simply because self-driving cars currently drive in easier conditions over fewer miles than humans.) It’s entirely possible LLMs may be inaccurate but still better than humans. They incorporate a larger information base. For instance, humans may make mistakes locally: in a MITOC seminar I attended, the instructors were not sure of the signs of frostbite on dark skin as opposed to fair skin and had to look up the information. For LLMs, likely the information already “exists.”

Should we demand higher performance for LLMs?

Unfortunately, LLMs have hallucinations and other errors that deviate from human intuition. (Consider how LLMs struggle to answer “how many r’s are in strawberry.” The conjecture is that LLMs are trained at the token level (chunks closer to words or word pieces) rather than at the character level, and contain no sense of logical reasoning since they are trained to predict the next token, so they will struggle on such questions. However, this is clearly neither intuitive nor expected for the vast majority of users.) I view it as drawing a “bubble” of stability/robustness (as in robotics) that we can predict well with humans. But we have little intuitive idea of the boundaries with LLMs.
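To make the strawberry conjecture concrete, here is a minimal sketch of what "seeing tokens instead of characters" could look like. The vocabulary and the greedy longest-match splitter are both made up for illustration (real tokenizers learn their pieces from data and will split words differently), so treat this as a toy, not as how any particular model works.

```python
# Hypothetical subword vocabulary, invented for illustration only;
# real tokenizers learn tens of thousands of pieces from data.
TOY_VOCAB = {
    "straw": 101, "berry": 102,
    "s": 1, "t": 2, "r": 3, "a": 4, "w": 5, "b": 6, "e": 7, "y": 8,
}


def toy_tokenize(word, vocab=TOY_VOCAB):
    """Greedy longest-match split of `word` into pieces found in `vocab`."""
    pieces = []
    start = 0
    while start < len(word):
        # Try the longest remaining substring first, then shrink.
        for end in range(len(word), start, -1):
            piece = word[start:end]
            if piece in vocab:
                pieces.append(piece)
                start = end
                break
        else:
            raise ValueError(f"no vocabulary piece matches {word[start:]!r}")
    return pieces


word = "strawberry"
pieces = toy_tokenize(word)
token_ids = [TOY_VOCAB[p] for p in pieces]

print(pieces)           # the pieces a token-level consumer would "see"
print(token_ids)        # the opaque integer IDs behind those pieces
print(word.count("r"))  # the character-level answer, trivial in plain code
```

In this toy setup the word arrives as two opaque IDs, and nothing in those IDs says how many r’s are inside each piece, while ordinary string code answers the question in one call.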

From software and systems engineering, we have the concept of CI/CD, or continuous integration / continuous delivery (or deployment). Essentially, we write tests that are automatically run to monitor (production) code output. I could imagine something like the CAPTCHA system which I think relies on humans verifying other humans. We could imagine a system where a subset of predictions by either LLMs or physicians are sent for a “second opinion” by another physician (or LLM?) for quality monitoring.
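The “second opinion” idea could be sketched roughly as below: sample a fraction of cases at random, have a second reader label them, and track the agreement rate over time. The function names, the 30% review fraction, and the example labels are all my own assumptions for illustration, not anything presented at the workshop.

```python
import random


def sample_for_review(case_ids, fraction, seed=0):
    """Pick a reproducible random subset of cases to send for a second opinion."""
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    case_ids = list(case_ids)
    rng = random.Random(seed)  # fixed seed so the audit sample can be re-created
    k = round(len(case_ids) * fraction)
    return sorted(rng.sample(case_ids, k))


def agreement_rate(first_opinions, second_opinions):
    """Fraction of reviewed cases where the second reader agreed with the first."""
    reviewed = [cid for cid in second_opinions if cid in first_opinions]
    if not reviewed:
        return None  # nothing reviewed yet, so there is no rate to report
    agree = sum(first_opinions[cid] == second_opinions[cid] for cid in reviewed)
    return agree / len(reviewed)


# Made-up data: case id -> label from the first reader (LLM or physician).
first_opinions = {
    1: "negated", 2: "affirmed", 3: "affirmed", 4: "negated", 5: "affirmed",
    6: "negated", 7: "affirmed", 8: "affirmed", 9: "negated", 10: "affirmed",
}

to_review = sample_for_review(first_opinions.keys(), fraction=0.3)

# Stand-in for the second reader: agrees everywhere except on one sampled case.
second_opinions = {cid: first_opinions[cid] for cid in to_review}
second_opinions[to_review[0]] = "disagrees"

rate = agreement_rate(first_opinions, second_opinions)
print(to_review, rate)
```

A falling agreement rate would be the signal to look closer, much like a failing test in a CI pipeline; who counts as a trustworthy second reader is the hard part this sketch leaves open.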

The other approach is creating tests – creating the benchmarks, which apparently don’t exist for evaluating hallucinations. The panel at the end had speakers with many examples of how LLMs had failed, even at longstanding tasks such as checking for negation (e.g. patient did not receive xyz drug). I hope that there are efforts to crowdsource such examples. (A little voice in me is like: could a nefarious party use such a dataset? This voice has distracted me negatively in my career so I’ve decided to ignore it. I do believe the positive outcomes outweigh any negatives at this stage. Certainly for me personally…).
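A crowdsourced set of failure examples could be as simple as a list of test cases plus a harness that scores any model against them. Below is a minimal sketch for the negation task; the four example notes are invented by me, and `keyword_baseline` is a deliberately naive stand-in for whatever model (LLM or otherwise) you would actually plug in.

```python
# Hand-written, made-up test cases: (note text, drug, did the patient receive it?)
NEGATION_CASES = [
    ("Patient received aspirin.", "aspirin", True),
    ("Patient did not receive aspirin.", "aspirin", False),
    ("Patient denies taking warfarin.", "warfarin", False),
    ("Warfarin was started on admission.", "warfarin", True),
]


def keyword_baseline(note, drug):
    """Naive baseline: answers 'received' whenever the drug name appears at all."""
    return drug.lower() in note.lower()


def run_benchmark(predict, cases=NEGATION_CASES):
    """Run `predict(note, drug) -> bool` on every case; return accuracy and failures."""
    failures = [
        (note, drug)
        for note, drug, expected in cases
        if predict(note, drug) != expected
    ]
    accuracy = 1 - len(failures) / len(cases)
    return accuracy, failures


accuracy, failures = run_benchmark(keyword_baseline)
print(f"accuracy: {accuracy:.2f}")
for note, drug in failures:
    print("FAILED:", note)
```

The baseline ignores negation entirely, so it fails exactly the negated notes. Any function with the same `predict(note, drug)` shape can be swapped in, which is what would let a shared set of examples work as a regression test.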

I do need to read more in detail on how the LLMs of today are trained, since of course OpenAI faced similar issues with monitoring their chat output. They solved this in part by having humans read the output and answer multiple choice questions about the LLM output.

Another thought: a lot of hand wringing about how to ensure fairness across diverse patient populations. I know this is a technical talk, but part of me is like “oh easy if you actually put money on the line it will be solved.”

Another question I’ve yet to have answered: can LLMs be retrained (or finetuned) to embed a concept of uncertainty and self-evaluation of bias from the ground up? I am still skeptical of the original data sources LLMs are trained on, as well as the opaque safeguards implemented (though this does seem like a fun black hat activity now that I’ve realized it’s actually a security question). I suspect this is too resource-intensive for companies to find it worthwhile, alas. (Another ethical issue I have with working in the LLM space: I’m not working on climate change mitigation for sure, not even on research to make LLMs more efficient. Again going against my instinct and reminding myself to focus on core technical skills for now, as that’s what’s holding me back, not my varied social justice issues and ideas hah).

I am also curious if we have a sense of when hallucination and trust could be considered solved. If we have a clear vision of that, it’s possible we can work backwards.

All-in-all, it’s been really intellectually stimulating. I do feel a difference in my own confidence (arrogance) with my PhD. I wonder if this is how fresh grads with more confidence feel. I don’t feel a pressure to be perfect — to seize the opportunity to ask interesting questions all the time, to network obsessively. I can focus a bit more on just thinking things and doing happy things instead of being anxious about everything I’m not doing. It’s interesting to feel like “of course I will have interesting insights given my background and skill set, all the people who doubt me are wrong.” (Okay, it’s arrogance from a root of insecurity lol). Instead of acknowledging that everyone here will have really interesting insights and my task is to give space and respect people with different backgrounds and work to find these insights (or create them together).

Hopefully this will be reined in by (in the negative?) performance evaluations one day, in a good and constructive way. Or perhaps in the positive by more genuine confidence in my skill set that lets me be open to valuing and respecting others more.

Honestly, I spend so much time right now just reading novels or stressing about the future and relationships instead of just making cool coding projects. But I do feel like I’m starting to come out of that and start living in the present. It will just take time.

Time to sneak in a bit of lunch in 12 minutes? I skipped breakfast. I guess I’ll just live off of the calories from coffee.

as a woman: scared and angry. but still proud of harris

at times like these i see myself how the world sees me: a woman

i want to be angry at my friends somehow since they are the closest to me and i’m terrified.

but i also feel the enormity of running for president in the united states. just how many people have to put in work for how many hours for this uncertainty of a venture that is american politics. so many smart, bright, talented people.

and to be honest, i think harris did amazing. she started three months ago (!). what have i done in three months? barely anything hah.

in medford — 75% of people voted for harris. i’m not sure the turnout, i think around 70% across the state. so there’s probably around 1/5 people who actively vote for trump here. in somerville it’s closer to 88%. 1/10 people or less. half as many. it’s weird to put it that way. but there are the big trump flags here that make me feel like i’m in georgia.

i’m angry at my friends for irrational reasons, as if I would be any happier if our lives became devoted to politics. it is good to be older and wiser than 2016 me who fell off the rails. i need to remember that.

i wasn’t sure how to tell people here that i didn’t feel the same amount of energy and enthusiasm as in 2020, when we had the biden checks (heh) and free time to campaign (and presumably other college folks were home too). i really got invested / enthusiastic over the summer in 2016 and 2020. (well, i was in georgia both elections).

i can feel it: the bubble i live in, and the bubble my friends’ groups live in. we don’t associate with the other, we don’t welcome people having different opinions. because of fear and anger. as if somehow my friends misled me to be optimistic. that i was the one lulled since the people around me were feeling secure. herd thinking.

my friend told me today: “i don’t know anyone who was enthusiastic about hillary clinton,” and it was interesting to me — in georgia i didn’t know anyone who was enthusiastic about sanders … bubbles in bubbles. sanders was a rude shock to me in the middle of my excitement, and my deep dive into how the republican smears worked. to be honest — i was wildly enthusiastic about hillary clinton haha. i am saddened by people who want to secede — as if the millions of democratic voters in georgia mean nothing. we delivered two democratic senators — for the tie breaking senate — in 2020! warnock and ossoff. i’m still so proud of georgia for that. and georgia voting for biden — what a moment.

but yes. i was just starting to get enthusiastic. the first female president of the united states! cat ladies! hella competent and holds her own in debates! it would be such a relief to feel like, we are still on track for climate change, for abortion access. three months was such a short time … i haven’t even gotten around to merch yet. i’m definitely going to though, the house three doors down has a giant trump sign. what’s the point of being a homeowner if i can’t have flags lol.

i’m trying to remember — wow, in my lifetime, i have voted for the first black president, and in just the past decade i have voted twice now for female candidates! HRC was: “the first woman to be listed as a presidential candidate in every state and territory” per wikipedia. the world is changing — and i am part of it.

i’m angry at my guy friends who don’t feel as scared as i do. who aren’t mad about roe v wade to the bone. i’m scared of dying in childbirth but it’s not their fault, i have to remember. i do see the loss now — i feel like the loss of roe v wade ripped away the option of building a home in georgia since i lost confidence in the medical care there. it’s not something abstract to me, it feels very much a part of my life and my life choices right now, in the short term (5 years).

but it’s not their fault. and if this helps me be more open about these topics and helps change minds, then that will be a positive result.

(things i’m mad about: trump being: a repeated molester, convicted felon, someone who literally tried to overthrow the government and started a riot in the capitol that killed people, suggested shooting protestors, threatened media freedom, etc. things i’m sad about: his policies are going to be worse for the middle class economy than harris’s)

(on top of the disgusting way it played out. that obama’s pick was blocked for months, and then trump’s pick got squished in in a week. ugh.)

and the global politics. (ah, don’t be mad at my friends who were so mad at the democratic party lately. it’s because i’m triggered from 2016 when i was way less emotionally competent lol).

i am still part of the world changing. focus instead: what can i do? the positives: abortion access is being enshrined in state constitutions and can continue to be enshrined. i can focus on becoming rich and powerful, and using my knowledge and skills to create climate change solutions. it feels weird. as a kid i was laser-focused on international development and the UN millennium development goals. maybe i need to travel more again. but now i feel the pressure of climate change, of how it’s killing people every day.

ah. i really did want to spend the next 1-2 years focused on pure technical achievement. i feel like i did my karmic duty in my former relationship and in my thesis topic and i was ready to put all the naysayers (including myself) in the dust. i was so excited to not think about any social issues for a while and just grind on technical content and build cool things. notch in and checkpoint a bunch of career wins and a sweet salary. so, so excited and almost feeling confident and purposeful.

now i’ve dropped back into the 2016 hellhole, 2016-2020 were awful. i think that’s what i’m fearing … i have to remind myself. i’m a different person now. i have a lot more tools at my disposal.

should i choose, i can still practice amnesia and indifference. many people do. i did my duty in 2020. i can say, i do not have more to give right now, grind on making lots of money, and then have all the contributions in 2026 and 2028 🙂

i’m looking forward to it !! yes, i have learned through fire about how being laser focused on the negatives can overwhelm. it does not mean my fellow americans are trumpian, even if now the system will teeter into autocracy. it means they are human. focus on the positives. multiple female candidates. multiple wins for abortion access and a sense that other people care. we can do this. we can build a better future; i will help build a better future.

and i guess i live in MA for now …

but first … sleep and getting outdoors

2024: elections, corn mazes, life

ah, to live in 2024 when one candidate is a businessman, liar, convicted molester, tried to overthrow the government, is twice-impeached, and is a 78-year-old. and the other is an attorney general with conviction about right and wrong and the importance of telling the truth. and it’s a toss-up.

Scary times. i don’t know that my friends here in MA really feel it.

I held an election party to understand the local ballot issues in MA as I’m voting here now. I haven’t really looked into the candidates yet, nor the Medford specific issues.

i went to a corn maze this past weekend with my owlhouse roommates (the place i moved out of). it was a good trip.

the weather was crisp (the 50s), the corn was tall, we got lost, we found the exit, we went to find the bridges, then we rescued a crying kid who had lost their parents. i guess as a kid the corn maze can be quite scary. but from our perspective i found it funny to yell “XYZ, we have your child!” In the end we brought the child to the maze entrance/exit to the surprise of apparently other adults in the group. all was well

i do think about how intensely the young can feel emotions. there is something beautiful and tragic in it. sitting in a church building probably hundreds of years old, with a spectrum from young kids to senior folks, hearing them light candles and tell us what they have been sad or happy about this past week. lurking in the back. i teared up and caught off guard i couldn’t stem it. these sharp, true, vulnerable emotions, alone in my headspace since no one there knew me.

i have dedicated the next while to enjoying tech again. not the applications, not the opportunity cost and the meta of what i could dedicate time to, ignoring the wider world, just the fun of crafting and solving purely technical problems. i’m not quite in the right headspace yet. but moving closer each week

projects blog (nouyang)