AMIA 2024 – [Official Day 1] quick thoughts

my brain is totally fried

right, saturday was workshops, where i was upskilling and connecting with people on linkedin and, sort of, with my coworkers

sunday — when i really started to dive into informatics. sunday was infinitely more exhausting, just way more people. interesting thoughts but no one to talk to about them, and it's not possible to go to all the sessions. but i did learn yesterday (saturday at WINE, the women .. informatics… networking thing) that you can swap between sessions during paper presentations, so i marked mine out carefully and have been going in and out of sessions.

sunday morning

morning i went to sessions on social media research. as usual i was late: insomnia, turning off my alarm, and i find it really hard to stop ironing once i start (but i'm also bad at it). but it was freeing that no one knew or cared if i showed up on time, since my lab group went to alcatraz without me … first half of the workshop was a bunch of paper presentations. afterward was group discussion. and at the end there was a coffee break, so you could stick around and talk to people.

i learned through a brief chat afterward: best practice is to contact a forum moderator and explain what research you want to do. if they don't respond, tell your university irb. the university irb will likely say it's exempt since it's social media research. there was a big prior discussion on whether this is reasonable, and an observation that junior faculty are encouraged to use social media if irb approvals are taking too long. also noted that irbs are overworked, so either have an inter-institutional irb focused on this, or perhaps have subject matter experts to call in on these topics. dataset release: also occurs, and people just run an off-the-shelf name stripper. (i stressed that names will slip through, but i guess… it's… fine?)

i went over my idea of how to flip the legal power dynamic: companies agree to users’ privacy policy when we agree to theirs. incentive: perhaps just that

keynote

the keynote in the early afternoon was a really good crash course on theoretical ethics. no longer felt so obscure and hard to understand. i should start watching more talks.

> differential privacy — the simplest example is randomized response. for a survey about behavior people might lie about, e.g. their social distancing habits, we can give people plausible deniability. tell them to flip a coin: if it's heads, tell the truth; if it's tails, flip the coin again: if it's heads, tell the truth, if it's tails, give the predetermined response. that default response is yes to illegal behavior, so you have plausible deniability. on the individual level, who knows; at the aggregate level, you can work out mathematically an estimate of the truth.

i’d heard of this technique long ago, but never realized it counted as a differential privacy algorithm.
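(a quick sketch to convince myself the aggregate estimate works — my own reconstruction of the protocol as quoted above, with made-up numbers, not anything from the talk:)

```python
# randomized response as described in the keynote quote: answer truthfully with
# probability 3/4, otherwise give the predetermined "yes".
import random

def randomized_response(truth: bool) -> bool:
    """one respondent's answer under the two-coin-flip protocol."""
    if random.random() < 0.5:      # first flip heads: tell the truth
        return truth
    if random.random() < 0.5:      # second flip heads: tell the truth
        return truth
    return True                    # second flip tails: predetermined "yes"

def estimate_true_rate(answers: list[bool]) -> float:
    """invert E[observed] = 0.75 * p + 0.25 to recover the population rate p."""
    observed = sum(answers) / len(answers)
    return (observed - 0.25) / 0.75

# simulate a population where 30% truly did the behavior
population = [random.random() < 0.3 for _ in range(100_000)]
answers = [randomized_response(t) for t in population]
print(estimate_true_rate(answers))  # lands near 0.3 at the aggregate level
```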

throw money at it, aka bug bounties, and concatenate models for more accuracy without losing fairness

bounty a: find a subgroup with different outcomes. bounty b: not only find a subgroup where outcomes differ for the current model, but also find a model with optimal performance on that subgroup.

then you can just concatenate models! you don't have to trade fairness for accuracy — it's a strict (theoretically proven) increase in accuracy: if the input is in subgroup z, use model B, otherwise use model A. downside: you don't know the full model complexity ahead of time, so you risk overfitting.
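a tiny sketch of the routing idea, with hypothetical `model_a`, `model_b`, and `in_subgroup_z` functions (my paraphrase of the talk, not their code):

```python
from typing import Callable

def make_combined_model(model_a: Callable, model_b: Callable,
                        in_subgroup_z: Callable) -> Callable:
    """defer to the bounty-winning specialist model B on subgroup z, else use model A."""
    def combined(x):
        return model_b(x) if in_subgroup_z(x) else model_a(x)
    return combined

# on subgroup z the combined model matches B (which beats A there); everywhere
# else it matches A, so overall accuracy can only go up -- the "strict increase"
```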

like bug bounties in traditional security: avoid adversarial interactions (propublica, facebook, compas) and shift to more collaborative ones.

(will insert rest of summary of talk tomorrow)

sunday afternoon

i flitted between sessions.

demos: people's screens were illegibly tiny, and i couldn't get my camera to zoom enough. next time i'm bringing binoculars or a phone camera binocular attachment. one presentation was mostly a webapp wrapped around an llm, but the enthusiasm was catchy. (the topic was connecting pregnant women to sources of info.) little practical tidbits, like how they tried conversation history but the llm tended to stop referring to primary sources and start reflecting user inaccuracies. that makes sense, since the user input will be the most recent input. the other was a recommendation for the LLM engineer's handbook. also, models can run on laptops (macbook m series i'm guessing?).

-> lost-in-the-middle: an issue with RAG where references at the beginning (ranked most relevant) and at the end (most recent, due to how llms are trained to predict the next token) get used heavily, but the ones in the middle are ignored.

epic: packed. the ten minutes i heard were a lot of “physicians usually hate changes, but they actually sometimes liked this!” a lot of focus on (a) we have so many users! and (b) they're happy with us! and not a lot of mention of hallucinations and integrity checking. i will have to find someone who went to the epic session to debrief me. otherwise, it matched my fears of industry: focused on putting gen AI into everything regardless of whether it's reasonable, and not really caring about unknown bugs.

~ epic: my own thoughts on ethics: i find there are multiple “choice” frames to approach this from. one: if it makes care 1% better for 1000 patients and 50% worse for one patient, is it worth it? two: if taking an extra month to implement some quality assurance avoids that one bad outcome, is the extra month worth it? three: what if it means we can provide care for one person we couldn’t see at all before due to lack of physician time? etc.

JAMA: i learned that physicians are exhausted and frankly rather scared of how AI is taking over. the FDA has approved over a thousand devices with AI components in the past year. there is work on a podcast to explain recent research to physicians.

JAMA – methods move fast: if you used the same methods as five years ago in a paper submitted today, the paper would be returned with tons of methods critiques.

child maltreatment public (state-level) policy analysis: how different state laws correlated with different pediatric outcomes.

they used LLMs to summarize a bunch of public policies: does the maltreatment definition include physical abuse? is reporting centralized (a hotline) or not? then factor analysis to find underlying patterns in the policies and build cohorts: mandated reporter? reporter training? then clustering to find specific factor combinations (few/moderate/high levels of training, or penalties, etc.). then look at outcome differences. (will insert screenshots later)
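roughly the pipeline i think they described — here's a toy version with sklearn and random placeholder data, just to pin down the shape of it (not their actual features or code):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# placeholder: one row per state, one column per yes/no policy question
# (e.g. "mandated reporter?", "reporter training required?")
X = rng.integers(0, 2, size=(50, 12)).astype(float)

# factor analysis: find underlying patterns across the policy questions
factors = FactorAnalysis(n_components=3, random_state=0).fit_transform(X)

# cluster states by their factor scores to get cohorts of factor combinations
cohorts = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(factors)
# outcome differences would then be compared across these cohorts
```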

policy -> difference-in-differences: check whether results still hold between 2019 and 2021, when policies were in place vs. not in place.
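(as i understand diff-in-diff, it's just comparing the change over time in policy states vs. non-policy states — toy numbers below, not from the talk:)

```python
# outcome rates before (2019) and after (2021); made-up placeholder values
treated_2019, treated_2021 = 4.1, 3.2   # states with the policy
control_2019, control_2021 = 4.0, 3.8   # states without it

did = (treated_2021 - treated_2019) - (control_2021 - control_2019)
print(did)  # -0.7: the outcome fell more where the policy was in place
```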

dinner / expo

they have free headshots! i desperately need one, but [tmi] i have a giant pimple right now :'( maybe that’s what AI is for lol. the expo … somehow i was completely uninterested. i think i felt rejected by my coworkers. i called a friend for moral support lol

personal thoughts

overall the conference is more diverse than i expected (which isn’t saying that much though tbh). i wonder how the stats compare to IROS/RSS/ICRA.

i need to remind myself i’m utterly new to this field and i’m not here presenting a paper or poster. of course, i will have to work harder to connect technically with people, and surface a whole different set of my interests than i usually nerd out about. people here are not excited about welding robots or maker faire or engineering education. nor are they nerding out about the latest theoretical advances and architectures and benchmark leaderboards. people here talk about weird interesting healthcare things.

i’ve only officially been to robotics conferences as a presenter

so there’s multiple factors at play here for my social sore thumb feeling.

mostly, i was really surprised when a fellow attendee who also went to the boston VA for many years declined to connect on linkedin. but i guess we exchanged conference app qr codes? in any case – the main thing i learned talking to her is that there is a huge disparity in resources across regional VAs. i thought it was a national organization but apparently not. we in boston have the resources to figure out how to get at data sets but researchers at other VA regional sites do not. even though the data sets are VA datasets! at least, this was her experience a decade ago. also, i'm generally feeling a bit surprised that my coworkers are not meeting up between sessions and discussing topics in real time constantly. i guess we are not really in the same lab. i gave up on organizing meals with my labmates and just randomly messaged people (around my career level) to find someone to go to dinner with. a person from WINE invited me to a dinner with a bunch of student volunteers for the conference. note: if you join the working group, there's a chance you can be invited to volunteer for the conference and have the registration fee waived (but travel, hotel, and food are not covered).

people expect postdocs to have a research idea and focus, so i need a different self-intro hook.

that makes sense. i’m not really working with a professor directly on a specific idea. as a result of my lack of topic, i need a different self blurb than “i do data science at the VA.” i think the only thing i’ve done of interest to folks is my thesis research topic.

alternatively, i could just not talk to anyone and do my own work. but oh, i felt a bit sad to not have people to compare notes with after the sessions. i really wanted those interesting and thought-provoking technical conversations i would have late at night in undergrad. (i never really got that in grad school except for classes, not research). i really did yesterday. but y’know, this is like one day into the field for me depending on how you count lol

grad school – i wish that after my leave of absence they'd invested in my success instead of throwing me in and asking me to perform better than other people while already handicapped. i learned so much in so little time from this conference!!

i guess for many of the older/senior folks, AMIA is like putzgiving, where people fly in every year and it’s a reunion.

AMIA 2024 – It’s a house of cards 0: the LLMs are supervising the LLMs (W05) Empowering Healthcare with Knowledge-Augmented LLMs – Innovations and Applications

Hullo! Long-time no see, actual blog posts on career / tech oriented topics.

Today I’m attending the American Medical Informatics Association 2024 Annual Symposium in San Francisco. My first fully-funded work trip, from flight to food (per-diem) to conference registration and hotels 🙂

For now, since time is short between sessions, I’ll focus on my own reflections. Later I can go back and (especially with the slides as reference) fill in a synopsis of (what I learned) from the workshop.

The first session of the day was a workshop on LLMs. I thought this might be more of a “work” shop but it was actually mostly presentations of recent papers by several speakers, and a panel. This is for the best since there were several dozen attendees. I’m actually really enjoying myself regardless. I am getting dragged from the dark ages to the new frontier of LLMs, and it’s exciting to see an entirely new field of research as well as application. Here I’m entirely anonymous, a blank slate. Mentally, I can come in with few preconceptions or emotional ties.

Note to self: get to conference early in order to avoid badge line, though it did move quick. Also exit early to get the snacks (nutri-grain bars and such).

So what does the frontier of LLMs look like? From my outsider perspective, it looks like a crazy house of cards to be honest, with LLMs on LLMs on LLMs. To be fair, I walked in (late, I got absorbed into ironing lol) and the first presentation included self-verification, with LLM-LLM checking compared to human-LLM checking. Cue visions of runaway AIs merrily herd-guiding themselves into complete ethical chaos. A lot of my work I had doubted as stuff I made up hackishly as a feral coding child who didn’t have the engineering chops to implement a better solution. Surely the experts out there have far better methods. But nope! Indeed, even for RAG (retrieval-augmented generation, like a librarian retrieving specific references for you, the LLM, to consider), the presenters are relying on LLMs (LangChain) to create the chunks to feed into the LLM.

Wat.

Maybe this is a solved problem and I don’t know it? Hopefully later workshops will reduce my skepticism, since many people seem to just take it as part of the machinery. I mean, if it gets you presentations and workshops … well … I guess you don’t have to innovate on every single part of a system to get a publication and move the field forward. Makes sense.

Also: I do have to consider that humans make plenty of mistakes too. Can we compare to self-driving cars, which (last I checked) have lower accident rates in several categories than humans? (Though I need to check whether this is simply because self-driving cars currently drive in easier conditions over fewer miles than humans.) It’s entirely possible LLMs may be inaccurate but still better than humans, since they incorporate a larger information base. For instance, humans may make a mistake locally (as in a MITOC seminar I attended) where, e.g., instructors were not sure of the signs of frostbite on dark skin versus fair skin and had to look up the information. For LLMs, that information likely already “exists.”

Should we demand higher performance from LLMs?

Unfortunately, LLMs have hallucinations and other errors that deviate from human intuition. (Consider how LLMs struggle to answer “how many r’s are in strawberry.” The conjecture is that LLMs are trained at the token level (closer to words than characters) rather than the character level, and contain no real sense of logical reasoning since they are trained to predict the next token, so they will struggle on these. However, this is neither intuitive nor expected for the vast majority of users.) I view it as drawing a “bubble” of stability/robustness (as in robotics) that we can predict well for humans. But we have little intuitive idea of the boundaries with LLMs.
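(To see what the token-level view looks like, here’s a tiny check I’d run, assuming the tiktoken library and its cl100k_base encoding; the exact splits depend on the model:)

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("strawberry")
pieces = [enc.decode_single_token_bytes(t).decode("utf-8") for t in tokens]
print(pieces)  # multi-character chunks, not letters, so "count the r's"
               # isn't a natural operation on what the model actually sees
```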

From software and systems engineering, we have the concept of CI/CD, or continuous integration / continuous delivery. Essentially, we write tests that are automatically run to monitor (production) code output. I could imagine something like the CAPTCHA system, which I think relies on humans verifying other humans. We could imagine a system where a subset of predictions by either LLMs or physicians is sent for a “second opinion” from another physician (or LLM?) for quality monitoring.
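As a sketch of what I mean (entirely hypothetical names and a made-up 5% audit rate, not anything presented at the workshop):

```python
import random
from dataclasses import dataclass

@dataclass
class Prediction:
    patient_id: str
    output: str
    source: str  # "llm" or "physician"

AUDIT_RATE = 0.05  # fraction of predictions routed for a second opinion

def needs_second_opinion(pred: Prediction) -> bool:
    """Flag a random subset of predictions for independent review."""
    return random.random() < AUDIT_RATE

def monitor(stream):
    """Yield the predictions that should go to a reviewer queue."""
    for pred in stream:
        if needs_second_opinion(pred):
            yield pred
```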

The other approach is creating tests – creating the benchmarks, which apparently don’t exist for evaluating hallucinations. The panel at the end had speakers with many examples of how LLMs had failed, even at longstanding tasks such as checking for negation (e.g. patient did not receive xyz drug). I hope that there are efforts to crowdsource such examples. (A little voice in me asks: could a nefarious party use such a dataset? This voice has distracted me negatively in my career, so I’ve decided to ignore it. I do believe the positive outcomes outweigh any negatives at this stage. Certainly for me personally…)

I do need to read more in detail on how the LLMs of today are trained, since of course OpenAI faced similar issues with monitoring their chat output. They solved this in part by having humans read the output and answer multiple choice questions about the LLM output.

Another thought: there is a lot of hand-wringing about how to ensure fairness across diverse patient populations. I know this is a technical talk, but part of me is like “oh, easy: if you actually put money on the line, it will be solved.”

Another question I’ve yet to have answered: can LLMs be retrained (or finetuned) to embed a concept of uncertainty and self-evaluation of bias from the ground up? I am still skeptical of the original data sources LLMs are trained on, as well as the opaque safeguards implemented (though this does seem like a fun black hat activity now that I realize it’s actually a security question). I suspect this is too resource-intensive for companies to find worthwhile, alas. (Another ethical issue I have with working in the LLM space: I’m not working on climate change mitigation, for sure, not even on research to make LLMs more efficient. Again going against my instinct and reminding myself to focus on core technical skills for now, as that’s what’s holding me back, not my varied social justice issues and ideas hah.)

I am also curious whether we have a sense of when hallucination and trust could be considered solved. If we have a clear vision of that, it’s possible we can work backwards.

All-in-all, it’s been really intellectually stimulating. I do feel a difference in my own confidence (arrogance) with my PhD. I wonder if this is how fresh grads with more confidence feel. I don’t feel a pressure to be perfect — to seize the opportunity to ask interesting questions all the time, to network obsessively. I can focus a bit more on just thinking things and doing happy things instead of being anxious about everything I’m not doing. It’s interesting to feel like “of course I will have interesting insights given my background and skill set, all the people who doubt me are wrong.” (Okay, it’s arrogance from a root of insecurity lol). Instead of acknowledging that everyone here will have really interesting insights and my task is to give space and respect people with different backgrounds and work to find these insights (or create them together).

Hopefully this will be reined in one day by (negative?) performance evaluations, in a good and constructive way. Or perhaps in the positive, by more genuine confidence in my skill set that lets me be open to valuing and respecting others more.

Honestly, I spend so much time right now just reading novels or stressing about the future and relationships instead of just making cool coding projects. But I do feel like I’m starting to come out of that and start living in the present. It will just take time.

Time to sneak in a bit of lunch in 12 minutes? I skipped breakfast. I guess I’ll just live off of the calories from coffee.

as a woman: scared and angry. but still proud of harris

at times like these i see myself how the world sees me: a woman

i want to be angry at my friends somehow since they are the closest to me and i’m terrified.

but i also feel the enormity of running for president in the united states. just how many people have to put in work for how many hours for this uncertainty of a venture that is american politics. so many smart, bright, talented people.

and to be honest, i think harris did amazing. she started three months ago (!). what have i done in three months? barely anything hah.

in medford — 75% of people voted for harris. i’m not sure of the turnout, i think around 70% across the state. so there’s probably around 1 in 5 people here who actively voted for trump. in somerville it’s closer to 88%, so 1 in 10 people or less. half as many. it’s weird to put it that way. but there are the big trump flags here that make me feel like i’m in georgia.

i’m angry at my friends for irrational reasons, as if I would be any happier if our lives became devoted to politics. it is good to be older and wiser than 2016 me who fell off the rails. i need to remember that.

i wasn’t sure how to tell people here that i didn’t feel the same amount of energy and enthusiasm as in 2020, when we had the biden checks (heh) and free time to campaign (and presumably other college folks were home too). i really got invested / enthusiastic over the summer in 2016 and 2020. (well, i was in georgia both elections).

i can feel it — the bubble i live in, and the bubbles my friends’ groups live in. we don’t associate with the other — we don’t welcome people having different opinions. because of fear and anger. as if somehow my friends misled me into being optimistic. as if i was the one lulled because the people around me were feeling secure. herd thinking.

my friend told me today: “i don’t know anyone who was enthusiastic about hillary clinton,” and it was interesting to me — in georgia i didn’t know anyone who was enthusiastic about sanders … bubbles in bubbles. sanders was a rude shock to me in the middle of my excitement, and my deep dive into how the republican smears worked. to be honest — i was wildly enthusiastic about hillary clinton haha. i am saddened by people who want to secede — as if the millions of democratic voters in georgia mean nothing. we delivered two democratic senators — for the tie breaking senate — in 2020! warnock and ossoff. i’m still so proud of georgia for that. and georgia voting for biden — what a moment.

but yes. i was just starting to get enthusiastic. the first female president of the united states! cat ladies! hella competent and holds her own in debates! it would be such a relief to feel like we are still on track for climate change, for abortion access. three months was such a short time … i haven’t even gotten around to merch yet. i’m definitely going to though; the house three doors down has a giant trump sign. what’s the point of being a homeowner if i can’t have flags lol.

i’m trying to remember — wow, in my lifetime, i have voted for the first black president, and in just the past decade i have voted twice now for female candidates! HRC was: “the first woman to be listed as a presidential candidate in every state and territory” per wikipedia. the world is changing — and i am part of it.

i’m angry at my guy friends who don’t feel as scared as i do. who aren’t mad about roe v wade to the bone. i’m scared of dying in childbirth but it’s not their fault, i have to remember. i do see the loss now — i feel like the loss of roe v wade ripped away the option of building a home in georgia since i lost confidence in the medical care there. it’s not something abstract to me, it feels very much a part of my life and my life choices right now, in the short term (5 years).

but it’s not their fault. and if this helps me be more open about these topics and helps change minds, then that will be a positive result.

(things i’m mad about: trump being: a repeated molester, convicted felon, someone who literally tried to overthrow the government and started a riot in the capitol that killed people, suggested shooting protestors, threatened media freedom, etc. things i’m sad about: his policies are going to be worse for the middle class economy than harris’s)

(on top of the disgusting way it played out. that obama’s pick was blocked for months, and then trump’s pick got squished in in a week. ugh.)

and the global politics. (ah, don’t be mad at my friends who were so mad at the democratic party lately. it’s because i’m triggered from 2016 when i was way less emotionally competent lol).

i am still part of the world changing. focus instead — what can i do? the positives — abortion access is being enshrined in state constitutions and can continue to be enshrined. i can focus on becoming rich and powerful — and using my knowledge and skills to create climate change solutions. it feels weird — as a kid i was laser-focused on international development and the UN millennium development goals. maybe i need to travel more again. but now i feel the pressure of climate change, of how it’s killing people every day.

ah. i really did want to spend the next 1-2 years focused on pure technical achievement. i feel like i did my karmic duty in my former relationship and in my thesis topic and i was ready to put all the naysayers (including myself) in the dust. i was so excited to not think about any social issues for a while and just grind on technical content and build cool things. notch in and checkpoint a bunch of career wins and a sweet salary. so, so excited and almost feeling confident and purposeful.

now i’ve dropped back into the 2016 hellhole, 2016-2020 were awful. i think that’s what i’m fearing … i have to remind myself. i’m a different person now. i have a lot more tools at my disposal.

should i choose, i can still practice amnesia and indifference. many people do. i did my duty in 2020. i can say, i do not have more to give right now, grind on making lots of money, and then have all the contributions in 2026 and 2028 🙂

i’m looking forward to it !! yes, i have learned through fire about how being laser focused on the negatives can overwhelm. it does not mean my fellow americans are trumpian, even if now the system will teeter into autocracy. it means they are human. focus on the positives. multiple female candidates. multiple wins for abortion access and a sense that other people care. we can do this. we can build a better future; i will help build a better future.

and i guess i live in MA for now …

but first … sleep and getting outdoors
