Category Archives: Thoughtful

ga4gh proposal. Beacon Versioning: Simple, Current, Future (three use cases)

Foreword

I am posting this on my blog because I spent a lot of effort on this email & there’s no reason for it to be buried inside a private mailing list.

if you, dear reader, are like most of my real-life friends and call biology “bi-lol-logy” — ignore this post, save your sanity, and come back to bioinformatics in a year. i think things will be much better then. in fact, i’m not even going to attempt to explain what’s going on here except to link to ga4gh: http://ga4gh.org/#/beacon.

otherwise… down the rabbit hole we go…

Email

Hi all,

I don’t want to stall momentum, since I care much more that a Beacon v0.2 happens than that a particular Beacon v0.2 happens, but as an engineer I’d also hate to see us be too hasty and make poor design choices.

Unfortunately it’s possible to describe a single variant in multiple ways in VCF

Yep, that’s concisely the problem with the state of the art.
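To make the ambiguity concrete, here is a minimal Python sketch. It assumes a toy reference, `ATTATAGAGAG`, and three made-up VCF-style records (1-based position, REF, ALT); the helper `apply_variant` is a hypothetical name used only for illustration. Each record is spelled differently, yet all three produce the same sequence because the inserted bases fall inside a repeat.

```python
def apply_variant(reference: str, pos: int, ref: str, alt: str) -> str:
    """Apply one VCF-style record (1-based pos, REF, ALT) to a reference string."""
    start = pos - 1
    if reference[start:start + len(ref)] != ref:
        raise ValueError(f"REF {ref!r} does not match the reference at position {pos}")
    return reference[:start] + alt + reference[start + len(ref):]


# Toy data (assumed for illustration, not taken from any real dataset).
REFERENCE = "ATTATAGAGAG"
records = [
    (6, "A", "AGA"),  # insert "GA" after position 6
    (7, "G", "GAG"),  # insert "AG" after position 7
    (9, "G", "GAG"),  # insert "AG" after position 9
]

# Three different records, one resulting sequence.
haplotypes = {apply_variant(REFERENCE, *record) for record in records}
print(haplotypes)  # {'ATTATAGAGAGAG'}
```

A Beacon that compares records field by field would treat these as three different variants, which is why the query semantics matter.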

From my perspective, there are three conflicting use cases and we’re trying to smush them into one Beacon/Server/Variants API spec, which may or may not be advisable.

USE CASES

1. Simple

You may only query for one position, limited to a precise string

  • “Does AAG exist at position 1” –> implicitly asking, does an insertion of “AG” exist between positions 1 and 2 on the reference genome
  • Vision is “painless way for organizations to visibly commit to wanting to share genetic data by adopting a single standard, i.e. GA4GH”
  • Sidestep genetic data privacy and security issues by trading [usefulness to research] for [painless adoption]
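As a rough sketch of what use case #1 could look like in Python: the function name `simple_beacon` and the toy genomes are assumptions made for illustration, and the sketch pretends every genome is already laid out in the same coordinates starting at position 1, which is a big simplification. The only thing that leaves the server is a yes or no.

```python
from typing import Iterable


def simple_beacon(genomes: Iterable[str], position: int, allele: str) -> bool:
    """Answer yes/no: does any genome contain exactly `allele` at 1-based `position`?"""
    start = position - 1
    return any(genome[start:start + len(allele)] == allele for genome in genomes)


# Toy population (assumed): one reference-like genome and one carrying an "AG" insertion.
TOY_GENOMES = ["ATTATAGAGAG", "AAGTTATAGAGAG"]

print(simple_beacon(TOY_GENOMES, 1, "AAG"))  # True
print(simple_beacon(TOY_GENOMES, 1, "ACG"))  # False
```

Nothing about which genome matched, or how many did, is returned, which is the trade of usefulness for painless adoption described above.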

2. Current

VCF-based

  • “Does an insertion of AG exist between reference coordinates 1 and 2”
  • Vision is “people share useful data for researching functional impact using the current industry-standard, VCF” — definitely better than silo-ed no-sharing world, but
    Source: Quote Investigator. Attribution: 1942 June 3, Florence Morning News, Mutt and Jeff Comic Strip, Page 7, Florence, South Carolina. (Newspaper Archive)
    • Lamppost = current standards, which sort of support population/functional impact research if you try really hard
    • Dark = hopefully the future, where it’s painless to query for things like frame-restoring indels
    • …I hope this lamppost analogy makes sense outside the confines of my brain…
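For use case #2, one way a VCF-style Beacon could cope with a variant being spelled several ways is to normalize every record (trim shared bases, shift left) before comparing. The Python below is a sketch under assumptions: `normalize` and `VcfBeacon` are hypothetical names, the reference is a toy string, and the trim-and-left-align loop is a common normalization approach rather than anything the Beacon spec prescribes. It does not handle a variant that shifts all the way to the first base.

```python
from typing import Iterable, Tuple

Record = Tuple[int, str, str]  # (1-based position, REF, ALT)


def normalize(reference: str, pos: int, ref: str, alt: str) -> Record:
    """Left-align and trim a record so equivalent spellings compare equal."""
    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both alleles one base to the left.
        if not ref or not alt:
            if pos == 1:
                raise ValueError("sketch does not handle variants at the first base")
            pos -= 1
            base = reference[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Drop shared leading bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


class VcfBeacon:
    """Toy Beacon over VCF-style records, compared after normalization."""

    def __init__(self, reference: str, records: Iterable[Record]) -> None:
        self.reference = reference
        self.known = {normalize(reference, *record) for record in records}

    def exists(self, pos: int, ref: str, alt: str) -> bool:
        return normalize(self.reference, pos, ref, alt) in self.known


beacon = VcfBeacon("ATTATAGAGAG", [(9, "G", "GAG")])
print(beacon.exists(6, "A", "AGA"))  # True: same insertion, different spelling
print(beacon.exists(1, "A", "AAG"))  # False: a different insertion
```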

3. Future

population-based / reference-free

  • “Does an insertion of AG between query coordinates 1 and 2 exist wherever the query ‘ATTATAGAGAG’ is best aligned on each genome in the population”
    • query string ‘ATTATAGAGAG’ used to locate position on genome
    • specific variant we’re looking for is AG, that is, we want to find genomes that say “AAGTTATAGAGAG” in the place where population-wide most genomes say “ATTATAGAGAG”
  • Vision is “future-oriented standard for developers to implement toward / iteratively develop”
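A very rough Python sketch of the reference-free query in use case #3 follows. The names `insert_between` and `population_beacon` and the toy genomes are assumptions for illustration. A real implementation would need approximate alignment to find where the query "best aligns" on each genome; this sketch swaps in an exact substring search, so it only shows the shape of the query, not a workable matching strategy.

```python
from typing import Iterable


def insert_between(query: str, left: int, insertion: str) -> str:
    """Insert `insertion` between 1-based query coordinates `left` and `left + 1`."""
    return query[:left] + insertion + query[left:]


def population_beacon(genomes: Iterable[str], query: str, left: int, insertion: str) -> bool:
    """Answer yes/no: does any genome carry the query context with the insertion applied?

    Assumption: exact substring search stands in for per-genome best alignment.
    """
    variant = insert_between(query, left, insertion)
    return any(variant in genome for genome in genomes)


# Toy population (assumed): the query context sits at a different offset in each genome.
population = ["CCATTATAGAGAGCC", "GGGAAGTTATAGAGAGT"]

print(insert_between("ATTATAGAGAG", 1, "AG"))                    # AAGTTATAGAGAG
print(population_beacon(population, "ATTATAGAGAG", 1, "AG"))      # True
print(population_beacon(population[:1], "ATTATAGAGAG", 1, "AG"))  # False
```

Note that no reference coordinate appears anywhere in the query: the context string itself locates the site on each genome.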

IN MY OPINION

My gut feeling is #3 is beyond the scope of Beacon v0.2 and we should be clear that Beacon v0.2 is meant to support the #2 use case.

My personal opinion is that Beacon v0.2 should actually be a standardization of use case #1, but it seems like I’m in the minority (if anyone else cares about #1, please speak up!).

FURTHER NOTES

With respect to, “+1 for consistency with other GA4GH APIs” —

My concern is that currently the GA4GH APIs are very VCF-oriented, and VCF is very reference-oriented and not very population-scale-oriented [1]. On the other hand, Beacon is population-oriented (no sense in having a Beacon to query two genomes, that doesn’t preserve anonymity at all).

My gut instinct is that the Variants API will move toward being population-oriented (reference-free). Consistency is very important, but I think we should be cautious about moving toward consistency with the Variants API in its old state. In fact, it’s already starting to reflect this shift:

“graph”, in which all variation is associated with `Allele`s which may participate in `Variants` or be called on their own. The “graph” mode is to be preferred in new client and server implementations.

[1] People are spending months merging VCF-based datasets and then indexing them with Tabix and wormtable, then they have to reindex for something as simple as querying a subset of the population … oh, I could go on, but I hope I’m preaching to the choir here. If not, I’d much appreciate knowing where I’m incorrect if you’d care to explain. I’m certainly not an expert in bioinformatics.

THANKS

Thanks Mark Fiume for taking the lead and Stephen Keenan for organizing Beacon work.

CARBON COPY?

I think more lists (specifically ga4gh schema & ga4gh server) need to be included in this discussion, or we need an “Issues” for all of GA4GH, or something. It’s getting very hard to keep tabs on Issues, some of which are closed, in three repositories at once. Or maybe I just need to “watch” and get email notifications on all three repos? How are people handling this crazy explosion of GA4GH work?

PUBLIC MAILING LIST

I also would note that I strongly prefer all ga4gh mailing lists be made public going forward. It’s really ridiculous to have people forward me emails from 3 different private mailing lists and link me to 10 issues on 3 repositories.

Although ga4gh-dwb-beacon is a private mailing list :/ I’m still emailing instead of opening a public Issue on Github because it keeps feeling like “my calls are dropping” and no one hears me…

other links

wow, the more i poke around on ga4gh github the more related links I see… here are some I need to read

https://github.com/ga4gh/schemas/pull/257/files

How your “Team” pictures influence my desire to even apply

Lately it’s the “startup thing” to put pictures of your team up on your website. Now, I don’t speak for all female engineers, but as a female engineer who’s kind of sensitive about these things, fairly or not, it’s an immediate turnoff to see pictures like this

Screenshot from 2015-02-25 17:59:40

Screenshot from 2015-02-25 17:59:36

It goes roughly like this:

  • I open my email.
  • Someone forwarded me an email. “Cool drone startup that’s looking to hire!”
  • I click the link and read about it, then somewhere along the way I see a picture of the Team.
  • I get irked and leave.

Sure, you all could be a bunch of egalitarian feminist dudes, and if I just go work for companies with a lot of females already I’m exacerbating the problem in some ways, but really, just kind of a turn-off.

If you care at all about getting a more diverse team, here are two simple solutions:

1) Just don’t post pictures of your all white-male founders / leadership / engineering team. No pictures are better, then I can’t form preconceptions (yes, I recognize the irony here) about your team. Also, the more people you have, the more I’ll look specifically for females in engineering leadership positions. Mixing in your female HR / support department does not help you.

2) Or, just put a simple statement to the effect that you’re aware that your team is very white and male and that you’re working on it.

That’s enough to let me know that you care, which is a big deal to me. Working in a place where no one cares about feminism or feminism is an awkward topic would make me bitter and unhappy (and I’d leave) within months. You’ll have to word your statement to overcome people’s jadedness (“yea, right, that’s probably just their HR talking.”) and show that your statement reflects your company culture.

Oh! Ladies, one thing I’ve discovered is that older guys are pretty alright. Something about marrying and having a family… My current co-workers are almost all older white males, but it’s in some ways a lot more comfortable than hanging out at MITERS, because feminism isn’t a dirty word or somehow less important than the latest in kilowatt lasers.

Today, I am a 41-year-old father and husband whose feelings on this issue have changed. I have come a long way since being a single, 26-year-old state senator, and I am not afraid to say that my position has evolved as my experiences have broadened, deepened and become more personal.

Congressman Tim Ryan

(Source: Rep. Dillon, Rep. Ryan)

p.s. This also goes for conferences… I’m looking at you, NERC.

nerc
nerc speakers

Do you trust the police? (my response to Patel)

 

patel
http://www.gofundme.com/m757pw

The tragic story of Sureshbhai Patel’s arrest and serious injury (partial paralysis) 1.5 weeks ago (on Feb 6th, 2015) resonates strongly with me for a few reasons, despite the fact that I am not Indian.

I am not here to discuss police brutality or white privilege or any other alienating terms that people “know their opinion on.” I am here to discuss my personal feelings and experiences.

Here is the dashcam video of Patel’s arrest and injury, should you wish to judge for yourself.

fifty-seven

 

Patel is 57, around the age of my parents. I love my parents a lot. Although 60 years old seemed crazy old to me a few years ago (when I was a teenager), my parents are in good health and kick my butt in getting stuff done, including physical chores.

grandparents leaning in

 

I am friends with a Chinese couple (the husband works in biotech and the wife is a Harvard grad student) in my area. They recently had a baby, and their parents flew in from China to help take care of the baby. (I’m not sure if this is a thing in European cultures, but I think it is common in Chinese cultures for the grandparents to take care of the children while the parents work. See here and here.) So immigrants bring this cultural mindset with them.

(By contrast, I’m an ABC, in other words I was born and grew up in the US, and find the thought of my parents taking care of my baby to be weird and unrealistic… Actually, I’ve instilled in myself a deep suspicion of babies, so the words “my baby” :s eww).

Like Patel,  my friends’ parents speak almost no English, but are by no means stupid. They (the grandparents) constantly invite me over for dinner and cook delicious foodstuffs for me, oh man, now I’m hungry. Sadly I can’t find any pictures of the food they cooked right now (possibly I don’t have any pictures). They always send me home with tupperwares of leftovers too. As a result, I’m quite fond of them, even if making conversation is a bit of a struggle due to language, cultural, and generational differences.

English as a Second Language

I’ve known from a friend’s experiences that simply not knowing English or not understanding it well enough / fast enough can get you arrested, even if you’ve spent decades in the US, worked your way up from being a dirt-poor immigrant, and now contribute to the US economy, pay your taxes, and are generally an upstanding citizen.

To get your record cleaned of an arrest when you did nothing wrong is still an easy few thousand dollars in lawyer’s fees. They did not benefit from police cams; they swept their arrest under the rug out of shame. Shame for what? Not being born in the US? Years later, it’s just something to be bitter about and not ever bring up again after dealing with it.

As a result, since a young age I’ve been open to the idea that our justice and policing system is not perfect.

Do I trust the police?  Unfortunately, instinctively I do not.

This is due to my aforementioned experiences.

I temper my gut feelings with the knowledge that I should look more into the research behind whether police are trustworthy, since my personal experiences will only ever constitute a part of the whole picture.

global comparison

For instance, having visited other countries, I am able to get some perspective and understand that the US is pretty high up there in terms of standards of governance. In 2008, the US ranked in the 90-100th percentile on control of corruption (a perception-based measure where higher is better):

http://en.wikipedia.org/wiki/Worldwide_Governance_Indicators

Or from the World Bank in 2013, with links to the methodology:

Interactive version: http://info.worldbank.org/governance/wgi/index.aspx#reports

Personally, I’ve visited several other countries where, in broad daylight, police officers will deliberately lead you (tourist in car = people with money) astray or ambush you, waste your time and make you sweat, and then threaten you with a ticket unless you realize what is going on and pay them a bribe.

However,

None of these facts excuse us from working together to build more just and trustworthy social systems. Our rule of law in the US may be good, on average, compared to the rest of the world. But you or I could easily be caught in one of the “outlier events” and lose thousands of dollars, our mobility, or our life. Averages are small comfort then.

Thanks for listening.