FASTA files for BRCA1, BRCA2, SMA, KIR, MHC (bash scripting Entrez Direct)

I needed to create some FASTA files of important regions of the human genome for work. BRCA1 and BRCA2 are important in breast cancer research, the SMA region is implicated in spinal muscular atrophy, and KIR and MHC are important for your immune system (e.g. they are why organ donors have to be “compatible”). I wrote a bash script that uses Entrez Direct to automatically download these files from the NCBI servers.

FASTA files

If you just want the FASTA files to play with, they are here.

They were downloaded from the NCBI website and are based on the NCBI Gene database coordinates against hg38.

source code

https://github.com/nouyang-curoverse/GA4GH_regions

This repository contains a link to download the final FASTA files, the ids.csv file I used as the master list of “mhc” and “kir” genes, the bash script file, and the xsl “xml transform” file I used to extract the information I needed from the xml file.

what i learned

Hey, what’s a gene anyway? If you use various databases (ensembl, ncbi, genenames, lumc, omim, lrg), you’ll get a whole range of coordinates for the same gene.

Take, for instance, BRCA2:

Start       End         Length  Source
32,889,611  32,974,403  84,792  ensembl
32,884,617  32,975,809  91,192  lrg
32,889,617  32,973,809  84,192  ncbi entrez
32,889,616  32,973,808  84,192  OMIM
32,890,598  32,972,907  82,309  COSMIC sanger
32,889,641  32,907,422  17,781  CGAP
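
Every Length in that table is just End minus Start, so it is easy to put a number on how much the sources disagree. Here is a quick sketch in JavaScript that recomputes the lengths and the spread of the start and end positions. The array only restates the table; the function and variable names are made up for this example.

```javascript
// Coordinates copied from the BRCA2 table above.
var brca2 = [
  { source: 'ensembl',       start: 32889611, end: 32974403 },
  { source: 'lrg',           start: 32884617, end: 32975809 },
  { source: 'ncbi entrez',   start: 32889617, end: 32973809 },
  { source: 'OMIM',          start: 32889616, end: 32973808 },
  { source: 'COSMIC sanger', start: 32890598, end: 32972907 },
  { source: 'CGAP',          start: 32889641, end: 32907422 }
];

// Length as used in the table: end minus start.
function regionLength(region) {
  return region.end - region.start;
}

// Smallest and largest value of one field across all sources.
function range(regions, field) {
  var values = regions.map(function (r) { return r[field]; });
  return { min: Math.min.apply(null, values), max: Math.max.apply(null, values) };
}

brca2.forEach(function (r) {
  console.log(r.source + ': ' + regionLength(r));
});

var starts = range(brca2, 'start');
var ends = range(brca2, 'end');
console.log('start positions differ by up to ' + (starts.max - starts.min) + ' bases');
console.log('end positions differ by up to ' + (ends.max - ends.min) + ' bases');
```

The start positions agree to within a few thousand bases, while the end positions are much further apart because of the short CGAP entry.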

 

In the end I standardized around using the NCBI Gene database and ignored the rest.

Hey, what’s the KIR gene region anyway? Turns out there’s a gazillion KIR genes, and there’s not exactly a “list” of them. Same for HLA. I just used my best human judgement and culled them from searches on NCBI Gene. From the git repo Readme:

For the HLA genes, I used the IMGT list of gene names, found at ftp://ftp.ebi.ac.uk/pub/databases/ipd/imgt/hla/fasta/ . For the KIR genes, I used an NCBI Gene query (at http://www.ncbi.nlm.nih.gov/gene) to list all the KIR genes: "Homo sapiens"[porgn] AND KIR

The final list is in ids.csv.
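
The KIR query quoted above can also be sent to NCBI without going through the web page. Entrez Direct is a command-line wrapper around NCBI's E-utilities, so as a rough sketch (in JavaScript, and not what the script in the repo does) here is how the same search term could be turned into an E-utilities esearch URL. The function name and the retmax value of 200 are assumptions made for this example.

```javascript
// Base URL of NCBI's E-utilities search endpoint.
var ESEARCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi';

// Build an esearch URL against the Gene database.
// ASSUMPTION: retmax = 200 is an arbitrary illustrative cap, not a value from the repo.
function geneSearchUrl(term, retmax) {
  var params = new URLSearchParams({
    db: 'gene',
    term: term,
    retmode: 'json',
    retmax: String(retmax || 200)
  });
  return ESEARCH + '?' + params.toString();
}

var kirUrl = geneSearchUrl('"Homo sapiens"[porgn] AND KIR');
console.log(kirUrl);
```

The search only returns candidate gene ids, so the results would still need the same by-hand culling described above before they go into ids.csv.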

The manual process was to download the FASTA file from the NCBI Gene database.

Screenshot from 2015-01-24 23:47:21

 

But for some genes, namely the KIR and HLA genes, each gene (e.g. just HLA-E) has many “alternate loci” versions.

Screenshot from 2015-01-24 23:49:03

Downloading these by hand would take days. Therefore, I needed to figure out a way to script this. My initial thought was to write a scraper using Google Sheet Scripts or a Python library like Beautiful Soup. However, I thought this was dumb, because the NCBI Gene site is clearly the front end to some database that hopefully has an API for programmatic access of some kind.

After a few days (seriously, this all took way longer than I thought it would) and with help from the biostars community, I was able to figure out how to use Entrez Direct and write a bash script to automate the process for all the genes.

(It has a hacky fix, where the bash script needs to be run again for each line in the file, with the number of runs hardcoded:

for run in {1..48} #change this!

I was too lazy to debug it that day and just wanted to get this finally done.)
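
The hardcoded 48 is presumably just the number of lines in the gene list, so the count could come from the file itself instead. A minimal sketch of that idea in JavaScript (not the bash in the repo): it assumes each line of ids.csv is a gene name and an NCBI gene id separated by a comma. That column layout is an assumption for illustration, so check it against the real file. The output file names follow the name-and-id convention used for the final FASTA files (e.g. BRCA1-672.fa).

```javascript
// Turn the text of a gene list into one entry per non-empty line.
// ASSUMPTION: each line looks like "NAME,GENEID" (e.g. "BRCA1,672").
function parseGeneList(csvText) {
  return csvText
    .split(/\r?\n/)
    .map(function (line) { return line.trim(); })
    .filter(function (line) { return line.length > 0; })
    .map(function (line) {
      var fields = line.split(',');
      if (fields.length < 2) {
        throw new Error('expected NAME,GENEID but got: ' + line);
      }
      var name = fields[0].trim();
      var id = fields[1].trim();
      // FASTA files are named by gene name and NCBI gene id.
      return { name: name, id: id, filename: name + '-' + id + '.fa' };
    });
}

// Two-line stand-in for the real list; the loop length comes from the data.
var genes = parseGeneList('BRCA1,672\nBRCA2,675\n');
genes.forEach(function (gene, i) {
  console.log('run ' + (i + 1) + ' of ' + genes.length + ': ' + gene.filename);
});
```

With this shape, adding or removing a gene from the list changes the number of runs automatically.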

 

The source code is provided here, and contains reasonable comments and spits out some debugging info when you run it.

That’s all.

Leave a comment if you have questions 🙂

appendix

Emailed from me to ga4gh-dwg :

With help from biostars*, I finished pulling out the KIR and HLA regions and fixed the SMA region.

The updated collection may be publicly viewed at
https://workbench.qr1hi.arvadosapi.com/collections/download/qr1hi-4zz18-7zk4muy5grnaqpv/4qji0cfumh25dttlwteo6rj2b83z2b8vz1l0rja3uzo82bf3s/

The updated collection README and scripts are at
https://github.com/nouyang-curoverse/GA4GH_regions (the FASTA files are named by gene name and ncbi gene id, e.g. BRCA1-672.fa)

Of note, I did not mirror the IMGT HLA contents (ftp://ftp.ebi.ac.uk/pub/databases/ipd/imgt/hla/fasta/), even though that was requested on the minutes from the DWG meeting, due to their policy https://www.ebi.ac.uk/ipd/imgt/hla/licence.html .

Feedback appreciated!

Thanks,
–Nancy

*credit to https://www.biostars.org/p/122680/ and https://www.biostars.org/p/122522/

Bootstrap Carousel Tutorial: getting started, then customizing — example code & bugs

Recently I had “great fun” modifying bootstrap v3’s image gallery slideshow thingy, Carousel (click here to see an interactive example).

It has some nice features built-in, like indicators for what slide you are on, auto-scaling so it works on any browser window size, and an easy way to select options like cycling the carousel continuously, changing the interval between auto-slides, or reacting to keyboard events.

I made some modifications to get it to look and behave the way I wanted. Namely, I wanted readable captions underneath the images. Bootstrap has the captions overlaid on the image by default, and white-on-white doesn’t work so well.

I mostly succeeded… here’s a screenshot:

Screenshot from 2015-01-21 15:36:57
Live at doc.arvados.org someday soon…

…although I still dislike how verbose the Carousel code is. Oh well, our docs site was already using Bootstrap.

The how-to on bootstrap’s page is pretty terse and kind of confusing for me, so I’m writing up a more detailed explanation here.

stuff needed

From http://getbootstrap.com/getting-started/, download the zip file. You’ll need

1. css/bootstrap.css (you can use the .min version if you’d like, which is just compressed into non-human-readable form)

2. js/bootstrap.js (you can also get away with just carousel.js).

You’ll also need jquery.js from http://jquery.com/download/. Try the link “Download the compressed, production jQuery 1.11.2”.

Note: We’ll include the javascript files at the end of HTML so most of the page loads quicker.

how to

Alright, now how do you include all these files? (see this gist for the full HTML file, or download a self-contained zip file of all the files: bootstrap-carousel-example).

Put the CSS at the top, either by linking to a .css file or including the CSS directly:

<head>
  <link href="./css/bootstrap.css" rel="stylesheet">
  <style>
    /* put your custom CSS here if you'd like, for instance... */
    .carousel {
      margin: 1em;
    }
  </style>
</head>

Inside the body section, put your Carousel HTML. It’s kind of long, so just check out the gist. Put the javascript at the bottom, either by linking to a .js file or else including it directly:

<script src="./js/jquery.min.js"></script>
<script src="./js/bootstrap.min.js"></script>
<script type="text/javascript">
// Put your custom javascript options here, for instance...
$(document).ready(function() {
    // interval is in milliseconds
    $('.carousel').carousel({interval: 100});
});
</script>
</body>

</html>
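
Since the Carousel HTML is long and repetitive, it can help to see its skeleton. The sketch below builds Bootstrap 3 carousel markup from a list of slides, so the repeated parts (one indicator and one item per slide, plus the two controls) are easy to spot. It is written in JavaScript only so it can be run and poked at; the function name, image paths and captions are made up, and the gist remains the real reference for the markup I used.

```javascript
// Build Bootstrap 3 carousel markup from a list of slides.
// Each slide is { src, alt, caption }. Values are inserted as-is (no escaping),
// so only pass trusted strings. The id and slide data below are illustrative.
function carouselHtml(id, slides) {
  var indicators = slides.map(function (slide, i) {
    return '    <li data-target="#' + id + '" data-slide-to="' + i + '"' +
           (i === 0 ? ' class="active"' : '') + '></li>';
  });
  var items = slides.map(function (slide, i) {
    return [
      '    <div class="item' + (i === 0 ? ' active' : '') + '">',
      '      <img src="' + slide.src + '" alt="' + slide.alt + '">',
      '      <div class="carousel-caption">' + slide.caption + '</div>',
      '    </div>'
    ].join('\n');
  });
  return [
    '<div id="' + id + '" class="carousel slide" data-ride="carousel">',
    '  <ol class="carousel-indicators">',
    indicators.join('\n'),
    '  </ol>',
    '  <div class="carousel-inner" role="listbox">',
    items.join('\n'),
    '  </div>',
    '  <a class="left carousel-control" href="#' + id + '" role="button" data-slide="prev">',
    '    <span class="glyphicon glyphicon-chevron-left" aria-hidden="true"></span>',
    '  </a>',
    '  <a class="right carousel-control" href="#' + id + '" role="button" data-slide="next">',
    '    <span class="glyphicon glyphicon-chevron-right" aria-hidden="true"></span>',
    '  </a>',
    '</div>'
  ].join('\n');
}

var html = carouselHtml('carousel-example-generic', [
  { src: 'img/one.png', alt: 'First slide', caption: 'First caption' },
  { src: 'img/two.png', alt: 'Second slide', caption: 'Second caption' }
]);
console.log(html);
```

Exactly one indicator and one item carry the "active" class; without an active item the carousel has nothing to show at first.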

step 3: profit?

If you load the html file in your browser, magic should now happen.

For reference, you can download this zip file where everything is set-up already: bootstrap-carousel-example

step 4: customize

What if we want the images to cycle faster, or the images to not cycle at all?

Okay, I’ve already shown how you can select some of the built-in bootstrap options using javascript:

$(document).ready(function() {
    $('.carousel').carousel({interval: 100});
});

You can select individual carousels on a page by giving the carousel an id

<div id="carousel-keyfeatures" class="carousel slide" data-interval="false">

and then using a more specific jquery call

$(document).ready(function() {
    // '#' selects by id; '.carousel-keyfeatures' would look for a class instead
    $('#carousel-keyfeatures').carousel({interval: 100});
});

Alternatively, if you want some option to apply to all carousels on a page / site, you can pull this code out into a separate file

./js/carousel-override.js

and include it in each HTML file

<script src="./js/carousel-override.js"></script>

A third way to change the options on a carousel is to include them in the HTML as data attributes, like so:

<div id="carousel-example-generic" class="carousel slide" data-ride="carousel" data-interval="false">
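
To pull the options together: Bootstrap 3's carousel plugin documents four of them (interval, pause, wrap and keyboard), and each can be set either in the javascript call or as a data attribute (data-interval, data-pause, data-wrap, data-keyboard). Here is a sketch with all four spelled out. The values and the helper name initCarousel are just for illustration, and the id is the one from the example above.

```javascript
// The four options Bootstrap 3's carousel plugin accepts; values are illustrative.
var carouselOptions = {
  interval: 4000,   // milliseconds between automatic slides; false turns auto-cycling off
  pause: 'hover',   // pause cycling while the mouse is over the carousel
  wrap: true,       // go back to the first slide after the last one
  keyboard: true    // react to keyboard events
};

// Apply options to whichever carousels match the selector.
// $ is passed in so the function does not depend on a global.
function initCarousel($, selector, options) {
  return $(selector).carousel(options);
}

// Only runs in a page where jQuery (and Bootstrap) have been loaded.
if (typeof jQuery !== 'undefined') {
  jQuery(document).ready(function () {
    initCarousel(jQuery, '#carousel-keyfeatures', carouselOptions);
  });
}
```

Passing '.carousel' as the selector instead would apply the same options to every carousel on the page.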

step 5: sweeter customizations!

Now, to make more custom modifications, I had to fiddle with the CSS. A few hours later, I made the modifications you can see at this gist. Alternatively, download the zip file and uncomment the line

<link href="./css/carousel-override.css" rel="stylesheet">

trouble-shooting tips (bugs!)

First, cool tools:

1) Use Firefox or Chrome’s powerful built-in developer tools. The default shortcut to bring them up is ctrl-shift-c (you actually can’t get rid of or change this shortcut easily, which is annoying).

You can delete things, edit things without refreshing the page or modifying files, and do other wonderful things.

Screenshot from 2015-01-21 16:44:34
Screenshot of Firefox’s “Inspector” tool in use.

2) Use a Javascript hint tool, such as http://jshint.com/.

If you just use the Console tool in Firefox Inspector to see the javascript error messages, they are fairly cryptic.

SyntaxError: missing ) after argument list

Tools like JSHint can provide some more helpful feedback about what went wrong.
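
If you only want to know whether a snippet parses at all, you can also ask the javascript engine directly. This is a small sketch, not a replacement for JSHint: new Function() compiles the text without running it, so it catches syntax errors even when jQuery isn't loaded. The helper name parses is made up for this example.

```javascript
// Return true if `source` parses as JavaScript.
// new Function() only compiles the text; it never runs it,
// so it does not matter that $ is undefined here.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) {
      console.log('SyntaxError: ' + e.message);
      return false;
    }
    throw e;
  }
}

console.log(parses("$('.carousel').carousel( interval: 2000 )"));   // false
console.log(parses("$('.carousel').carousel({ interval: 2000 })")); // true
```

It only tells you that something is wrong, not where, which is why a hint tool is still the more helpful option.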

bugs

these things tripped me up:

1) the Bootstrap website makes ( and { look very similar on my computer, so I skipped the curly brackets and used

$('.carousel').carousel( interval: 2000 )

instead of the correct

$('.carousel').carousel({ interval: 2000 })

UPDATE 03 April 2015

I filed a bug on github: https://github.com/twbs/bootstrap/issues/16225, or as follows:

Font in code blocks on http://getbootstrap.com/ does not distinguish { and ( clearly, can cause anguish for newcomers to javascript. Please remove the Courier New font fallback option. Just monospace works well.

In Detail

I was following the the example at http://getbootstrap.com/javascript/#carousel, and my brain removed the “extra parenthesis” I think, similar to how you don’t notice I doubled “the” in this sentence unless I point it out.

This is a screenshot of the bootstrap website “code” block font currently on my computer:
js-bootstrap-font

Please remove the courier new font fallback option. Just monospace works well.

 code, kbd, pre, samp {
   font-family: monospace;
 }

Results in the much-better

js-bootstrap-font-better

(Most open-source projects have nice websites but no issue tracker or contact info for the website or the documentation itself, only for the github repo around the actual code, and it’s not always clear where to file a bug…)

END UPDATE

2) I didn’t realize I needed to include the jquery file first, so I got cryptic messages like “$ is undefined”.

3) I spent a while debugging a weird gap that left the rightmost controls in midair. It occurred when the browser window was too big. Turns out I needed to hardcode a max-width, which I just hardcoded to be slightly smaller than the smallest image width I had.
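
For bug 2, a small check at the top of your own script can turn the cryptic “$ is undefined” into a message that says what to fix. This is a sketch with a made-up helper name; it relies only on the facts that jQuery defines a global jQuery object and that Bootstrap's carousel plugin registers itself as jQuery.fn.carousel.

```javascript
// List what is missing from a global object (window in a browser).
function missingDependencies(globalObject) {
  var missing = [];
  if (typeof globalObject.jQuery === 'undefined') {
    missing.push('jQuery (load jquery.js before bootstrap.js)');
  } else if (typeof globalObject.jQuery.fn.carousel === 'undefined') {
    missing.push('Bootstrap carousel plugin (load bootstrap.js or carousel.js after jQuery)');
  }
  return missing;
}

// In a browser this checks the real page; elsewhere it checks an empty object.
var problems = missingDependencies(typeof window !== 'undefined' ? window : {});
problems.forEach(function (problem) {
  console.log('Missing: ' + problem);
});
```

Put it in a script tag after the jquery and bootstrap script tags; if the console stays quiet, the load order is fine.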

appendix

Here are the files used in this tutorial again:

Gist

bootstrap-carousel-example.zip

i am a meat reducer (some musings on not-vegetarianism)

(no, meat-reducer is not actually a thing)

Over the last year, I’ve been toying with reducing my meat intake. Here are the causes and the successive iterations I went through.

chickenstareoff

i won’t spend money on meat, but i will eat meat going to waste

The initial deciding factor was founding a company with one other person who didn’t eat much meat. I didn’t like meat that much anyway (too many dry chicken breasts I’d shoved down to avoid food waste), so when buying company rations it was easier to stockpile food we would both eat (ramen, frozen cheese pizza, frozen bean burritos, cake mix cake, cheetos… I had such a terrible diet. YET SO GREAT).

His general philosophy was to avoid spending money on meat while still prioritizing not wasting food. This seemed agreeable to me, since I dislike food waste more than I dislike killing animals by a wide margin. Fish were deemed okay to make a complete diet easier, and so company celebrations were at sushi restaurants (dubious from an ecological standpoint, honestly, since over-fishing is a big problem).

The other major point was to not take a stand, to go in the opposite direction of evangelist vegans or vegetarians who believe they have the higher moral ground.

first dilemma

Chicken instant noodles uses chicken stock. But I dislike shrimp instant noodles. Is paying for chicken instant noodles okay?

I decided it was, because most places don’t stock vegetarian instant noodles.

second dilemma

I hosted a boatwarming bbq party. Was it okay to buy meat burgers and hotdogs for my friends, who no doubt would expect them to be there and might find eating veggie burgers weird and unfulfilling?

I decided it was, although I’ll probably try to avoid the situation in the future.

Apparently, at house parties in mixed-diet households, the vegan person might supply the vegan food and the omnivores buy the meat for everyone else.

third dilemma

When treating someone to lunch or dinner, is it okay to pay for their meat selections? We took our interns out to lunch several times.

I decided it was okay.

what options are there: few!

It was fun to go to restaurants and see what the vegetarian options were. They’re often not great. For instance, dim sum, that favorite of my group of friends. Good luck filling yourself up on anything vegetarian at dim sum. Crispy taro thing? Filled with meat inside. Long noodles? Secretly has shrimp embedded inside.

it’s hard to remember at first

I would catch myself ordering my go-to dishes at restaurants and then go “Derp.”

leather and down

Then the question comes up. What about spending money on animal products? Leather work boots are comfy and weatherproof. Down jackets are nice and warm. I decided spending money on these was okay.

it’s hard to not eat meat even when i’m not buying it

1) There’s a lot of free food around MIT, usually not-vegetarian

2) When I visit my parents or when they visit me, they cook me large quantities of meat (maybe I will make an exception and declare to them I am a meat reducer next time…)

It’s hard to explain your dietary preferences or avoid talking about them when you make them so complicated

I had to declare my food preferences for ordering company lunch. (My company, <15 people, is a mix of vegan, vegetarian, and omnivore). My first week at work, my parents had cooked me a large batch of meat, so I ate it every day. Additionally, there’s a decent amount leftover from company lunch and I try to divert everything from the trash, including the meat dishes.

I wasn’t sure how to explain that I’m sort-of vegetarian except where chicken noodles and food going to waste are concerned, since really, if you’re sort-of vegetarian, you’re not actually vegetarian.

what about deer killed by bow and arrow from a friend of a friend

I’m not joking, this actually came up. I decided that it was okay to eat this meat, since it was definitely not factory-farmed.

why?

enh, why not. A lot of my friends are into it, I don’t actually like meat that much, and factory farming is definitively terrible.

also it’s delicious

i really want an eggplant parmesan sandwich. SO DELICIOUSSS. and hummus and raisin bread. and saag paneer. mmm. quinoa, onions, and sweet potatoes. sweet potato pizza. nachos and cheese. microwaved frozen vegetarian dumplings.

these are all super-simple college cooking foods.

currently

At work, I sit halfway between the vegetarians and the omnivores and pick exclusively from the vegetarian side if I can reach it, but if there are leftovers I’ll eat them, meat and all.

I never buy meat at the grocery store nor at restaurants. If I’m treating someone, I’ll pay for their meat selection. If it’s free food, I’ll avoid eating meat unless I’m really hungry, since usually someone else will get around to eating it.

Currently, I am a meat-reducer.

 

projects blog (nouyang)