
MIT Course Majors Male/Female Ratio, Fall 2011 (scraping with Google Docs)

as seen at Stata

When I saw this in Stata a few days ago, I legitimately wondered for a few seconds which way it was color-coded, until I saw a note (cropped out in this pic), which made me smile. At MIT, and hopefully in the world at large, the imperfect gender ratios really aren’t that noticeable unless you’re looking for them. MIT is wonderful. (I say this now because I have no finals this term, only project classes that I loved, even if I didn’t do as well as I’d’ve liked.)

Anyway, there were no citations! So I decided to go off, investigate on my own, and treat it as a data visualization exercise (hi, upcoming IAP / startlabs / c2c / metrify).

EDIT: Derp! I data-failed. You can see that civil/environmental engineering (Course 1; see http://web.mit.edu/catalog/subjects.html if you’re confused by the numbers) is definitely off. :/ I messed up copying and pasting somewhere. I’ll redo this for realz one day (they have data for multiple years, and I want to clean it up automatically).
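Some copy-and-paste slips like this can be caught automatically with a row-level sanity check: a course can never have more female students than students overall, and every count should be a non-negative whole number. A minimal sketch, assuming the cleaned rows have already been read into plain objects whose field names (`femUG`, `allUG`, and so on) are hypothetical stand-ins for the "Fem UG", "All UG", ... column headers of the cleaned sheet. It only flags misalignments that produce impossible numbers; a shifted row with plausible numbers will still slip through.

```javascript
// Hypothetical row shape, mirroring the cleaned sheet's columns:
// { course, femUG, allUG, femGrad, allGrad, femTotal, allTotal }
const COUNT_PAIRS = [
  ["femUG", "allUG"],
  ["femGrad", "allGrad"],
  ["femTotal", "allTotal"],
];

// Returns a human-readable list of rows whose counts look impossible.
function findSuspiciousRows(rows) {
  const problems = [];
  for (const row of rows) {
    for (const [femKey, allKey] of COUNT_PAIRS) {
      const fem = row[femKey];
      const all = row[allKey];
      if (!Number.isInteger(fem) || !Number.isInteger(all) || fem < 0 || all < 0) {
        // Blank cells or stray text usually mean a column got shifted.
        problems.push(`course ${row.course}: ${femKey}/${allKey} are not non-negative integers`);
      } else if (fem > all) {
        problems.push(`course ${row.course}: ${femKey} (${fem}) > ${allKey} (${all})`);
      }
    }
  }
  return problems;
}
```

An empty result only means nothing is obviously impossible, not that the data is right.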

sorted by %
sorted by course number
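Both orderings can be reproduced from the cleaned counts. A minimal sketch, assuming one hypothetical `{ course, fem, all }` object per course and assuming "sorted by %" means percentage female (percentage male is the same order reversed). Courses with zero enrollment are dropped before sorting by percentage since they have no meaningful ratio, and course numbers are compared with a numeric-aware `Intl.Collator` so that 10 sorts after 9 and 6-1 before 6-2.

```javascript
// Hypothetical row shape: { course: "6-1", fem: 120, all: 400 }

// Percentage of students in a course who are female.
function percentFemale(row) {
  return (100 * row.fem) / row.all;
}

// Numeric collation: "10" sorts after "9", and "6-1" before "6-2".
const courseCollator = new Intl.Collator("en", { numeric: true });

// Highest percentage female first. filter() returns a new array,
// so sorting it leaves the caller's array untouched.
function sortByPercentFemale(rows) {
  return rows
    .filter((row) => row.all > 0)
    .sort((a, b) => percentFemale(b) - percentFemale(a));
}

// Ascending by course number; copies first so the input order is untouched.
function sortByCourseNumber(rows) {
  return [...rows].sort((a, b) => courseCollator.compare(a.course, b.course));
}
```

Plain string sorting would put "10" before "2", which is why the collator uses the `numeric` option.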

~~

http://www.quora.com/MIT-1/What-is-the-gender-ratio-for-each-of-the-majors-at-MIT
data source
Here’s the data I used (I should just email them for the CSVs, but I thought this would be a simple data-scraping exercise):
http://web.mit.edu/registrar/stats/gender/index.html
http://web.mit.edu/registrar/stats/yrpts/index.html
What tripped me up is that the list of majors in the women table differs ever so slightly from the list in the everyone table, so I have to decide how many distinctions I want to make. Do I really care about 2 vs. 2A? Not really. If I care about 6-1 vs. 6-2 vs. 6-3, do I just drop 6-7?
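One way to make the "how many distinctions" decision explicit is to map every registrar designation to a key at the chosen level of detail and then sum the counts per key. A minimal sketch, assuming designations look like "2", "2A", "2-A", "6-3", or "18C"; `normalizeCourse`, `mergeByCourse`, and the `{ course, fem, all }` row shape are hypothetical. Letter variants always fold into the parent course (2A counts as 2), and a flag chooses whether numbered subdivisions such as 6-1 and 6-3 stay separate.

```javascript
// Assumed designation formats: "2", "2A", "2-A", "6-3", "18C".
// Returns a key at the chosen level of detail, or null if unrecognized.
function normalizeCourse(designation, keepSubdivisions) {
  const match = /^(\d+)(?:-?([A-Z]+))?(?:-(\d+))?$/.exec(
    String(designation).trim().toUpperCase()
  );
  if (match === null) return null; // leave odd labels for a human to sort out
  const [, course, , subdivision] = match;
  // Letter variants (2A, 18C) always fold into the parent course.
  if (keepSubdivisions && subdivision !== undefined) {
    return `${course}-${subdivision}`;
  }
  return course;
}

// Sum female / all counts per normalized key.
// Hypothetical row shape: { course: "6-3", fem: 120, all: 400 }.
function mergeByCourse(rows, keepSubdivisions) {
  const totals = new Map();
  for (const row of rows) {
    const key = normalizeCourse(row.course, keepSubdivisions);
    if (key === null) continue; // skipped here; a real pass might log these
    const acc = totals.get(key) || { fem: 0, all: 0 };
    acc.fem += row.fem;
    acc.all += row.all;
    totals.set(key, acc);
  }
  return totals;
}
```

With `keepSubdivisions` set, 6-7 simply becomes its own key, so dropping it (or folding it somewhere else) becomes one deliberate filter on the keys rather than a side effect of copying and pasting.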

assorted Google spreadsheets of data from those sources
cleaned data for charts
https://docs.google.com/spreadsheet/ccc?key=0Atr0HZeoMbmgdHFNZU5mbEdYU2VIOE1BUTRLZHJJVkE

raw data sorted for female / undergrads
https://docs.google.com/spreadsheet/ccc?key=0Atr0HZeoMbmgdFg0Vk14RW9JQjJtSFFsMXJBZ2hnSEE

raw scraped HTML table data
https://docs.google.com/spreadsheet/ccc?key=0Atr0HZeoMbmgdFYzNUdEN2NKM2lGbHV6OVN4YkJNTmc

where I tried (see the script editor) to automagically clean up the data, and failed:
https://docs.google.com/spreadsheet/ccc?key=0Atr0HZeoMbmgdFRkSXp5SW9teEJiT0t5U2FJeTBUVnc&hl=en_US#gid=0

the Google Apps Script code I used for that last gdoc:

function loadImport() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = ss.getActiveSheet();

  // Key of the spreadsheet holding the raw scraped tables ("Women" and "Everyone" sheets)
  var data_sskey = "0Atr0HZeoMbmgdFYzNUdEN2NKM2lGbHV6OVN4YkJNTmc";
  var row_init = "13";      // first data row in the scraped tables
  var col_ug = "G";         // undergraduate enrollment
  var col_grad = "M";       // graduate enrollment
  var col_total = "O";      // total enrollment
  var col_courseName = "A";
  var col_courseNumber = "Q";

  var cols = [col_ug, col_grad, col_total];

  // Build an ImportRange formula that copies one open-ended column,
  // e.g. =ImportRange("0Atr0HZeoMbmgdFYzNUdEN2NKM2lGbHV6OVN4YkJNTmc","Women!G13:G")
  function importColumn(sheetName, col) {
    return '=ImportRange("' + data_sskey + '","' + sheetName + '!' + col + row_init + ':' + col + '")';
  }

  // Course names come first
  var finalRanges = [importColumn("Women", col_courseName)];

  // Then enrollment data, alternating female / everyone
  for (var i = 0; i < cols.length; i++) {
    finalRanges.push(importColumn("Women", cols[i]));
    finalRanges.push(importColumn("Everyone", cols[i]));
  }

  // Course numbers go last
  finalRanges.push(importColumn("Women", col_courseNumber));

  // Column headers, in the same order as finalRanges
  var columnNames = ['Course Name', 'Fem UG', 'All UG', 'Fem Grad', 'All Grad', 'Fem Total', 'All Total', 'Course Number'];
  // getRange(row, column, numRows, numColumns)
  sheet.getRange(1, 1, 1, columnNames.length).setValues([columnNames]);

  // One formula per column in row 2; each ImportRange result fills its column downward
  sheet.getRange(2, 1, 1, finalRanges.length).setFormulas([finalRanges]);
}
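The script copies the Women and Everyone columns side by side, so rows end up paired by position, and the pairing drifts whenever one listing has a major the other lacks. Pairing rows by course number instead avoids that. A minimal sketch in plain JavaScript (outside Apps Script), assuming each table has already been reduced to `[courseNumber, count]` pairs; `joinByCourse` is a hypothetical helper, not something from the spreadsheets above.

```javascript
// Hypothetical input: each table as [courseNumber, count] pairs, e.g. the
// course-number and undergrad columns of the "Women" and "Everyone" sheets.
function joinByCourse(womenRows, everyoneRows) {
  // Index the Everyone table by trimmed course number.
  const everyone = new Map(
    everyoneRows.map(([course, count]) => [String(course).trim(), count])
  );
  const joined = [];
  const unmatched = [];
  for (const [course, fem] of womenRows) {
    const key = String(course).trim();
    if (everyone.has(key)) {
      joined.push({ course: key, fem, all: everyone.get(key) });
      everyone.delete(key); // consumed, so leftovers are Everyone-only courses
    } else {
      unmatched.push(key); // appears only in the Women table
    }
  }
  // Whatever is left appeared only in the Everyone table.
  unmatched.push(...everyone.keys());
  return { joined, unmatched };
}
```

A course missing from the Women table may just have no women enrolled, so `unmatched` is meant for a human to review rather than to be dropped silently.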



Google Docs Scripting Tutorials
An excellent free step-by-step tutorial on ImportXML (similar to the ImportHTML function I used), which is how I heard of XPath (which I didn’t end up needing, thanks to ImportHTML magic): http://www.distilled.net/blog/distilled/guide-to-google-docs-importxml/
other useful tutorials I referred to
http://blog.ouseful.info/2008/10/14/data-scraping-wikipedia-with-google-spreadsheets/
http://blog.fastfedora.com/2011/08/how-to-use-open-apis.html *shudders* Speaking of government APIs, I really should be less terrible and fix my half-finished GitHub codeforamerica repos. AUGH. I FAIL. This is why I haven’t applied to any jobs yet. Upcoming: a post on why you shouldn’t hire me (with the caveat that I feel okay listing these because they are past mistakes that I don’t intend to make again).
http://blog.ouseful.info/2010/02/15/creating-a-winter-olympics-2010-medal-map-in-google-spreadsheets/
eeee shiny http://pipes.yahoo.com/pipes/
I also attempted to use Google Fusion Tables but did not have a good experience.
http://code.google.com/googleapps/appsscript/articles/writing_spreadsheet_data.html
http://code.google.com/googleapps/appsscript/service_spreadsheet.html
To investigate: http://www.propublica.org/nerds/item/using-google-refine-for-data-cleaning

Machine Shop Safety: Yale and the Missing White Woman Syndrome

MITERS is a student shop on MIT’s campus. Funding-wise, we run independently of any MIT department, which is great since our projects don’t always fall under research. We also don’t have to pay rent or utilities because we’re located on MIT property. However, this also means that, for better or for worse, we are subject to the scrutiny of MIT’s space administrators and Environmental Health and Safety staff.

MITERS runs a delicate balancing act that, for me, brings home two contrasting trends: increasing paranoia (and safety) over the years, perhaps best dramatized by 9/11 and Homeland Security, and increasing interest in the amorphous idea of the “maker movement.” I am torn. Sometimes I am skeptical: the book Traffic mentions that deaths from parachuting leveled off and have remained constant despite improved safety measures, because those measures made people comfortable taking greater risks. Sometimes I am cautious: Traffic also mentions that we are all overconfident drivers, believing the past predicts the future as we confidently take to the roads. One study showed that ambulance drivers, contrary to what many would think and because of the urgency of their mission, actually drove “better” than the average driver, turning sooner and driving more smoothly.

Recently, the death of a Yale student in a campus machine shop (I apologize to those still in mourning for this crass mention) has led to the sort of scrutiny that makes all of us at MITERS a bit uncomfortable. Our machine tools are old, and no one would mind a newer mill or lathe or better safeguards. However, we all seem to feel instinctively that too much scrutiny at any level is toxic to the creative haven of MITERS. (And dear lord, apparently a ton of grad students will be moving into our building, which will be fit for the Prime Minister of Singapore to look upon within a year or two; that’ll be interesting, but there’s no point in worrying just yet.) In reaction, my fellow MITERS officer Julian Merrick remarked that this situation seems reminiscent of Missing White Woman Syndrome, in which there is a disproportionate reaction to misfortunes involving people of one group (white women) compared with others (men, or people of other ethnicities). Indeed, a Google search turns up a thread on Practical Machinist remarking the same.

One surprising constant throughout this whole ongoing process, however, has been people’s appreciation of the output of MITERS. People have been uniformly supportive of MITERS, even if we may have different ideas about the kind of support needed. Ultimately, I am hopeful that we will emerge from this with better safety and our core culture intact.