Digital Silence
The apparent gender gap in scientific computing,
why it matters,
and strategies for change
Kezia Manlove
Penn State
April 16, 2015
http://slides.com/keziamanlove/digitalsilence
We live in a digital world,
and computing and science
are increasingly connected
"[The scientific discipline] concerned with constructing mathematical models and quantitative analysis techniques and using computers to analyze and solve scientific problems"
Methods
Scientific computing is on the rise
Web of Science
...and so is female participation in STEM
... but female participation in computer science is stagnant
Increased computing
Increased women
Stagnant women in computer science
How much do women participate in scientific computing outside CS?
Three lines of evidence
1. Computational science publications
2. Published software
3. Raw code
Female authorship on computational science papers
Case-paper search
- Searched computational science methods and "statistics"
- 2008-2012
- Retrieved 200 "most-relevant"
For each "case" paper, found "control"
- same journal, same time
Extracted all author first names
- Ran through gender classifier
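A minimal sketch of the name-classification step, for illustration only. The lookup table, the function names, and the choice to leave unclassifiable names out of the denominator are all assumptions; the classifier actually used in the study is not specified here.

```python
from collections import Counter

# Hypothetical lookup table. A real classifier would draw on a large
# name-frequency dataset rather than four hand-picked names.
NAME_GENDER = {
    "maria": "female",
    "susan": "female",
    "john": "male",
    "david": "male",
}


def classify_first_name(first_name):
    """Return 'female', 'male', or 'ambiguous' for a first name."""
    return NAME_GENDER.get(first_name.strip().lower(), "ambiguous")


def female_share(first_names):
    """Fraction of classifiable names classified as female.

    Assumption: ambiguous names (initials, unknown names) are excluded
    from the denominator.
    """
    counts = Counter(classify_first_name(name) for name in first_names)
    classified = counts["female"] + counts["male"]
    return counts["female"] / classified if classified else float("nan")


# Invented author lists standing in for one "case" and one "control" paper set.
case_authors = ["John", "Maria", "David", "J.", "David"]
control_authors = ["Susan", "Maria", "John", "David"]

print("case:", female_share(case_authors))
print("control:", female_share(control_authors))
```

Initials such as "J." fall into the ambiguous group, which is one reason any name-based estimate carries uncertainty.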
Female authorship on computational science papers
Female authorship drops from 26.5% to 17.5%
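The size of this drop can be stated two ways: in percentage points or as a relative reduction. A short worked calculation using only the two figures above (the variable names are illustrative):

```python
higher_share = 26.5  # % female authorship, first figure quoted above
lower_share = 17.5   # % female authorship, second figure quoted above

absolute_drop = higher_share - lower_share      # in percentage points
relative_drop = absolute_drop / higher_share    # as a fraction of the higher share

print(f"{absolute_drop:.1f} percentage points, {relative_drop:.1%} relative")
```

The relative reduction comes out at roughly one third.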
Three lines of evidence
1. Computational science publications
2. Published software
3. Raw code
~30% reduction in female authorship on computational science papers
Published software contributions
ROpenSci analysis:
1. Scraped CRAN
2. Extracted maintainer names for all active packages
3. Ran through gender classifier
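A rough sketch of the name-extraction step, assuming package metadata has already been downloaded as text in the "Field: value" style of R DESCRIPTION files. The sample records, names, and the helper function are invented for illustration; this is not the code used in the ROpenSci analysis, and it ignores multi-line fields.

```python
import re

# Hypothetical sample records; package and maintainer names are invented.
SAMPLE = """\
Package: examplepkg
Maintainer: Jane Doe <jane@example.org>

Package: otherpkg
Maintainer: Sam Roe <sam@example.org>
"""


def maintainer_first_names(text):
    """Map package name -> maintainer first name.

    Records are assumed to be separated by blank lines, with one
    'Field: value' pair per line.
    """
    result = {}
    for record in text.strip().split("\n\n"):
        fields = dict(
            line.split(":", 1) for line in record.splitlines() if ":" in line
        )
        package = fields.get("Package", "").strip()
        # Drop the trailing <email> part, then keep the first word of the name.
        name = re.sub(r"<.*?>", "", fields.get("Maintainer", "")).strip()
        if package and name:
            result[package] = name.split()[0]
    return result


print(maintainer_first_names(SAMPLE))
```

The extracted first names would then be passed to whatever gender classifier is in use.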
Published software contributions
Ben Marwick @ https://github.com/ropensci/unconf/issues/13
15% of ~4700 packages maintained by women
Three lines of evidence
1. Computational science publications
2. Published software
3. Raw code
~30% reduction in female authorship on computational science papers
15% of packages maintained by women
Women producing raw code
Alyssa Frazee (Johns Hopkins)
GitHub
www.github.com
Code management system
~ 9 million users
Projects stored in repositories
Repositories belong to users (with user names)
Repositories map to primary language
Classified user names based on gender
Scraped username + language from all repositories with >= 5 stars
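A small sketch of the tallying step, assuming the scraped data is already in hand as (username, primary language, star count) records. The records, the prefix-matching classifier, and the function names are hypothetical stand-ins, not Alyssa Frazee's actual code.

```python
from collections import Counter, defaultdict

# Hypothetical records: (username, primary language, star count).
REPOS = [
    ("alice-codes", "R", 12),
    ("bob_dev", "R", 7),
    ("xk42", "Python", 30),
    ("carol", "Python", 3),  # below the star threshold, so excluded
]

# Hypothetical stand-in for a username-based gender classifier.
KNOWN_NAMES = {"alice": "female", "carol": "female", "bob": "male"}


def classify_username(username):
    """Classify a username by a known first-name prefix, else 'ambiguous'."""
    lowered = username.lower()
    for name, gender in KNOWN_NAMES.items():
        if lowered.startswith(name):
            return gender
    return "ambiguous"


def tally_by_language(repos, min_stars=5):
    """Count repositories per language and inferred gender, keeping stars >= min_stars."""
    tallies = defaultdict(Counter)
    for username, language, stars in repos:
        if stars >= min_stars:
            tallies[language][classify_username(username)] += 1
    return tallies


tallies = tally_by_language(REPOS)
for language, counts in tallies.items():
    print(language, dict(counts))
```

Many usernames carry no recognizable first name, so a large "ambiguous" group is expected with any approach of this kind.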
Women producing raw code
R
Male
Ambiguous
Female
Less than 2.5% of starred GitHub repos map to female usernames
Three lines of evidence
1. Computational science publications
2. Published software
3. Raw code
~30% reduction in female authorship on computational science papers
15% of packages maintained by women
<2.5% of starred GitHub repos map to female usernames
Another possible evidence line
Cyberinfrastructure and
High Performance Computing
Useful for research
Costly
Who uses it?
Costs of limited participation
Students
Few mentors, limited guidance on CS skills
Science
BBSRC study of UK scientists: biggest future vulnerability is scientists' CS skills
Individual
ASA post-grad survey: computational skills are biggest liability
Institutions
Investment in infrastructure that people don't efficiently use
What limits participation?
Few formal expectations
Culture ostracizes newcomers
Students enter with limited background
Free/Libre/Open Source ("F/LOSS") Computing Culture
R is Open Source...
Bad track record wrt female/newcomer participation
~ 2.5% female; stable since ~ 2000 (... based on terrible data)
... and R Developers are very male.
61 men and 1 woman acknowledged major contributors
21 men on development team
Women & R
Have domain-specific expertise
(>40% of PhDs in statistics...)
Large programming community
Benefit from F/LOSS
Prominent female R developers
Karline Soetaert: deSolve + 14 other packages
Hana Sevcikova: fractals and population projection modeling
Ulrike Gromping: design of experiments
Daniela Witten: machine learning
Anne-Laure Boulesteix: BioConductor (geneSelector, etc.)
Two F/LOSS Roles
Builders
Consumers
Apprentice model for contribution
Sweep floor first, then build mansions
Venerates self-teaching
Getting started in Open Source is hard
Varied entry points
Newcomers are busy / stressed
People with knowledge are busy/stressed
Computing is not the #1 priority
Formal expectations:
What CS skills should scientists have?
Data management
Code management
General hardware/software knowledge
Data visualization
Document preparation
(Data collection)
How to catch newcomers up
Doesn't know doesn't care
Meet newcomers where they are
Encourage note-taking
Explain why you do what you do
Point newcomers to other resources
Strategies to improve your own programming
1. Keep code clean and organized
Follow a style-guide (~10 mins of reading = much better code)
Use a consistent filing system
My ideas here:
http://ciddgsa.com/2015/04/11/a-few-steps-toward-better-cleaner-more-organized-code/
2. Self-Evaluate
Pick an old script to clean and functionalize
Assess where you're at wrt Data and Code management, basic logic ("Fizz-Buzz"), Data visualization, Document prep
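For reference, "Fizz-Buzz" is a standard basic-logic exercise: count upward, replacing multiples of 3 with "Fizz", multiples of 5 with "Buzz", and multiples of both with "FizzBuzz". One possible solution, as a sketch:

```python
def fizzbuzz(n):
    """Return the Fizz-Buzz label for a positive integer n."""
    if n % 15 == 0:       # divisible by both 3 and 5
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


for i in range(1, 16):
    print(fizzbuzz(i))
```

Writing this without looking anything up is a quick check on conditionals, loops, and the modulo operator.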
3. Simulate everything
4. Expect to develop new skills
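One reading of "Simulate everything" is: before trusting an analysis on real data, run it on simulated data where the true answer is known. The sketch below assumes that reading; the straight-line model, parameter values, and function names are illustrative choices only.

```python
import random


def simulate(n, slope, intercept, noise_sd, seed=1):
    """Generate n (x, y) points from a straight line plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Simulate with known parameters, then check that the fit recovers them.
xs, ys = simulate(n=500, slope=2.0, intercept=1.0, noise_sd=0.5)
slope_hat, intercept_hat = fit_line(xs, ys)
print(f"true slope 2.0, estimated {slope_hat:.2f}")
print(f"true intercept 1.0, estimated {intercept_hat:.2f}")
```

If the estimates land far from the values used to generate the data, the bug is in the analysis code rather than in the data.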
Group learning
Code Reviews
Peer reviews done once/semester (for credit??)
Collaborative projects like kaggle.com
Participate online
Twitter (#rstats, #ropensci), edit Wikipedia
Stack Overflow, r-help, etc.
Take-home messages
There's a gender gap in scientific computing.
Lack of computing skill is problematic for science, individuals, institutions, and the future workforce.
Three major contributing factors
1. No clear expectations
2. Intimidating online community
3. Limited background skill
Articulate expectations
Foster local community
Framework to catch up
Thanks
Penn State Academic Computing Fellows Program
Data / Analysis from:
Ben Marwick & ROpenSci, Alyssa Frazee, Association of American Universities Data Exchange
Ideas from:
Michael Lerch, Laura Sampson, Rebecca Belou, CIDD Grad Student Group, Cross Lab, Alyssa Frazee, Amanda Golberg, John Paxton
MSU Math-Sci Department for letting me come
Questions / Comments?
http://slides.com/keziamanlove/digitalsilence
Supp info: http://keziamanlove.com/digital-silence/
Contact
kezia.manlove <at> gmail