Digital Silence

The apparent gender gap in scientific computing, why it matters, and strategies for change

Kezia Manlove

Penn State

April 16, 2015

http://slides.com/keziamanlove/digitalsilence

We live in a digital world, and computing and science are increasingly connected

Scientific Computing

"[The scientific discipline] concerned with constructing mathematical models and quantitative analysis techniques and using computers to analyze and solve scientific problems"

Methods

  • Numerical Analysis
  • Monte Carlo
  • Graph theory
  • Molecular dynamics
  • Linear programming
  • Discrete Fourier Transform
  • Numerical Linear Algebra
  • Numerical methods for integration / differentiation

Scientific computing is on the rise

Web of Science

  • Searched methods for "scientific computing"
  • Recorded papers returned in even years across 15 disciplines

...and so is female participation in STEM

... but female participation in computer science is stagnant

Increased computing

Increased participation of women in STEM

Stagnant participation of women in computer science

How much do women participate in scientific computing outside computer science (CS)?

Three lines of evidence

1. Computational science publications

2. Published software

3. Raw code

Female authorship on computational science papers

Case-paper search
- Searched computational science methods and "statistics"
- 2008-2012
- Retrieved the 200 "most relevant" papers

For each "case" paper, found a "control" paper
- Same journal, same time period

Extracted all author first names
- Ran them through a gender classifier
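The name-based step can be sketched as follows. The talk does not say which gender classifier was used, so NAME_GENDER below is a hypothetical stand-in lookup table, and dropping ambiguous or unrecognized names from the denominator is one possible choice, not necessarily the one made in the original analysis. The author lists are made up.

```python
from collections import Counter
from typing import Iterable

# Hypothetical first-name lookup standing in for a real gender classifier.
NAME_GENDER = {
    "alice": "female",
    "maria": "female",
    "john": "male",
    "wei": "ambiguous",
}


def classify(first_name: str) -> str:
    """Return 'female', 'male', 'ambiguous', or 'unknown' for a first name."""
    return NAME_GENDER.get(first_name.strip().lower(), "unknown")


def female_share(first_names: Iterable[str]) -> float:
    """Fraction of confidently classified authors classified as female.

    Ambiguous and unknown names are excluded from the denominator
    (an assumption of this sketch).
    """
    counts = Counter(classify(name) for name in first_names)
    decided = counts["female"] + counts["male"]
    return counts["female"] / decided if decided else float("nan")


# Hypothetical author first names pooled from case and control papers.
case_authors = ["John", "Alice", "John", "Wei"]
control_authors = ["Maria", "John", "Alice", "Wei", "Bob"]

print(f"case: {female_share(case_authors):.2f}")
print(f"control: {female_share(control_authors):.2f}")
```

A real analysis would need a much larger name table or a dedicated classifier, and would pool names across all case papers and their matched controls.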

Female authorship drops from 26.5% on control papers to 17.5% on computational science papers
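Treating 26.5% as the baseline rate, the drop can be expressed as a relative reduction:

$$
\text{relative reduction} = \frac{26.5\% - 17.5\%}{26.5\%} = \frac{9}{26.5} \approx 0.34
$$

That is a 9-percentage-point absolute gap, or roughly one-third fewer female authors relative to the baseline.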


Published software contributions

ROpenSci analysis:

1. Scraped CRAN

2. Extracted maintainer names for all active packages

3. Ran them through a gender classifier


Ben Marwick @ https://github.com/ropensci/unconf/issues/13

15% of ~4700 packages maintained by women
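The "extract maintainer names" step can be sketched for a single package. CRAN package metadata lives in each package's DESCRIPTION file, a Debian-control-file (DCF) record whose Maintainer field has the form "Name <email>". This sketch assumes the DESCRIPTION text is already in hand (the talk does not say how CRAN was scraped) and that the first word of the maintainer's name is what goes to the gender classifier; SAMPLE_DESCRIPTION is made up.

```python
import re
from typing import Dict, Optional

# A made-up DESCRIPTION file; the Maintainer field wraps onto an indented line.
SAMPLE_DESCRIPTION = """\
Package: examplepkg
Title: An Example Package
Version: 0.1.0
Maintainer: Jane Q. Example
    <jane@example.org>
License: GPL-3
"""


def parse_dcf(text: str) -> Dict[str, str]:
    """Parse one DCF record into a dict, joining indented continuation lines."""
    fields: Dict[str, str] = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace() and key is not None:
            # Continuation line: append to the previous field's value.
            fields[key] += " " + line.strip()
        else:
            # Split only on the first colon so values like URLs survive.
            name, _, value = line.partition(":")
            key = name.strip()
            fields[key] = value.strip()
    return fields


def maintainer_first_name(fields: Dict[str, str]) -> Optional[str]:
    """Return the first word of the Maintainer name, with the <email> removed."""
    maintainer = fields.get("Maintainer", "")
    name = re.sub(r"<[^>]*>", "", maintainer).strip()
    return name.split()[0] if name else None


print(maintainer_first_name(parse_dcf(SAMPLE_DESCRIPTION)))
```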


Women producing raw code

Alyssa Frazee (Johns Hopkins)

GitHub (www.github.com)

Code management system

~9 million users

Projects stored in repositories

Repositories belong to users (with user names)

Repositories map to primary language

Scraped username + language from all repositories with >= 5 stars

Classified usernames by gender
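The GitHub analysis can be sketched as a filter-and-tally step once repository records are in hand (the scraping itself is not shown). This is a hypothetical reconstruction, not the original pipeline: the Repo records, the USERNAME_GENDER table, and the choice to count unrecognized usernames as "ambiguous" are all assumptions for illustration.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Repo(NamedTuple):
    owner: str      # GitHub username that owns the repository
    language: str   # primary language reported for the repository
    stars: int


# Hypothetical username classifications; the talk does not describe how
# usernames were mapped to gender.
USERNAME_GENDER = {"alyssa": "female", "bob42": "male", "jdoe": "ambiguous"}


def tally_by_language(repos, min_stars=5):
    """Count repos per (language, gender class), keeping repos with >= min_stars."""
    table = defaultdict(Counter)
    for repo in repos:
        if repo.stars < min_stars:
            continue
        # Assumption: usernames missing from the table count as ambiguous.
        gender = USERNAME_GENDER.get(repo.owner.lower(), "ambiguous")
        table[repo.language][gender] += 1
    return table


repos = [
    Repo("alyssa", "R", 12),
    Repo("bob42", "R", 40),
    Repo("bob42", "Python", 3),   # dropped: fewer than 5 stars
    Repo("jdoe", "R", 5),
]
for language, counts in tally_by_language(repos).items():
    total = sum(counts.values())
    print(language, {g: f"{n / total:.0%}" for g, n in counts.items()})
```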

[Figure: starred GitHub repositories by primary language (e.g., R), with usernames classified as male, ambiguous, or female]

Less than 2.5% of starred GitHub repos map to female usernames

Three lines of evidence

1. Computational science publications: ~34% relative reduction in female authorship on computational science papers

2. Published software: 15% of CRAN packages maintained by women

3. Raw code: <2.5% of starred GitHub repos map to female usernames

Cyberinfrastructure and High Performance Computing

Useful for research

Costly

Who uses it?

Another possible line of evidence

Costs of limited participation

Students

Few mentors, limited guidance on CS skills

Science

BBSRC study of UK scientists: biggest future vulnerability is scientists' CS skills

Individual

ASA post-grad survey: computational skills are the biggest liability

Institutions

Investment in infrastructure that people don't efficiently use

What limits participation?

Few formal expectations

Culture ostracizes newcomers

Students enter with limited background

R is Open Source...

Free/Libre/Open Source ("F/LOSS") computing culture has a bad track record with respect to female/newcomer participation

~2.5% female; stable since ~2000 (... based on terrible data)

... and R developers are very male.

61 men and 1 woman acknowledged as major contributors

21 men on development team

Women & R

Have domain-specific expertise (>40% of PhDs in statistics...)

Large programming community

Benefit from F/LOSS

Prominent female R developers

Karline Soetaert: deSolve + 14 other packages

Hana Sevcikova: fractals and population projection modeling

Ulrike Gromping: design of experiments

Daniela Witten: machine learning

Anne-Laure Boulesteix: Bioconductor (GeneSelector, etc.)

Two F/LOSS Roles

Builders

Consumers

Apprentice model for contribution

Sweep floor first, then build mansions

Venerates self-teaching

Getting started in Open Source is hard

Varied entry points

Newcomers are busy/stressed

People with knowledge are busy/stressed

Computing is not the #1 priority

Formal expectations:

What CS skills should scientists have?

Data management

Code management

General hardware/software knowledge

Data visualization

Document preparation

(Data collection)

How to catch newcomers up

Doesn't know ≠ doesn't care

Meet newcomers where they are

Encourage note-taking

Explain why you do what you do

Point newcomers to other resources

Strategies to improve your own programming

1. Keep code clean and organized

Follow a style guide (~10 minutes of reading = much better code)

Use a consistent filing system

My ideas here:

http://ciddgsa.com/2015/04/11/a-few-steps-toward-better-cleaner-more-organized-code/

2. Self-Evaluate

Pick an old script to clean and functionalize

Assess where you're at with respect to data and code management, basic logic ("FizzBuzz"), data visualization, and document prep
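FizzBuzz is the classic basic-logic check: for the numbers 1 to N, print "Fizz" for multiples of 3, "Buzz" for multiples of 5, "FizzBuzz" for multiples of both, and the number itself otherwise. The talk names it without giving code, so this is a Python sketch, written as small functions in the "clean and functionalize" spirit so each piece can be tested on its own.

```python
from typing import List


def fizzbuzz(n: int) -> str:
    """Return 'FizzBuzz', 'Fizz', 'Buzz', or n as a string."""
    if n % 15 == 0:          # divisible by both 3 and 5
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


def fizzbuzz_sequence(stop: int) -> List[str]:
    """FizzBuzz words for 1..stop inclusive."""
    return [fizzbuzz(i) for i in range(1, stop + 1)]


if __name__ == "__main__":
    print("\n".join(fizzbuzz_sequence(15)))
```

Checking the multiple-of-15 case first matters: testing for 3 first would return "Fizz" for 15.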

3. Simulate everything

4. Expect to develop new skills
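One way to read "Simulate everything" (point 3): generate data where the truth is known, run the analysis, and check that it recovers the truth. The talk does not say what to simulate, so this sketch uses a simple linear regression with arbitrary parameter values and a hand-written least-squares slope.

```python
import random


def simulate(n, slope, intercept, noise_sd, rng):
    """Simulate y = intercept + slope * x + Normal(0, noise_sd), x ~ Uniform(0, 10)."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys


def fit_slope(xs, ys):
    """Ordinary least-squares slope estimate."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


rng = random.Random(2015)  # fixed seed so the check is reproducible
true_slope = 2.0
estimates = [fit_slope(*simulate(200, true_slope, 1.0, 3.0, rng)) for _ in range(100)]
mean_estimate = sum(estimates) / len(estimates)
print(f"true slope {true_slope}, mean estimate {mean_estimate:.3f}")
```

If the mean estimate drifted far from the true slope, that would point to a bug in the analysis code before it ever touches real data.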

Group learning

Code Reviews

Peer reviews done once/semester (for credit??)

Collaborative projects like Kaggle (kaggle.com)

  • Data cleaning and prep
  • Metric development
  • Partition projects and design workflows
  • Exposure to bigger data
  • Builds scientific logic

Participate online

Twitter (#rstats, #ropensci), edit Wikipedia

Stack Overflow, r-help, etc.

Take-home messages

There's a gender gap in scientific computing.

Lack of computing skill is problematic for science, individuals, institutions, and the future workforce.

Three major contributing factors, and responses

1. No clear expectations: articulate them

2. Intimidating online community: foster local community

3. Limited background skill: provide a framework to catch up

Thanks

Penn State Academic Computing Fellows Program

Data / Analysis from:

Ben Marwick & ROpenSci, Alyssa Frazee, Association of American Universities Data Exchange

Ideas from:

Michael Lerch, Laura Sampson, Rebecca Belou, CIDD Grad Student Group, Cross Lab, Alyssa Frazee, Amanda Golberg, John Paxton

MSU Math-Sci Department for letting me come

Questions / Comments?

http://slides.com/keziamanlove/digitalsilence

Supp info: http://keziamanlove.com/digital-silence/

Contact

kezia.manlove <at> gmail
