Working with Race & Ethnicity Data in the Criminal Justice System

Branden DuPont

Data Analyst

Medical College of Wisconsin

 Who Am I?

 Datashare @ MCW IHE = Local IDS





Documented Lack of Quality in Race/Ethnicity Justice Data

Scope of the Problem - U.S.

  • A 2016 Urban Institute report reviewing 40 states found that only 15 states' arrest records reported ethnicity.
  • Race and ethnicity are often not self-identified but assigned by an official.
  • This assigned data often feeds into the front end of the system (jail, pretrial, prosecutor).
  • Even when ethnicity is reported, the quality is suspect; e.g., the Stanford Policing Project (SPP) found that only 37% of Hispanic-affiliated names were correctly classified in Texas.

Locally, Milwaukee Shares This Problem

  • Officers assign race and sometimes ethnicity
  • Data from law enforcement feeds into the system, in particular PROTECT
  • Stakeholders often lack confidence in the quality of race/ethnicity data
  • No systems are currently in place to assess the size and direction of potential misclassification
  • Mapping home addresses raises questions about data quality (officer-assigned vs. census-reported)

Implications for Measuring Racial and Ethnic Disparities


  • "[F]ailing to account for Hispanics in white and black estimates tends to inflate white proportions and deflate black proportions of arrests, admissions, and prison population estimates, masking the “true” black and white racial disproportionality." - Harris CT et al


Technical Implications

  • Stakeholders want to be confident a measure is accurate for defining the size of the problem and measuring impact of a program
  • Need to get a sense of the magnitude of the misclassification
  • Even with an immediate fix, we still need to benchmark performance against the past and assess quality on an ongoing basis
  • 15.1% of Milwaukee County residents are Hispanic/Latino, and they represent ~40% of all Hispanic/Latino residents of Wisconsin
  • Community trust in analysis results

Program and Policy Implications

Methods to Impute Hispanic from White in Criminal Justice Data

Existing Standards

  • The Stanford Policing Project follows an established benchmark of reclassifying an individual as Hispanic if 75% or more of people with the same last name are Hispanic-affiliated (Melendres v. Arpaio, 2009; Word and Perkins, 1996).[8]
  • In their analysis of racial disparities in police stops, individuals labeled as white who meet the 75% threshold are changed to Hispanic. They note that 90% of people with Hispanic-affiliated names identify as Hispanic.
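
The 75% surname rule can be sketched in a few lines of Python. This is a minimal illustration, not the Stanford Policing Project's code: the `SURNAME_HISPANIC_SHARE` table and its values are made-up placeholders standing in for a census surname file, and `reclassify_by_surname` is a hypothetical helper name.

```python
# Hypothetical share of people with each surname who identify as Hispanic.
# Placeholder values for illustration only; a real table would come from
# a census surname file.
SURNAME_HISPANIC_SHARE = {
    "GARCIA": 0.92,
    "MARTIN": 0.10,
    "SILVA": 0.60,
}

THRESHOLD = 0.75  # the 75% benchmark described above


def reclassify_by_surname(recorded_race, last_name,
                          shares=SURNAME_HISPANIC_SHARE,
                          threshold=THRESHOLD):
    """Return 'hispanic' for a record labeled white whose surname meets the threshold."""
    share = shares.get(last_name.strip().upper())
    if recorded_race == "white" and share is not None and share >= threshold:
        return "hispanic"
    # Other recorded races, unknown surnames, and below-threshold surnames are unchanged.
    return recorded_race
```

Only records labeled white are touched, and a surname missing from the table leaves the record as recorded.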


  • Ethnicolr is a Python package used to predict race and ethnicity from first and last name.


  • Ethnicolr's model was validated on Florida voter registration data, with a precision of .83 and a recall of .84 when both last and first name are used.


  • Higher-quality probabilistic output than another classifier often used in the criminal justice system: offender risk assessment. Those models at their best reach an AUC of .72 (COMPAS and LSI-R: .66).


  • An easy, automated way to correct improper classifications, fill in missing entries, and gauge the overall quality of race/ethnicity data.
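
For reference, precision and recall for the Hispanic class can be computed as below. This is a minimal sketch: the function name and the toy labels are illustrative assumptions and are not taken from Ethnicolr or its validation data.

```python
def precision_recall(true_labels, predicted_labels, positive="hispanic"):
    """Compute precision and recall for one class from paired labels."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Precision: of those predicted Hispanic, how many truly are.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of those truly Hispanic, how many were predicted as such.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels, invented for illustration only.
truth = ["hispanic", "hispanic", "hispanic", "nh_white", "nh_white"]
preds = ["hispanic", "hispanic", "nh_white", "hispanic", "nh_white"]
precision, recall = precision_recall(truth, preds)
```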

Approach to PROTECT Race/Ethnicity Error Correction


  • MacArthur project evaluating disparities in prosecutorial decision-making


  • A race/ethnicity of white is reclassified as Hispanic if the first and last name in PROTECT receive a Hispanic-affiliated predicted score of 75% or greater from Ethnicolr


  • With this threshold, defendants reclassified from white to Hispanic are more likely to be correctly assigned, but the rule will miss more cases where white should be reclassified as Hispanic (precision vs. recall trade-off).
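
The reclassification rule and its threshold trade-off can be sketched as follows. This is an illustration under stated assumptions, not the project's actual pipeline: the `hispanic_score` field is a hypothetical name for the model's predicted Hispanic probability (the column name in real Ethnicolr output may differ), and the records are invented.

```python
def reclassify(records, threshold=0.75):
    """Return copies of records with white changed to hispanic when score >= threshold."""
    updated = []
    for rec in records:
        new_rec = dict(rec)  # copy so the original records are not modified
        if rec["race"] == "white" and rec["hispanic_score"] >= threshold:
            new_rec["race"] = "hispanic"
        updated.append(new_rec)
    return updated


def count_hispanic(recs):
    return sum(1 for r in recs if r["race"] == "hispanic")


# Invented records for illustration only.
records = [
    {"id": 1, "race": "white", "hispanic_score": 0.95},
    {"id": 2, "race": "white", "hispanic_score": 0.80},
    {"id": 3, "race": "white", "hispanic_score": 0.60},
    {"id": 4, "race": "black", "hispanic_score": 0.90},
]

strict = count_hispanic(reclassify(records, threshold=0.90))
default = count_hispanic(reclassify(records, threshold=0.75))
loose = count_hispanic(reclassify(records, threshold=0.50))
```

A higher threshold reclassifies fewer people (favoring precision), while a lower threshold reclassifies more (favoring recall). Records not labeled white are never changed.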




  • Using a predicted score of 75% or greater, imputation with Ethnicolr classifies 5,848 more defendants as Hispanic instead of non-Hispanic (NH) white, for a total of 13,215, in a five-year cohort of DA referrals (2014-2018).


  • This represents an increase of ~80% in the number of defendants classified as Hispanic, all reclassified from non-Hispanic white.
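
The ~80% figure can be checked with simple arithmetic. The sketch below assumes that 13,215 is the number of defendants classified as Hispanic after imputation, so that the pre-imputation count is the difference between the two reported numbers.

```python
reclassified = 5848      # defendants moved from non-Hispanic white to Hispanic
hispanic_after = 13215   # assumed: Hispanic total after imputation

# Hispanic count before imputation, derived from the two reported figures.
hispanic_before = hispanic_after - reclassified

# Percentage increase relative to the pre-imputation count.
pct_increase = 100 * reclassified / hispanic_before
```

Under that assumption, `hispanic_before` is 7,367 and `pct_increase` is about 79.4, consistent with the ~80% reported.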

Kleczka Presentation

By Branden DuPont
