Documented Lack of Quality in Race/Ethnicity Justice Data
Scope of the Problem - U.S.
A 2016 Urban Institute report reviewing 40 states found that only 15 states' arrest records reported ethnicity.
Race/ethnicity is often not self-identified, but assigned by an official
This often feeds into the front end of the system (jail, pretrial, prosecutor)
Even when reported, the quality is suspect; e.g., SPP found that only 37% of Hispanic-affiliated names were correctly classified in Texas
Locally, Milwaukee Shares This Problem
Officers assign race and sometimes ethnicity
Data from law enforcement feeds into the system, in particular PROTECT
Oftentimes, lack of confidence in the quality of race/ethnicity data
Currently no systems in place to assess the size and direction of potential misclassification
Mapping home addresses raises questions about data quality (officer-assigned vs. census-reported)
Implications for Measuring Racial/Ethnic Disparities
"[F]ailing to account for Hispanics in white and black estimates tends to inflate white proportions and deflate black proportions of arrests, admissions, and prison population estimates, masking the “true” black and white racial disproportionality." - Harris CT et al
Technical Implications
Stakeholders want to be confident a measure is accurate for defining the size of the problem and measuring impact of a program
Need to get a sense of the magnitude of the misclassification
Even with an immediate fix, we still need to benchmark performance against the past and assess quality on an ongoing basis
15.1% of Milwaukee County residents are Hispanic/Latino, representing ~40% of all Hispanic/Latino residents of Wisconsin.
Community trust in analysis results
Program and Policy Implications
Methods to Impute Hispanic from White in Criminal Justice Data
Existing Standards
The Stanford Policing Project follows an established benchmark: an individual is reclassified as Hispanic if 75% or more of people with the same last name are Hispanic (Melendres v. Arpaio, 2009; Word and Perkins, 1996).[8]
In their analysis of racial disparities in police stops, individuals labeled as white who meet the 75% threshold are changed to Hispanic. They note that 90% of people with Hispanic-affiliated names identify as Hispanic.
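The surname rule above can be sketched in a few lines of Python. This is only an illustration of the 75% threshold logic, not the Stanford project's actual code: the surname table, its values, and the function name `reclassify_by_surname` are all made up for the example.

```python
# Hypothetical surname table: share of people with each surname who are
# Hispanic. These values are invented for illustration only.
SURNAME_HISPANIC_SHARE = {
    "GARCIA": 0.92,
    "MARTINEZ": 0.93,
    "SILVA": 0.58,
    "SMITH": 0.02,
}

THRESHOLD = 0.75  # the 75% benchmark described in the slide


def reclassify_by_surname(recorded_race, last_name,
                          table=SURNAME_HISPANIC_SHARE, threshold=THRESHOLD):
    """Relabel a record as 'hispanic' only if it was recorded as white
    and its surname meets the threshold; otherwise keep the recorded label."""
    share = table.get(last_name.strip().upper())
    if recorded_race == "white" and share is not None and share >= threshold:
        return "hispanic"
    return recorded_race


# Toy stop records: (recorded race, last name)
stops = [("white", "Garcia"), ("white", "Silva"),
         ("white", "Smith"), ("black", "Martinez")]
relabeled = [reclassify_by_surname(race, name) for race, name in stops]
print(relabeled)  # ['hispanic', 'white', 'white', 'black']
```

Only records recorded as white are candidates, so the record labeled black is left unchanged even though its surname is above the threshold. Surnames missing from the table are also left unchanged.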
Ethnicolr
Ethnicolr is a Python package used to predict race and ethnicity from first and last name.
Ethnicolr's model was validated on Florida voter registration data, with precision of .83 and recall of .84 when both first and last name are used
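For readers less familiar with the two validation metrics, here is a minimal sketch of how precision and recall for the Hispanic class could be computed from a labeled validation set. The labels below are toy values, not Florida voter data, and `precision_recall` is a hypothetical helper, not part of Ethnicolr.

```python
def precision_recall(true_labels, predicted_labels, positive="hispanic"):
    """Precision and recall for one class, from paired true/predicted labels."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Precision: of those predicted Hispanic, how many truly are.
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    # Recall: of those truly Hispanic, how many were predicted as such.
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


# Toy validation data (self-reported label vs. model prediction)
true_labels = ["hispanic", "hispanic", "hispanic", "nh_white", "nh_white", "nh_black"]
predicted = ["hispanic", "hispanic", "nh_white", "hispanic", "nh_white", "nh_black"]

precision, recall = precision_recall(true_labels, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```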
Ethnicolr
Higher-quality probabilistic output than another classifier often used in the criminal justice system, offender risk assessment: at their best, those models reach an AUC of .72 (COMPAS and LSI-R: .66).
Easy, automated way to correct misclassifications, fill in missing entries, and gauge the overall quality of race/ethnicity data.
Approach to PROTECT Race/Ethnicity Error Correction
Approach
MacArthur project evaluating disparities in prosecutorial decision-making
A race/ethnicity of white is reclassified as Hispanic if the first and last name in PROTECT have a Hispanic-affiliated predicted score of 75% or greater using Ethnicolr
At this threshold, individuals reclassified from white to Hispanic are more likely to be correctly assigned, but more cases where white should be reclassified as Hispanic will be missed (precision vs. recall trade-off).
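The reclassification rule can be sketched as follows, assuming each record already carries a predicted Hispanic score between 0 and 1. The field names `recorded_race` and `hispanic_score`, the helper functions, and the records are hypothetical and are not taken from PROTECT or from Ethnicolr's output format.

```python
def reclassify(records, threshold=0.75):
    """Return copies of the records, relabeling white -> hispanic
    when the predicted Hispanic score meets the threshold."""
    out = []
    for rec in records:
        new = dict(rec)  # copy so the original labels are preserved
        if rec["recorded_race"] == "white" and rec["hispanic_score"] >= threshold:
            new["recorded_race"] = "hispanic"
        out.append(new)
    return out


def count_hispanic(records):
    """Number of records currently labeled hispanic."""
    return sum(1 for rec in records if rec["recorded_race"] == "hispanic")


# Toy records with made-up scores
records = [
    {"id": 1, "recorded_race": "white", "hispanic_score": 0.96},
    {"id": 2, "recorded_race": "white", "hispanic_score": 0.78},
    {"id": 3, "recorded_race": "white", "hispanic_score": 0.61},
    {"id": 4, "recorded_race": "white", "hispanic_score": 0.05},
    {"id": 5, "recorded_race": "black", "hispanic_score": 0.88},
    {"id": 6, "recorded_race": "hispanic", "hispanic_score": 0.91},
]

# A stricter threshold reclassifies fewer records
counts = {t: count_hispanic(reclassify(records, t)) for t in (0.50, 0.75, 0.90)}
print(counts)  # {0.5: 4, 0.75: 3, 0.9: 2}
```

Raising the threshold leaves fewer, more certain reclassifications (favoring precision). Lowering it captures more candidates at the cost of more false positives (favoring recall).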
Results
Imputation using Ethnicolr classifies 5,848 more defendants as Hispanic instead of non-Hispanic (NH) white, for a total of 13,215, in a 5-year cohort of DA referrals (2014-2018), using a predicted score of 75% or greater.
This is an increase of ~80% in the number of defendants classified as Hispanic, all reclassified from NH white.
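The ~80% figure can be checked with a short calculation. It assumes the 13,215 total already includes the 5,848 reclassified defendants, so the baseline is the difference between the two.

```python
reclassified = 5848   # defendants moved from NH white to Hispanic
total_after = 13215   # Hispanic defendants after imputation

# Assumption: the total includes the reclassified defendants
baseline = total_after - reclassified
pct_increase = 100 * reclassified / baseline

print(f"{baseline} -> {total_after}: +{pct_increase:.1f}%")  # 7367 -> 13215: +79.4%
```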