Documented Lack of Quality in Race/Ethnicity Justice Data
Scope of the Problem - U.S.
A 2016 Urban Institute report found, in a review of 40 states, that only 15 states' arrest records reported ethnicity.
Race/ethnicity is often not self-identified, but assigned by an official
This often feeds into the front end of the system (jail, pretrial, prosecutor)
Even when reported, the quality is suspect; e.g., the Stanford Policing Project (SPP) found that only 37% of Hispanic-affiliated names were correctly classified in Texas
Locally, Milwaukee Shares This Problem
Officers assign race and sometimes ethnicity
Data from law enforcement feeds into the system, in particular PROTECT
Oftentimes, lack of confidence in the quality of race/ethnicity data
Currently no systems in place to assess the size and direction of potential misclassification
Mapping home addresses raises questions about data quality (officer-assigned vs. census-reported)
Implications for Measuring Racial/Ethnic Disparities
"[F]ailing to account for Hispanics in white and black estimates tends to inflate white proportions and deflate black proportions of arrests, admissions, and prison population estimates, masking the “true” black and white racial disproportionality." - Harris CT et al
Technical Implications
Stakeholders want to be confident a measure is accurate for defining the size of the problem and measuring impact of a program
Need to get a sense of the magnitude of the misclassification
Even with an immediate fix, still need to benchmark performance against the past and assess quality on an ongoing basis
15.1% of Milwaukee County residents are Hispanic/Latino, representing ~40% of all Hispanic/Latino residents of Wisconsin.
Community trust in analysis results
Program and Policy Implications
Methods to Impute Hispanic from White in Criminal Justice Data
Existing Standards
The Stanford Policing Project follows an established benchmark of reclassifying an individual as Hispanic if 75% or more of the people with that same last name are Hispanic-affiliated (Melendres v. Arpaio, 2009; Word and Perkins, 1996).[8]
In their analysis of racial disparities in police stops, individuals labeled as white who meet the 75% threshold are changed to Hispanic. They note that 90% of people with Hispanic-affiliated names identify as Hispanic.
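The 75% surname rule above can be sketched in a few lines of Python. This is a minimal illustration, not the Stanford Policing Project's code: the surname table, its values, and the function name are hypothetical placeholders, and a real analysis would load surname shares from a published surname list.

```python
# Hypothetical surname table: share of people with each surname who
# identify as Hispanic. Values are placeholders for illustration only.
SURNAME_HISPANIC_SHARE = {
    "GARCIA": 0.92,
    "SMITH": 0.02,
    "SILVA": 0.60,
}

THRESHOLD = 0.75  # the benchmark described above


def reclassify_by_surname(recorded_race, last_name, shares=SURNAME_HISPANIC_SHARE):
    """Return 'hispanic' for a white-labeled record whose surname meets the threshold."""
    share = shares.get(last_name.strip().upper())
    if recorded_race == "white" and share is not None and share >= THRESHOLD:
        return "hispanic"
    # Non-white records, unknown surnames, and sub-threshold surnames are unchanged.
    return recorded_race
```

Only records labeled white are touched; a surname missing from the table leaves the recorded value as-is.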
Ethnicolr
Ethnicolr is a Python package used to predict race and ethnicity from first and last name.
Ethnicolr's model was validated on Florida voter registration data, with precision of .83 and recall of .84 when both first and last name are used
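For readers less familiar with the two validation metrics, the sketch below shows how precision and recall would be computed for the Hispanic class from paired true and predicted labels. The function name and the label strings are illustrative assumptions; this is not Ethnicolr's own validation code.

```python
def precision_recall(true_labels, predicted_labels, positive="hispanic"):
    """Compute precision and recall for one class from paired label lists."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Precision: of those predicted Hispanic, how many truly are.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of those truly Hispanic, how many were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```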
Ethnicolr
Higher-quality probabilistic output than another classifier often used in the criminal justice system: offender risk assessment. At their best, those models reach an AUC of .72 (COMPAS and LSI-R: .66).
Easy, automated way to error correct improper classification, fix missing entries, and gauge the overall quality in race/ethnicity data.
Approach to PROTECT Race/Ethnicity Error Correction
Approach
MacArthur project evaluating disparities in prosecutorial decision-making
A race/ethnicity of white is reclassified as Hispanic if the first name and last name in PROTECT have a Hispanic-affiliated predicted score of 75% or greater using Ethnicolr
At this threshold, defendants reclassified from white to Hispanic are more likely to be correctly assigned, but more instances where white should be reclassified as Hispanic will be missed (precision vs. recall trade-off).
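A minimal sketch of the reclassification step, assuming each record already carries a predicted Hispanic score from the name model. The field names (`race`, `hispanic_score`, `original_race`, `imputed`) are hypothetical and are not PROTECT's actual schema; the model call itself is left out.

```python
def reclassify(records, threshold=0.75):
    """Return copies of records with white -> hispanic when score >= threshold."""
    out = []
    for rec in records:
        updated = dict(rec)                      # never mutate the source record
        updated["original_race"] = rec["race"]   # keep the officer-assigned value for auditing
        updated["imputed"] = rec["race"] == "white" and rec["hispanic_score"] >= threshold
        if updated["imputed"]:
            updated["race"] = "hispanic"
        out.append(updated)
    return out
```

Keeping the original value next to the imputed one makes it possible to benchmark against past reporting, and raising `threshold` trades recall for precision.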
Results
Imputation using Ethnicolr with a predicted score of 75% or greater classifies 5,848 more defendants as Hispanic instead of non-Hispanic (NH) white, for a total of 13,215 in a 5-year cohort of DA referrals (2014-2018).
This represents an increase of ~80% in defendants classified as Hispanic, all reclassified from NH white.
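The ~80% figure follows directly from the two counts reported above; the variable names below are illustrative.

```python
reclassified = 5848      # defendants moved from NH white to Hispanic
total_after = 13215      # Hispanic defendants after imputation
total_before = total_after - reclassified   # Hispanic defendants as originally recorded

pct_increase = 100 * reclassified / total_before
print(f"{total_before} -> {total_after}: +{pct_increase:.1f}%")
# prints "7367 -> 13215: +79.4%"
```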
Sentencing Features: Type and Length
Sentencing Type
Challenge: calculating sentence length from circuit court data is complex
Start with meaningful variables that can be built with confidence
Generate custodial sentence for Milwaukee DA and other researchers
Custodial Sentence
Goal: Identify jail and prison sentences
Jail and prison in combination = custodial sentence
Drop sentenced counts that are:
Imposed and stayed
Not custodial related (e.g. firearm related)
Prison or Jail/HOC with zero days
Exceptions:
Check court record of events - time served disposition
Probation sentence with jail condition time
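The drop rules and exceptions above could be expressed as a single predicate over sentenced counts. This is a sketch under assumptions: every field name is hypothetical, and the reading of the two exceptions (a zero-day sentence still counts when the court record shows a time-served disposition; probation counts when it carries jail condition time) is one interpretation of the list, not a confirmed rule.

```python
CUSTODIAL_TYPES = {"prison", "jail_hoc"}


def is_custodial(count):
    """Decide whether one sentenced count is a custodial sentence."""
    # Drop: imposed and stayed.
    if count.get("imposed_and_stayed", False):
        return False
    sentence_type = count.get("sentence_type")
    # Exception: probation with jail condition time is treated as custodial.
    if sentence_type == "probation":
        return count.get("jail_condition_days", 0) > 0
    # Drop: not custodial related (e.g. firearm-related orders).
    if sentence_type not in CUSTODIAL_TYPES:
        return False
    # Drop: prison or jail/HOC with zero days, unless the court record
    # of events shows a time-served disposition.
    if count.get("days", 0) > 0:
        return True
    return bool(count.get("time_served", False))
```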
Sentence Length:
A Work in Progress
Sentence Length Can Get Complex
Image: one individual case with two convicted counts
(1) one for felony burglary
Prison (3 yrs) and extended supervision (E.S., 2 yrs)
Concurrent to count 2
(2) felon in possession of a firearm
Prison (2 yrs) and E.S. (2 yrs)
Concurrent to count 1 and any other sentence
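For the two-count case above, the effective prison term within the case can be sketched as follows. The simplification is an assumption: concurrent counts overlap (take the longest), and every consecutive count is stacked on top of that block. Real orders can chain counts in other ways, and sentences from other cases are not handled here. Field names are hypothetical.

```python
def effective_prison_years(counts):
    """Within-case prison term: max over concurrent counts plus sum of consecutive counts."""
    concurrent = [c["prison_years"] for c in counts if not c["consecutive"]]
    consecutive = [c["prison_years"] for c in counts if c["consecutive"]]
    return max(concurrent, default=0) + sum(consecutive)


# The example case: burglary (3 yrs) and felon in possession (2 yrs), concurrent.
example_case = [
    {"prison_years": 3, "consecutive": False},
    {"prison_years": 2, "consecutive": False},
]
```

Under these assumptions `effective_prison_years(example_case)` is 3; if count 2 had been ordered consecutive, it would be 5.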
"Any Other Sentence"
That same individual has another case, filed 20 days earlier in the same county, with one convicted count
(1) another felon in possession of a firearm
Prison (3 yrs) and E.S. (2 yrs)
Whether a Count is Concurrent or Consecutive Lives in Free Form Text
"concurrent to count 1, two, and 3 in this case and consecutive to case 13cf04"
"concurr with 15cf3316"
"Concurrent with: Concurrent to count two and consecutive to count three. *Credit of 180 days as to count one and two."
Task Isn't Insurmountable
Only need to classify which counts are consecutive and sentences that are consecutive to another case
Presumption is concurrent
Cases in same time horizon can be linked by SID and general record linkage
Counts and case numbers can be extracted using NLP/Regex
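A rough regex sketch of the extraction step, written against the three free-text examples shown earlier. The case-number pattern (two digits, two letters, digits) and the small word-to-number map are assumptions inferred from those examples, and text after a period or asterisk is ignored so that credit notes are not read as counts. Real data would need a broader pattern set.

```python
import re

KEYWORD_RE = re.compile(r"\b(concurr\w*|consec\w*)\b", re.IGNORECASE)
CASE_RE = re.compile(r"\b\d{2}[a-z]{2}\d+\b", re.IGNORECASE)   # e.g. 13cf04 (assumed format)
COUNT_RE = re.compile(r"\b(\d{1,2}|one|two|three|four|five)\b", re.IGNORECASE)
WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def parse_sentence_text(text):
    """Group count numbers and case numbers under the keyword that precedes them."""
    result = {
        "concurrent": {"counts": [], "cases": []},
        "consecutive": {"counts": [], "cases": []},
    }
    matches = list(KEYWORD_RE.finditer(text))
    for i, m in enumerate(matches):
        # A segment runs from this keyword to the next keyword (or end of text),
        # cut at the first period or asterisk.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        segment = re.split(r"[.*]", text[m.end():end])[0]
        kind = "concurrent" if m.group(1).lower().startswith("concurr") else "consecutive"
        for token in COUNT_RE.findall(segment):
            result[kind]["counts"].append(WORDS.get(token.lower()) or int(token))
        result[kind]["cases"].extend(c.lower() for c in CASE_RE.findall(segment))
    return result
```

Since the presumption is concurrent, only the `consecutive` entries need to feed the sentence-length calculation; the `concurrent` entries are mainly useful for checking the parse.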