A Deep Learning Assessment of the Right to Counsel

Patrick Power

Shomik Ghosh

Markus Schwedeler

Overview

Introduce Estimation Framework

Generalizes OLS
Allows for Nonparametric Effects
Inherently compositional, even under regularization

Hypothesis: The Right to Counsel assists those who are currently housed at the cost of those who are unhoused

Results: Exploiting the ongoing roll-out of the policy across Connecticut, we find little evidence to support this hypothesis, which suggests that the policy scales better than previously understood

Assess the Effects of the Right to Counsel

Right to Counsel: Provides low income households with legal representation in an eviction case

Regularizing the Forward Pass

Non-Parametric Clustering

A "global" approach to local sampling corrections

Inherently Compositional

Even under data-dependent regularization

Nests OLS & Supervised Learning

Allows for fine control over the hypothesis space

Clear/Intuitive/Mathematical --> Inductive Bias

data N : Set where
  zero : N
  suc  : N -> N

-- the numeral 3, built by composing suc three times
three : N
three = suc (suc (suc zero))

problem

When clusters differ in their distributions, treating the cluster problem as if it were the i.i.d. problem can be problematic
We have a toy example which highlights this
We show how this problem can get worse in large dimensions

\underset{f \in \sigma(X)}{\textrm{inf}} \ \mathbb{E}[(Y - f)^2]

\underset{f \in \sigma(X)}{\textrm{inf}} \ \mathbb{E}[(Y - f)^2 \mid A]

i.i.d. Problem

Cluster Problem

Generalizing Across Zip codes

\underset{f \in \sigma(X)}{\textrm{inf}} \ \mathbb{E}\big[(Y-f)^2 \big]

Need bandwidth that is sensitive to the presence of clusters

Estimand:

The challenge of Higher dimensions

X_{i,c} \sim \mathcal{N}(0, I)

X_{i,c} \sim \mathcal{N}(\bar{X}_c, I)

\bar{X}_{c} \sim \mathcal{N}(0, I)

Extending Balestriero [2021]

i.i.d. Data

Cluster Data
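
To make the two sampling schemes concrete, the sketch below draws a single coordinate under each design. It is only an illustration: the function names, seed, and sample sizes are assumptions and do not come from the slides. Under the cluster design the pooled variance of a coordinate is 2 (within-cluster 1 plus between-cluster 1), while inside any one cluster the spread around the cluster mean is still 1, so observing only a few clusters covers a small part of the covariate space.

```python
import random
import statistics

def draw_iid(n, rng):
    """One coordinate of X_i ~ N(0, 1) for n units."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

def draw_clustered(n_clusters, n_per_cluster, rng):
    """One coordinate of X_{i,c} ~ N(Xbar_c, 1) with Xbar_c ~ N(0, 1)."""
    draws = []
    for _ in range(n_clusters):
        center = rng.gauss(0.0, 1.0)  # cluster mean Xbar_c
        draws.append([rng.gauss(center, 1.0) for _ in range(n_per_cluster)])
    return draws

rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
iid = draw_iid(20_000, rng)
clusters = draw_clustered(2_000, 10, rng)
pooled = [x for cluster in clusters for x in cluster]

var_iid = statistics.pvariance(iid)        # theoretical value: 1
var_pooled = statistics.pvariance(pooled)  # theoretical value: 2 (within + between)
# Average sample variance inside each cluster; theoretical value: 1
var_within = statistics.mean([statistics.variance(c) for c in clusters])
```

The gap between `var_pooled` and `var_within` is the between-cluster component that an estimator trained on a subset of clusters never sees directly.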

W_{i} \sim \mathbb{P}_W \iff \big(\mathcal{R}^d, \mathcal{B}(\mathcal{R}^d), \mathbb{P}_W \big)

\mathcal{R}(f; \mathbb{P}, \ell) := \mathbb{E}\big[\ell(f(X), Y) \big]

f = g \circ \phi

Linear Map

Feature Map

\Big(\{\textrm{Keys}\}, \mathcal{F}, \mathbb{P} \Big)

Probability Space

W_i : \{\textrm{Keys}\} \to \mathcal{R}^d

Random Variables

Probability Model

How exactly does the LLN kick in here?

Kernel Methods

Deep Learning

Supervised

Motivation

K(x,x')

\int \Big \langle \nabla_{\theta} f_x\big(\theta(t)\big), \nabla_{\theta} f_{x'}\big(\theta(t)\big) \Big\rangle dt

Domingos [2020]
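
The path kernel above can be approximated numerically by summing gradient inner products along a gradient descent run. The sketch below does this for a hypothetical two-parameter model; the model, the data, the learning rate, and the choice to weight each step by the learning rate (treating a step as `lr` units of time) are all assumptions made for illustration.

```python
import math

def f(theta, x):
    """Tiny illustrative model f_x(theta) = theta[1] * tanh(theta[0] * x)."""
    return theta[1] * math.tanh(theta[0] * x)

def grad_f(theta, x):
    """Gradient of f_x with respect to theta."""
    t = math.tanh(theta[0] * x)
    return [theta[1] * (1.0 - t * t) * x, t]

def path_kernel(xs, ys, x, x_prime, theta0, lr=0.05, steps=200):
    """Riemann-sum approximation of the path kernel along gradient descent."""
    theta = list(theta0)
    kernel = 0.0
    n = len(xs)
    for _ in range(steps):
        # Accumulate <grad f_x, grad f_x'> at the current parameters.
        gx, gxp = grad_f(theta, x), grad_f(theta, x_prime)
        kernel += lr * sum(a * b for a, b in zip(gx, gxp))
        # One gradient descent step on the mean squared error.
        grad_loss = [0.0, 0.0]
        for xi, yi in zip(xs, ys):
            resid = f(theta, xi) - yi
            gi = grad_f(theta, xi)
            grad_loss = [g + 2.0 * resid * d / n for g, d in zip(grad_loss, gi)]
        theta = [p - lr * g for p, g in zip(theta, grad_loss)]
    return kernel

# Made-up training data, for illustration only.
xs = [-1.0, -0.5, 0.5, 1.0]
ys = [-0.8, -0.4, 0.4, 0.8]
k_ab = path_kernel(xs, ys, 0.3, 0.7, theta0=[0.5, 0.5])
k_ba = path_kernel(xs, ys, 0.7, 0.3, theta0=[0.5, 0.5])
k_aa = path_kernel(xs, ys, 0.3, 0.3, theta0=[0.5, 0.5])
```

Because the kernel is built from inner products, it is symmetric in its two arguments and non-negative on the diagonal, which is what makes the kernel-machine reading possible.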

Motivation

K(x,x')

\nabla_{\theta} f_x\big(\theta(t)\big)

K(\theta,\theta')

Df_x(\theta)[v] = \langle v, \textrm{grad} f_x(\theta) \rangle

Domingos [2020]

Riesz Representation Theorem

Kernel Methods

Deep Learning

Supervised

High Level Idea

K(x,x')

\int \Big \langle \nabla_{\theta} f_x\big(\theta(t)\big), \nabla_{\theta} f_{x'}\big(\theta(t)\big) \Big\rangle dt

Domingos [2020]

Cluster

\int \Big \langle \nabla_{\theta} \tilde{f}_x\big(\theta(t)\big), \nabla_{\theta} \tilde{f}_{x'}\big(\theta(t)\big) \Big\rangle dt

\tilde{K}_c(x,x')

introduction

Context

2 Million Eviction filings each year in the U.S.
Gap in legal representation (90/10) in favor of landlords

Policy

Right to Counsel: Provides low income households with legal representation in an eviction case

Question

Does the Right to Counsel assist those who are housed at the cost of those who are not housed?

This paper

Estimation

Deep Learning Estimator that "corrects" for the zip-code level assignment

Key Empirical Result

Linear Models suggest adverse unintended consequences of the policy
Preferred Model suggests limited to no negative effects

Setting

Connecticut's state-wide implementation of policy

Practical concerns

(1) Why go beyond the linear model?

(2) Why allow the influence of the zip code to vary across covariates?

Linear Estimator

Locally smooth across clusters

\mathbb{E}[Y_1(0) - Y_0(0)|X, Z \in A^c] = \mathbb{E}[Y_1(0) - Y_0(0)|X, Z \in A]

A \equiv \{\textrm{Untreated Zip Code} \}

A^c \equiv \{\textrm{Treated Zip Code} \}

Sampling distribution of a standardized estimate of a linear difference-in-difference model

Motivation

Locally smooth across clusters

Whenever treatment is assigned across clusters, you want your estimator to be able to generalize across clusters.

Zip Code Correction

Cross Sectional

Locally smooth across clusters

\mathbb{E}[Y|X]

Estimand

\mathbb{E}[Y|X, Z \in A], \quad A \subset \ \{\textrm{All zip codes}\}

Observe

Challenge

Generalize across the unobserved zip codes

Locally smooth across clusters

Approach

Zip Code Correction

Repeated Cross Section

Locally smooth across clusters

\mathbb{E}[Y_1(0) - Y_0(0)|X]

Estimand

\mathbb{E}[Y_1(0) - Y_0(0)|X, Z \in A], \quad A \equiv \{\textrm{Untreated Zip Codes}\}

Observe

Challenge

Generalize across the treated zip codes

Locally smooth across clusters

Approach

Y_1(0)

Time Period

Potential Outcome

Supervised Deep Learning

Models formed by composing parameterized functions
Parameters updated via some form of gradient descent

Deep Learning

f_{\theta_2} \circ g_{\theta_1} \circ h_{\theta_0}

\theta(t) = \theta(t-1) - \alpha _t \nabla \mathcal{L}(\theta(t-1) )

Regularizing the Forward Pass

Models formed by composing embellished parameterized functions

\tilde{f}_{\theta_2} \ \tilde{\circ} \ g_{\theta_1} \ \tilde{\circ} \ h_{\theta_0}

\tilde{f}_{\theta_2}(x) = \big(f_{\theta_2}(x), m(f_{\theta_2})(x)\big)

inductive bias of algorithm

\underset{\theta}{\textrm{solve}} \ F(\theta, X, Y) = 0

g^n(\theta) = (g \circ g \circ \dots \circ g)(\theta)

g(\theta) = \theta - \alpha \Big(\partial _{\theta} (1/n)\|F(\theta, X, Y)\|_2^2\Big)
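
Read this way, the estimator is the n-fold composition of the update map g, so the step size, the number of iterations, and the starting point are all part of the inductive bias. A minimal sketch, assuming a scalar parameter and the residual F_i(θ) = θ x_i − y_i (the residual form, data, and helper names are illustrative and not taken from the slides):

```python
def make_step(xs, ys, alpha):
    """Build g(theta) = theta - alpha * d/dtheta (1/n) * sum_i F_i(theta)^2,
    with the assumed residual F_i(theta) = theta * x_i - y_i."""
    n = len(xs)

    def g(theta):
        grad = sum(2.0 * (theta * x - y) * x for x, y in zip(xs, ys)) / n
        return theta - alpha * grad

    return g

def iterate(g, n_steps):
    """Return g^n, the n-fold composition of g with itself."""
    def g_n(theta):
        for _ in range(n_steps):
            theta = g(theta)
        return theta
    return g_n

# Made-up data generated by y = 2x, so F(theta) = 0 is solved at theta = 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
g = make_step(xs, ys, alpha=0.05)
theta_hat = iterate(g, 200)(0.0)   # many steps: close to the root
theta_early = iterate(g, 1)(0.0)   # one step: still near the starting point
```

Stopping after a single step leaves the estimate between the initial value and the solution, which is one way the algorithm itself regularizes.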

Linear Regression Diff-in-Diff Fails

Supervised Diff-in-Diff Fails

RFP Diff-in-Diff Fails

Contexts

State-level adoption of policy
Well documented staggered roll-out

Motivation

Measurement

Evictions are often informal (24% of forced moves)
HUD Rapid Rehousing Data

Few Barriers to Housing
Limited Financial Support
Standard Lease Agreement

Seron et al. 2014 (NYC)
Greiner, Pattanayak and Hennessy 2013, 2012 (Boston)
Collinson et al. 2022 (Cook County & NYC)
Cassidy and Currie 2022 (NYC)
Abramson 2022 (San Diego)

Evictions (Effects & Policies)

Literature Review

Deep Learning

Jacot et al. [2018], Nagarajan and Kolter [2019], Wilson [2020], Belkin [2021], Zhang et al. [2021], Balestriero et al. [2021]
Griewank and Walther [2008], Frostig et al. [2018]
Finn et al. [2017], Kelly et al. [2020], Domingos [2020]

"Simple depth two neural networks already have perfect finite sample expressivity as soon as the number of parameters exceeds the number of data points as it usually does in practice." - Zhang [2017]
We discuss in further detail below how these observations rule out all of VC-dimension, Rademacher complexity, and uniform stability as possible explanations for the generalization performance of state-of-the-art neural networks.
In contrast with classical convex empirical risk minimization, where explicit regularization is necessary to rule out trivial solutions, we found that regularization plays a rather different role in deep learning. It appears to be more of a tuning parameter that often helps improve the final test error of a model, but the absence of all regularization does not necessarily imply poor generalization error. As reported by Krizhevsky et al. (2012), ℓ2-regularization (weight decay) sometimes even helps optimization, illustrating its poorly understood nature in deep learning.

Deep Learning Theory

Generalizing Across Zip codes

Context

Methodology

Outline

Results

Context

Eviction process

Notice to Quit

Reasons: Nonpayment of Rent (3), Lease Violation (15)
Formally Served: e.g., by a State Marshal

Summons & Complaint

Landlord: File Summons and Complaint with clerk
State Marshal: Serves copy to each tenant

Judgements

Default: Tenant fails to file appearance or respond to complaint
Stipulated: Judge approves agreed upon settlement
Trial: Judge hears case and makes ruling

Right to Counsel implementation

Policy implementation

Salience of Treatment

30% of Evictions & 20% of Renter Population
Legal Representation: 80% of landlords, 10% of tenants
Household income less than 80% of median state income (~$79,000 for a family of four)

Treated Zip Codes

Policy implementation

Courts and Landlords inform tenant of the existence of Right to Counsel

MAP

Rapid Rehousing

rapid rehousing

Overview

Housing Identification Services
Financial Assistance for housing-related expenses
Case management services

Features

No Preconditions to Housing
Financial assistance typically lasts 6 months
Typical Lease Agreement

rapid rehousing

Financial

Start-up/Move-in Costs
- First/Last Month Rent, Security/Utility deposit
Time limited financial assistance after move-in

Implementation

"Creativity is encouraged" in the design of the program

Rapid Rehousing timeline

(1) Experience Homelessness
(2) Enter Shelter
(3) Start Rapid Rehousing Program
(4) Find Housing
(5) Exit Rapid Rehousing

data specifics

Homeless Management Information Systems

Individual level data
3338 Households in 2019-July 2022
Search length, Gender, Race, Age, Kids, Program Date

Feedforward Neural Net

X_i : \textrm{Age, Gender, Race, Family Size, Month}

Z_i : \textrm{Zip Code}

*Confidence Bands are constructed by randomly sampling the initial weights of the neural network

Regularizing the forward pass

*Confidence Bands are constructed by randomly sampling the initial weights of the neural network

X_i : \textrm{Age, Gender, Race, Family Size, Month}

Z_i : \textrm{Zip Code}

Difference-in-Difference

*Confidence Bands are constructed via stratified sampling without replacement (75% sampling rate)

Difference-in-Difference With Controls

X_i : \textrm{Age, Gender, Race, Family Size}

Z_i : \textrm{Zip Code}

*Confidence Bands are constructed via stratified sampling without replacement (75% sampling rate)

methodology

Easier to Implement
Better adhere to Potential Outcome Framework
Assess the effects of the policy at Scale

Observe subset of clusters
Covariates can differ across clusters
Assess the effects of the policy at Scale

\mathbb{P}_{X|C=c} \neq \mathbb{P}_{X|C=c'}

\mathbb{P}_{Y|X, C=c} \neq \mathbb{P}_{Y|X,C=c'}

Cluster Randomized control trials

Treatment assigned at a level above the unit of interest

Motivation

Tragic Triad

Framework

Prediction

Training

Function

Composition

Regularizing the Forward Pass

h(x)

h \circ g (x)

\tilde{h}(x) := \big( h(x), \textcolor{blue}{m}(h)(x) \big)

\tilde{h} >=> \tilde{g} (x)

h \overset{F_{\textcolor{blue}{m}}}{\longmapsto} \tilde{h}

\textrm{Regularizing the Forward Pass} \ \iff \textcolor{blue}{m}

h (g (x))

\big(h(g(x)), \textcolor{blue}{m}(h)(g(x)) + \textcolor{blue}{m}(g)(x)\big)
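
The embellished composition can be sketched in a few lines: each function is lifted to return its value together with a penalty, and composing two lifted functions threads the value through while adding the penalties. This is only an illustration of the pattern; the regularizer `m` used here (squared output) and the helper names are hypothetical and are not the ones used in the paper.

```python
def embellish(h, m):
    """Lift h to x -> (h(x), m(h)(x)), pairing the output with a penalty."""
    penalty = m(h)
    return lambda x: (h(x), penalty(x))

def kleisli(h_tilde, g_tilde):
    """Compose embellished functions in the slides' order: g first, then h.
    Values are threaded through; penalties are summed."""
    def composed(x):
        y, penalty_g = g_tilde(x)
        z, penalty_h = h_tilde(y)
        return z, penalty_h + penalty_g
    return composed

def m(fn):
    """Hypothetical regularizer: the squared size of the function's output."""
    return lambda x: fn(x) ** 2

def g(x):
    return 2.0 * x

def h(x):
    return x + 1.0

# Value is h(g(3)); penalty is m(h)(g(3)) + m(g)(3).
value, penalty = kleisli(embellish(h, m), embellish(g, m))(3.0)
```

Swapping in a different `m` changes the regularization without touching `h`, `g`, or the composition rule, which is the sense in which the framework stays compositional under regularization.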

estimation framework

\textrm{linearModel} \ \textcolor{blue}{\textrm{data}}

\textrm{linearModel} \ \circ \textrm{identityMap} \ \textcolor{blue}{\textrm{data}}

\textrm{linearModel} \ \circ \textrm{featureMap} \ \textcolor{blue}{\textrm{data}}

\textrm{linearModel} \ \circ (\textrm{featureMap} \ \textcolor{blue}{\textrm{data}}) \ \textcolor{purple}{\textrm{params}}

\textrm{linearModel} \ \circ (\textrm{featureMap} \ \textcolor{blue}{\textrm{data}}) \circ \textrm{identityMap} \ \textcolor{purple}{\textrm{params}}

\textrm{linearModel} \ \circ (\textrm{featureMap} \ \textcolor{blue}{\textrm{data}}) \circ (\textrm{clusterMap} \ \textcolor{blue}{\textrm{data}})\ \textcolor{purple}{\textrm{params}}

\textrm{linearModel} \ >=> (\textrm{featureMap} \ \textcolor{blue}{\textrm{data}}) >=> (\textrm{clusterMap} \ \textcolor{blue}{\textrm{data}})\ \textcolor{purple}{\textrm{params}}

ODE

Regularized ODE

Kelly [2020]

Double Machine Learning

- Lasso & Partially Linear Models

Regularization will be problematic

Difficult to fit with partially linear neural network

Double Machine Learning

Same Data Set as Above!

Every Model has some form of regularization
Clear Inductive Bias

Extending Domingos [2020]

\partial_{\theta} \tilde{f}_{x}\big(\theta\big) := \partial _{\theta}f_x(\theta_c^*(\theta))\partial _{\theta} \theta_c^*(\theta) + \lambda\partial_{\theta} f_{x}\big(\theta\big)

Implicit Function:

Applying (k) iterations of gradient descent to cluster (c)

Regularization Term:

Ensure updates happen in the right space

Regularized Version of Model Agnostic Meta-Learning

Model

Cluster Specific Model
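
The gradient expression above combines a chain-rule term through the cluster-specific parameters with a λ-weighted direct term. A minimal sketch, assuming a scalar linear model f_x(θ) = θx, a mean-squared-error inner loss for cluster c, and k inner steps with step size α (all of these are illustrative choices, as are the function names):

```python
def inner_update(theta, xs, ys, alpha, k):
    """theta_c^*(theta): k gradient steps on cluster c's mean squared error
    for the assumed model f_x(theta) = theta * x."""
    n = len(xs)
    for _ in range(k):
        grad = sum(2.0 * (theta * x - y) * x for x, y in zip(xs, ys)) / n
        theta = theta - alpha * grad
    return theta

def f_tilde(theta, x, xs, ys, alpha, k, lam):
    """A function whose theta-derivative matches the displayed expression:
    f_x(theta_c^*(theta)) + lam * f_x(theta)."""
    return inner_update(theta, xs, ys, alpha, k) * x + lam * theta * x

def grad_f_tilde(theta, x, xs, ys, alpha, k, lam):
    """Chain rule: (d f_x / d theta at theta_c^*) * (d theta_c^* / d theta)
    plus lam * (d f_x / d theta)."""
    mean_x2 = sum(v * v for v in xs) / len(xs)
    # Each inner step is affine in theta with slope 1 - 2 * alpha * mean(x^2).
    d_theta_star = (1.0 - 2.0 * alpha * mean_x2) ** k
    return x * d_theta_star + lam * x

# Made-up cluster data and settings, for illustration only.
xs_c, ys_c = [1.0, 2.0], [1.0, 3.0]
args = dict(x=1.5, xs=xs_c, ys=ys_c, alpha=0.1, k=3, lam=0.5)
analytic = grad_f_tilde(0.4, **args)
eps = 1e-4
numeric = (f_tilde(0.4 + eps, **args) - f_tilde(0.4 - eps, **args)) / (2 * eps)
```

The finite-difference value `numeric` serves as a check on the chain-rule term; with λ = 0 the expression reduces to the usual model-agnostic meta-learning gradient.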

Regularizing the Forward Pass

K(x,x') = \int _0^1 \Big \langle \nabla_{\theta} \tilde{f}_x\big(\theta(t)\big), \nabla_{\theta} \tilde{f}_{x'}\big(\theta(t)\big) \Big\rangle dt

\partial_{\theta} \tilde{f}_{x}\big(\theta\big) := \partial _{\theta}f_x(\theta_c^*(\theta))\partial _{\theta} \theta_c^*(\theta) + \lambda\partial_{\theta} f_{x}\big(\theta\big)

K_c(x,x') = K(x, x') + \int _1^{1+\varepsilon} \Big \langle \nabla_{\theta} \tilde{f}_x\big(\theta(t)\big), \nabla_{\theta} \tilde{f}_{x'}\big(\theta(t)\big) \Big\rangle dt

Key Idea

Compose functions

Compose "Embellished" Functions

Wrong Space

Overfits in the Tails

Reasonable

motivation

\mathbb{E}[Y|X, C \in A], \quad A \subset \Omega

Cross Sectional

Repeated Cross Section

Locally smooth across clusters

Set of Clusters

Treated Clusters

\mathbb{E}[Y_1|X, C \in A] - \mathbb{E}[Y_0|X, C \in A]

\sum _{c} \big(p_{1,C|X}(c) - p_{0,C|X}(c)\big) \Big(\mathbb{E}[Y_1|X, C=c] - \mathbb{E}[Y_0|X, C=c] \Big)

Locally smooth across clusters

Results

Notation

S_i : \textrm{Search Duration}

X_i : \textrm{Age, Gender, Race, Family Size}

Z_i : \textrm{Zip Code}

D_i= \begin{cases} 1 \quad \textrm{if Treated} \\ 0 \quad \textrm{if Control} \end{cases}

1_{S_i < r} : \textrm{Found Housing within} \ r \ \textrm{days}

r \in \mathcal{R}_+: \textrm{Search Threshold}

r \longmapsto \hat{\beta}(r):= \underset{\beta}{\textrm{argmin}}\ \sum _i \mathcal{L}(r, S_i, X_i, Z_i, D_i)
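
The map from a search threshold r to an estimate can be illustrated with the simplest possible estimator. The slides do not specify the loss, so the sketch below substitutes an unadjusted 2x2 difference-in-differences on the indicator 1{S_i < r}, with no covariates and no zip-code correction. The `post` period flag, the function names, and the records are hypothetical and made up for illustration.

```python
def did_estimate(rows, r):
    """2x2 difference-in-differences on the indicator 1{S_i < r}.

    rows: list of (search_days, treated, post) tuples, where `post` is a
    hypothetical period flag that the slides do not define.
    """
    def share(treated, post):
        cell = [s < r for s, d, p in rows if d == treated and p == post]
        return sum(cell) / len(cell)

    return (share(1, 1) - share(1, 0)) - (share(0, 1) - share(0, 0))

def beta_curve(rows, thresholds):
    """Map each search threshold r to its estimate, r -> beta_hat(r)."""
    return {r: did_estimate(rows, r) for r in thresholds}

# Made-up records, for illustration only: (search_days, treated, post)
rows = [
    (20, 1, 0), (40, 1, 0), (30, 1, 1), (70, 1, 1),
    (25, 0, 0), (45, 0, 0), (35, 0, 1), (50, 0, 1),
]
curve = beta_curve(rows, thresholds=[30, 60, 90])
```

Tracing the estimate over a grid of thresholds shows where in the search-duration distribution any effect is concentrated, rather than summarizing it in a single number.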

Paper about Scale

Hardware   Compile Time   Compiled Run Time   Ratio (Compile / Run)
CPU        1.7547         0.5288              3.3185
GPU        2.6512         0.0009              2806.6818

Compute

Composing Models

f \circ g \to \tilde{f} >=> \tilde{g}

Policies

Does the Right to Counsel shift the costs of housing to those who are currently without housing?

The End