A Deep Learning Assessment of the Right to Counsel

Overview

Introduce Estimation Framework

  • Generalizes OLS
  • Allows for Nonparametric Effects
  • Inherently compositional, even under regularization

Hypothesis: The Right to Counsel assists those who are currently housed at the cost of those who are unhoused

Results: Exploiting the ongoing roll-out of the policy across Connecticut, we find little evidence to support this hypothesis, which suggests that the policy scales better than previously understood

Assess the Effects of the Right to Counsel

Right to Counsel: Provides low income households with legal representation in an eviction case

Regularizing the Forward Pass

Nonparametric Clustering

A "global" approach to local sampling corrections 

Inherently Compositional

Even under data-dependent regularization

Nests OLS & Supervised Learning

Allows for fine control over the hypothesis space

Clear/Intuitive/Mathematical --> Inductive Bias

data N : Set where
  zero : N
  suc  : N -> N

three : N
three = suc (suc (suc zero))

problem

  • When clusters differ in distribution, failing to distinguish between them can be problematic
  • We have a toy example which highlights this
  • We show how this problem can get worse in high dimensions
\underset{f \in \sigma(X)}{\textrm{inf}} \ \mathbb{E}[(Y - f)^2]
\underset{f \in \sigma(X)}{\textrm{inf}} \ \mathbb{E}[(Y - f)^2 \mid A]

i.i.d Problem

Cluster Problem
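A toy illustration with hypothetical data (not the paper's): for a discrete covariate, the infimum over σ(X)-measurable f is attained by the within-cell mean of Y. The i.i.d. problem pools every cluster, while the cluster problem averages only over clusters in A, so the two minimizers disagree whenever clusters differ in E[Y | X].

```python
from collections import defaultdict

def conditional_mean(rows):
    """Plug-in minimizer of E[(Y - f(X))^2] over functions of a discrete X:
    the average of Y within each observed value of X."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in rows:
        sums[x] += y
        counts[x] += 1
    return {x: sums[x] / counts[x] for x in sums}

# Hypothetical (cluster, x, y) triples: clusters share x values but differ in E[Y | X].
data = [
    ("a", 0, 1.0), ("a", 0, 3.0), ("a", 1, 4.0),
    ("b", 0, 10.0), ("b", 1, 12.0), ("b", 1, 14.0),
]

# i.i.d. problem: pool every cluster.
pooled = conditional_mean([(x, y) for _, x, y in data])
# Cluster problem: condition on the event C in A, here A = {"a"}.
A = {"a"}
within_A = conditional_mean([(x, y) for c, x, y in data if c in A])
print(pooled, within_A)
```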

Generalizing Across Zip codes

\underset{f \in \sigma(X)}{\textrm{inf}} \ \mathbb{E}\big[(Y-f)^2 \big]

Need a bandwidth that is sensitive to the presence of clusters

Estimand:

The challenge of Higher dimensions

X_{i,c} \sim \mathcal{N}(0, I)
X_{i,c} \sim \mathcal{N}(\bar{X}_c, I)
\bar{X}_{c} \sim \mathcal{N}(0, I)

Extending Balestriero [2021]

i.i.d Data

Cluster Data
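Under the cluster design, two draws from the same cluster differ by a N(0, 2I) vector and two draws from different clusters by a N(0, 4I) vector, so the expected squared distances are 2d and 4d (versus 2d for every pair under the i.i.d. design). The gap grows linearly in d, which is one way to see why a single kernel bandwidth struggles in high dimensions. The simulation below is a sketch with hypothetical sizes (d = 50, 20 clusters of 10 draws).

```python
import random

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def simulate(d=50, n_clusters=20, n_per=10, seed=0):
    """Draw X_{i,c} ~ N(Xbar_c, I) with Xbar_c ~ N(0, I); return the mean squared
    distance over same-cluster pairs and over cross-cluster pairs."""
    rng = random.Random(seed)
    points = []  # (cluster id, vector)
    for c in range(n_clusters):
        center = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(n_per):
            points.append((c, [m + rng.gauss(0.0, 1.0) for m in center]))
    same, cross = [], []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            (ci, xi), (cj, xj) = points[i], points[j]
            (same if ci == cj else cross).append(sq_dist(xi, xj))
    return sum(same) / len(same), sum(cross) / len(cross)

within, across = simulate()
print(within / 50, across / 50)  # expected values are 2 and 4 per dimension
```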

W_{i} \sim \mathbb{P}_W \iff \big(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d), \mathbb{P}_W \big)
\mathcal{R}(f; \mathbb{P}, \ell) := \mathbb{E}\big[\ell(f(X), Y) \big]
f = g \circ \phi

Linear Map

Feature Map
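A minimal sketch of the plug-in version of this risk for f = g ∘ φ, replacing ℙ by the empirical distribution of a sample. The feature map `phi`, the weights of the linear map `g`, and the sample are hypothetical choices for illustration.

```python
def compose(g, phi):
    """Return f = g o phi."""
    return lambda x: g(phi(x))

def empirical_risk(f, sample, loss):
    """Plug-in estimate of R(f; P, loss) = E[loss(f(X), Y)] under the empirical measure."""
    return sum(loss(f(x), y) for x, y in sample) / len(sample)

def squared(yhat, y):
    return (yhat - y) ** 2

def phi(x):
    """Hypothetical feature map: (1, x, x^2)."""
    return (1.0, x, x * x)

weights = (0.5, -1.0, 2.0)  # hypothetical coefficients of the linear map g

def g(z):
    return sum(w * zi for w, zi in zip(weights, z))

f = compose(g, phi)
sample = [(0.0, 0.5), (1.0, 1.0), (2.0, 7.0)]
risk = empirical_risk(f, sample, squared)
print(risk)
```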

\Big(\{\textrm{Keys}\}, \mathcal{F}, \mathbb{P} \Big)

Probability Space

W_i : \{\textrm{Keys}\} \to \mathbb{R}^d

Random Variables

Probability Model

How exactly does the LLN kick in here?

Kernel Methods

 Deep Learning

Supervised

Motivation

K(x,x')
\int \Big \langle \nabla_{\theta} f_x\big(\theta(t)\big), \nabla_{\theta} f_{x'}\big(\theta(t)\big) \Big\rangle dt

 Domingos [2020]
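A discretized sketch of this path kernel: run gradient descent, keep every iterate θ(t), and replace the integral by a Riemann sum with dt equal to the learning rate (treating gradient descent as a step-by-step approximation of gradient flow). The two-parameter model b·tanh(a·x) and the training data are hypothetical.

```python
import math

def f(x, theta):
    """Hypothetical two-parameter network f_x(theta) = b * tanh(a * x)."""
    a, b = theta
    return b * math.tanh(a * x)

def grad_f(x, theta):
    """Gradient of f_x(theta) with respect to theta = (a, b)."""
    a, b = theta
    t = math.tanh(a * x)
    return (b * (1.0 - t * t) * x, t)

def train_path(data, theta, lr=0.1, steps=200):
    """Plain gradient descent on mean squared error; returns every iterate."""
    path, n = [theta], len(data)
    for _ in range(steps):
        ga = gb = 0.0
        for x, y in data:
            r = f(x, theta) - y
            da, db = grad_f(x, theta)
            ga += 2.0 * r * da / n
            gb += 2.0 * r * db / n
        theta = (theta[0] - lr * ga, theta[1] - lr * gb)
        path.append(theta)
    return path

def path_kernel(x, xp, path, dt):
    """Riemann sum of <grad f_x(theta_t), grad f_x'(theta_t)> * dt along the path."""
    total = 0.0
    for theta in path[:-1]:
        ga, gb = grad_f(x, theta)
        ha, hb = grad_f(xp, theta)
        total += dt * (ga * ha + gb * hb)
    return total

data = [(-1.0, -0.8), (0.5, 0.4), (1.0, 0.9)]
path = train_path(data, (0.5, 0.5), lr=0.1)
k = path_kernel(0.3, -0.7, path, dt=0.1)  # dt matches the learning rate
print(k)
```

Because the sum is an inner product of stacked gradients, the resulting kernel is symmetric and positive semidefinite.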

Motivation

K(x,x')
\nabla_{\theta} f_x\big(\theta(t)\big)
K(\theta,\theta')
Df_x(\theta)[v] = \langle v, \textrm{grad} f_x(\theta) \rangle

Domingos [2020]

Riesz Representation Theorem
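In ℝ^p with the Euclidean inner product, the Riesz representation theorem says the linear functional v ↦ Df_x(θ)[v] is represented by a unique vector, the gradient. The sketch checks this numerically: a central finite-difference directional derivative matches ⟨v, grad f_x(θ)⟩. The model, input x = 2, θ, and v are hypothetical.

```python
import math

X = 2.0  # hypothetical fixed input x

def f(theta):
    """Hypothetical model output f_x(theta) = theta[1] * tanh(theta[0] * x)."""
    return theta[1] * math.tanh(theta[0] * X)

def grad(theta):
    """Analytic gradient of f_x at theta."""
    t = math.tanh(theta[0] * X)
    return [theta[1] * (1.0 - t * t) * X, t]

def directional_derivative(fn, theta, v, h=1e-6):
    """Central finite-difference approximation of Df(theta)[v]."""
    plus = [t + h * vi for t, vi in zip(theta, v)]
    minus = [t - h * vi for t, vi in zip(theta, v)]
    return (fn(plus) - fn(minus)) / (2.0 * h)

theta, v = [0.3, -1.2], [0.7, 0.4]
lhs = directional_derivative(f, theta, v)
rhs = sum(vi * gi for vi, gi in zip(v, grad(theta)))
print(lhs, rhs)
```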

Kernel Methods

 Deep Learning

Supervised

High Level Idea

K(x,x')
\int \Big \langle \nabla_{\theta} f_x\big(\theta(t)\big), \nabla_{\theta} f_{x'}\big(\theta(t)\big) \Big\rangle dt

 Domingos [2020]

Cluster

\int \Big \langle \nabla_{\theta} \tilde{f}_x\big(\theta(t)\big), \nabla_{\theta} \tilde{f}_{x'}\big(\theta(t)\big) \Big\rangle dt
\tilde{K}_c(x,x')

introduction

Context

  • 2 million eviction filings each year in the U.S.
  • Gap in legal representation (90/10) in favor of landlords

Policy

Right to Counsel: Provides low income households with legal representation in an eviction case

Question

Does the Right to Counsel assist those who are housed at the cost of those who are not housed?

This paper

Estimation

  • Deep learning estimator that "corrects" for the zip-code-level assignment

Key Empirical Result

  • Linear Models suggest adverse unintended consequences of the policy 
  • Preferred Model suggests limited to no negative effects

Setting

  • Connecticut's state-wide implementation of policy

Practical concerns

(1) Why go beyond the linear model?

(2) Why allow the influence of the zip code to vary across covariates?

Linear Estimator

Locally smooth across clusters

\mathbb{E}[Y_1(0) - Y_0(0)|X, Z \in A^c] = \mathbb{E}[Y_1(0) - Y_0(0)|X, Z \in A]
A \equiv \{\textrm{Untreated Zip Code} \}
A^c \equiv \{\textrm{Treated Zip Code} \}

Sampling distribution of a standardized estimate of a linear difference-in-difference model
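Under this parallel-trends condition (subscripts index time periods, as in Y_1(0)), the change observed in untreated zip codes stands in for the unobserved untreated change in treated zip codes. A minimal 2×2 difference-in-differences sketch on hypothetical outcomes:

```python
def mean(xs):
    return sum(xs) / len(xs)

def did(treated_pre, treated_post, control_pre, control_post):
    """Change in treated zip codes minus change in untreated zip codes. Under
    parallel trends the untreated change replaces E[Y_1(0) - Y_0(0) | Z in A^c]."""
    return (mean(treated_post) - mean(treated_pre)) - (
        mean(control_post) - mean(control_pre)
    )

# Hypothetical binary outcomes (e.g. found housing), not the paper's data.
est = did([0, 1, 0, 1], [1, 1, 0, 1], [0, 0, 1, 1], [1, 0, 0, 1])
print(est)
```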

Motivation

Locally smooth across clusters

When treatment is assigned at the cluster level, the estimator needs to generalize across clusters.

Zip Code Correction

Cross Sectional

Locally smooth across clusters

\mathbb{E}[Y|X]

Estimand

\mathbb{E}[Y|X, Z \in A], \quad A \subset \ \{\textrm{All zip codes}\}

Observe

Challenge

Generalize across the unobserved zip codes

Locally smooth across clusters

Approach

Zip Code Correction

Repeated Cross Section

Locally smooth across clusters

\mathbb{E}[Y_1(0) - Y_0(0)|X]

Estimand

\mathbb{E}[Y_1(0) - Y_0(0)|X, Z \in A], \quad A \equiv \{\textrm{Untreated Zip Codes}\}

Observe

Challenge

Generalize across the treated zip codes

Locally smooth across clusters

Approach

Y_1(0)

Time Period

Potential Outcome

Supervised Deep Learning

  • Models formed by composing parameterized functions
  • Parameters updated via some form of gradient descent

Deep Learning

f_{\theta_2} \circ g_{\theta_1} \circ h_{\theta_0}
\theta(t) = \theta(t-1) - \alpha _t \nabla \mathcal{L}(\theta(t-1) )

Regularizing the Forward Pass

  • Models formed by composing embellished parameterized functions
\tilde{f}_{\theta_2} \ \tilde{\circ} \ g_{\theta_1} \ \tilde{\circ} \ h_{\theta_0}
\tilde{f}_{\theta_2}(x) = \big(f_{\theta_2}(x), m(f_{\theta_2})(x)\big)

inductive bias of the algorithm

\underset{\theta}{\textrm{solve}} \ F(\theta, X, Y) = 0
g^k(\theta) = (\underbrace{g \circ g \circ \cdots \circ g}_{k \ \textrm{times}})(\theta)
g(\theta) = \theta - \alpha \, \partial_{\theta} \Big( \tfrac{1}{n} \big\lVert F(\theta, X, Y) \big\rVert_2^2 \Big)
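A sketch of iterating g, assuming F(θ, X, Y)_i = y_i − θ·x_i (a no-intercept scalar regression; this choice of F is hypothetical). The system F = 0 has no exact solution here, and repeated application of g converges to the minimizer of ‖F‖², the OLS coefficient Σxy / Σx². That is the algorithm's inductive bias at work.

```python
def make_g(xs, ys, alpha):
    """One gradient step on (1/n)||F(theta)||^2 with F(theta)_i = y_i - theta * x_i."""
    n = len(xs)

    def g(theta):
        grad = sum(-2.0 * x * (y - theta * x) for x, y in zip(xs, ys)) / n
        return theta - alpha * grad

    return g

def iterate(g, theta, k):
    """g^k(theta): apply g to theta k times."""
    for _ in range(k):
        theta = g(theta)
    return theta

xs, ys = [1.0, 2.0, 3.0], [2.1, 3.9, 6.2]
theta_k = iterate(make_g(xs, ys, alpha=0.05), 0.0, 500)
ols = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
print(theta_k, ols)
```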

Linear Regression Diff-in-Diff Fails

Supervised Diff-in-Diff Fails

RFP Diff-in-Diff Fails

Contexts

  • State-level adoption of policy
  • Well documented staggered roll-out

Motivation

Measurement

  • Evictions are often informal (24% of forced moves)
  • HUD Rapid Rehousing Data
  1. Few Barriers to Housing
  2. Limited Financial Support
  3. Standard Lease Agreement
  • Seron et al. 2014 (NYC)
  • Greiner, Pattanayak and Hennessy 2013, 2012 (Boston)
  • Collinson et al. 2022 (Cook County & NYC)
  • Cassidy and Curry 2022 (NYC)
  • Abramson 2022 (San Diego)

Literature Review

 

  • Jacot et al. [2018], Nagarajan and Kolter [2019], Wilson [2020], Belkin [2021], Zhang et al. [2021], Balestriero et al. [2021]
  • Griewank and Walther [2008], Frostig et al. [2018]
  • Finn et al. [2017], Kelly et al. [2020], Domingos [2020]
  • "simple depth two neural networks already have perfect finite sample expressivity as soon as the number of parameters exceeds the number of data points as it usually does in practice." (Zhang et al. [2017])
  • We discuss in further detail below how these observations rule out all of VC-dimension, Rademacher complexity, and uniform stability as possible explanations for the generalization performance of state-of-the-art neural networks.
  • In contrast with classical convex empirical risk minimization, where explicit regularization is necessary to rule out trivial solutions, we found that regularization plays a rather different role in deep learning. It appears to be more of a tuning parameter that often helps improve the final test error of a model, but the absence of all regularization does not necessarily imply poor generalization error. As reported by Krizhevsky et al. (2012), ℓ2-regularization (weight decay) sometimes even helps optimization, illustrating its poorly understood nature in deep learning.

Deep Learning Theory

Generalizing Across Zip codes

Outline

 

Context

Eviction process

Right to Counsel implementation

Policy implementation

  • 30% of Evictions & 20% of Renter Population
  • Legal Representation: 80% of landlords, 10% of tenants
  • Household income less than 80% of median state income (~$79,000 for a family of four)

Policy implementation

  • Courts and Landlords inform tenant of the existence of Right to Counsel

MAP

Rapid Rehousing

rapid rehousing

Overview

  • Housing Identification Services
  • Financial Assistance for housing-related expenses
  • Case management services

Features

  • No Preconditions to Housing
  • Financial assistance typically lasts 6 months
  • Typical Lease Agreement

rapid rehousing

Financial

  • Start-up/Move-in Costs
    • First/Last Month Rent, Security/Utility deposit
  • Time limited financial assistance after move-in

Implementation

  • "Creativity is encouraged" in the design of the program

Rapid Rehousing timeline

  1. Experience Homelessness
  2. Enter Shelter
  3. Start Rapid Rehousing Program
  4. Find Housing
  5. Exit Rapid Rehousing

data specifics

Homeless Management Information Systems

  • Individual level data
  • 3,338 households, 2019 to July 2022
  • Search length, Gender, Race, Age, Kids, Program Date

Feedforward Neural Net

X_i : \textrm{Age, Gender, Race, Family Size, Month}
Z_i : \textrm{Zip Code}

*Confidence bands are constructed by randomly sampling the initial weights of the neural network

Regularizing the forward pass

*Confidence bands are constructed by randomly sampling the initial weights of the neural network

X_i : \textrm{Age, Gender, Race, Family Size, Month}
Z_i : \textrm{Zip Code}

Difference-in-Difference

*Confidence Bands are constructed via stratified sampling without replacement (75% sampling rate)

Difference-in-Difference With Controls

X_i : \textrm{Age, Gender, Race, Family Size}
Z_i : \textrm{Zip Code}

*Confidence Bands are constructed via stratified sampling without replacement (75% sampling rate)
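A sketch of one way to build such a band: draw 75% of the rows without replacement within each stratum, re-estimate, and take percentiles across draws. The strata (zip code × treatment), the difference-in-means estimator, the number of draws, and the 90% level are hypothetical choices. This simple percentile version also omits the rate correction used in formal subsampling inference.

```python
import random
from collections import defaultdict

def stratified_subsample(rows, key, rate, rng):
    """Sample without replacement within each stratum, keeping round(rate * size) rows."""
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    sample = []
    for members in strata.values():
        k = max(1, round(rate * len(members)))
        sample.extend(rng.sample(members, k))
    return sample

def diff_in_means(rows):
    """Hypothetical estimator: mean outcome of treated rows minus control rows."""
    treated = [y for _, d, y in rows if d == 1]
    control = [y for _, d, y in rows if d == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def subsample_band(rows, estimator, key, rate=0.75, draws=200, level=0.9, seed=0):
    """Percentile band of the estimator across stratified subsamples."""
    rng = random.Random(seed)
    estimates = sorted(
        estimator(stratified_subsample(rows, key, rate, rng)) for _ in range(draws)
    )
    lower = estimates[int((1 - level) / 2 * draws)]
    upper = estimates[int((1 + level) / 2 * draws) - 1]
    return lower, upper

# Hypothetical rows: (zip_code, treated, outcome).
rows = [(z, d, (z + d + i) % 2) for z in range(5) for d in (0, 1) for i in range(8)]
band = subsample_band(rows, diff_in_means, key=lambda r: (r[0], r[1]))
print(band)
```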

methodology

  • Easier to Implement
  • Better adheres to the Potential Outcomes Framework
  • Assess the effects of the policy at Scale
  • Observe subset of clusters
  • Covariates can differ across clusters
\mathbb{P}_{X|C=c} \neq \mathbb{P}_{X|C=c'}
\mathbb{P}_{Y|X, C=c} \neq \mathbb{P}_{Y|X,C=c'}

Cluster Randomized Controlled Trials

Treatment assigned at a level above the unit of interest

Motivation

Tragic Triad

Framework

Prediction

Training

Function

Composition

Regularizing the Forward Pass

h(x)
h \circ g (x)
\tilde{h}(x) := \big( h(x), \textcolor{blue}{m}(h)(x) \big)
\tilde{h} >=> \tilde{g} (x)
h \overset{F_\textcolor{blue}{m}}{\longmapsto} \tilde{h}
\textrm{Regularizing the Forward Pass} \ \iff \textcolor{blue}{m}
h (g (x))
\big(h(g(x)), \textcolor{blue}{m}(h)(g(x)) + \textcolor{blue}{m}(g)(x)\big)
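A sketch of this embellished composition: each function is paired with a side value m(h)(x), and composing adds the side values, in the style of a writer monad. The slide's h̃ >=> g̃ is read in the same right-to-left order as ∘ (the order of Haskell's `<=<`; Haskell's own `>=>` runs left to right). The penalty `m` and the maps `h`, `g` are hypothetical.

```python
def embellish(h, m):
    """h -> h~ with h~(x) = (h(x), m(h)(x))."""
    side = m(h)
    return lambda x: (h(x), side(x))

def compose_embellished(h_t, g_t):
    """Slide's h~ >=> g~: apply g~, then h~, and add the side values:
    x -> (h(g(x)), m(h)(g(x)) + m(g)(x))."""
    def composed(x):
        gx, side_g = g_t(x)
        hgx, side_h = h_t(gx)
        return hgx, side_h + side_g
    return composed

def m(fn):
    """Hypothetical m: an L2-style penalty on a layer's output."""
    return lambda x: 0.1 * fn(x) ** 2

def h(z):
    return 3.0 * z + 1.0

def g(x):
    return x - 2.0

out = compose_embellished(embellish(h, m), embellish(g, m))(5.0)
print(out)
```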

estimation framework

\textrm{linearModel} \ \textcolor{blue}{\textrm{data}}
\textrm{linearModel} \ \circ \textrm{identityMap} \ \textcolor{blue}{\textrm{data}}
\textrm{linearModel} \ \circ \textrm{featureMap} \ \textcolor{blue}{\textrm{data}}
\textrm{linearModel} \ \circ (\textrm{featureMap} \ \textcolor{blue}{\textrm{data}}) \ \textcolor{purple}{\textrm{params}}
\textrm{linearModel} \ \circ (\textrm{featureMap} \ \textcolor{blue}{\textrm{data}}) \circ \textrm{identityMap} \ \textcolor{purple}{\textrm{params}}
\textrm{linearModel} \ \circ (\textrm{featureMap} \ \textcolor{blue}{\textrm{data}}) \circ (\textrm{clusterMap} \ \textcolor{blue}{\textrm{data}})\ \textcolor{purple}{\textrm{params}}
\textrm{linearModel} \ >=> (\textrm{featureMap} \ \textcolor{blue}{\textrm{data}}) >=> (\textrm{clusterMap} \ \textcolor{blue}{\textrm{data}})\ \textcolor{purple}{\textrm{params}}
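A sketch of the first rungs of this ladder, assuming linearModel is curried as data ↦ params ↦ fitted values: composing with the identity map returns the plain linear model (the sense in which the framework nests OLS), while a feature map enlarges the hypothesis space. The squared-term feature map is a hypothetical example; the rungs where params enter the feature and cluster maps are not sketched.

```python
def linear_model(data):
    """Curried linear predictor: data -> params -> fitted values X @ params."""
    return lambda params: [sum(p * v for p, v in zip(params, row)) for row in data]

def identity_map(data):
    return data

def feature_map(data):
    """Hypothetical fixed feature map: append the square of each column."""
    return [row + [v * v for v in row] for row in data]

def compose(f, g):
    return lambda x: f(g(x))

data = [[1.0, 2.0], [1.0, -1.0]]  # first column is an intercept
params = [0.5, 2.0]

plain = linear_model(data)(params)
nested = compose(linear_model, identity_map)(data)(params)
richer = compose(linear_model, feature_map)(data)(params + [0.0, 1.0])
print(plain, nested, richer)
```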

ODE

Regularized ODE

Kelly [2020]

Double Machine Learning

- Lasso & Partially Linear Models

Regularization will be problematic

Difficult to fit with partially linear neural network

Double Machine Learning

Same Data Set as Above!

  • Every Model has some form of regularization
  • Clear Inductive Bias

Extending Domingos [2020]

\partial_{\theta} \tilde{f}_{x}\big(\theta\big) := \partial _{\theta}f_x(\theta_c^*(\theta))\partial _{\theta} \theta_c^*(\theta) + \lambda\partial_{\theta} f_{x}\big(\theta\big)

Implicit Function:

Applying (k) iterations of gradient descent to cluster (c)

Regularization Term:

Ensure updates happen in the right space

Regularized Version of Model Agnostic Meta-Learning
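Reading the slide's formula as the derivative of f̃_x(θ) = f_x(θ_c*(θ)) + λ f_x(θ), where θ_c*(θ) is k gradient steps on cluster c's loss, the sketch below checks it against a finite difference. It assumes a hypothetical scalar model f_x(θ) = θ·x with squared loss, chosen so the inner-loop Jacobian has the closed form (1 − 2α·mean(x²))^k; the cluster data, α, k, and λ are also hypothetical.

```python
def inner_loop(theta, xs, ys, alpha, k):
    """theta_c^*(theta): k gradient steps on cluster c's mean squared error for
    f_x(theta) = theta * x. Also returns d theta_c^* / d theta."""
    n = len(xs)
    sxx = sum(x * x for x in xs) / n
    sxy = sum(x * y for x, y in zip(xs, ys)) / n
    dtheta = 1.0
    for _ in range(k):
        theta = theta - alpha * 2.0 * (theta * sxx - sxy)  # gradient step
        dtheta *= 1.0 - 2.0 * alpha * sxx                 # chain rule through the step
    return theta, dtheta

def f_tilde(x, theta, cluster, alpha=0.1, k=3, lam=0.5):
    """Embellished prediction f_x(theta_c^*(theta)) + lam * f_x(theta)."""
    theta_c, _ = inner_loop(theta, *cluster, alpha, k)
    return theta_c * x + lam * theta * x

def grad_f_tilde(x, theta, cluster, alpha=0.1, k=3, lam=0.5):
    """f_x'(theta_c^*) * d theta_c^*/d theta + lam * f_x'(theta); here f_x' = x."""
    _, dtheta = inner_loop(theta, *cluster, alpha, k)
    return x * dtheta + lam * x

cluster = ([1.0, 2.0, 0.5], [1.5, 3.0, 0.4])
x, theta, h = 1.3, 0.2, 1e-5
analytic = grad_f_tilde(x, theta, cluster)
numeric = (f_tilde(x, theta + h, cluster) - f_tilde(x, theta - h, cluster)) / (2 * h)
print(analytic, numeric)
```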

Model

Cluster Specific Model

Regularizing the Forward Pass

K(x,x') = \int _0^1 \Big \langle \nabla_{\theta} \tilde{f}_x\big(\theta(t)\big), \nabla_{\theta} \tilde{f}_{x'}\big(\theta(t)\big) \Big\rangle \, dt
\partial_{\theta} \tilde{f}_{x}\big(\theta\big) := \partial _{\theta}f_x(\theta_c^*(\theta))\partial _{\theta} \theta_c^*(\theta) + \lambda\partial_{\theta} f_{x}\big(\theta\big)
K_c(x,x') = K(x, x') + \int _1^{1+\varepsilon} \Big \langle \nabla_{\theta} \tilde{f}_x\big(\theta(t)\big), \nabla_{\theta} \tilde{f}_{x'}\big(\theta(t)\big) \Big\rangle \, dt

Key Idea

Compose functions

Compose "Embellished" Functions

Wrong Space

Overfits in the Tails

Reasonable

motivation

\mathbb{E}[Y|X, C \in A], \quad A \subset \Omega

Cross Sectional

Repeated Cross Section

Locally smooth across clusters

Set of Clusters

Treated Clusters

\mathbb{E}[Y_1|X, C \in A] - \mathbb{E}[Y_0|X, C \in A]
\sum \big(p_{1,C|X}(c) - p_{0,C|X}(c)\big) \Big(\mathbb{E}[Y_1|X, C=c] - \mathbb{E}[Y_0|X, C=c] \Big)
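Holding X fixed, write p_t(c) for the share of cluster c among clusters in A in period t and μ_t(c) = E[Y_t | X, C = c]. The difference of the two conditional means then splits exactly into a within-cluster change Σ p_0(μ_1 − μ_0), a composition shift Σ (p_1 − p_0)μ_0, and the interaction Σ (p_1 − p_0)(μ_1 − μ_0) shown above. The sketch checks this identity on hypothetical shares and means.

```python
def decompose(p0, p1, mu0, mu1):
    """Split sum_c p1*mu1 - sum_c p0*mu0 into within-cluster change,
    composition shift, and interaction."""
    clusters = p0.keys()
    within = sum(p0[c] * (mu1[c] - mu0[c]) for c in clusters)
    composition = sum((p1[c] - p0[c]) * mu0[c] for c in clusters)
    interaction = sum((p1[c] - p0[c]) * (mu1[c] - mu0[c]) for c in clusters)
    total = sum(p1[c] * mu1[c] - p0[c] * mu0[c] for c in clusters)
    return total, within, composition, interaction

# Hypothetical cluster shares and conditional means for two zip codes.
p0 = {"z1": 0.5, "z2": 0.5}
p1 = {"z1": 0.25, "z2": 0.75}
mu0 = {"z1": 0.2, "z2": 0.6}
mu1 = {"z1": 0.3, "z2": 0.5}
total, within, composition, interaction = decompose(p0, p1, mu0, mu1)
print(total, within, composition, interaction)
```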

Locally smooth across clusters

Results

Notation

S_i : \textrm{Search Duration}
X_i : \textrm{Age, Gender, Race, Family Size}
Z_i : \textrm{Zip Code}
D_i= \begin{cases} 1 \quad \textrm{if Treated} \\ 0 \quad \textrm{if Control} \end{cases}
1_{S_i < r} : \textrm{Found Housing within} \ r \ \textrm{days}
r \in \mathcal{R}_+: \textrm{Search Threshold}
r \longmapsto \hat{\beta}(r):= \underset{\beta}{\textrm{argmin}}\ \sum _i \mathcal{L}(\beta; r, S_i, X_i, Z_i, D_i)
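A sketch of tracing r ↦ β̂(r), assuming a squared loss on 1{S_i < r} − a − β·D_i with the covariate and zip-code terms omitted (a hypothetical simplification). With a binary D_i, the minimizing β is the treated-minus-control difference in the share who found housing within r days. The search durations are hypothetical.

```python
def beta_hat(r, S, D):
    """argmin over (a, beta) of sum_i (1{S_i < r} - a - beta * D_i)^2, which for
    binary D equals the treated-minus-control difference in means."""
    treated = [s < r for s, d in zip(S, D) if d == 1]
    control = [s < r for s, d in zip(S, D) if d == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

S = [10, 20, 50, 90, 25, 40, 80, 120]  # hypothetical search durations (days)
D = [1, 1, 1, 1, 0, 0, 0, 0]
curve = {r: beta_hat(r, S, D) for r in (30, 45, 100)}
print(curve)
```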

Paper about Scale

Hardware   Compile Time   Compiled Run Time   Ratio (Compile / Run)
CPU        1.7547         0.5288              3.3185
GPU        2.6512         0.0009              2806.6818

Compute

Composing Models

f \circ g \to \tilde{f} >=> \tilde{g}

Policies

Does the Right to Counsel shift the costs of housing to those who are currently without housing?

The End