Autoencoder Asset Pricing Models

Gu, Kelly, and Xiu

Motivation

The IPCA model (a latent factor conditional asset pricing model) is powerful.
However, it assumes the factor exposures are a linear function of the covariates.
Existing literature suggests their relationship might be nonlinear.

Objective

Create a nonlinear version of IPCA using autoencoders

Results

Reduces out-of-sample pricing errors (predictive \(R^2\))
Imposes economic restriction of no-arbitrage (no intercept)

Background

A model of returns

\[r_{i,t+1}=\alpha_{i,t}+\beta'_{i,t}f_{t+1}+\epsilon_{i,t+1}\]

\(r_{i,t+1}\): return for stock \(i\) at time \(t+1\)
\(f_{t+1}\): systemic risk factors at time \(t+1\)
\(\beta'_{i,t}\): exposure of stock \(i\) at \(t+1\) to systemic risk factors
\(\alpha_{i,t}\): intercept term (perhaps set to 0)
\(\epsilon_{i,t+1}\): error

A model of returns

\[r_{i,t+1}=\alpha_{i,t}+\beta'_{i,t}f_{t+1}+\epsilon_{i,t+1}\]

\(f_{t+1}\): systemic risk factors at time \(t+1\)
- These are the "factors" in a factor model
- Systemwide: no \(i\) subscript
- Can be pre-specified or latent
- We will use latent factors

IPCA

From "Characteristics are covariances" by KPS
Idea: characteristics proxy for exposure to risk factors
- Momentum, volatility, bid-ask spread
Conditional exposures: \(\beta(z_{i,t})'=z'_{i,t}\Gamma_\beta\)
\(z_{i,t}\): vector of characteristics of asset \(i\)

IPCA

Provides robust interpretation of returns
If characteristics proxy for risk factors, then \(\beta\neq 0\) and \(\alpha=0\)
If not, then characteristics can be used for compensation without risk, so \(\beta=0\) and \(\alpha\neq 0\)
- This is an "anomaly" (arbitrage)

IPCA Analysis

\(R^2_{total}\): fraction of variance in \(r_{t+1}\) explained by \(\hat{\beta}_{i,t}'\hat{f}_{t+1}\)
- Ability of model to explained realized variation in returns (systemic risks)

IPCA Analysis

\(R^2_{predictive}\): fraction of variance in \(r_{t+1}\) explained by \(\hat{\beta}_{i,t}'\hat{\lambda}\)
- \(\hat\lambda\): vector of estimated risk factor prices
- \(\hat{\beta}_{i,t}'\hat{\lambda}\): model-based conditional expected return on asset \(i\) given \(t\) information
- Measures the accuracy of model-implied conditional expected returns
  - Ability to describe differences in average returns (risk compensation).

IPCA Results

Achieves similar \(R^2_{total}\) to Fama-French (in sample)
Better \(R^2_{total}\) out of sample
More than double Fama-French in \(R^2_{predicitve}\)
\(\alpha\) usually insignificant from zero for 5-factor
- If significant, returns are small

Autoencoders

Neural network for dimension reduction
Output layer is the same as input layer
Hidden layer(s) have fewer neurons

New Model

Conditional Autoencoder

Covariates improve estimates of factor loadings and latent factors
Design new neural network structure by augmenting a standard autoencoder to incorporate covariates
\(r_{i,t}=\beta'_{i,t-1}f_t+u_{i,t}\)

Conditional Autoencoder

Beta (left side)

Neural network to compute betas:

z^{(0)}_{i,t-1} = z_{i,t-1} \\ z^{(l)}_{i,t-1} = g\left(b^{(l-1)}+W^{(l-1)}z^{(l-1)}_{i,t-1}\right),\quad l-1,\dots,L_\beta \\ \beta_{i,t-1}=b^{(L_\beta)}+W^{(L_\beta)}z^{(L_\beta)}_{i,t-1}

Factor (right side)

Neural network to compute betas:

r^{(0)}_{t} = r_t \\ r^{(l)}_t = \widetilde g\left(\widetilde b^{(l-1)}+\widetilde W^{(l-1)}r^{(l-1)}_t\right),\quad l-1,\dots,L_f \\ f_t=\widetilde b^{(L_f)}+\widetilde W^{(L_f)}r_t^{(L_f)}

\(L_f=1\)
- This makes the factors interpretable as portfolios

Factor (right side)

Difficult to use full cross section of individual stock returns
- Many weight parameters: 30,000 firms, 720 months
- Panel is unbalanced: only 6,000 stocks/month
Solution: initialize network with set of portfolios
- \(x_t=(Z'_{t-1}Z_{t-1})^{-1}Z_{t-1}r_t\)
- Set of portfolios dynamically re-weighted by characteristic
- \(j^{th}\) element: return of long-short portfolio constructed by sorting stocks based on \(j^{th}\) characteristic

Objetive Function

\(\theta\): summarizes weight parameters
\(\phi(\theta)\): penalty function for regularization
- Use LASSO (\(l_1\))
- \(\phi(\theta;\lambda)=\lambda \sum_j |\theta_j|\)
- Set coefficients on a subset of covariates to exactly zero
- Imposes sparsity on weights

\mathcal L(\theta;\cdot) = \frac{1}{NT}\sum_{t=1}^T\sum_{i=1}^N ||r_{i,t}-\beta'_{i,t-1}f_t||^2+\phi(\theta;\cdot)

Other Regularization Techniques

Early stopping: stop when validation sample errors begin to increase
- Usually occurs before errors minimized in training
Ensemble: train 10 networks and use average prediction

Optimization Algorithn

Stochastic gradient descent
Adam optimizer
Batch normalized: for each hidden layer in each training step (batch), cross-sectionally de-mean and standardize
- Motivated by internal covariate shift: inputs of hidden layers follow different distributions than their counterparts in the validation sample
- Should restore representation power of the unit

Data

Dataset

Source: CRSP monthly data from NYSE, AMEX, and NASDAQ
Range: March 1957 to December 2016 (60 years)
30,000 total stocks, ~6,200 per month
Training: 1957-1974 (18 years)
Validation: 1975-1986 (12 years)
Testing: 1987-2016 (30 years)

Characteristics

94 characteristics
- 61 updated annually
- 13 updated quarterly
- 20 updated monthly
Delay characteristics to avoid forward looking bias:
- Monthly by 1 month
- Quarterly by 4 months
- Annually by 6 months
Avoid recursively refitting model each month
- Refit annually (most signals annual)

Characteristics

Missing characteristics replaced by cross-sectional median of that characteristic for that month
Distributions can be skewed and leptokurtic
- Rank-normalize characteristics
- Create 94 managed portfolios
- Also include one equal weighted market portfolio
No filters based on prices or share types

Experiment

Model Set

PCA: linear, constant betas, no conditioning
IPCA: linear, conditional betas
CA0: conditional autoencoder, single layer in both beta and factor networks (similar to IPCA)
CA1: add hidden layer with 32 neurons to beta
CA2: add second hidden layer with 16 neurons to beta
CA3: add third hidden layer with 8 neurons to beta
FF: Fama-French model with observable factors
Try each model with 1 to 6 factors

Metrics

R^2_{total} = 1-\frac{\sum_{(i,t)\in OOS} (r_{i,t}-\hat\beta'_{i,t-1}\hat f_{t})^2}{\sum_{(i,t)\in OOS}r^2_{i,t}}

R^2_{pred} = 1-\frac{\sum_{(i,t)\in OOS} (r_{i,t}-\hat\beta'_{i,t-1}\hat \lambda_{t-1})^2}{\sum_{(i,t)\in OOS}r^2_{i,t}}

\(\hat\lambda_{t-1}\): sample average of \(\hat f\) up to month \(t-1\)

Results

OOS \(R^2_{total}\)

IPCA with 6 factors top performing
Closely followed by CA networks
FF models worst:
- Infrequent re-estimation of parameters
- Much larger cross-section of stocks than normal

OOS \(R^2_{total}\)

OOS \(R^2_{pred}\)

CAs did much better than IPCA
Static models did poorly

OOS \(R^2_{pred}\)

Economic Performance

Sort stocks based on OOS forecast
Create zero-net investment portfolio
- Buy top 10%
- Sell bottom 10%
Equal-weighted and value-weighted portfolios
CA2 top performing

Economic Performance

Constructed using mean and covariance matrix of estimated factors through \(t\) and tracking the post-formation \(t+1\) return
CA3 5-factor top performing
Not necessarily implementable strategies

Risk Premia vs. Mispricing

Models above specified with no intercept (\(\alpha\))
- Imposes no arbitrage
- Stock characteristics proxy for compensated factor risk exposures
Should there be an intercept?

Risk Premia vs. Mispricing

If zero-intercept is correct model, time series average of model residuals for each asset should be statistically indistinguishable from zero
\(\alpha_i := E[u_{i,t}]=E[r_{i,t}]-E[\beta'_{i,t-1}f_t]\)
Use \(t\)-tests

Risk Premia vs. Mispricing

Magnitude of alphas less for CA
Fewer are significant
- Those that are have small magnitude (7 bps/mo)

Characteristics Importance

Variable importance by reduction in \(R^2_{total}\) when removed
Top 20 characteristics contributed 90% for CA1-3
Three influential categories:
- Price trend: reversal, momentum
- Liquidity: turnover, dollar volume, bid-ask spread
- Risk measures: volatility, market beta

Robustness Check

Rerun on random subsamples
Still performs well

Monte-Carlo Simulations

Construct a dataset and test on it
Performs well

Conclusion

New nonlinear conditional asset pricing model
Embeds economic restriction of no-arbitrage
Dominates other asset pricing models
- Especially in predictive power

Analysis

Well written and robust
Like that they note perhaps not implementable
Powerful framework for evaluating all characteristics
Personally biased against monthly studies
- Need to look back far
- Market much different 50 years ago
Their model can handle daily data/characteristics
- Would like to see study with short return periods and higher frequency characteristics