Autoencoder Asset Pricing Models

Gu, Kelly, and Xiu

Motivation

  • The IPCA model (a latent factor conditional asset pricing model) is powerful.
  • However, it assumes the factor exposures are a linear function of the covariates.
  • Existing literature suggests their relationship might be nonlinear.

Objective

  • Create a nonlinear version of IPCA using autoencoders

Results

  • Reduces out-of-sample pricing errors (measured by predictive \(R^2\))
  • Imposes economic restriction of no-arbitrage (no intercept)

Background

 

A model of returns

\[r_{i,t+1}=\alpha_{i,t}+\beta'_{i,t}f_{t+1}+\epsilon_{i,t+1}\]

  • \(r_{i,t+1}\): return for stock \(i\) at time \(t+1\)
  • \(f_{t+1}\): systematic risk factors at time \(t+1\)
  • \(\beta'_{i,t}\): exposure of stock \(i\) at time \(t\) to the systematic risk factors
  • \(\alpha_{i,t}\): intercept term (perhaps set to 0)
  • \(\epsilon_{i,t+1}\): error

A model of returns

\[r_{i,t+1}=\alpha_{i,t}+\beta'_{i,t}f_{t+1}+\epsilon_{i,t+1}\]

  • \(f_{t+1}\): systematic risk factors at time \(t+1\)
    • These are the "factors" in a factor model
    • Common to all stocks: no \(i\) subscript
    • Can be pre-specified or latent
    • We will use latent factors

IPCA

  • From "Characteristics are covariances" by Kelly, Pruitt, and Su (KPS)
  • Idea: characteristics proxy for exposure to risk factors
    • Momentum, volatility, bid-ask spread
  • Conditional exposures: \(\beta(z_{i,t})'=z'_{i,t}\Gamma_\beta\)
  • \(z_{i,t}\): vector of characteristics of asset \(i\)
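
The linear beta specification above can be illustrated with a minimal NumPy sketch. The sizes, the random characteristics, and the random \(\Gamma_\beta\) are made-up toy values, not estimates from the paper; the point is only that each stock's exposures are a linear map of its characteristics.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, K = 5, 3, 2                       # stocks, characteristics, factors (toy sizes)
Z = rng.uniform(-1, 1, (N, P))          # row i holds z_{i,t}' (hypothetical data)
Gamma_beta = rng.normal(size=(P, K))    # hypothetical loading matrix

# beta(z_{i,t})' = z_{i,t}' Gamma_beta, stacked over all stocks
beta = Z @ Gamma_beta                   # shape (N, K)

f_next = rng.normal(size=K)             # hypothetical factor realization f_{t+1}
systematic_return = beta @ f_next       # beta_{i,t}' f_{t+1} for each stock
```

Two stocks with the same characteristics get the same betas here, which is what lets characteristics act as proxies for risk exposure.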

IPCA

  • Provides robust interpretation of returns
  • If characteristics proxy for exposure to risk factors, then \(\beta\neq 0\) and \(\alpha=0\)
  • If not, characteristics earn compensation without risk, so \(\beta=0\) and \(\alpha\neq 0\)
    • This is an "anomaly" (arbitrage)

IPCA Analysis

  • \(R^2_{total}\): fraction of variance in \(r_{t+1}\) explained by \(\hat{\beta}_{i,t}'\hat{f}_{t+1}\)
    • Ability of the model to explain realized variation in returns (systematic risks)

IPCA Analysis

  • \(R^2_{predictive}\): fraction of variance in \(r_{t+1}\) explained by \(\hat{\beta}_{i,t}'\hat{\lambda}\)
    • \(\hat\lambda\): vector of estimated risk factor prices
    • \(\hat{\beta}_{i,t}'\hat{\lambda}\): model-based conditional expected return on asset \(i\) given \(t\) information
    • Measures the accuracy of model-implied conditional expected returns 
      • Ability to describe differences in average returns (risk compensation).

IPCA Results

  • Achieves similar \(R^2_{total}\) to Fama-French (in sample)
  • Better \(R^2_{total}\) out of sample
  • More than double Fama-French in \(R^2_{predictive}\)
  • \(\alpha\) usually statistically indistinguishable from zero for the 5-factor model
    • If significant, returns are small

Autoencoders

  • Neural network for dimension reduction
  • Output layer is the same as input layer
  • Hidden layer(s) have fewer neurons
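
A minimal sketch of one forward pass through a standard autoencoder follows. The ReLU activation, the layer sizes, and the random weights are assumptions for illustration; training would adjust the weights to shrink the reconstruction error.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def autoencoder_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode x into a narrower hidden layer, then decode back to input size."""
    hidden = relu(W_enc @ x + b_enc)    # bottleneck: fewer neurons than inputs
    x_hat = W_dec @ hidden + b_dec      # reconstruction, same size as x
    return hidden, x_hat

rng = np.random.default_rng(0)
n_in, n_hidden = 8, 3                   # toy sizes
x = rng.normal(size=n_in)
W_enc, b_enc = rng.normal(size=(n_hidden, n_in)), np.zeros(n_hidden)
W_dec, b_dec = rng.normal(size=(n_in, n_hidden)), np.zeros(n_in)

hidden, x_hat = autoencoder_forward(x, W_enc, b_enc, W_dec, b_dec)
reconstruction_error = np.mean((x - x_hat) ** 2)   # the target is the input itself
```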

New Model

Conditional Autoencoder

  • Covariates improve estimates of factor loadings and latent factors
  • Design new neural network structure by augmenting a standard autoencoder to incorporate covariates
  • \(r_{i,t}=\beta'_{i,t-1}f_t+u_{i,t}\)

Conditional Autoencoder

Beta (left side)

  • Neural network to compute betas:
\[
\begin{aligned}
z^{(0)}_{i,t-1} &= z_{i,t-1} \\
z^{(l)}_{i,t-1} &= g\left(b^{(l-1)}+W^{(l-1)}z^{(l-1)}_{i,t-1}\right),\quad l=1,\dots,L_\beta \\
\beta_{i,t-1} &= b^{(L_\beta)}+W^{(L_\beta)}z^{(L_\beta)}_{i,t-1}
\end{aligned}
\]
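
The beta recursion can be sketched as a plain NumPy forward pass. The slides do not name the activation \(g\), so ReLU is an assumption here, and `beta_network` with its toy weights is a hypothetical helper, not the authors' code.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def beta_network(Z, weights, biases, g=relu):
    """Map characteristics Z (N x P) to betas (N x K).

    weights[l] and biases[l] play the roles of W^{(l)} and b^{(l)};
    the last pair is the linear output layer.
    """
    h = Z                                    # z^{(0)} = z
    for W, b in zip(weights[:-1], biases[:-1]):
        h = g(h @ W.T + b)                   # z^{(l)} = g(b^{(l-1)} + W^{(l-1)} z^{(l-1)})
    return h @ weights[-1].T + biases[-1]    # beta = b^{(L)} + W^{(L)} z^{(L)}

rng = np.random.default_rng(0)
N, P, H, K = 6, 4, 3, 2                      # toy sizes with one hidden layer
Z = rng.uniform(-1, 1, (N, P))
weights = [rng.normal(size=(H, P)), rng.normal(size=(K, H))]
biases = [np.zeros(H), np.zeros(K)]
beta = beta_network(Z, weights, biases)      # shape (N, K)
```

With no hidden layers the function reduces to a linear map of the characteristics, which is the IPCA-like case.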

Factor (right side)

  • Neural network to compute factors:
\[
\begin{aligned}
r^{(0)}_{t} &= r_t \\
r^{(l)}_t &= \widetilde g\left(\widetilde b^{(l-1)}+\widetilde W^{(l-1)}r^{(l-1)}_t\right),\quad l=1,\dots,L_f \\
f_t &= \widetilde b^{(L_f)}+\widetilde W^{(L_f)}r_t^{(L_f)}
\end{aligned}
\]
  • \(L_f=1\)
    • This makes the factors interpretable as portfolios

Factor (right side)

  • Difficult to use full cross section of individual stock returns 
    • Many weight parameters: 30,000 firms, 720 months
    • Panel is unbalanced: only 6,000 stocks/month
  • Solution: initialize network with set of portfolios
    • \(x_t=(Z'_{t-1}Z_{t-1})^{-1}Z'_{t-1}r_t\)
    • Set of portfolios dynamically re-weighted by characteristic
    • \(j^{th}\) element: return of long-short portfolio constructed by sorting stocks based on \(j^{th}\) characteristic
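
A small sketch of the managed-portfolio step, reading it as the cross-sectional regression \((Z'Z)^{-1}Z'r_t\) of returns on lagged characteristics (the only conformable reading when \(Z_{t-1}\) is \(N\times P\) and \(r_t\) is \(N\times 1\)). The data are simulated and `managed_portfolio_returns` is a hypothetical helper name.

```python
import numpy as np

def managed_portfolio_returns(Z_lag, r):
    """x_t = (Z'Z)^{-1} Z' r_t, computed by least squares instead of an explicit inverse."""
    x, *_ = np.linalg.lstsq(Z_lag, r, rcond=None)
    return x

rng = np.random.default_rng(0)
N, P = 200, 5                            # stocks in this month, characteristics (toy sizes)
Z_lag = rng.uniform(-1, 1, (N, P))       # characteristics known at t-1
r = rng.normal(scale=0.1, size=N)        # returns realized at t
x_t = managed_portfolio_returns(Z_lag, r)   # length P, whatever N is
```

The output has length \(P\) regardless of how many stocks are present in a given month, which sidesteps the unbalanced-panel problem.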

Objective Function

  • \(\theta\): summarizes weight parameters
  • \(\phi(\theta)\): penalty function for regularization
    • Use LASSO (\(l_1\))
    • \(\phi(\theta;\lambda)=\lambda \sum_j |\theta_j|\)
    • Set coefficients on a subset of covariates to exactly zero
    • Imposes sparsity on weights
\[\mathcal L(\theta;\cdot) = \frac{1}{NT}\sum_{t=1}^T\sum_{i=1}^N \left\|r_{i,t}-\beta'_{i,t-1}f_t\right\|^2+\phi(\theta;\cdot)\]
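
The penalized objective can be written out directly in NumPy. This sketch assumes a balanced panel (every stock present every month) and treats `theta` as a flat vector of all weights; both are simplifications for illustration.

```python
import numpy as np

def penalized_loss(r, beta_lag, f, theta, lam):
    """r: (T, N) returns, beta_lag: (T, N, K) lagged betas, f: (T, K) factors."""
    fitted = np.einsum("tnk,tk->tn", beta_lag, f)   # beta_{i,t-1}' f_t
    mse = np.mean((r - fitted) ** 2)                # (1/NT) * sum of squared errors
    penalty = lam * np.sum(np.abs(theta))           # LASSO term phi(theta; lambda)
    return mse + penalty

rng = np.random.default_rng(0)
T, N, K = 4, 5, 2                                   # toy sizes
beta_lag = rng.normal(size=(T, N, K))
f = rng.normal(size=(T, K))
r = np.einsum("tnk,tk->tn", beta_lag, f)            # returns with zero pricing error
theta = rng.normal(size=10)                         # hypothetical weight vector
loss = penalized_loss(r, beta_lag, f, theta, lam=0.1)
```

Because the fit is exact in this toy case, the whole loss comes from the penalty, which shows how the \(l_1\) term pushes weights toward zero even when the data are fit perfectly.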

Other Regularization Techniques

  • Early stopping: stop when validation sample errors begin to increase
    • Usually occurs before training errors are minimized
  • Ensemble: train 10 networks and use average prediction
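
Both techniques are simple to sketch. The `patience` parameter and the helper names below are assumptions; the slides only say to stop when validation errors start rising and to average the networks' predictions.

```python
import numpy as np

def early_stopping_epoch(val_losses, patience=1):
    """Return the best epoch seen before validation loss fails to improve
    for `patience` consecutive epochs."""
    best, best_epoch, bad = np.inf, 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, bad = loss, epoch, 0
        else:
            bad += 1
            if bad >= patience:
                break
    return best_epoch

def ensemble_prediction(predictions):
    """Average the predictions of separately trained networks."""
    return np.mean(np.stack(predictions), axis=0)

stop_at = early_stopping_epoch([1.0, 0.8, 0.7, 0.75, 0.6], patience=1)
avg = ensemble_prediction([np.array([1.0, 2.0]), np.array([3.0, 4.0])])
```

With `patience=1` the loop stops at the first uptick and never sees the later, lower loss; a larger patience trades extra training time for robustness to noisy validation curves.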

Optimization Algorithm

  • Stochastic gradient descent
  • Adam optimizer
  • Batch normalization: for each hidden layer in each training step (batch), cross-sectionally de-mean and standardize the batch inputs
    • Motivated by internal covariate shift: inputs of hidden layers follow different distributions than their counterparts in the validation sample
    • Should restore representation power of the unit

Data

Dataset

  • Source: CRSP monthly data from NYSE, AMEX, and NASDAQ
  • Range: March 1957 to December 2016 (60 years)
  • 30,000 total stocks, ~6,200 per month
  • Training: 1957-1974 (18 years)
  • Validation: 1975-1986 (12 years)
  • Testing: 1987-2016 (30 years)

Characteristics

  • 94 characteristics
    • 61 updated annually
    • 13 updated quarterly
    • 20 updated monthly
  • Delay characteristics to avoid forward looking bias:
    • Monthly by 1 month
    • Quarterly by 4 months
    • Annually by 6 months
  • Avoid recursively refitting model each month
    • Refit annually (most signals annual)

Characteristics

  • Missing characteristics replaced by cross-sectional median of that characteristic for that month
  • Distributions can be skewed and leptokurtic
    • Rank-normalize characteristics
    • Create 94 managed portfolios
    • Also include one equal weighted market portfolio
  • No filters based on prices or share types
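
The median fill and rank normalization for one characteristic in one month might look like the sketch below. Mapping ranks onto \([-1, 1]\) is an assumption (the slide only says "rank-normalize"), and ties are broken by position, which is a simplification.

```python
import numpy as np

def fill_and_rank_normalize(c):
    """c: one characteristic across stocks for one month (NaN = missing)."""
    c = np.asarray(c, dtype=float)
    filled = np.where(np.isnan(c), np.nanmedian(c), c)   # cross-sectional median fill
    order = filled.argsort(kind="stable")
    ranks = order.argsort(kind="stable")                 # 0 .. N-1
    return 2.0 * ranks / (len(c) - 1) - 1.0              # rescale ranks to [-1, 1]

normalized = fill_and_rank_normalize([3.0, 1.0, np.nan, 10.0, 2.0])
```

The outlier value 10.0 ends up at 1.0, no further from its neighbor than any other adjacent pair, which is how ranking tames skewed and fat-tailed distributions.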

Experiment

Model Set

  • PCA: linear, constant betas, no conditioning
  • IPCA: linear, conditional betas
  • CA0: conditional autoencoder, single layer in both beta and factor networks (similar to IPCA)
  • CA1: add hidden layer with 32 neurons to beta
  • CA2: add second hidden layer with 16 neurons to beta
  • CA3: add third hidden layer with 8 neurons to beta
  • FF: Fama-French model with observable factors
  • Try each model with 1 to 6 factors

Metrics

\[R^2_{total} = 1-\frac{\sum_{(i,t)\in OOS} (r_{i,t}-\hat\beta'_{i,t-1}\hat f_{t})^2}{\sum_{(i,t)\in OOS}r^2_{i,t}}\]
\[R^2_{pred} = 1-\frac{\sum_{(i,t)\in OOS} (r_{i,t}-\hat\beta'_{i,t-1}\hat \lambda_{t-1})^2}{\sum_{(i,t)\in OOS}r^2_{i,t}}\]
  • \(\hat\lambda_{t-1}\): sample average of \(\hat f\) up to month \(t-1\) 
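
Both metrics share one formula and differ only in what multiplies the betas. A minimal sketch on simulated data, assuming a balanced panel and hypothetical helper names:

```python
import numpy as np

def r2_oos(r, fitted):
    """1 - SSE / sum of squared returns (the denominator is not de-meaned)."""
    r, fitted = np.asarray(r, float), np.asarray(fitted, float)
    return 1.0 - np.sum((r - fitted) ** 2) / np.sum(r ** 2)

def fitted_returns(beta_lag, g):
    """beta_lag: (T, N, K); g: (T, K) holding f_t (total) or lambda_{t-1} (predictive)."""
    return np.einsum("tnk,tk->tn", beta_lag, g)

def lagged_factor_mean(f):
    """Row t holds the average of f over months strictly before t (row 0 is NaN)."""
    f = np.asarray(f, float)
    means = np.cumsum(f, axis=0) / np.arange(1, len(f) + 1)[:, None]
    lam = np.full_like(f, np.nan)
    lam[1:] = means[:-1]
    return lam

rng = np.random.default_rng(0)
T, N, K = 24, 50, 2                                   # toy sizes
beta_lag = rng.normal(size=(T, N, K))
f = rng.normal(size=(T, K))
r = fitted_returns(beta_lag, f) + rng.normal(scale=0.01, size=(T, N))
lam = lagged_factor_mean(f)

r2_total_val = r2_oos(r, fitted_returns(beta_lag, f))
r2_pred_val = r2_oos(r[1:], fitted_returns(beta_lag[1:], lam[1:]))   # skip month 0: no history
```

The predictive version uses only information available before month \(t\), so it is typically far lower than the total version.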

Results

OOS \(R^2_{total}\)

  • IPCA with 6 factors top performing
  • Closely followed by CA networks
  • FF models worst:
    • Infrequent re-estimation of parameters
    • Much larger cross-section of stocks than normal

OOS \(R^2_{pred}\)

  • CAs did much better than IPCA
  • Static models did poorly

Economic Performance

  • Sort stocks based on OOS forecast
  • Create zero-net investment portfolio
    • Buy top 10%
    • Sell bottom 10%
  • Equal-weighted and value-weighted portfolios
  • CA2 top performing
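
The sort described above can be sketched for a single month as follows. The function name and the toy inputs are hypothetical; passing market capitalizations as `weights` gives the value-weighted version, and omitting them gives the equal-weighted one.

```python
import numpy as np

def long_short_return(forecast, realized, weights=None, frac=0.10):
    """Buy the top `frac` of stocks by forecast, sell the bottom `frac`."""
    forecast = np.asarray(forecast, float)
    realized = np.asarray(realized, float)
    n = len(forecast)
    k = max(1, int(np.floor(frac * n)))          # stocks per leg
    order = np.argsort(forecast)
    short_idx, long_idx = order[:k], order[-k:]
    w = np.ones(n) if weights is None else np.asarray(weights, float)
    long_ret = np.average(realized[long_idx], weights=w[long_idx])
    short_ret = np.average(realized[short_idx], weights=w[short_idx])
    return long_ret - short_ret                  # zero-net-investment return

forecast = np.arange(20, dtype=float)            # toy forecasts
realized = 0.01 * forecast                       # toy realized returns
ew_return = long_short_return(forecast, realized)
```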

Economic Performance

  • Factor tangency portfolios: constructed using the mean and covariance matrix of estimated factors through \(t\), then tracking the post-formation \(t+1\) return
  • CA3 5-factor top performing
  • Not necessarily implementable strategies
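
If the portfolios in the bullets above are read as mean-variance (tangency) combinations of the factors, which is an assumption since the slide does not spell out the construction, the weights at time \(t\) can be sketched as \(\Sigma^{-1}\mu\) from the factor history. The normalization to unit gross exposure is also an assumption; it does not affect the Sharpe ratio.

```python
import numpy as np

def tangency_weights(f_hist):
    """f_hist: (t, K) estimated factor returns through month t."""
    mu = f_hist.mean(axis=0)
    sigma = np.cov(f_hist, rowvar=False)         # (K, K) sample covariance
    w = np.linalg.solve(sigma, mu)               # proportional to Sigma^{-1} mu
    return w / np.sum(np.abs(w))                 # scale to unit gross exposure

rng = np.random.default_rng(0)
f_hist = rng.normal(loc=0.01, scale=0.05, size=(120, 3))   # toy factor history
w_t = tangency_weights(f_hist)
f_next = rng.normal(loc=0.01, scale=0.05, size=3)          # hypothetical t+1 factor returns
post_formation_return = w_t @ f_next
```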

Risk Premia vs. Mispricing

  • Models above specified with no intercept (\(\alpha\))
    • Imposes no arbitrage
    • Stock characteristics proxy for compensated factor risk exposures
  • Should there be an intercept?

Risk Premia vs. Mispricing

  • If the zero-intercept model is correct, the time-series average of model residuals for each asset should be statistically indistinguishable from zero
  • \(\alpha_i := E[u_{i,t}]=E[r_{i,t}]-E[\beta'_{i,t-1}f_t]\)
  • Use \(t\)-tests
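
For one asset, the test reduces to a one-sample \(t\)-statistic on its residual series. This sketch assumes serially uncorrelated residuals and uses the plain standard error; whether the authors adjust for autocorrelation is not stated on the slide.

```python
import numpy as np

def alpha_t_stat(residuals):
    """Return (alpha_hat, t-statistic) for H0: E[u_{i,t}] = 0."""
    u = np.asarray(residuals, float)
    alpha = u.mean()                              # time-series average residual
    se = u.std(ddof=1) / np.sqrt(len(u))          # standard error of the mean
    return alpha, alpha / se

alpha_hat, t_stat = alpha_t_stat([0.01, -0.01, 0.02, 0.0])   # toy residuals
```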

Risk Premia vs. Mispricing

  • Alphas are smaller in magnitude for the CA models
  • Fewer are significant
    • Those that are have small magnitude (7 bps/mo)

Characteristics Importance

  • Variable importance by reduction in \(R^2_{total}\) when removed
  • Top 20 characteristics contributed 90% for CA1-3
  • Three influential categories:
    • Price trend: reversal, momentum
    • Liquidity: turnover, dollar volume, bid-ask spread
    • Risk measures: volatility, market beta
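
One way to sketch the importance measure is to zero out a characteristic, hold the fitted model fixed, and record the drop in \(R^2\). Treating "removed" as "set to zero" and normalizing the drops to sum to one are both assumptions here; `predict` stands in for any fitted model.

```python
import numpy as np

def r2(r, fitted):
    return 1.0 - np.sum((r - fitted) ** 2) / np.sum(r ** 2)

def variable_importance(r, Z, predict):
    """Share of the total R^2 reduction attributable to each column of Z."""
    base = r2(r, predict(Z))
    drops = []
    for j in range(Z.shape[1]):
        Z0 = Z.copy()
        Z0[:, j] = 0.0                            # "remove" characteristic j
        drops.append(base - r2(r, predict(Z0)))
    drops = np.array(drops)
    return drops / drops.sum()                    # normalize to sum to one

rng = np.random.default_rng(0)
Z = rng.uniform(-1, 1, (100, 3))
coef = np.array([1.0, 0.0, 0.0])                  # toy model: only characteristic 0 matters
predict = lambda X: X @ coef
importance = variable_importance(Z @ coef, Z, predict)
```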

Robustness Check

  • Rerun on random subsamples
  • Still performs well

Monte-Carlo Simulations

  • Construct a dataset and test on it
  • Performs well

Conclusion 

  • New nonlinear conditional asset pricing model
  • Embeds economic restriction of no-arbitrage
  • Dominates other asset pricing models
    • Especially in predictive power

Analysis

  • Well written, with thorough robustness checks
  • I like that they note the strategies may not be implementable
  • Powerful framework for evaluating all characteristics
  • Personally biased against monthly studies
    • Monthly data require a long look-back window
    • The market was much different 50 years ago
  • Their model can handle daily data/characteristics
    • Would like to see a study with shorter return periods and higher-frequency characteristics