Temporal Relational Ranking for Stock Prediction

Feng et al.

Motivation

  • Most methods are either time-series or cross-sectional
  • Most are optimized for classification or regression targets

Objective

  • Tailor deep learning models for stock ranking
  • Capture stock relations in a time-sensitive manner

Results

  • New component: Temporal Graph Convolution (TGC)
  • Novel architecture: Relational Stock Ranking (RSR)
    • Combines LSTM with TGC
    • Jointly models the temporal evolution and relation network of stocks
  • Back-testing on NASDAQ and NYSE
    • Beats benchmarks and SFM model

Background

Ranking as a Target

  • Classification and regression may be suboptimal
  • Ranking directly selects the stocks with the highest expected return

Ranking as a Target

  • Method 2: \(\downarrow\) MSE, \(\downarrow\) profit
  • Method 1: picked the stock with the biggest change, \(\uparrow\) MSE, \(\uparrow\) profit
  • Lower MSE does not imply higher profit: what matters for trading is ranking the top stock correctly
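An illustrative example (our numbers, not the paper's): suppose the true next-day returns of two stocks are \(r_1=+5\%\) and \(r_2=+1\%\). Method 1 predicts \((+9\%,+2\%)\): large errors and high MSE, but it ranks stock 1 first, so buying the top pick earns \(+5\%\). Method 2 predicts \((+2\%,+3\%)\): smaller errors and lower MSE, but it ranks stock 2 first and earns only \(+1\%\).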

Stock Relations

  • Three types of relations:
    • Industry
    • First Order
    • Second Order
  • Relations are stated as (subject; predicate; object)

Stock Relations

  • Industry:
    • NASDAQ and NYSE classify each stock into a sector and industry
    • (GOOGL; Computer Software: Programming, Data Processing; FB)

Stock Relations

  • First and Second Order:
    • Data from Wikidata
    • Tens of millions of objects
    • Hundreds of millions of statements
      • (Alphabet, Inc.; founded by; Larry Page)

Stock Relations

  • First Order:
    • Company \(i\) has a first order relation with \(j\) if there is a statement with \(i\) as the subject and \(j\) as the object
    • (Citigroup Inc.; owned by; BlackRock Inc.)

Stock Relations

  • Second Order:
    • Company \(i\) has a second order relation with \(j\) if they have statements sharing the same object

Previous/Component Models

Long Short Term Memory (LSTM)

Graph-based Learning

  • Minimize objective function \(\Gamma=\Omega+\lambda\Phi\)
  • \(\Omega\): task specific loss
  • \(\Phi\): graph regularization term; smooths prediction over graph

\[\Phi=\sum_{i=1}^N\sum_{j=1}^N \text{strength of smoothness}_{ij}\cdot \text{smoothness}_{ij}\]

Graph-based Learning

\[\Phi=\sum_{i=1}^N\sum_{j=1}^N \text{strength of smoothness}_{ij}\cdot \text{smoothness}_{ij}\]

  • Strength of smoothness = \(g(x_i,x_j)\) = similarity between feature vectors
    • i.e., the edge weight

Graph-based Learning

\[\Phi=\sum_{i=1}^N\sum_{j=1}^N \text{strength of smoothness}_{ij}\cdot \text{smoothness}_{ij}\]

  • smoothness = \(||\frac{f(x_i)}{\sqrt{D_{ii}}} - \frac{f(x_j)}{\sqrt{D_{jj}}}||^2\)
  • \(D_{ii}\): degree of node \(i\)
  • Enforces that similar nodes have similar predictions
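A minimal numpy sketch of computing \(\Phi\) (our illustration, assuming an edge-weight matrix \(W\) with \(W_{ij}=g(x_i,x_j)\) and a vector of predictions):

```python
import numpy as np

def graph_regularizer(W, preds):
    """Graph regularization term Phi (our numpy sketch, not the authors' code).

    W     : (N, N) edge weights, W[i, j] = g(x_i, x_j)
    preds : (N,)   predictions f(x_i)
    """
    d = np.maximum(W.sum(axis=1), 1e-12)  # degrees D_ii (guard isolated nodes)
    scaled = preds / np.sqrt(d)           # f(x_i) / sqrt(D_ii)
    diff = scaled[:, None] - scaled[None, :]
    # sum_ij  strength_ij * smoothness_ij
    return float(np.sum(W * diff ** 2))
```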

Graph Convolutional Networks

  • Combine the smoothness assumption with CNNs
  • Convolution cannot be applied directly to the adjacency matrix (graphs lack a regular grid structure)
  • Solution: use spectral convolutions

Graph Convolutional Networks

\[f(F,X)=UFU^TX\]

  • \(f\): filtering operation of convolution
  • \(F\): diagonal matrix parameterizing convolution
  • \(U\): eigenvector matrix of graph Laplacian matrix \(L\)
  • \(L=U\Lambda U^T=D^{-1/2}(D-A)D^{-1/2}\)

Graph Convolutional Networks

\[f(F,X)=UFU^TX\]

  • Approximate the equation with Chebyshev polynomials of order \(k=1\) (shown sufficient in previous work)
  • Reduces to \(f(F,X)=AX\)
  • Elements of \(A\): \(A_{ij}=g(x_i,x_j)\)
  • Inject the convolution into a fully connected layer as \(A(XW+b)\)
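A PyTorch sketch of this simplified layer (our illustration; the class name is ours, not from the paper):

```python
import torch
import torch.nn as nn

class SimpleGCNLayer(nn.Module):
    """One graph-convolution layer in the simplified form A(XW + b)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)   # realizes XW + b

    def forward(self, A, X):
        # A: (N, N) similarity matrix with A_ij = g(x_i, x_j)
        # X: (N, in_dim) node features
        return A @ self.linear(X)                  # propagate over the graph
```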

New Model

Relational Stock Ranking

  • Input:
  • \(X^t=[x^{t-S+1},\dots,x^t]^T\in \R^{S\times D}\)
    • sequential input features of a specific stock
    • sequence length \(S\); \(D\) features
  • \(\mathcal X^t\in \R^{N\times S\times D}\)
    • collected sequential features of all \(N\) stocks
  • Target: \(\hat r^{t+1}=f(\mathcal X^t)\)
    • ranking-aware 1-day return

Relational Stock Ranking

  • Relations of two stocks:
    • \(K\) types of relations
    • Encode the pairwise relation between two stocks as a multi-hot binary vector \(a_{ij}\in \R^K\)
  • Relation of all stocks:
    • \(\mathcal A \in \R^{N\times N\times K}\)
    • \(i\)-th row, \(j\)-th column is \(a_{ij}\)
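A sketch of building \(\mathcal A\) (illustrative; the helper name and the symmetry assumption are ours):

```python
import numpy as np

def build_relation_tensor(N, K, relations):
    """Encode pairwise relations as a multi-hot tensor (illustrative sketch).

    relations: iterable of (i, j, k) meaning stocks i and j share relation type k.
    Returns A with shape (N, N, K); A[i, j] is the multi-hot vector a_ij.
    """
    A = np.zeros((N, N, K), dtype=np.float32)
    for i, j, k in relations:
        A[i, j, k] = 1.0
        A[j, i, k] = 1.0   # assuming relations are symmetric
    return A
```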

Relational Stock Ranking

  • Three Layers:
    • Sequential Embedding Layer
    • Relational Embedding Layer
    • Prediction Layer

Relational Stock Ranking

Sequential Embedding Layer

  • It is intuitive to first consider the historical status of each stock
  • So the first layer is a sequential embedding layer
  • LSTM is chosen since it captures long-term dependencies
    • e.g., the effect of interest rates

Sequential Embedding Layer

  • Take last hidden state \(h_i^t\)
  • Set it as sequential embedding \(e_i^t=h_i^t\)
  • \(E^t=LSTM(\mathcal X^t)=[e_1^t,\dots,e_N^t]^T\in \R^{N\times U}\)
  • \(U\): number of hidden units in LSTM
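A minimal PyTorch sketch of this layer (our illustration, not the authors' implementation):

```python
import torch
import torch.nn as nn

class SequentialEmbedding(nn.Module):
    """LSTM over each stock's feature window; the last hidden state
    is kept as the sequential embedding e_i^t (a minimal sketch)."""
    def __init__(self, D, U):
        super().__init__()
        self.lstm = nn.LSTM(input_size=D, hidden_size=U, batch_first=True)

    def forward(self, X):
        # X: (N, S, D) -- N stocks, S time steps, D features per step
        out, _ = self.lstm(X)      # out: (N, S, U)
        return out[:, -1, :]       # E^t: last hidden states, shape (N, U)
```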

Relational Embedding Layer

  • New neural network component: Temporal Graph Convolution (TGC)
  • Generates relational embeddings \(\bar E^t\in \R^{N\times U}\)
  • These are time-sensitive (dynamic)
    • Key technical contribution of this work
  • We will build up the model intuitively

Relational Embedding Layer

  1. Uniform Embedding Propagation
    • From link analysis

\[\overline{e_i^t} = \sum_{\{j\,|\,\text{sum}(a_{ji})>0\}} \frac{1}{d_j}\,e_j^t\]

  • Only stocks having at least one relation with stock \(i\) contribute
  • \(d_j\): number of stocks satisfying the condition (a normalization term)
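A numpy sketch of this propagation (our illustration; we read \(d_j\) as the number of stocks related to stock \(j\), the link-analysis-style normalization):

```python
import numpy as np

def uniform_propagation(E, A):
    """Uniform embedding propagation (a minimal numpy sketch).

    E: (N, U) sequential embeddings e_j^t
    A: (N, N, K) multi-hot relation tensor; A[j, i] holds a_ji
    """
    rel = A.sum(axis=2) > 0              # rel[j, i] iff sum(a_ji) > 0
    d = np.maximum(rel.sum(axis=1), 1)   # d_j: number of stocks related to j
    # each related stock j contributes e_j^t / d_j to e_i^t
    return (rel / d[:, None]).T @ E      # (N, U)
```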

Relational Embedding Layer

2. Weighted Embedding Propagation

  • Different relations may have different impacts

\[\overline{e_i^t} = \sum_{\{j\,|\,\text{sum}(a_{ji})>0\}} \frac{g(a_{ji})}{d_j}\,e_j^t\]

  • \(g\): relation-strength function
    • Aims to learn the impact strength of the relations in \(a_{ji}\)

Relational Embedding Layer

3. Time-aware Embedding Propagation

  • Relation-strength may evolve over time

\[\overline{e_i^t} = \sum_{\{j\,|\,\text{sum}(a_{ji})>0\}} \frac{g(a_{ji},e_i^t,e_j^t)}{d_j}\,e_j^t\]

  • Includes temporal and stock information

Relational Embedding Layer

  • Two designs of \(g\):
    • Explicit modeling
    • Implicit modeling

Relational Embedding Layer

  • Explicit modeling:
\[g(a_{ji},e_i^t,e_j^t)=\underbrace{{e_i^t}^T e_j^t}_{\text{similarity}}\times \underbrace{\phi(w^Ta_{ji}+b)}_{\text{relation importance}}\]

  • \(w\in\R^K,b\): model parameters to be learned
  • \(\phi\): leaky rectifier activation function

Relational Embedding Layer

  • Implicit modeling:
\[g(a_{ji},e_i^t,e_j^t)=\phi(w^T[{e_i^t}^T,{e_j^t}^T,a_{ji}^T]^T+b)\]
  • \(w\in\R^{2U+K},b\): learned
  • Learn both similarity and relation importance
  • \(g\) is then normalized through a softmax function
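A PyTorch sketch of both designs (our illustration based on the slide formulas; class and argument names are ours):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationStrength(nn.Module):
    """Relation-strength function g for TGC, in both designs above
    (our sketch; not the authors' implementation)."""
    def __init__(self, U, K, implicit=False):
        super().__init__()
        self.implicit = implicit
        in_dim = 2 * U + K if implicit else K
        self.w = nn.Linear(in_dim, 1)          # realizes w^T [.] + b

    def forward(self, e_i, e_j, a_ji):
        # e_i, e_j: (U,) sequential embeddings; a_ji: (K,) multi-hot relations
        if self.implicit:
            # implicit: g = phi(w^T [e_i; e_j; a_ji] + b)
            return F.leaky_relu(self.w(torch.cat([e_i, e_j, a_ji], dim=-1)))
        # explicit: g = (e_i . e_j) * phi(w^T a_ji + b)
        similarity = (e_i * e_j).sum(dim=-1, keepdim=True)
        return similarity * F.leaky_relu(self.w(a_ji))
```

The resulting scores over a stock's related neighbors would then be normalized with a softmax before the weighted sum, as noted above.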

Relational Embedding Layer

  • Connection with graph-based learning:
    • Embedding propagation at a fixed time \(t\) is equivalent to a GCN layer
    • Proof omitted
    • Unlike TGC, a GCN requires a fixed (static) adjacency matrix

Prediction Layer

  • Concatenate the sequential embeddings and the revised relational embeddings, then feed them into a fully connected layer to predict rank-aware returns

Prediction Layer

  • Proposed objective function:
\[l(\hat r^{t+1},r^{t+1})=||\hat r^{t+1}-r^{t+1}||^2+\alpha \sum_{i=0}^N\sum_{j=0}^N \max\{0,-(\hat r_i^{t+1}-\hat r_j^{t+1})(r_i^{t+1}-r_j^{t+1})\}\]
  • First term: SSE
    • punishes difference between ground-truth and prediction
  • Second term: pair-wise max-margin loss
    • Encourages the predicted ranking scores of each stock pair to have the same relative order as the ground truth
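A PyTorch sketch of this objective (our illustration; \(\alpha\) is the `alpha` argument):

```python
import torch

def rank_loss(r_hat, r, alpha=1.0):
    """Point-wise + pair-wise ranking loss (sketch of the objective above).

    r_hat, r: (N,) predicted and ground-truth 1-day returns
    """
    pointwise = torch.sum((r_hat - r) ** 2)              # SSE term
    # pair-wise max-margin: penalize pairs whose predicted order
    # disagrees with the ground-truth order
    pred_diff = r_hat.unsqueeze(1) - r_hat.unsqueeze(0)  # \hat r_i - \hat r_j
    true_diff = r.unsqueeze(1) - r.unsqueeze(0)          # r_i - r_j
    pairwise = torch.clamp(-pred_diff * true_diff, min=0).sum()
    return pointwise + alpha * pairwise
```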

Data

Sequential Data

  • Source: Google Finance
  • Historical price data from NASDAQ and NYSE
    • NASDAQ more volatile, NYSE more stable
  • Filter stocks:
    • Traded on at least 98% of trading days
    • Share price never fell below $5.00

Historical Price Data

  • \(p_i^t\): closing price of stock \(i\) on day \(t\)
    • Normalized by max price of stock \(i\)
  • Ground truth: \(r_i^{t+1}=(p_i^{t+1}-p_i^t)/p_i^t\)
  • Calculate 5-, 10-, 20-, and 30-day moving averages of returns as features
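An illustrative numpy sketch of this preprocessing for a single stock (our reading of the slides; the helper name and details are ours):

```python
import numpy as np

def build_features(prices):
    """Preprocessing sketch for one stock.

    prices: (T,) array of daily closing prices p^t
    """
    norm = prices / prices.max()                        # normalize by max price
    returns = (prices[1:] - prices[:-1]) / prices[:-1]  # ground truth r^{t+1}
    # 5-, 10-, 20-, 30-day moving averages of returns as extra features
    mas = {w: np.convolve(returns, np.ones(w) / w, mode="valid")
           for w in (5, 10, 20, 30)}
    return norm, returns, mas
```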

Stock Relation Data

  • Sector-industry relations: NASDAQ and NYSE
  • Company-based relations: Wikidata

Experiment

Three Metrics:

  • Mean Squared Error (MSE)
    • Want smaller value of MSE (\(\geq 0\))
  • Mean Reciprocal Rank (MRR)
    • Average over test days of the reciprocal of the selected stock's true rank
    • Want larger value of MRR (\([0,1]\))
  • Cumulative Investment Return Ratio (IRR)
    • Want larger value of IRR
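Illustrative helpers for the latter two metrics (our code; IRR accumulates the daily step over the test period):

```python
import numpy as np

def mrr_step(pred_scores, true_returns):
    """Reciprocal rank of the top-predicted stock under the true ranking;
    MRR averages this over test days."""
    top = int(np.argmax(pred_scores))
    true_rank = np.argsort(-true_returns).tolist().index(top) + 1
    return 1.0 / true_rank

def irr_step(pred_scores, true_returns, k=1):
    """One day's return of a TopK strategy: equal budget over the k
    highest-ranked stocks; cumulating these over days gives IRR."""
    topk = np.argsort(-pred_scores)[:k]
    return float(true_returns[topk].mean())
```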

Experimental Setting

  1. Market close on day \(t\): predict a ranking list and buy the highest ranked stock
  2. Market close on day \(t+1\): sell the stock

Experimental Setting

Assumptions:

  • Same amount of money spent every day
  • Market liquid enough to buy stock at the closing price on day \(t\)
  • Liquid enough to sell at closing price on day \(t+1\)
  • Transaction costs ignored

Methods

  • SFM: State Frequency Memory network (Fourier-based deep learning)
  • LSTM: standard LSTM with a regression target
  • Rank_LSTM: LSTM with the ranking target
  • GBR: Rank_LSTM plus a graph regularization term
  • GCN: Rank_LSTM plus static graph convolution
  • RSR_E: RSR with explicit relation modeling
  • RSR_I: RSR with implicit relation modeling

Methods

  • Hyperparameters tuned by grid search on a validation set
  • Adam optimizer, learning rate 0.001

Three Research Questions

  1. How does ranking compare to classification or regression? Can RSR outperform SOTA solutions?
  2. Do stock relations enhance the neural network-based solution? Does TGC outperform GBR or GCN?
  3. How does our proposed RSR solution perform under different back testing strategies?

Q1: Rank

  • Rank_LSTM outperforms SFM and LSTM in IRR
  • But it fails to beat them on all metrics
    • Possibly a tradeoff between accurately predicting values and predicting order
  • High variance across runs
    • Betting on a single stock is volatile
  • LSTM performs unexpectedly badly on the NYSE
    • Could do better with different hyperparameters, but tuning on the test set would be snooping

Q2: Relations: Industry

  • Industry relations more valuable on NYSE than NASDAQ
    • NYSE less volatile, industry relations are long-term
  • On NYSE, all relational methods outperform Rank_LSTM
  • On NYSE, RSR_E and RSR_I are top two methods
  • Performance across metrics again inconsistent
  • On NYSE, much of the gains were achieved on test days 206 and 209
    • Highlights the importance of accurately predicting both rank and return magnitude

Q2: Relations: Wiki

  • RSR_E and RSR_I achieve best IRR performance
    • Especially on NYSE

Q2: Relations: Sector-wise

  • Is performance sensitive to the sector?
  • Back-tested each sector separately on NASDAQ
  • Only the technology sector produced an acceptable IRR

Q2: Relations: Types of Wiki

  • Compared performance when each type of Wiki relation was removed
  • Top 5 most influential relation types:
    1. Product or material produced
    2. Member of
    3. Industry
    4. Part of
    5. Product or material produced; Industry

Q2: Brief Conclusion

  • Considering stock relations is helpful for stock ranking, especially in stable markets
  • The proposed TGC is a promising model for stock relations
  • It is important to choose relations appropriate for the target market

Q3: Strategy

  • Three strategies: Top1, Top5, Top10
  • Budget split equally among the selected stocks
  • Top1 > Top5 > Top10 in returns
    • If the ranking algorithm is good, concentrating the budget on the stock with the highest expected return should pay off most
  • RSR_I continues to do poorly on NASDAQ with industry relations

Q3: Strategy: Comparison to Benchmarks

  • Compare the strategy to the S&P 500 and DJIA indices
  • Also formulate two ideal (oracle) strategies:
    • Market: select the stocks with the highest return ratio from the whole market
    • Selected: select the stocks with the highest return ratio from the set of stocks traded by RSR_I

Conclusion 

  • Demonstrated potential of learning-to-rank methods
  • Proposed RSR framework with novel TGC component
  • Outperforms market indices and SOTA methods

Future Work 

  • Explore more advanced learning-to-rank techniques
  • Integrate risk management
  • Investigate other strategies
    • buy-hold-sell (long)
    • borrow-sell-buy (short)
  • Integrate alternative data
    • News/social media
  • Explore TGC in other fields

Analysis

  • The trading strategy seems unrealistic in practice
  • No reports of statistical significance
  • Unclear why prices were normalized when targets are in return space
  • Would like to see sector relations evaluated as well
  • Would like to see all types of relations used at once
  • Code and data are available on GitHub

Temporal Relational Ranking

By Connor Chapin
