\textbf{Definition 1 (Gene expression data).} The fundamental entity in gene expression profile data is the individual patient. Each patient profile comprises tens of thousands of genes with measured features.
Let
\[
X = \{x^{(m)}\}_{m=1}^M
\]
denote a dataset of $M$ patients. Each patient can be represented as a sequence of $N$ gene measurements:
\[
x^{(m)} = \{x^{(m)}_1, x^{(m)}_2, \dots, x^{(m)}_N\}.
\]
Let
\[
Y = \{y_1, y_2, \dots, y_{|Y|}\}
\]
denote the set of subtypes for a cancer. Each $x^{(m)}$ is associated with a label $y$.