NMDP Results

Vidhi Lalchand, Ph.D.

For IMU Biosciences

Data & Model 

~300 samples, Ratios [On local parents]

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    # Stratified train/validation split on the aGVHD grade 3-4 label
    stratify_labels = clean_response["agvhd34"].values
    indices = np.arange(len(clean_tv_num))
    train_indices, val_indices = train_test_split(
        indices, train_size=250, stratify=stratify_labels, random_state=24
    )
    y = clean_response["agvhd34"].values
    N = len(y)

    X_raw = clean_tv_num.values.astype(float)  # raw ratios

    # 1) Log-transform the ratios to reduce skewness / kurtosis
    X_log = np.log1p(X_raw)    # log(1 + x), safe for x >= 0

    # 2) Standardise each feature to zero mean / unit variance
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X_log)

Model Framework: Logistic Classification with Elastic net regularisation 

\min_{w,b}\;\; \mathcal{J}(w,b) = -\sum_{i=1}^{N} \left[ w_1\, y_i \log\!\big(\sigma(z_i)\big) + w_0\, (1-y_i)\log\!\big(1-\sigma(z_i)\big) \right] + \alpha\left[ \text{weight penalty} \right],
\text{weight penalty} = \rho\,\|w\|_1 + (1-\rho)\,\tfrac{1}{2}\|w\|_2^2, \qquad \rho = 0.5,
z_i = w^\top x_i + b, \qquad \sigma(z_{i}) = p_{i} = \frac{1}{1 + e^{-z_{i}}},
w_1 = \frac{N}{2N_1}, \qquad w_0 = \frac{N}{2N_0},

for a total of \(N\) samples with \(N_1\) positives and \(N_0\) negatives (balanced class weights).

Threshold \(p_{i}\) to classify: \(\hat{y}_{i} \in \{0,1\}\).
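
A minimal sketch of this framework in scikit-learn, assuming the `X_scaled`, `y`, and split indices from the preprocessing block; the `C` value is illustrative (not the tuned one), and `class_weight="balanced"` reproduces the \(w_1 = N/2N_1\), \(w_0 = N/2N_0\) weighting above:

    from sklearn.linear_model import LogisticRegression

    clf = LogisticRegression(
        penalty="elasticnet",
        solver="saga",             # the sklearn solver that supports elastic net
        l1_ratio=0.5,              # rho = 0.5, the L1/L2 mix in the penalty above
        C=1.0,                     # C = 1/alpha (inverse penalty strength); illustrative
        class_weight="balanced",   # w_k = N / (2 * N_k)
        max_iter=5000,
    )
    clf.fit(X_scaled[train_indices], y[train_indices])
    p_val = clf.predict_proba(X_scaled[val_indices])[:, 1]  # p_i on held-out samples
    y_hat = (p_val >= 0.5).astype(int)                      # threshold p_i to classify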


Results for Training on Raw Ratios (Tv5)

Respectable accuracy, but generalisation is unreliable.

Results for Training on Scaled Ratios (Tv5)

100% training accuracy; generalises under strong penalisation of false negatives, but can still produce false negatives. Scaling the ratios is better for generalisation.
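
One way to realise the "penalise false negatives" behaviour at prediction time is to lower the decision threshold. The sketch below is illustrative, not the exact protocol used; `min_recall=0.95` is an assumed target, and `p_val` / `y[val_indices]` follow the earlier snippet:

    import numpy as np
    from sklearn.metrics import recall_score

    def pick_threshold(y_true, p, min_recall=0.95):
        # Scan thresholds from 0.5 downwards; return the largest one whose
        # recall on positives meets the target (fewest FPs for that recall).
        for t in np.linspace(0.5, 0.05, 46):
            if recall_score(y_true, (p >= t).astype(int)) >= min_recall:
                return t
        return 0.05

    t_star = pick_threshold(y[val_indices], p_val)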

Alternative Framework: Small MLP w. dropout

Canonical MLP with ReLU non-linearities and a dropout probability of 0.4:

    import torch
    import torch.nn as nn

    class TinyMLP(nn.Module):
        def __init__(self, d_in, d_hidden=24, p=0.4):
            super().__init__()
            self.net = nn.Sequential(
                nn.LayerNorm(d_in),                    # normalise each input feature
                nn.Linear(d_in, d_hidden), nn.ReLU(), nn.Dropout(p),
                nn.Linear(d_hidden, d_hidden // 2), nn.ReLU(), nn.Dropout(p),
                nn.Linear(d_hidden // 2, 1)            # single logit output
            )

        def forward(self, x):
            return self.net(x).squeeze(-1)             # (batch,) logits
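
A minimal training-loop sketch for `TinyMLP`, assuming the `X_scaled`/`y`/split variables from earlier; the optimiser, epoch count, and `pos_weight` (which up-weights positives and so penalises false negatives) are illustrative choices, not the reported configuration:

    X_t = torch.tensor(X_scaled[train_indices], dtype=torch.float32)
    y_t = torch.tensor(y[train_indices], dtype=torch.float32)

    model = TinyMLP(d_in=X_t.shape[1])
    loss_fn = nn.BCEWithLogitsLoss(pos_weight=torch.tensor(3.0))  # assumed weight
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    model.train()
    for epoch in range(200):                # illustrative epoch count
        opt.zero_grad()
        loss = loss_fn(model(X_t), y_t)     # dropout active in train mode
        loss.backward()
        opt.step()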

Results for Training with Small MLP w. aggressive dropout

100% training accuracy; generalises under strong penalisation of false negatives, but is over-cautious.
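
To see the over-cautious behaviour concretely, a small evaluation sketch (variable names follow the earlier snippets) inspects the validation confusion matrix:

    from sklearn.metrics import confusion_matrix

    model.eval()                            # disable dropout for evaluation
    with torch.no_grad():
        X_v = torch.tensor(X_scaled[val_indices], dtype=torch.float32)
        p_val = torch.sigmoid(model(X_v)).numpy()
    tn, fp, fn, tp = confusion_matrix(
        y[val_indices], (p_val >= 0.5).astype(int)
    ).ravel()
    print(f"TN={tn} FP={fp} FN={fn} TP={tp}")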

NMDP Final Results

By Vidhi Lalchand