Where:
\(TP = \sum^T_{t=1}\ell_t\hat{\ell}_t\),
\(FP=\sum^T_{t=1}(1-\ell_t)\hat{\ell}_t\),
\(FN=\sum^T_{t=1}\ell_t(1-\hat{\ell}_t)\),
\(\ell_t\): true label for pair \(t\),
\(\hat{\ell}_t\): estimated label for pair \(t\).
Where:
\(\alpha = 0\): recall,
\(\alpha = 1\): precision.
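A minimal numeric check of the two endpoints, using the \(F_{\alpha,T}\) closed form from this section (the helper name `f_alpha` and the counts are mine):

```python
def f_alpha(tp, fp, fn, alpha):
    """F_alpha = TP / (alpha*(TP+FP) + (1-alpha)*(TP+FN))."""
    return tp / (alpha * (tp + fp) + (1 - alpha) * (tp + fn))

tp, fp, fn = 8, 2, 4
precision = tp / (tp + fp)   # 0.8
recall = tp / (tp + fn)      # 0.666...

# alpha = 1 recovers precision; alpha = 0 recovers recall.
assert f_alpha(tp, fp, fn, alpha=1.0) == precision
assert f_alpha(tp, fp, fn, alpha=0.0) == recall
```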
To estimate \(F_\alpha\), a natural approach is to sample pairs and ask the Oracle to label them.
Wait...
Since the original distribution is imbalanced, uniformly sampling doesn't work here because of how we calculate \(F_\alpha\) (Why?).
True negatives (\(\ell_t = \hat{\ell}_t = 0\)) don't contribute to the F score at all, and under imbalance they dominate a uniform sample!
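A toy illustration of the problem (the 0.1% positive rate is an invented figure):

```python
import random

random.seed(0)
# Heavily imbalanced ground truth: roughly 0.1% of pairs are true matches.
labels = [1 if random.random() < 0.001 else 0 for _ in range(100_000)]

# A uniform sample of 100 pairs is almost surely all true negatives,
# i.e. pairs that appear nowhere in the F-score formula.
sample = random.sample(labels, 100)
print(sum(sample))  # almost always 0: the sample tells us nothing about F
```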
So, why not use bias against bias?
Instead of sampling directly on \(p(x)\) to estimate \(\theta = E[f(X)] \),
i.e. \(\hat{\theta} = \frac{1}{T}\sum^T_{i=1}f(x_i)\),
we draw samples from \(q(x)\).
Interestingly, we can still estimate \(\theta\) by \(\hat{\theta}^{IS} = \frac{1}{T}\sum^T_{i=1}\frac{p(x_i)}{q(x_i)}f(x_i)\), as long as \(q(x) > 0\) wherever \(p(x)f(x) \neq 0\).
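A toy sketch on a three-point distribution, where \(f\) is nonzero only on a rare outcome (all numbers invented):

```python
import random

random.seed(1)
xs = [0, 1, 2]
p = {0: 0.90, 1: 0.09, 2: 0.01}        # target distribution
f = {0: 0.0, 1: 0.0, 2: 100.0}         # all the value sits on the rare x = 2
theta = sum(p[x] * f[x] for x in xs)   # exact E_p[f(X)] = 1.0

# Proposal q that oversamples the rare-but-important outcome.
q = {0: 0.40, 1: 0.30, 2: 0.30}
T = 10_000
draws = random.choices(xs, weights=[q[x] for x in xs], k=T)

# Importance sampling: reweight each draw by p(x)/q(x).
theta_is = sum(p[x] / q[x] * f[x] for x in draws) / T
print(theta, theta_is)  # the IS estimate lands close to the exact 1.0
```

Sampling directly from \(p\) with \(T=10{,}000\) would see \(x=2\) only ~100 times; the proposal \(q\) sees it ~3,000 times, and the weights undo the bias.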
Rewrite \(\textit{F}_{\alpha,T} = \frac{TP}{\alpha(TP+FP) + (1-\alpha)(TP+FN)}\)
to \(\textit{F}_{\alpha}^{AIS} = \frac{\sum^T_{t=1} w_t \ell_t \hat{\ell}_t}{\alpha \sum^T_{t=1}w_t\hat{\ell}_t + (1-\alpha)\sum^T_{t=1}w_t \ell_t}\)
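A direct transcription of \(\hat{F}_\alpha^{AIS}\) (the function name is mine); with all weights equal to 1 it reduces to the plain \(F_{\alpha,T}\):

```python
def f_alpha_ais(w, true_l, est_l, alpha):
    """Weighted F estimator; w[t] = p(z_t)/q(z_t) for each sampled pair t."""
    num = sum(wt * lt * ht for wt, lt, ht in zip(w, true_l, est_l))
    den = (alpha * sum(wt * ht for wt, ht in zip(w, est_l))
           + (1 - alpha) * sum(wt * lt for wt, lt in zip(w, true_l)))
    return num / den

# Unit weights: TP = 2, FP = 1, FN = 1, so F_0.5 = 2 / (0.5*3 + 0.5*3) = 2/3.
print(f_alpha_ais([1, 1, 1, 1], [1, 1, 0, 1], [1, 0, 1, 1], alpha=0.5))
```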
Find \(q(z_t)\), how?
Note:
\(q^* \in \arg \min_q Var(\hat{F}_\alpha^{AIS}[q])\)
Previous work: \(q^*(z) \propto p(z) \cdot A(F_\alpha,p_{Oracle}(1|z_t))\)
\(w_t = \frac{p(z_t)}{q(z_t)}\)
Note: \(A(\cdot)\) abbreviates a longer expression whose details are omitted here.
Once we have \(F_\alpha\) and \(p_{Oracle}(1|z_t)\), the problem is solved.
\(q(z) = \epsilon \cdot p(z) + (1-\epsilon) \cdot q^*(z)\).
This is the epsilon-greedy idea from multi-armed bandits: keep a fraction \(\epsilon\) of mass on \(p\) so every pair retains a chance of being sampled.
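A sketch of the mixture on a two-point toy space (the numbers and the hypothetical \(q^*\) are invented). Since \(q(z) \ge \epsilon\, p(z)\), every weight \(w = p/q\) stays bounded by \(1/\epsilon\):

```python
def mix_proposal(p, q_star, eps):
    """q(z) = eps * p(z) + (1 - eps) * q_star(z): keep a floor of exploration."""
    return {z: eps * p[z] + (1 - eps) * q_star[z] for z in p}

p      = {"a": 0.98, "b": 0.02}  # original (imbalanced) distribution
q_star = {"a": 0.10, "b": 0.90}  # hypothetical variance-minimizing proposal
q = mix_proposal(p, q_star, eps=0.2)
print(q)  # approximately {'a': 0.276, 'b': 0.724}

# q(z) >= eps * p(z), so the weights p(z)/q(z) are capped at 1/eps = 5.
assert all(q[z] >= 0.2 * p[z] for z in p)
```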
\(F_\alpha\) and \(p(1|z)\) - unknown!
Approximate them iteratively: at step \(t+1\), plug in the estimates of \(F_\alpha\) and \(p(1|z)\) from step \(t\).
Intuitively, we can just use \(\hat{F}^{AIS}_\alpha\) instead of \(F_\alpha\).
There's no way to get the oracle's distribution without querying it for every \(z\).
But...
Using stratification to approximate it is feasible.
Stratification is a statistical technique that estimates a quantity by partitioning samples into several strata, a.k.a. bins.
This is where the similarity function \(s: \mathcal{Z} \to \mathbb{R}\) kicks in: pairs with similar scores are grouped into the same stratum \(P_k\).
For each \(z\) in \(P_k\), we can use \(p(1|P_k)\) instead of \(p(1|z)\) now.
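A minimal sketch of the bucketing step, assuming scores \(s(z)\) in \([0,1]\) and \(K\) equal-width strata (both assumptions are mine; real systems may pick bin edges more carefully):

```python
import bisect

scores = [0.05, 0.12, 0.40, 0.43, 0.77, 0.91]  # toy similarity scores s(z)
K = 4
edges = [k / K for k in range(1, K)]           # [0.25, 0.5, 0.75]

# Stratum index for each z; all pairs in a bin share the estimate p(1 | P_k).
strata = [bisect.bisect_right(edges, s) for s in scores]
print(strata)  # [0, 0, 1, 1, 3, 3]
```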
We model the labels in each stratum \(P_k\) as \(Bernoulli(\pi_k)\) draws, and place a conjugate prior \(\pi_k \sim Beta(\alpha, \beta)\) on \(\pi_k\).
How to update \(\alpha\) and \(\beta\) iteratively? Easy: each time the oracle labels a pair drawn from stratum \(k\),
if \(\ell_t = 1\), \(\alpha += 1\)
if \(\ell_t = 0\), \(\beta += 1\)
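The update above, per stratum, starting from a uniform \(Beta(1,1)\) prior (the prior choice is my assumption):

```python
K = 3
alpha = [1.0] * K   # per-stratum Beta parameters
beta = [1.0] * K

def update(k, label):
    """Fold in one oracle label for a pair drawn from stratum k."""
    if label == 1:
        alpha[k] += 1
    else:
        beta[k] += 1

def pi_hat(k):
    """Posterior-mean estimate of pi_k = p(1 | P_k)."""
    return alpha[k] / (alpha[k] + beta[k])

for lab in [1, 1, 0]:   # three oracle answers for stratum 0
    update(0, lab)
print(pi_hat(0))        # (1+2) / (1+2 + 1+1) = 0.6
```

Conjugacy is what makes the update this cheap: the posterior after each label is again a Beta, so two counters per stratum suffice.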