A classification problem:
\(\hat{y}^{(0)}_i = 0\)
\(\hat{y}^{(1)}_i = f_1(x_i) = \hat{y}^{(0)}_i + f_1(x_i) \)
\(\hat{y}^{(2)}_i = f_1(x_i) + f_2(x_i)= \hat{y}^{(1)}_i + f_2(x_i) \)
...
\(\hat{y}^{(t)}_i = \sum^t_{k=1} f_k(x_i) = \hat{y}^{(t-1)}_i + f_t(x_i) \)
\(Obj^{(t)} = \sum^n_{i=1}l(y_i,\hat{y}_i^{(t)}) + \sum^t_{k=1}\Omega(f_k)\)
\(\phantom{Obj^{(t)}} = \sum^n_{i=1}l(y_i,\hat{y}_i^{(t-1)} + f_t(x_i)) + \Omega(f_t) + \text{const}\)
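The additive recursion can be sketched in a few lines of Python. This is only an illustration: `f1` and `f2` are hypothetical step functions standing in for fitted trees, and `additive_predictions` is a name invented here.

```python
import numpy as np

def additive_predictions(trees, X):
    """Return [y_hat^(0), ..., y_hat^(t)], each built from the previous one."""
    y_hat = np.zeros(len(X))              # y_hat^(0) = 0
    history = [y_hat.copy()]
    for f in trees:
        y_hat = y_hat + f(X)              # y_hat^(k) = y_hat^(k-1) + f_k(x)
        history.append(y_hat.copy())
    return history

# Hypothetical stand-ins for fitted trees (assumption, not from the source).
f1 = lambda X: np.where(X < 0.5, 1.0, -1.0)
f2 = lambda X: np.where(X < 0.2, 0.5, 0.0)

X = np.array([0.1, 0.3, 0.9])
history = additive_predictions([f1, f2], X)
```

Because each round only adds \(f_t\) to the previous prediction, the earlier trees are fixed at round \(t\), which is why their regularization terms collapse into the constant.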
\( \Omega = 3 \gamma + \frac{1}{2} \lambda (4 + 0.01 + 1) \)
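As a quick check of this arithmetic, the sketch below evaluates \(\Omega = \gamma T + \frac{1}{2}\lambda \sum_j w_j^2\) for a tree with three leaves. The leaf weights 2, 0.1 and -1 are an assumption chosen so that their squares are 4, 0.01 and 1. Only the squares appear above, so the signs are illustrative.

```python
def omega(leaf_weights, gamma, lam):
    """Complexity penalty: gamma per leaf plus an L2 penalty on leaf weights."""
    T = len(leaf_weights)
    return gamma * T + 0.5 * lam * sum(w * w for w in leaf_weights)

# Assumed leaf weights; their squares are 4, 0.01 and 1.
weights = [2.0, 0.1, -1.0]

# With gamma = 1 and lambda = 1 this is 3 + 0.5 * 5.01.
value = omega(weights, gamma=1.0, lam=1.0)
```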
the score of the left child
the score of the right child
the score if we do not split
the complexity cost of introducing an additional leaf
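These four terms can be combined into a split gain. The sketch below assumes they refer to the usual structure-score formula \(Gain = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma\), where \(G\) and \(H\) are sums of first- and second-order gradients over the rows in a node. The function names are illustrative.

```python
def leaf_score(G, H, lam):
    """Structure score of one leaf holding gradient sum G and hessian sum H."""
    return G * G / (H + lam)

def split_gain(G_L, H_L, G_R, H_R, lam, gamma):
    left = leaf_score(G_L, H_L, lam)                    # score of the left child
    right = leaf_score(G_R, H_R, lam)                   # score of the right child
    no_split = leaf_score(G_L + G_R, H_L + H_R, lam)    # score if we do not split
    return 0.5 * (left + right - no_split) - gamma      # minus cost of the extra leaf

gain = split_gain(G_L=-4.0, H_L=2.0, G_R=4.0, H_R=2.0, lam=1.0, gamma=0.5)
```

A negative gain means the split does not pay for the additional leaf, so under this formula it would not be worth making.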
Solution: Skip missing values and assign them a default direction at each split
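One way to picture the default direction is sketched below for a single feature and a single fixed threshold: rows with a missing value are left out of the left/right partition, then added to each side in turn, and the side with the higher gain becomes the default. This is a simplified illustration under the standard gain formula, not the library's actual implementation, and `best_default_direction` is a hypothetical helper.

```python
import numpy as np

def gain(G_L, H_L, G_R, H_R, lam=1.0, gamma=0.0):
    score = lambda G, H: G * G / (H + lam)
    return 0.5 * (score(G_L, H_L) + score(G_R, H_R)
                  - score(G_L + G_R, H_L + H_R)) - gamma

def best_default_direction(x, g, h, threshold):
    missing = np.isnan(x)
    # Replace NaN by +inf so missing rows never count as "below the threshold".
    below = np.where(missing, np.inf, x) < threshold
    left, right = below, ~missing & ~below
    G_L, H_L = g[left].sum(), h[left].sum()
    G_R, H_R = g[right].sum(), h[right].sum()
    G_M, H_M = g[missing].sum(), h[missing].sum()
    gain_left = gain(G_L + G_M, H_L + H_M, G_R, H_R)    # missing rows go left
    gain_right = gain(G_L, H_L, G_R + G_M, H_R + H_M)   # missing rows go right
    if gain_left >= gain_right:
        return "left", gain_left
    return "right", gain_right

# Toy data (assumption): two rows have a missing feature value.
x = np.array([0.1, 0.2, np.nan, 0.8, np.nan])
g = np.array([-1.0, -1.0, -1.0, 1.0, -1.0])
h = np.ones(5)
direction, value = best_default_direction(x, g, h, threshold=0.5)
```

Here the missing rows have gradients similar to the left-hand rows, so sending them left gives the larger gain.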
Sorting data in each step of split finding.
Precomputing a data structure that is sorted on each column
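The contrast between the two approaches can be sketched as follows: the row order of every column is computed once, and each later split search reuses it with a single linear scan over prefix sums instead of sorting again. The helper names and the toy data are assumptions for illustration, and the gain uses the standard structure-score formula with \(\gamma = 0\).

```python
import numpy as np

def presort_columns(X):
    """Sort row indices once per column; reused by every later split search."""
    return np.argsort(X, axis=0, kind="stable")

def best_split_on_column(X, g, h, order, col, lam=1.0):
    idx = order[:, col]                    # rows in ascending order of this feature
    x_sorted = X[idx, col]
    G_prefix, H_prefix = np.cumsum(g[idx]), np.cumsum(h[idx])
    G, H = G_prefix[-1], H_prefix[-1]
    best_gain, best_threshold = 0.0, None
    for i in range(len(idx) - 1):
        if x_sorted[i] == x_sorted[i + 1]:
            continue                       # cannot split between equal values
        G_L, H_L = G_prefix[i], H_prefix[i]
        G_R, H_R = G - G_L, H - H_L
        gain = 0.5 * (G_L**2 / (H_L + lam) + G_R**2 / (H_R + lam)
                      - G**2 / (H + lam))
        if gain > best_gain:
            best_gain = gain
            best_threshold = 0.5 * (x_sorted[i] + x_sorted[i + 1])
    return best_threshold, best_gain

X = np.array([[0.9, 1.0], [0.1, 3.0], [0.5, 2.0], [0.3, 0.0]])
g = np.array([1.0, -1.0, 1.0, -1.0])
h = np.ones(4)
order = presort_columns(X)                 # computed once, before any split search
threshold, gain = best_split_on_column(X, g, h, order, col=0)
```

The sort costs O(n log n) per column once, after which each scan over a column is linear in the number of rows.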