ILLUSTRIS TNG

C. Cuesta, C. Becker,

S. Bose, C. Arnold and C. Baugh

Galaxy-Halo Connection

Hydro simulations: Nelson et al. (2018)

Empirical models: Zentner, Hearin, van den Bosch (2013)

1) Find the 50 most bound DM particles of a Full Physics halo (e.g. Halo #1)

2) Find the Dark Matter Only (DMO) halo with at least 50% of these particles (e.g. Halo #23)

Repeating the match from DMO to Full Physics recovers the original halo (Halo #23 → Halo #1): the matching is bijective!

97% of halos matched

Bose et al. (2019)
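A minimal sketch of the two-step matching, assuming that DM particle IDs label the same particles in the Full Physics and DMO runs, and that halo membership is available as a dict from particle ID to halo index (`particle_to_halo` is a hypothetical input format). Running the match in both directions and keeping only the pairs that agree gives the bijective set.

```python
import numpy as np


def match_haloes(most_bound_ids, particle_to_halo, min_frac=0.5):
    """For each source halo, return the index of the target halo that holds at
    least `min_frac` of its most-bound DM particles, or -1 if there is none."""
    matches = np.full(len(most_bound_ids), -1, dtype=int)
    for i, ids in enumerate(most_bound_ids):
        # Host halo of each particle in the *other* run (-1 = not in any halo)
        hosts = np.array([particle_to_halo.get(pid, -1) for pid in ids])
        hosts = hosts[hosts >= 0]
        if hosts.size == 0:
            continue
        candidates, counts = np.unique(hosts, return_counts=True)
        best = np.argmax(counts)
        if counts[best] >= min_frac * len(ids):
            matches[i] = candidates[best]
    return matches


def bijective_pairs(forward, backward):
    """Keep only (hydro, dmo) pairs that match in both directions."""
    return [(i, int(j)) for i, j in enumerate(forward) if j >= 0 and backward[j] == i]


# Toy example: 4 most-bound particles per halo instead of 50
hydro_most_bound = [[1, 2, 3, 4], [10, 11, 12, 13]]
dmo_most_bound = [[10, 11, 12, 4], [1, 2, 3, 13]]
dmo_membership = {10: 0, 11: 0, 12: 0, 4: 0, 1: 1, 2: 1, 3: 1, 13: 1}
hydro_membership = {1: 0, 2: 0, 3: 0, 4: 0, 10: 1, 11: 1, 12: 1, 13: 1}

forward = match_haloes(hydro_most_bound, dmo_membership)    # hydro -> DMO
backward = match_haloes(dmo_most_bound, hydro_membership)   # DMO -> hydro
pairs = bijective_pairs(forward, backward)
```

Here hydro halo 0 matches DMO halo 1 and vice versa, so both pairs survive the bijectivity check.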

Clustering as a function of halo mass: HOD

\sigma_{\log M}

M_{\mathrm{min}}

[Figure: r² ξ(r) versus r [Mpc/h], Full Physics vs Dark Matter Only]
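The two HOD parameters usually enter through the commonly used error-function form for the mean number of centrals, ⟨N_c|M⟩ = ½ [1 + erf((log M − log M_min)/σ_log M)]. A minimal sketch, assuming that form and masses in log10 units; the parameter values in the usage lines are placeholders, not fits to Illustris TNG.

```python
import numpy as np
from scipy.special import erf


def mean_ncen(log_mass, log_mmin, sigma_logm):
    """Mean number of centrals in haloes of mass 10**log_mass."""
    return 0.5 * (1.0 + erf((log_mass - log_mmin) / sigma_logm))


def populate_centrals(log_mass, log_mmin, sigma_logm, rng):
    """Draw a Bernoulli central (True/False) for each halo."""
    p = mean_ncen(log_mass, log_mmin, sigma_logm)
    return rng.random(np.shape(log_mass)) < p


rng = np.random.default_rng(1)
log_mass = np.array([11.0, 12.0, 13.0])
n_mean = mean_ncen(log_mass, log_mmin=12.0, sigma_logm=0.3)
has_central = populate_centrals(log_mass, 12.0, 0.3, rng)
```

Because halo mass is the only input, two haloes of equal mass always get the same occupation probability, whatever their other properties.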

Shape: Mass, Radius, Concentration, ...

Dynamics: Velocity dispersion, Vmax, Velocity anisotropy, Spin, ...

Temporal Evolution: Formation time, Nmergers, ...

Environment: Mass in torus around halo


N_c = f(\mathbf{x}) = \int_{M_{\star,\mathrm{min}}}^{\infty} \phi_c(M_\star \mid \mathbf{x}) \, dM_\star, \qquad \mathbf{x} = \text{halo properties}

Learn from Illustris
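As a worked example of the integral for N_c, assume (hypothetically) that φ_c(M⋆|x) is a Gaussian in y = log10 M⋆, centred on a model prediction μ(x) with scatter σ. N_c is then the probability that the halo hosts a central above the stellar mass threshold, and numerical integration agrees with the closed-form Gaussian survival function.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm


def n_central(mu_log_mstar, sigma, log_mstar_min):
    """N_c for one halo: integrate a Gaussian-in-log10(M*) conditional
    stellar mass function (hypothetical phi_c) above the threshold."""
    phi_c = lambda y: norm.pdf(y, loc=mu_log_mstar, scale=sigma)
    # Integrate in y = log10 M*; the tail beyond +10 sigma is negligible
    upper = max(mu_log_mstar, log_mstar_min) + 10.0 * sigma
    value, _ = quad(phi_c, log_mstar_min, upper)
    return value


def n_central_closed(mu_log_mstar, sigma, log_mstar_min):
    """Same integral in closed form: the Gaussian survival function."""
    return norm.sf(log_mstar_min, loc=mu_log_mstar, scale=sigma)


# mu would be the model's predicted log10 M* for the halo properties x
nc_numeric = n_central(10.4, 0.2, 10.5)
nc_closed = n_central_closed(10.4, 0.2, 10.5)
```

With a threshold half a sigma above the predicted mean, N_c is about 0.31.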

Decision Trees

Be greedy: try all splits

S = \{(\mathbf{x}_1, y_1), \dots, (\mathbf{x}_n, y_n)\}

Split question: x_k < t ? YES → S_L, NO → S_R

How good is a given split?

Loss function: Mean Squared Error

\mathrm{MSE}(S) - \left( \frac{|S_L|}{|S|}\mathrm{MSE}(S_L) + \frac{|S_R|}{|S|}\mathrm{MSE}(S_R) \right)

More generally, for any impurity measure IMP:

\mathrm{IMP}(S) - \left( \frac{|S_L|}{|S|}\mathrm{IMP}(S_L) + \frac{|S_R|}{|S|}\mathrm{IMP}(S_R) \right)

Regularisation: Maximum depth
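The greedy search can be sketched in a few lines of NumPy: for every feature k and every candidate threshold t (here the midpoints between sorted distinct values, an assumption), compute the MSE gain and keep the best. This is a toy sketch of a single node, not the scikit-learn or LightGBM implementation; LightGBM, for instance, scans histogram bins instead of every value.

```python
import numpy as np


def mse(y):
    """Mean squared error around the mean (the node impurity)."""
    return float(np.mean((y - y.mean()) ** 2)) if len(y) else 0.0


def split_gain(y, go_left):
    """MSE(S) - (|S_L|/|S| MSE(S_L) + |S_R|/|S| MSE(S_R))."""
    y_left, y_right = y[go_left], y[~go_left]
    n = len(y)
    return mse(y) - (len(y_left) / n * mse(y_left) + len(y_right) / n * mse(y_right))


def best_split(X, y):
    """Greedy search: try every feature k and threshold t, keep the largest gain."""
    best_k, best_t, best_gain = None, None, -np.inf
    for k in range(X.shape[1]):
        values = np.unique(X[:, k])
        for t in (values[:-1] + values[1:]) / 2:   # midpoints between distinct values
            gain = split_gain(y, X[:, k] < t)
            if gain > best_gain:
                best_k, best_t, best_gain = k, t, gain
    return best_k, best_t, best_gain


X = np.array([[0.0, 5.0], [1.0, 1.0], [2.0, 4.0], [3.0, 2.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])
k, t, gain = best_split(X, y)
```

In the toy data, y = 1 exactly when the second feature exceeds 3, so the search picks k = 1, t = 3.0 and removes all of the parent's MSE (gain 0.25). A full tree applies this recursively until the maximum depth is reached.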

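Decision Tree Regression

[Figure: decision tree regression, predictions compared with the data and target; error versus tree depth, from underfitting (shallow trees) through the optimal depth to overfitting (deep trees), with boosting acting against underfitting and bagging against overfitting]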

Credit: https://scikit-learn.org/stable/modules/tree.html
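The depth sweep in the figure can be reproduced on toy data with scikit-learn's `DecisionTreeRegressor`. The noisy sine curve is chosen here for illustration and is not Illustris data: training error keeps falling with depth, while test error is minimised at an intermediate depth.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = 5.0 * rng.random((400, 1))
y = np.sin(X).ravel() + 0.2 * rng.standard_normal(400)
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

depths = list(range(1, 13))
train_err, test_err = [], []
for depth in depths:
    # max_depth is the regulariser: small = underfitting, large = overfitting
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_train, y_train)
    train_err.append(mean_squared_error(y_train, tree.predict(X_train)))
    test_err.append(mean_squared_error(y_test, tree.predict(X_test)))

best_depth = depths[int(np.argmin(test_err))]
```

Deep trees end up fitting the noise in the training set, which is what bagging and boosting are used to control.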

Fight overfitting: Bagging

Bootstrap 1 → Decision Tree 1

Bootstrap 2 → Decision Tree 2

Bootstrap 3 → Decision Tree 3

Average the predictions of all trees

Extras!

Out-of-bag errors (no need for a separate validation set)

Can do the same with features, i.e. random feature subsets (mitigates greediness)
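A minimal hand-written sketch of bagging with out-of-bag (OOB) error, again on a toy noisy sine rather than Illustris data. Each tree sees a bootstrap resample, and the rows it never saw act as its validation set, so the OOB error estimates test error without holding data out. scikit-learn exposes the same idea through `BaggingRegressor(oob_score=True)` and `RandomForestRegressor(oob_score=True)`.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor


def fit_bagged_trees(X, y, n_trees=100, seed=0):
    """Fit fully grown trees on bootstrap resamples; remember who was in each bag."""
    rng = np.random.default_rng(seed)
    n = len(y)
    trees, in_bag = [], np.zeros((n_trees, n), dtype=bool)
    for b in range(n_trees):
        idx = rng.integers(0, n, size=n)          # sample n rows with replacement
        trees.append(DecisionTreeRegressor(random_state=b).fit(X[idx], y[idx]))
        in_bag[b, idx] = True
    return trees, in_bag


def bagged_predict(trees, X):
    """Average the predictions of all trees."""
    return np.mean([t.predict(X) for t in trees], axis=0)


def oob_mse(trees, in_bag, X, y):
    """Score each sample only with the trees that never saw it."""
    preds = np.array([t.predict(X) for t in trees])   # shape (n_trees, n)
    oob = ~in_bag
    n_oob = oob.sum(axis=0)
    ok = n_oob > 0
    oob_pred = (preds * oob).sum(axis=0)[ok] / n_oob[ok]
    return float(np.mean((oob_pred - y[ok]) ** 2))


rng = np.random.default_rng(0)
X = 5.0 * rng.random((400, 1))
y = np.sin(X).ravel() + 0.2 * rng.standard_normal(400)
X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]

trees, in_bag = fit_bagged_trees(X_train, y_train)
single = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
single_test = mean_squared_error(y_test, single.predict(X_test))
bagged_test = mean_squared_error(y_test, bagged_predict(trees, X_test))
oob = oob_mse(trees, in_bag, X_train, y_train)
```

Averaging many overfitted trees reduces variance, so the bagged test error falls below that of a single fully grown tree.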

Fight bias: Boosting

Focus on difficult samples: Gradient descent in function space!

Credit: https://towardsdatascience.com/ensemble-methods-bagging-boosting-and-stacking-c9214a10a205

y_l = y_{l-1} - c_l \nabla_{y_{l-1}} E(y_{l-1})

where y_{l-1} is the previous prediction, y_l the new prediction, E the error and c_l the step size.
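For the squared error E = ½ Σ (y_pred − y)², the negative gradient with respect to the predictions is just the residual, so the update rule becomes: fit a shallow tree to the residuals and add a fraction of it. A minimal sketch under that assumption, with a constant step size standing in for c_l and a toy noisy sine as data:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def boost(X, y, n_rounds=100, step=0.1, max_depth=2):
    """Gradient boosting for E = 0.5 * sum((y_pred - y)**2).

    Each round fits a shallow tree to -dE/dy_pred = y - y_pred (the residuals)
    and takes a step of size `step` (the c_l of the update rule, kept constant)."""
    f0 = float(np.mean(y))
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_rounds):
        neg_grad = y - pred
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0).fit(X, neg_grad)
        pred = pred + step * tree.predict(X)
        trees.append(tree)
    return f0, step, trees


def boost_predict(model, X):
    f0, step, trees = model
    return f0 + step * np.sum([t.predict(X) for t in trees], axis=0)


rng = np.random.default_rng(0)
X = 5.0 * rng.random((300, 1))
y = np.sin(X).ravel() + 0.2 * rng.standard_normal(300)

model = boost(X, y)
boosted_mse = float(np.mean((boost_predict(model, X) - y) ** 2))
shallow = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
single_mse = float(np.mean((shallow.predict(X) - y) ** 2))
```

A single depth-2 tree is strongly biased; summing many of them, each correcting the residuals of the previous ones, removes most of that bias.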

Bagging + Boosting = Light Gradient Boosting Machine (LightGBM, or LGBM)
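LightGBM itself is a separate library. As a dependency-light stand-in (a swapped-in estimator, not LightGBM), scikit-learn's `GradientBoostingRegressor` combines the two ideas in a similar way: boosting over shallow trees, with `subsample < 1` giving each tree a random fraction of the rows and `max_features` subsampling the features at each split. In LightGBM the corresponding parameters are `bagging_fraction`/`bagging_freq` and `feature_fraction`. The data and hyperparameter values below are illustrative only, not those used for the TNG model.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 5))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.standard_normal(2000)
X_train, y_train, X_test, y_test = X[:1500], y[:1500], X[1500:], y[1500:]

model = GradientBoostingRegressor(
    n_estimators=300,
    learning_rate=0.05,
    max_depth=3,
    subsample=0.7,      # each tree sees a random 70% of the rows (bagging-like)
    max_features=0.8,   # each split considers a random 80% of the features
    random_state=0,
).fit(X_train, y_train)

test_mse = mean_squared_error(y_test, model.predict(X_test))
```

With `subsample < 1`, the fitted model also records `oob_improvement_`, the out-of-bag loss improvement of each boosting stage.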

Model performance

[Figure: r² ξ(r) versus r [Mpc/h]]
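For reference, a minimal sketch of how ξ(r) can be measured for galaxy positions in a periodic simulation box, using SciPy's `cKDTree` pair counts and the natural estimator DD/RR − 1 with analytic random pairs. The estimator and the uniform test catalogue are assumptions for illustration; an unclustered catalogue should give ξ ≈ 0.

```python
import numpy as np
from scipy.spatial import cKDTree


def xi_periodic(pos, boxsize, r_edges):
    """Two-point correlation function xi(r) in a periodic box.

    Natural estimator xi = DD/RR - 1, with the random-pair count RR computed
    analytically for a uniform periodic box (valid for r_edges < boxsize/2)."""
    n = len(pos)
    tree = cKDTree(pos, boxsize=boxsize)
    cumulative = tree.count_neighbors(tree, r_edges).astype(float)
    cumulative -= n                      # drop the n self-pairs (i == j)
    dd = np.diff(cumulative)             # ordered pairs per radial bin
    shell_volume = 4.0 / 3.0 * np.pi * np.diff(r_edges ** 3)
    rr = n * (n - 1) * shell_volume / boxsize ** 3
    return dd / rr - 1.0


rng = np.random.default_rng(42)
boxsize = 100.0                          # Mpc/h, toy value
pos = boxsize * rng.random((2000, 3))    # unclustered points: expect xi ~ 0
r_edges = np.linspace(2.0, 20.0, 10)
xi = xi_periodic(pos, boxsize, r_edges)
r_mid = 0.5 * (r_edges[1:] + r_edges[:-1])
r2xi = r_mid ** 2 * xi                   # the quantity on the y-axis of the clustering plots
```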

\frac{\left(\mathrm{MSE}_\mathrm{all} - \mathrm{MSE}_\mathrm{drop}\right)}{\mathrm{MSE}_\mathrm{all}}

Simplify the model

Clustering and environment

[Figure: r² ξ(r) versus r [Mpc/h], for the model with and without ΔM_5]
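ΔM_5 is not defined on these slides. As a hedged sketch only, assuming it is the total mass of other haloes within 5 Mpc/h of each halo (the "mass in torus around halo" feature may well be defined differently), a periodic neighbour search with SciPy's `cKDTree` looks like this; `neighbour_mass` is a hypothetical helper.

```python
import numpy as np
from scipy.spatial import cKDTree


def neighbour_mass(pos, mass, radius, boxsize):
    """Total mass of *other* haloes within `radius` of each halo (periodic box).

    Hypothetical stand-in for an environment measure such as Delta M_5,
    with radius = 5 (Mpc/h)."""
    tree = cKDTree(pos, boxsize=boxsize)
    neighbours = tree.query_ball_point(pos, r=radius)
    total = np.array([mass[idx].sum() for idx in neighbours])
    return total - mass                  # the halo itself is always within radius


# Haloes 0 and 1 are 2 Mpc/h apart across the periodic boundary
pos = np.array([[1.0, 1.0, 1.0], [99.0, 1.0, 1.0], [50.0, 50.0, 50.0]])
mass = np.array([10.0, 5.0, 1.0])
env = neighbour_mass(pos, mass, radius=5.0, boxsize=100.0)
```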


Conclusions

Trained an ensemble of decision trees to model the relation between stellar mass and dark matter halo properties.

It reproduces the two-point correlation function of central galaxies in Illustris TNG, unlike the baseline HOD model.

The model can be used to populate mock catalogues, based on the halo's:

#TODO Satellite galaxies?

Biases in environment have a strong effect on clustering.

What can we learn from the machine?

i) Decision trees' default: sum of impurity gains (MSE reduction at each split) per feature.

Sensitive to: dynamic range (number of possible splits)

Sensitive to: correlations between features
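The caveat about correlated features shows up in a toy example, assuming a scikit-learn random forest as the tree ensemble (a swapped-in model; `feature_importances_` is scikit-learn's impurity-based importance). Adding a near-duplicate of the dominant feature splits its credit between the two copies, so neither looks as important as the original alone. Setting `max_features=1` makes the sharing easy to see; the data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 2000
x0, x1 = rng.standard_normal(n), rng.standard_normal(n)
y = 2.0 * x0 + 0.5 * x1 + 0.1 * rng.standard_normal(n)


def impurity_importance(X, y):
    """scikit-learn's default: normalised total MSE reduction per feature."""
    forest = RandomForestRegressor(n_estimators=100, max_depth=6,
                                   max_features=1, random_state=0)
    return forest.fit(X, y).feature_importances_


imp_single = impurity_importance(np.column_stack([x0, x1]), y)
# Add a near-copy of x0: the credit for x0 is now shared between two columns
x0_copy = x0 + 0.01 * rng.standard_normal(n)
imp_dup = impurity_importance(np.column_stack([x0, x0_copy, x1]), y)
```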

ii) Difference in MSE after retraining a model without the feature of interest.
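The drop-column importance can be sketched with scikit-learn, following the sign convention of the formula (MSE_all − MSE_drop)/MSE_all from the model-performance slide. The gradient-boosted model and the synthetic data are assumptions for illustration. Each feature costs a full retraining, but the score measures the loss in predictive performance directly rather than summing split gains.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def drop_column_importance(make_model, X_train, y_train, X_test, y_test):
    """(MSE_all - MSE_drop) / MSE_all for each feature, retraining without it."""
    full = make_model().fit(X_train, y_train)
    mse_all = mean_squared_error(y_test, full.predict(X_test))
    scores = []
    for k in range(X_train.shape[1]):
        keep = [j for j in range(X_train.shape[1]) if j != k]
        reduced = make_model().fit(X_train[:, keep], y_train)
        mse_drop = mean_squared_error(y_test, reduced.predict(X_test[:, keep]))
        # With this sign convention, dropping an informative feature raises the
        # test MSE and gives a negative score; its magnitude measures importance.
        scores.append((mse_all - mse_drop) / mse_all)
    return np.array(scores)


rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 3))       # column 2 is pure noise
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(2000)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scores = drop_column_importance(
    lambda: GradientBoostingRegressor(random_state=0), X_train, y_train, X_test, y_test
)
```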

Uncorrelated features

Should I ask a question after this talk?

Is it almost lunch time?

Do you want your colleagues to hate you?

Ask!

Do I care?

Don't ask

Is it about magnetic fields?

Ask!

Don't ask

YES

Don't ask

A tree grows in Illustris TNG: the galaxy-halo connection learned by boosted decision trees

By Carol Cuesta

5 years ago