ILLUSTRIS TNG

C. Cuesta, C. Becker, S. Bose, C. Arnold and C. Baugh

Galaxy-Halo Connection

Hydro simulations

Empirical models

&

?

Full Physics

Dark Matter Only

1) Find 50 most bound DM particles

Halo #1

Halo #23

Halo #1

2) Find DMO halo with at least 50% of these particles

97% of halos matched

Bijective!
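The two matching steps can be sketched as below. This is a minimal illustration, not the pipeline used for the talk: it assumes that dark matter particle IDs are shared between the Full Physics and Dark Matter Only runs, and the input dictionaries (`fp_most_bound`, `dmo_membership`) and function names are hypothetical. The one-to-one check is only one possible reading of "Bijective!"; repeating the match in the opposite direction would be another.

```python
from collections import Counter


def match_halos(fp_most_bound, dmo_membership, min_fraction=0.5):
    """Match Full Physics halos to Dark Matter Only halos.

    fp_most_bound: dict, FP halo id -> list of IDs of its most bound DM particles
    dmo_membership: dict, particle ID -> DMO halo id that contains it
    Returns a dict FP halo id -> DMO halo id for the halos that found a match.
    """
    matches = {}
    for fp_id, particle_ids in fp_most_bound.items():
        # Count in which DMO halo each of the tracked particles ended up.
        counts = Counter(
            dmo_membership[p] for p in particle_ids if p in dmo_membership
        )
        if not counts:
            continue
        dmo_id, n_shared = counts.most_common(1)[0]
        # Keep the match only if the DMO halo holds enough of the particles.
        if n_shared >= min_fraction * len(particle_ids):
            matches[fp_id] = dmo_id
    return matches


def is_one_to_one(matches):
    """True if no DMO halo is claimed by more than one FP halo."""
    return len(set(matches.values())) == len(matches)
```

With 50 particles per halo and `min_fraction=0.5`, a halo is matched when at least 25 of its most bound particles sit in the same DMO halo.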


Clustering as a function of halo mass: HOD

\sigma_{\log M}
M_{\mathrm{min}}
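For reference, a minimal sketch of a mean central occupation that depends on halo mass only. The slides name the two parameters but not the functional form, so the error-function step used here, a common HOD parametrisation, is an assumption, as is the function name.

```python
import math


def mean_central_occupation(halo_mass, m_min, sigma_log_m):
    """Mean number of central galaxies in a halo of mass `halo_mass`.

    Assumed form: 0.5 * (1 + erf((log10 M - log10 M_min) / sigma_logM)).
    m_min is the mass at which half of the halos host a central;
    sigma_log_m sets the width of the transition from 0 to 1.
    """
    x = (math.log10(halo_mass) - math.log10(m_min)) / sigma_log_m
    return 0.5 * (1.0 + math.erf(x))
```

In this form the occupation depends on mass alone, which is the baseline the tree-based model is compared against.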

Full Physics

Dark Matter Only

r^2\xi(r)
r \,\, \mathrm{[Mpc/h]}

Shape: Mass, Radius, Concentration, ...

Dynamics: Velocity dispersion, Vmax, Velocity anisotropy, Spin, ...

Temporal Evolution: Formation time, Nmergers, ...

Environment: Mass in torus around halo

 

Full Physics

Dark Matter Only

Halo #1

Halo #23

Halo #1

N_c = f(x = \mathrm{Halo \, properties}) = \int_{M_{\star, \mathrm{min}}}^{\infty} \phi_c(M_\star \mid x) \, dM_\star

Learn from Illustris

Decision Trees

Be greedy: try all splits

S = \{(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_n, y_n)\}
S
x_k < t

YES

NO

\mathrm{MSE}(S) - \left( \color{green}{\frac{|S_L|}{|S|}\mathrm{MSE}(S_L)} \color{black}{+} \color{blue}{\frac{|S_R|}{|S|}\mathrm{MSE}(S_R) }\right)

How good is a given split?

S_L
S
S_R

Loss function 

 Mean Squared Error
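The greedy search over splits x_k < t, scored by the MSE reduction above, can be sketched as follows. This is an illustrative brute-force version with hypothetical function names; placing thresholds at midpoints between consecutive feature values is a choice made for the sketch.

```python
def mse(ys):
    """Mean squared error of predicting the mean of `ys`."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)


def split_gain(samples, k, t):
    """MSE(S) - (|S_L|/|S| MSE(S_L) + |S_R|/|S| MSE(S_R)) for the split x_k < t."""
    left = [y for x, y in samples if x[k] < t]    # S_L: answers YES
    right = [y for x, y in samples if x[k] >= t]  # S_R: answers NO
    n = len(samples)
    parent = mse([y for _, y in samples])
    return parent - (len(left) / n * mse(left) + len(right) / n * mse(right))


def best_split(samples):
    """Try every feature and every midpoint threshold; return (gain, k, t)."""
    n_features = len(samples[0][0])
    best = (0.0, None, None)
    for k in range(n_features):
        values = sorted({x[k] for x, _ in samples})
        for lo, hi in zip(values, values[1:]):
            t = 0.5 * (lo + hi)
            gain = split_gain(samples, k, t)
            if gain > best[0]:
                best = (gain, k, t)
    return best
```

A tree is grown by applying `best_split` recursively to S_L and S_R until a stopping rule is met.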

Regularisation

Maximum depth

\mathrm{IMP}(S) - \left( \color{green}{\frac{|S_L|}{|S|}\mathrm{IMP}(S_L)} \color{black}{+} \color{blue}{\frac{|S_R|}{|S|}\mathrm{IMP}(S_R) }\right)

[Figure: Decision Tree Regression (target and data). Error versus tree depth, with underfitting, optimal and overfitting regimes; boosting and bagging marked.]

Fight overfitting: Bagging

Bootstrap 1

Bootstrap 2

Bootstrap 3

Decision Tree 1

Decision Tree 2

Decision Tree 3

Average

Extras!

Out of bag errors (no need for validation set)

Can do the same with features (improve over greediness)
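A minimal sketch of bagging with out-of-bag errors. To stay self-contained it uses depth-1 trees on a single feature as the base learner, which is an assumption made for brevity; all function names are hypothetical. Subsampling features, the last point above, is not shown.

```python
import random


def fit_stump(xs, ys):
    """Least-squares depth-1 tree: returns (threshold, left_mean, right_mean)."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, 0.5 * (pairs[i - 1][0] + pairs[i][0]), lm, rm)
    if best is None:  # all feature values identical: predict the mean
        mean = sum(ys) / len(ys)
        return (float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean


def bagged_stumps(xs, ys, n_trees=25, seed=0):
    """Fit one stump per bootstrap sample; remember which rows each one saw."""
    rng = random.Random(seed)
    n = len(xs)
    ensemble = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stump = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        ensemble.append((stump, set(idx)))
    return ensemble


def predict_bagged(ensemble, x):
    """Average the predictions of all trees."""
    preds = [predict_stump(stump, x) for stump, _ in ensemble]
    return sum(preds) / len(preds)


def oob_mse(ensemble, xs, ys):
    """Score each row using only the trees that never saw it during training."""
    errors = []
    for i, (x, y) in enumerate(zip(xs, ys)):
        preds = [predict_stump(s, x) for s, used in ensemble if i not in used]
        if preds:
            errors.append((sum(preds) / len(preds) - y) ** 2)
    return sum(errors) / len(errors) if errors else float("nan")
```

Each bootstrap leaves out roughly a third of the rows, so the out-of-bag error behaves like a validation error without setting data aside.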

 

Fight bias: Boosting

Focus on difficult samples: Gradient descent in function space!

y_l = y_{l-1} - c_l \nabla_{y_{l-1}}E(y_{l-1})

Error

Previous prediction

New prediction
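A minimal sketch of the update above for a squared-error loss. With E = 1/2 (prediction - target)^2 the gradient is the prediction minus the target, so each new tree is fitted to the residuals of the previous prediction. The depth-1 base learner, the fixed learning rate standing in for c_l, and all function names are assumptions made for the sketch.

```python
def fit_stump(xs, targets):
    """Least-squares depth-1 tree: returns (threshold, left_mean, right_mean)."""
    pairs = sorted(zip(xs, targets))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical feature values
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, 0.5 * (pairs[i - 1][0] + pairs[i][0]), lm, rm)
    if best is None:  # all feature values identical: predict the mean
        mean = sum(targets) / len(targets)
        return (float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean


def boost(xs, ys, n_rounds=100, learning_rate=0.1):
    """Gradient boosting for squared error: each stump fits the residuals."""
    base = sum(ys) / len(ys)       # initial prediction: the mean target
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of 1/2 (pred - y)^2 with respect to pred.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # New prediction = previous prediction + small step along the fit.
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def predict_boosted(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)
```

Samples that are still badly predicted carry the largest residuals, so later trees concentrate on them.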

Bagging + Boosting =

Light Gradient Boosting Machine

(LGBM)

Model performance

r^2\xi(r)
r \,\, \mathrm{[Mpc/h]}
\frac{\left(\mathrm{MSE}_\mathrm{all} - \mathrm{MSE}_\mathrm{drop}\right)}{\mathrm{MSE}_\mathrm{all}}

Simplify the model

Clustering and environment

[Figure: r^2\xi(r) versus r \,\, \mathrm{[Mpc/h]}, with and without \Delta M_5]


Conclusions

  • Trained an ensemble of decision trees to model the relation between stellar mass and dark matter halo properties.
  • It reproduces the two-point correlation function of central galaxies in Illustris TNG, unlike the baseline HOD model.
  • The model can be used to populate mock catalogues, based on the halo's:
  • #TODO Satellite galaxies?
  • Biases in environment have a strong effect on clustering.

What can we learn from the machine?

i) Decision Trees default: Sum of impurity gains (MSE reduction at a given split) per feature.

 

Dynamical range, number of splits

 Correlations

ii) Difference in MSE after retraining a model without the feature of interest.

 

Uncorrelated features
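Option ii) can be sketched as below, using the ratio (MSE_all - MSE_drop) / MSE_all shown earlier in the talk. The model here is a stand-in, a leave-one-out nearest-neighbour regressor chosen only to keep the example short and self-contained; it is not the boosted trees of the talk, and the function names are hypothetical. With this sign convention, a feature whose removal degrades the model gives a negative value.

```python
def loo_nn_mse(rows, ys, cols):
    """Leave-one-out MSE of a 1-nearest-neighbour regressor using only `cols`."""
    total = 0.0
    for i, row in enumerate(rows):
        # Nearest other sample in the subspace of the selected features.
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: sum((rows[j][c] - row[c]) ** 2 for c in cols),
        )
        total += (ys[nearest] - ys[i]) ** 2
    return total / len(rows)


def drop_importance(rows, ys, names):
    """(MSE_all - MSE_drop) / MSE_all for each feature, refitting without it."""
    all_cols = list(range(len(names)))
    mse_all = loo_nn_mse(rows, ys, all_cols)  # assumed non-zero
    importance = {}
    for k, name in enumerate(names):
        kept = [c for c in all_cols if c != k]
        mse_drop = loo_nn_mse(rows, ys, kept)
        importance[name] = (mse_all - mse_drop) / mse_all
    return importance
```

In a toy set where the target follows the first feature and the second is noise, dropping the second leaves the error unchanged while dropping the first makes it much worse:

```python
rows = [(0.0, 0.3), (1.0, 0.1), (2.0, 0.4), (3.0, 0.2), (4.0, 0.0), (5.0, 0.5)]
ys = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
print(drop_importance(rows, ys, ["mass", "spin"]))
```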

Should I ask a question after this talk?

Is it almost lunch time?

Do you want your colleagues to hate you?

Ask!

Do I care?

Don't ask

Is it about magnetic fields?

Ask!

Ask!

Don't ask

YES

NO

NO

NO

YES

YES

NO

NO

YES

Don't ask

A tree grows in Illustris TNG: the galaxy-halo connection learned by boosted decision trees

By Carol Cuesta
