Photometric Redshifts (in HELP)

Kenneth Duncan

Cosmic Census - Oct 2017

Leiden Observatory

e.g. CANDELS UDS: Galametz et al. (2013)

You have your nice new multi-wavelength catalog, now what...

Recipe 1:

Template fitting photo-z estimates

Step 1: The code

EAZY

Brammer et al. (2008)

LePhare

Arnouts et al. (1999)

Ilbert et al. (2006)

PhotoZ

Bender et al. (2001)

Hyper-Z

Bolzonella et al. (2000)

ZEBRA

Feldmann et al. (2006)

BPZ

Benitez (2000)

Step 1: The code

Total citations: ~2800

Then

Now

Step 2: The Templates
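Whichever code and template set are chosen, the core calculation is much the same. Each template is scaled to the observed fluxes at every redshift on a grid, χ² is computed, and the best χ² at each redshift becomes a likelihood. Below is a minimal single-template sketch in Python with NumPy. It assumes the template fluxes have already been integrated through every filter at every grid redshift. `template_fit_pz` and `model_fluxes` are hypothetical names, not the API of EAZY, LePhare or any other code listed above.

```python
import numpy as np

def template_fit_pz(flux, flux_err, model_fluxes):
    """Chi-squared template fit of one source on a redshift grid.

    flux, flux_err : (n_filters,) observed fluxes and 1-sigma errors
    model_fluxes   : (n_z, n_templates, n_filters) template fluxes through each
                     filter at each grid redshift (assumed precomputed)
    Returns chi2 with shape (n_z, n_templates) and P(z) normalised to unit sum
    on the grid, using the best-fitting template at each redshift.
    """
    w = 1.0 / flux_err**2
    # Best-fit normalisation of each template at each z (minimises chi2 analytically)
    scale = np.sum(model_fluxes * flux * w, axis=-1) / np.sum(model_fluxes**2 * w, axis=-1)
    resid = flux - scale[..., None] * model_fluxes
    chi2 = np.sum(resid**2 * w, axis=-1)
    # Likelihood from the best single template at each z (EAZY can instead fit
    # linear combinations of templates, which this sketch does not do)
    chi2_best = chi2.min(axis=1)
    pz = np.exp(-0.5 * (chi2_best - chi2_best.min()))
    return chi2, pz / pz.sum()
```

The corrections of Step 3 and the priors of Step 4 are layered on top of this basic fit.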

Step 3: zeropoint offsets and additional smoothing errors

Additional rest-frame errors

Corrections to the observed zeropoints

Brammer et al. (2008)

Dust

AGB Stars?

PAH/Dust emission/AGN?
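Both corrections in Step 3 can be sketched in a few lines. Three assumptions apply here. The rest-frame error is supplied as a hypothetical callable `te_func` giving a fractional error versus rest-frame wavelength, standing in for the template error function of Brammer et al. (2008). It is added in quadrature as a fraction of the model flux. The zeropoint offsets are the per-filter median ratio of best-fit model flux to observed flux for a spectroscopic sample fitted at fixed z_spec. In practice the offsets are iterated together with refitting; only a single pass is shown.

```python
import numpy as np

def total_flux_error(flux_err, model_flux, lam_obs, z, te_func):
    """Photometric errors with a rest-frame template error added in quadrature.

    lam_obs : (n_filters,) observed-frame filter wavelengths
    te_func : callable returning the fractional template error at a
              rest-frame wavelength (hypothetical stand-in)
    """
    lam_rest = lam_obs / (1.0 + z)
    return np.sqrt(flux_err**2 + (te_func(lam_rest) * model_flux) ** 2)

def zeropoint_offsets(obs_flux, model_flux):
    """Multiplicative zeropoint corrections, one per filter.

    obs_flux, model_flux : (n_sources, n_filters) observed fluxes and best-fit
    model fluxes for a spectroscopic sample fitted at fixed z_spec.
    Multiply the observed fluxes by the returned factors before refitting.
    """
    return np.median(model_flux / obs_flux, axis=0)
```

The median makes the offsets robust to a few badly fitted sources in the spectroscopic sample.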

Step 4: priors (optional)

Brammer et al. (2008)

Benitez (2000)

Magnitude

Spectral type
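The priors in Step 4 multiply the likelihood from the fit. Below is a minimal sketch of a magnitude-dependent prior for a single spectral type. It uses the functional form of Benitez (2000), p(z|m) ∝ z^α exp[-(z/z_m)^α] with z_m = z_0 + k_m (m - 20). The parameter values are placeholders, not the published fits, and a full prior would also depend on spectral type.

```python
import numpy as np

def magnitude_prior(z, mag, alpha=2.0, z0=0.4, km=0.1):
    """Benitez (2000)-style prior for one spectral type, normalised on the grid.

    p(z | m) ∝ z**alpha * exp(-(z / z_m)**alpha), z_m = z0 + km * (mag - 20),
    so brighter galaxies are expected at lower redshift. Parameter values are
    placeholders, not published fits.
    """
    zm = max(z0 + km * (mag - 20.0), 0.01)  # guard against z_m <= 0 for very bright sources
    prior = z**alpha * np.exp(-((z / zm) ** alpha))
    return prior / prior.sum()

def apply_prior(likelihood, prior):
    """Posterior P(z) on the grid: likelihood times prior, renormalised."""
    posterior = likelihood * prior
    return posterior / posterior.sum()
```

The prior peaks at z = z_m. It mostly matters when the likelihood alone is broad or has several comparable peaks.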

Recipe 2:

Training based photo-z estimates

(aka machine learning)

Aside: Motivations for ML-based Photo-z's

Euclid

LSST

Aside: Motivations for training (ML) based Photo-z's

1. Speed

Euclid: ~1.5 billion galaxies

LSST: ~10 billion galaxies

Estimated time to run EAZY on all sources (on a desktop machine):

~2+ years (Euclid)

~14+ years (LSST)
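The two runtime estimates imply a single per-source cost. Here is a back-of-the-envelope check. The ~0.04 s per galaxy is inferred from the Euclid numbers and is not quoted on the slide.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def runtime_years(n_sources, seconds_per_source):
    """Serial runtime in years for a fixed cost per source."""
    return n_sources * seconds_per_source / SECONDS_PER_YEAR

# Per-source cost implied by ~2 years for ~1.5 billion Euclid galaxies
sec_per_source = 2 * SECONDS_PER_YEAR / 1.5e9
# Scale the same cost to ~10 billion LSST galaxies
lsst_years = runtime_years(10e9, sec_per_source)
print(f"{sec_per_source:.3f} s per source, {lsst_years:.1f} years for LSST")
# prints: 0.042 s per source, 13.3 years for LSST
```

Parallelising over N cores only divides these numbers by N. The cost still scales linearly with the number of sources.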

Motivations for training (ML) based Photo-z's

2. Improvements in accuracy

Sanchez et al. (2014)

Weak lensing requirements:

Scatter: \sigma_{z}/(1+z) < 5\%

Bias: \langle \Delta z \rangle / (1+z) < 0.2\%, where \Delta z = z_{phot} - z_{spec}
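Both requirements are statistics of Δz/(1+z_spec) over a spectroscopic sample. The minimal sketch below makes two assumptions. First, it uses the standard deviation for the scatter, although many papers quote the normalised median absolute deviation (NMAD) instead. Second, the 0.15 outlier threshold is a common convention, not part of the requirement.

```python
import numpy as np

def photoz_metrics(z_phot, z_spec, outlier_cut=0.15):
    """Bias, scatter, NMAD and outlier fraction of dz = (z_phot - z_spec) / (1 + z_spec)."""
    z_phot = np.asarray(z_phot, dtype=float)
    z_spec = np.asarray(z_spec, dtype=float)
    dz = (z_phot - z_spec) / (1.0 + z_spec)
    bias = np.mean(dz)                       # compare with the 0.2% requirement
    scatter = np.std(dz)                     # compare with the 5% requirement
    nmad = 1.4826 * np.median(np.abs(dz - np.median(dz)))  # robust alternative
    outlier_frac = np.mean(np.abs(dz) > outlier_cut)
    return bias, scatter, nmad, outlier_frac
```

Outliers inflate the standard deviation but barely move the NMAD. It is therefore worth reporting both, together with the outlier fraction.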

Step 1: Select your training sample

i.e. a representative subset of your sample with spectroscopic redshifts

Step 2: Pick your favourite regression/classification algorithm

Neural Networks

Self-organizing Maps (SOMs)

Deep learning

Support Vector Machines (SVM)

Naive Bayes

Gaussian Processes

Generalized Linear Models

Bayesian Network

k-Nearest Neighbour

Boosted Decision Trees

Random Forests

Relevance vector machines

Radial basis function networks

Normalised inner product nearest neighbour

Directional neighbourhood fitting

Voronoi tessellation density estimator

Non-conditional density estimation

Step 3: Train your regression/classification algorithm

Step 4: Apply to your science sample

magic happens somewhere here
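Steps 1 to 4 can be sketched with the simplest method from the list above, k-nearest neighbours, in Python with NumPy only. In this sketch, step 1 is the `train_colours`/`train_z` arrays. Step 3 is trivial for kNN because the training set is simply stored. Step 4 is the call on the science sample. The function name, the choice of k and the use of a median over neighbours are illustrative assumptions. Production codes would use a tree-based neighbour search rather than brute force.

```python
import numpy as np

def knn_photoz(train_colours, train_z, test_colours, k=5):
    """k-nearest-neighbour photo-z: median spec-z of the k closest
    training galaxies in colour space (brute-force Euclidean distance).

    train_colours : (n_train, n_colours) colours of the spec-z training sample
    train_z       : (n_train,) spectroscopic redshifts
    test_colours  : (n_test, n_colours) colours of the science sample
    """
    # Squared distances between every science and training galaxy
    diff = test_colours[:, None, :] - train_colours[None, :, :]
    d2 = np.sum(diff**2, axis=-1)
    # Indices of the k nearest training galaxies for each science galaxy
    nearest = np.argsort(d2, axis=1)[:, :k]
    return np.median(train_z[nearest], axis=1)

# Toy usage: colours that vary smoothly with redshift, plus noise
rng = np.random.default_rng(42)
train_z = rng.uniform(0.0, 3.0, 2000)
train_colours = np.column_stack([train_z, 0.5 * train_z]) + rng.normal(0.0, 0.02, (2000, 2))
test_z = np.array([0.5, 1.5, 2.5])
test_colours = np.column_stack([test_z, 0.5 * test_z])
pred = knn_photoz(train_colours, train_z, test_colours)
```

The toy example also shows the main weakness listed below. Any science galaxy whose colours fall outside the region covered by the training sample simply inherits the redshifts of whatever training galaxies happen to be closest.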

Pros and Cons of ML Photo-z's

Pro:

- Fast and scalable
- Entirely empirical:
  - no concern about template choice
  - photometry systematics less of a problem
- Simple to include extra information:
  - properties such as size and morphology can help break degeneracies

Con:

- Entirely dependent on the spectroscopic training sample
- Struggle more with inhomogeneous datasets (e.g. missing filters)
- Difficult to interpret solutions physically (e.g. rest-frame colours)

Final step: (For all photo-z methods)

Fraction of spectroscopic redshifts within given confidence interval

Dahlen et al. (2012)

7 of the 11 submitted photo-z estimates had significantly overconfident 1-sigma errors
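The diagnostic in the figure can be computed directly from the P(z). The minimal sketch below assumes every P(z) is tabulated on a common redshift grid and normalised to unit sum. For each galaxy it finds the probability mass of the smallest highest-probability-density (HPD) interval that just contains z_spec. For well-calibrated P(z), a fraction q of galaxies should lie inside their q interval. Overconfident P(z) give a smaller fraction. The function names are hypothetical.

```python
import numpy as np

def hpd_ci_values(pz, z_grid, z_spec):
    """Probability mass of the smallest HPD interval containing each z_spec.

    pz     : (n_sources, n_z) P(z) normalised to unit sum on z_grid
    z_spec : (n_sources,) spectroscopic redshifts
    """
    # Grid point nearest to each spectroscopic redshift
    idx = np.abs(z_grid[None, :] - z_spec[:, None]).argmin(axis=1)
    p_at_spec = pz[np.arange(len(z_spec)), idx]
    # Sum all P(z) values at least as high as P(z_spec)
    return np.sum(np.where(pz >= p_at_spec[:, None], pz, 0.0), axis=1)

def coverage(ci_values, level=0.68):
    """Fraction of spec-zs inside their `level` HPD interval (~level if calibrated)."""
    return np.mean(ci_values <= level)
```

Repeating `coverage` for a range of levels and plotting it against the level reproduces the kind of curve shown in the figure. A calibrated P(z) follows the one-to-one line.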

Calibrating redshift pdfs

Wittman et al. (2016)
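One simple way to calibrate is assumed here for illustration. Every P(z) is raised to a power 1/α, and α is chosen so that the 68% HPD intervals contain 68% of the spectroscopic redshifts. The function names and the α grid are hypothetical. In practice the calibration could be repeated in bins of magnitude or source type.

```python
import numpy as np

def temper(pz, alpha):
    """P(z)**(1/alpha), renormalised: alpha > 1 broadens, alpha < 1 sharpens."""
    p = pz ** (1.0 / alpha)
    return p / p.sum(axis=1, keepdims=True)

def coverage_68(pz, z_grid, z_spec):
    """Fraction of spec-zs inside the 68% highest-probability-density interval."""
    idx = np.abs(z_grid[None, :] - z_spec[:, None]).argmin(axis=1)
    p_at_spec = pz[np.arange(len(z_spec)), idx]
    ci = np.sum(np.where(pz >= p_at_spec[:, None], pz, 0.0), axis=1)
    return np.mean(ci <= 0.68)

def calibrate_alpha(pz, z_grid, z_spec, alphas=None):
    """Grid-search the exponent whose 68% coverage is closest to 0.68."""
    if alphas is None:
        alphas = np.linspace(0.5, 8.0, 31)
    cov = np.array([coverage_68(temper(pz, a), z_grid, z_spec) for a in alphas])
    return alphas[np.argmin(np.abs(cov - 0.68))]
```

For Gaussian P(z), raising to the power 1/α multiplies the width by √α. Overconfident errors that are too narrow by a factor of 2 therefore need α ≈ 4.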

Improving photo-z estimates even more...

the wisdom of crowds

Combine multiple photo-z estimates

Dahlen et al. (2012)

Figure legend: median of all photo-z estimates; median of the best 5 photo-z estimates
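The simplest "wisdom of crowds" combination takes a per-source median over estimates. In the minimal sketch below, ranking methods by a quality score measured on a spectroscopic sample (lower is better) to pick the "best N" is an assumption about how "best" is defined. The function name is hypothetical.

```python
import numpy as np

def consensus_median(z_estimates, scores=None, n_best=None):
    """Per-source median of several photo-z estimates.

    z_estimates : (n_methods, n_sources) point estimates from each method
    scores      : optional (n_methods,) quality metric, lower is better
                  (e.g. outlier fraction on a spec-z sample)
    n_best      : if given together with scores, keep only the n_best methods
    """
    z = np.asarray(z_estimates, dtype=float)
    if scores is not None and n_best is not None:
        keep = np.argsort(scores)[:n_best]
        z = z[keep]
    return np.median(z, axis=0)
```

The median is insensitive to a single method producing a catastrophic outlier for a given source. This is the main reason it can beat every individual estimate.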

Photometric redshift strategy for HELP

Gory details presented in...
Duncan et al. (2017a, 1709.09183)
and Duncan et al. (2017b, in prep)

Overall strategy for HELP

1. Run photo-z estimates using 3 different template libraries:

- EAZY templates (stellar only)
- Salvato et al. XMM-COSMOS library (stellar and AGN/QSO)
- Michael Brown’s ‘Atlas of Galaxy SEDs’ (stellar and AGN)

2. Separate galaxy- and AGN-dominated sources where possible (optical/IR/X-ray selection), then optimise the magnitude priors and calibration procedure for each set

3. Combine the individual estimates into a consensus P(z) using hierarchical Bayesian combination
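Here is a minimal sketch of the idea behind a hierarchical Bayesian combination. It is not the HELP implementation; Duncan et al. (2017a) gives the details of that. Each individual P(z) may be "bad" with probability f_bad, in which case it is replaced by an uninformative flat distribution. The resulting P(z|f_bad) are multiplied together, and f_bad is marginalised over. The flat prior, the f_bad grid and the optional softening exponent `beta` are assumptions.

```python
import numpy as np

def hb_combine(pzs, f_bad_grid=None, beta=1.0):
    """Combine several P(z) for one source, allowing each to be 'bad'.

    pzs        : (n_estimates, n_z) P(z) from each template set, each summing to 1
    f_bad_grid : probabilities that an individual estimate is bad, marginalised
                 with a flat prior (grid range is an assumption)
    beta       : optional exponent > 1 that softens the product when the
                 estimates are correlated (assumed form)
    """
    pzs = np.asarray(pzs, dtype=float)
    if f_bad_grid is None:
        f_bad_grid = np.linspace(0.0, 0.2, 21)
    f = np.asarray(f_bad_grid, dtype=float)[:, None, None]
    n_z = pzs.shape[1]
    uniform = np.full(n_z, 1.0 / n_z)  # a "bad" estimate carries no information
    # P_i(z | f_bad) = (1 - f_bad) P_i(z) + f_bad U(z), shape (n_f, n_est, n_z)
    mixed = (1.0 - f) * pzs[None, :, :] + f * uniform
    # Product over estimates in log space; log(0) -> -inf is allowed at f_bad = 0
    with np.errstate(divide="ignore"):
        log_prod = np.log(mixed).sum(axis=1) / beta  # shape (n_f, n_z)
    # One global shift keeps the relative weights of different f_bad values
    log_prod -= log_prod.max()
    combined = np.exp(log_prod).sum(axis=0)  # flat prior: sum over the f_bad grid
    return combined / combined.sum()
```

When two estimates agree and a third is a catastrophic outlier, the combined P(z) follows the majority. A plain product of the three would instead be dominated by whichever tails overlap.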

Overall strategy for HELP: running the 3 template sets

a) Zeropoint offsets are calculated separately for each individual template set

b) Lazy parallelisation of EAZY: each field is split into many chunks, which are run in parallel
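The chunk-and-run pattern can be sketched in a few lines of Python. `fit_chunk` is a hypothetical callable standing in for whatever writes a chunk to disk and launches a separate EAZY run on it; nothing here is part of EAZY itself. Threads are enough when the heavy lifting happens in external processes.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def run_in_chunks(catalogue, fit_chunk, n_chunks=8, max_workers=4):
    """Split a catalogue into chunks, fit each independently, reassemble in order.

    catalogue : array of rows (sources)
    fit_chunk : hypothetical callable taking a sub-array of rows and returning
                one result per row (e.g. a wrapper around an external photo-z run)
    """
    chunks = np.array_split(catalogue, n_chunks)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input chunks
        results = list(pool.map(fit_chunk, chunks))
    return np.concatenate(results)
```

Because the chunks are independent, no communication between runs is needed. This is what makes the "lazy" approach sufficient.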

Overall strategy for HELP

2. Separate galaxy- and AGN-dominated spectra where possible, then optimise the magnitude priors and calibration procedure for each set

What HELP will produce

1. Photometric redshift catalogs, including:

- Primary and secondary solutions

- Calibrated uncertainty estimates

- A range of corresponding diagnostic plots for each field
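One simple definition of primary and secondary solutions takes the two highest local maxima of P(z). This definition is assumed here for illustration and is not taken from the HELP pipeline. The function name is hypothetical.

```python
import numpy as np

def primary_secondary(z_grid, pz):
    """Redshifts of the highest and second-highest local maxima of P(z).

    Returns (z_primary, z_secondary); z_secondary is None for a single-peaked P(z).
    Peaks are ranked by height here; ranking by the probability integrated
    around each peak is a common alternative.
    """
    p = np.asarray(pz, dtype=float)
    # Pad with -inf so peaks at the edges of the grid are also found
    padded = np.concatenate(([-np.inf], p, [-np.inf]))
    centre = padded[1:-1]
    is_peak = (centre > padded[:-2]) & (centre >= padded[2:])
    peaks = np.flatnonzero(is_peak)
    peaks = peaks[np.argsort(p[peaks])[::-1]]  # highest first
    z_primary = z_grid[peaks[0]]
    z_secondary = z_grid[peaks[1]] if len(peaks) > 1 else None
    return z_primary, z_secondary
```

Noisy P(z) can produce many tiny local maxima. Real catalogues would smooth the P(z) first or require a minimum integrated probability for the secondary peak.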

What HELP will produce

2. Selection functions:

For a source with a given set of photometric properties...

a) What is the probability that a photo-z estimate exists in the HELP database?

b) What is the probability that a reliable* photo-z estimate exists in the HELP database?

*a very flexible definition
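The simplest version of (a) and (b) is the fraction of sources with a photo-z, or a reliable one, in bins of a single photometric property. Here is a minimal sketch of it. Using one magnitude as that property, and the function name, are illustrative assumptions. Swapping `has_photz` between "any estimate" and "reliable estimate" gives (a) and (b) respectively.

```python
import numpy as np

def selection_function(mag, has_photz, bins):
    """Fraction of sources with a photo-z estimate in bins of magnitude.

    mag       : (n,) magnitudes in a reference band (illustrative choice)
    has_photz : (n,) boolean, True where a (reliable) photo-z exists
    bins      : bin edges; sources outside the edges are ignored
    Returns per-bin fractions, NaN for empty bins.
    """
    idx = np.digitize(mag, bins) - 1
    n_bins = len(bins) - 1
    frac = np.full(n_bins, np.nan)
    for i in range(n_bins):
        in_bin = idx == i
        if in_bin.any():
            frac[i] = np.mean(has_photz[in_bin])
    return frac
```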

Where HELP can help in future...

Compilation and homogenisation of datasets make machine learning estimates a more viable option for some fields

Incorporating targeted ML estimates can dramatically improve estimates for AGN

Summary

Producing consistent, high-quality photo-zs for 1300 sq. deg. of sky is a challenge... but manageable

The heterogeneous nature of the datasets makes template fitting the only feasible starting point

Bayesian combination of multiple redshift estimates provides near-optimal solutions across multiple fields and source types

Calibrate your photo-z errors!


Review of photometric redshifts past, present and future. For the Lorentz Workshop Jun 20th-24th
