Some high-level take-aways from the MLSS Africa 2019

Topics & "editor's choice"

Kernel Methods (Arthur Gretton)
Reinforcement Learning (Benjamin Rosman)
Ethical AI (Bettina Berendt)
Variational Inference (David Blei)
Causal Inference (Bernhard Schölkopf & Ferenc Huszar)
How to write a great research paper (Panel)
Monte Carlo Inference (Iain Murray)
Big Spaces (John Skilling)
Optimal Transport (Marco Cuturi)
Big Data & Astronomy (Michelle Lochner)
Data Science in Practice (McElory Hoffmann)
Automatic Differentiation for ML (Niko Brümmer)
Probabilistic Thinking (Shakir Mohamed)
NLP necessary for ML? (Sharon Goldwater)
Data Compression (Christian Steinruecken)

Causal Inference

Schölkopf & Huszar

Schölkopf

theoretical background
ways of (wrongly) doing causal inference
structural causal models

Huszar

practical considerations
using causal inference to solve ML problems
estimating the relevant quantities from data

Causal Inference

causality complements probability theory (cf. Simpson's paradox):
- treatment A is better for small kidney stones
- treatment A is better for large kidney stones
- treatment B looks better overall (!!!), since stone size influences both the choice of treatment and the outcome
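
The reversal can be checked with a few lines of arithmetic. The sketch below uses the success counts commonly quoted for the kidney stone example; these numbers are an assumption for illustration and are not taken from the slides.

```python
from fractions import Fraction

# (successes, patients) per treatment and stone size.
# Illustrative counts commonly quoted for the kidney stone example;
# they are an assumption here, not data from the slides.
outcomes = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def success_rate(successes, patients):
    """Exact success rate within one group."""
    return Fraction(successes, patients)

def overall_rate(groups):
    """Success rate after pooling all stone sizes of one treatment."""
    total_successes = sum(s for s, _ in groups.values())
    total_patients = sum(n for _, n in groups.values())
    return Fraction(total_successes, total_patients)

for size in ("small", "large"):
    rate_a = success_rate(*outcomes["A"][size])
    rate_b = success_rate(*outcomes["B"][size])
    print(f"{size} stones: A={float(rate_a):.3f}  B={float(rate_b):.3f}")

print(f"overall:      A={float(overall_rate(outcomes['A'])):.3f}  "
      f"B={float(overall_rate(outcomes['B'])):.3f}")
```

In these counts treatment A is mostly given to patients with large (harder to treat) stones, so pooling the groups mixes the effect of the treatment with the effect of the stone size.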
Reichenbach's Principle:
"If X and Y are dependent, then there exists Z causally influencing both. X and Y are conditionally independent given Z."
Structural Causal Model:
- DAG giving causal influences (edges) between observables (vertices)
- each observable is a function of its parents and a noise term: X = f(PA(X), U)
local causal Markov condition:
each X is independent of its non-descendants given its parents -> structural causal model
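
A minimal sketch of these ideas, assuming a hypothetical linear-Gaussian model with a common cause Z -> X and Z -> Y (the coefficients 2.0 and -1.5 and all function names are made up for illustration). X and Y are dependent, become uncorrelated once Z is accounted for (which equals conditional independence in the Gaussian case), and intervening on X leaves Y unchanged because X is not a parent of Y.

```python
import numpy as np

def sample_scm(n, rng, do_x=None):
    """Sample n points from the toy SCM Z -> X, Z -> Y.

    If do_x is given, the structural equation of X is replaced by the
    constant do_x (an intervention); the equations of Z and Y stay untouched.
    """
    u_z, u_x, u_y = rng.normal(size=(3, n))  # independent noise terms U
    z = u_z                                  # Z = f_Z(U_Z)
    if do_x is None:
        x = 2.0 * z + u_x                    # X = f_X(Z, U_X)
    else:
        x = np.full(n, float(do_x))          # do(X = do_x)
    y = -1.5 * z + u_y                       # Y = f_Y(Z, U_Y); X is not a parent
    return z, x, y

def residual(a, b):
    """Part of a that is left after removing its least-squares linear fit on b."""
    slope = np.cov(a, b)[0, 1] / np.var(b, ddof=1)
    return a - a.mean() - slope * (b - b.mean())

rng = np.random.default_rng(0)
z, x, y = sample_scm(50_000, rng)

# Marginal dependence vs. dependence after accounting for the common cause Z.
corr_xy = np.corrcoef(x, y)[0, 1]
partial_corr_xy_given_z = np.corrcoef(residual(x, z), residual(y, z))[0, 1]

# Conditioning on X = 3 (linear regression) vs. intervening with do(X = 3).
slope_y_on_x = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
mean_y_given_x3 = y.mean() + slope_y_on_x * (3.0 - x.mean())
_, _, y_do = sample_scm(50_000, rng, do_x=3.0)
mean_y_do_x3 = y_do.mean()

print(f"corr(X, Y)          = {corr_xy:.3f}")
print(f"corr(X, Y | Z)      = {partial_corr_xy_given_z:.3f}")
print(f"E[Y | X = 3]        = {mean_y_given_x3:.3f}")
print(f"E[Y | do(X = 3)]    = {mean_y_do_x3:.3f}")
```

The gap between the last two numbers is the difference between observing X = 3 (which carries information about Z) and setting X = 3 by intervention.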

Causal Inference Schölkopf

Taken from Ferenc Huszar's Causal Inference in Everyday Machine Learning Slides from MLSS Africa 2019

Kernel Methods

Arthur Gretton

IPMs, MMD, Kernel Trick

use features to distinguish distributions
use the maximum mean discrepancy (MMD) as a divergence measure, computed with the kernel trick
integral probability metric (IPM) view:
- find a "witness" (discriminative) function whose expected values under the two distributions differ maximally
- witness functions from the unit ball of an RKHS -> MMD
- 1-Lipschitz witness functions -> Wasserstein-1
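
As a rough sketch of the MMD with the kernel trick, the snippet below computes the biased (V-statistic) estimate of the squared MMD between two samples and evaluates the unnormalised empirical witness function. The Gaussian RBF kernel, the bandwidth of 1.0 and the function names are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian RBF kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

def mmd2_biased(x, y, bandwidth=1.0):
    """Biased estimate of MMD^2: mean k(x,x') + mean k(y,y') - 2 mean k(x,y)."""
    k_xx = rbf_kernel(x, x, bandwidth)
    k_yy = rbf_kernel(y, y, bandwidth)
    k_xy = rbf_kernel(x, y, bandwidth)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()

def witness(t, x, y, bandwidth=1.0):
    """Unnormalised empirical witness: difference of the mean embeddings at t."""
    return (
        rbf_kernel(t, x, bandwidth).mean(axis=1)
        - rbf_kernel(t, y, bandwidth).mean(axis=1)
    )

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(500, 2))          # sample from P
y_same = rng.normal(0.0, 1.0, size=(500, 2))     # second sample from P
y_shifted = rng.normal(1.5, 1.0, size=(500, 2))  # sample from a shifted Q

mmd2_same = mmd2_biased(x, y_same)
mmd2_shifted = mmd2_biased(x, y_shifted)
witness_values = witness(np.array([[0.0, 0.0], [1.5, 1.5]]), x, y_shifted)

print(f"MMD^2 (same distribution):    {mmd2_same:.4f}")
print(f"MMD^2 (shifted distribution): {mmd2_shifted:.4f}")
print(f"witness at P's mean, Q's mean: {witness_values}")
```

The witness is positive where P has more mass than Q and negative where Q dominates, which is what makes it a discriminative function.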

Kernel Methods Arthur Gretton

Taken from Arthur Gretton's Kernel Methods Part 1 Slides from MLSS Africa 2019

Taken from Arthur Gretton's Kernel Methods for Hypothesis Testing and Sample Generation Slides from MLSS Africa 2019

Dependence Detection

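COCO: max. singular value of the feature cross-covariance

HSIC: sum of squared singular values = MMD^2(PXY, PXPY) with product kernel

HSIC = 1/n^2 trace(KL) with centered Gram matrices K, L (equivalently 1/n^2 trace(KHLH) with H = I - (1/n) 1 1^T)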
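
A small sketch of the trace formula, assuming Gaussian RBF kernels with bandwidth 1.0 on both variables (kernel choice and names are illustrative assumptions). The Gram matrices are centered explicitly before taking the trace. The toy data use y = x^2 plus noise, a dependence that a linear correlation coefficient largely misses.

```python
import numpy as np

def rbf_gram(v, bandwidth=1.0):
    """Gaussian RBF Gram matrix of a sample (1-D arrays are treated as columns)."""
    v = v.reshape(len(v), -1)
    sq = np.sum(v**2, axis=1)
    sq_dists = sq[:, None] + sq[None, :] - 2.0 * v @ v.T
    return np.exp(-sq_dists / (2.0 * bandwidth**2))

def hsic_biased(x, y, bandwidth=1.0):
    """Biased HSIC estimate: trace(K L) / n^2 with centered Gram matrices."""
    n = len(x)
    h = np.eye(n) - np.ones((n, n)) / n      # centering matrix H
    k = h @ rbf_gram(x, bandwidth) @ h       # centered Gram matrix of x
    l = h @ rbf_gram(y, bandwidth) @ h       # centered Gram matrix of y
    return np.trace(k @ l) / n**2

rng = np.random.default_rng(0)
n = 400
x = rng.normal(size=n)
y_dependent = x**2 + 0.1 * rng.normal(size=n)   # dependent on x, nearly uncorrelated
y_independent = rng.normal(size=n)              # independent of x

hsic_dep = hsic_biased(x, y_dependent)
hsic_ind = hsic_biased(x, y_independent)

print(f"corr(x, x^2 + noise) = {np.corrcoef(x, y_dependent)[0, 1]:.3f}")
print(f"HSIC (dependent)     = {hsic_dep:.5f}")
print(f"HSIC (independent)   = {hsic_ind:.5f}")
```

Turning the statistic into a test would additionally need a null distribution, for example from permuting one of the samples; that step is left out here.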

Optimal Transport

Cuturi

Monge

transport a continuous (probability) mass distribution onto another distribution at minimal total cost, where the cost depends on the locations in space
solution is a deterministic mapping in space

Kantorovich

distribute a discrete mass distribution onto another distribution at minimal total cost, where the cost depends on the locations in space
solution is a probabilistic mapping/joint distribution in space (Kantorovich relaxation)
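
For discrete distributions the Kantorovich problem is a linear program over joint distributions with prescribed marginals. The sketch below solves a tiny instance with `scipy.optimize.linprog`; the point locations, weights, squared-distance cost and the helper name `kantorovich_plan` are assumptions made up for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def kantorovich_plan(a, b, cost):
    """Solve min_P sum_ij P[i, j] * cost[i, j] s.t. P >= 0, P 1 = a, P^T 1 = b."""
    n, m = cost.shape
    # With P flattened row by row, entry (i, j) sits at index i * m + j.
    row_sums = np.kron(np.eye(n), np.ones((1, m)))   # sum_j P[i, j] = a[i]
    col_sums = np.kron(np.ones((1, n)), np.eye(m))   # sum_i P[i, j] = b[j]
    result = linprog(
        cost.ravel(),
        A_eq=np.vstack([row_sums, col_sums]),
        b_eq=np.concatenate([a, b]),
        bounds=(0, None),
        method="highs",
    )
    if not result.success:
        raise RuntimeError(result.message)
    return result.x.reshape(n, m), result.fun

# Two source points and three target points on the real line.
source_points = np.array([0.0, 4.0])
target_points = np.array([0.0, 2.0, 4.0])
a = np.array([0.5, 0.5])            # source masses
b = np.array([0.25, 0.5, 0.25])     # target masses
cost = (source_points[:, None] - target_points[None, :]) ** 2

plan, total_cost = kantorovich_plan(a, b, cost)
print(np.round(plan, 3))
print(f"total transport cost: {total_cost:.3f}")
```

Each source point has to split its mass between two targets in this instance, so no deterministic Monge map exists, while the relaxed problem still has a solution in the form of a joint distribution.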

Taken from Marco Cuturi's A Primer on Optimal Transport Slides from MLSS Africa 2019

More from Johannes Leugering