Computational Geophysicist at the Sydney Informatics Hub, University of Sydney
All models are wrong
Testing your assumptions
Dr. Ben Mather
University of Sydney
Essentially, all models are wrong,
but some are useful.
- Box & Draper
Empirical Model-Building and Response Surfaces (1987)
A Newspoll conducted shortly before the federal election predicted a Labor victory, 53% to the Coalition's 47%, on a two-party preferred basis.
After election night
When it comes to the opinion polling, something’s obviously gone really crook with the sampling
both internally and externally.
- ABC political editor Andrew Probyn
Line of best fit
When do we make assumptions?
- Everyday life
- Whenever we interpret data
- When we predict something based on data
  - Nature of trends
    - Constant, linear, quadratic, etc.
  - Correlation length scales
  - Distinct populations in data
    - Socio-economic classes, smokers
    - Geochemists, palaeontologists, flat earthers
  - Presence of bias in a sample group
    - People who respond to Newspoll surveys
- When we predict something based on prior experience
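Choosing a constant, linear or quadratic trend is itself an assumption. The sketch below, assuming NumPy and a hypothetical synthetic dataset (a quadratic trend plus noise), fits all three trend types to the same data and reports the misfit of each.

```python
import numpy as np

# Hypothetical synthetic data: a quadratic trend plus Gaussian noise
rng = np.random.default_rng(42)
x = np.linspace(0.0, 10.0, 50)
y = 0.5 * x**2 - 2.0 * x + 1.0 + rng.normal(0.0, 2.0, size=x.size)

def rms_misfit(degree):
    """Fit a polynomial of the given degree by least squares and return the RMS residual."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    return np.sqrt(np.mean(residuals**2))

# Same data, three different assumptions about the nature of the trend
misfits = {name: rms_misfit(deg) for name, deg in [("constant", 0), ("linear", 1), ("quadratic", 2)]}
for name, value in misfits.items():
    print(f"{name:>9}: RMS misfit = {value:.2f}")
```

Because each lower-degree polynomial is a special case of the higher-degree one, the least-squares misfit can only stay the same or fall as the degree rises. A lower misfit on its own therefore does not justify a more complex trend.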
Good scientists will...
- Make an objective observation.
- Infer something (a hypothesis) from that observation.
Good scientists will not...
- Formulate a hypothesis first.
- Then seek out data that fits it, or assume that all data does.
Some useful assumptions
- Newton's 3 laws of motion
- Greenhouse effect
- The first roll of a die has no effect on the second roll
- The temperature in Newtown is the same as that in Marrickville
- John Farnham will perform at least one more goodbye tour
There are a lot of words here, and most of them mean the same thing.
Machine Learning = Inference
Start with the basics
- Does it pass the common sense test?
- "Bad" models can also tell you something interesting.
- Are there alternatives?
- What are you going to do with your model?
Generate 50%, 95%, 99% confidence intervals using randomly drawn models
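A minimal sketch of how such intervals can be built from an ensemble, assuming NumPy. The ensemble here is hypothetical: straight-line models with randomly drawn slopes and intercepts stand in for models drawn from, for example, a sampler. At each point the 50%, 95% and 99% intervals are the matching percentile envelopes of the ensemble's predictions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 101)

# Hypothetical ensemble of 10,000 straight-line models with random slopes and intercepts
n_models = 10_000
slopes = rng.normal(1.0, 0.2, size=n_models)
intercepts = rng.normal(0.0, 1.0, size=n_models)
predictions = slopes[:, None] * x[None, :] + intercepts[:, None]  # shape (n_models, len(x))

# Percentile envelopes at each x give the 50%, 95% and 99% intervals
intervals = {}
for level in (50, 95, 99):
    tail = (100 - level) / 2
    lower, upper = np.percentile(predictions, [tail, 100 - tail], axis=0)
    intervals[level] = (lower, upper)

for level, (lower, upper) in intervals.items():
    print(f"{level}% interval at x = 5: [{lower[50]:.2f}, {upper[50]:.2f}]")
```

When the models are drawn from a Bayesian posterior, these percentile envelopes are strictly credible intervals rather than frequentist confidence intervals.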
There may be many solutions that fit the same set of observations.
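A small illustration of non-uniqueness, assuming NumPy and a hypothetical linear forward problem with more unknowns than observations. Any multiple of a null-space vector can be added to one solution without changing the predicted data, so two quite different models fit the observations equally well.

```python
import numpy as np

# Hypothetical underdetermined problem: 2 observations, 3 unknown model parameters
G = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])   # forward operator: d = G @ m
d_obs = np.array([3.0, 5.0])

# One solution: the minimum-norm least-squares model
m1, *_ = np.linalg.lstsq(G, d_obs, rcond=None)

# G @ null_vec is zero, so adding it leaves the predicted data unchanged
null_vec = np.array([1.0, -1.0, 1.0])
m2 = m1 + 2.5 * null_vec

print("m1 =", m1, "predicts", G @ m1)
print("m2 =", m2, "predicts", G @ m2)
```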
- Bayes' theorem formally describes the link between observations, model and prior information.
- Where these intersect is called the posterior
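Written out with model parameters m and observations d, Bayes' theorem gives the posterior as the product of the likelihood p(d | m), which links the observations to the model, and the prior p(m), which encodes prior information, normalised by the evidence p(d):

```latex
p(\mathbf{m} \mid \mathbf{d}) = \frac{p(\mathbf{d} \mid \mathbf{m})\, p(\mathbf{m})}{p(\mathbf{d})} \propto p(\mathbf{d} \mid \mathbf{m})\, p(\mathbf{m})
```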
Example of an ill-posed problem
Example of a well-posed problem
- Use the data to "drive" the model.
- Infer what input parameters you need to satisfy your data and prior information
Model being solved
Compare data & priors
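The loop above (run the model, compare with data and priors, adjust the inputs) can be sketched as an optimisation. This is a minimal sketch, assuming NumPy and SciPy, a hypothetical two-parameter exponential-decay forward model, synthetic data and a hypothetical Gaussian prior. Minimising the combined data and prior misfit gives the single most probable model; it does not by itself describe the spread of acceptable models.

```python
import numpy as np
from scipy.optimize import minimize

def forward(m, t):
    """Hypothetical forward model: exponential decay with (amplitude, rate)."""
    amplitude, rate = m
    return amplitude * np.exp(-rate * t)

# Synthetic observations from a known "true" model plus noise
rng = np.random.default_rng(1)
t = np.linspace(0.0, 5.0, 30)
m_true = np.array([2.0, 0.7])
sigma_d = 0.05
d_obs = forward(m_true, t) + rng.normal(0.0, sigma_d, size=t.size)

# Hypothetical Gaussian prior on the two parameters
m_prior = np.array([1.5, 0.5])
sigma_m = np.array([1.0, 1.0])

def objective(m):
    """Negative log-posterior up to a constant: data misfit plus prior misfit."""
    data_misfit = np.sum(((forward(m, t) - d_obs) / sigma_d) ** 2)
    prior_misfit = np.sum(((m - m_prior) / sigma_m) ** 2)
    return 0.5 * (data_misfit + prior_misfit)

# Start from the prior mean and let the data drive the parameters
result = minimize(objective, x0=m_prior, method="L-BFGS-B")
print("best-fitting model:", result.x)
```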
We can estimate the value of pi with Monte Carlo sampling.
Python code to run simulation
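A minimal sketch of such a simulation, assuming NumPy. It uses the common quarter-circle approach and is not necessarily the code used in the talk: points are scattered uniformly in the unit square, and the fraction landing inside the quarter circle of radius 1 approaches pi / 4.

```python
import numpy as np

def estimate_pi(n_samples, seed=None):
    """Estimate pi from the fraction of random points in the unit square
    that fall inside the quarter circle of radius 1."""
    rng = np.random.default_rng(seed)
    x = rng.random(n_samples)
    y = rng.random(n_samples)
    inside = (x**2 + y**2) <= 1.0
    # Area of quarter circle / area of unit square = pi / 4
    return 4.0 * np.count_nonzero(inside) / n_samples

for n in (100, 10_000, 1_000_000):
    print(f"{n:>9} samples: pi ≈ {estimate_pi(n, seed=0):.4f}")
```

The error of the estimate shrinks roughly as 1 / sqrt(n), so each extra decimal place of accuracy costs about a hundred times more samples.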
Monte Carlo sampling
Markov-Chain Monte Carlo sampling (MCMC)
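A minimal random-walk Metropolis sampler, assuming NumPy and a hypothetical one-dimensional target (a standard normal) standing in for a posterior. Unlike plain Monte Carlo, each new sample is proposed from the current one, and the chain only moves when the proposal passes the Metropolis acceptance test.

```python
import numpy as np

def log_target(m):
    """Hypothetical target: log-density of a standard normal, up to a constant."""
    return -0.5 * m**2

def metropolis(log_prob, m0, n_steps, step_size, seed=None):
    """Random-walk Metropolis sampler for a 1-D distribution."""
    rng = np.random.default_rng(seed)
    chain = np.empty(n_steps)
    m, logp = m0, log_prob(m0)
    n_accepted = 0
    for i in range(n_steps):
        proposal = m + step_size * rng.normal()   # symmetric Gaussian proposal
        logp_new = log_prob(proposal)
        # Accept with probability min(1, p(proposal) / p(current))
        if np.log(rng.random()) < logp_new - logp:
            m, logp = proposal, logp_new
            n_accepted += 1
        chain[i] = m                               # current state is recorded either way
    return chain, n_accepted / n_steps

chain, acceptance = metropolis(log_target, m0=3.0, n_steps=50_000, step_size=1.0, seed=0)
samples = chain[5_000:]   # discard burn-in
print(f"acceptance = {acceptance:.2f}, mean = {samples.mean():.3f}, std = {samples.std():.3f}")
```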
MCMC with gradient
MCMC with gradient (caveat emptor!)
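One widely used way to let the gradient guide an MCMC sampler is the Metropolis-adjusted Langevin algorithm (MALA). The text does not say which gradient-based scheme the talk used, so this is a swapped-in illustration, assuming NumPy and a hypothetical correlated 2-D Gaussian target. Proposals drift up the gradient of the log-density before random noise is added, and because that proposal is not symmetric the acceptance test needs a correction term.

```python
import numpy as np

# Hypothetical target: a correlated 2-D Gaussian standing in for a posterior
COV = np.array([[1.0, 0.8],
                [0.8, 1.0]])
PRECISION = np.linalg.inv(COV)

def log_target(m):
    """Log-density of the target, up to an additive constant."""
    return -0.5 * m @ PRECISION @ m

def grad_log_target(m):
    """Gradient of log_target with respect to m."""
    return -PRECISION @ m

def mala(log_prob, grad_log_prob, m0, n_steps, eps, seed=None):
    """Metropolis-adjusted Langevin algorithm (MALA)."""
    rng = np.random.default_rng(seed)
    m = np.asarray(m0, dtype=float)
    chain = np.empty((n_steps, m.size))

    def log_q(to, frm):
        # Log-density (up to a constant) of proposing `to` when the chain is at `frm`
        mean = frm + 0.5 * eps**2 * grad_log_prob(frm)
        return -np.sum((to - mean) ** 2) / (2.0 * eps**2)

    for i in range(n_steps):
        # Langevin proposal: drift up the gradient, then add Gaussian noise
        proposal = m + 0.5 * eps**2 * grad_log_prob(m) + eps * rng.normal(size=m.size)
        # Asymmetric proposal, so the ratio includes q(m | m') / q(m' | m)
        log_alpha = (log_prob(proposal) - log_prob(m)
                     + log_q(m, proposal) - log_q(proposal, m))
        if np.log(rng.random()) < log_alpha:
            m = proposal
        chain[i] = m
    return chain

chain = mala(log_target, grad_log_target, m0=[3.0, -3.0], n_steps=50_000, eps=0.5, seed=0)
samples = chain[5_000:]   # discard burn-in
print("sample mean:", samples.mean(axis=0))
print("sample covariance:\n", np.cov(samples, rowvar=False))
```

MALA needs the gradient of the log-posterior at every step and is sensitive to the step size `eps`: too large and most proposals are rejected, too small and the chain moves slowly.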
Heat flow data
- Assimilate heat flow data
- Vary rates of heat production and geometry of each layer to match data
- Plug the model parameters (m) and data (d) into Bayes' theorem
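A toy version of this workflow, assuming NumPy. Everything in it is hypothetical: a single layer of uniform heat production A (µW/m³) and thickness h (km) above a fixed basal heat flow, so that surface heat flow is the basal value plus A times h (in mW/m²). The posterior is evaluated on a grid of (A, h) from a Gaussian likelihood and a Gaussian prior on thickness.

```python
import numpy as np

# Hypothetical one-layer model: surface heat flow = basal heat flow + A * h
# Units: A in µW/m^3, h in km, so A * h is in mW/m^2
Q_BASAL = 30.0  # mW/m^2, assumed known

def forward(A, h):
    return Q_BASAL + A * h

# Hypothetical heat flow observations (mW/m^2) and their 1-sigma uncertainty
q_obs = np.array([61.0, 58.5, 63.2, 60.1])
sigma_q = 3.0

# Grid over the two model parameters
A = np.linspace(0.1, 3.0, 300)[:, None]    # heat production, µW/m^3
h = np.linspace(5.0, 40.0, 350)[None, :]   # layer thickness, km

# Log-likelihood: Gaussian misfit summed over all observations
q_pred = forward(A, h)                                   # shape (300, 350)
log_like = -0.5 * np.sum(((q_obs[:, None, None] - q_pred[None]) / sigma_q) ** 2, axis=0)

# Log-prior: hypothetical Gaussian prior on thickness, flat prior on heat production
log_prior = -0.5 * ((h - 20.0) / 5.0) ** 2

# Posterior on the grid, normalised so it sums to one
log_post = log_like + log_prior
post = np.exp(log_post - log_post.max())
post /= post.sum()

A_mean = np.sum(post * A)
h_mean = np.sum(post * h)
print(f"posterior mean A = {A_mean:.2f} µW/m^3, h = {h_mean:.1f} km")
```

In this toy model the data constrain only the product A × h, so any split between heat production and thickness fits equally well; it is the prior on thickness that decides where the posterior sits along that trade-off.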
Heat flow in Ireland
- Ascertain the difference between reconstructions
- Does not take into account data uncertainty
- Sensitivity analysis / "bootstrapping"
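A minimal bootstrap, assuming NumPy and a hypothetical noisy linear dataset. The data points are resampled with replacement, the model is refitted to each resampled copy, and the spread of the refitted slopes shows how sensitive the result is to which data happen to be included.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical data set: noisy observations of a linear trend
x = np.linspace(0.0, 10.0, 40)
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.5, size=x.size)

# Bootstrap: refit the model to many resampled copies of the data
n_boot = 2_000
slopes = np.empty(n_boot)
for i in range(n_boot):
    idx = rng.integers(0, x.size, size=x.size)   # sample data points with replacement
    slope, intercept = np.polyfit(x[idx], y[idx], 1)
    slopes[i] = slope

lo, hi = np.percentile(slopes, [2.5, 97.5])
print(f"slope = {slopes.mean():.3f}, 95% bootstrap interval [{lo:.3f}, {hi:.3f}]")
```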
There are known knowns;
there are things we know we know.
We also know there are known unknowns; that is to say we know there are some things we do not know.
But there are also unknown unknowns - the ones we don't know we don't know.
- Donald Rumsfeld
Former US Secretary of Defense
- Europeans thought all swans were white... until they came to Australia
- How can you ever model what you can't imagine?
- How can you test assumptions without rare events that prove them wrong?
Dr. Ben Mather
Madsen Building, School of Geosciences,
The University of Sydney, NSW 2006