Data Quality
Problem
We want to know the size distribution of companies in each country.
Assumption
Company size is lognormally distributed with a constant dispersion,
so the distribution can be described by its average alone.
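A minimal Python sketch of this assumption, reading the constant part as a fixed log-scale dispersion `sigma` shared by all countries (an assumption; the value used below is hypothetical). With `sigma` fixed, the average alone sets the location parameter through mu = ln(average) - sigma²/2.

```python
import numpy as np

def lognormal_from_mean(mean, sigma, size, rng=None):
    """Draw lognormal firm sizes whose expected value is `mean`.

    sigma (the log-scale standard deviation) is held fixed, so `mean`
    alone pins down the distribution: mu = ln(mean) - sigma**2 / 2.
    """
    rng = np.random.default_rng() if rng is None else rng
    mu = np.log(mean) - sigma**2 / 2
    return rng.lognormal(mean=mu, sigma=sigma, size=size)

# Hypothetical country with an average firm size of 50 (arbitrary units).
rng = np.random.default_rng(0)
sizes = lognormal_from_mean(mean=50.0, sigma=1.0, size=200_000, rng=rng)
print(sizes.mean())  # close to 50
```

Changing only `mean` shifts the whole distribution on the log scale while keeping its shape, which is what lets a single number summarize each country.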
Bad Solution 1
Say Orbis is a random sample of companies.
Then the real average equals the Orbis average.
Problem
Orbis is obviously not a random sample.
What's going on?
Rich countries -> larger companies -> higher Orbis average
Rich countries -> better data quality (more small firms covered) -> lower Orbis average
Good Solution
Real average ∝ GDP / number of firms
Estimate the real average from these intrinsic country-level factors.
Extrapolate to other countries.
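A sketch of the calibrate-then-extrapolate step, assuming the real average equals k · GDP / number of firms and that k is fitted on a few calibration countries where a benchmark real average is known. The calibration set, `fit_scale`, `estimate_real_avg`, and the toy inputs are all hypothetical.

```python
import numpy as np

def fit_scale(benchmark_avg, gdp, n_firms):
    """Fit k in  real_avg = k * GDP / n_firms  on calibration countries.

    Least squares on the log scale: log k = mean(log avg - log(GDP / n)).
    """
    benchmark_avg = np.asarray(benchmark_avg, dtype=float)
    gdp_per_firm = np.asarray(gdp, dtype=float) / np.asarray(n_firms, dtype=float)
    return float(np.exp(np.mean(np.log(benchmark_avg) - np.log(gdp_per_firm))))

def estimate_real_avg(k, gdp, n_firms):
    """Extrapolate the real average to any country from GDP and firm count."""
    return k * np.asarray(gdp, dtype=float) / np.asarray(n_firms, dtype=float)

# Toy numbers for illustration only (not real country data).
k = fit_scale(benchmark_avg=[2.0, 4.0], gdp=[100.0, 400.0], n_firms=[100.0, 200.0])
print(k)                                   # ≈ 2.0
print(estimate_real_avg(k, 300.0, 150.0))  # ≈ 4.0
```

Fitting on the log scale treats a factor-of-two error the same in small and large economies, which suits averages that span orders of magnitude.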
For Europe
[Figure: Orbis average vs. estimated average]
We have:
- the Orbis average
- the estimated real average
We can check their relationship against completeness.
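One hypothetical way to run this check: take each country's completeness (assumed here to mean the share of its firms that appear in Orbis), compute log(Orbis average / estimated real average), and fit a straight line against log(completeness). The function name and the definition of completeness are assumptions.

```python
import numpy as np

def fit_loglog(completeness, orbis_avg, estimated_avg):
    """Fit log(orbis_avg / estimated_avg) = a + b * log(completeness).

    Returns (slope b, intercept a) from an ordinary least-squares line.
    """
    x = np.log(np.asarray(completeness, dtype=float))
    ratio = np.asarray(orbis_avg, dtype=float) / np.asarray(estimated_avg, dtype=float)
    slope, intercept = np.polyfit(x, np.log(ratio), deg=1)  # highest power first
    return float(slope), float(intercept)

# Toy inputs built so that the ratio equals completeness ** -0.5.
c = np.array([0.1, 0.3, 0.9])
est = np.array([10.0, 20.0, 30.0])
slope, intercept = fit_loglog(c, est * c**-0.5, est)
print(slope, intercept)  # ≈ -0.5, ≈ 0.0
```

A negative slope means the Orbis average overstates the real average more strongly where coverage is poor.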
Estimated quality
Moreover
If we assume Orbis adds companies in order of size (first the largest, then the second largest, and so on),
then the relationship between the real average and the Orbis average depends linearly on completeness on a log-log scale.
And it's nice when theory agrees with the data.
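A small simulation of the largest-first assumption, using a hypothetical lognormal population (`sigma = 1.0`, chosen for illustration). For each completeness level it keeps the top share of firms, compares their average with the population average, and fits a line on the log-log scale; the slope and the spread around the line show how closely this population follows the linear relationship.

```python
import numpy as np

def orbis_to_real_ratio(sizes, completeness):
    """Orbis average / real average when Orbis holds only the largest firms.

    Firms are added largest first, so completeness c keeps the top
    ceil(c * N) firms of the population.
    """
    sorted_sizes = np.sort(sizes)[::-1]  # largest first
    n_kept = max(1, int(np.ceil(completeness * len(sorted_sizes))))
    return sorted_sizes[:n_kept].mean() / sorted_sizes.mean()

rng = np.random.default_rng(0)
sizes = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)  # hypothetical population
cs = np.array([0.05, 0.1, 0.2, 0.4, 0.8])
ratios = np.array([orbis_to_real_ratio(sizes, c) for c in cs])

# Straight-line fit of log(ratio) on log(completeness).
slope, intercept = np.polyfit(np.log(cs), np.log(ratios), deg=1)
print(ratios)  # falls towards 1 as completeness rises
print(slope, intercept)
```

Because each added firm is smaller than every firm already included, the Orbis average can only fall as completeness rises, reaching the real average at full coverage.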
Bonus
How data is added
Bonus
Board size by revenue
[Figure axes: Revenue, Data quality]
By Javier GB