Data Quality

Problem

We want to know the size distribution of companies.

Assumption

The distribution is lognormal with constant average.

It can be described by its average.
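As a minimal sketch of why a single number can be enough: for a lognormal with parameters mu and sigma, the mean is exp(mu + sigma^2 / 2). If the shape sigma is held fixed (an assumption of this sketch, as are the value of `SIGMA` and the helper name `lognormal_mu_from_mean`), the average pins down mu and therefore the whole distribution.

```python
import math
import random

def lognormal_mu_from_mean(mean, sigma):
    """Return mu of a lognormal with the given mean and a fixed shape sigma."""
    # mean = exp(mu + sigma^2 / 2)  =>  mu = ln(mean) - sigma^2 / 2
    return math.log(mean) - sigma ** 2 / 2

SIGMA = 1.0  # assumed shape parameter, illustrative only
mu = lognormal_mu_from_mean(mean=10.0, sigma=SIGMA)

# Draw a sample to check that the recovered mu reproduces the target average.
rng = random.Random(0)
sample = [rng.lognormvariate(mu, SIGMA) for _ in range(200_000)]
sample_mean = sum(sample) / len(sample)
```

Here `sample_mean` should land close to the target average of 10, up to sampling noise.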

Bad Solution 1

Say Orbis is a random sample of companies.

Then the real average is the Orbis average.

Problem

That's obviously not true.

What's going on

Rich countries -> Larger companies -> Increase Orbis average

Rich countries -> Better data quality (more small companies included) -> Decrease Orbis average

Good Solution

Real average ∝ GDP / (number of firms)

Estimate the real average using intrinsic factors.

Extrapolate to other countries.
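One way to read these steps, as a rough sketch only: fit the proportionality constant on countries where the real average is known, then apply it to the others. The country records, the field names, and the helper `fit_constant` below are all hypothetical, and the numbers are made up for illustration.

```python
# Hypothetical country records; every number here is made up for illustration.
countries = {
    "A": {"gdp": 1000.0, "firms": 100.0, "real_avg": 5.0},
    "B": {"gdp": 4000.0, "firms": 200.0, "real_avg": 10.0},
    "C": {"gdp": 900.0, "firms": 300.0, "real_avg": None},  # real average unknown
}

def fit_constant(records):
    """Least-squares k (through the origin) for real_avg = k * gdp / firms."""
    known = [r for r in records if r["real_avg"] is not None]
    xs = [r["gdp"] / r["firms"] for r in known]
    ys = [r["real_avg"] for r in known]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Calibrate on countries with a known real average, then extrapolate to all.
k = fit_constant(countries.values())
estimates = {name: k * r["gdp"] / r["firms"] for name, r in countries.items()}
```

With these toy inputs the constant is 0.5, so country C gets an estimated real average of 1.5.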

For Europe

Orbis average

Estimated average

We have

  1. Orbis average

  2. Estimated real average

We can check their relationship based on completeness.

Estimated quality

Moreover

If we assume Orbis adds companies in order of size (the largest first, then the second largest, and so on), we get that the relationship between the real average and the Orbis average depends linearly on the completeness (on a log-log scale).
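The largest-first assumption can be explored with a small simulation. This is a sketch under stated assumptions, not the analysis behind the slides: company sizes are drawn from a lognormal with an arbitrary shape `SIGMA`, "completeness" is taken to be the share of companies already added, and `orbis_avg_ratio` is a hypothetical helper name.

```python
import math
import random

rng = random.Random(0)
SIGMA = 1.0  # assumed lognormal shape, illustrative only

# Simulated company sizes, sorted so that the largest company comes first.
sizes = sorted((rng.lognormvariate(0.0, SIGMA) for _ in range(100_000)), reverse=True)
real_avg = sum(sizes) / len(sizes)

def orbis_avg_ratio(completeness):
    """Average of the largest `completeness` share of companies, over the real average."""
    n = max(1, int(round(completeness * len(sizes))))
    return (sum(sizes[:n]) / n) / real_avg

levels = [0.01, 0.03, 0.1, 0.3, 1.0]
ratios = [orbis_avg_ratio(c) for c in levels]

# Ordinary least-squares slope of log(ratio) against log(completeness).
xs = [math.log(c) for c in levels]
ys = [math.log(r) for r in ratios]
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
```

The ratio falls towards 1 as completeness approaches 100%, and the fitted slope is negative. How closely the points follow a straight line on the log-log scale depends on the assumed size distribution.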

 

And it's nice when theory agrees with the data.

Bonus

How data is added

Bonus

Board size by revenue

Revenue

Data quality

By Javier GB
