Data quality in interlocking directorates
Javier Garcia-Bernardo
SUNBELT, April 10th, 2016
@javiergb_com / @UvACORPNET
1. Interlocking directorates
Corporate networks
2. Types of missing data
2.1. Missing fields
2.2. Missing nodes
2.1. Fields missing
Employment Turnover Sector ID
2.2. Nodes missing
ORBIS data (200 million companies)
Observed average revenue
3. How to know where your data stands?
3.1. Exploration: Comparison with external sources
3.2. Explanation: Distribution approach
Company data quality
Many small companies are missing
3.1. Exploration
Company size classes: 0-9, 10-19, 20-49, 50-249, 250 or more
Interactive visualizations
Code: https://github.com/uvacorpnet/interactive_visualizations
Our data is biased toward big companies
- Higher GDP/capita ➙ Larger average companies
- Higher GDP/capita ➙ Higher quality
- Higher quality ➙ Smaller observed average (since we have the small ones)
Results in lack of correlation:
Distribution approach:
3.2.1. Data follows lognormal distribution (loc and scale).
3.2.2. The lognormal distributions have constant scale.
3.2.3. Macro-economics to estimate location parameter.
3.2.4. Assess completeness
3.2. Explanation
3.2.1. Data follows lognormal distribution.
- Slope-1 relationship between Var[X] and E[X] ➙ Constant scale
- Constant scale ➙ Linear relationship between E[X] and location
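For reference, the standard lognormal moments behind the two bullets above can be written out as follows. Here μ (location) and σ (scale) are the parameters of log X; these symbols are introduced for illustration and do not appear on the slides. The slides do not state which axes the slope-1 plot uses, so the last line is only one possible reading: with constant σ the standard deviation is proportional to the mean, which gives a slope of 1 on log-log axes.

```latex
\begin{align}
\mathbb{E}[X] &= \exp\!\left(\mu + \tfrac{\sigma^2}{2}\right), \\
\operatorname{Var}[X] &= \left(e^{\sigma^2} - 1\right)\exp\!\left(2\mu + \sigma^2\right)
                       = \left(e^{\sigma^2} - 1\right)\mathbb{E}[X]^2, \\
\log \mathbb{E}[X] &= \mu + \tfrac{\sigma^2}{2}, \\
\log \operatorname{SD}[X] &= \log \mathbb{E}[X] + \tfrac{1}{2}\log\!\left(e^{\sigma^2} - 1\right).
\end{align}
```

With σ held constant, the third line can be inverted: an average obtained from an external source gives the location as μ = log E[X] − σ²/2.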
3.2.2. The lognormal distributions have constant scale.
- Use macro-economic indicators to find average and location parameter
3.2.3. Macro-economics to estimate location parameter.
AVERAGE IN THE DATABASE
ESTIMATED AVERAGE
3.2.4. Assess completeness
- We have 1) the observed average and 2) the estimated average.
- Under reasonable assumptions, the relationship between the two is proportional to completeness.
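The idea in the bullets above can be illustrated with a small simulation. This is a minimal sketch and not the method used in the talk: the parameters `mu` and `sigma`, the hard size cutoff used to drop small companies, and the helper `completeness_ratio` are all assumptions made for illustration. The full-sample mean stands in for the estimated average, which in the talk comes from macro-economic indicators.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical lognormal parameters for company revenue (not from the talk).
mu, sigma = 10.0, 1.5
revenue = rng.lognormal(mean=mu, sigma=sigma, size=200_000)


def completeness_ratio(revenue, keep_mask):
    """Return (coverage, ratio) for a database that keeps only `keep_mask`.

    coverage: share of companies present in the database.
    ratio:    estimated average / observed average.
    """
    observed = revenue[keep_mask]
    coverage = observed.size / revenue.size
    ratio = revenue.mean() / observed.mean()
    return coverage, ratio


# Assumed missingness: every company below a size cutoff is absent.
results = []
for missing_share in (0.25, 0.50, 0.75):
    cutoff = np.quantile(revenue, missing_share)
    coverage, ratio = completeness_ratio(revenue, revenue >= cutoff)
    results.append((missing_share, coverage, ratio))
    print(f"missing={missing_share:.2f}  coverage={coverage:.3f}  ratio={ratio:.3f}")
```

In this sketch the ratio equals coverage divided by the share of total revenue held by the companies that remain, so it tracks coverage closely as long as the missing companies account for little revenue, and it overstates coverage as larger companies start to go missing.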
Company revenue
- We know which type of companies are missing.
- We know the directors associated with the types of companies that are missing.
- We can recreate companies and their directors and measure the impact on network measures (in progress).
4. Conclusions
We’re a multidisciplinary team bringing together political science, computer science, network science, and sociology, based at the Amsterdam Institute for Social Science Research.
We're hiring for two PhD positions on corporate control and network analysis (see corpnet.uva.nl or @UvACORPNET for details).
Follow us on Twitter:
@javiergb_com
@UvACORPNET
Check our website: http://corpnet.uva.nl/
Javier Garcia-Bernardo
garcia@uva.nl
Presentation for Sunbelt 2016