Open access & transparency
of data and models

Jan Philipp Dietrich
dietrixyzch@pik-xyzpotsdam.de

Open Source

Open Access of models & tools

What does Open Source mean?

1. Free Redistribution

2. Source Code

3. Derived Works

...

5. No Discrimination Against Persons or Groups

6. No Discrimination Against Fields of Endeavor

...

Two types of Open Source

Copyleft

  • Code derivatives must be published under the same license
  • What is open stays open
  • Examples:

  • useful to..
    • ..establish software as common good
    • ..prevent the use in proprietary software

Non-Copyleft

  • Code derivatives can have a different license
  • What is open can be used in any way
  • Examples:

  • useful to..
    • ..avoid license conflicts
    • ..broaden the range of applications
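A minimal sketch of the difference between the two types, assuming a deliberately simplified rule: copyleft derivatives keep the same license, non-copyleft derivatives may pick any license. The `License` class and the `derivative_may_use` function are hypothetical names for illustration, the AGPL/BSD2 classification follows the MAgPIE example in this talk, and real license compatibility is more nuanced than this check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class License:
    name: str
    copyleft: bool  # True: derivatives must keep this license


# Illustrative entries only, classified as in the MAgPIE example.
AGPL = License("AGPL", copyleft=True)
BSD2 = License("BSD2", copyleft=False)


def derivative_may_use(original: License, proposed: License) -> bool:
    """Simplified rule (an assumption, not legal advice).

    Copyleft code stays under its own license; non-copyleft code may be
    published under a different license in a derivative.
    """
    if original.copyleft:
        return proposed == original
    return True
```

Under this rule a derivative of AGPL code released as BSD2 is rejected, while BSD2 code may end up in an AGPL project.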

Applying a license and that's it?

  • technically yes, otherwise no
  • in addition, code should be..
    • ..owned by you
    • ..properly accessible
    • ..documented
    • ..citable
  • license of code contributions should be clarified

Code ownership

  • all code owners need to agree to the license
  • ownership has to be clarified
  • depending on the contract, the code owner is the..
    • ..code developer
    • ..employer (institute, university, company...)

Source: Mimi and Eunice  | CC-BY-SA Nina Paley

Contributors License Agreement (CLA)

  • clarifies inbound license
  • useful to bundle code ownership for..
    • .. change of license (e.g. to solve compatibility problems)
    • .. legal representation 
  • in many cases not required

Code ownership - inbound license

Outbound license (code sharing) ≠ Inbound license (code contributions)

GitHub default: inbound = outbound

Code accessibility

  • version management strongly recommended
    • development history
    • option to revert changes
  • publicly accessible code repository (e.g. GitHub project)
    • simplifies collaboration
    • provides some feedback about usage
  • decentralized repo (git) better for sharing than centralized (svn)
  • recommendation: git in combination with GitHub

Documentation

powered by goxygen » github.com/pik-piam/goxygen

  • documentation important for code transparency
  • in-code documentation recommended (e.g. doxygen, roxygen, goxygen)
*' @equations

q13_cost_tc(i2) ..
  v13_cost_tc(i2) =e= sum(ct, i13_land(i2) * i13_tc_factor(ct,i2)
                     * vm_tau(i2)**i13_tc_exponent(ct,i2)
                     * (1+pm_interest(i2))**15);

*' Relative technological change costs `v13_cost_tc` are calculated as a
*' heuristically derived power function of the land use intensity `vm_tau` 
*' for the investment-yield-ratio (see figure below) multiplied by the 
*' initial, regional crop areas in 1995 `pm_land_start` and shifted 15 
*' years into the future using the region specific interest rate 
*' `pm_interest`.
*'
*' ![Investment-yield ratio in relation to $\tau$-factor
*' [@dietrich_forecasting_2014]](tcc_regression.png){ width=60% }
*'
*' The shifting is performed because investments into technological change
*' require on average 15 years of research before a yield increase is 
*' achieved, but the model has to see costs and benefits concurrently in 
*' order to take the right investment decisions (see also 
*' @dietrich_forecasting_2014). Investment costs scale with crop area as a 
*' wider areal coverage means typically also higher variety in biophysical
*' conditions and therefore more research required for the same overall 
*' intensity boost.

q13_tech_cost_annuity(i2) ..
 v13_tech_cost_annuity(i2) =e= (vm_tau(i2)/pc13_tau(i2)-1) * v13_cost_tc(i2)
                               * pm_interest(i2)/(1+pm_interest(i2));

*' In order to get the full investments required for the desired 
*' intensification the relative technological change costs are multiplied 
*' with the given intensification rate. These full costs are then 
*' distributed over an infinite time horizon by multiplication with the 
*' interest rate `pm_interest(i)` (annuity with infinite time horizon).

q13_tech_cost(i2) ..
  vm_tech_cost(i2) =e= v13_tech_cost_annuity(i2) + pc13_tech_cost_past(i2);

*' Additionally, the technological change costs coming from past investment
*' decisions are added to the technological change costs of the current 
*' period.
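
The three equations above can be traced with plain numbers. The following Python sketch mirrors them for a single region and time step; the function and argument names are illustrative stand-ins for the GAMS symbols, and any input values passed to them would be made up rather than MAgPIE data.

```python
def cost_tc(land, tc_factor, tc_exponent, tau, interest, shift_years=15):
    """Mirror of q13_cost_tc: relative technological change costs.

    Power function of the land use intensity tau, scaled by crop area and
    shifted shift_years into the future with the interest rate.
    """
    return land * tc_factor * tau ** tc_exponent * (1 + interest) ** shift_years


def tech_cost_annuity(tau, tau_prev, rel_cost, interest):
    """Mirror of q13_tech_cost_annuity.

    Relative costs times the intensification rate give the full costs,
    which are spread over an infinite time horizon as an annuity.
    """
    intensification_rate = tau / tau_prev - 1
    return intensification_rate * rel_cost * interest / (1 + interest)


def tech_cost(annuity, cost_past):
    """Mirror of q13_tech_cost: current annuity plus costs of past decisions."""
    return annuity + cost_past
```

If the intensity does not change (`tau == tau_prev`), the annuity is zero and only the costs from past investment decisions remain.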

Code citation

> citation("madrat")

Dietrich J, Baumstark L and Giannousakis A (2018). _madrat: May All Data 
be Reproducible and Transparent (MADRaT)_. doi: 10.5281/zenodo.1115490 
(URL: http://doi.org/10.5281/zenodo.1115490), R package version 1.44.0,
<URL: https://github.com/pik-piam/madrat>.

Get your work cited!

Digital Object Identifiers (DOIs) are key for data and software citations

Services such as zenodo.org host code/data and provide DOIs

GitHub integration allows automatic DOI generation for every release
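
As a small illustration of why a DOI makes software citable, the sketch below turns a DOI into its resolver link and assembles a citation string from the madrat metadata shown above. The helper names `doi_url` and `cite_software` are hypothetical, and the layout is only one possible format, not the one produced by R's `citation()`.

```python
def doi_url(doi: str) -> str:
    """Return the persistent resolver link for a DOI."""
    return f"https://doi.org/{doi}"


def cite_software(authors: str, year: int, title: str, version: str, doi: str) -> str:
    """Assemble a simple software citation (illustrative layout only)."""
    return f"{authors} ({year}). {title}. Version {version}. doi: {doi} ({doi_url(doi)})"


# Metadata taken from the madrat citation shown above.
madrat_citation = cite_software(
    authors="Dietrich J, Baumstark L and Giannousakis A",
    year=2018,
    title="madrat: May All Data be Reproducible and Transparent (MADRaT)",
    version="1.44.0",
    doi="10.5281/zenodo.1115490",
)
```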

Open Data

Open Access of Data

Open Data vs Open Source

same idea but different licenses

Copyleft

  • Copyleft = Share-Alike (SA)
  • Examples:


     
  • BY = Attribution (Author needs to be mentioned)

 

Non-Copyleft

  • No Share-Alike
  • Examples:

  • central term: database
  • a data point cannot be protected, a database can!
  • CreativeCommons.org (CC) data license toolkit

Databases and Derivatives

problems with fuzzy definition

Often unclear whether copyright rules apply or not:

  • Is it a database or just a data set?
  • Is it a derivative of a database or not?
    (e.g. model outputs)

Workaround for ambiguous cases: publish data processing routine instead of data

Source: Mimi and Eunice  | CC-BY-SA Nina Paley

Standardized input processing

powered by madrat » github.com/pik-piam/madrat

retrieveData: bundle data as required by the model

readSource: read source data and (dis-)aggregate to ISO country level

calcOutput: calculate data sets as required by the model
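
The division of labour between the three functions can be sketched in a few lines. The Python below is only an analogy with made-up data and hypothetical function names; it is not the madrat API, which is an R package.

```python
from collections import defaultdict

# Toy "source file": (ISO country code, sub-national unit, cropland area).
# All values are invented for illustration.
RAW_SOURCE = [
    ("DEU", "north", 4.0),
    ("DEU", "south", 6.0),
    ("FRA", "north", 9.0),
    ("FRA", "south", 9.0),
]


def read_source(raw):
    """Read source data and aggregate it to ISO country level."""
    totals = defaultdict(float)
    for iso, _unit, value in raw:
        totals[iso] += value
    return dict(totals)


def calc_output(country_data):
    """Calculate a data set as the model needs it (here: area shares)."""
    total = sum(country_data.values())
    return {iso: value / total for iso, value in country_data.items()}


def retrieve_data(raw):
    """Bundle all calculated data sets into one package for the model."""
    countries = read_source(raw)
    return {"area": countries, "area_share": calc_output(countries)}
```

Publishing routines of this kind instead of the processed data lets others rebuild the model inputs from the original sources themselves.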

Open Access and Transparency in FABLE

What is required?

Decisions to take

Decision                              MAgPIE (as example)
------------------------------------  ------------------------------------------
Open vs Non-Open                      Open
Copyleft vs Non-Copyleft              Copyleft for model, Non-Copyleft for tools
Recommended licenses                  copyleft: AGPL | non-copyleft: BSD2
Contributors License Agreement (CLA)  model: yes | tools: no
Code distribution                     GitHub
Data distribution                     not decided
Input data treatment                  publication of data preparation routines
Output data license                   not specified (depends on application)

Further Reading