Open access & transparency
of data and models
Jan Philipp Dietrich
dietrixyzch@pik-xyzpotsdam.de
Open Source
Open Access of models & tools
What does Open Source mean?
1. Free Redistribution
2. Source Code
3. Derived Works
...
5. No Discrimination Against Persons or Groups
6. No Discrimination Against Fields of Endeavor
...
Two types of Open Source
Copyleft
- Code derivatives must be published under the same license
- What is open stays open
- Examples:
- useful to..
- ..establish software as common good
- ..prevent the use in proprietary software
Non-Copyleft
- Code derivatives can have a different license
- What is open can be used in any way
- Examples:
- useful to..
- ..avoid license conflicts
- ..broaden the range of applications
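The difference between the two license types can be sketched as a simple rule check. This is a deliberately simplified illustration, not legal advice: the function name `derivative_may_use` and the label "Proprietary" are hypothetical, the license names AGPL and BSD2 are taken from the MAgPIE example later in these slides, and real license compatibility has more cases than this.

```python
# Simplified sketch: which license may a derivative work carry?
# Assumption: AGPL stands in for "copyleft", BSD2 for "non-copyleft".
COPYLEFT = {"AGPL"}
NON_COPYLEFT = {"BSD2"}


def derivative_may_use(source_license, derivative_license):
    """Return True if a derivative of `source_license` code may be
    published under `derivative_license` (toy rule set only)."""
    if source_license in COPYLEFT:
        # Copyleft: what is open stays open, so the license must not change.
        return derivative_license == source_license
    if source_license in NON_COPYLEFT:
        # Non-copyleft: derivatives can have a different license.
        return True
    raise ValueError(f"unknown license: {source_license}")


print(derivative_may_use("AGPL", "AGPL"))
print(derivative_may_use("AGPL", "Proprietary"))
print(derivative_may_use("BSD2", "Proprietary"))
```

The second call is the case copyleft is designed to block: use of the code in proprietary software.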
Applying a license and that's it?
- technically yes, otherwise no
- in addition, code should be..
- ..owned by you
- ..properly accessible
- ..documented
- ..citable
- license of code contributions should be clarified
Code ownership
- all code owners need to agree to the license
- ownership has to be clarified
- depending on the contract, the code owner is..
- ..the code developer
- ..the employer (institute, university, company...)
Source: Mimi and Eunice | CC-BY-SA Nina Paley
Code ownership - inbound license
- Outbound license (code sharing)
- Inbound license (code contributions)
- GitHub default: inbound = outbound
Contributors License Agreement (CLA)
- clarifies inbound license
- useful to bundle code ownership for..
- ..change of license (e.g. to solve compatibility problems)
- ..legal representation
- in many cases not required
Code accessibility
- version management strongly recommended
- development history
- option to revert changes
- publicly accessible code repository (e.g. GitHub project)
- simplifies collaboration
- provides some feedback about usage
- decentralized repo (git) better for sharing than centralized (svn)
- recommendation: git in combination with GitHub
[Diagram: Repository, User]
Documentation
powered by goxygen » github.com/pik-piam/goxygen
- documentation important for code transparency
- in-code documentation recommended (e.g. doxygen, roxygen, goxygen)
*' @equations
q13_cost_tc(i2) ..
v13_cost_tc(i2) =e= sum(ct, i13_land(i2) * i13_tc_factor(ct,i2)
* vm_tau(i2)**i13_tc_exponent(ct,i2)
* (1+pm_interest(i2))**15);
*' Relative technological change costs `v13_cost_tc` are calculated as a
*' heuristically derived power function of the land use intensity `vm_tau`
*' for the investment-yield-ratio (see figure below) multiplied by the
*' initial, regional crop areas in 1995 `pm_land_start` and shifted 15
*' years into the future using the region specific interest rate
*' `pm_interest`.
*'
*' ![Investment-yield ratio in relation to $\tau$-factor
*' [@dietrich_forecasting_2014]](tcc_regression.png){ width=60% }
*'
*' The shifting is performed because investments into technological change
*' require on average 15 years of research before a yield increase is
*' achieved, but the model has to see costs and benefits concurrently in
*' order to take the right investment decisions (see also
*' @dietrich_forecasting_2014). Investment costs scale with crop area as a
*' wider areal coverage means typically also higher variety in biophysical
*' conditions and therefore more research required for the same overall
*' intensity boost.
q13_tech_cost_annuity(i2) ..
v13_tech_cost_annuity(i2) =e= (vm_tau(i2)/pc13_tau(i2)-1) * v13_cost_tc(i2)
* pm_interest(i2)/(1+pm_interest(i2));
*' In order to get the full investments required for the desired
*' intensification the relative technological change costs are multiplied
*' with the given intensification rate. These full costs are then
*' distributed over an infinite time horizon by multiplication with the
*' interest rate `pm_interest(i)` (annuity with infinite time horizon).
q13_tech_cost(i2) ..
vm_tech_cost(i2) =e= v13_tech_cost_annuity(i2) + pc13_tech_cost_past(i2);
*' Additionally, the technological change costs coming from past investment
*' decisions are added to the technological change costs of the current
*' period.
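For readers less familiar with GAMS syntax, the three equations above can be transliterated into mathematical notation. The short symbols are introduced here for readability only and are not part of the model code: $c^{tc}_i$ = `v13_cost_tc`, $L_i$ = `i13_land`, $f_{ct,i}$ = `i13_tc_factor`, $e_{ct,i}$ = `i13_tc_exponent`, $\tau_i$ = `vm_tau`, $r_i$ = `pm_interest`, $a_i$ = `v13_tech_cost_annuity`, $C_i$ = `vm_tech_cost`, $C^{past}_i$ = `pc13_tech_cost_past`. Reading `pc13_tau` as the $\tau$-factor of the previous time step ($\tau^{prev}_i$) is an assumption based on its name.

```latex
\begin{align}
c^{tc}_i &= \sum_{ct} L_i \, f_{ct,i} \, \tau_i^{\,e_{ct,i}} \, (1+r_i)^{15} \\
a_i &= \left(\frac{\tau_i}{\tau^{prev}_i} - 1\right) c^{tc}_i \, \frac{r_i}{1+r_i} \\
C_i &= a_i + C^{past}_i
\end{align}
```

The factor $r_i/(1+r_i)$ in the second line is what results for a perpetuity whose first payment falls in the current period, since $\sum_{t=0}^{\infty} (1+r)^{-t} = (1+r)/r$.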
Code citation
> citation("madrat")
Dietrich J, Baumstark L and Giannousakis A (2018). _madrat: May All Data
be Reproducible and Transparent (MADRaT)_. doi: 10.5281/zenodo.1115490
(URL: http://doi.org/10.5281/zenodo.1115490), R package version 1.44.0,
<URL: https://github.com/pik-piam/madrat>.
get your work cited!
Digital Object Identifiers (DOIs) are key for data or software citations
Services such as zenodo.org host code/data and provide DOIs
GitHub integration allows automatic DOI generation for every release
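As a small illustration of why a DOI makes software citable, the sketch below assembles a one-line citation that ends in a resolvable link. The metadata values are copied from the `citation("madrat")` output above; the helper name `format_software_citation` and the citation layout are hypothetical and do not follow any particular citation standard.

```python
def format_software_citation(authors, year, title, version, doi):
    """Build a one-line software citation ending in a DOI link.

    A DOI becomes a persistent link when prefixed with https://doi.org/.
    """
    return f"{authors} ({year}). {title}. Version {version}. https://doi.org/{doi}"


citation = format_software_citation(
    authors="Dietrich J, Baumstark L and Giannousakis A",
    year=2018,
    title="madrat: May All Data be Reproducible and Transparent (MADRaT)",
    version="1.44.0",
    doi="10.5281/zenodo.1115490",
)
print(citation)
```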
Open Data
Open Access of Data
Open Data vs Open Source
same idea but different licenses
Copyleft
- Copyleft = Share-Alike (SA)
- Examples:
- BY = Attribution (Author needs to be mentioned)
Non-Copyleft
- No Share-Alike
- Examples:
- central term: database
- a data point cannot be protected, a database can!
- CreativeCommons.org (CC) data license toolkit
Databases and Derivatives
problems with fuzzy definition
Often unclear whether copyright rules apply or not:
- Is it a database or just a data set?
- Is it a derivative of a database or not?
(e.g. model outputs)
Workaround for ambiguous cases: publish data processing routine instead of data
Source: Mimi and Eunice | CC-BY-SA Nina Paley
Standardized input processing
powered by madrat » github.com/pik-piam/madrat
retrieveData: bundle data as required by the model
readSource: read source data and (dis-)aggregate to ISO country level
calcOutput: calculate data sets as required by the model
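The three-layer idea behind these functions can be sketched in a few lines. Note that madrat itself is an R package; the Python functions below (`read_source`, `calc_output`, `retrieve_data`) are hypothetical stand-ins that only mimic the division of labour described above, and the toy records and the scaling step are invented for illustration.

```python
from collections import defaultdict

# Invented toy source: values reported per sub-national unit.
RAW_SOURCE = [
    {"unit": "DE-BY", "value": 3.0},
    {"unit": "DE-BE", "value": 1.0},
    {"unit": "FR-IDF", "value": 2.5},
]


def read_source(raw):
    """Read source records and aggregate them to country level."""
    by_country = defaultdict(float)
    for record in raw:
        country = record["unit"].split("-")[0]  # "DE-BY" -> "DE"
        by_country[country] += record["value"]
    return dict(by_country)


def calc_output(country_data, scale=1000.0):
    """Calculate the data set as the model needs it (here: a unit conversion)."""
    return {country: value * scale for country, value in country_data.items()}


def retrieve_data(raw):
    """Bundle all calculated data sets into one package for the model."""
    return {"toy_quantity": calc_output(read_source(raw))}


bundle = retrieve_data(RAW_SOURCE)
print(bundle)
```

Publishing routines of this kind instead of the processed data lets users rebuild the model inputs from the original sources themselves.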
Open Access and Transparency in FABLE
What is required?
Decisions to take
Decision                               MAgPIE (as example)
Open vs Non-Open                       Open
Copyleft vs Non-Copyleft               Copyleft for model, Non-Copyleft for tools
Recommended licenses                   copyleft: AGPL | non-copyleft: BSD2
Contributors License Agreement (CLA)   model: yes | tools: no
Code distribution                      GitHub
Data distribution                      not decided
Input data treatment                   publication of data preparation routines
Output data license                    not specified (depends on application)
Further Reading
This presentation - slides.com/jandietrich/fable-open-access
PIK Guidelines
Open Source: pik-potsdam.de/members/dietrich/GuidelinesOpenSource
Open Data: pik-potsdam.de/members/dietrich/GuidelinesOpenData
How can FABLE ensure open access & transparency of data and models
By Jan Dietrich