Trondheim: A lot of coffee, cables, coding, ...
LAN party, TG99
Working door-to-door with groundbreaking Web start-ups
Linpro made open source Web solutions
Håkon Wium Lie at Opera invented CSS
Trolltech made Qt incl. KHTML aka WebKit aka Chrome/Safari/Edge
Waldemar Thranes gate 98, Oslo
Travelling and reproducibility
Trondheim → Birmingham → Manchester
As a computer science MSc student
I need to reproduce a neural network model of a brain structure
so that I can use it in a robot that could learn sequences to navigate a maze
Paper: No code, only equations.
Implementation details missing (e.g. noise)
Can you really trust a C++ reimplementation?
    def calc_GP(self, i):
        sum = 0.0
        for j in range(self.inputs*2):
            sum += self.w[i,j] * self.STN[j]
        noise = random.uniform(-0.25, 0.25)
        result = sum - self.effect * self.STR[i] + noise
        return self.sigmoid(result)
It is the view of this author that the computer science community has an tendency for preferring mathematics in papers even when a source code extract could provide a better explanation, be more understandable and known to be correct.
The lack of included source code leads to incomplete publications that can be hard or impossible to reproduce. This problem is, to a great extent, more present in computer science than in other sciences.
Søiland (2006)
Noy & Goble (2022):
https://doi.org/10.1145/3528574
Computer Science is still behind other disciplines in thinking about reproducibility and data sharing
2008
Needed domain name, got a new surname
http://soiland-reyes.com/
Carole Goble, Rishi, Donal
Stuart
Paolo
Mannie
Sean
Alex
Rob
Antoon
Jits
Alan
Katy
Pinar
Matt
Alasdair
Robert
Finn
Khalid
..before it became the "mega lab"
Shoaib
Can we capture and share the computational method?
Taverna (2003–2020)
Workflows combine computational tools almost like LEGO blocks
Permalink: https://s.apache.org/existing-workflow-systems
Please add new entries at the bottom. Try to include links to website and/or source code and ideally DOI for publications.
In addition to this list, actively developed free/open-source systems should be registered at https://workflows.community/systems
See also: https://github.com/pditommaso/awesome-pipeline
2014: CWL idea at Bioinformatics Open Source Conference
2025: ~14 implementations
cwlVersion: v1.0
class: Workflow
inputs:
  inp: File
  ex: string
outputs:
  classout:
    type: File
    outputSource: compile/classfile
steps:
  untar:
    run: tar-param.cwl
    in:
      tarfile: inp
      extractfile: ex
    out: [example_out]
  compile:
    run: arguments.cwl
    in:
      src: untar/example_out
    out: [classfile]
CWL is a YAML-based standard for workflows
Reusable tool definitions
Executed in Docker containers
Explicit inputs and outputs - can be rewired
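Because each step declares its inputs and outputs explicitly, an engine can work out the execution order from the wiring alone. The following is a minimal Python sketch of that idea, not how any CWL implementation actually does it. The `workflow` dictionary is a hand-written stand-in for the parsed YAML, and `step_order` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter

# Hand-written dictionary mirroring the "steps" section of the example
# workflow; a real engine would parse the YAML document instead.
workflow = {
    "steps": {
        "untar": {
            "in": {"tarfile": "inp", "extractfile": "ex"},
            "out": ["example_out"],
        },
        "compile": {
            "in": {"src": "untar/example_out"},
            "out": ["classfile"],
        },
    }
}


def step_order(wf):
    """Return step names in an order that respects the data dependencies."""
    sorter = TopologicalSorter()
    for name, step in wf["steps"].items():
        # A source written as "step/output" refers to another step's output;
        # a bare name is a workflow-level input and adds no dependency.
        upstream = {
            source.split("/", 1)[0]
            for source in step["in"].values()
            if "/" in source
        }
        sorter.add(name, *upstream)
    return list(sorter.static_order())


print(step_order(workflow))
```

Rewiring a step, for instance pointing `src` at a different step's output, changes the derived order without touching the tool definitions themselves.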
Recording computational processes
Photo from https://openprovenance.org/
Workflow description
Execution provenance
Time travel:
Common Workflow Language modelled on wfdesc, while CWL provenance came later
Identified the need for multiple implementation levels
PROV-XML
<prov:wasGeneratedBy>
  <prov:entity prov:ref="ex:ent1"/>
  <prov:activity prov:ref="ex:act1"/>
  <prov:time>2017-10-26T21:32:52Z</prov:time>
  <ex:port>p1</ex:port>
</prov:wasGeneratedBy>
PROV-N
wasGeneratedBy(ent1, act1,
    2017-10-26T21:32:52Z, [ex:port="p1"])
PROV-O Turtle
:ent1
    a prov:Entity;
    prov:wasGeneratedBy :act1;
    prov:generatedAtTime "2017-10-26T21:32:52Z"^^xsd:dateTime ;
    ex:port "p1" .
PROV-JSON
"wasGeneratedBy": {
  "ex:gen1": {
    "prov:entity": "ent1",
    "prov:activity": "act1",
    "prov:time": "2017-10-26T21:32:52Z",
    "ex:port": "p1"
  },
},
PROV-O JSON-LD
{ "@context": { .. },
  "@id": "ent1",
  "@type": "prov:Entity",
  "ex:port": "p1",
  "prov:generatedAtTime": "2017-10-26T21:32:52Z",
  "prov:wasGeneratedBy": {
    "@id": "act1",
    "@type": "prov:Activity"
  }
}
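The same generation record can be held once as data and written out in more than one serialization. Below is a minimal Python sketch covering only PROV-N and PROV-JSON. The helper names `to_prov_n` and `to_prov_json` and the `generation` dictionary are illustrative assumptions, not part of any PROV library. PROV-N writes the optional attributes in square brackets.

```python
import json

# One generation record held as plain Python data (values from the slide).
generation = {
    "entity": "ent1",
    "activity": "act1",
    "time": "2017-10-26T21:32:52Z",
    "attributes": {"ex:port": "p1"},
}


def to_prov_n(gen):
    """Render the record in PROV-N functional notation."""
    attrs = ", ".join(
        f'{key}="{value}"' for key, value in gen["attributes"].items()
    )
    return (
        f'wasGeneratedBy({gen["entity"]}, {gen["activity"]}, '
        f'{gen["time"]}, [{attrs}])'
    )


def to_prov_json(gen, gen_id="ex:gen1"):
    """Render the record as a PROV-JSON style dictionary."""
    record = {
        "prov:entity": gen["entity"],
        "prov:activity": gen["activity"],
        "prov:time": gen["time"],
        **gen["attributes"],  # extra attributes sit beside the prov: keys
    }
    return {"wasGeneratedBy": {gen_id: record}}


print(to_prov_n(generation))
print(json.dumps(to_prov_json(generation), indent=2))
```

Keeping one in-memory record and several writers is one way to picture why the serializations are interchangeable: they differ in syntax, not in the statement being made.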
<Alice> <knows> <Bob> .
A triple expresses a relationship between two named resources.
<Alice> <knows> <Bob> .
<Alice> <knows> <Charlie> .
<Alice> <plays> <Tennis> .
<Bob> <knows> <Charlie> .
<Bob> <plays> <Football> .
<Charlie> <plays> <Tennis> .
Multiple triples with shared resources form a graph
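To make the graph idea concrete, here is a minimal Python sketch that stores the six statements as tuples and answers questions by pattern matching. It is not a real RDF library, and the `match` helper is a hypothetical name introduced for illustration.

```python
# The six statements as (subject, predicate, object) tuples.
triples = [
    ("Alice", "knows", "Bob"),
    ("Alice", "knows", "Charlie"),
    ("Alice", "plays", "Tennis"),
    ("Bob", "knows", "Charlie"),
    ("Bob", "plays", "Football"),
    ("Charlie", "plays", "Tennis"),
]


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        (ts, tp, to)
        for (ts, tp, to) in graph
        if s in (None, ts) and p in (None, tp) and o in (None, to)
    ]


# Who plays tennis?
tennis_players = [s for (s, _, _) in match(triples, p="plays", o="Tennis")]

# Which tennis players does Alice know? Two patterns joined on a shared resource.
known_by_alice = {o for (_, _, o) in match(triples, s="Alice", p="knows")}
alice_knows_players = [name for name in tennis_players if name in known_by_alice]

print(tennis_players)
print(alice_knows_players)
```

The second question only has an answer because "Charlie" appears in more than one triple; shared resources are what turn a list of statements into a graph.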
<http://example.com/Alice> <knows> <http://example.com/Bob> .
<http://example.com/Alice> <knows> <http://example.com/Charlie> .
<http://example.com/Alice> <plays> <http://dbpedia.org/resource/Tennis> .
<http://example.com/Bob> <knows> <http://example.com/Charlie> .
<http://example.com/Bob> <plays> <http://dbpedia.org/resource/Football> .
<http://example.com/Charlie> <plays> <http://dbpedia.org/resource/Tennis> .
URLs on the Web can serve as global identifiers
<http://dbpedia.org/resource/Tennis> <http://www.w3.org/2000/01/rdf-schema#label> "Leadóg"@ga .
<http://dbpedia.org/resource/Tennis> <http://www.w3.org/2000/01/rdf-schema#label> "Tennis"@sv .
<http://dbpedia.org/resource/Tennis> <http://www.w3.org/2000/01/rdf-schema#label> "Tenis"@es .
<http://dbpedia.org/resource/Tennis> <http://www.w3.org/2000/01/rdf-schema#label> "テニス"@ja .
<http://dbpedia.org/resource/Tennis> <http://www.w3.org/2000/01/rdf-schema#label> "Tenis"@cs .
<http://dbpedia.org/resource/Tennis> <http://www.w3.org/2000/01/rdf-schema#label> "网球"@zh .
<http://dbpedia.org/resource/Tennis> <http://www.w3.org/2000/01/rdf-schema#label> "Теннис"@ru .
<http://dbpedia.org/resource/Tennis> <http://www.w3.org/2000/01/rdf-schema#label> "Tenis"@pl .
<http://dbpedia.org/resource/Tennis> <http://www.w3.org/2000/01/rdf-schema#label> "Tennis"@de .
<http://dbpedia.org/resource/Tennis> <http://www.w3.org/2000/01/rdf-schema#label> "Teniso"@eo .
<http://dbpedia.org/resource/Tennis> <http://www.w3.org/2000/01/rdf-schema#label> "Tennis"@en .
<http://dbpedia.org/resource/Tennis> <http://www.w3.org/2000/01/rdf-schema#label> "Tenis"@eu .
<http://dbpedia.org/resource/Tennis> <http://dbpedia.org/ontology/abstract> "Tennis is a racket sport that is played either individually against a single opponent (singles) or between two teams of two players each (doubles). Each player uses a tennis racket that is strung with cord to strike a hollow rubber ball covered with felt over or around a net and into the opponent's court. The object of the game is to manoeuvre the ball in such a way that the opponent is not able to play a valid return. The player who is unable to return the ball validly will not gain a point, while the opposite player will. Tennis is an Olympic sport and is played at all levels of society and at all ages. The sport can be played by anyone who can hold a racket, including wheelchair users. The modern game of tennis originated in Birmingham, England, in the late 19th century as lawn tennis. It had close connections both to various field (lawn) games such as croquet and bowls as well as to the older racket sport today called real tennis. The rules of modern tennis have changed little since the 1890s. Two exceptions are that until 1961 the server had to keep one foot on the ground at all times, and the adoption of the tiebreak in the 1970s. A recent addition to professional tennis has been the adoption of electronic review technology coupled with a point-challenge system, which allows a player to contest the line call of a point, a system known as Hawk-Eye. Tennis is played by millions of recreational players and is a popular worldwide spectator sport. The four Grand Slam tournaments (also referred to as the majors) are especially popular: the Australian Open, played on hardcourts; the French Open, played on red clay courts; Wimbledon, played on grass courts; and the US Open, also played on hardcourts."@en .
<http://dbpedia.org/resource/Tennis> <http://dbpedia.org/ontology/abstract> "Ténis (português europeu) ou tênis (português brasileiro) é um esporte de origem inglesa, disputado em quadras geralmente abertas e de superfícies sintéticas, cimento, saibro ou relva. Participam no jogo dois oponentes ou duas duplas de oponentes, podendo ser mistas (homens e mulheres) ou não. A quadra é dividida em duas meia-quadras por uma rede, e o objetivo do jogo é rebater uma pequena bola para além da rede (para a meia-quadra adversária) com ajuda de uma raquete. Para marcar um ponto é preciso que a bola toque no solo em qualquer parte dentro da quadra adversária incluindo as linhas que demarcam o campo do oponente, fazendo com que o adversário não consiga devolver a bola antes do segundo toque, ou que a devolva para fora dos limites da outra meia-quadra. O desporto assim possui aspectos de ataque (rebater bem a bola, dificultando a devolução do adversário) e defesa (bom posicionamento em quadra, antecipação do lance adversário etc). O tênis possui um intricado sistema de pontuação, que subdivide o jogo em games/jogospt e sets/partidas pt. Grosso modo, um game é um conjunto de pontos (15-30-40-game) e um set é um conjunto de games (1-2-3-4-5-set). Cada game tem um jogador responsável por recolocar a bola em jogo: fazer o serviço ou sacar. No tênis de competição, é comum que o jogador que serve fature o game, já que tem a vantagem do ataque e dita o ritmo do jogo. Desta forma, uma das estratégias de jogo é tentar inverter esta vantagem durante a troca de bola ou durante a defesa fazer com que o adversário, através de erros, perca os games em que está sacando. Ganha o jogobr/encontropt aquele que atingir um número de sets pré-definido — geralmente 2 sets, sendo de 3 sets para os grandes torneios masculinos."@pt .
<http://dbpedia.org/resource/Tennis> <http://dbpedia.org/ontology/abstract> "Il tennis, il cui nome può essere tradotto in italiano con un termine arcaico non più in uso di “schiaffo palla” e storicamente noto col nome di \"pallacorda\", è uno sport della racchetta che vede opporsi due o quattro giocatori (due contro due, incontro di doppio) in un campo da gioco diviso in due metà da una rete alta 0,914 m al centro e 1,07 m ai lati. È considerato uno sport completo e armonico (sebbene fisicamente in parte asimmetrico perché un braccio viene escluso dal movimento del corpo) in quanto sono richieste al contempo ottime capacità fisiche (coordinazione, velocità e resistenza sulla distanza) e mentali (tecnica nei colpi, tattica, visione di gioco, riflessi e intuito)."@it .
<http://dbpedia.org/resource/Tennis> <http://dbpedia.org/ontology/abstract> "Tenis (angl. tennis < angl. tenes, tenetz < fr. tenez! = berte, držte! (imperativ pl. slova držet)), označovaný také jako bílý sport, je míčová hra pro 2 nebo 4 hráče. Varianta se 2 hráči se nazývá , varianta se 4 hráči pak čtyřhra. Hraje se také smíšená čtyřhra, při které v každé dvojici hraje jedna žena a jeden muž. Tenisový míč Soupeři stojí proti sobě, na obdélníkovém hřišti (tenisovém dvorci) a pokoušejí se odrazit tenisový míček tenisovou raketou do pole tak, aby jej soupeř nemohl vrátit nebo aby se s míčkem trefil vedle tenisového dvorce (do autu)."@cs .
<http://dbpedia.org/resource/Tennis> <http://dbpedia.org/ontology/abstract> "كرة المضرب (أو التنس الأرضي في الترجمات الحرفية) نوع من رياضات الراح والتي يتنافس فيها لاعبان في مباريات فردية، أو فريقان مكونان من لاعبين في مباريات زوجية. كلٌ منهم يحمل مضربا ليستخدمه في ضرب الكرة فوق الشبكة نحو منطقة الخصم. وعدد الضربات ليس محددا، إنما النتيجة تحدد الرابح. كرة المضرب أو التنس هو رياضة أولمبية تُلعب على جميع مستويات المجتمع وفئاته ولجميع الأعمار، يمكن لأي شخص أن يمسك بمضرب ويضرب الكرة بما في ذلك ذوو الاحتياجات الخاصة ومستخدمو الكراسي المتحركة. نشأت لعبة التنس الحديثة في برمنغهام بإنجلترا في أواخر القرن التاسع عشر كرياضة تلعب على العشب، لذا فهي وثيقة الصلة بكل من الألعاب الميدانية كالكريكت والبولينغ، إضافة إلى رياضة المضرب القديمة. تغيرت قواعد كرة المضرب الحديثة قليلاً منذ تسعينيات القرن التاسع عشر، باستثناء قاعدتين: من 1908 إلى 1961 كان على اللاعب أن يضع قدمًا واحدة على الأرض في جميع أوقات المباراة، وفي السبعينيات اعتمد الشوط الفاصل (تاي بريك). ومن الإضافات الحديثة إلى اللعبة على المستوى الاحترافي اعتماد تقنية المراجعة الإلكترونية (عين الصقر) إلى جانب نظام تحدي النقاط . يلعب كرة المضرب الملايين من لاعبين، وهي رياضة مشهورة ومنتشرة في جميع أنحاء العالم، وتحظى البطولات الكبرى لكرة المضرب بمتابعة جماهيرية وإعلامية كبيرة، تضعها في مرتبة متقدمة بين الأحداث الرياضية الأكثر أهمية ومتابعة في العالم سنويا. وأهم هذه البطولات: رولان غاروس أو بطولة فرنسا المفتوحة على الملاعب الرملية، بطولة أستراليا المفتوحة على الملاعب الصلبة، بطولة ويمبلدون على الملاعب العشبية، وبطولة الولايات المتحدة المفتوحة على الملاعب الصلبة."@ar .
<http://dbpedia.org/resource/Tennis> <http://dbpedia.org/ontology/abstract> "Teniso estas rakedsporto, kutime ludata de du homoj, sed ankaŭ foje de kvar homoj. Specifa para ludo estas la ambaŭseksa para ludo, kiam ĉiu teamo konsistas el unu viro kaj unu virino. Estas principe tri kategorioj de tenisejo, nome la gazona, malmola kaj batita terenoj. Dumlude, la ludantoj ĉiam devas revenigi la pilkon per siaj rakedoj al kontraŭa kampo, provante samtempe malebligi tion al oponanto(j)."@eo .
<http://dbpedia.org/resource/Tennis> <http://dbpedia.org/ontology/abstract> "Tenisa Ingalaterran eta XIX. mendean sortutako kirola da, pilota batekin eta bi edo lau jokalarien artean jokatua. Tenis jokalariei tenislari ere deritze. Mende haren amaieran arautu zuten: Sare batek banatzen duen zelai batean jokatzen da, jokalariak erraketez baliatzen dira pilota sarearen gainetik aurkariaren zelaira botatzeko, partidek ez dute iraupen jakinik (beharrezko puntuazioa lortzean bukatzen da), prestakuntza fisikoa eta teknika behar dira tenisaren ezaugarriak eta arautegia betetzeko."@eu .
<http://dbpedia.org/resource/Tennis> <http://dbpedia.org/ontology/abstract> "Tenis atau bola tampel adalah olahraga yang biasanya dimainkan antara dua pemain atau antara dua pasangan masing-masing dua pemain. Setiap pemain menggunakan raket untuk memukul bola karet. Tujuan permainan adalah memainkan bola dengan cara tertentu sehingga pemain lawan tidak dapat mengembalikan bola tersebut. Tenis adalah salah satu cabang olahraga Olimpiade dan dimainkan pada semua tingkat masyarakat di segala usia. Olahraga ini dapat dimainkan oleh siapa saja, termasuk orang-orang yang menggunakan kursi roda. Permainan tenis modern berasal dari Birmingham, Inggris pada akhir abad ke-19 sebagai \"tenis lapangan rumput\". Peraturan tenis berubah sedikit sejak 1890-an. Dua perubahan kecil adalah sejak 1908 hingga 1961 pemain yang melakukan service (pukulan pertama) harus menjaga salah satu kakinya tetap di tanah hingga service berpindah dan adopsi sistem tie-break pada 1970-an. Tambahan terakhir yang diterapkan pada tenis profesional adalah teknologi tinjauan ulang elektronik. Tenis dimainkan oleh jutaan orang sebagai olahraga rekreasi dan juga merupakan olahraga tontontan populer di seluruh dunia.Empat kejuaraan tenis terkemuka adalah Australia Terbuka yang dimainkan di lapangan keras, Prancis Terbuka yang dimainkan di lapangan tanah liat, Wimbledon yang dimainkan di lapangan rumput, dan AS Terbuka yang dimainkan juga di lapangan keras."@in .
<http://dbpedia.org/resource/Tennis> <http://dbpedia.org/property/venue> <http://dbpedia.org/resource/Tennis_court> .
<http://dbpedia.org/resource/Tennis> <http://dbpedia.org/property/equipment> <http://dbpedia.org/resource/Net_sport> .
<http://dbpedia.org/resource/Tennis> <http://dbpedia.org/property/equipment> <http://dbpedia.org/resource/Tennis_ball> .
<http://dbpedia.org/resource/Tennis> <http://dbpedia.org/property/equipment> <http://dbpedia.org/resource/Racket_(sports_equipment)> .
<http://dbpedia.org/resource/Tennis_ball> <http://www.w3.org/2000/01/rdf-schema#comment> "Bola tenis adalah bola yang dirancang untuk olahraga tenis. Bola tenis berwarna kuning fluoresens pada acara olahraga besar, tetapi dalam permainan rekreasi bisa berwarna apa saja. Bola tenis ditutupi dengan serat berserat yang memodifikasi sifat aerodinamisnya, dan masing-masing memiliki lekukan lonjong melengkung putih yang menutupinya. Bola tenis diisi dengan udara dan permukaannya ditumbuhi oleh kompon karet berserat seragam. Keterlambatan terasa pemisahan aliran di lapisan batas yang mengurangi hambatan aerodinamik dan memberi bola sifat terbang yang lebih baik."@in .
<http://dbpedia.org/resource/Tennis_ball> <http://www.w3.org/2000/01/rdf-schema#comment> "Tenisový míč či tenisák je vyrobený z pevné gumové duše, která je obalena meltonem a nahuštěná vzduchem. Jen míče značky Tretorn jsou zcela vyplněny pórovitou hmotou. Jedná se o míč určený pro tenis. Povrch je jednobarevný, dříve byl bílý či oranžový. Existují však i míčky vícebarevné, například zeleno-růžové. Barva dnešního míčku závisí na osvětlení a člověku, který se na míček dívá. Někteří ho vidí neonově žlutý, jiní světle zelený a další zeleno-žlutý."@cs .
<http://dbpedia.org/resource/Tennis_ball> <http://www.w3.org/2000/01/rdf-schema#comment> "كرة التنس هي كرة مصممة لممارسة رياضة التنس. كرات التنس لونها صفراء مُتفتحة في الأحداث الرياضية الكبرى، ولكن في اللعب الترفيهي يمكن أن يكون أي لون تقريبًا. وتغطي كرات التنس بواسطة لباد ليفي الذي يعدل من مرور الكرات في الهواء. تمتلئ كرات التنس بالهواء وتظهر على السطح بواسطة مركب مطاطي مغطى باللباد. يؤخر فصل التدفق في الطبقة الحدودية مما يقلل من السحب الديناميكي الهوائي ويمنح الكرة خصائص طيران أفضل. غالبًا ما تحتوي الكرات على رقم بالإضافة إلى اسم العلامة التجارية. وهذا يساعد على تمييز مجموعة من الكرات عن مجموعة أخرى من نفس العلامة التجارية في ملعب مجاور."@ar .
<http://dbpedia.org/resource/Tennis_ball> <http://www.w3.org/2000/01/rdf-schema#comment> "テニスボール(英: tennis ball)とは、狭義には、テニス(狭義のテニス、すなわち、硬式テニスともいうローンテニス)用に作られたボール。厚いに毛羽のあるフェルトを被せた球の形で、フェルトによる特有の反発性をもつ。art-antsを呼称する言葉でもある。 また、広義の「テニス」にはソフトテニス(en. 軟式テニス、軟式庭球)も含まれることから、ソフトテニスのボールも含む。"@ja .
<http://dbpedia.org/resource/Tennis_ball> <http://www.w3.org/2000/01/rdf-schema#comment> "Der Tennisball ist das Spielgerät, mit dem Tennis gespielt wird. Er wird mit Hilfe der Tennisschläger gespielt."@de .
<http://dbpedia.org/resource/Tennis_ball> <http://www.w3.org/2000/01/rdf-schema#comment> "Een tennisbal is een bal die wordt gebruikt bij tennis. Er wordt tegenaan geslagen met een tennisracket en met de bal worden punten gescoord. Er zijn verschillende soorten tennisballen voor verschillende baansoorten en weertypen. De meest voorkomende kleur is fluorescerend geel met een lichtgekleurde kromme streep."@nl .
<http://dbpedia.org/resource/Tennis_ball> <http://www.w3.org/2000/01/rdf-schema#comment> "Una pelota de tenis es un objeto esférico de pequeño diámetro (especialmente en comparación con la medida de otras pelotas) que se utiliza en la práctica del deporte homónimo. Para impulsarla se utiliza una raqueta. Un profesional puede hacer que llegue a coger una velocidad de 263 km/h. Se recubre con fieltro (en general de color amarillo fluorescente con rayas de separación blancas), puesto que es el «color más visible por el ojo humano» y permite una trayectoria más correcta, en comparación con una totalmente lisa. De media, una pelota de tenis pesa de 56 a 58 gramos."@es .
<http://dbpedia.org/resource/Tennis_ball> <http://www.w3.org/2000/01/rdf-schema#comment> "Tennisboll är en typ av boll som används inom sporten tennis. Officiella bollar skall vara vita eller gula (träningsbollar kan ha andra färger) och ha en diameter mellan 65,4–73,0 mm (2,58–2,87 tum). De skall väga mellan 56,0 och 59,4 gram. Bollarna ska studsa 135–147 cm efter att ha släppts från en höjd av 254 cm på ett betonggolv. Den moderna tennisbollen tillverkas av två likformade delar av format gummi som sammanfogats till ett ihåligt klot. Två hantelformade filtade delar av flanell eller nylon limmas utanpå bollen och ger den dess karakteristiska utseende."@sv .
<http://dbpedia.org/resource/Tennis_ball> <http://www.w3.org/2000/01/rdf-schema#comment> "A tennis ball is a ball designed for the sport of tennis. Tennis balls are fluorescent yellow in organised competitions, but in recreational play can be virtually any color. Tennis balls are covered in a fibrous felt which modifies their aerodynamic properties, and each has a white curvilinear oval covering it."@en .
<http://dbpedia.org/resource/Tennis_ball> <http://www.w3.org/2000/01/rdf-schema#comment> "Теннисный мяч — предназначен для тенниса. Он примерно 6,7 см (2,63 дюйма) в диаметре. Теннисные мячи на крупных спортивных соревнованиях желтые, но могут быть практически любого цвета. Теннисные мячи покрыты войлоком, который модифицирует их аэродинамические свойства, и имеют белую волнистую линию, которая охватывает снаряд."@ru .
Linked Data: following the URLs to find more triples
Linked Data standards
..used by >50% of all websites?
2010–: Research Objects
Model for Research Objects
Reused multiple ontologies
Implemented by workflow systems and supporting RO services
Data (and metadata) should be:
Machine readable data & metadata
Community agreement on standards
To be Findable:
F1. (meta)data are assigned a globally unique and persistent identifier
F2. data are described with rich metadata (defined by R1 below)
F3. metadata clearly and explicitly include the identifier of the data it describes
F4. (meta)data are registered or indexed in a searchable resource
To be Accessible:
A1. (meta)data are retrievable by their identifier using a standardized communications protocol
A1.1 the protocol is open, free, and universally implementable
A1.2 the protocol allows for an authentication and authorization procedure, where necessary
A2. metadata are accessible, even when the data are no longer available
To be Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles
I3. (meta)data include qualified references to other (meta)data
To be Reusable:
R1. (meta)data are richly described with a plurality of accurate and relevant attributes
R1.1. (meta)data are released with a clear and accessible data usage license
R1.2. (meta)data are associated with detailed provenance
R1.3. (meta)data meet domain-relevant community standards
The FAIR Principles have propelled the global debate about better data stewardship in data-driven and open science, and they have triggered funding bodies (e.g., UKRI, Wellcome Trust, USA NIH, EC Horizon 2020) to mandate FAIR data in their data policies.
Supervisors: Paul Groth, Carole Goble
Peter Sefton at Open Repositories 2019
Describe and package data collections, datasets, software etc. with their metadata
Platform-independent object exchange between repositories and services
Support reproducibility and analysis: link data with code and workflows
Transfer of sensitive/large distributed datasets with persistent identifiers
Propagate provenance, citations and existing metadata
Reuse existing standards, but hide their complexity
Vocabulary: A set of types with properties
schema.org is used by almost half of all popular websites -- mainly to improve search engine ranking?
Explaining by example for developers: Linked Data by stealth
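In the "by example" spirit, a developer can produce a valid starting point with nothing but a JSON library. The following minimal Python sketch builds the two entities an RO-Crate 1.1 metadata file starts from: the metadata descriptor and the root dataset. The function name `minimal_crate` and all argument values are illustrative assumptions.

```python
import json


def minimal_crate(name, description, date_published, license_url):
    """Build the JSON-LD for a minimal RO-Crate 1.1 metadata file."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                # Metadata descriptor: says which entity is the crate root.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                # Root dataset: describes the crate as a whole.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "description": description,
                "datePublished": date_published,
                "license": {"@id": license_url},
            },
        ],
    }


# Hypothetical example values.
crate = minimal_crate(
    name="Example crate",
    description="A crate built from a plain Python dictionary.",
    date_published="2025-02-19",
    license_url="https://creativecommons.org/licenses/by/4.0/",
)
print(json.dumps(crate, indent=2))
```

Writing the printed JSON to a file named `ro-crate-metadata.json` in the root of a folder is what makes that folder a crate; no RDF tooling is needed until someone wants graph queries or validation.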
Executions of workflows can be
tracked using a profile of RO-Crate
Interoperable reproducibility
of computational methods
Developed as a community task force, 7+ implementations
Multiple implementation levels
Green & Ritchie (2023)
Trusted Research Environments (TRE) ensure safe access to sensitive data
TREs provide a locked-down computational space (e.g. a VM) and enforce disclosure control before outputs leave the premises.
Federated Analytics with Five Safes RO-Crate
RO-Crate used a common vocabulary, so it was easy to extend for human actions
By using Web technologies and explaining by example,
Linked Data technology can be used by most developers
Profiles as a lightweight approach to ontologies, a "just enough" formalisation of usage patterns and domain vocabularies
Linked Data as basis:
formalisations, extension mechanisms, graph queries, validation, integration/mapping, ++
(only use these if you need to!)
The RO-Crate team is:
RO-Crate team
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@type": "CreativeWork",
  "@id": "#talk",
  "name": "Challenging Linked Data Practices for FAIR Research Objects and Computational Workflows",
  "description": "A presentation discussing the role of Linked Data and RO-Crate in the era of LLMs, emphasizing their necessity for FAIR computational workflows.",
  "author": {
    "@type": "Person",
    "name": "[Your Name]",
    "affiliation": "[Your Institution]",
    "orcid": "[Your ORCID]"
  },
  "datePublished": "2025-02-19",
  "keywords": ["RO-Crate", "Linked Data", "FAIR", "Computational Workflows"],
  "subjectOf": {
    "@type": "Event",
    "name": "[Conference or Workshop Name]",
    "startDate": "[Event Date]",
    "location": {
      "@type": "Place",
      "name": "[Event Location]"
    }
  },
  "relatedMaterial": [
    {
      "@type": "CreativeWork",
      "name": "RO-Crate Specification",
      "url": "https://www.researchobject.org/ro-crate/"
    }
  ]
}
{
  "@context": [
    "https://schema.org/",
    "https://w3id.org/ro/crate/1.1/context"
  ],
  "@graph": [
    {
      "@id": "ro-crate-metadata.json",
      "@type": "CreativeWork",
      "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
      "about": {"@id": "./"},
      "description": "RO-Crate Metadata for the presentation: Challenging Linked Data Practices for FAIR Research Objects and Computational Workflows"
    },
    {
      "@id": "./",
      "@type": "Dataset",
      "name": "Presentation RO-Crate",
      "hasPart": [
        {"@id": "presentation.pdf"},
        {"@id": "#presentation"}
      ],
      "license": {"@id": "https://creativecommons.org/licenses/by/4.0/"},
      "publisher": "FAIR Symposium 2023",
      "datePublished": "2023-10-15"
    },
    {
      "@id": "#presentation",
      "@type": ["PresentationDigitalDocument", "File"],
      "name": "Challenging Linked Data Practices for FAIR Research Objects and Computational Workflows",
      "description": "A critical analysis of Linked Data practices in FAIR workflows and Research Objects.",
      "author": {"@id": "#alice"},
      "dateCreated": "2023-10-10",
      "keywords": ["FAIR", "Linked Data", "RO-Crate", "Workflows"],
      "about": [
        {"@id": "https://doi.org/10.0000/example"},
        {"@id": "#event"}
      ]
    },
    {
      "@id": "#alice",
      "@type": "Person",
      "name": "Alice Researcher",
      "affiliation": "FAIR Institute",
      "identifier": "https://orcid.org/0000-0000-0000-0000"
    },
    {
      "@id": "#event",
      "@type": "Event",
      "name": "FAIR Symposium 2023",
      "startDate": "2023-10-15",
      "location": "Virtual"
    },
    {
      "@id": "presentation.pdf",
      "@type": "File",
      "name": "Slide deck (PDF)",
      "encodingFormat": "application/pdf",
      "contentSize": "2.5 MB"
    }
  ]
}
"With the rise of LLMs, why do we still need structured data formats like RO-Crate and Linked Data?"
Reliability & Consistency:
LLMs generate probabilistic outputs, which can be inconsistent or hallucinate metadata. Structured data (e.g., JSON-LD in RO-Crate) ensures machine-actionable, predictable, and domain-specific metadata, critical for reproducibility and FAIR compliance.
Interoperability:
LLMs lack built-in semantics for cross-system data exchange. Formats like RO-Crate use standards (e.g., schema.org, PROV-O) to enable automated integration across repositories, workflow engines, and research infrastructures.
Context Preservation:
LLMs may strip contextual relationships (e.g., authorship, funding, dependencies). RO-Crate explicitly captures provenance, licensing, and dependencies as structured graphs, preserving research context.
Computational Actionability:
Structured metadata allows tools (e.g., workflow runners, validators) to programmatically resolve dependencies, verify licenses, or trace data lineage without human intervention—LLMs alone cannot do this.
Synergy with LLMs:
LLMs can enhance RO-Crate creation (e.g., auto-annotating files) or querying (natural language to SPARQL). However, they rely on structured data as a ground-truth source for training and reasoning.
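The "Computational Actionability" point above can be illustrated with a minimal Python sketch. It assumes a heavily trimmed, hypothetical `ro-crate-metadata.json` (the crate name, the author `#alice` and the file `analysis.qmd` are invented for illustration) and shows how a tool might follow `@id` references to find the licence, authors and parts of the root dataset without guessing from free text. The helper names are assumptions of this sketch, not part of any RO-Crate library.

```python
import json

# Hypothetical, trimmed RO-Crate metadata; all values are illustrative only.
METADATA = json.loads("""
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
     "about": {"@id": "./"},
     "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"}},
    {"@id": "./", "@type": "Dataset",
     "name": "Example crate",
     "license": {"@id": "https://spdx.org/licenses/CC-BY-4.0"},
     "author": [{"@id": "#alice"}],
     "hasPart": [{"@id": "analysis.qmd"}]},
    {"@id": "#alice", "@type": "Person", "name": "Alice Example"},
    {"@id": "analysis.qmd", "@type": "File", "name": "Analysis script"}
  ]
}
""")


def index_graph(metadata):
    """Map each entity's @id to the entity, for constant-time lookup."""
    return {entity["@id"]: entity for entity in metadata["@graph"]}


def as_list(value):
    """JSON-LD values may be absent, single, or a list; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def resolve(entities, value):
    """Follow {"@id": ...} references; keep the bare reference if not described."""
    resolved = []
    for item in as_list(value):
        if isinstance(item, dict) and "@id" in item:
            resolved.append(entities.get(item["@id"], item))
        else:
            resolved.append(item)
    return resolved


def root_dataset(entities):
    """The metadata descriptor's 'about' property points at the root dataset."""
    descriptor = entities["ro-crate-metadata.json"]
    return entities[descriptor["about"]["@id"]]


entities = index_graph(METADATA)
root = root_dataset(entities)
license_ids = [lic["@id"] for lic in resolve(entities, root.get("license"))]
author_names = [a.get("name") for a in resolve(entities, root.get("author"))]
part_ids = [p["@id"] for p in resolve(entities, root.get("hasPart"))]
print(license_ids, author_names, part_ids)
```

Because the references are explicit identifiers, the same lookup gives the same answer on every run, which is the contrast with probabilistic LLM output drawn in the answer above.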
What about the below JSON-LD of an RO-Crate, can you summarise it for me?
(..)
Title: "BY-COVID - WP5 - Baseline Use Case: SARS-CoV-2 vaccine effectiveness assessment - Local outputs and comparative analysis"
Description:
The dataset contains local outputs and comparative analysis from a federated study across multiple European regions (Aragon, Spain; Brussels and Wallonia, Belgium; Finland). It evaluates the effectiveness of SARS-CoV-2 vaccines in preventing infections using observational, retrospective longitudinal data.
Research Question:
"How effective have the SARS-CoV-2 vaccination programmes been in preventing SARS-CoV-2 infections?"
Study Design:
Emulates a hypothetical Target Trial (TT) using real-world data, with a federated analysis approach to ensure data privacy and security.
Conforms To:
The RO-Crate adheres to multiple standards, including:
License: CC-BY-4.0
Publisher: Zenodo
Temporal Coverage: March 1, 2020 – December 20, 2022
Spatial Coverage:
Aragon, Spain (GeoNames 3336899)
Brussels-Capital, Belgium (GeoNames 2800867)
Wallonia, Belgium (GeoNames 3337387)
Finland (GeoNames 660013)
Keywords: COVID-19, Comparative effectiveness, International comparison, Causal inference, SARS-CoV-2
Funder: European Commission
Main Workflow:
Title: "BY-COVID WP5 T5.2 Baseline Use Case"
Programming Language: Quarto Markdown
Inputs: Vaccine effectiveness data (CSV files)
Outputs: Interactive HTML reports and Excel files
Containerization: Docker and Singularity containers were used for execution in secure environments.
Regional Workflows:
Aragon, Spain: Executed manually in a Docker container due to secure environment constraints.
Brussels and Wallonia, Belgium: Executed in a manually configured R environment.
Finland: Executed in a Singularity container.
Comparative Analysis:
Aggregated results from regional workflows into an interactive HTML report using a Quarto script.
Interactive HTML Reports:
Each region produced 6 interactive reports covering:
Data Quality Assessment (DQA)
Validation
Imputation
Matching
Descriptive Analysis
Survival Analysis
Excel Files:
Summarized results for each region (e.g., results-survival-analysis-ESP.xlsx).
Interactive HTML Report:
Aggregated results from all regions into a single interactive report.
Quarto Script:
Used to generate the comparative analysis report.
R Environment:
renv/: R environment with activation scripts.
renv.lock: Lockfile for reproducible package versions.
.Rprofile: R profile for environment configuration.
Provenance:
Detailed execution logs for each workflow run, including:
Container Images: Docker and Singularity images used for execution.
Execution Times: Start and end times for each run.
Agents: Researchers who executed the workflows (e.g., Marjan Meurisse, Francisco Estupiñán-Romero).
Action Status: All workflows were completed successfully.
Linked Data:
The RO-Crate uses JSON-LD to link entities (e.g., workflows, datasets, authors, organizations) using standardized vocabularies like schema.org, EDAM, and PROV-O.
Interoperability:
The crate is designed to be interoperable with tools like WorkflowHub and Zenodo, ensuring FAIR (Findable, Accessible, Interoperable, Reusable) compliance.
FAIR Compliance:
The RO-Crate ensures that the research outputs are FAIR by providing structured metadata, provenance, and reusable workflows.
Reproducibility:
The use of containers (Docker, Singularity) and reproducible environments (R, renv) ensures that the analysis can be rerun in the future.
Privacy and Security:
The federated approach and secure processing environments (SPEs) ensure that sensitive health data is not exposed.
This RO-Crate exemplifies how structured data formats like JSON-LD and standards like RO-Crate enhance reproducibility, interoperability, and FAIRness in research. While LLMs can assist in generating or querying metadata, structured data remains essential for ensuring machine-actionability, provenance tracking, and long-term preservation of research outputs.
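The provenance items summarised above (agents, execution times, action status) could be read directly from the crate's graph rather than from a generated summary. The following is a minimal sketch, assuming each run is recorded as a schema.org `CreateAction` with `agent`, `startTime`, `endTime` and `actionStatus`. The run identifier, timestamps and the person "Researcher One" are invented, and `summarise_runs` is a hypothetical helper.

```python
from datetime import datetime

# Hypothetical graph fragment; identifiers and timestamps are invented.
GRAPH = [
    {"@id": "#run-1", "@type": "CreateAction",
     "instrument": {"@id": "workflow.qmd"},
     "agent": {"@id": "#researcher-1"},
     "startTime": "2023-01-10T09:00:00+00:00",
     "endTime": "2023-01-10T11:30:00+00:00",
     "actionStatus": {"@id": "http://schema.org/CompletedActionStatus"}},
    {"@id": "#researcher-1", "@type": "Person", "name": "Researcher One"},
]


def summarise_runs(graph):
    """List each CreateAction with its agent name, duration in hours and status."""
    entities = {entity["@id"]: entity for entity in graph}
    runs = []
    for entity in graph:
        types = entity.get("@type", [])
        types = types if isinstance(types, list) else [types]
        if "CreateAction" not in types:
            continue
        start = datetime.fromisoformat(entity["startTime"])
        end = datetime.fromisoformat(entity["endTime"])
        agent = entities.get(entity["agent"]["@id"], {})
        status = entity["actionStatus"]["@id"]
        runs.append({
            "run": entity["@id"],
            "agent": agent.get("name"),
            "hours": (end - start).total_seconds() / 3600,
            "completed": status.endswith("CompletedActionStatus"),
        })
    return runs


runs = summarise_runs(GRAPH)
print(runs)
```

A check like this is one way the statement "all workflows were completed successfully" could be verified against the metadata instead of being taken on trust.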