CERN Analysis Preservation

Marco Neumann (@crepererum)
Tech Student

The Whale, the Frame, the Data and (some) Preservation

  • data for plots ➡ sometimes @ CDS
  • raw data ➡ selected ones @ opendata
  • everything between:
    • software ➡ scattered somewhere
    • parameters ➡ in some DBs of the experiments
    • knowledge/logbook ➡ gone?

Meet CERN Analysis Preservation!

The Whale

master
main-2.1
overlay-data

🖳

git clone --depth=1 \
    https://github.com/inveniosoftware/invenio.git
cd invenio
docker-compose -f docker-compose-dev.yml up

The Frame

=

Old Invenio

Module Separation

New Invenio

The Data

002054343 001__ 2054343
002054343 003__ SzGeCERN
002054343 005__ 20150922075343.0
002054343 035__ $$9arXiv$$aoai:arXiv.org:1509.05996$$d2015-09-22$$h2015-09-22T05:19:11Z$$marXiv$$tfalse$$uhttp://export.arxiv.org/oai2
002054343 037__ $$aarXiv:1509.05996
002054343 041__ $$aeng
002054343 100__ $$aFedorov, Aleksey
002054343 245__ $$aTowards events recognition in a distributed fiber-optic sensor system: Kolmogorov-Zurbenko filtering
002054343 269__ $$c20 Sep 2015
002054343 260__ $$c2015
002054343 300__ $$a4 p
002054343 500__ $$aComments: 4 pages, 4 figures
002054343 520__ $$aThe paper is about de-noising procedures aimed on events recognition in signals from a distributed fiber-optic vibration sensor system based on the phase-sensitive optical time-domain reflectometry. We report experimental results on recognition of several classes of events in a seismic background. A de-noising procedure uses the framework of the time-series analysis and Kolmogorov-Zurbenko filtering. We demonstrate that this approach allows revealing signatures of several classes of events.
002054343 540__ $$barXiv$$uhttp://arxiv.org/licenses/nonexclusive-distrib/1.0/
002054343 595__ $$aLANL EDS
002054343 595__ $$aTitle has not been found in the knowledge base. Please add "INT J OPEN INFORM TECHN" 
002054343 65017 $$2arXiv$$aOther Fields of Physics
002054343 65027 $$2arXiv$$aOther Fields of Physics
002054343 65027 $$2arXiv$$bDetectors and Experimental Techniques
002054343 695__ $$9LANL EDS$$aphysics.optics
002054343 695__ $$9LANL EDS$$aphysics.data-an
002054343 695__ $$9LANL EDS$$aphysics.ins-det
002054343 690C_ $$aARTICLE
002054343 700__ $$aAnufriev, Maxim
002054343 700__ $$aZhirnov, Andrey
002054343 700__ $$aNesterov, Evgeniy
002054343 700__ $$aNamiot, Dmitry
002054343 700__ $$aPnev, Alexey
002054343 700__ $$aKarasik, Valery
002054343 773__ $$c19$$oInt. J. Open Inform. Techn. 3, 19 (2015)$$pInt. J. Open Inform. Techn.$$v3$$y2015
002054343 8564_ $$uhttp://arxiv.org/pdf/1509.05996.pdf$$yPreprint
002054343 916__ $$sn$$w201538
002054343 960__ $$a13
002054343 980__ $$aArticle
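
A minimal sketch of how a dump like the one above could be read into a Python structure. It assumes each line is `<record id> <tag + 2 indicators> <value>` and that subfields start with `$$` followed by a one-character code; the function names are hypothetical and this is not Invenio's own parser.

```python
from collections import defaultdict


def parse_marc_line(line):
    """Split one line into (record_id, tag, indicators, content)."""
    record_id, field, value = line.split(" ", 2)
    tag, indicators = field[:3], field[3:5]
    if not value.startswith("$$"):
        # Control fields (001, 003, 005) carry a plain value.
        return record_id, tag, indicators, value
    # "$$aFoo$$bBar" -> [("a", "Foo"), ("b", "Bar")]
    subfields = [(chunk[0], chunk[1:]) for chunk in value.split("$$")[1:] if chunk]
    return record_id, tag, indicators, subfields


def parse_marc_text(text):
    """Group all fields of a single record by tag, keeping repeated fields."""
    record = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if line:
            _, tag, indicators, content = parse_marc_line(line)
            record[tag].append((indicators, content))
    return dict(record)


sample = """\
002054343 001__ 2054343
002054343 037__ $$aarXiv:1509.05996
002054343 100__ $$aFedorov, Aleksey
002054343 700__ $$aAnufriev, Maxim
002054343 700__ $$aZhirnov, Andrey
002054343 773__ $$c19$$pInt. J. Open Inform. Techn.$$v3$$y2015
"""

record = parse_marc_text(sample)
# First author (100) followed by the co-authors (700).
authors = [
    dict(subfields)["a"]
    for tag in ("100", "700")
    for _, subfields in record.get(tag, [])
]
print(authors)
```

Even this small reader shows the pain point: the meaning of `100`, `700` or `773` lives outside the record.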

Metadata

Data

describes

is less important than

is more important than

IS

{
  "experiment": "CMS",
  "analysis_title": "CMS-ANA-2012-049",
  "creator":[
     {"givenname": "John",
      "family_name": "Ellis",
      "identifiers": [
           {"identifier": "INSPIRE-00146525",
            "scheme": "INSPIRE"},
           {"identifier": "J.R.Ellis.1",
            "scheme": "BAI"}],
      "email": "john.ellis@cern.ch"}
      ],
   "detector_final_state": [
      {"physics_object": "electron",
       "count": "1",
       "definition": "https://twiki.cern.ch/cms/tight-electron",
       "cuts": [{"eta": "ECAL",
		 "pT": ">20 GeV"}],
       "transverse_energy": ">20 GeV",
       "charge": "0",
       "trigger": [
	    {"name": "HLT_T1_test_blah",
	     "run_period": [{
		"start_run": "160501",
		"end_run": "160520"}],
	     "efficiency": "https://twiki.cern.ch/cms/HLT_T1_test_blah/efficiency_measurement"
	   }],
       "veto": [
           {"physics_object": "electron",	
            "definition": "https://twiki.cern.ch/cms/loose-electron",
            "cuts": [{"eta": "ECAL",
		      "pT": "> 20 GeV"}]
           }
       ]
      }],
  "primary_dataset":[
      {"@type": "dcat:Dataset",
       "title": "/Mu/Run2010B-Apr21ReReco-v1/AOD",
       "description": "Mu primary dataset in AOD format from RunB of 2010",
       "licence": "CC0 waiver",
       "persistent_identifiers": [{
           "identifier": "10.7483/OPENDATA.CMS.B8MR.C4A2",
           "scheme": "DOI"}], 
       "issued": "2011-04-26 11:32:43",
       "modified": "2011-05-02 21:22:30",
       "available": 2014,
       "run_number": 146242,
       "dataset_id": 1853590,
       "type": "data",	
       "nevents": "32376291",
       "nlumis": 40485,
       "nfiles": 2979,
       "nblocks": 63,
       "extend": 3208262517610
}],
  "MC_dataset":[
      {"@type": "dcat:Dataset",
       "title": "/ZGToLLG_8TeV-madgraph/Summer12_DR53X-PU_S10_START53_V7A-v1/AODSIM",
       "description": "/ZGToLLG_8TeV-madgraph/Summer12_DR53X-PU_S10_START53_V7A-v1/AODSIM Monte Carlo simulation data from 2012",
       "licence": "CC0 waiver",
       "persistent_identifiers": [{
           "identifier": "10.7483/OPENDATA.CMS.B9MR.C4A2",
           "scheme": "DOI"}],
       "issued": "2012-09-08 16:16:28",     
       "modified": "2012-09-13 11:20:50",
       "available": 2016,
       "run_number": 1,
       "dataset_id": 5877905,
       "type": "mc",	
       "nevents": "6588161",
       "nlumis": 43002,
       "nfiles": 568,
       "nblocks": 3,
       "extend": 2173079702860
}],
  "keyword": "measurement",
  "AOD_processing": [{
        "input_data": [{
		"filename": "/Mu/Run2010B-Apr21ReReco-v1/AOD",
		"url": "http://whatever.com"}],
        "os": [{
		"name": "SLC",
		"version": "5.0"}],
	"software": [{
		"name": "CMSSW",
		"version": "5_2_5",
		"global_tag": "FT_R_42_V10A::All"}],
	"user_code": [{
		"url": "https://github.com/cernopendata/opendata.cern.ch",
		"tag": "no tag"}],
	"run_instructions": [{"type": "readme.rst",
			     "url": "https://github.com/calderona/WW8TeV/blob/master/README.md"}],	
	"output_data": "root://eospublic.cern.ch//eos/opendata/cms/Run2010B/MuOnia/AOD/Apr21ReReco-v1/0000/04E3EC08-B077-E011-ACBD-00215E21DA98.root",
	"comments": "whatever, blah, blurb"}],
  "documentation": [{
	"CADI_ID": "AN2010/264",
	"url": "http://cern.ch/cms/cadi/an2010/264",
	"keyword": "analysis",
	"comment": "I don't know what to write"}], 
  "internal_discussion": [{
		"url": "https://hypernews.cern.ch/HyperNews/CMS/get/SMP-14-016.html"}],
  "presentations": [{
		"url": "https://indico.cern.ch/event/352113/contribution/0/material/slides/0.pdf"}],
  "publication": [{
		"persistent_identifiers": [
			{"identifier": "10.1103/PhysRevLett.111.196401",
			 "scheme": "DOI"},
			{"identifier": "1309.2778",
			 "scheme": "arXiv number"}],
		"journal_title": "Phys. Rev. Lett.",
		"journal_year": 2013,
		"journal_volume": 111,
		"journal_issue": 19,
		"journal_page": 196401}]	
}

Schema Registry

available as Invenio-JSONSchemas

auto-generated forms:

drop-downs:

(basic) validation:

search:
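
To make "(basic) validation" concrete, here is a rough sketch of checking a record against a schema. It is a hand-rolled validator for a tiny subset of JSON Schema (`type`, `required`, `properties`, `items`, `enum`), not the Invenio-JSONSchemas API; the schema and the experiment list are assumptions for illustration. The `enum` keyword is also the kind of information a form generator could turn into a drop-down.

```python
# Only the JSON Schema types used in this example are mapped.
TYPES = {"object": dict, "array": list, "string": str, "integer": int}


def validate(instance, schema, path="$"):
    """Return a list of error messages; an empty list means valid."""
    errors = []
    expected = schema.get("type")
    if expected:
        ok = isinstance(instance, TYPES[expected])
        if expected == "integer" and isinstance(instance, bool):
            ok = False  # bool is a subclass of int in Python
        if not ok:
            return [f"{path}: expected {expected}"]
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required property '{key}'")
        for key, sub in schema.get("properties", {}).items():
            if key in instance:
                errors += validate(instance[key], sub, f"{path}.{key}")
    if isinstance(instance, list) and "items" in schema:
        for i, item in enumerate(instance):
            errors += validate(item, schema["items"], f"{path}[{i}]")
    if "enum" in schema and instance not in schema["enum"]:
        errors.append(f"{path}: {instance!r} is not one of {schema['enum']}")
    return errors


# Hypothetical schema, loosely modelled on the example record.
schema = {
    "type": "object",
    "required": ["experiment", "analysis_title"],
    "properties": {
        "experiment": {"type": "string",
                       "enum": ["ALICE", "ATLAS", "CMS", "LHCb"]},
        "analysis_title": {"type": "string"},
        "publication": {
            "type": "array",
            "items": {"type": "object",
                      "properties": {"journal_year": {"type": "integer"}}},
        },
    },
}

good = {"experiment": "CMS", "analysis_title": "CMS-ANA-2012-049",
        "publication": [{"journal_year": 2013}]}
bad = {"experiment": "CMS2", "publication": [{"journal_year": "2013"}]}

for message in validate(bad, schema):
    print(message)
```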

Record Editor

own high-performance implementation

validation:

type handling:

list ordering:

+ keyboard shortcuts, auto-collapse,
(basic) patch generation
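
A sketch of what basic patch generation can look like: diff the record before and after editing and emit a list of operations. The output format here follows JSON Patch (RFC 6902) with JSON Pointer paths, which is an assumption; the slides do not say which format the editor produces. Lists are treated as atomic values to keep the example short.

```python
def escape(token):
    """Escape a key for use in a JSON Pointer (RFC 6901)."""
    return str(token).replace("~", "~0").replace("/", "~1")


def make_patch(old, new, path=""):
    """Return JSON-Patch-style operations that turn `old` into `new`."""
    ops = []
    if isinstance(old, dict) and isinstance(new, dict):
        for key in old:
            if key not in new:
                ops.append({"op": "remove", "path": f"{path}/{escape(key)}"})
        for key, value in new.items():
            child = f"{path}/{escape(key)}"
            if key not in old:
                ops.append({"op": "add", "path": child, "value": value})
            else:
                ops += make_patch(old[key], value, child)
    elif old != new:
        # Scalars and lists are replaced wholesale.
        ops.append({"op": "replace", "path": path, "value": new})
    return ops


before = {"experiment": "CMS", "keyword": "measurement",
          "os": {"name": "SLC", "version": "5.0"}}
after = {"experiment": "CMS",
         "os": {"name": "SLC", "version": "6.0"},
         "comments": "updated OS"}

patch = make_patch(before, after)
for op in patch:
    print(op)
```

Sending only the patch instead of the full record keeps requests small for large records.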

10x faster than JSON Editor

Analysis Preservation

includes A+ TLS setup 😎

OAuth login

CERN branding

shiny vector logos

strip down

thanks to CDS 😉

CMS Statistics Questionnaire

Logo?

* not my design

The small parts

security improvements

  • safe serialization,
  • TLS support,
  • read-only code access,
  • proper RNG,
  • check session before asking Redis
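
Two of these points can be illustrated with the Python standard library. This is a generic sketch, not the actual Invenio change: it assumes "proper RNG" means drawing tokens from a cryptographically secure source and "safe serialization" means avoiding formats such as pickle, whose loader can execute code from untrusted input. The function names are hypothetical.

```python
import json
import secrets


def new_session_id(nbytes=32):
    """Create an unguessable session id from the OS's secure RNG."""
    return secrets.token_urlsafe(nbytes)


def dump_session(data):
    """Serialize session data as JSON bytes (plain data only, no code)."""
    return json.dumps(data, sort_keys=True).encode("utf-8")


def load_session(blob):
    """Parse JSON bytes; unlike pickle, this cannot run arbitrary code."""
    return json.loads(blob.decode("utf-8"))


session_id = new_session_id()
blob = dump_session({"user": "jdoe", "roles": ["cms"]})
print(len(session_id), load_session(blob))
```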

frontend work

  • SVG logos
  • UX improvements
  • better look
  • small bug fixes

documentation

  • Docker on nearly every setup and use case
  • TLS config
  • binding vs URL

misc

  • unbreak master with Leo
  • faster tests
  • a lot of bug fixes
  • install coffee machine
  • operate jukebox

?

Sathorn under construction, CC BY-SA 2.0, 2015, m-louis, https://www.flickr.com/photos/m-louis/18849197524/

Mueller House Condominiums Framing, CC BY-SA 2.0, 2011, Garreth Wilcock, https://www.flickr.com/photos/gjmj/6334296225/

Houses all in a row, CC BY 2.0, 2007, karol m, https://www.flickr.com/photos/byrdiegyrl/1097366731/

All icons, logos, plots, graphics and trademarks are the property of their respective owners.

Thank you!