NAIA

Ntuples for AMS-Italy Analysis

Introduction to the framework and data-format

Motivations

1. Resource optimization

With each group producing its own set of ntuples, a lot of data is replicated on disk, which wastes resources.

Groups also compete for computing resources for ntuple production.

2. Code exchange

Using the same data makes it easier to exchange selections and algorithms/procedures.

3. Reproducibility / readability

Custom data formats are most often produced in a custom way, with custom processing.

Additionally, in many cases only the code owner can easily understand what is going on in the analysis code.

# MOTIVATION

Driving principles

# PRINCIPLES

Don't throw anything out
- This means processing and saving all the events from the original AMS-Root files
Don't require network access
- All the needed data should be inside NAIA files (e.g. no online access to RTI csv files on cvmfs)
Try to cover at least 90% of use-cases
- Variable list comes from an internal survey including every analysis group
- For special kinds of analysis that need specialized variables, we plan to support user-defined Tree-friending

Don't read what you don't need
- Only perform I/O reads when variables are accessed. Allow uninteresting events to be skipped before any branch is read.
Easy to understand
- Code should be readable and expressive.
- Variable names and usage should make the programmer's intention clear, at least at an intuitive level.
- Function names should be descriptive and hint at what the function returns.
Easy to use
- Automatic installation for local development. CVMFS binary releases for usage on clusters.
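
The "Don't read what you don't need" principle can be illustrated with a small, self-contained sketch. This is not NAIA code: `OnDemand`, `TrackerData` and `CountReads` are hypothetical names invented for the illustration, and the "read" is simulated by a callback instead of real branch I/O. The sketch only shows the general idea of deferring a read until a variable is first accessed, so that events rejected by a cheap cut cost no I/O at all.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for a group of variables stored in one branch.
struct TrackerData {
  double rigidity;
};

// Hypothetical wrapper: the payload is loaded only on first access.
template <typename T>
class OnDemand {
public:
  explicit OnDemand(std::function<T()> loader) : loader_(std::move(loader)) {}

  // The first access after Reset() triggers the (simulated) read;
  // later accesses reuse the cached value.
  const T *operator->() {
    assert(loader_ && "a loader callback must be provided");
    if (!loaded_) {
      value_ = loader_();
      loaded_ = true;
      ++reads_;
    }
    return &value_;
  }

  // Invalidate the cache, e.g. when moving on to the next event.
  void Reset() { loaded_ = false; }

  // Number of simulated reads performed so far.
  int Reads() const { return reads_; }

private:
  std::function<T()> loader_;
  T value_{};
  bool loaded_ = false;
  int reads_ = 0;
};

// Toy event loop: returns how many simulated reads were needed.
int CountReads(int nEvents) {
  int current = 0;
  OnDemand<TrackerData> tracker(
      [&current] { return TrackerData{static_cast<double>(current)}; });

  double sum = 0.0;
  for (current = 0; current < nEvents; ++current) {
    tracker.Reset();
    if (current % 2 != 0)
      continue;               // cheap cut: odd events are skipped, nothing is read
    sum += tracker->rigidity; // first access: triggers one read
    sum += tracker->rigidity; // second access: served from the cache
  }
  (void)sum; // the sum itself is irrelevant for this illustration
  return tracker.Reads();
}
```

In this toy setup, `CountReads(10)` performs 5 reads: only the five even-numbered events are ever loaded, and each of them is loaded once even though it is accessed twice.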

Getting started

# STARTING

Requirements:

A C++ compiler with full C++14 support
(tested with GCC 9.3.0 and higher)
CMake version 3.13 or higher
ROOT version 6.22 or higher compiled with C++14 support
(suggested 6.26/02)

This mainly applies if you want to install NAIA on your personal machine. For distributed use (CNAF / CERN), all requirements and NAIA binaries are distributed via CVMFS under

/cvmfs/ams.cern.ch/Offline/amsitaly/public/install/x86_64-centos7-gcc9.3/naia

and the correct environment can be set up with a dedicated script:

/cvmfs/ams.cern.ch/Offline/amsitaly/public/install/x86_64-centos7-gcc9.3/naia/v1.0.0/setenvs/setenv_gcc6.26_cc7.sh

If you are building NAIA on your own machine, the installation is straightforward:

# clone NAIA code 
git clone ssh://git@gitlab.cern.ch:7999/ams-italy/naia.git -b v1.0.1 # (clone via SSH)
# setup build and final install directories 
mkdir naia.build naia.install
# build NAIA
cd naia.build
cmake ../naia -DCMAKE_INSTALL_PREFIX=../naia.install
make all install

To use the NAIA ntuples your project will need:

the headers in naia.install/include
the naia.install/lib/libNAIAUtility.so library
the naia.install/lib/libNAIAContainers.so library
the naia.install/lib/libNAIAChain.so library

The NAIA data model

# DATA MODEL

Our data model starts with the NAIAChain object.

This is the main way to open a NAIA ROOT file. It takes care of loading all the relevant TTrees and of setting up what we call the "read-on-demand" mechanism (more on this later).

Example:

// ...
#include "Chain/NAIAChain.h"

int main(int argc, char const *argv[]) {
  // Create a chain object
  NAIA::NAIAChain chain;
  // add one or more files to it
  chain.Add("somefile.root");
  // set up the read-on-demand mechanism (N.B.: this step is mandatory!)
  chain.SetupBranches();
}

Once your chain is created and ready to use, you can easily loop over all the events in the chain with the help of the Event class:

// ...
#include "Chain/NAIAChain.h"

int main(int argc, char const *argv[]) {

  NAIA::NAIAChain chain;
  chain.Add("somefile.root");
  chain.SetupBranches();

  // Event loop!
  for (NAIA::Event &event : chain) {
    // your analysis here :)
  }
}

(If needed, you can use the NAIAChain::GetEvent() method for index-based looping.)