Jaime Arias, Pierre-Antoine Bouttier, Marcelo Forets

## What is a Hackathon?

General principle

• As in a marathon, an intensive effort...

• ...to try to solve a problem...

• Involving several people (and a competition).

In real life

• A team (and no competition)...

• ...to try to solve a problem (generally in computer science)...

• ...in a limited amount of time.

## And "our Hackathon"?

Originally, the exercise was proposed by GENCI.

Details

• Around each national or regional HPC center

• A team of students/researchers/engineers

• 2 days

• To transform/optimize a real numerical code so that it runs efficiently on HPC clusters

## The Grenoble Team...

Jaime Arias (Research engineer, Mistis, Inria)

Marcelo Forets (Post-doc researcher, Tempo, Verimag)

## And the subject.

• A proposal (among others) from L. Simula (Prof. at ENS Lyon)

• A code that helps to study the mechanisms that determine an optimal income tax when 2 countries play a Nash game.

Etienne Lehmann, Laurent Simula, Alain Trannoy; Tax me if you can! Optimal Nonlinear Income Tax Between Competing Governments, The Quarterly Journal of Economics, Volume 129, Issue 4

The code was written in the Mathematica language

...We were (and, in fact, still are) not specialists in these scientific research fields!

## What were our objectives?

• To rewrite the code in a more "HPC friendly" language...

• ...with the further aim of making it easier to develop and run.

• To run it on a GriCAD HPC cluster and to see what an incredible job we did in greatly improving its performance (mainly run time).

Obstacles: time, a scientific context unknown to us, and the move from symbolic to numerical computing.

## The main lines of our work process

• Translating the original code into Python

• Testing and refactoring the new code

• Optimizing the performance (CPU time)

• Crossing our fingers (all along the process, in fact)

## Spoiler

We have produced a Python code which:

• Gives numerical results close to those of the original code

• Runs on the Froggy machine (a GriCAD HPC cluster)

• Exploits multiple cores (on a single node)

• Runs up to roughly 20 times faster than the original code

## Focus

Now, we focus on some specific aspects of this exercise:

• What are the collaborative tools that we have used?

• What was our strategy about testing and optimizing our code?

• What have we learnt?

## Collaborative Tools for Scientists

Some desirable features:

• Write and execute code
• Nice visualization capabilities
• Real-time conversations for the team
• Revision control system
• Collaborative LaTeX editing
• Responsive (no lags, etc.)
• Comfortable for reading and reviewing code
• Enhanced text editing (e.g. Markdown, Sphinx)

## Git and GitLab

• Most widespread revision control system
• GitLab: web interface for project management
• "Issues" tracker
• Local server is available (Univ. Grenoble Alpes)

## CoCalc (Collaborative Calculation in the Cloud)

• Real-time collaborative LaTeX editing
• Jupyter notebooks with embedded chat
• Linux terminal

## Terminal (how we actually launched our app)

• Secure authentication (SSH), can be used outside the university
• Steps:
1. Connect to CIMENT
2. Connect to the computing server (Froggy)
3. Load the required modules (module avail lists them, module load loads them)
4. Run the computation (with oarsub)

## Joblib, a Python library for parallel for loops using multiprocessing

• Joblib: running Python functions as jobs on several cores

• Aim: to provide tools to easily achieve better performance and productivity when working with long running jobs

• Easy to use! pip installable; docs & examples easy to google

## Joblib, usage and performance

• An "embarrassingly parallel" illustrative example, with results on an 8-core node (Froggy)

[Figure: serial code and parallel code listings]
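
The original listings are not reproduced here. As a minimal sketch of what a serial loop and its Joblib counterpart can look like, the example below uses `joblib.Parallel` and `joblib.delayed`. The function `slow_square_root`, the 0.05 s sleep, and the choice of 4 workers are illustrative assumptions, not the code that was run on Froggy.

```python
import math
import time

from joblib import Parallel, delayed


def slow_square_root(x):
    """Hypothetical stand-in for an expensive, independent computation."""
    time.sleep(0.05)  # simulate work
    return math.sqrt(x)


inputs = range(16)

# Serial version: a plain loop over the inputs.
start = time.perf_counter()
serial_results = [slow_square_root(x) for x in inputs]
serial_time = time.perf_counter() - start

# Parallel version: the same calls, dispatched to 4 worker processes.
# Results come back in the same order as the inputs.
start = time.perf_counter()
parallel_results = Parallel(n_jobs=4)(
    delayed(slow_square_root)(x) for x in inputs
)
parallel_time = time.perf_counter() - start

print(f"serial:   {serial_time:.2f} s")
print(f"parallel: {parallel_time:.2f} s")
```

Because each call is independent of the others, the loop body needs no changes to be parallelized. The measured gain depends on the machine and on the cost of starting the worker processes.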

## From Mathematica to Python

Mathematica code:

• Symbolic computation
• Nested loops
• Sequential code
• Recomputation of the same values, no functions
• IO + plotting + computation operations mixed
• Notebook
• Lack of naming convention
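
Nested loops that recompute the same values are a typical target for vectorization with NumPy. The sketch below is a toy illustration only: the functions `grid_loops` and `grid_vectorized` and the expression `exp(-x) * sin(y)` are assumptions chosen for the example and do not come from the original code.

```python
import numpy as np


def grid_loops(xs, ys):
    """Nested loops: exp(-x) and sin(y) are recomputed for every (x, y) pair."""
    out = np.empty((len(xs), len(ys)))
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            out[i, j] = np.exp(-x) * np.sin(y)
    return out


def grid_vectorized(xs, ys):
    """Each factor is computed once per axis, then combined by broadcasting."""
    fx = np.exp(-xs)                  # shape (n,)
    gy = np.sin(ys)                   # shape (m,)
    return fx[:, None] * gy[None, :]  # shape (n, m)


xs = np.linspace(0.0, 1.0, 50)
ys = np.linspace(0.0, np.pi, 40)

# Both versions should agree up to floating-point rounding.
assert np.allclose(grid_loops(xs, ys), grid_vectorized(xs, ys))
```

The vectorized version evaluates `exp` 50 times and `sin` 40 times instead of 2000 times each, and the remaining multiplication runs inside NumPy rather than in the Python interpreter.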

## Python Package

Python code:

• Vectorized code (numpy)
• Parallel code (joblib)
• Parametric
• Functions and modules
• Python 2 and 3
• Micro-testing (checking variables' values)
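
One way to do this kind of micro-testing is to compare the value of a variable against a reference value within a tolerance. The sketch below assumes a hypothetical ported function, `ported_function`, and hand-computed reference values. In practice the references would be copied from the output of the original code.

```python
import numpy as np


def ported_function(x, slope=0.3):
    """Hypothetical stand-in for a function translated from the notebook."""
    x = np.asarray(x, dtype=float)
    return slope * x / (1.0 + x)


# Reference values for x = 0, 1, 2 (computed by hand for this toy function).
REFERENCE = np.array([0.0, 0.15, 0.2])


def test_ported_function_matches_reference():
    computed = ported_function([0.0, 1.0, 2.0])
    # Exact equality is too strict for floating-point results,
    # so the comparison uses a relative tolerance.
    np.testing.assert_allclose(computed, REFERENCE, rtol=1e-6)


test_ported_function_matches_reference()
```

A test runner such as pytest would pick up the `test_` function automatically. The explicit call on the last line just makes the snippet runnable on its own.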

## Performance

| Cores | Mathematica | Python |
|-------|-------------|--------|
| 1     | ~107 s      | ~14 s  |
| 4     | -           | ~6 s   |
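
The speedups implied by these timings can be worked out directly. The snippet below only does arithmetic on the three approximate values in the table. The variable names are illustrative, and the "parallel efficiency" figure is a derived quantity, not something reported in the original.

```python
# Approximate run times from the table, in seconds.
mathematica_1_core = 107.0
python_1_core = 14.0
python_4_cores = 6.0

# Gain from the rewrite alone (1 core in both cases).
speedup_port = mathematica_1_core / python_1_core
# Gain from the rewrite plus parallelism.
speedup_total = mathematica_1_core / python_4_cores
# Gain from parallelism alone, and the fraction of the ideal 4x reached.
speedup_parallel = python_1_core / python_4_cores
efficiency = speedup_parallel / 4

print(f"port only:           {speedup_port:.1f}x")      # 7.6x
print(f"port + 4 cores:      {speedup_total:.1f}x")     # 17.8x
print(f"4 cores vs 1 core:   {speedup_parallel:.1f}x")  # 2.3x
print(f"parallel efficiency: {efficiency:.0%}")         # 58%
```

Most of the gain therefore comes from the translation and vectorization, with parallelism adding roughly another factor of two on 4 cores.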

## What have we learnt?

Short answer: a lot of things.

• Collaborative tools and their real-time use (CoCalc, Git, Python notebooks, Markdown/LaTeX editing)

• Symbolic to numerical computing

• Hardware and software architecture of an HPC cluster

## What have we learnt?

• We did not know each other

• We do not have the same scientific or technical interests

• We felt that our work was useful

• We had a good time

## And now?

It may be a good idea to repeat and promote this kind of exercise

• Locally (in the Grenoble-Alpes university environment)

• Regularly (once a year?)

• Open to all scientific and technical communities

## To sum up ...

• This exercise is an excellent framework to learn!
• Develop technical skills
• Develop communication skills
• Interdisciplinary
• Promote the use of CIMENT's HPC infrastructure
• Apply this activity with students of different levels
• Interact with the team that proposed this project, and further develop our initial Python implementation