Introduction to version control with git and GitHub

Stefano Moia

École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland; physiopy (https://github.com/physiopy)

11.03.2022

smoia
@SteMoia
s.moia.research@gmail.com
Stefano Moia, 2022

1. Version control systems and git workflow

2. Going public: Licence, DOI, contributors

3. Automation: CI/CD, tests, and releases

Do any of these situations look familiar?

I can't work on that project now because my colleague/friend/dog is working on [a different part than what I'd modify of] it at the moment...

Version Control

Version control systems are a way to manage and track changes to files.

Content

Aggregation/delivery

A classic git(Hub) flow

[Diagram of the branches "Main", "Dev", and "Bug", with the following steps labelled:]

  • Initialise repository ("Main" branch)
  • Create branch "dev"
  • Commit
  • Merge dev into main
  • Diverging main: conflict?
  • Merge main into dev
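
The steps on this slide can be replayed end to end in a throwaway directory. The block below is only a sketch: it drives the ordinary git command line from Python, and the `git` and `classic_flow` helpers, the file names, and the commit identity are all made up for illustration. It assumes git is installed and skips quietly when it is not.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Throwaway identity (an assumption) so commits work where git is not configured.
IDENTITY = ["-c", "user.name=Example Author", "-c", "user.email=author@example.org"]


def git(repo, *args):
    """Run one git command inside `repo` and return what it printed."""
    result = subprocess.run(
        ["git", *IDENTITY, *args], cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def classic_flow(repo):
    """Initialise, branch, commit, let main diverge, then merge both ways."""
    repo = Path(repo)

    # Initialise repository, with a first commit on the "main" branch.
    git(repo, "init")
    (repo / "README.md").write_text("My project\n")
    git(repo, "add", "README.md")
    git(repo, "commit", "-m", "Initial commit")
    git(repo, "branch", "-M", "main")  # make sure the branch is called "main"

    # Create branch "dev" and commit some work on it.
    git(repo, "checkout", "-b", "dev")
    (repo / "feature.txt").write_text("new feature\n")
    git(repo, "add", "feature.txt")
    git(repo, "commit", "-m", "Add feature")

    # Meanwhile main diverges: somebody commits there too.
    git(repo, "checkout", "main")
    (repo / "README.md").write_text("My project\nNow with docs\n")
    git(repo, "commit", "-am", "Update README on main")

    # Merge main into dev first (no conflict here: different files were edited).
    git(repo, "checkout", "dev")
    git(repo, "merge", "-m", "Merge main into dev", "main")

    # Merge dev into main; main is now behind dev, so this is a fast-forward.
    git(repo, "checkout", "main")
    git(repo, "merge", "dev")
    return git(repo, "log", "--oneline").splitlines()


if shutil.which("git"):  # skip quietly where git is not installed
    with tempfile.TemporaryDirectory() as tmp:
        for entry in classic_flow(tmp):
            print(entry)
```

Had both branches edited the same lines of the same file, the first merge would have stopped with a conflict to resolve by hand, which is why it is usually more comfortable to merge main into dev before merging dev into main.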

A classic git(Hub) flow

[Diagram of the same flow spread over three repositories, "Upstream", "Origin", and "(local)", each with "Main" and "Dev" branches. It shows the same steps as the previous slide, plus:]

  • Fork ("upstream" vs "origin")
  • Clone (local repository)
  • Pull from upstream
  • Merge origin/main into dev
  • Pull Request
  • Pull from *
  • Push to *
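
The fork-based flow can also be rehearsed offline. In the sketch below, two bare repositories on disk stand in for the GitHub copies ("upstream" and "origin"); on GitHub the fork is created with the Fork button and the final Pull Request is opened in the web interface, so neither is a git command. The helper names (`git`, `commit_file`, `fork_flow`), the file names, and the commit identity are illustrative assumptions, and git must be installed.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Throwaway identity (an assumption) so commits work where git is not configured.
IDENTITY = ["-c", "user.name=Example Author", "-c", "user.email=author@example.org"]


def git(repo, *args):
    """Run one git command inside `repo` and return what it printed."""
    result = subprocess.run(
        ["git", *IDENTITY, *args], cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def commit_file(repo, name, text, message):
    """Write a file, stage it, and commit it."""
    (Path(repo) / name).write_text(text)
    git(repo, "add", name)
    git(repo, "commit", "-m", message)


def fork_flow(workdir):
    workdir = Path(workdir)
    seed, upstream = workdir / "seed", workdir / "upstream.git"
    origin, local = workdir / "origin.git", workdir / "local"

    # Stand-in for the shared project ("upstream").
    seed.mkdir()
    git(seed, "init")
    commit_file(seed, "README.md", "Shared project\n", "Initial commit")
    git(seed, "branch", "-M", "main")
    git(workdir, "clone", "--bare", str(seed), str(upstream))

    # Fork: your own server-side copy ("origin").
    git(workdir, "clone", "--bare", str(upstream), str(origin))

    # Clone: the local repository; git names its source remote "origin".
    git(workdir, "clone", str(origin), str(local))
    git(local, "remote", "add", "upstream", str(upstream))

    # Work on a "dev" branch.
    git(local, "checkout", "-b", "dev")
    commit_file(local, "feature.txt", "new feature\n", "Add feature")

    # Meanwhile, somebody else adds a commit to upstream.
    commit_file(seed, "CHANGELOG.md", "Upstream change\n", "Upstream change")
    git(seed, "push", str(upstream), "main")

    # Pull from upstream into the local main, then push to origin.
    git(local, "checkout", "main")
    git(local, "pull", "--ff-only", "upstream", "main")
    git(local, "push", "origin", "main")

    # Merge origin/main into dev, then push dev to origin.
    git(local, "checkout", "dev")
    git(local, "merge", "-m", "Merge origin/main into dev", "origin/main")
    git(local, "push", "origin", "dev")

    # Next step on GitHub: open a Pull Request from origin's dev to upstream's main.
    return git(origin, "branch", "--format=%(refname:short)").splitlines()


if shutil.which("git"):  # skip quietly where git is not installed
    with tempfile.TemporaryDirectory() as tmp:
        print(fork_flow(tmp))
```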

Pull requests and Reviews

Some suggestions for...

... Authors

  • Keep your contribution small and focused
  • Make your contribution as clear as possible
  • Use a review as a learning experience
  • Be patient: reviewers might ask you for more work than you expected, but it's always to improve your work.

... Reviewers

  • Be kind and patient
  • Don't limit your review to the apparent changes - depending on the importance of the review, take the time to look at how the whole project might change.
  • Keep your review to what's necessary for the contribution - if it would be nice to ..., open an issue (or think about making the contribution yourself).

File history

Take home #1

Working with git(Hub) allows you to:

  1. work in parallel on new features without disrupting the "main" version of your project

  2. track changes over time.

Bonus: it can force a team to double check projects!

2. Going public: Licence, DOI, contributors

License your work

A work that is not licensed is not public (paradox!)

There are many (open source) licences to pick from, and not only for code.

www.choosealicense.org

The licence should be in the first commit you make.

Personal picks for science: Apache 2.0 and CC-BY-ND-4.0
(consider L-GPLv3.0 and CC-BY-4.0 too)

Make your project identifiable (and citable)

{
    "license": "Apache-2.0", 
    "title": "physiopy/phys2bids: BIDS formatting of physiological recordings",
    "upload_type": "software",
    "creators": [
      [...]
        {
            "orcid": "0000-0002-7796-8795",
            "affiliation": "Florida International University", 
            "name": "Katie Bottenhorn"
        }, 
      [...]
    ], 
    "access_right": "open"
}
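
Metadata like the block above is plain JSON, so it can be written and sanity-checked with a few lines of Python. The following is a minimal sketch, assuming the file is saved as `.zenodo.json` in the repository root, which is where Zenodo's GitHub integration looks for it. The `REQUIRED_KEYS` set and the `check_metadata` helper are illustrative choices based on the keys shown on the slide, not part of any Zenodo API.

```python
import json
import tempfile
from pathlib import Path

# Keys shown on the slide; treating them all as required is an assumption.
REQUIRED_KEYS = {"license", "title", "upload_type", "creators", "access_right"}


def check_metadata(path):
    """Load a metadata file and return the sorted list of missing entries."""
    metadata = json.loads(Path(path).read_text())
    missing = REQUIRED_KEYS - set(metadata)
    # Every creator should at least have a name.
    for creator in metadata.get("creators", []):
        if "name" not in creator:
            missing.add("creators[].name")
    return sorted(missing)


# Same content as the slide, limited to the one creator it spells out.
metadata = {
    "license": "Apache-2.0",
    "title": "physiopy/phys2bids: BIDS formatting of physiological recordings",
    "upload_type": "software",
    "creators": [
        {
            "orcid": "0000-0002-7796-8795",
            "affiliation": "Florida International University",
            "name": "Katie Bottenhorn",
        }
    ],
    "access_right": "open",
}

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / ".zenodo.json"
    target.write_text(json.dumps(metadata, indent=4))
    print(check_metadata(target))  # prints [] when nothing is missing
```

A check like this could run as one of the automated tests of the project, so that a broken metadata file is caught before a release.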

Publish!

If your project is software related, think about publishing it.

While there are various journals that can be targeted for a software publication, JOSS (the Journal of Open Source Software) is free and fully integrated with GitHub.

Not all contributions are the same...

Whatever its kind, a project can receive different types of contributions.

Different communities can have different requirements or follow different workflows.

Enter the contributors' guidelines
(and a code of conduct).

... but all should be recognised

Depending on the community and the governance scheme, contributions might be recognised differently. Explicitly writing down how the community does it helps new contributors to join.

One way of recognising contributors is the all-contributors specification.

Take home #2

Go public:

  1. license your work,

  2. assign a DOI to it,

  3. recognise all contributions!

3. Automation: CI/CD, tests, and releases

Let automation do the hard work

CI/CD workflows are your friend.

  • Continuous Integration: frequently integrating new changes into the main branch of a tool. Normally, workflows run automatic steps at each integration, e.g. automatic testing.
     
  • Continuous Deployment: frequently deploying (releasing) new versions of a tool using automated workflows (e.g. right after integration).

Workflows can require a bit more work to be set up, but they can save a lot of time and energy in the long run!

Test your code

Testing a project is as important as developing it.

Arguably, it's even more important, so spend time on it!

There are multiple types of tests:

  • User tests: a person (user) uses the tool and reports on it.
  • Automated tests: developers write tests to be run after each change.
    • Unit tests: they test (new) parts of the project on their own.
    • Integration/End-to-end tests: they test the project as a whole.
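
To make the difference between the automated test types concrete, here is a small sketch in Python. The "tool" (`parse_line`, `mean`, `summarise`) is a toy invented for this example; the test functions follow the naming style that test runners such as pytest pick up automatically, but they can also be called directly, as the last lines do.

```python
def parse_line(line):
    """Turn a comma-separated line such as '1, 2, 3' into a list of floats."""
    return [float(item) for item in line.split(",")]


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("mean() needs at least one value")
    return sum(values) / len(values)


def summarise(line):
    """The whole 'tool': parse a line of text and report the mean."""
    return f"mean={mean(parse_line(line)):.2f}"


# Unit tests: each one exercises a single function on its own.
def test_parse_line():
    assert parse_line("1, 2,3") == [1.0, 2.0, 3.0]


def test_mean():
    assert mean([1, 2, 3]) == 2


def test_mean_rejects_empty_input():
    try:
        mean([])
    except ValueError:
        pass
    else:
        raise AssertionError("expected a ValueError")


# Integration (end-to-end) test: runs the tool as a whole, from raw input to final output.
def test_summarise_end_to_end():
    assert summarise("1, 2, 3") == "mean=2.00"


if __name__ == "__main__":
    for test in (
        test_parse_line,
        test_mean,
        test_mean_rejects_empty_input,
        test_summarise_end_to_end,
    ):
        test()
    print("all tests passed")
```

When a unit test fails, the faulty function is pinpointed immediately; when only the end-to-end test fails, the problem is more likely in how the parts are combined.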

Release

Making your project public is important: it will attract more users/readers and (possibly) more contributors.

Create a "release", and if the project is code-related, package it and distribute it!

Let's not reinvent the wheel

Take advantage of the marketplace: there is a very high probability that what you are looking for is already available.

Take home #3

Set up workflows to test, build, and release/publish your project.
They will not only help you, but also increase the stability and reliability
of your outcome.

That's all folks!

Thanks to...

...you for the (sustained) attention!

... the physiopy contributors

... the MIP:Lab @ EPFL

Take home messages

1. Working with git(Hub) allows you to work in parallel on new features without disrupting the "main" version of your project and to track changes over time.

2. Go public: license your work, assign a DOI to it, and recognise all contributors!

3. Set up workflows to test, build, and release/publish your project. They will not only help you, but also increase the stability and reliability of your outcome.

Any questions [/opinions/objections/...]?

Oh, and don't forget!
