CABLE SVN-Git Transition

User Training 2023

What's happening?

  • We are transitioning CABLE development from SVN to Git.
  • This will streamline the model contribution and release process.
  • The SVN repository will be put into READ ONLY mode in November 2023.
  • All development from November 2023 will be managed through GitHub.

What is Git?

Git is a distributed version control system (DVCS) that enables collaborative development on shared codebases.

It is one of the most widely-used version control systems in the world.

© Jason Long, CC Attribution 3.0

What is GitHub?

GitHub is a web-based platform that hosts Git repositories and adds collaborative features such as issue tracking, release management and continuous integration.

GitHub is not Git, but has become synonymous with it. The two words are often used interchangeably in the wild.

© GitHub, Inc.

Git vs. SVN

Aspect | Git | SVN
Distribution | Distributed: each developer has a complete copy. | Centralised: there is a single central repository.
Structure / History | Every copy IS a repository itself and has the whole history. | The central repository holds the history.
Commits | Identified by a hash. | Numbered sequentially (revision numbers).
Branching / Merging | Quick and (mostly) painless. | More involved; requires planning.
Online / Offline | Commits can be made locally, offline, before pushing to a remote/central repository. | A connection to the central repository is required to commit changes.

Key concepts (Git)

Concept | Description
Repository (repo) | A place where code and its history live.
Branch | A parallel line of development, allowing you to work on code in isolation from the main codebase.
Commit | A snapshot of the changes made to a repository, with a message describing them.
Merge | The act of reconciling changes made in one branch into another.
Clone | A local copy of a remote repository, including its full history.
Checkout | The act of switching to a different branch or commit, or restoring files to a particular version.
Pull | The act of fetching changes from another repository (e.g. remote to local) and merging them into your current branch.
Push | The act of sending your commits to another repository (e.g. local to remote).

There are many more concepts surrounding Git, but let’s just start with the basics.

Branching

  • Branching allows parallel development isolated from the main codebase.
  • Branches are used to develop new features, address bugs, prepare releases etc.
  • They are merged back into the main codebase once work is complete.
  • Once merged, they can be safely removed.

Merging

  • Bringing the changes from one branch into another.
  • Two main use cases for merging:
    • Update a feature branch with new commits from the main branch.
    • Bring changes from a feature branch back into the main branch. This is usually done through the GitHub interface (via a pull request) rather than locally on the command line.
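The two use cases above can be sketched locally. This is a minimal, self-contained sketch run in a throwaway directory; the branch name `feature`, the file names and the placeholder identity are hypothetical, and it assumes Git 2.28 or newer (for `git init -b`).

```shell
set -e
# Throwaway demo repository (hypothetical).
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "v1" > model.txt
git add model.txt
git commit -q -m "Initial commit."

# Start a feature branch and do some work on it.
git checkout -q -b feature
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "Add feature.txt."

# Meanwhile, main moves on.
git checkout -q main
echo "fix on main" > fix.txt
git add fix.txt
git commit -q -m "Add fix.txt."

# Use case 1: update the feature branch with new commits from main.
git checkout -q feature
git merge -q --no-edit main

# Use case 2: bring the feature back into main
# (for CABLE this is normally done through a GitHub pull request).
git checkout -q main
git merge -q --no-edit feature
ls
```

Because `feature` already contains everything on `main` after use case 1, the second merge is a simple fast-forward.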

3 Ways to create a Git repository

  1. Locally (git init).
  2. Creating a new repository remotely (e.g. on GitHub), then cloning it.
  3. Cloning an existing remote repository.

 

We will cover (3) in the training today.

 

Further reading:

  • https://www.atlassian.com/git/tutorials/setting-up-a-repository/git-init
  • https://docs.github.com/en/get-started/quickstart/create-a-repo

Key concepts (GitHub)

Concept | Description
Repository (repo) | A shared, remote place where code lives on GitHub.
Issue | An issue/task describing a proposed change to that repository.
Commit | A commit pushed to the remote repository with a meaningful message, typically in response to an issue.
Pull Request | A request to merge proposed changes from one branch into another, with review and discussion on GitHub.
Review | Constructive discussion/feedback on a proposed change.

Again, there are many more concepts surrounding GitHub, but let’s just start with the basics.

Basic Commands

  • git init
  • git clone
  • git fetch
  • git add
  • git commit
  • git merge
  • git status
  • git log
  • git pull
  • git push
  • git checkout
  • git branch
  • git rm

https://git-scm.com/docs

git init

# Create a directory to hold your repository.
mkdir -p ~/myrepo

# Move into it.
cd ~/myrepo

# Initialise the repository.
git init

git clone / git fetch

# Clone a remote repository.
git clone <repo> [<dir>]

# Download commits, files and refs from a remote repository
# into your local copy, without merging them into your branches.
git fetch
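To see the difference between cloning and fetching without touching GitHub, the sketch below uses a local bare repository as a stand-in for the remote. All paths, names and identities are hypothetical; on Gadi you would clone the real GitHub URL instead. Assumes Git 2.28 or newer.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"

# Hypothetical setup: a local bare repository plays the role of GitHub.
git init -q -b main seed
git -C seed -c user.name=Seed -c user.email=seed@example.com \
    commit -q --allow-empty -m "Initial commit."
git clone -q --bare seed remote.git

# Clone it: you get the full history plus a remote called "origin".
git clone -q remote.git mycopy

# Someone else pushes a new commit to the remote...
git clone -q remote.git colleague
git -C colleague -c user.name=Colleague -c user.email=colleague@example.com \
    commit -q --allow-empty -m "Colleague's change."
git -C colleague push -q

# ...fetch downloads it into origin/main but leaves your main untouched.
cd mycopy
git fetch -q
git log --oneline main..origin/main
```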

git add

# Do some work...
echo "Hello, my name is Ben." >> ben.txt

# Add file(s) to the staging index.
git add ben.txt

# or add all files matching a pathspec (a glob pattern, not a regular expression).
git add *.txt

git commit

# Commit all staged changes to the repository.
git commit -m "Added ben.txt. Fixes #N."

# Note: it is good practice to reference the GitHub issue (#N).
# GitHub turns the reference into a link, and keywords such as
# "Fixes #N" close the issue once the commit reaches the default branch.

git merge

# Merge BRANCH into the currently active branch.
git merge BRANCH

# Most merges will be handled through GitHub.
# ...but it is good to know.

git status / git log

# Check the status of the repository.
git status

# Show the log of commits.
git log

# BONUS: Figure out who broke things and when!
# (shows the commit and author that last changed each line)
git blame <file>
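All three commands accept options that make their output easier to scan. The sketch below uses real flags (`git status --short`, `git log --oneline --graph --all`) on a hypothetical throwaway repository with placeholder identity and file names; it assumes Git 2.28 or newer.

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "one" > a.txt
git add a.txt
git commit -q -m "Add a.txt."
echo "two" > b.txt       # new, untracked file
echo "more" >> a.txt     # tracked file, modified but not staged

# One line per file: "??" = untracked, " M" = modified but not staged.
git status --short

# One line per commit, with an ASCII graph of all branches.
git log --oneline --graph --all

# Who last changed each line of a.txt (uncommitted lines are marked as such).
git blame a.txt
```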

git pull / git push

# Fetch changes from the remote repository and merge them
# into the current branch (git fetch + git merge).
git pull

# Push your local commits to the remote repository.
git push
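The sketch below shows why you pull before you push: when the remote has commits you do not have, Git rejects the push until you pull them in. A local bare repository stands in for GitHub, and the developers `alice` and `bob` are hypothetical. `pull.rebase false` is set so that `git pull` merges, and Git 2.28 or newer is assumed.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"

# Hypothetical setup: a bare "remote" seeded with one commit, cloned twice.
git init -q -b main seed
git -C seed -c user.name=Seed -c user.email=seed@example.com \
    commit -q --allow-empty -m "Initial commit."
git clone -q --bare seed remote.git
git clone -q remote.git alice
git clone -q remote.git bob
for d in alice bob; do
  git -C "$d" config user.name "$d"
  git -C "$d" config user.email "$d@example.com"
  git -C "$d" config pull.rebase false   # pull = fetch + merge
done

# Alice pushes first.
echo "from alice" > alice/a.txt
git -C alice add a.txt
git -C alice commit -q -m "Add a.txt."
git -C alice push -q

# Bob has also committed locally, so his push is rejected...
echo "from bob" > bob/b.txt
git -C bob add b.txt
git -C bob commit -q -m "Add b.txt."
git -C bob push -q || echo "push rejected: pull first"

# ...until he pulls (fetch + merge) and pushes again.
git -C bob pull -q --no-edit
git -C bob push -q
```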

git checkout

# Create a new branch from the current branch
# and switch to it.
git checkout -b <BRANCH>

# Discard unstaged local changes to files
# (restores them from the staging index; cannot be undone).
git checkout -- <pathspec>
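Since Git 2.23, the two jobs of `git checkout` are also available as separate, more explicit commands: `git switch` for branches and `git restore` for files. A minimal sketch on a hypothetical throwaway repository; the branch name `my-feature`, the file and the identity are illustrative only.

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "original" > notes.txt
git add notes.txt
git commit -q -m "Add notes.txt."

# Equivalent to: git checkout -b my-feature
git switch -q -c my-feature

# Make an unwanted edit...
echo "oops" > notes.txt

# ...and discard it. Equivalent to: git checkout -- notes.txt
git restore notes.txt
cat notes.txt
```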

git branch

# List all of the branches in your local repository.
git branch

# Create a new branch.
git branch <branch>

# Delete a local branch (safe).
# Only works if the branch has already been merged
# (use -D to force-delete an unmerged branch).
git branch -d <branch>
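To illustrate the "safe" part: `git branch -d` refuses to delete a branch whose commits are not merged, while `-D` deletes it anyway. A sketch with a hypothetical `experiment` branch in a throwaway repository (Git 2.28 or newer assumed).

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit."

# Create a branch (without switching), then commit on it.
git branch experiment
git checkout -q experiment
echo "idea" > idea.txt
git add idea.txt
git commit -q -m "Try an idea."
git checkout -q main

# -d refuses: experiment has a commit that main does not.
if ! git branch -d experiment 2>/dev/null; then
  echo "refused: experiment is not merged"
fi

# -D forces the deletion (the unmerged commit is no longer on any branch).
git branch -D experiment
git branch
```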

git rm

# Remove a file from the git repository.
# (Stop tracking and delete it)
git rm <pathspec>
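If you only want Git to stop tracking a file but keep it on disk (for example a build output committed by mistake), add `--cached`. A sketch with hypothetical file names in a throwaway repository, assuming Git 2.28 or newer.

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "data" > output.log
echo "code" > model.f90
git add output.log model.f90
git commit -q -m "Add files."

# Stop tracking output.log but KEEP it on disk.
git rm -q --cached output.log
git commit -q -m "Stop tracking output.log."

# Stop tracking model.f90 AND delete it from disk.
git rm -q model.f90
git commit -q -m "Remove model.f90."

ls            # output.log is still here
git ls-files  # ...but Git no longer tracks anything
```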

.gitignore

A special file, usually in the repository root, that lists the files and patterns Git should ignore (leave untracked). It does not affect files that are already tracked.

 

Typically used to ignore big files, executables and build outputs.

 

https://docs.github.com/en/get-started/getting-started-with-git/ignoring-files

# Example Python .gitignore file.
# https://github.com/github/gitignore/blob/main/Python.gitignore

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
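When a file is unexpectedly ignored (or not), `git check-ignore -v` reports which `.gitignore` rule matched. A sketch using a minimal, hypothetical `.gitignore` with build-output rules; file names are illustrative and Git 2.28 or newer is assumed.

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main

# A minimal, hypothetical .gitignore for build outputs.
printf '%s\n' 'build/' '*.o' > .gitignore

mkdir -p build src
touch build/cable.exe src/driver.o src/driver.f90

# Which rule (file:line:pattern) ignores each path?
git check-ignore -v build/cable.exe src/driver.o

# Ignored files do not show up as untracked.
git status --short --untracked-files=all
```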

Making commits

There is a 3-step process to making a commit in Git.

 

In Git, you must:

  1. MAKE your changes.
  2. STAGE them for a commit.
  3. COMMIT them.

 

It is also good practice to use git status and git log before / after you have staged or committed changes.

# Example commit

# Make the change.
echo "Hello, my name is James" >> james.txt

# Show unstaged changes.
git status

# Stage it for commit.
git add james.txt

# Show staged changes.
git status

# Commit it to your local repository.
git commit -m "My changes."

# Show commit log.
git log
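To see exactly what each of the three steps changes, `git diff` (working tree vs staging index) and `git diff --staged` (staging index vs last commit) can be run in between. A sketch in a throwaway repository with a hypothetical file and placeholder identity, assuming Git 2.28 or newer.

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "Hello, my name is James" > james.txt
git add james.txt
git commit -q -m "Add james.txt."

# 1. MAKE a change.
echo "I work on CABLE" >> james.txt
git diff --stat            # unstaged changes (working tree vs index)

# 2. STAGE it.
git add james.txt
git diff --stat            # now empty: nothing left unstaged
git diff --staged --stat   # staged changes (index vs last commit)

# 3. COMMIT it.
git commit -q -m "Describe my work."
git status --short         # clean working tree: prints nothing
```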

Remote repositories

  • A shared remote repository enables collaborative development.
  • Developers push/pull changes to/from remote repositories to share and update code.
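Connecting a local repository to a shared remote takes two commands: `git remote add` and a first `git push -u`. The sketch below uses a local bare repository (`shared.git`, hypothetical) in place of GitHub; with GitHub you would pass the repository's SSH URL instead. Assumes Git 2.28 or newer.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"

# A local bare repository stands in for the shared GitHub repository (hypothetical).
git init -q --bare -b main shared.git

# Start a repository locally and connect it to the shared one.
git init -q -b main work
cd work
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$demo/shared.git"
git remote -v        # lists the fetch and push URLs for "origin"

echo "Hello" > hello.txt
git add hello.txt
git commit -q -m "Add hello.txt."

# Share the commit; -u records origin/main as the upstream of main.
git push -q -u origin main
git branch -vv       # main now tracks [origin/main]
```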

Collaborative workflows

There are a number of different workflow options for Git.

  • GitFlow
  • Fishbone
  • OneFlow

 

We use a variant of OneFlow.

 

https://www.endoflineblog.com/oneflow-a-git-branching-model-and-workflow

Best practices

  • Branches should be short-lived to remain current.
  • Stay on message, keep the changes relevant to the issue.
  • Raise subsequent issues for additional work later.
  • Run tests BEFORE committing changes.
  • Create smaller, more granular commits.
  • Regular/frequent commits with meaningful (semantic) messages.
    • Extra points if you reference a GitHub issue.
  • Pull changes and reconcile conflicts locally BEFORE pushing to remotes.
  • Clean up stale branches, especially once they are merged into the main branch.
  • ALWAYS work in feature branches.
  • Main branches should be push-protected (CABLE's will be).

Live Exercise

(what we're all here for)

Live Exercise

We will show you how to:

  • Set up Git for the training.
  • Clone the training repository.
  • Create an issue on GitHub.
  • Create a branch on GitHub (you need to be a collaborator).
  • Make changes / commit to your local copy.
  • Push changes to the remote and kick off review process.
  • Deal with merge conflicts.

Prerequisites

  • You know how to use a terminal / connect to ARE (Terminal).
  • You have an account at NCI.
  • You have a GitHub account.
  • You have SSH keys set up on GitHub.
  • You are a collaborator on the training repository.
    • https://github.com/CABLE-LSM/land-git-training-2023

If you answered “NO” to any of these, raise your hand and a helper will get you started.

Set up SSH Keys

# If you already have a public key, skip to step 3.
cat ~/.ssh/id_rsa.pub

# 1. On your machine, or a Gadi terminal, generate a key pair.
ssh-keygen -t rsa -b 4096

# 2. Accept the default file location and choose a passphrase.

# 3. Print your public key and copy it.
cat ~/.ssh/id_rsa.pub

# 4. Go to https://github.com/settings/keys, click "New SSH key"
# and paste in the public key.

# 5. Test the connection; GitHub should greet you by username.
ssh -T git@github.com

# ... now you should be able to clone.

Setting up Git (if you haven't already)

# Connect to Gadi (or access the ARE terminal)
ssh -Y userid@gadi.nci.org.au

# Configure Git with your details
git config --global user.name "FIRST LAST"
git config --global user.email "first.last@example.com"

# Automatically set up tracking against remote branches on first push
# (requires Git 2.37 or newer).
git config --global push.autoSetupRemote true

# Create a working directory
mkdir -p ~/work

Clone the repository

# Move into your work directory
cd ~/work

# Clone the repository
git clone git@github.com:CABLE-LSM/land-git-training-2023.git

# Move into the repository
cd land-git-training-2023

Create an issue on GitHub

  1. Navigate to the repository on GitHub.
  2. Create an issue.
  3. Create a branch from the issue.
  4. Run the commands GitHub shows you in your terminal to fetch the branch and check it out locally.

 

"origin" is the default name Git gives the remote repository you cloned from.

Do some work and commit

# Create a file and enter some text. For example:
echo "Hello, my name is Ben" >> ben.txt

# Check the status of the repository.
git status

# Stage the changes.
git add ben.txt

# Look at the status of the staged changes.
git status

# Commit the changes.
# Reference your open issue with "Fixes #N" where N is the issue number. For example:
git commit -m "Added ben.txt. Fixes #N"

# Check the status and view the log to confirm the commit has been created.
git status
git log

# Push the changes to the remote repository.
git push

Create a pull request

  1. Navigate to GitHub.
  2. Create a pull request.
  3. Select your branch as the source and main as the destination.
  4. Ask for a review.
  5. Wait for a reviewer to approve the change.
  6. Once approved, merge the pull request yourself.

 

CABLE-LSM will provide a template for pull requests.

But what about conflicts?

(always plan on the plan not going according to plan)

Merge conflicts

A merge conflict occurs when Git cannot automatically combine two versions of the code.

 

A conflict is resolved by editing the file until it looks exactly as the developer wants, then staging and committing it.
 

Merge conflicts

Merge conflicts can arise when:

  • Two or more developers modify the same lines of code at the same time.
  • Both branches you are trying to merge change the same (or adjacent) lines.
  • One branch renames, moves or removes a file that the other branch modifies.
  • Whitespace / formatting changes, such as line endings (Windows users, we're looking at you).
  • One developer deletes code another is currently working on.
  • Large, sprawling commits touch so much code that overlaps are almost inevitable.
  • … and so on

Merge conflicts - continued

You typically won't know you have a merge conflict until:

  • You pull changes from a remote repository (a push is rejected until you have pulled the latest changes).
  • You merge (or rebase) different branches locally.

 

Always pull first (to get recent changes), then push.

 

Git marks each conflict in the file with markers separating "yours" (the current branch, HEAD) from "theirs" (the branch being merged in).

Dealing with merge conflicts

  1. Open the conflicting file(s).
  2. Reconcile the conflicts by editing the offending code and removing the conflict markers.
  3. Stage and commit the changes.
  4. Push the changes.

 

Some editors (e.g. VS Code) offer a GUI to do this for you.

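<<<<<<< HEAD
Hello, my name is James.
=======
Hello, my name is Bond.
>>>>>>> branch-name

The top section (HEAD) is yours; the bottom section (branch-name) is theirs. After resolving, with the markers removed:

Hello, my name is James, James Bond.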
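The James Bond conflict can be reproduced locally. A minimal sketch, using a hypothetical `bond` branch and `james.txt` file in a throwaway repository (Git 2.28 or newer assumed): both branches change the same line, `git merge` stops with a conflict, and the resolution is written, staged and committed.

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "Hello, my name is." > james.txt
git add james.txt
git commit -q -m "Add james.txt."

# Change the SAME line differently on two branches.
git checkout -q -b bond
echo "Hello, my name is Bond." > james.txt
git commit -q -am "Bond version."
git checkout -q main
echo "Hello, my name is James." > james.txt
git commit -q -am "James version."

# The merge stops and leaves conflict markers in james.txt.
if ! git merge bond; then
  cat james.txt
fi

# Resolve: write the final content (markers removed), stage, commit.
echo "Hello, my name is James, James Bond." > james.txt
git add james.txt
git commit -q --no-edit
```

`git commit --no-edit` reuses the merge message Git prepared, so the result is an ordinary merge commit with two parents.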

Resolve the conflict

Let's break things, then fix them!

Working in pairs:

  1. Create an issue and branch as before.
  2. Both of you check out the SAME BRANCH and make DIFFERENT changes to the SAME LINE of the SAME FILE. Add and commit with different messages.
  3. Player 1 pushes first.
  4. Player 2 pushes second - what happens?
  5. Player 2 pulls, resolves the conflict, then adds, commits and pushes.
  6. Follow the review process as normal.

Housekeeping

# GitHub is configured to delete merged branches.
# ...but you have to clean up locally.

# List local branches that have been merged into the current branch.
git branch --merged

# Delete your local branch
git branch -d BRANCH
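Deleting a branch on GitHub does not remove your local remote-tracking reference (`origin/<branch>`). `git fetch --prune` removes those, and `git branch -vv` then marks local branches whose upstream is gone. A sketch with a local bare repository standing in for GitHub; the branch name `1-my-issue` is hypothetical, and Git 2.28 or newer is assumed.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q -b main seed
git -C seed -c user.name=Seed -c user.email=seed@example.com \
    commit -q --allow-empty -m "Initial commit."
git clone -q --bare seed remote.git
git clone -q remote.git mycopy
cd mycopy

# Publish a feature branch, then simulate GitHub deleting it after the merge.
git checkout -q -b 1-my-issue
git push -q -u origin 1-my-issue
git --git-dir="$demo/remote.git" branch -D 1-my-issue
git checkout -q main

# Drop remote-tracking refs for branches deleted on the remote.
git fetch -q --prune
git branch -vv          # 1-my-issue is shown with "[origin/1-my-issue: gone]"

# Then delete the local branch (here it has no unmerged commits).
git branch -d 1-my-issue
```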

We made it!

Any questions?

"Advanced" topics

  • Rebase
  • Undoing changes
  • GitHub Actions

Rebase

  • Instead of merging, "rebase" the current branch: replay its commits on top of another branch.
  • Results in a more linear commit history, but rewrites the branch: every replayed commit is a NEW commit with a new hash.
  • Can get messy with long-lived branches.
  • NEVER rebase public history (commits already pushed and shared with others)!!!

 

https://www.atlassian.com/git/tutorials/rewriting-history/git-rebase
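A minimal sketch of a rebase on a local, unpushed branch (the only safe place to rebase). The branch `feature`, the files and the identity are hypothetical; Git 2.28 or newer is assumed. Note the new hash on the replayed commit and the absence of a merge commit.

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit."

git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature.txt."
old=$(git rev-parse HEAD)

git checkout -q main
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Add fix.txt."

# Replay feature's commits on top of the latest main.
git checkout -q feature
git rebase -q main

# History is linear, and the feature commit has a NEW hash.
git log --oneline --graph
echo "before: $old  after: $(git rev-parse HEAD)"
```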

Undoing changes

# Add staged changes to the previous commit,
# or amend its message.
git commit --amend

# Discard unstaged changes to files
# (restores them from the staging index).
git checkout -- <pathspec>

# Un-stage files (the changes stay in your working tree).
git reset <pathspec>

# Discard ALL uncommitted changes (destructive!).
git reset --hard

# Undo a commit by creating a new, inverse commit
# (safe for shared history).
git revert <commit>

Note: Sometimes it is cleaner to just create a new commit correcting any issues. 

 

We all make mistakes.
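The sketch below walks through three common "undo" situations in a throwaway repository: amending the last commit, un-staging a file, and reverting a commit. File names and identity are hypothetical, and Git 2.28 or newer is assumed.

```shell
set -e
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > file.txt
git add file.txt
git commit -q -m "Add file.txt."

# Forgot something? Stage it and fold it into the last commit.
echo "v1 extra" >> file.txt
git add file.txt
git commit -q --amend --no-edit

# Staged by mistake? Un-stage it (the file stays on disk).
echo "draft" > draft.txt
git add draft.txt
git reset -q draft.txt

# Committed something bad? Revert creates a new, inverse commit.
echo "bad change" >> file.txt
git commit -q -am "Bad change."
git revert --no-edit HEAD

git log --oneline
```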

GitHub Actions

Automated workflows on GitHub:

  • Testing / code linting.
  • Coverage.
  • Documentation generation.
  • Release and code publication.