Hello!

"Today I learn Git with @jotarios_, @gracenikole"

 

#ACM   #CiberSecurityClub   #IEEE

Jorge

Martin

Grace

Content

  • Why use a VCS?
  • What is Git?
  • Main stages and states: Modified, Staged, Committed
  • Git setup: System, Global, Local config
  • Git basics: status, add, commit, push, pull, etc.
  • Branching model: branch, checkout
  • Git internals: how does git work?
  • Git objects: how does git store data?
  • Git refs: how does git keep track of data?
  • Extras: keeping efficiency and consistency

Download for free: https://git-scm.com/book


Authors:

Scott Chacon,

Ben Straub

Version Control System

Did you...

control your versions using the filename?

my_awesome_project.zip

my_awesome_project2.zip

my_awesome_project_final_version.zip

my_not_awesome_project.zip

my_awesome_este_si_es.zip

my_awesome_rosita_de_guadalupe.zip

Did you...

want to see a previous version of your project?

BUT, YOU CAN'T!

Did you...

lose files without having a backup?

Did you...

share your project using email, Drive, WhatsApp, Messenger, photos, etc.?

Did you...

merge your work with your teammates' work?

Why use a Version Control System?

  • Versioning
  • Code/Project history
  • Team collaboration
  • You can see who messed things up

What is Git?

"Git is a free and open-source distributed version control system designed to handle everything from small to very large projects with speed and efficiency".

Definition taken from https://git-scm.com/

Main local stages

working directory, index & working branch

Figure: Working directory/area (your project) → index (staging area) → working branch (your branch). You are in one of these areas, and your file is modified, staged, or committed, respectively.
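The three areas can be pictured as three snapshots of the same files. The Python toy below is only an illustration (the class `ToyRepo` and its methods are hypothetical names, not Git's internals): it shows how `git add` and `git commit` move a file from modified to staged to committed. As a simplification, it reports a brand-new file as "modified" instead of Git's "untracked".

```python
from typing import Dict


class ToyRepo:
    """Toy model of the working directory, index and working branch."""

    def __init__(self) -> None:
        self.working: Dict[str, str] = {}  # working directory: path -> content
        self.index: Dict[str, str] = {}    # staging area
        self.branch: Dict[str, str] = {}   # snapshot stored by the last commit

    def edit(self, path: str, content: str) -> None:
        self.working[path] = content

    def add(self, path: str) -> None:
        # Like `git add`: copy the working-directory version into the index.
        self.index[path] = self.working[path]

    def commit(self) -> None:
        # Like `git commit`: the whole index becomes the new branch snapshot.
        self.branch = dict(self.index)

    def status(self, path: str) -> str:
        if self.working.get(path) != self.index.get(path):
            return "modified"
        if self.index.get(path) != self.branch.get(path):
            return "staged"
        return "committed"


if __name__ == "__main__":
    repo = ToyRepo()
    repo.edit("README.md", "Hello")
    print(repo.status("README.md"))
    repo.add("README.md")
    print(repo.status("README.md"))
    repo.commit()
    print(repo.status("README.md"))
```

Editing the file again after the commit makes it "modified" once more, which is the cycle you repeat in every Git session.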

Let's go!

# Local (or project) scope: run inside a repository
$ git config user.name "Jorge Rios"
$ git config user.email "jrios@utec.edu.pe"
# Global scope: applies to every repository of the current user
$ git config --global user.name "Jorge Rios"
$ git config --global user.email "jrios@utec.edu.pe"
# Show the resulting configuration
$ git config --list
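The scopes (system, global, local) are layered, and the most specific one wins: a value in the repository's .git/config overrides the same key in ~/.gitconfig, which overrides the system-wide file. The Python sketch below models only that precedence with plain dictionaries; the function `effective_config` and the per-project email are hypothetical, and Git's real lookup reads and parses those config files.

```python
from typing import Dict

Config = Dict[str, str]


def effective_config(system: Config, global_: Config, local: Config) -> Config:
    """Merge config scopes; a more specific scope overrides a broader one."""
    merged: Config = {}
    for scope in (system, global_, local):  # broadest first, most specific last
        merged.update(scope)
    return merged


if __name__ == "__main__":
    system = {"core.editor": "vim"}  # hypothetical system-wide value
    global_ = {"user.name": "Jorge Rios", "user.email": "jrios@utec.edu.pe"}
    local = {"user.email": "jorge@example.com"}  # hypothetical per-project email
    for key, value in sorted(effective_config(system, global_, local).items()):
        print(f"{key}={value}")
```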

Our first repository

$ git init

Initialized empty Git repository in {PATH}/.git/
master *
$ git add --all
master *
$ git commit -m "Add my text on README.md"
master *
$ git status

Figure: git add moves your changes from the working directory/area to the index (staging area), and git commit records them in your working branch.

Figure: git push sends the commits of your working branch to the remote repository (origin), updating origin/master.

Figure: git pull brings the commits from the remote repository (origin/master) into your working branch and working directory/area.

More porcelain commands

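$ git diff

See my changes (working directory vs. index)

$ git checkout -- {file}

Unmodifying a modified file in the working directory (discards your local changes)

$ git reset HEAD -- {file}

Unstage a staged file (its changes stay in the working directory)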

Branching model

master *
$ git branch feature/blizzard-logo
master *
$ git checkout feature/blizzard-logo
feature/blizzard-logo *
$ git checkout master
master *
$ git branch -d feature/blizzard-logo

Create a new branch

Use a branch

Delete a branch (switch away from it first)

Let's fork an existing project!

https://bit.ly/repolink

$ git clone https://github.com/gracenikole/py-project.git

Cloning into 'py-project'...
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 6 (delta 0), reused 2 (delta 0), pack-reused 3
Receiving objects: 100% (6/6), done.

Clone your repository

Create a branch

master*
$ git branch develop

Change your branch

master* 
$ git checkout develop

Switched to branch 'develop'
Then add your name to the README.md file

Shortcut: create a branch and switch to it in one step (branch + checkout)

master* 
$ git checkout -b develop

Switched to a new branch 'develop'

List all your branches

develop* 
$ git branch

* develop
  master

Add and commit your changes

develop* 
$ git add --all
develop* 
$ git commit -m "Update README.md"

Change your branch again :D

develop* 
$ git checkout master

Switched to branch 'master'

And... merge

master* 
$ git merge develop

Updating a3d68e0..656ec87
Fast-forward
 README.md | 10 +++++++++-
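"Fast-forward" means master had no new commits of its own, so Git only had to move the master pointer to develop's latest commit. A toy Python sketch of that decision, assuming a hypothetical commit graph stored as a dict (commit id to parent ids) that reuses the ids from the output above; real merges also combine file contents, which this ignores.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]  # commit id -> parent commit ids


def is_ancestor(graph: Graph, ancestor: str, commit: str) -> bool:
    """True if `ancestor` can be reached from `commit` by following parents."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, []))
    return False


def merge(graph: Graph, branches: Dict[str, str], into: str, source: str) -> str:
    """Toy `git merge`: only decides what kind of merge is needed."""
    target_tip, source_tip = branches[into], branches[source]
    if is_ancestor(graph, source_tip, target_tip):
        return "already up to date"
    if is_ancestor(graph, target_tip, source_tip):
        branches[into] = source_tip  # fast-forward: just move the pointer
        return "fast-forward"
    return "needs a merge commit"  # the histories diverged


if __name__ == "__main__":
    graph = {"a3d68e0": [], "656ec87": ["a3d68e0"]}
    branches = {"master": "a3d68e0", "develop": "656ec87"}
    print(merge(graph, branches, "master", "develop"), branches["master"])
```

Because a branch is only a name pointing at a commit, the fast-forward case changes a single pointer and creates no new commit.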

Now, you can delete the branch

master* 
$ git branch -d develop

Deleted branch develop (was 656ec87).

And push

master* 
$ git push

$ git init

Initialized empty Git repository in {PATH}/.git/

Initialize a new Git repository

$ tree .git
.git
├── branches
├── config
├── description
├── HEAD
├── hooks
│   ├── applypatch-msg.sample
│   ├── commit-msg.sample
│   ├── ...
│   ├── pre-receive.sample
│   └── update.sample
├── info
│   └── exclude
├── objects
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags

$ vim testing_blob.py

What is a Git object?

# file: testing_blob.py
print("Hello world from Git")
$ git hash-object -w testing_blob.py

5eefe3945ef60476ef9a4f37f4f3d653ef2316fe

$ tree .git
.git
├── branches
├── config
├── description
├── HEAD
├── hooks
│   ├── applypatch-msg.sample
│   ├── commit-msg.sample
│   ├── ...
│   ├── pre-receive.sample
│   └── update.sample
├── info
│   └── exclude
├── objects
│   ├── 5e
│   │   └── efe3945ef60476ef9a4f37f4f3d653ef2316fe
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags
$ git cat-file -p 5eefe3945ef60476ef9a4f37f4f3d653ef2316fe
print("Hello world from Git")
$ git cat-file -t 5eefe3945ef60476ef9a4f37f4f3d653ef2316fe
blob
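What `git hash-object -w` did can be reproduced by hand: Git prepends the header `blob <size>` plus a NUL byte, hashes the result with SHA-1, compresses it with zlib, and stores it under `objects/<first 2 hex digits>/<remaining 38>`, which is why the object appeared as `5e/efe39...`. The Python sketch below writes and reads such a loose object in a temporary directory rather than a real `.git`; the function names are illustrative, and packed objects are not handled.

```python
import hashlib
import os
import tempfile
import zlib


def write_loose_object(objects_dir: str, data: bytes, obj_type: str = "blob") -> str:
    """Store data the way a loose Git object is stored; return its SHA-1."""
    store = f"{obj_type} {len(data)}\0".encode() + data
    sha = hashlib.sha1(store).hexdigest()
    folder = os.path.join(objects_dir, sha[:2])  # first 2 hex digits
    os.makedirs(folder, exist_ok=True)
    with open(os.path.join(folder, sha[2:]), "wb") as f:  # remaining 38
        f.write(zlib.compress(store))
    return sha


def read_loose_object(objects_dir: str, sha: str):
    """Return (type, content), similar to `git cat-file -t` / `-p` for a blob."""
    with open(os.path.join(objects_dir, sha[:2], sha[2:]), "rb") as f:
        raw = zlib.decompress(f.read())
    header, _, content = raw.partition(b"\0")
    obj_type, _size = header.decode().split(" ")
    return obj_type, content


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as objects:
        sha = write_loose_object(objects, b"test content\n")
        print(sha)
        print(read_loose_object(objects, sha))
```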

Discovering the git object

Retrieving the content

Git internals

So far we have seen only the "front end" of Git: the porcelain commands. Now it's time to look at the plumbing commands, the low-level part that does all the heavy lifting.

Basic concepts

  • Internally, Git works like a minimal filesystem
  • It is a content-addressable filesystem: data is stored as key-value pairs, where the key is a hash of the content
  • The version control interface we have used so far (the porcelain commands) is built on top of it
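Content addressing fits in a few lines: the key of every value is the hash of the value itself, so identical content is stored once and any corruption is detectable. The Python sketch below uses a dict and raw SHA-1; the class `ContentStore` is a hypothetical name, and Git itself also prepends a type-and-size header before hashing and stores objects as compressed files.

```python
import hashlib
from typing import Dict


class ContentStore:
    """Minimal content-addressable key-value store (illustrative, not Git's code)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, value: bytes) -> str:
        key = hashlib.sha1(value).hexdigest()  # the key is derived from the content
        self._data[key] = value  # storing the same content twice changes nothing
        return key

    def get(self, key: str) -> bytes:
        value = self._data[key]
        # The key doubles as a checksum: re-hash to detect corruption.
        if hashlib.sha1(value).hexdigest() != key:
            raise ValueError(f"object {key} is corrupt")
        return value

    def __len__(self) -> int:
        return len(self._data)


if __name__ == "__main__":
    store = ContentStore()
    key = store.put(b"print('Hello world from Git')\n")
    store.put(b"print('Hello world from Git')\n")  # duplicate content
    print(key, len(store))
```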

Git file structure

.git
├── branches
├── config
├── description
├── HEAD
├── hooks
│   ├── applypatch-msg.sample
│   ├── commit-msg.sample
│   ├── ...
│   ├── pre-receive.sample
│   └── update.sample
├── info
│   └── exclude
├── objects
│   ├── 5e
│   │   └── efe3945ef60476ef9a4f37f4f3d653ef2316fe
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags

File system structure: Explained

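  • HEAD: points to the branch you currently have checked out (the staging area is stored separately, in .git/index)
  • config: project-specific configuration options
  • description: used only by the GitWeb program
  • info/: global exclude patterns (info/exclude) you don't want to track in a .gitignore
  • hooks/: client- and server-side hook scripts
  • objects/: all the content of the object database
  • refs/: pointers to commits (branches, tags, remotes)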

Some key concepts...

Before moving on, we need to define some concepts to really understand how each part of Git works.

Object

We will consider an object to be a piece of data, e.g. a .txt file or a .png image.

Objects "live" at a location.

Pointer

A pointer is also an object, but its data is always the location of another object.

Index

An index is a special type of metadata: it contains important locations within a database.

Hash

A hash is a mathematical transformation f(x), where x is some object.

This transformation converts an object of any length n into a fixed-length series of hexadecimal digits (for SHA-1, which Git uses, always 40 digits).
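A quick check of the fixed-length property with Python's `hashlib`, using SHA-1 as f(x) since that is Git's default hash: inputs of 0 bytes, 1 byte, and 10 MB all produce 40 hex digits, and changing a single character produces a completely different digest. The helper name `sha1_hex` is just for illustration.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """SHA-1 maps input of any length n to 160 bits = 40 hexadecimal digits."""
    return hashlib.sha1(data).hexdigest()


if __name__ == "__main__":
    samples = [b"", b";", b"x" * 10_000_000]  # 0 bytes, 1 byte, 10 MB
    for data in samples:
        digest = sha1_hex(data)
        print(f"n={len(data):>10}  hex digits={len(digest)}  {digest[:7]}...")
```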

 

Git objects

  • Stored as key-value pairs (like dictionaries)
  • Each one is a blob, tree, or commit (annotated tags add a fourth type, the tag object)
  • Live in .git/objects

The units of Git

Git object: structure

Git objects

Blob

  • Simplest type
  • Stores the contents of any single file (not its name)
  • .cpp, .py, .exe, etc.
# Create the object (prints its hash)
git hash-object -w main.cpp

# See its content
git cat-file -p {hash}
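A blob id depends only on the file's bytes: Git hashes the header `blob <size>` plus a NUL byte, followed by the content. The filename is not part of it (trees store names), so two files with identical content share one blob. A Python sketch that should match `git hash-object` for the same bytes; the file names are hypothetical.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """SHA-1 object id Git assigns to a blob with this content."""
    header = f"blob {len(content)}\0".encode()  # size is in bytes
    return hashlib.sha1(header + content).hexdigest()


files = {
    "main.cpp": b"int main() { return 0; }\n",
    "copy_of_main.cpp": b"int main() { return 0; }\n",  # same bytes, other name
    "empty.txt": b"",
}

if __name__ == "__main__":
    for name, content in files.items():
        print(name, blob_id(content))
```

The first two names print the same id, which is how Git avoids storing duplicate content.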

Git objects

Tree

  • Contains one or more entries
  • Each entry shows (mode, object type, hash, filename)
  • Like a directory in Unix or folder in Windows
# Stage a blob in the index under a filename
git update-index --add --cacheinfo 100644 {blob-hash} main.cpp

# Write the index out as a tree object (prints its hash)
git write-tree

# See the tree
git cat-file -p {tree-hash}

# Read an existing tree into the index as a subdirectory
git read-tree --prefix={directory} {tree-hash}
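On disk, each tree entry is stored as `<mode> <name>`, a NUL byte, and the 20 raw bytes of the entry's SHA-1; `git cat-file -p` derives the object type from the mode when printing. A Python sketch of serializing a tree and computing its id, assuming entries are blobs only (Git sorts subdirectory names as if they ended in `/`, which this sketch does not handle).

```python
import hashlib
from typing import List, Tuple

Entry = Tuple[str, str, str]  # (mode, filename, hex SHA-1 of a blob)


def object_id(obj_type: str, body: bytes) -> str:
    """Hash '<type> <size>' + NUL + body, as Git does for every object."""
    return hashlib.sha1(f"{obj_type} {len(body)}\0".encode() + body).hexdigest()


def tree_bytes(entries: List[Entry]) -> bytes:
    """Serialize blob entries as '<mode> <name>' NUL <20 raw bytes>, sorted by name."""
    body = b""
    for mode, name, sha in sorted(entries, key=lambda e: e[1].encode()):
        body += f"{mode} {name}\0".encode() + bytes.fromhex(sha)
    return body


if __name__ == "__main__":
    readme = ("100644", "README.md", object_id("blob", b"hello\n"))
    main = ("100644", "main.cpp", object_id("blob", b"int main() {}\n"))
    print(object_id("tree", tree_bytes([main, readme])))
```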

Git tree: structure

Git objects

Commit

  • A commit points to a top-level tree and adds extra information
  • They store "snapshots" of the project
  • They record the parent commit(s), the author and committer with timestamps, and the commit message
# Commit a tree
git commit-tree {tree-hash} -m "first commit"

# Or with a pipe
echo "first commit" | git commit-tree {tree-hash}

# Commit another tree with a parent: -p takes the commit hash printed above, not a tree hash
git commit-tree {tree-hash-2} -p {commit-hash} -m "second commit"
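A commit object is plain text, which is what `git cat-file -p` prints for a commit: a `tree` line, zero or more `parent` lines, `author` and `committer` lines (name, email, Unix timestamp, time-zone offset), a blank line, and the message. The Python sketch below builds that text and hashes it like any other object; the timestamps and time zone are made up, and real commits may carry extra headers (for example signatures) that it ignores.

```python
import hashlib
from typing import List


def object_id(obj_type: str, body: bytes) -> str:
    """Hash '<type> <size>' + NUL + body, as Git does for every object."""
    return hashlib.sha1(f"{obj_type} {len(body)}\0".encode() + body).hexdigest()


def commit_bytes(tree: str, parents: List[str], person: str,
                 timestamp: int, tz: str, message: str) -> bytes:
    """Text of a commit object (same author and committer, for brevity)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]  # the first commit has none
    lines.append(f"author {person} {timestamp} {tz}")
    lines.append(f"committer {person} {timestamp} {tz}")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # id of the empty tree
PERSON = "Jorge Rios <jrios@utec.edu.pe>"

if __name__ == "__main__":
    # Made-up timestamps and time zone.
    first = commit_bytes(EMPTY_TREE, [], PERSON, 1600000000, "-0500", "first commit")
    first_id = object_id("commit", first)
    second = commit_bytes(EMPTY_TREE, [first_id], PERSON, 1600000100, "-0500", "second commit")
    print(second.decode())
    print(object_id("commit", second))
```

The `parent` line stores the first commit's id, not a tree id, which is what links commits into a history.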

Commit object structure

Git refs

Storing stories

  • A reference map: human-readable names instead of raw SHA-1 hashes
  • Keep the latest changes ready at hand
  • Live in .git/refs

References

Commits and branches

  • Store the latest commit of each branch, local and remote
  • Store tags for special commits
  • Live in .git/refs

Commit map with branches

The HEAD

  • Tells Git where you are: the tip of the current branch
  • Is a symbolic reference: it contains the ref of the current branch rather than a commit hash (except in a detached HEAD state)

Ex.

ref: refs/heads/master
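Resolving HEAD is just following files: read `.git/HEAD`; if it says `ref: <path>`, read that ref to get a commit hash. The Python sketch below builds a fake `.git` layout in a temporary folder to show this; the commit hash is a made-up placeholder, the helper names are illustrative, and packed refs (`.git/packed-refs`) are ignored.

```python
import os
import tempfile


def resolve_head(git_dir: str) -> str:
    """Follow HEAD (and any symbolic refs) to a commit hash."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        value = f.read().strip()
    while value.startswith("ref: "):  # symbolic reference: read the ref it names
        with open(os.path.join(git_dir, value[len("ref: "):])) as f:
            value = f.read().strip()
    return value  # a commit hash (a detached HEAD lands here directly)


def write(path: str, text: str) -> None:
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)


if __name__ == "__main__":
    placeholder = "0123456789abcdef0123456789abcdef01234567"  # made-up commit hash
    with tempfile.TemporaryDirectory() as git_dir:  # a fake .git, not a real repo
        write(os.path.join(git_dir, "HEAD"), "ref: refs/heads/master\n")
        write(os.path.join(git_dir, "refs", "heads", "master"), placeholder + "\n")
        print(resolve_head(git_dir))
```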

Remotes

Stores the head information of each remote repository.

Ex.

.git/refs/remotes/origin/master

.git/refs/remotes/fork/master

They are considered read-only: Git updates them when you fetch, pull, or push, and you never commit to them directly.

Tags

  • Lightweight: a ref to a specific commit, used to name historic changes
  • Annotated: Git creates a tag object holding the tagger, date, and message, and the ref points to that object. A tag can point to any object, not only commits

Ex. Beta, ver0.1, etc.

Lives in .git/refs/tags
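A lightweight tag is only a file in `.git/refs/tags` holding a commit hash. An annotated tag is a separate object whose text resembles a commit (`object`, `type`, `tag`, `tagger`, blank line, message), and the ref points to that tag object instead. A Python sketch of the annotated-tag format, with a placeholder commit hash and a made-up timestamp and message:

```python
import hashlib


def object_id(obj_type: str, body: bytes) -> str:
    """Hash '<type> <size>' + NUL + body, as Git does for every object."""
    return hashlib.sha1(f"{obj_type} {len(body)}\0".encode() + body).hexdigest()


def tag_bytes(target: str, target_type: str, name: str,
              tagger: str, timestamp: int, tz: str, message: str) -> bytes:
    """Text of an annotated tag object."""
    return (
        f"object {target}\n"
        f"type {target_type}\n"
        f"tag {name}\n"
        f"tagger {tagger} {timestamp} {tz}\n"
        f"\n{message}\n"
    ).encode()


if __name__ == "__main__":
    commit = "0123456789abcdef0123456789abcdef01234567"  # placeholder commit hash
    body = tag_bytes(commit, "commit", "ver0.1", "Jorge Rios <jrios@utec.edu.pe>",
                     1600000000, "-0500", "First beta")  # made-up time and message
    # Lightweight tag: .git/refs/tags/ver0.1 would simply contain `commit`.
    # Annotated tag: the ref contains the id of this tag object instead.
    print(body.decode())
    print(object_id("tag", body))
```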

Extra features

Efficiency and data recovery

Packfiles

If I change a single ; in a file with 10 million lines, Git creates a whole new blob. Could we store only the change while keeping our structure?

# Pack loose objects into packfiles
git gc

Data efficiency

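  • Git packs loose objects into packfiles when there are too many of them (a configurable threshold, gc.auto), when you run git gc, or when you push; similar objects are stored as deltas
  • git gc also packs refs into the single file .git/packed-refs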
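Packfiles answer the ";" question with delta compression: similar objects are kept as one full version plus a list of differences. Git's real delta format consists of binary copy/insert instructions; the Python toy below only imitates the idea with `difflib` on lines (the `("copy", ...)`/`("insert", ...)` tuples are a hypothetical format), so treat it as an illustration, not Git's algorithm.

```python
import difflib
from typing import List, Tuple, Union

# ("copy", start, end) reuses base[start:end]; ("insert", lines) adds new lines.
Op = Union[Tuple[str, int, int], Tuple[str, List[str]]]


def make_delta(base: List[str], target: List[str]) -> List[Op]:
    """Describe `target` as copies from `base` plus inserted lines (toy format)."""
    ops: List[Op] = []
    matcher = difflib.SequenceMatcher(a=base, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))
        # "delete": emit nothing, those base lines are simply not copied
    return ops


def apply_delta(base: List[str], ops: List[Op]) -> List[str]:
    """Rebuild the target from the base and the delta."""
    out: List[str] = []
    for op in ops:
        if op[0] == "copy":
            out.extend(base[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


if __name__ == "__main__":
    base = [f"x = {i};\n" for i in range(10_000)]  # scaled-down "big file"
    target = list(base)
    target[5000] = "x = 5000\n"  # the ';' was removed on one line
    delta = make_delta(base, target)
    assert apply_delta(base, delta) == target
    print(len(delta), "operations instead of", len(target), "lines")
```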

Data recovery

  • Every change to HEAD, including a reset or a dropped commit, is recorded in the reflog, so "lost" commits can be recovered
  • Integrity checks can be run to find dangling (unreachable) objects
# See HEAD changelog
git reflog

# Check data integrity and list dangling objects
git fsck --full
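Both commands rely on reachability: an object stays alive if some ref leads to it (by default `git fsck` also counts reflog entries as starting points). Unreachable objects that no other object points to are reported as dangling. A toy Python sketch of that walk over a hypothetical object graph with made-up short ids; real fsck also verifies hashes and object formats.

```python
from typing import Dict, Iterable, List, Set

Graph = Dict[str, List[str]]  # object id -> ids it points to


def reachable(graph: Graph, roots: Iterable[str]) -> Set[str]:
    """All objects reachable from the roots by following pointers."""
    seen: Set[str] = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj not in seen:
            seen.add(obj)
            stack.extend(graph.get(obj, []))
    return seen


def unreachable(graph: Graph, refs: Dict[str, str]) -> Set[str]:
    return set(graph) - reachable(graph, refs.values())


def dangling(graph: Graph, refs: Dict[str, str]) -> Set[str]:
    """Unreachable objects that no other object points to."""
    lost = unreachable(graph, refs)
    pointed_to = {child for obj in lost for child in graph[obj]}
    return lost - pointed_to


# commit -> [parent commit, tree], tree -> [blobs], blob -> []
graph = {
    "c1": ["t1"], "t1": ["b1"], "b1": [],
    "c2": ["c1", "t2"], "t2": ["b1", "b2"], "b2": [],
    "c3": ["c2", "t3"], "t3": ["b3"], "b3": [],  # c3 dropped by a reset to c2
}
refs = {"refs/heads/master": "c2"}

if __name__ == "__main__":
    print(sorted(unreachable(graph, refs)), sorted(dangling(graph, refs)))
```

Only the dropped commit is dangling; its tree and blob are unreachable but still pointed to by it, so recovering the commit recovers them too.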

GitHub
Student Developer Pack

Get it for free at https://education.github.com/pack

References

  • Pro Git (2014), S. Chacon, B. Straub. Retrieved from https://git-scm.com/book/en/v2
  • I Git It 101, Cristian Llanos from @FandangoLat
  • Git documentation. Retrieved from https://git-scm.com/docs/

GIT 102: Intro & Data Structures

By Jorge Rios
