

Hi!
"Today I'm learning Git with @jotarios_, @gracenikole"

#ACM #CiberSecurityClub #IEEE




Jorge
Martin
Grace
Content
- Why use a VCS?
- What is Git?
- Main stages and states: modified, staged, committed
- Git setup: system, global and local config
- Git basics: status, add, commit, push, pull, etc.
- Branching model: branch, checkout
- Git internals: how does git work?
- Git objects: how does git store data?
- Git refs: how does git keep track of data?
- Extras: keeping efficiency and consistency


Pro Git, free download: https://git-scm.com/book
Authors: Scott Chacon, Ben Straub

Version Control System
Did you...
- control your versions using the file name?
  my_awesome_project.zip
  my_awesome_project2.zip
  my_awesome_project_final_version.zip
  my_not_awesome_project.zip
  my_awesome_este_si_es.zip
  my_awesome_rosita_de_guadalupe.zip
- want to see a previous version of your project? BUT YOU CAN'T!
- lose files and have no backup?
- share your project using email, Drive, WhatsApp, Messenger, photos, etc.?
- merge your work with your teammates' work?

Version Control System
- Versioning
- Code/project history
- Team collaboration
- You can see who messed things up

What is Git?
"Git is a free and open-source distributed version control system designed to handle everything from small to very large projects with speed and efficiency".
Definition taken from https://git-scm.com/

Main local stages
working directory, index & working branch

[Figure: Your project. You are in... the working directory/area, the index (staging area) or your working branch. Your file is... modified, staged or committed.]

Let's go!
# Local (or project) scope
$ git config user.name "Jorge Rios"
$ git config user.email "jrios@utec.edu.pe"
# Global scope
$ git config --global user.name "Jorge Rios"
$ git config --global user.email "jrios@utec.edu.pe"
$ git config --list
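To check which scope a value comes from, you can use `git config --show-origin`. The sketch below runs in a throwaway repository and only writes to the local scope, so your global settings stay untouched. The name and email are placeholders.

```shell
# Work in a throwaway repository so no real configuration is touched
repo=$(mktemp -d)
cd "$repo"
git init -q

# Local (project) scope: these values are written to .git/config
git config user.name "Jane Doe"            # placeholder identity
git config user.email "jane@example.com"

# Read back only the local value
git config --local --get user.name

# List values together with the file each one comes from
git config --list --show-origin | grep user.name
```

If you also have a global identity, the last command prints two lines. The local one wins inside this repository.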

Our first repository
$ git init
Initialized empty Git repository in {PATH}/.git/
master *
$ git add --all
master *
$ git commit -m "Add my text on README.md"
master *
$ git status
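The same first repository can be replayed end to end, with `git status --short` after each step showing the file moving through the stages. The sketch assumes a temporary directory, a placeholder identity and a made-up README text.

```shell
# Create a throwaway project and turn it into a repository
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Jane Doe"            # placeholder identity
git config user.email "jane@example.com"

# The file only exists in the working directory (new, untracked)
echo "Hello, Git!" > README.md
git status --short        # ?? README.md

# Staged: the file is now in the index
git add --all
git status --short        # A  README.md

# Committed: the snapshot is stored in the branch
git commit -q -m "Add my text on README.md"
git status --short        # no output: the working tree is clean
git log --oneline
```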

[Figure: Your project. git add moves your changes from the working directory/area into the index (staging area); git commit records them in your working branch.]

[Figure: Your project. git add moves your changes from the working directory/area into the index (staging area), git commit records them in your working branch, and git push sends the branch to the remote repository origin, tracked locally as origin/master.]

[Figure: Your project. git pull brings the commits from the remote repository origin (origin/master) into your working branch and working directory.]
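You can try `git push` and `git pull` without a GitHub account: a bare repository on disk can play the role of `origin`. Everything below is made up for this sketch: the paths, the two clones named `alice` and `bob`, and the identity.

```shell
# A bare repository (no working directory) acts as the shared remote
base=$(mktemp -d)
git init -q --bare "$base/origin.git"

# Alice clones the empty remote, commits and pushes
git clone -q "$base/origin.git" "$base/alice" 2>/dev/null
cd "$base/alice"
git config user.name "Alice"; git config user.email "alice@example.com"
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git push -q origin HEAD

# Bob clones and receives Alice's commit
git clone -q "$base/origin.git" "$base/bob"

# Alice pushes another change...
echo "v2" >> notes.txt
git commit -q -am "Update notes"
git push -q origin HEAD

# ...and Bob pulls it into his branch and working directory
cd "$base/bob"
git pull -q
cat notes.txt
```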

More porcelain commands
See my changes
$ git diff
Unmodify a modified file in the working directory (your changes are discarded!)
$ git checkout -- {file}
Unstage changes (they stay in the working directory)
$ git reset HEAD -- {file}
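The sketch below shows the difference between discarding a change and unstaging it, using a throwaway repository, a made-up `file.txt` and a placeholder identity.

```shell
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.name "Jane Doe"; git config user.email "jane@example.com"
echo "original" > file.txt
git add file.txt && git commit -q -m "Add file"

# See my changes: working directory compared with the index
echo "edited" > file.txt
git diff --stat

# Unmodify: throw the working-directory change away
git checkout -- file.txt
cat file.txt                  # original

# Stage a change, then unstage it: the edit stays on disk
echo "edited again" > file.txt
git add file.txt
git diff --cached --stat      # the staged change shows up here
git reset -q HEAD -- file.txt
git diff --cached --stat      # nothing is staged any more
cat file.txt                  # edited again
```

Since Git 2.23, `git restore {file}` and `git restore --staged {file}` do the same two jobs with less ambiguous names.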


Branching model
Create a new branch
master *
$ git branch feature/blizzard-logo
Use a branch
master *
$ git checkout feature/blizzard-logo
Delete a branch (switch to another branch first: Git refuses to delete the branch you are on)
master *
$ git branch -d feature/blizzard-logo
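`git branch -d` is the safe way to delete a branch: it refuses to delete work that is not merged anywhere, while `-D` forces the deletion. The sketch uses a throwaway repository and empty placeholder commits. `git init -b` needs Git 2.28 or newer.

```shell
repo=$(mktemp -d); cd "$repo"
git init -q -b master
git config user.name "Jane Doe"; git config user.email "jane@example.com"
git commit -q --allow-empty -m "Initial commit"

# Create a branch, use it and commit on it
git branch feature/blizzard-logo
git checkout -q feature/blizzard-logo
git commit -q --allow-empty -m "Draw logo"

# Leave the branch before deleting it
git checkout -q master

# -d refuses, because "Draw logo" is not merged into master...
git branch -d feature/blizzard-logo || echo "refused: branch not merged"
# ...while -D deletes it anyway
git branch -D feature/blizzard-logo
git branch
```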

Let's fork an existing project!


https://bit.ly/repolink
$ git clone https://github.com/gracenikole/py-project.git
Cloning into 'py-project'...
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 6 (delta 0), reused 2 (delta 0), pack-reused 3
Receiving objects: 100% (6/6), done.
Clone your repository

Create a branch
master*
$ git branch develop
Change your branch
master*
$ git checkout develop
Switched to branch 'develop'
Then add your name to the README.md file

This is a shortcut for branch and checkout
master*
$ git checkout -b develop
Switched to a new branch 'develop'
List all your branches
develop*
$ git branch
* develop
  master

Add and commit your changes
develop*
$ git add --all
develop*
$ git commit -m "Update README.md"

Change your branch again :D
develop*
$ git checkout master
Switched to branch 'master'

And... merge
master*
$ git merge develop
Updating a3d68e0..656ec87
Fast-forward
README.md | 10 +++++++++-
Now, you can delete the branch
master*
$ git branch -d develop
Deleted branch develop (was 656ec87).
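The whole develop workflow can be replayed locally. The sketch below uses a throwaway repository, a placeholder name added to README.md, and `git init -b` (Git 2.28+). The merge uses `--ff-only` as a safety choice: it refuses to merge if master had moved and a real merge commit were needed.

```shell
repo=$(mktemp -d); cd "$repo"
git init -q -b master
git config user.name "Jane Doe"; git config user.email "jane@example.com"
echo "# py-project" > README.md
git add README.md && git commit -q -m "Initial commit"

# Branch off, add your name to README.md, commit
git checkout -q -b develop
echo "- Jane Doe" >> README.md
git add --all && git commit -q -m "Update README.md"

# master has not moved since develop was created, so this is a fast-forward
git checkout -q master
git merge --ff-only develop

# The branch is merged, so -d deletes it without complaint
git branch -d develop
git log --oneline
```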

And push
master*
$ git push


$ git init
Initialized empty Git repository in {PATH}/.git/
Initialize a new Git repository
$ tree .git

.git
├── branches
├── config
├── description
├── HEAD
├── hooks
│ ├── applypatch-msg.sample
│ ├── commit-msg.sample
│ ├── ...
│ ├── pre-receive.sample
│ └── update.sample
├── info
│ └── exclude
├── objects
│ ├── info
│ └── pack
└── refs
    ├── heads
    └── tags

What is a Git object?
$ vim testing_blob.py
# file: testing_blob.py
print("Hello world from Git")
$ git hash-object -w testing_blob.py
5eefe3945ef60476ef9a4f37f4f3d653ef2316fe
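The key is not magic. Git takes the SHA-1 of a small header, `blob <size in bytes>` followed by a NUL byte, plus the raw file content. The sketch below recomputes it by hand and compares the result with `git hash-object`. It assumes `sha1sum` from GNU coreutils (on macOS, `shasum` prints the same format).

```shell
dir=$(mktemp -d); cd "$dir"
printf 'print("Hello world from Git")\n' > testing_blob.py

# Header "blob <size>\0" followed by the file content, hashed with SHA-1
size=$(wc -c < testing_blob.py | tr -d ' ')
manual=$({ printf 'blob %s\0' "$size"; cat testing_blob.py; } | sha1sum | cut -d' ' -f1)

# Git computes the same key (without -w nothing is written to disk)
from_git=$(git hash-object testing_blob.py)
echo "$manual"
echo "$from_git"
```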

.git
├── branches
├── config
├── description
├── HEAD
├── hooks
│ ├── applypatch-msg.sample
│ ├── commit-msg.sample
│ ├── ...
│ ├── pre-receive.sample
│ └── update.sample
├── info
│ └── exclude
├── objects
│ ├── 5e
│ │ └── efe3945ef60476ef9a4f37f4f3d653ef2316fe
│ ├── info
│ └── pack
└── refs
    ├── heads
    └── tags

Retrieving the content
$ git cat-file -p 5eefe3945ef60476ef9a4f37f4f3d653ef2316fe
print("Hello world from Git")
Discovering the object type
$ git cat-file -t 5eefe3945ef60476ef9a4f37f4f3d653ef2316fe
blob
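The tree listing above shows where the object landed. The first two hex digits of the key name the directory and the remaining 38 name the file. The sketch below, which uses a throwaway repository, checks that path and asks `cat-file` for the type, size and content. The file on disk is zlib-compressed, so reading it directly with `cat` shows binary data.

```shell
dir=$(mktemp -d); cd "$dir"
git init -q
printf 'print("Hello world from Git")\n' > testing_blob.py
hash=$(git hash-object -w testing_blob.py)

# .git/objects/<first 2 hex digits>/<remaining 38 hex digits>
obj=".git/objects/$(echo "$hash" | cut -c1-2)/$(echo "$hash" | cut -c3-)"
ls -l "$obj"

git cat-file -t "$hash"    # blob
git cat-file -s "$hash"    # size of the content in bytes
git cat-file -p "$hash"    # the content itself
```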

Git internals
So far we have only seen Git's "front end": the porcelain commands. Now it's time to look at the plumbing, the low-level commands that do all the heavy lifting.

Basic concepts
- At its core, Git works as a minimal filesystem
- It is a content-addressable filesystem: every object is stored as a key-value pair, and the key is a hash of the object's content
- Version control, which is what we have used so far, is built on top of it
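"Content-addressable" has a visible consequence: the key depends only on the content, not on the file name. Two files with identical content are therefore stored once. The sketch below checks this in a throwaway repository with made-up file names.

```shell
dir=$(mktemp -d); cd "$dir"
git init -q
echo "same content" > a.txt
echo "same content" > b.txt
echo "other content" > c.txt

# a.txt and b.txt get the same key
git hash-object -w a.txt b.txt c.txt

# Identical content means one stored object: only two loose objects exist
git count-objects -v | grep '^count:'    # count: 2
```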

Git file structure


.git
├── branches
├── config
├── description
├── HEAD
├── hooks
│ ├── applypatch-msg.sample
│ ├── commit-msg.sample
│ ├── ...
│ ├── pre-receive.sample
│ └── update.sample
├── info
│ └── exclude
├── objects
│ ├── 5e
│ │ └── efe3945ef60476ef9a4f37f4f3d653ef2316fe
│ ├── info
│ └── pack
└── refs
    ├── heads
    └── tags

File system structure: Explained
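- HEAD: pointer to the branch you currently have checked out
- config: project-specific configuration options (a file)
- description: only used by the GitWeb program
- info/: holds the exclude file, for ignore patterns you don't want to keep in .gitignore
- hooks/: client- and server-side hook scripts
- objects/: where objects are stored
- refs/: where pointers to commits (branches, tags, remotes) are stored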
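The `info/exclude` file works like a `.gitignore` that is never committed, which makes it handy for personal scratch files. The sketch below shows an untracked file disappearing from `git status` once it is listed there. It uses a throwaway repository and a made-up file name.

```shell
dir=$(mktemp -d); cd "$dir"
git init -q
echo "secret" > local-notes.txt

# Untracked files show up in status...
git status --porcelain            # ?? local-notes.txt

# ...until they match a pattern in .git/info/exclude
mkdir -p .git/info
echo "local-notes.txt" >> .git/info/exclude
git status --porcelain            # no output

# Ask Git which rule ignores the file
git check-ignore -v local-notes.txt
```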

Some key concepts...
Before moving on, we need to define a few concepts to really understand how each part of Git works

Object
We will consider an object to be a piece of data,
e.g. a .txt file or a .png image.
Objects "live" in a location.
Pointer
A pointer is also an object, but its data is always the location of another object.


Index
An index is a special type of metadata:
it contains important locations within a database.
Hash
A hash is a mathematical transformation f(x), where x is some object.
It converts an object of any length n into a fixed-length series of hexadecimal digits.
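The fixed-length property is easy to check with SHA-1, the hash Git uses by default. In the sketch below, a 1-byte input and a 1,000,000-byte input both map to 40 hexadecimal digits (160 bits). It assumes `sha1sum` from GNU coreutils and `head -c`.

```shell
# Inputs of very different lengths...
short=$(printf 'a' | sha1sum | cut -d' ' -f1)
long=$(head -c 1000000 /dev/zero | sha1sum | cut -d' ' -f1)

# ...always give 40 hexadecimal digits
echo "$short  (${#short} chars)"
echo "$long  (${#long} chars)"

# Changing one character gives a completely different hash
printf 'b' | sha1sum
```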

Git objects
- Stored as key-value pairs (like dictionaries)
- Either a blob, a tree or a commit (plus the annotated tag object)
- Live in .git/objects
The units of Git

Git object: structure


Git objects
Blob
- Simplest type
- Holds the contents of a single file of any type
- .cpp, .py, .exe, etc.
# See content
git cat-file -p 6f2751951abc19u98
# Create (and write) a blob object from a file
git hash-object -w main.cpp

Git objects
Tree
- Contains one or more entries
- Each entry is composed of (mode, object type, key, file name)
- Like a directory in Unix or a folder in Windows
# See tree
git cat-file -p 6abfe790271bd
# Add a pointer to an existing blob to the index (staging area)
git update-index --add --cacheinfo 100644 82142174bacd main.cpp
# Write the index out as a tree object
git write-tree
# Read an existing tree into the index as a subdirectory
git read-tree --prefix=<tree_alias> 6abfe790271bd
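The plumbing steps above can be chained into a complete tree, with no porcelain involved. The sketch below stores a made-up `main.cpp` as a blob, adds a (mode, blob, name) entry to the index, writes the tree and prints its entries. It uses a throwaway repository.

```shell
dir=$(mktemp -d); cd "$dir"
git init -q
printf 'int main() { return 0; }\n' > main.cpp

# 1. Store the file content as a blob
blob=$(git hash-object -w main.cpp)

# 2. Put a (mode, key, name) entry for that blob in the index
git update-index --add --cacheinfo 100644 "$blob" main.cpp

# 3. Write the index out as a tree object and inspect it
tree=$(git write-tree)
git cat-file -t "$tree"     # tree
git cat-file -p "$tree"     # 100644 blob <blob key>	main.cpp
```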

Git tree: structure


Git objects
Commit
- A commit points to one top-level tree and adds extra information
- Commits store "snapshots" of the project
- They record the parent commit(s), the author and committer with timestamps, and the commit message
# Commit a tree
git commit-tree abcd1234 -m "first commit"
# Or with a pipe
echo "first commit" | git commit-tree abcd1234
# Commit a tree with a parent (-p takes a commit key, e.g. the one printed by the first commit-tree, not a tree key)
git commit-tree abcd5678 -p {first_commit} -m "second commit"
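Here is the same idea with real keys, where the parent passed to `-p` is the commit printed by the first `commit-tree`. The sketch uses a throwaway repository, a made-up `notes.txt` and a placeholder identity, which `commit-tree` reads from the config.

```shell
dir=$(mktemp -d); cd "$dir"
git init -q
git config user.name "Jane Doe"; git config user.email "jane@example.com"

# First snapshot: index -> tree -> commit without a parent
echo "v1" > notes.txt
git update-index --add notes.txt
tree1=$(git write-tree)
first=$(git commit-tree "$tree1" -m "first commit")

# Second snapshot, whose parent is the first commit
echo "v2" > notes.txt
git update-index notes.txt
tree2=$(git write-tree)
second=$(git commit-tree "$tree2" -p "$first" -m "second commit")

# Shows the tree, parent, author, committer and message lines
git cat-file -p "$second"
```

No branch points at these commits yet; the refs section covers how a name gets attached to them.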

Commit object structure


Git refs
Storing stories
- A reference map: human-readable names for commit keys, to make things simpler
- Keeps the latest changes at hand
- Lives in .git/refs

References
Commits and branches
- Store the latest commit of each branch, local and remote
- Store tags for special commits
- Live in .git/refs
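With the default "files" ref storage, a branch is just a small file under `.git/refs/heads` that holds a commit key, and `git update-ref` can create one by hand. The sketch uses a throwaway repository, a placeholder identity, a hypothetical branch named `test` and `git init -b` (Git 2.28+).

```shell
dir=$(mktemp -d); cd "$dir"
git init -q -b master
git config user.name "Jane Doe"; git config user.email "jane@example.com"
git commit -q --allow-empty -m "first commit"

# The branch file contains the key of its latest commit
cat .git/refs/heads/master
git rev-parse master

# Create a branch by hand: a new ref pointing at the same commit
git update-ref refs/heads/test "$(git rev-parse master)"
git branch
```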

Commit map with branches


The HEAD
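- Keeps track of where you are in the commit history
- Usually contains a pointer to the ref of the current branch (after checking out a commit directly, it holds that commit's key instead: a "detached HEAD")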
Ex.
ref: refs/heads/master
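Switching branches only rewrites the one line in `.git/HEAD`. Detaching HEAD replaces that line with a raw commit key. The sketch uses a throwaway repository, a placeholder identity and `git init -b` (Git 2.28+).

```shell
dir=$(mktemp -d); cd "$dir"
git init -q -b master
git config user.name "Jane Doe"; git config user.email "jane@example.com"
git commit -q --allow-empty -m "first commit"
git branch develop

cat .git/HEAD                # ref: refs/heads/master
git checkout -q develop
cat .git/HEAD                # ref: refs/heads/develop
git symbolic-ref HEAD        # refs/heads/develop

# Detached HEAD: the file now holds a commit key instead of a ref
git checkout -q --detach
cat .git/HEAD
```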

Remotes
Stores the head of each branch of the different remote repositories.
Ex.
.git/refs/remotes/origin/master
.git/refs/remotes/fork/master
Treat them as read-only: Git updates them when you fetch, pull or push, and you never commit to them directly
Tags
- Lightweight: essentially a ref to a specific commit, used to name historic changes.
- Annotated: a tag object is created, holding a pointer plus the tagger, date and message. It can point to any object, not only a commit
Ex. Beta, ver0.1, etc.
Both live in .git/refs/tags
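`git cat-file -t` shows the two kinds of tags apart: a lightweight tag resolves straight to a commit, while an annotated tag resolves to a tag object of its own. The sketch uses a throwaway repository and a placeholder identity, with the tag names taken from the examples above.

```shell
dir=$(mktemp -d); cd "$dir"
git init -q
git config user.name "Jane Doe"; git config user.email "jane@example.com"
git commit -q --allow-empty -m "first commit"

git tag ver0.1                      # lightweight: only a ref
git tag -a beta -m "Beta release"   # annotated: also writes a tag object

git cat-file -t ver0.1              # commit
git cat-file -t beta                # tag
git cat-file -p beta                # object, type, tag, tagger, message
ls .git/refs/tags
```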

Extra features
Efficiency and data recovery

Packfiles
If I change a single ; in a file with 10 million lines, a whole new blob gets created. Could we store only the change while keeping our structure?
# Pack loose objects into packfiles (similar objects are stored as deltas)
git gc

Data efficiency
- Loose objects are packed automatically when there are too many of them (the threshold is configurable with gc.auto), or on demand with git gc
- Refs can also be packed into the .git/packed-refs file
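`git count-objects -v` shows the effect of packing: before `git gc` every object is a loose file, and afterwards the reachable ones sit in a packfile. The sketch uses a throwaway repository with three made-up commits and a placeholder identity.

```shell
dir=$(mktemp -d); cd "$dir"
git init -q
git config user.name "Jane Doe"; git config user.email "jane@example.com"
for i in 1 2 3; do
  echo "line $i" >> file.txt
  git add file.txt
  git commit -q -m "commit $i"
done

git count-objects -v | grep -E '^(count|in-pack):'   # all loose
git gc -q
git count-objects -v | grep -E '^(count|in-pack):'   # now in a pack
ls .git/objects/pack
cat .git/packed-refs                                  # refs were packed too
```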

Data recovery
- Every change to HEAD, such as a commit, a checkout, a reset or a dropped commit, is recorded in the reflog
- Integrity checks can be run to find dangling (unreferenced) objects
# See HEAD changelog
git reflog
# Check data integrity and list dangling objects
git fsck --full
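The reflog is what makes an accidental `git reset --hard` recoverable. The sketch "loses" a commit and brings it back through `HEAD@{1}`, the position HEAD had one move earlier. It uses a throwaway repository, empty placeholder commits and a placeholder identity.

```shell
dir=$(mktemp -d); cd "$dir"
git init -q
git config user.name "Jane Doe"; git config user.email "jane@example.com"
git commit -q --allow-empty -m "first commit"
git commit -q --allow-empty -m "important commit"

# "Lose" the last commit by moving the branch back one step
git reset -q --hard HEAD~1
git log --oneline            # only "first commit"

# The reflog still remembers where HEAD was before the reset
git reflog -n 3
git reset -q --hard 'HEAD@{1}'
git log --oneline            # "important commit" is back

git fsck --full
```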

GitHub
Student Developer Pack

Get it for free at https://education.github.com/pack

References
- S. Chacon, B. Straub. Pro Git (2014). Retrieved from https://git-scm.com/book/en/v2
- C. Llanos (@FandangoLat). I Git It 101.
- Git documentation. Retrieved from https://git-scm.com/docs/

GIT 102: Intro & Data Structures
By Jorge Rios