The Git's Guts

by Mikołaj Karebski @mkarebski

and Paweł Lipski @plipski

Also included: a bunch of prevalent misconceptions and nifty everyday tricks!

Software Engineer

(Kotlin, Golang)

mail: mkarebski@virtuslab.com

github: github.com/mkarebski

Contact - Mikołaj Karebski

Software Engineer (Devops, Scala)

and also a git hacker...

github.com/Virtuslab/git-machete

mail: plipski@virtuslab.com

github: github.com/PawelLipski

Contact - Paweł Lipski

Agenda

  • Commits
  • Objects
  • Branches
  • Tags
  • Reflogs
  • Garbage collection
  • Contact & Questions

M

Commits

P

Commits

Question: what properties of a commit (other than message) can you think of?

P

Commits

Parent(s) of Commit

Number of parents           How it can be created
Zero                        root commit(s!) of the repo
One                         well... just "git commit"
Two                         regular merge (with 1 branch)
Three... or more (WTF?!)    octopus merge (with 2+ branches)

P

Commits

Parent(s) of Commit

P

GitHub's Octocat, named after octopus merges

Commits

Parent(s) of Commit

$ git log -1 2cde51f
commit 2cde51fbd0f310c8a2c5f977e665c0ac3945b46d
Merge: 7471c5c c097d5f 74c375c 04c3a85 5095f55 4f53477
2f54d2a 56d37d8 192043c f467a0f bbe5803 3990c51 d754fa9
516ea4b 69ae848 25c1a63 f52c919 111bd7b aafa85e dd407a3
71467e4 0f7f3d1 8778ac6 0406a40 308a0f3 2650bc4 8cb7a36
323702b ef74940 3cec159 72aa62b 328089a 11db0da e1771bc
f60e547 a010ff6 5e81543 58381da 626bcac 38136bd 06b2bd2
8c5178f 8e6ad35 008ef94 f58c4fc4 2309d67 5c15371 b65ab73
26090a8 9ea6fbc 2c48643 1769267 f3f9a60 f25cf34 3f30026
fbbf7fe c3e8494 e40e0b5 50c9697 6358711 0112b62 a0a0591
b888edb d44008b 9a199b8 784cbf8
Author: Mark Brown <[email redacted for privacy]>
Date:   Thu Jan 2 13:01:55 2014 +0000

    Merge remote-tracking branches [65 remote branch names]

P

Commits

Tree vs DAG (directed acyclic graph)

P

Commits

The commit structure doesn't really constitute a tree in the general case.

 

Since each commit can have more than one parent, in fact the structure is a directed acyclic graph (DAG).

 

In a special case, however, commits would still form a tree as long as there are no merge commits in the entire repository.
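
A minimal sketch of the tree-vs-DAG distinction, assuming a made-up four-commit history (the ids "a" to "d" and the helper names are hypothetical, not anything git exposes). It classifies commits by parent count, as in the parent-count table earlier in this section, and checks whether the history is still a plain tree:

```python
from collections import Counter

# Hypothetical history: commit id -> list of parent ids.
PARENTS = {
    "a": [],            # root commit
    "b": ["a"],         # ordinary commit
    "c": ["a"],         # ordinary commit on another branch
    "d": ["b", "c"],    # regular merge of the two branches
}

def classify(parents):
    """Name a commit by its number of parents."""
    n = len(parents)
    if n == 0:
        return "root"
    if n == 1:
        return "ordinary"
    if n == 2:
        return "merge"
    return "octopus merge"

def is_tree(history):
    """True when there is a single root and no commit has 2+ parents."""
    roots = [c for c, ps in history.items() if not ps]
    return len(roots) == 1 and all(len(ps) <= 1 for ps in history.values())

print(Counter(classify(ps) for ps in PARENTS.values()))
print(is_tree(PARENTS))  # the merge commit "d" makes this a DAG, not a tree
```

Dropping commit "d" from the dictionary turns the same history back into a tree.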

Prevalent Misconception #1

P

Commits

Committer vs Author

Author + author date: set only once, when the commit is first created

 

Committer + commit date: updated every time the commit is "rewritten": amend, rebase, cherry-pick, ...

 

P

Commits

Nifty Everyday Trick #1

$ git log --pretty=fuller
commit 1ff36a94530ed96ae9cf41147922985337555f10 (HEAD -> some-branch, origin/some-branch)
Author:     Pawel Lipski <plipski@virtuslab.com>
AuthorDate: Thu Jan 24 19:00:38 2019 +0100
Commit:     Someone Else <selse@virtuslab.com>
CommitDate: Sat Jan 26 01:41:48 2019 +0100

    Craft a bunch of nifty hacks

commit 17cbd52bd16e89d96d10e51558ffb45351f17cd8 (develop)
Merge: 3c6020295 08b753152
Author:     Someone Else <selse@virtuslab.com>
AuthorDate: Fri Jan 25 11:31:21 2019 +0000
Commit:     Pawel Lipski <plipski@virtuslab.com>
CommitDate: Fri Jan 25 11:31:21 2019 +0000

P

Commits

Prevalent Misconception #2

The committer and author names and emails are not verified in any way.

 

Users can specify basically any author and committer: it's just a matter of setting the right git config.

 

There are other mechanisms for verifying authorship (signed tags/commits)... but remember that SHA-1 was SHAttered in Feb 2017 :/

P

Commits

Prevalent Misconception #2

P

git config --global user.name "John Doe"
git config --global user.email "john@doe.org"

# for the given repository
git config user.name "John Doe"
git config user.email "john@doe.org"

# per operation (--author sets the author only; the committer is still taken from the config/environment)
....
git commit --author="John Doe <john@doe.org>" --no-edit

P

Objects

P

  • Commits
  • Trees
  • Blobs
  • Tags

Objects

The Real Guts of Git

P

Objects

P

Remember Linus Torvalds is primarily an OS/filesystem guy!

 

The underlying git storage is basically a very specialized FS... concepts like files, directories, symbolic links and file permissions are all reflected to some extent.

.git folder contents

Someone really sucks at naming stuff...

Objects

.git folder contents

Is called              Should rather be called
.git/refs/heads/       .git/refs/local_branches/
.git/refs/remotes/     .git/refs/remote_branches/
.git/HEAD              .git/refs/HEAD (???)
.git/logs/             .git/reflogs/
.git/objects/          that's ok :)
.git/index             that's ok :)

P

Objects

Internal structure

Objects are basically deflated/zlib-compressed text (for commits) or binary data (for trees/blobs)...  

...../.git/objects/xx/[a-f0-9]{38}

e.g.:
...../.git/objects/c1/70510a828fd2c6d35f943b2a27b51605e5a450

M

Objects

Internal structure

DIY trick (available out of the box on most Linux distros):

  

$ pigz -d < .git/objects/c1/70510a828fd2c6d35f943b2a27b51605e5a450
commit 261<zero-byte>tree 3ee08f945d2d00b1be1c02e99bfd907eaa03ca19
parent d9229b06110638b0cc9c3dd143324d32c51229f8
author Pawel Lipski <pawel.p.lipski@gmail.com> 1552768066 +0100
committer Pawel Lipski <pawel.p.lipski@gmail.com> 1553727732 +0100

Migrate codebase to Python 3 (#35)

M

Objects

Internal structure

The SHA-1 hash (that each object is identified by) is computed over the uncompressed contents (header included), though:

 

 

$ pigz -d < .git/objects/c1/70510a828fd2c6d35f943b2a27b51605e5a450
<some contents...>

$ pigz -d < .git/objects/c1/70510a828fd2c6d35f943b2a27b51605e5a450 | sha1sum
c170510a828fd2c6d35f943b2a27b51605e5a450 -
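
The same idea can be sketched in Python with only hashlib and zlib. The helper names below are hypothetical. The sample content is the 'hello world' line from the greeting.txt example in this deck, whose blob id is well known:

```python
import hashlib
import zlib

def encode_object(obj_type, body):
    """Return the uncompressed store form: '<type> <size>', a NUL byte, then body."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return header + body

def object_id(store):
    """SHA-1 of the uncompressed bytes, as a 40-character hex string."""
    return hashlib.sha1(store).hexdigest()

def loose_path(oid):
    """Path of a loose object relative to the .git directory."""
    return f"objects/{oid[:2]}/{oid[2:]}"

store = encode_object("blob", b"hello world\n")
on_disk = zlib.compress(store)              # roughly what lands in .git/objects
oid = object_id(zlib.decompress(on_disk))   # hash the inflated bytes, not the file
print(oid, loose_path(oid))
```

Hashing on_disk directly would give a different, meaningless value, since the compressed bytes depend on the zlib settings.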

M

Objects

Back to commits

Commits are just objects stored in .git/objects!

To view object contents in human-readable form, use plumbing command git cat-file -p <object-hash>...

$ git cat-file -p c170510a
tree 3ee08f945d2d00b1be1c02e99bfd907eaa03ca19
parent d9229b06110638b0cc9c3dd143324d32c51229f8
author Pawel Lipski <pawel.p.lipski@gmail.com> 1552768066 +0100
committer Pawel Lipski <pawel.p.lipski@gmail.com> 1553727732 +0100

Migrate codebase to Python 3 (#35)

Note the parent commit hash(es) and the tree hash (3ee08f94)...
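
A rough sketch of how that text can be split into fields, assuming the simple layout shown above (real commits may carry extra headers such as signatures, which this hypothetical parser does not handle):

```python
def parse_commit(text):
    """Split a commit body (as printed by git cat-file -p) into its fields."""
    head, _, message = text.partition("\n\n")
    fields = {"parents": []}
    for line in head.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            fields["parents"].append(value)   # zero, one or many parents
        else:
            fields[key] = value
    fields["message"] = message
    return fields

def timestamp(ident):
    """Extract the Unix timestamp from an author/committer value."""
    return int(ident.rsplit(" ", 2)[1])

COMMIT = (
    "tree 3ee08f945d2d00b1be1c02e99bfd907eaa03ca19\n"
    "parent d9229b06110638b0cc9c3dd143324d32c51229f8\n"
    "author Pawel Lipski <pawel.p.lipski@gmail.com> 1552768066 +0100\n"
    "committer Pawel Lipski <pawel.p.lipski@gmail.com> 1553727732 +0100\n"
    "\n"
    "Migrate codebase to Python 3 (#35)\n"
)

commit = parse_commit(COMMIT)
# Author date and commit date differ: the commit was rewritten after creation.
print(timestamp(commit["committer"]) - timestamp(commit["author"]), "seconds apart")
```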

M

Objects

Trees

$ git cat-file -p 3ee08f94
100644 blob 88d4bacf2148a890a659544ba4c71293bc40ea6b	.gitignore
100644 blob 8b2fda574589bb659e8ad17ffcd27a71977f226c	.stestr.conf
100644 blob a7e2d1f420c6422ef20f9a22324ba29f9e1381f7	.travis.yml
100644 blob b06b13a98898da336b8273273a726d14840ad829	ISSUE_TEMPLATE.md
100644 blob 31da76bac0577cef7711016ab298847c5e403138	LICENSE
100644 blob 22ee63f4ab01807b3e336fec3ed3fe50fe9271cb	Makefile
100644 blob 101276cc905258d268086aba8066a1369ecfc6e6	README.md
100644 blob 5e9e418b32df4cac805d1a125872c9b1f48ebfce	RELEASE_NOTES.md
040000 tree ce13624259dbf791569f8a41526a6a54fa868ac7	completion
040000 tree bcbd4adea0a0c171bfb834072cf187fac9fe33aa	git_machete
040000 tree 9d938d03bd0f0fbf7b357abef7710b214c9f29ea	hook_samples
.....

M

Objects

Trees

Trees group files together and solve the problem of filenames: blobs hold only contents, so names and modes live in tree entries.

 

tree == snapshot

(tree != changeset)
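
Unlike commits, tree objects are binary: each entry is the mode, a space, the name, a zero byte and the raw 20-byte id. A sketch of that layout follows, with hypothetical helper names and a simplified sort order (real git compares directory names as if they ended with "/"). The ids are taken from the listing above:

```python
def encode_tree(entries):
    """Serialize (mode, name, hex_id) entries into a tree object body."""
    out = b""
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1]):
        out += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(hex_id)
    return out

def decode_tree(body):
    """Parse a tree object body back into (mode, name, hex_id) tuples."""
    entries, pos = [], 0
    while pos < len(body):
        nul = body.index(b"\x00", pos)          # end of '<mode> <name>'
        mode, name = body[pos:nul].decode().split(" ", 1)
        hex_id = body[nul + 1:nul + 21].hex()   # 20 raw bytes -> 40 hex chars
        entries.append((mode, name, hex_id))
        pos = nul + 21
    return entries

ENTRIES = [
    ("100644", "Makefile", "22ee63f4ab01807b3e336fec3ed3fe50fe9271cb"),
    # Stored as "40000"; git cat-file -p pads it to "040000" for display.
    ("40000", "completion", "ce13624259dbf791569f8a41526a6a54fa868ac7"),
    ("100644", ".gitignore", "88d4bacf2148a890a659544ba4c71293bc40ea6b"),
]

body = encode_tree(ENTRIES)
for entry in decode_tree(body):
    print(*entry)
```

Each tree lists the complete directory content, which is why a tree is a snapshot and not a changeset.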

M

Objects

Prevalent Misconception #3

Git in principle does not store commits as changesets - even though that's what you see in diff/log!


Git generally stores snapshots.


Changesets (deltas) are only used for optimization (packs/packfiles) in long-term storage and also generated on the fly when pushing/pulling.
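
To illustrate how a changeset can be derived on the fly from two snapshots, here is a small sketch. Snapshots are modelled as plain dictionaries from path to blob id. The ids ("blob-1" etc.) and the function name are made up for the example:

```python
def diff_snapshots(old, new):
    """Derive a changeset from two snapshots (dicts of path -> blob id)."""
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    modified = sorted(p for p in new if p in old and old[p] != new[p])
    return {"added": added, "removed": removed, "modified": modified}

# Hypothetical snapshots resembling a greeting.txt / parting.txt history.
first = {"greeting.txt": "blob-1"}
second = {"greeting.txt": "blob-1", "parting.txt": "blob-2"}
third = {"greeting.txt": "blob-3", "parting.txt": "blob-2"}

print(diff_snapshots(first, second))
print(diff_snapshots(second, third))
```

An unchanged file keeps the same blob id in both snapshots, so comparing ids is enough to skip it without reading any contents.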

M

Objects

git init
echo 'hello world' > greeting.txt
git add greeting.txt
git commit -m 'initial commit'
git tag R1 -m R1
echo 'bye bye' > parting.txt
git add parting.txt
git commit -m 'added parting'
echo 'welcome' > greeting.txt
git add greeting.txt

Nifty not-so-Everyday Trick #2

M

Objects

Blobs

zlib-compressed file contents, prepended with the header

blob <decimal-size><zero-byte> (e.g. blob 12<zero-byte> for a 12-byte file)

M

Objects

Prevalent Misconception #4

Even though git stores whole snapshots (rather than just diffs), it generally doesn't take a lot of space to keep the entire repository.

For example, the Mozilla repository was reported to take about 12 GB in SVN but only about 300 MB after conversion to git.

M

Objects

Tags

To be continued later...

M

Branches

P

Branches

Local

Pointers stored in .git/refs/heads

 

Should really be called .git/refs/local_branches

P

Branches

Prevalent Misconception #5

mikolaj@mikolaj:~/repos/git_internal/.git/refs/heads$ ls
master

mikolaj@mikolaj:~/repos/git_internal/.git/refs/heads$ file master
master: ASCII text

mikolaj@mikolaj:~/repos/git_internal/.git/refs/heads$ cat master
95474fce125caefa931e066f702bccf2821b3fbd

P

Branches

Prevalent Misconception #5

Branches don't really contain commits (that's not Mercurial/SVN/...).

 

They just point to a commit specified by its SHA-1.

 

Since commits have their parent(s), those parents have their parents etc., for each branch we can find a set of commits reachable from the commit it points to.
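
A minimal sketch of that reachability walk, assuming a made-up history (the commit ids c1 to c4, the branch names and the function are all hypothetical):

```python
def reachable(tip, parents):
    """All commits reachable from tip by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen

# Hypothetical repository: two branches forking after c2.
PARENTS = {"c1": [], "c2": ["c1"], "c3": ["c2"], "c4": ["c2"]}
BRANCHES = {"master": "c3", "feature": "c4"}   # a branch is just one pointer

for name, tip in BRANCHES.items():
    print(name, sorted(reachable(tip, PARENTS)))
```

Commits c1 and c2 show up for both branches although neither branch "contains" them in any stored sense.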

P

Branches

HEAD

P

A pointer to the current commit, stored in .git/HEAD

It holds either a reference to a branch or just a commit SHA (detached HEAD)

$ cat .git/HEAD
ref: refs/heads/refactor/python-3

$ git checkout HEAD~1
Note: checking out 'HEAD~1'.

You are in 'detached HEAD' state. You can look around, ...
.....

$ cat .git/HEAD
d9229b06110638b0cc9c3dd143324d32c51229f8
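
The two shapes of .git/HEAD can be told apart with a few lines of code. This is only a sketch: the function name is hypothetical, the dictionary stands in for a real ref lookup, and the commit that the branch points to is an assumption made for the example.

```python
def resolve_head(head_text, refs):
    """Return (commit_id, ref_or_None) for the contents of .git/HEAD."""
    head_text = head_text.strip()
    if head_text.startswith("ref: "):
        ref = head_text[len("ref: "):]
        return refs[ref], ref          # symbolic: follow it to the branch
    return head_text, None             # detached HEAD: a bare commit id

# Assumed state of the branch ref (not shown on the slide).
REFS = {
    "refs/heads/refactor/python-3": "c170510a828fd2c6d35f943b2a27b51605e5a450",
}

print(resolve_head("ref: refs/heads/refactor/python-3\n", REFS))
print(resolve_head("d9229b06110638b0cc9c3dd143324d32c51229f8\n", REFS))
```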

Branches

Remote

Pointers stored in .git/refs/remotes

 

Should really be called .git/refs/remote_branches

 

The word remote typically denotes a remote repository, not a remote branch.

P

Branches

Prevalent Misconception #6

Remote branches (.git/refs/remotes) don't strictly reflect the current state of the remote repository.

 

They simply store the state as of the latest fetch/pull.

 

Of course they can still be up to date if nothing has been modified (e.g. pushed) in the remote repository in the meantime.

P

Branches

Nifty Everyday Trick #3

Remove from the local repo the remote branches that no longer exist in the remote repo!

Nothing is removed from remote repo itself.

$ git fetch --prune
From bitbucket.org:your-org/your-repo
 - [deleted]             (none)     -> origin/your-old-branch
 - [deleted]             (none)     -> origin/your-other-old-branch
 - [deleted]             (none)     -> origin/someone-elses-branch-you-only-checked-out-to-do-review
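
Conceptually, pruning is a set difference between the local copies of the remote branches and the branches the remote reports right now. A toy model of that idea, with hypothetical names and ids (this is not how git implements it internally):

```python
def prune(remote_tracking, remote_branches):
    """Drop remote-tracking refs whose branch is gone from the remote.

    remote_tracking: dict of 'origin/<branch>' -> commit id (local copy)
    remote_branches: set of branch names that exist on the remote right now
    Returns (kept_refs, deleted_names); the remote itself is never touched.
    """
    kept, deleted = {}, []
    for name, commit in remote_tracking.items():
        branch = name.split("/", 1)[1]
        if branch in remote_branches:
            kept[name] = commit
        else:
            deleted.append(name)
    return kept, sorted(deleted)

local_view = {
    "origin/master": "aaa111",
    "origin/your-old-branch": "bbb222",
    "origin/your-other-old-branch": "ccc333",
}
kept, deleted = prune(local_view, {"master"})
print(deleted)
```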

P

Tags

M

Tags

Tags are fixed pointers, while branches are moving pointers.

 

Lightweight tags are stored as references

 

The content of the .git/refs/tags/<tag_name> file is the hash of a specific commit.

Lightweight

M

Tags

Annotated tags are stored as objects (zlib-compressed, similar to commits) in .git/objects

 

Pointers to the annotated tag objects are stored within .git/refs/tags directory (as with lightweight).
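
The difference shows up when a tag is resolved to a commit: a lightweight tag ref already holds the commit id, while an annotated tag ref holds the id of a tag object that must be followed ("peeled") first. A sketch with a made-up object store (the ids t1 and c1 and the dictionary layout are hypothetical):

```python
def peel(object_id, objects):
    """Follow annotated tag objects until a non-tag object id is reached."""
    while objects[object_id]["type"] == "tag":
        object_id = objects[object_id]["object"]
    return object_id

# Hypothetical object store: one annotated tag object and one commit.
OBJECTS = {
    "t1": {"type": "tag", "object": "c1", "tag": "R1", "message": "R1"},
    "c1": {"type": "commit"},
}
# Both refs live under .git/refs/tags; only their targets differ.
TAG_REFS = {"refs/tags/R1": "t1", "refs/tags/light": "c1"}

for ref, target in TAG_REFS.items():
    print(ref, "->", target, "-> commit", peel(target, OBJECTS))
```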

 

Annotated

M

Reflogs

M

Reflogs

git log walks the acyclic graph of commits by following parent references.

 

git reflog lists every commit that the given reference (branch or HEAD) has pointed to in this local repository (until the entries expire).

 

The reflogs are stored in .git/logs and are mostly used after git reset or git checkout goes wrong.
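
Each line of a file under .git/logs records the old id, the new id, who made the change, a timestamp with timezone and, after a tab, a message. A sketch of a parser for that layout follows. The sample line is hypothetical (placeholder name and email), modelled on an initial commit:

```python
def parse_reflog_line(line):
    """Parse one line of a file under .git/logs into a dict."""
    meta, _, message = line.rstrip("\n").partition("\t")
    old, new, rest = meta.split(" ", 2)
    who, ts, tz = rest.rsplit(" ", 2)
    return {"old": old, "new": new, "who": who,
            "time": int(ts), "tz": tz, "message": message}

# Hypothetical entry; an all-zero old id means the ref did not exist before.
LINE = ("0" * 40 + " 888bef1b83b1990a7039e0f0c20e7f82cf946637 "
        "Some User <user@example.com> 1548856051 +0100\t"
        "commit (initial): Initial shopping list\n")

entry = parse_reflog_line(LINE)
print(entry["new"][:7], entry["message"])
```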

M

Reflogs

mikolaj@pop-os:~/git-prez/repo1$ echo "Milk" >> shopping.txt
mikolaj@pop-os:~/git-prez/repo1$ echo "Bananas" >> shopping.txt 
mikolaj@pop-os:~/git-prez/repo1$ git add .
mikolaj@pop-os:~/git-prez/repo1$ git commit -m "Initial shopping list"
[master (root-commit) 888bef1] Initial shopping list
 1 file changed, 2 insertions(+)
 create mode 100644 shopping.txt
mikolaj@pop-os:~/git-prez/repo1$ git log
commit 888bef1b83b1990a7039e0f0c20e7f82cf946637 (HEAD -> master)
Author: Mikolaj Karebski <mikolaj.karebski@tesco.com>
Date:   Wed Jan 30 14:47:31 2019 +0100

    Initial shopping list
mikolaj@pop-os:~/git-prez/repo1$ git reflog ######### shorthand for: git reflog HEAD
888bef1 (HEAD -> master) HEAD@{0}: commit (initial): Initial shopping list

M

Reflogs

mikolaj@pop-os:~/git-prez/repo1$ echo "Oranges" >> shopping.txt 
mikolaj@pop-os:~/git-prez/repo1$ echo "Chocolate" >> shopping.txt 
mikolaj@pop-os:~/git-prez/repo1$ git add .
mikolaj@pop-os:~/git-prez/repo1$ git commit -m "Add Oranges & chocolate to shopping list"
[master 276d1b0] Add Oranges & chocolate to shopping list
 1 file changed, 2 insertions(+)
mikolaj@pop-os:~/git-prez/repo1$ git log
commit 276d1b037e7287ba7448c968b9763afa4e3654cf (HEAD -> master)
Author: Mikolaj Karebski <mikolaj.karebski@tesco.com>
Date:   Wed Jan 30 14:49:49 2019 +0100

    Add Oranges & chocolate to shopping list

commit 888bef1b83b1990a7039e0f0c20e7f82cf946637
Author: Mikolaj Karebski <mikolaj.karebski@tesco.com>
Date:   Wed Jan 30 14:47:31 2019 +0100

    Initial shopping list
mikolaj@pop-os:~/git-prez/repo1$ git reflog
276d1b0 (HEAD -> master) HEAD@{0}: commit: Add Oranges & chocolate to shopping list
888bef1 HEAD@{1}: commit (initial): Initial shopping list

M

Reflogs

mikolaj@pop-os:~/git-prez/repo1$ git rebase -i --root  ######## <-- squash
[detached HEAD 5cbaa64] Initial shopping list
 Date: Wed Jan 30 14:47:31 2019 +0100
 1 file changed, 4 insertions(+)
 create mode 100644 shopping.txt
Successfully rebased and updated refs/heads/master.
mikolaj@pop-os:~/git-prez/repo1$ git log
commit 5cbaa64b345631d02b7166261bcd9bb061ccd8b2 (HEAD -> master)
Author: Mikolaj Karebski <mikolaj.karebski@tesco.com>
Date:   Wed Jan 30 14:47:31 2019 +0100

    Initial shopping list
    
    Add Oranges & chocolate to shopping list
mikolaj@pop-os:~/git-prez/repo1$ git reflog
5cbaa64 (HEAD -> master) HEAD@{0}: rebase -i (finish): returning to refs/heads/master
5cbaa64 (HEAD -> master) HEAD@{1}: rebase -i (squash): Initial shopping list
255e373 HEAD@{2}: rebase -i (pick): Initial shopping list
251a141 HEAD@{3}: rebase -i (pick): Initial shopping list
fd86ee4 HEAD@{4}: rebase -i (start): checkout fd86ee436ed1d3b655d4edb62239afe9f77f66a7
276d1b0 HEAD@{5}: commit: Add Oranges & chocolate to shopping list
888bef1 HEAD@{6}: commit (initial): Initial shopping list

M

Reflogs

mikolaj@pop-os:~/git-prez/repo1$ git reset --hard 276d1b0
HEAD is now at 276d1b0 Add Oranges & chocolate to shopping list
mikolaj@pop-os:~/git-prez/repo1$ git log
commit 276d1b037e7287ba7448c968b9763afa4e3654cf (HEAD -> master)
Author: Mikolaj Karebski <mikolaj.karebski@tesco.com>
Date:   Wed Jan 30 14:49:49 2019 +0100

    Add Oranges & chocolate to shopping list

commit 888bef1b83b1990a7039e0f0c20e7f82cf946637
Author: Mikolaj Karebski <mikolaj.karebski@tesco.com>
Date:   Wed Jan 30 14:47:31 2019 +0100

    Initial shopping list
mikolaj@pop-os:~/git-prez/repo1$ cat shopping.txt 
Milk
Bananas
Oranges
Chocolate

M

Reflogs

mikolaj@pop-os:~/git-prez/repo1$ git reflog
276d1b0 (HEAD -> master) HEAD@{0}: reset: moving to 276d1b0
5cbaa64 HEAD@{1}: rebase -i (finish): returning to refs/heads/master
5cbaa64 HEAD@{2}: rebase -i (squash): Initial shopping list
255e373 HEAD@{3}: rebase -i (pick): Initial shopping list
251a141 HEAD@{4}: rebase -i (pick): Initial shopping list
fd86ee4 HEAD@{5}: rebase -i (start): checkout fd86ee436ed1d3b655d4edb62239afe9f77f66a7
276d1b0 (HEAD -> master) HEAD@{6}: commit: Add Oranges & chocolate to shopping list
888bef1 HEAD@{7}: commit (initial): Initial shopping list

M

Reflogs

Prevalent Misconception #7

git commit --amend, rebase, cherry-pick etc. don't really modify any history.

 

They simply create a brand new history based on the existing one.

 

The old history will still be available via the reflogs until their entries expire (by default after 90 days, or 30 days for entries unreachable from the branch's current tip).
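
Why a rewrite can only produce new commits: the id is a hash of the commit's contents, and those contents include the parent ids. A sketch under simplified assumptions (helper names, identity line and messages are made up; the tree id used is git's well-known empty tree):

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

def make_commit(tree, parents, message,
                ident="A U Thor <author@example.com> 0 +0000"):
    """Build a commit body in the layout printed by git cat-file -p."""
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    lines += [f"author {ident}", f"committer {ident}", "", message]
    return "\n".join(lines) + "\n"

def commit_id(body):
    """Object id: SHA-1 over 'commit <size>', a NUL byte, then the body."""
    data = body.encode()
    return hashlib.sha1(b"commit %d\x00" % len(data) + data).hexdigest()

root = commit_id(make_commit(EMPTY_TREE, [], "Initial shopping list"))
amended = commit_id(make_commit(EMPTY_TREE, [], "Initial shopping list (v2)"))

# The child embeds its parent's id, so rewriting the parent changes the child too.
child = commit_id(make_commit(EMPTY_TREE, [root], "Add Oranges"))
rebased_child = commit_id(make_commit(EMPTY_TREE, [amended], "Add Oranges"))

print(root != amended, child != rebased_child)
```

The original objects are untouched by all of this. They merely stop being referenced by the branch.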

M

Garbage collection

M

Garbage collection

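git gc removes unreachable objects - those not reachable from any reference (branch, tag, ...) or any reflog entry - once they are older than a grace period (2 weeks by default).

 

GC also packs loose objects (those stored as individual files under .git/objects) into packfiles (where they can also be stored as deltas, not only as snapshots!) and expires old reflog entries.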
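
The reachability part can be sketched as a mark phase over commits, using the short ids from the shopping-list example. The parent links of the intermediate rebase commits are not shown on the slides, so they are assumed to be roots here. The function name is hypothetical and the grace period is ignored:

```python
def garbage(parents, ref_tips, reflog_entries):
    """Commits reachable from neither a ref nor a reflog entry."""
    live, stack = set(), list(ref_tips) + list(reflog_entries)
    while stack:
        commit = stack.pop()
        if commit not in live:
            live.add(commit)
            stack.extend(parents[commit])
    return set(parents) - live

# State after 'git reset --hard 276d1b0' (parents of rebase commits assumed).
PARENTS = {
    "888bef1": [], "276d1b0": ["888bef1"],
    "5cbaa64": [], "255e373": [], "251a141": [], "fd86ee4": [],
}
REF_TIPS = ["276d1b0"]                      # master
REFLOG = ["276d1b0", "5cbaa64", "255e373", "251a141", "fd86ee4", "888bef1"]

print(garbage(PARENTS, REF_TIPS, REFLOG))   # nothing: the reflog keeps all alive
print(sorted(garbage(PARENTS, REF_TIPS, [])))  # once the reflog entries expire
```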

M

Questions

M

git machete
