What I learned from running git.kernel.org
by Konstantin Ryabitsev
Running git.kernel.org
- 300+ forks of the same repo
- Replicating to 3 geo-distributed servers
- Tweaking git-daemon to play nicely
- RAM, disk and processors
- Repack flags
- Repo and bundles
300 forks of git/torvalds/linux.git
Each linux.git is about 1.8GB
Forks are efficient... sorta
- git clone --local
  - uses hardlinks
  - safe to use everywhere
  - saves space until the next repack
- git clone --shared
  - sets up objects/info/alternates
  - can corrupt the clone if the source repo later drops objects the clone needs
  - saves lots of space
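To see which repositories depend on a "mother", you can read `objects/info/alternates` directly. Below is a minimal sketch in Python, the language grokmirror is written in. It assumes bare repositories. `read_alternates` is a hypothetical helper and is not part of git or grokmirror.

```python
import os


def read_alternates(git_dir):
    """Return absolute paths of the object stores a repo borrows from.

    Hypothetical helper: reads <git_dir>/objects/info/alternates, which
    holds one object-directory path per line. Relative entries are
    resolved against the objects directory, as git does. Assumes a bare
    repository layout.
    """
    objects_dir = os.path.join(git_dir, "objects")
    alt_file = os.path.join(objects_dir, "info", "alternates")
    if not os.path.isfile(alt_file):
        return []  # not a --shared clone: no borrowed objects
    paths = []
    with open(alt_file) as fh:
        for line in fh:
            entry = line.strip()
            # Skip blank lines and comment lines.
            if not entry or entry.startswith("#"):
                continue
            # os.path.join keeps absolute entries unchanged.
            paths.append(os.path.normpath(os.path.join(objects_dir, entry)))
    return paths
```

On a fork created with `git clone --shared`, the result would typically be a single path ending in the mother's `objects` directory. An empty list means the repo stands alone.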
git.kernel.org is 30GB
without alternates, 400GB
Using alternates carefully
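- Avoid object cleanups in the "mother" repository
  - "git repack -Ad" keeps unreachable objects (as loose objects) instead of deleting them
  - Do not run "git gc", just "git repack" (gc prunes unreachable objects that daughters may still need)
- Object cleanups in "daughter" repositories are OK
  - "git gc" is okay and encouraged
  - "git repack -adl" to save space (-l leaves out objects borrowed via alternates)
- Grandchildren (repos borrowing from a daughter, which gets gc'd) are a recipe for disaster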
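The rules above can be encoded as a small decision function. This is a hypothetical sketch and not grokmirror code. It takes a mapping from each repo to the repos it borrows from (for instance, built from `objects/info/alternates`). It then labels each repo as mother, daughter, or standalone and returns the cleanup command recommended for that role. It refuses grandchild chains.

```python
def plan_cleanup(alternates):
    """Pick a cleanup command for each repo from the alternates graph.

    Hypothetical helper. `alternates` maps a repo path to the list of
    repo paths it borrows objects from. Returns {repo: (role, argv)};
    run each argv with the repo as the working directory.
    Raises ValueError if any repo borrows from a repo that itself
    borrows (a grandchild).
    """
    borrowed_from = {m for mothers in alternates.values() for m in mothers}
    plan = {}
    for repo in sorted(set(alternates) | borrowed_from):
        mothers = alternates.get(repo, [])
        for mother in mothers:
            if alternates.get(mother):
                raise ValueError("grandchild: %s -> %s" % (repo, mother))
        if repo in borrowed_from:
            # Mother: never prune; -A -d turns unreachable objects into
            # loose objects instead of deleting them.
            plan[repo] = ("mother", ["git", "repack", "-A", "-d"])
        elif mothers:
            # Daughter: -l leaves borrowed objects out of its own pack.
            plan[repo] = ("daughter", ["git", "repack", "-a", "-d", "-l"])
        else:
            # Standalone repo: regular gc is fine.
            plan[repo] = ("standalone", ["git", "gc"])
    return plan
```

Daughters could just as well get a plain `git gc`, which the slide also encourages. The sketch picks `-adl` because it saves the most space.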
Grokmirror
replicating git repos sanely
(it can be done!)
Grokmirror highlights
- Works via git hooks
- Creates a manifest file that replicas can pull
- Updates only repos that changed
- Runs "git remote update" for changes
- Runs "git clone" for new repos
- Prunes junk
- Runs in parallel and is very efficient
- Will even fsck and repack for you
github.com/mricon/grokmirror
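At its core, the manifest-driven update loop is a diff between two manifests. This is a simplified, hypothetical model: each manifest is just a dict mapping a repo path to its last-modified timestamp. Real grokmirror manifests carry more fields, so treat this format as an assumption. `diff_manifests` is also a hypothetical name.

```python
def diff_manifests(local, remote):
    """Compare two {repo_path: last_modified} manifests.

    Hypothetical, simplified manifest format (an assumption, not
    grokmirror's real schema). Returns sorted lists
    (to_clone, to_update, to_prune).
    """
    to_clone = sorted(set(remote) - set(local))   # new upstream repos
    to_prune = sorted(set(local) - set(remote))   # gone upstream
    # Only repos whose upstream timestamp moved forward need fetching.
    to_update = sorted(
        repo for repo in set(remote) & set(local)
        if remote[repo] > local[repo]
    )
    return to_clone, to_update, to_prune
```

A replica would then run `git clone` for `to_clone` and `git remote update` for `to_update`, possibly in parallel (for example with `concurrent.futures.ThreadPoolExecutor`). It would then prune `to_prune` and save the remote manifest as its new local copy.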
Still adding features
- Automatically recognize nearly-identical repos
  - And set up alternates when warranted
- Recognize when running "git gc" is safe
- Add "--dissociate" clone support (new in git 2.3)
- Support both Python 2.7 and 3.5
- Other features upon request
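One possible way to recognize nearly-identical repos is sketched below. It is an assumption, not how grokmirror does it. The idea is to treat repos that share a root commit (as reported by `git rev-list --max-parents=0 HEAD`) as forks of one another. The largest one in each group then becomes the alternates "mother". The sketch takes precomputed root commits and object counts as input, so it runs without git. `group_forks` is hypothetical.

```python
from collections import defaultdict


def group_forks(roots, sizes):
    """Group repos that share a root commit and pick a mother for each.

    Hypothetical heuristic, not grokmirror's algorithm.
    roots: {repo: set of root commit ids}
    sizes: {repo: object count}; the largest repo becomes the mother.
    Returns a sorted list of (mother, [daughters]) for groups of 2+.
    """
    parent = {repo: repo for repo in roots}

    def find(repo):
        # Union-find lookup with path halving.
        while parent[repo] != repo:
            parent[repo] = parent[parent[repo]]
            repo = parent[repo]
        return repo

    owner = {}  # root commit id -> first repo seen with it
    for repo in sorted(roots):
        for commit in roots[repo]:
            if commit in owner:
                parent[find(repo)] = find(owner[commit])
            else:
                owner[commit] = repo

    groups = defaultdict(list)
    for repo in roots:
        groups[find(repo)].append(repo)

    result = []
    for members in groups.values():
        if len(members) < 2:
            continue
        # max() keeps the first maximum, so ties go to the first name.
        mother = max(sorted(members), key=lambda r: sizes[r])
        daughters = sorted(r for r in members if r != mother)
        result.append((mother, daughters))
    return sorted(result)
```

Sharing a root commit is a coarse signal. A real tool would likely also check how much history actually overlaps before setting up alternates.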
git-daemon tweaks
Because cloning linux.git eats up 1GB RAM
Hardware
- RAM, lots and lots of RAM
  - Git-daemon will eat all of it, and then some
- Use haproxy in front to combat abuse
- Don't use HTTP caches, as they will likely break repos after gc/repack
- Use fast disks with good seek times
  - Active repos will quickly create lots of loose objects all over the disk
- Repacking (deltas and compression) eats processing power
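Before paying the CPU cost of a repack, you can check whether an active repo has piled up enough loose objects to need one. One option is to borrow the estimate that `git gc --auto` uses. Because object names are evenly distributed, it counts the loose objects in a single fan-out directory (`objects/17`) and compares the count against `gc.auto` (default 6700) divided by 256. The sketch below mirrors that heuristic as we understand it. `needs_repack` is a hypothetical name, and the code assumes SHA-1 (40-hex) object names.

```python
import math
import os

HEX_DIGITS = set("0123456789abcdef")


def needs_repack(git_dir, gc_auto=6700):
    """Estimate whether a repo holds more than ~gc_auto loose objects.

    Hypothetical helper modeled on the sampling done by `git gc --auto`:
    count loose objects in objects/17 and compare with ceil(gc_auto/256).
    Assumes SHA-1 repos, where loose files in a fan-out directory are
    named by the remaining 38 hex digits.
    """
    if gc_auto <= 0:
        return False  # gc.auto=0 disables automatic gc in git
    sample_dir = os.path.join(git_dir, "objects", "17")
    if not os.path.isdir(sample_dir):
        return False
    threshold = math.ceil(gc_auto / 256)
    count = sum(
        1
        for name in os.listdir(sample_dir)
        if len(name) == 38 and set(name) <= HEX_DIGITS
    )
    return count > threshold
```

With the default of 6700, the threshold is 27. More than 27 loose objects in `objects/17` suggests roughly 7,000 or more across all 256 fan-out directories.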
Useful repack flags
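- -b --pack-kept-objects are your best friends
  - -b creates a bitmap index
  - cuts down on "counting objects" time when serving clones
  - --pack-kept-objects makes sure the bitmapped pack has all the objects it needs
  - Git 2.0+ only
- -f (recompute all deltas) only in limited cases (we don't use it)
- Will not repack refs (tags, etc.)
  - Use a separate "git pack-refs" command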
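Putting the repack flags together, a maintenance job might run one full repack that writes a bitmap and then pack the refs. This sketch rests on some assumptions. It pairs `-b --pack-kept-objects` with `-a -d` (or `-A -d` for a mother repo), since bitmaps need a repack covering all reachable objects. It uses `git pack-refs --all` for the refs. `build_repack_jobs` and `run_jobs` are hypothetical helpers.

```python
import subprocess


def build_repack_jobs(git_dir, keep_unreachable=False):
    """Return argv lists for an object repack followed by a ref pack.

    Hypothetical helper. Pass keep_unreachable=True for a "mother"
    repo so unreachable objects are loosened (-A) rather than dropped.
    """
    full = "-A" if keep_unreachable else "-a"
    return [
        # -b writes the bitmap index; it needs a full (-a/-A) repack.
        ["git", "-C", git_dir, "repack", full, "-d", "-b",
         "--pack-kept-objects"],
        # git repack does not touch refs, so pack them separately.
        ["git", "-C", git_dir, "pack-refs", "--all"],
    ]


def run_jobs(jobs):
    """Run each command in order, stopping at the first failure."""
    for argv in jobs:
        subprocess.run(argv, check=True)
```

Keeping command construction separate from execution makes it easy to log or dry-run the plan before pointing it at hundreds of repos.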
Repo and bundles
Android-specific
Repo
- A tool to keep track of hundreds of repos, using a manifest that lives in its own git repo
- Not for the same purposes as grokmirror (dev-oriented, not mirroring-oriented)
- Parallelizes clones and updates
  - (resulting in abuse of your servers)
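Repo's manifest is an XML file listing `<remote>`, `<default>`, and `<project>` elements. Below is a minimal sketch using the standard library's `xml.etree.ElementTree`. It shows how a mirror admin could see how many projects one `repo sync` will touch and how many parallel jobs the manifest requests through the `sync-j` attribute on `<default>`. The sample manifest and the function name are made up for illustration. Falling back to one job when `sync-j` is absent is our assumption, not repo's documented default.

```python
import xml.etree.ElementTree as ET


def summarize_manifest(xml_text):
    """Return (project names, requested parallel jobs) from a repo manifest."""
    root = ET.fromstring(xml_text)
    projects = [p.get("name") for p in root.findall("project")]
    default = root.find("default")
    # sync-j is optional; assume 1 job if it is absent (our assumption).
    jobs = int(default.get("sync-j", "1")) if default is not None else 1
    return projects, jobs


# Hypothetical sample manifest for illustration only.
SAMPLE = """<manifest>
  <remote name="origin" fetch="https://example.org/" />
  <default remote="origin" revision="master" sync-j="8" />
  <project name="platform/build" path="build" />
  <project name="kernel/common" path="kernel" />
</manifest>"""
```

Here `summarize_manifest(SAMPLE)` returns `(['platform/build', 'kernel/common'], 8)`. Multiply that job count by the number of developers syncing at once to estimate the concurrent load on your servers.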
Repo bundles
- Neat feature that can use lookaside "git bundle" packs to offload most traffic to an HTTP cache
- Can be placed on Akamai or other accelerators
- Only works with http:// clone URLs
- Can save you tons of expensive bandwidth if your project uses repo at all
- See "git-bundle(1)" and Google's "repo" tool
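The bundle trick works because a clone can be seeded from a static file and then topped up with a small incremental fetch. Below is a minimal client-side sketch. It assumes the bundle has already been downloaded from the HTTP cache; the server would produce it with something like `git bundle create linux.bundle --all`. This is a generic git workflow, not a description of how repo implements its bundle support. `bundle_first_clone` is hypothetical, and the paths and URL are placeholders.

```python
def bundle_first_clone(bundle_path, dest, upstream_url):
    """Return argv lists that seed a clone from a bundle, then fetch the rest.

    Hypothetical helper illustrating a generic git workflow.
    """
    return [
        # Most objects come from the bundle, served cheaply over HTTP.
        ["git", "clone", bundle_path, dest],
        # Point origin at the real server instead of the bundle file.
        ["git", "-C", dest, "remote", "set-url", "origin", upstream_url],
        # Only objects newer than the bundle come from the live server.
        ["git", "-C", dest, "fetch", "origin"],
    ]
```

Each command can be executed with `subprocess.run(argv, check=True)`. The fresher the bundle, the smaller that final fetch.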
Thank you!
Questions?