Options for Git clone of a big repository

Cloning the Linux kernel repository takes time. Our work does not need every commit ever made, but it does need multiple branches. Here are timings for several clone options.

Bottom line up front: we want the blobless clone for our use case:

time git clone --filter=blob:none git@gitlab.com:AmpereComputing/linux/linux.git

Here’s how I came to that decision. First, the full clone:

time git clone git@gitlab.com:AmpereComputing/linux/linux.git
Cloning into 'linux'...

Updating files: 100% (81108/81108), done.

real 7m56.985s
user 22m41.394s
sys 6m26.205s

Now the shallow clone. The single branch it fetches is actually the wrong one for us, but there should be significant overlap.

time git clone --depth=1 git@gitlab.com:AmpereComputing/linux/linux.git
Cloning into 'linux'...
...
Updating files: 100% (81108/81108), done.

real	0m55.075s
user	0m27.710s
sys	0m7.393s

Now a blobless clone:

time git clone --filter=blob:none git@gitlab.com:AmpereComputing/linux/linux.git
Cloning into 'linux'...

Updating files: 100% (81108/81108), done.

real 3m43.477s
user 5m37.896s
sys 2m35.509s

Now a treeless clone:

time git clone --filter=tree:0 git@gitlab.com:AmpereComputing/linux/linux.git
Cloning into 'linux'...

Updating files: 100% (81108/81108), done.

real 1m53.469s
user 1m5.809s
sys 1m1.048s

Combining treeless and shallow?

time git clone --depth=1 --filter=tree:0 git@gitlab.com:AmpereComputing/linux/linux.git
Cloning into 'linux'...
...
Updating files: 100% (81108/81108), done.

real	1m11.402s
user	0m31.235s
sys	0m6.813s

What about a shallow clone that excludes the history reachable from a certain tag?

time git clone --shallow-exclude=v6.10 git@gitlab.com:AmpereComputing/linux/linux.git
Cloning into 'linux'...
...
Updating files: 100% (81108/81108), done.

real	0m56.481s
user	0m27.536s
sys	0m7.306s

We have a long build process that cherry-picks extensively from topic branches. It starts with a git clone of the Linux kernel, and I want to see how the build time differs across the cloning options.

Because the shallow clone truncates the commit history (and --depth=1 fetches only a single branch tip), we cannot use it to do the cherry-picking, so I will restrict my testing to the full clone, blob-less, and tree-less variants.
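What a --depth=1 clone actually contains can be checked on a throwaway repository. This is a minimal sketch, not part of our build: the repository, file name, and commit messages are invented for illustration, and it assumes a Git new enough to have `rev-parse --is-shallow-repository`.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Throwaway source repository with three commits (illustrative only).
git init -q src
for n in 1 2 3; do
    echo "version $n" > src/file.txt
    git -C src add file.txt
    git -C src -c user.name=demo -c user.email=demo@example.com \
        commit -q -m "commit $n"
done

# --depth is ignored for plain local paths, so clone through a file:// URL.
git clone -q --depth=1 "file://$work/src" shallow

# Count the commits reachable from HEAD and ask Git whether the clone is shallow.
shallow_commits=$(git -C shallow rev-list --count HEAD)
is_shallow=$(git -C shallow rev-parse --is-shallow-repository)
echo "commits in shallow clone: $shallow_commits (shallow: $is_shallow)"
```

Only the tip commit is present locally, so any cherry-pick that refers to older or unrelated commits has nothing to work from until the clone is deepened.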

Full clone:

real	61m17.366s
user	507m48.236s
sys	171m28.766s

Tree-less:

real	69m8.975s
user	492m34.034s
sys	157m14.231s

Treeless was actually slower. Here is blob-less:

real	52m9.143s
user	486m23.953s
sys	163m25.927s
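The three `real` times for the build can be compared directly. A minimal sketch: `to_seconds` is a hypothetical helper name (it is not part of `time` or Git) that converts the minutes-and-seconds format into plain seconds so the runs can be subtracted.

```shell
#!/bin/sh
# to_seconds: hypothetical helper that converts a `time`-style value such as
# "61m17.366s" into seconds with millisecond precision.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Wall-clock ("real") build times from the three runs.
full=$(to_seconds 61m17.366s)
treeless=$(to_seconds 69m8.975s)
blobless=$(to_seconds 52m9.143s)

# Difference of each partial clone against the full clone, in seconds.
awk -v f="$full" -v t="$treeless" -v b="$blobless" 'BEGIN {
    printf "tree-less minus full: %.3f s\n", t - f
    printf "blob-less minus full: %.3f s\n", b - f
}'
```

A negative value means the run finished sooner than the full clone.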

Nine minutes faster than the full clone. This makes sense: once the base tree is synchronized, the only additional blobs it needs to fetch are the ones specific to each topic branch, which limits the additional communication to one stream of blobs per topic branch. It does not need to fetch the older blobs, which I assume account for the additional cost of the full clone, but it already has the tree information it needs to perform the cherry-pick metadata operations.
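The idea that a blob-less clone keeps every commit and tree but defers older blobs can be checked on a throwaway repository. This is a minimal sketch, assuming a Git recent enough to support partial clone and `rev-list --missing=print`. The repository, file name, and commit messages are invented, and the two `uploadpack.*` settings are enabled on the local source because, as far as I know, a plain local repository does not serve filtered clones by default.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Throwaway source repository with three commits, each changing one file.
git init -q src
git -C src config uploadpack.allowFilter true
# May only matter for on-demand blob fetches with older protocol versions.
git -C src config uploadpack.allowAnySHA1InWant true
for n in 1 2 3; do
    echo "version $n" > src/file.txt
    git -C src add file.txt
    git -C src -c user.name=demo -c user.email=demo@example.com \
        commit -q -m "commit $n"
done

# Blob-less clone through a file:// URL so the filter is negotiated.
git clone -q --filter=blob:none "file://$work/src" blobless

# All commits are present locally...
commits=$(git -C blobless rev-list --count HEAD)
# ...but objects not yet fetched are listed with a leading "?".
missing=$(git -C blobless rev-list --objects --missing=print HEAD | grep -c '^?')
echo "commits: $commits, objects not yet fetched: $missing"
```

The full history is available for cherry-pick bookkeeping, while file contents that no checkout has needed yet stay on the server.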

For our purposes, we are going to go with the blob-less option.
