Question

What does "Filtering content" mean when doing a git clone?

I cloned a git repo and noticed a status line, "Filtering content", which was very slow. This doesn't usually appear. What is it?

remote: Enumerating objects: 30, done.
remote: Counting objects: 100% (30/30), done.
remote: Compressing objects: 100% (26/26), done.
remote: Total 16592 (delta 6), reused 9 (delta 4), pack-reused 16562
Receiving objects: 100% (16592/16592), 14.14 MiB | 1.01 MiB/s, done.
Resolving deltas: 100% (7529/7529), done.
Checking out files: 100% (11475/11475), done.
Filtering content:   6% (115/1729), 390.32 MiB | 1.12 MiB/s

Solution

In git you can define "filters" that affect the process of moving files from the index to the work tree ("smudge" filters) and from the work tree to the index ("clean" filters). Typically you'll find a .gitattributes file that associates the filters with files at specific paths.
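
To make the two directions concrete, here is a minimal sketch of a filter pair in Python. It is only an illustration, not anything git ships: the `$Stamp$` keyword, the `STAMP` value, the driver name `stamp`, and the script name `stamp_filter.py` are all made up for the example. Git runs a filter command with the file content on stdin and takes the result from stdout.

```python
import re
import sys

# Hypothetical wiring (names are assumptions for this sketch):
#   .gitattributes:   *.txt filter=stamp
#   git config filter.stamp.smudge "python3 stamp_filter.py smudge"
#   git config filter.stamp.clean  "python3 stamp_filter.py clean"

STAMP = "checked-out"  # made-up value; a real filter might insert a date


def stamp_smudge(text):
    """Index -> work tree: expand the bare $Stamp$ keyword."""
    return text.replace("$Stamp$", "$Stamp: " + STAMP + "$")


def stamp_clean(text):
    """Work tree -> index: collapse an expanded keyword back to $Stamp$."""
    return re.sub(r"\$Stamp:[^$]*\$", "$Stamp$", text)


FILTERS = {"smudge": stamp_smudge, "clean": stamp_clean}

# Only act as a filter when called with exactly one known mode argument.
if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in FILTERS:
    sys.stdout.write(FILTERS[sys.argv[1]](sys.stdin.read()))
```

The important property is that clean undoes smudge, so the content stored in the index does not change just because the file was checked out.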

It used to be that this was always handled file by file during checkout or add operations. It can be more efficient to handle all of the "smudge" filters for a checkout in a batch, and git added support for that relatively recently.

The use case that (I believe) drove that addition is Git LFS (Large File Storage). With LFS, large content is stored in a secondary store, with small placeholders ("pointer files") replacing it in the core repo. The "smudge" filter downloads the real content and puts it in place of the pointer file. This is most likely what your repo is doing, and it can be a lengthy process.
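
The pointer-file idea can be sketched in a few lines of Python. This is a toy model, not the LFS implementation: the in-memory `STORE` dict stands in for the secondary store (a real setup downloads from a server), and the function names are invented for the example. The pointer text is modeled on the three-line format LFS pointer files use.

```python
import hashlib

# Stand-in for the secondary store; an assumption made for this sketch.
STORE = {}


def pointer_clean(content):
    """Work tree -> index: stash the real bytes, return a small pointer."""
    oid = hashlib.sha256(content).hexdigest()
    STORE[oid] = content
    pointer = (
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:" + oid + "\n"
        "size " + str(len(content)) + "\n"
    )
    return pointer.encode()


def pointer_smudge(pointer):
    """Index -> work tree: look up the real bytes named by the pointer."""
    fields = dict(line.split(" ", 1) for line in pointer.decode().splitlines())
    oid = fields["oid"].split(":", 1)[1]
    return STORE[oid]  # in real LFS this step is the slow download
```

In the clone from the question, the 14 MiB received up front would be the repo with its pointers, and the hundreds of MiB counted on the "Filtering content" line would be the real files fetched by the smudge step.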

In general, though, the "Filtering content" status line just means that a batch of smudge filters is being run on the checked-out content.

2018-11-15