AaronCrane.co.uk

Tracked-but-uncommitted files with Git

Something I find awkward about Git is that it doesn’t seem to deal with the concept of a tracked but uncommitted file — that is, the situation you’d get into with CVS after running cvs add on a new file, but before committing that file to the central repository.

The difference between cvs add and git add is that, where CVS adds the file to the set of tracked files whose changes can be committed, Git adds the file’s current contents to the index of changes that are pending being committed.

I’m pretty sure that Git doesn’t have the ability to do the CVS-like thing; my understanding of the internals suggests strongly that the ability to do it can’t be directly supported, and even if I’m wrong, I can’t find anything that would suggest how you’d go about doing it.

I stipulate that my desire to do this may well be influenced by my experience with CVS, but it’s nonetheless something I believe is useful. With off-the-shelf Git tools, your options are:

  1. Don’t do the git add until you’re almost ready to commit. The problem with that is that when you run git diff, you don’t see the addition of the file you haven’t yet added.

  2. Run the git add while you’re still working on your change. Now git diff, which compares the index with your working tree, shows nothing at all for your new file, because the index already holds its current contents. Then you hack a bit further, and git diff shows only the changes you’ve made to the new file since the git add, not the whole thing.

Neither of those is what I want.
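
To make the two behaviours concrete, here is a small sketch that replays both options in a throwaway repository. The scratch directory from mktemp, the file contents and the variable names are only illustrative assumptions; nothing here is specific to any real project.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a scratch repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q .
printf 'int main(void) { return 0; }\n' > new-file.c

# Option 1: the file has not been added, so git diff ignores it entirely.
diff_untracked=$(git diff)

# Option 2: right after git add, the index matches the working tree,
# so git diff is still empty for the new file...
git add new-file.c
diff_after_add=$(git diff)

# ...and after further hacking it shows only the later edit.
printf '/* more hacking */\n' >> new-file.c
diff_after_edit=$(git diff)

printf 'untracked: [%s]\nafter add: [%s]\nafter edit:\n%s\n' \
    "$diff_untracked" "$diff_after_add" "$diff_after_edit"
```

In both cases the original contents of new-file.c never appear as added lines in git diff, which is exactly the annoyance described above.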

I have a workaround for this.

First, the user interface: when I run a command such as git track new-file.c, it should tell Git to track new-file.c for changes, but without adding its data to the index.

Second, the sneaky trick. What I want can’t quite be done, but you can get very close. Given that my problem with using git add is what it does to your diffs, the technique is merely to add an empty file to the index under the new name. The new file is now in the index, so git diff will report on it. But the version in the index has no data, so git diff will always report every line in the file as an addition. That’s good enough for me.

This trick doesn’t quite do the right thing; specifically, we’ve now put data into the index that we know is wrong, and that has a reasonable chance of coming back to bite us in the future. In practice, it’s working for me, in the ways I’ve been using Git, but it seems worth pointing out the issue.
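
For comparison, newer versions of Git ship an --intent-to-add (-N) option to git add, which records a path in the index without staging its content. This is not the technique built in the rest of this post, and whether your Git has the option is an assumption you should check; the sketch below only shows that it produces the same kind of diff in a scratch repository.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch repository and file names are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q .
printf 'line one\nline two\n' > new-file.c

# Record only the intention to add the file; no content is staged.
git add --intent-to-add new-file.c

# The whole file now shows up in git diff as added lines.
diff_output=$(git diff)
printf '%s\n' "$diff_output"
```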

Third, the implementation. This version is in Bash; it’s very simple, but adequate for most purposes. We need the Git hash of an empty file. In principle, it would be trivial to just embed the relevant 40-character hex string into the program as a constant. But if your history contains no empty files, then you’d be liable to get spurious errors later on, saying

unable to find e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

So instead, call git hash-object and ask it to write an empty file to the object database:

sha1=$(git hash-object -w --stdin < /dev/null)
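
A quick way to convince yourself that this really stores an empty blob is to ask Git about the object afterwards. The sketch below does so in a scratch repository; the obj_type and obj_size variable names are made up for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Hash zero bytes of input and, because of -w, store the resulting blob.
# In a SHA-1 repository this is the e69de29... name quoted in the error above.
sha1=$(git hash-object -w --stdin < /dev/null)

# The object now exists in this repository: a blob of size 0.
obj_type=$(git cat-file -t "$sha1")
obj_size=$(git cat-file -s "$sha1")
printf '%s is a %s of %s bytes\n' "$sha1" "$obj_type" "$obj_size"
```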

With that in hand, simply loop over all the filenames passed as arguments, massage them appropriately, and send them to git update-index:

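# Emit one "mode hash<TAB>path" record per file for git update-index.
for file; do
    # Keep the executable bit; anything else is treated as a regular file.
    mode=$(if [ -x "$file" ]; then echo 755; else echo 644; fi)
    # printf, unlike echo -e, leaves backslashes in file names alone.
    printf '100%s %s\t%s\n' "$mode" "$sha1" "$file"
done |
git update-index --index-info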
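
Putting the two pieces together, a minimal end-to-end sketch might look like the following. The track function is a hypothetical wrapper around the steps above, and the scratch repository exists only so that the effect on the index and on git diff can be seen.

```shell
#!/usr/bin/env bash
set -euo pipefail

# track: hypothetical helper wrapping the steps described above.
track() {
    local sha1 file mode
    sha1=$(git hash-object -w --stdin < /dev/null)
    for file; do
        if [ -x "$file" ]; then mode=755; else mode=644; fi
        printf '100%s %s\t%s\n' "$mode" "$sha1" "$file"
    done |
    git update-index --index-info
}

# Demo in a scratch repository.
repo=$(mktemp -d)
cd "$repo"
git init -q .
printf 'line one\nline two\n' > new-file.c

track new-file.c

# The index entry points at the empty blob...
staged=$(git ls-files --stage new-file.c)
# ...so git diff reports every line of the file as an addition.
diff_output=$(git diff)
printf '%s\n%s\n' "$staged" "$diff_output"
```

Note that this sketch, like the loop above, assumes it is run from the top of the working tree and that every argument is a plain file.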

Fourth, integration with the rest of Git. Dropping that script into your ~/bin as git-track allows you to run git-track new-file.c. That’s almost perfect, but note that the command name contains a hyphen; we want the two-word version git track instead. Fortunately, that’s very easy to do: just add this to your ~/.gitconfig:

[alias]
    track = !git-track

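Assuming you have a git-track in your $PATH, you can now do git track, just as if this new program shipped with Git. (Git itself also looks on your $PATH for an executable named git-track when asked to run git track, so this may well work even without the alias.)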
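
If you would rather not edit ~/.gitconfig by hand, the same alias can be set with git config. The sketch below points HOME at a throwaway directory purely so that the demonstration leaves your real configuration alone; in normal use you would run only the git config --global line.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Demo only: use a throwaway HOME so the real ~/.gitconfig is untouched.
export HOME=$(mktemp -d)
unset XDG_CONFIG_HOME

# Equivalent to adding "track = !git-track" under [alias] by hand.
git config --global alias.track '!git-track'

# Read the value back to confirm what was written.
alias_value=$(git config --global --get alias.track)
printf 'alias.track = %s\n' "$alias_value"
```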

Finally, an improved implementation. It would be nice for git track to accept directory names in the same way as git add. That’s not too hard — just get git ls-files to do the heavy lifting:

git ls-files -o --exclude-per-directory=.gitignore \
     --no-empty-directory "$@"

The only problem you run into is that, if you want to handle file names containing spaces and/or newlines, you have to jump through all the usual shell hoops to avoid accidental word splitting. My normal approach in such situations is to use a real programming language. So here’s a simple Perl implementation of git track which takes both files and directories as command-line arguments.
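
For readers who would rather stay in Bash, here is a rough sketch of the same idea using NUL-separated paths, so that spaces and newlines in file names survive. It is not the Perl implementation mentioned above; the track function name is hypothetical, and the sketch assumes it is run from the top of the working tree and does not handle symlinks specially.

```shell
#!/usr/bin/env bash
set -euo pipefail

# track: hypothetical helper accepting files and directories.
# Assumes the current directory is the top of the working tree.
track() {
    local sha1 file mode
    sha1=$(git hash-object -w --stdin < /dev/null)
    # -z makes ls-files separate paths with NUL instead of newline.
    git ls-files -z -o --exclude-per-directory=.gitignore \
        --no-empty-directory -- "$@" |
    while IFS= read -r -d '' file; do
        if [ -x "$file" ]; then mode=755; else mode=644; fi
        # Emit NUL-terminated records to match update-index -z.
        printf '100%s %s\t%s\0' "$mode" "$sha1" "$file"
    done |
    git update-index -z --index-info
}

# Demo in a scratch repository with a space in the names.
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir 'src dir'
printf 'int x;\n' > 'src dir/a file.c'
printf 'int y;\n' > 'src dir/b.c'

track 'src dir'

tracked=$(git ls-files)
diff_output=$(git diff)
printf '%s\n' "$tracked"
```

The design choice here is to let git ls-files expand directories and apply the ignore rules, and to keep every path NUL-delimited from ls-files through to update-index so the shell never gets a chance to split it.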

Share and enjoy!