Published at 22:36, Sun 27 Apr 2008
Something I find awkward about Git is that it doesn’t seem to deal
with the concept of a tracked but uncommitted file — that is, the situation
you’d get into with CVS after running cvs add on a new file, but
before committing that file to the central repository.
The difference between cvs add and git add is that, where CVS adds the
file to the set of tracked files whose changes can be committed, Git adds
the file’s current contents to the index of changes that are pending being
committed.
I’m pretty sure that Git doesn’t have the ability to do the CVS-like thing; my understanding of the internals suggests strongly that the ability to do it can’t be directly supported, and even if I’m wrong, I can’t find anything that would suggest how you’d go about doing it.
I stipulate that my desire to do this may well be influenced by my experience with CVS, but it’s nonetheless something I believe is useful. With off-the-shelf Git tools, your options are:
Don’t do the git add until you’re almost ready to commit. The problem
with that is that when you run git diff, you don’t see the addition of
the file you haven’t yet added.
Run the git add while you’re still working on your change. Now git
diff shows you the diff between the index and your working tree, which
leaves out your new file entirely, because the index already holds its
current contents. Then you hack a bit further, and now git diff shows
you some changes to your new file, but not the whole thing.
Neither of those is what I want.
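To make the two options concrete, here is a minimal sketch that replays them in a throwaway repository. The function names (demo_plain_add, count_added) and the file contents are made up for illustration; only git init, git add and git diff are real commands.

```shell
# Sketch: replay both options in a scratch repository.
# demo_plain_add and count_added are hypothetical helper names.

# Count the lines git diff reports as additions ("+" lines, skipping "+++").
count_added() {
    git diff | grep -c '^+[^+]'
}

# Runs in a subshell so the caller's working directory is left alone.
demo_plain_add() (
    cd "$(mktemp -d)" || exit 1
    git init -q . >/dev/null 2>&1
    printf 'line one\nline two\n' > new-file.c

    # Option 1: the file is untracked, so git diff does not mention it.
    echo "untracked: $(count_added) added lines"

    # Option 2: after git add, the index matches the working tree.
    git add new-file.c
    echo "after add: $(count_added) added lines"

    # Later edits show up, but only relative to the added snapshot.
    printf 'line three\n' >> new-file.c
    echo "after edit: $(count_added) added lines"
)
```

Running demo_plain_add prints one count per stage; in neither option does git diff ever report the whole of new-file.c as new.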
I have a workaround for this.
First, the user interface: when I run a command git track new-file.c, that
should tell Git to track new-file.c for changes, but without adding its
data to the index.
Second, the sneaky trick. It can’t quite be done, but you can get very
close. Given that my problem with using git add is what it does to your
diffs, the technique is merely to add the empty file to the index under
the new name. The new file is now in the index, so git diff will report
on it. But the version in the index has no data, so git diff will always
report every line in the file as an addition. That’s good enough for me.
This trick doesn’t quite do the right thing; specifically, we’ve now put data into the index that we know is wrong, and that has a reasonable chance of coming back to bite us in the future. In practice, it’s working for me, in the ways I’ve been using Git, but it seems worth pointing out the issue.
Third, the implementation. This version is in Bash; it’s very simple, but adequate for most purposes. We need the Git hash of an empty file. In principle, it would be trivial to just embed the relevant 40-character hex string into the program as a constant. But if your repository’s object database contains no empty file, then you’d be liable to get spurious errors later on, saying
unable to find e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
So instead, call git hash-object and ask it to write an empty file to the
object database:
sha1=$(git hash-object -w --stdin < /dev/null)
With that in hand, simply loop over all the filenames passed as arguments,
massage them appropriately, and send them to git update-index:
for file; do
    mode=$(if [ -x "$file" ]; then echo 755; else echo 644; fi)
    echo -e "100$mode $sha1\t$file"
done |
git update-index --index-info
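Put together, the two fragments above might look like the following sketch. Wrapping them in a function called git_track is my own choice here, made so the pieces can be tried out before installing anything, and printf stands in for echo -e. As far as I know, --index-info takes each path exactly as written, relative to the top of the repository, so the sketch assumes it is run from there.

```shell
# Sketch of the whole script. git_track is a hypothetical function name.
# Assumption: run from the top of the working tree, since the paths are
# handed to --index-info unchanged.
git_track() {
    local sha1 file mode

    # Write the empty blob into the object database and capture its hash.
    sha1=$(git hash-object -w --stdin < /dev/null) || return 1

    for file; do
        # Record the executable bit the same way Git does.
        if [ -x "$file" ]; then mode=755; else mode=644; fi
        # --index-info reads lines of the form: <mode> SP <sha1> TAB <path>
        printf '100%s %s\t%s\n' "$mode" "$sha1" "$file"
    done |
    git update-index --index-info
}
```

After git_track new-file.c, git diff reports every line of new-file.c as an addition, and a later git add new-file.c replaces the empty entry with the real contents. To use this as a standalone script, end the file with a call to git_track "$@".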
Fourth, integration with the rest of Git. Dropping that script into your
~/bin as git-track allows you to run git-track new-file.c. That’s
almost perfect, but note that the command name contains a hyphen; we want
the two-word version git track instead. Fortunately, that’s very easy to
do: just add this to your ~/.gitconfig:
[alias]
	track = !git-track
Assuming you have a git-track in your $PATH, you can now do git track,
just as if this new program shipped with Git.
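If you would rather not edit ~/.gitconfig by hand, the same alias can be set from the command line. This is a small sketch; add_track_alias is a made-up helper name, and git config --global is the only real command involved.

```shell
# Sketch: register the alias without opening an editor.
# add_track_alias is a hypothetical helper name.
add_track_alias() {
    # The leading "!" tells Git to run the alias as a shell command.
    git config --global alias.track '!git-track'
}
```

Calling add_track_alias once should leave the same two lines in ~/.gitconfig as the snippet above.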
Finally, an improved implementation. It would be nice for git track to
accept directory names in the same way as git add. That’s not too hard —
just get git ls-files to do the heavy lifting:
git ls-files -o --exclude-per-directory=.gitignore \
    --no-empty-directory "$@"
The only problem you run into is that, if you want to handle file names
containing spaces and/or newlines, you have to jump through all the usual
shell hoops to avoid accidental word splitting. My normal approach in
such situations is to use a real programming language. So here’s a
simple Perl implementation of git track which takes both
files and directories as command-line arguments.
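For comparison, here is roughly what the shell hoops look like. This is a sketch of my own, not the Perl implementation mentioned above: it keeps every file name NUL-terminated from git ls-files through to git update-index, using the -z flag on both and xargs -0 in between. The name git_track_tree is hypothetical, xargs -0 is a common extension rather than strict POSIX, and the sketch assumes it is run from the top of the working tree.

```shell
# Sketch: directory-aware tracking with NUL-separated file names.
# git_track_tree is a hypothetical function name.
# Assumption: run from the top of the working tree.
git_track_tree() {
    local sha1

    # Write the empty blob into the object database and capture its hash.
    sha1=$(git hash-object -w --stdin < /dev/null) || return 1

    # List untracked files under the given paths, NUL-terminated.
    git ls-files -o -z --exclude-per-directory=.gitignore \
        --no-empty-directory -- "$@" |
    # Hand the names to a small inline script as positional arguments,
    # so spaces and newlines in them never get split.
    xargs -0 sh -c '
        sha1=$1; shift
        for file; do
            if [ -x "$file" ]; then mode=755; else mode=644; fi
            # NUL-terminated record: <mode> SP <sha1> TAB <path>
            printf "100%s %s\t%s\0" "$mode" "$sha1" "$file"
        done
    ' sh "$sha1" |
    # -z must precede --index-info so the records are read NUL-terminated.
    git update-index -z --index-info
}
```

Something like git_track_tree "my dir" would then register every untracked, non-ignored file under that directory with empty contents.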
Share and enjoy!