The runN program

Nearly a year ago, Mark Jason Dominus blogged about runN, a program he’d written. I’ve stol– uh, I mean, adapted his version for my own ends, with a couple of differences. This is a rationale for my changes.

Parallel execution

The idea of runN is that you give it a command and a list of arguments, and it runs the command on each of the arguments. In particular, it runs the commands in parallel, up to the maximum number of jobs set by a -n option. (Actually, my version uses -j instead, by analogy with the equivalent feature in Make, but that’s not an interesting difference.) MJD comments:

In the original implementation, the -n is mandatory, because I couldn’t immediately think of a reasonable default. The only obvious choice is 1, but since the point of the program was to run programs concurrently, 1 is not reasonable. But it occurs to me now that if I let -n default to 1, then this command would replace many of my current invocations of:

for i in ...; do cmd $i; done

which I do quite a lot. Typing runN cmd ... would be a lot quicker and easier.

My version of runN does indeed let the job limit default to 1 (spelled -j rather than -n, as noted above), and I find that the overwhelming majority of my uses of it are of that sort: simplifying a for loop around a command that can only take a single argument.
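To make the "up to n jobs at once, defaulting to 1" behaviour concrete, here is a minimal Python sketch. It is not the source of either version of runN; the function name run_each and its parameters are made up for illustration, and it assumes the command is given as a list of words.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_each(command, args, jobs=1):
    """Run `command` once per argument, with at most `jobs` running at a time.

    `command` is a list of words (hypothetical interface); the exit statuses
    are returned in the same order as `args`.
    """
    def run_one(arg):
        # Each argument becomes exactly one argv entry; no shell is involved,
        # so whitespace inside an argument is preserved.
        return subprocess.run([*command, arg]).returncode

    # The pool size caps concurrency. With jobs=1 the commands run one after
    # another, just like the for loop this is meant to replace.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run_one, args))
```

With jobs left at 1, run_each(["cmd"], items) does the same work as for i in ...; do cmd $i; done, except that each item is passed as a single argument rather than being re-split by the shell.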

In fact, I haven’t found myself wanting to use the “only n at a time” functionality at all. And on the few occasions when I’ve wanted the jobs to be run in parallel, I’ve known that there are sufficiently few arguments that running all of them in parallel is the right thing. I’m considering adding a feature to do that.

One possible UI for a parallel-everything mode would be to extend the meaning of -n so that asking for 0 parallel jobs (which would otherwise be meaningless) runs all jobs in parallel. But it occurs to me that, given I’m not really using the job limit at all, I might be better off dropping it: runN would always run serially unless a single boolean option asked for everything to run in parallel.
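The two designs differ only in how the user's request is turned into a concurrency limit. The following Python sketch shows that translation under stated assumptions: both function names are hypothetical, and neither design has actually been implemented in the version described here.

```python
def limit_from_count(jobs, n_args):
    """Design 1: a numeric option where 0 means "run everything at once"."""
    if jobs < 0:
        raise ValueError("job count must not be negative")
    # A limit of 0 expands to one job per argument (at least 1, so that an
    # empty argument list still yields a usable pool size).
    return max(n_args, 1) if jobs == 0 else jobs


def limit_from_flag(parallel, n_args):
    """Design 2: serial by default, with one boolean switch for full parallelism."""
    return max(n_args, 1) if parallel else 1
```

The second design has fewer states to explain, at the cost of losing the "only n at a time" middle ground.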

Argument splitting

A perhaps-surprising feature of MJD’s runN is that each argument is split on whitespace before being passed to the command. His motivation for this was to allow things like this:

runN -n 1 -c ls foo bar '-l baz'

to mean the same as this:

ls foo
ls bar
ls -l baz

But, as he points out, that means you can’t use runN for arguments that contain spaces. He muses about perhaps having runN parse quoting constructs in the arguments, so that this:

runN -n 1 -c ls foo bar '"file with spaces"'

would do the same as this:

ls foo
ls bar
ls "file with spaces"

He explains his decision not to do that by saying:

I wasn’t sure I’d actually need it — only time will tell — and […] shell parsing is very complicated and error-prone

I fully agree with his reasoning there. In fact, I’d go further: the word-splitting feature itself is in the “you probably don’t need this” category. It does sound handy — run some command on each of these, except use this option for this one. But perhaps not handy enough: I omitted that feature from my version, and it doesn’t seem to be something I’ve felt the lack of. By contrast, the ability to use whitespace-containing arguments without doing anything special is something I value much more highly.

I’ve considered adding an option to enable splitting arguments on whitespace, but I haven’t felt the need for it. I think that’s more evidence that word splitting probably isn’t worth it.
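The three behaviours discussed in this section can be compared side by side. This Python sketch is illustrative only: the function names are invented, and shlex.split stands in for "parse quoting constructs", which is an assumption about what such a feature might look like rather than anything either version does.

```python
import shlex


def argv_whitespace(command, arg):
    """Split the argument on runs of whitespace (the behaviour described for MJD's runN)."""
    return [command, *arg.split()]


def argv_shell_quoted(command, arg):
    """Honour shell-style quoting inside the argument (the idea MJD decided against)."""
    return [command, *shlex.split(arg)]


def argv_verbatim(command, arg):
    """Pass the argument through untouched (the behaviour of the version described here)."""
    return [command, arg]
```

For the argument '-l baz', argv_whitespace gives ['ls', '-l', 'baz'] while argv_verbatim gives ['ls', '-l baz']. For 'file with spaces', argv_whitespace breaks it into three separate words, argv_verbatim keeps it whole, and argv_shell_quoted keeps it whole only if the caller wraps it in an extra layer of quotes.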

Command options

The most significant point of difference between my current version and MJD’s original is in the handling of the command to run. MJD’s version has you say things like

runN -c ls foo bar

to run ls on each of foo and bar. I think there are two flaws with that design. One is simply that it’s irritating to have a required option: since there’s always one command, why not just take it as the first non-option argument, and avoid having to type the -c on each invocation?

The other, more substantive flaw is that you can’t specify any additional arguments to be used in all the command executions. For example, to unpack a tarball, you need to use a command like this:

tar -xzf foo.tar.gz

The -xzf options are required, and tar accepts only one tarball per invocation. But that means that MJD’s runN can’t straightforwardly be used to unpack several tarballs — there’s nowhere to put the -xzf.

My approach for dealing with this is to tweak the behaviour of the conventional double-hyphen (--) argument. Normally, the double hyphen is used to signal the end of the options to a command, so that you can have non-option arguments that happen to begin with a hyphen. In my version of runN, the double hyphen signals the end of the ‘fixed’ arguments to the command; everything after it is one of the arguments to iterate over. So unpacking several tarballs can be done like this:

my-runN tar -xzf -- *.tar.gz

However, to simplify the common case, if no double hyphen is found in the arguments, then just the first argument is used for the command name. So you can unpack several ZIP archives like this:

my-runN unzip *.zip

If you prefer, you can use the long form anyway:

my-runN unzip -- *.zip

This approach is still imperfect, though, because it’s now impossible to pass a real double hyphen to the command executed. I don’t have a good answer to that at the moment, save to point out that it’s very hard to write general-purpose wrapper programs that know nothing about the program being wrapped.
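The double-hyphen rule is short enough to sketch in a few lines of Python. This is a reconstruction from the description above, not the real code; the name split_command is hypothetical, and splitting at the first double hyphen (rather than, say, the last) is an assumption.

```python
def split_command(argv):
    """Separate the fixed command words from the arguments to iterate over.

    Returns a (fixed, items) pair of lists.
    """
    if not argv:
        raise ValueError("no command given")
    if "--" in argv:
        # Everything before the first "--" is the command plus its fixed
        # arguments; everything after it is iterated over.
        cut = argv.index("--")
        fixed, items = argv[:cut], argv[cut + 1:]
        if not fixed:
            raise ValueError("no command given before '--'")
        return fixed, items
    # No "--" at all: only the first word is the command.
    return argv[:1], argv[1:]
```

Here split_command(['tar', '-xzf', '--', 'a.tar.gz', 'b.tar.gz']) yields (['tar', '-xzf'], ['a.tar.gz', 'b.tar.gz']), and split_command(['unzip', 'a.zip', 'b.zip']) yields (['unzip'], ['a.zip', 'b.zip']). The limitation is visible too: a double hyphen can never end up among the fixed words.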

That’s pretty much it. You’re welcome to download and use my version, and I’d welcome any comments on my changes. I’d also like to express my gratitude to Mark for his original runN; I’ve been finding it very useful.

Update: Thanks to Mark James and Tim Massingham for independently pointing out a bug in my version of runN (not in MJD’s version) that ‘lost’ some arguments when running jobs in parallel. That bug is now fixed.