Nearly a year ago, Mark Jason Dominus blogged about
runN, a program
he’d written. I’ve stol– uh, I mean, adapted his version for my own ends,
with a couple of differences. This is a rationale for my changes.
The idea of
runN is that you give it a command and a list of arguments, and
it runs the command on each of the arguments. In particular, it runs the
commands in parallel, up to the maximum number of jobs set by a -n option.
(Actually, my version uses
-j instead, by analogy with the equivalent
feature in Make, but that’s not an interesting difference.) MJD comments:
In the original implementation, the -n is mandatory, because I couldn’t immediately think of a reasonable default. The only obvious choice is 1, but since the point of the program was to run programs concurrently, 1 is not reasonable. But it occurs to me now that if I let -n default to 1, then this command would replace many of my current invocations of:
for i in ...; do cmd $i; done
which I do quite a lot. Typing runN cmd ... would be a lot quicker and easier.
My version of
runN does indeed let
-n default to 1, and I find that the
overwhelming majority of my uses of it are of that sort — simplifying a
for loop with a command that can only take a single argument.
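To make the behaviour concrete, here is a minimal Python sketch of the idea. It is not the code of either version of runN: the function name run_each and its jobs parameter are hypothetical, chosen only for illustration. The jobs parameter plays the role of -n (or -j) and defaults to 1, which gives the plain for-loop behaviour.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_each(command, items, jobs=1):
    """Run `command` once per item, with at most `jobs` running at a time.

    `command` is a list such as ["ls", "-l"]; each item is appended to it
    as a single argument. Returns the exit statuses in the order of `items`.
    (Hypothetical helper for illustration, not part of runN.)
    """
    def run_one(item):
        # No shell is involved, so an item containing spaces stays one argument.
        return subprocess.run(command + [item]).returncode

    # With jobs=1 the pool has a single worker, so the commands run one
    # after another, much like: for i in ...; do cmd $i; done
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run_one, items))
```

Calling run_each(["ls"], ["foo", "bar"]) would run ls foo and then ls bar; passing jobs=2 would let both run at once.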
In fact, I haven’t found myself wanting to use the “only n at a time” functionality at all. And on the few occasions when I’ve wanted the jobs to be run in parallel, I’ve known that there are sufficiently few arguments that running all of them in parallel is the right thing. I’m considering adding a feature to do that.
One possibility for the UI for a parallel-everything mode would be to extend
the meaning of
-n so that asking for 0 parallel jobs (which would
otherwise be meaningless) runs all jobs in parallel. But it occurs to me that,
given I’m not really using
-n at all, I might be better off having runN
default to serial execution, with a single boolean option to enable running
everything in parallel.
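The two UI possibilities can be compared side by side in a short Python sketch. Neither exists in any version of runN as far as this post says; both function names are hypothetical, and each simply turns the user's request into a worker count that a thread or process pool could use.

```python
def effective_jobs(requested, item_count):
    """Option A: a -n style number, where 0 means "run everything at once".

    (Hypothetical helper; the 0-means-all rule is the extension being
    considered, not an existing feature.)
    """
    if requested < 0:
        raise ValueError("job count cannot be negative")
    if requested == 0:
        # Use one worker per item, but never fewer than one worker.
        return max(item_count, 1)
    return requested


def effective_jobs_from_flag(parallel, item_count):
    """Option B: serial by default, everything at once if the flag is set."""
    return max(item_count, 1) if parallel else 1
```

Option B has less to explain: there is no special value to remember, at the cost of losing the "only n at a time" mode entirely.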
A perhaps-surprising feature of MJD’s
runN is that each argument is split
on whitespace before being passed to the command. His motivation for this
was to allow things like this:
runN -n 1 -c ls foo bar '-l baz'
to mean the same as this:
ls foo
ls bar
ls -l baz
But, as he points out, that means you can’t use
runN for arguments that
contain spaces. He muses about perhaps having
runN parse quoting
constructs in the arguments, so that this:
runN -n 1 -c ls foo bar '"file with spaces"'
would do the same as this:
ls foo
ls bar
ls "file with spaces"
He explains his decision not to do that by saying:
I wasn’t sure I’d actually need it — only time will tell — and […] shell parsing is very complicated and error-prone
I fully agree with his reasoning there. In fact, I’d go further: the word-splitting feature itself is in the “you probably don’t need this” category. It does sound handy — run some command on each of these, except use this option for this one. But perhaps not handy enough: I omitted that feature from my version, and it doesn’t seem to be something I’ve felt the lack of. By contrast, the ability to use whitespace-containing arguments without doing anything special is something I value much more highly.
I’ve considered adding an option to enable splitting arguments on whitespace, but I haven’t felt the need for it. I think that’s more evidence that word splitting probably isn’t worth it.
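The trade-off between the three behaviours discussed here (no splitting, splitting on whitespace, and parsing quotes) can be seen in a small Python sketch. The function build_argv and its mode names are hypothetical, and shlex.split only approximates shell quoting rules, so treat this as an illustration rather than how either runN works internally.

```python
import shlex


def build_argv(command, item, mode="none"):
    """Build the argument list for one run of `command` on `item`.

    mode="none":       the item is passed through as one argument
    mode="whitespace": the item is split on runs of whitespace
    mode="shell":      the item is split using shell-like quoting rules
    (Hypothetical helper for illustration only.)
    """
    if mode == "none":
        words = [item]
    elif mode == "whitespace":
        words = item.split()
    elif mode == "shell":
        words = shlex.split(item)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return [command] + words
```

With mode="whitespace", the item '-l baz' becomes two arguments, which is the convenience MJD wanted, but 'file with spaces' becomes three. With mode="none", a filename containing spaces needs no special treatment at all.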
The most significant point of difference between my current version and MJD’s original is in the handling of the command to run. MJD’s version has you say things like
runN -c ls foo bar
to run ls on each of foo and bar. I think there are two flaws with that
design. One is simply that it’s irritating to have a required option: since
there’s always one command, why not just take it as the first non-option
argument, and avoid having to type the
-c on each invocation?
The other, more substantive flaw is that you can’t specify any additional arguments to be used in all the command executions. For example, to unpack a tarball, you need to use a command like this:
tar -xzf foo.tar.gz
The -xzf options are required, and
tar accepts only one tarball per
invocation. But that means that MJD’s
runN can’t straightforwardly be
used to unpack several tarballs: there’s nowhere to put the -xzf options.
My approach for dealing with this is to tweak the behaviour of the
conventional double-hyphen (
--) argument. Normally, the double hyphen
is used to signal the end of the options to a command, so that you can
have non-option arguments that happen to begin with a hyphen. In my
runN, the double hyphen signals the end of the ‘fixed’
arguments to the command; everything after it is one of the arguments to
iterate over. So unpacking several tarballs can be done like this:
my-runN tar -xzf -- *.tar.gz
However, to simplify the common case, if no double hyphen is found in the arguments, then just the first argument is used for the command name. So you can unpack several ZIP archives like this:
my-runN unzip *.zip
If you prefer, you can use the long form anyway:
my-runN unzip -- *.zip
This approach is still imperfect, though, because it’s now impossible to pass a real double hyphen to the command executed. I don’t have a good answer to that at the moment, save to point out that it’s very hard to write general-purpose wrapper programs that know nothing about the program being wrapped.
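The rule for separating the command from the arguments to iterate over is simple enough to sketch in a few lines of Python. The function name split_command is hypothetical and this is not the actual source of my-runN; it only restates the rule described above, including its limitation.

```python
def split_command(argv):
    """Split argv into (fixed command words, items to iterate over).

    If "--" is present, everything before the first one is the command
    plus its fixed arguments, and everything after it is an item.
    Otherwise only the first word is the command.
    (Hypothetical helper for illustration only.)
    """
    if not argv:
        raise ValueError("no command given")
    if "--" in argv:
        cut = argv.index("--")
        command, items = argv[:cut], argv[cut + 1:]
        if not command:
            raise ValueError("no command given before --")
    else:
        command, items = argv[:1], argv[1:]
    return command, items
```

Because the first double hyphen is always consumed as the separator, there is no way in this scheme to place a literal -- among the fixed arguments, which is exactly the imperfection noted above.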
That’s pretty much it. You’re welcome to download and use my version,
and I’d welcome any comments on my changes. I’d also like to express my
gratitude to Mark for his original
runN; I’ve been finding it very useful.
Update: Thanks to Mark James and Tim Massingham for independently
pointing out a bug in my version of
runN (not in MJD’s version) that
‘lost’ some arguments when running jobs in parallel. That bug is now