
Unix for Perl programmers: pipes and processes

This paper accompanies a talk I gave at YAPC::EU 2009.

One of Perl’s well-known strengths is that it hides unnecessary complexity by making easy things easy. In the context of Unix programming, that means that, for example, capturing the output from a shell command is trivial:

my $ps_output = `ps`;

That runs the ps command, captures the whole of its output, and stores that output in the variable $ps_output. Wonderfully simple.

However, Perl’s standard facilities aren’t always quite enough. For example, what if you want to run multiple processes in parallel? What if you need to both pipe input to a process and capture its output?

Fortunately, Perl also aims to make hard things possible: it gives you unfettered access to Unix’s low-level APIs. So as long as you know enough Unix to accomplish your goals, Perl will help you out.

And that’s where this paper comes in. It covers all the icky low-level bits of pipes, processes, and signals, from the point of view of a Perl programmer, so that if you ever need to do these things, you understand how they can be accomplished.

CPAN

One of the canonical virtues of a good programmer is laziness; and for Perl in particular, an important part of laziness is knowing how (and when) to reuse other people’s CPAN code. It shouldn’t be a surprise that CPAN already contains implementations of many of the techniques presented here. But since my goal is to promote understanding of how Unix really works under the hood, I take the unusual step of ignoring all CPAN modules.

How do I run a program?

We begin with a simple question, but one whose answers form the foundation of an understanding of the way processes work in Unix: how do I run a program, and wait for it to finish?

Pleasingly, Perl has a simple, built-in way to answer that question, at least in its simple forms. If you’re trying to run a program called update_web_server, you can just use that program name as an argument to Perl’s system builtin:

system 'update_web_server';

But this immediately presents a new question: how do you know if that succeeded? In Unix, every process exits with an exit status, a small integer which is made available to the process that invoked it. By deeply-ingrained convention, if a process exits with status 0, that means it successfully did everything it tried to; correspondingly, any non-zero status indicates failure.

(Many people find it surprising that zero means success and non-zero means failure, given that so many programming languages equate zero with falsity and non-zero with truth. The explanation is that, for most programs, there’s only one way to succeed, but many ways to fail. And a few programs take advantage of that; for example, grep(1) exits with 0 when it finds a match, 1 when it doesn’t, and 2 when it encounters an error, such as failure to open a file.)

Unsurprisingly, Perl offers a simple way of accessing the exit status of the commands you’ve run: the $? variable contains the exit status of the most recent command executed. (And, for that matter, system itself returns the same value.) However, $? isn’t simply the exit status itself: if the process terminated because the OS sent it an uncaught signal, that signal is also encoded in $?. The full details are found in perlvar; for now, it’s enough to note that $? is zero precisely when the process exited successfully.
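
Here’s a sketch of unpicking $? in full, following the layout described in perlvar: the high byte is the exit status, and the low bits record any fatal signal and whether a core dump was produced.

system 'update_web_server';
if ($? == -1) {
    # system couldn't run the command at all
    die "Failed to run update_web_server: $!\n";
}
elsif ($? & 127) {
    printf "Child died from signal %d%s\n",
        $? & 127, ($? & 128) ? ', with a core dump' : '';
}
else {
    printf "Child exited with status %d\n", $? >> 8;
}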

The zero-for-success convention means that system can be awkward to use, if you need to detect failure. A simple way of doing that is as follows:

system('update_web_server') == 0
    or die "Failed to update web servers\n";

The next question is how to pass arguments to the program being run. In some cases, the most obvious approach is the best — just insert the arguments into your system call:

system 'update_web_server --all';

This can be problematic in more complicated situations, however. When system is given just a single string, it treats that string as a command to be executed by /bin/sh, the operating system shell. But suppose part of the string comes from untrusted user input:

my $host = $ARGV[0];
system "update_web_server --host=$host";

That looks straightforward enough, but what if $ARGV[0] is x; rm -rf /? Since single-argument system uses the shell to run your command, the shell will first run update_web_server with an argument of --host=x, then move on to running rm -rf /. Oops.

Fortunately, Perl helps with this, too: if you give system a list of arguments, rather than a single command string, it treats the list as a program name and its arguments, and runs that program directly, without involving a shell:

system 'update_web_server', "--host=$host";

How do I run a program in the background?

Suppose you need to run a program that you expect to take a long time, but carry on with your own work in the meantime. One option is to rely on the shell’s facilities for doing that — appending an ampersand to a simple command runs it in the background:

system 'update_web_server --all &';

However, as described in the previous section, this is risky if the command being run includes untrusted data. It is possible to handle that by applying the shell’s quoting rules to such data before embedding them into a command, along the lines of the sketch below. But there is an alternative approach, and one which finally uses a low-level Unix facility: forking.
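
Here’s a rough sketch of that quoting approach for a POSIX shell: wrap the value in single quotes, and rewrite each embedded single quote as the sequence '\'' (close the quotes, emit an escaped quote, reopen the quotes).

# Hand-rolled quoting for a POSIX shell: inside single quotes, the only
# character needing special treatment is the single quote itself
(my $quoted_host = $host) =~ s/'/'\\''/g;
system "update_web_server --host='$quoted_host' &";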

From a low-level point of view, forking is the only way to create a new process in Unix. The interesting thing about forking is that it doesn’t involve launching an executable program. On the contrary, every process is created as a near-identical clone — a fork — of its parent process.

The most obvious difference between the parent and the child is simply the process ID: the child is allocated a freshly-minted process ID by the kernel. In almost all other respects, they are identical — and in particular, they continue to run the same code. But they’re nonetheless independent, in the sense that either can freely make changes to their own variables and other data, without affecting the other.

(The parent/child terminology for the creator and created process is commonplace among Unix hackers, and is also enshrined in some technical details like the gloss for the getppid(2) system call. Terms like grandparent, grandchild, and sibling also see some use, with the obvious meanings.)

How does forking look from Perl code? First, the parent process calls the fork builtin, which is a thin wrapper round the fork(2) system call:

my $pid = fork;

In the parent process, the return value of fork is the process ID of the new child (or undef if the fork failed for some reason). However, in the freshly-created child process, the code continues in just the same way, except that the fork returned zero. So idiomatic forking code looks something like this:

my $pid = fork;
die "Failed to fork: $!\n" if !defined $pid;
if ($pid == 0) {
    # In the child process
}
else {
    # In the parent process; the new child has ID $pid
}

Or at least, code with that structure is idiomatic in C, but it’s quite a lot of boilerplate for Perl. The remainder of this paper will rely on a Perl function fork_child defined as follows:

sub fork_child {
    my ($child_process_code) = @_;

    my $pid = fork;
    die "Failed to fork: $!\n" if !defined $pid;

    return $pid if $pid != 0;

    # Now we're in the new child process
    $child_process_code->();
    exit;
}

That function takes one argument, a reference to some code that the child should run, and returns the ID of the child process that’s running it. (Or if forking failed, it just throws an exception.) The new child exits once the code reference returns.

So, to return to the question of how to run a program in the background: all we have to do now is to arrange for the child process to stop running the same code as the parent process, and instead to execute the program we want. Since the parent process is still running, that will be (just) sufficient to achieve the objective.

Perl lets you do that with its exec builtin, which behaves just like system, except that it replaces the currently-running code with the program you name. exec is the Perl interface to the execve(2) system call (or to one of its C-language friends like execv(3)). It’s easy to imagine that exec is system followed immediately by exit, and that may be a helpful mental model to begin with. But, as this paper attempts to make clear, that isn’t the whole story; for example, the process’s ID and parent process ID are not changed by exec.
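
As a small illustration of that last point (assuming a Bourne-style shell at /bin/sh), the process ID printed by the program we exec is the same one Perl printed just beforehand:

print "Perl's process ID: $$\n";
# The shell's $$ expands to its own PID, which is still the same process
exec '/bin/sh', '-c', 'echo "shell process ID: $$"'
    or die "Failed to execute /bin/sh: $!\n";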

Putting these bits together, running a program in the background looks roughly like this:

fork_child(sub {
    exec 'update_web_server', "--host=$host"
        or die "Failed to execute update_web_server: $!\n";
});

How do I get the exit status of a background program?

One obvious problem with that code is that it offers no way to find the exit status of your background program (once it actually exits, that is). Doing that requires waiting for background programs to exit. Perl’s built-in waitpid function offers a simple way of doing that:

my $pid = fork_child(sub {
    exec 'update_web_server', "--host=$host"
        or die "Failed to execute update_web_server: $!\n";
});

do_complicated_calculations();

waitpid $pid, 0;

waitpid, like the waitpid(2) system call it’s a wrapper for, pauses the current process until the child process given as its first argument has exited, and returns the process ID of that child. Then Perl makes the child’s exit status available in $?, as it does when system completes. You can also supply a process ID of -1 to wait for any remaining child process.

Waiting for a child process is sometimes referred to as reaping it.

Notice that, from the point of view of the kernel, there is very little distinction between a foreground process and a background process. The only real difference is whether the parent continues to do its own work before waiting for the child.

What happens if I don’t reap my processes?

If you start a background process and never wait for it, the kernel keeps a record of the process (and its exit status) indefinitely, because it has to assume that you might get round to waiting for it eventually. Processes that have terminated but not yet been reaped are called zombie processes. If a process still has zombie children when it exits, those processes are reparented: the kernel changes their parent to process ID 1 (init(8) on most Unix systems), one of whose jobs is to wait for such zombie processes.

In some situations, this can be problematic: the kernel must allocate resources for every zombie process. So if the parent launches many background processes without waiting for them, that can eventually lead to resource exhaustion. One approach to dealing with that is simply to always wait for your child processes. But it’s sometimes hard to find a good place in your program to do that.

Since it’s specifically your own child processes that you’re responsible for reaping, the way to avoid accumulating zombies is to ensure that the fire-and-forget program isn’t your child at all. Instead, you can make such programs run as grandchildren of the current process, by interposing a second fork:

sub run_and_forget {
    my ($command, @arguments) = @_;

    my $child_pid = fork_child(sub {
        # Now we're in the child process
        fork_child(sub {
            # And now we're in the grandchild
            exec $command, @arguments
                or die "Failed to execute $program: $!\n";
        });
    });

    waitpid $child_pid, 0;
}

The original process forks a child, and waits for it to exit. That child process immediately forks a grandchild, and then exits (so that the original process can reap it). Now the grandchild’s parent process no longer exists, so the kernel reparents it to process ID 1. The child process is reaped almost immediately, and init(8) will take care of reaping the grandchild (since it has no other parent). So there will be no zombie processes taking up needless resources.

How do I run multiple programs in parallel?

Given the ability to run a program in the background, running multiple programs in parallel is straightforward: all that’s required is to start them all in the background, and wait until they’ve all finished. It’s typically also a good idea to keep track of which process ID goes with which program (if only to improve error messages in the event of problems).

my %host_for_pid;
for my $host (hosts_to_update()) {
    my $pid = fork_child(sub {
        exec 'update_web_server', "--host=$host"
            or die "Failed to exec update_web_server: $!\n";
    });
    $host_for_pid{$pid} = $host;
}

while (keys %host_for_pid) {
    my $pid = waitpid -1, 0;
    warn "Failed to update host $host_for_pid{$pid}\n"
        if $? != 0;
    delete $host_for_pid{$pid};
}

How do I start a program in a different directory?

The separation between fork and exec in Unix initially strikes many people as unnecessary complexity when running a program. But it does have a significant advantage: the child process can change its own environment in arbitrary ways before it executes the desired program.

A simple example involves having the new program run with a different current directory. (In principle it’s possible to do this by having the parent process chdir to the desired directory before starting the child, and then return to its original directory afterwards; but in practice, that’s somewhat error prone, as the original directory might have been renamed in the meantime.)

my $pid = fork_child(sub {
    chdir $dir
        or die "Failed to cd to $dir: $!\n";
    exec 'tar', '-cf', $tar_file, '.'
        or die "Failed to execute tar: $!\n";
});

waitpid $pid, 0;

How do I capture the output of a program?

In most circumstances, the best way to capture a program’s output is to use Perl’s built-in features: either backticks, or the slightly more involved open-from-a-shell-command feature:

open my $ps, 'ps |' or die "Failed open from command ps: $!\n";
# Now you can read from $ps like any other filehandle

Or without using a shell to invoke the command, for safety when an argument contains untrusted data:

open my $ps, '-|', 'ps', $pid or die "Failed open from command ps: $!\n";

However, some related tasks require an understanding of precisely what happens when you do this, so it’s worth examining a reimplementation of this obvious facility.

Capturing output uses a pipe: a unidirectional channel for simple stream-oriented inter-process communication. The pipe is maintained inside the kernel, and is exposed to applications as a pair of file descriptors. (Unix file descriptors are small integers. They’re the low-level equivalent of Perl’s filehandles; indeed, a Perl filehandle can be regarded as a wrapper round a file descriptor, with optional buffering.) One of the file descriptors is the writing end of the pipe, and the other is the reading end; anything written to the writing end will appear on the reading end.
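
As a tiny illustration of the primitive itself, a single process can write to one end of a pipe and read the same bytes back from the other end:

pipe my ($readable, $writable)
    or die "Failed to create a pipe: $!\n";

print {$writable} "hello through the pipe\n";
close $writable or die "Failed to close pipe: $!\n";   # flushes the message
print scalar <$readable>;                              # prints the message back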

The ability to create a pipe is combined with an extremely important feature of forking that I glossed over earlier: a newly forked process inherits all the open files of its parent. So capturing process output involves the following steps:

  1. The parent process creates a pipe, and gets a pair of new file descriptors (wrapped in file handles, when you’re using Perl)
  2. Then it forks a child process, which inherits the pipe’s file descriptors
  3. The child closes the reading end of the pipe (because it only needs to write data)
  4. The child sends data to the writing end of the pipe, perhaps after executing a new program
  5. The parent closes the writing end of the pipe (because it only needs to read data)
  6. The parent reads from the reading end, blocking until the child’s data is available
  7. When the reading end produces an end-of-file, the parent reaps the child process

There’s one subtlety not reflected in that outline, though. The Unix convention is that programs default to writing to their standard output, file descriptor 1. But it’s overwhelmingly unlikely that the pipe just happens to be created with its writing end as file descriptor 1 (because that would mean that neither standard input nor standard output were open at the point the pipe was created).

So we also need a new step 3a, in which the child must arrange for file descriptor 1 to be connected to the writable end of the pipe. That can most easily be done with the dup2(2) system call, which arranges for a file descriptor of your choice to be a clone of some other file descriptor. Once the child has duplicated the writable end of the pipe as standard output, it can then close the original file descriptor (for tidiness, especially in the common case where it then executes some other program).

Unfortunately, Perl doesn’t provide direct access to dup2, so you have to import it from the standard POSIX module instead. (There is a dup system call which does something very similar, but it’s somewhat harder to use correctly for these purposes. There’s no Perl dup builtin, either, but you can get to the functionality with open.) The first argument to dup2 is the file descriptor number you want to clone (which can be found with the fileno builtin), and the second argument is the file descriptor number you want it to be cloned as.

Putting all that together, the code to duplicate that ps example looks like this:

use POSIX qw<dup2>;

# 1. Create both ends of a pipe
pipe my ($readable, $writable)
    or die "Failed to create a pipe: $!\n";

# 2. Create a child process
my $pid = fork_child(sub {
    # 3. Child closes the readable end of the pipe
    close $readable or die "Child failed to close pipe: $!\n";
    # 3a. Connect stdout to the writable end
    dup2(fileno $writable, 1)
        or die "Child failed to reopen stdout to pipe: $!\n";
    close $writable or die "Child failed to close pipe: $!\n";
    # 4. Execute a process which generates output
    exec 'ps' or die "Failed to execute ps: $!\n";
});

# 5. Parent closes the writable end of the pipe
close $writable or die "Failed to close pipe: $!\n";
# 6. Parent reads from the other end, blocking until data arrives
while (<$readable>) {
    print "ps says: $_";
}
close $readable or die "Failed to close pipe: $!\n";
# 7. No more output will arrive, so reap the child
waitpid $pid, 0;
die "ps failed\n" if $? != 0;

Step 5 here is particularly important, in a way that isn’t obvious. When the parent reads from the readable end of the pipe, it blocks until data is available. It won’t see an end-of-file condition until all the writable file descriptors for the pipe have been closed (because until all of them have closed, more data might be forthcoming). So if step 5 were omitted, even once the child process had exited (and closed its copy of the writable file descriptor), the parent would block waiting for itself to close the writable descriptor.

Note again the benefits Unix earns from separating program invocation into fork and exec: the child process can do arbitrary setup work between the two operations.

How do I send input to a program?

As with capturing output, it’s normally best to use Perl’s easy open-a-command feature:

open my $lpr, '| lpr'
    or die "Failed to pipe to lpr: $!\n";

# Or, without using a shell to invoke the command:
open my $lpr, '|-', 'lpr', '-H', $server
    or die "Failed to pipe to lpr: $!\n";

But, again, it’s worth seeing what happens under the hood when you do that. Fortunately, it’s the straightforward reversal of capturing a program’s output, with the child duplicating the readable end of the pipe as the program’s standard input (file descriptor 0):

pipe my ($readable, $writable)
    or die "Failed to create a pipe: $!\n";

my $pid = fork_child(sub {
    close $writable or die "Child failed to close pipe: $!\n";
    # dup2 returns the new descriptor number; that's 0 here, so test definedness
    defined dup2(fileno $readable, 0)
        or die "Child failed to reopen stdin from pipe: $!\n";
    close $readable or die "Child failed to close pipe: $!\n";
    exec 'lpr' or die "Failed to execute lpr: $!\n";
});

close $readable or die "Failed to close pipe: $!\n";
while (my $line = get_next_printer_line()) {
    print $writable $line;
}
close $writable or die "Failed to close pipe: $!\n";
waitpid $pid, 0;
die "lpr failed\n" if $? != 0;

It’s important to be aware that, just as reading from a pipe will block until data is available, so too writing to a pipe can block if nothing is trying to read from the other end. The kernel will buffer data written to a pipe, until something tries to read that data; but the buffer is of a fixed size. Once the buffer is full, a process trying to write to the pipe will be paused until another process reads data from it.

Unix implementations differ in the size of the buffer they use for pipes, but the minimum buffer size required by the POSIX standard is only 512 bytes.

How do I send input to a process and capture its output?

Given the ability to capture process output and to send input to a process, can the two be combined with a single process? The short answer is yes, but that’s not the whole story.

Bidirectional piping to a child process needs two pipes, one for sending and the other for capturing (because each one is a unidirectional communication channel). With that in mind, the code seems relatively simple: create both pipes in the parent, then fork, and have the child connect the pipes to standard input and standard output as appropriate before it execs the desired program:

pipe my ($send_readable, $send_writable)
    or die "Failed to pipe for sending: $!\n";
pipe my ($capture_readable, $capture_writable)
    or die "Failed to pipe for capturing: $!\n";

my $pid = fork_child(sub {
    close $send_writable or die "Failed to close pipe: $!\n";
    close $capture_readable or die "Failed to close pipe: $!\n";
    defined dup2(fileno $send_readable, 0)
        or die "Child failed to reopen stdin from pipe: $!\n";
    dup2(fileno $capture_writable, 1)
        or die "Child failed to reopen stdout to pipe: $!\n";
    close $send_readable or die "Failed to close pipe: $!\n";
    close $capture_writable or die "Failed to close pipe: $!\n";
    exec 'sort' or die "Failed to execute sort: $!\n";
});

close $send_readable or die "Failed to close pipe: $!\n";
close $capture_writable or die "Failed to close pipe: $!\n";
print {$send_writable} get_data();
close $send_writable or die "Failed to close pipe: $!\n";
while (my $line = <$capture_readable>) {
    print $line;
}
waitpid $pid, 0;
die "sort failed\n" if $? != 0;

This specific example is carefully constructed to work correctly regardless of what get_data returns. But serious issues can show up merely by, say, changing which program is run in the child.

This code writes the whole of the input for sort before trying to read any of sort’s output. sort, meanwhile, is slurping up all of that data; when the parent closes the writable end of the sending pipe, sort knows it’s received the whole lot, and starts actually sorting it. Then the parent starts reading from the capturing pipe; if there’s a lot of data, it probably blocks at this point, because it’ll take a while for all that data to be sorted correctly. Eventually sort finishes the hard bit of its work, and starts producing output, and the parent reads the sorted data.

The problem is that a pipe has a fixed-size buffer. Suppose that, instead of executing sort(1), that code executed a filter like tr(1). The crucial difference between sort and tr, for these purposes, is that sort must read its inputs in their entirety before it can generate any output at all (because the last input line read might need to be the first output line), while tr can easily generate output as it goes.

The parent process in this code tries to write everything before reading anything. As it writes data, the tr child process reads data from the other end of the sending pipe, and generates output to the capturing pipe. And it’s those writes that cause the problem: once the buffer for the capturing pipe fills up, the kernel will block the tr process when it tries to write the next piece of data, until something reads from the other end of the pipe. But the only process that can read from the capturing pipe is the parent — and the parent is still busy trying to write data to the sending pipe. That is:

  • Nothing is reading from the capturing pipe
  • So the tr child can’t write any more data to it
  • So it doesn’t try to read any more data from the sending pipe
  • So the parent gets blocked trying to write to that
  • … which is why it won’t try to read from the capturing pipe!

This is a classic form of deadlock: two processes are blocked, waiting for each other to yield a shared resource. The only way out once this state is reached is to interrupt one or both of the processes with a signal.

There are ways that this double-pipe trick can be used safely. One is when using a program that, like sort, slurps all of its input before generating any output. Another is when you can guarantee to be able to interleave activity on the two pipes in such a way that the deadlock never happens. For example, with sufficient care, you can use a pair of pipes to drive dc(1), an arbitrary-precision postfix-notation calculator. That’s because you can predict which of the commands you send will produce output; when you send such a command, you can stop writing to the sending pipe, and collect the right number of lines of output from the capturing pipe.
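
For example, here’s a rough sketch of driving dc that way. It assumes $send_writable and $capture_readable are the pipe ends from the example above, with the child executing dc(1) rather than sort(1); dc’s p command prints exactly one line, so after sending it the parent knows precisely how much output to collect.

# Assumes $send_writable and $capture_readable were set up as above, but
# with the child exec'ing 'dc' instead of 'sort'.
#
# Make sure the command isn't left sitting in Perl's output buffer;
# otherwise the read below would deadlock, waiting for output that dc has
# never been asked to produce.
select((select($send_writable), $| = 1)[0]);

print {$send_writable} "2 3 + p\n";
my $answer = <$capture_readable>;   # dc prints "5\n"
print "dc says: $answer";
# (when finished, close $send_writable and reap the child as before)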

It’s sometimes possible to work around the problem by having the parent fork two processes, not just one. One will ultimately execute the desired program; the other is a clone of the original parent, used only for generating the data that the filter will read:

pipe my ($send_readable, $send_writable)
    or die "Failed to pipe for sending: $!\n";
pipe my ($capture_readable, $capture_writable)
    or die "Failed to pipe for capturing: $!\n";

my $writer_pid = fork_child(sub {
    close $capture_readable or die "Failed to close pipe: $!\n";
    close $capture_writable or die "Failed to close pipe: $!\n";
    close $send_readable or die "Failed to close pipe: $!\n";
    print {$send_writable} get_data();
    close $send_writable or die "Failed to close pipe: $!\n";
});

my $filter_pid = fork_child(sub {
    close $send_writable or die "Failed to close pipe: $!\n";
    close $capture_readable or die "Failed to close pipe: $!\n";
    defined dup2(fileno $send_readable, 0)
        or die "Child failed to reopen stdin from pipe: $!\n";
    dup2(fileno $capture_writable, 1)
        or die "Child failed to reopen stdout to pipe: $!\n";
    close $send_readable or die "Failed to close pipe: $!\n";
    close $capture_writable or die "Failed to close pipe: $!\n";
    exec 'tr', 'a-z', 'A-Z'
        or die "Failed to execute tr: $!\n";
});

close $send_readable or die "Failed to close pipe: $!\n";
close $send_writable or die "Failed to close pipe: $!\n";
close $capture_writable or die "Failed to close pipe: $!\n";
print while <$capture_readable>;

while (1) {
    my $pid = waitpid -1, 0;
    last if $pid == -1;
    next if $? == 0;
    my $process = $pid == $writer_pid ? 'writer'
                : $pid == $filter_pid ? 'filter'
                :                       'other';
    warn "$process failed\n";
}

However, that’s sometimes awkward in practice, because the process generating the data has no means of communicating with the parent process. In particular, your data-generation code can’t update any variables in the parent process.

In most circumstances, a simpler approach is merely to use a temporary file for one side of the data transfer, using the file as an arbitrarily-large buffer. (Though this obviously has the disadvantages that the parent process must have sufficient permissions to create a temporary file, and that the filesystem it uses for the temporary must have enough space to store all the data needed.)

Deciding whether it’s the input or the output that should use a temporary depends on your application, and on the relative expected sizes of the data flowing in each direction. This version uses a temporary for sending data to the filter:

use File::Temp qw<tempfile>;

my $temp_fh = tempfile();
print {$temp_fh} get_data();
seek $temp_fh, 0, 0 or die "Rewind temporary: $!\n";

pipe my ($readable, $writable)
    or die "Failed to create pipe: $!\n";

my $pid = fork_child(sub {
    close $readable or die "Child close readable: $!\n";
    defined dup2(fileno $temp_fh, 0)
        or die "Child failed to reopen stdin from temporary: $!\n";
    dup2(fileno $writable, 1)
        or die "Child failed to reopen stdout to pipe: $!\n";
    exec 'tr', 'a-z', 'A-Z'
        or die "Failed to execute tr: $!\n";
});

close $writable or die "Failed to close pipe: $!\n";
print while <$readable>;

waitpid $pid, 0;
die "tr failed\n" if $? != 0;

This uses the File::Temp module’s tempfile function, which by default creates a temporary file, and deletes it immediately. This takes advantage of another Unixism: when a file is deleted by name, the file’s data is left intact until there are no longer any processes which have that file open. In this case, the parent process opens the file initially, and the child process will inherit the relevant file descriptor; so the temporary file’s data isn’t deleted until both processes have closed it.
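
You can see that behaviour in isolation; in this sketch, scratch.tmp is just a hypothetical throwaway file name:

# scratch.tmp is a made-up scratch file, created purely for this demonstration
open my $fh, '+>', 'scratch.tmp' or die "Failed to open scratch.tmp: $!\n";
print {$fh} "still readable\n";
unlink 'scratch.tmp' or die "Failed to unlink: $!\n";   # the name is gone...
seek $fh, 0, 0 or die "Failed to rewind: $!\n";
print scalar <$fh>;   # ...but the data survives until the last close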

Note that the parent process must explicitly seek to the start of the temporary file after writing to it. Once a file has been opened, all file descriptors derived from that open (whether created with dup or inherited across a fork) share the same file offset pointer. So if the seek were omitted, the child process would start reading the temporary file from the last point at which I/O happened on it: namely, the end of the data just written.
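
A quick way to see the shared offset is to use dup within a single process (descriptors inherited across fork behave the same way), with sysread and sysseek keeping Perl’s buffering out of the picture:

use Fcntl qw<SEEK_CUR>;

open my $fh, '<', $0 or die "Failed to open $0: $!\n";   # any convenient file
open my $dup, '<&', $fh or die "Failed to dup: $!\n";
sysread $fh, my $buffer, 4;
# The clone's offset has moved too, even though nothing has read from it
print sysseek($dup, 0, SEEK_CUR), "\n";   # prints 4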

How do I capture a program’s error output?

The Unix convention is that error messages should be written to the standard error output — an additional output file descriptor, distinguished from standard output so that errors aren’t blithely sent down a pipeline to something that may not know how to deal with them. Typically, when you type a pipeline command at the shell prompt, all the processes retain your terminal as their standard error location; that means you see the error messages, and the outputs flow through the pipeline.

Capturing just standard error from a program is simple, given our code above for capturing standard output; since standard error is file descriptor 2, all that’s needed is to change which file descriptor the child process connects to the pipe. The child will then inherit the same standard output as the parent process.
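
For instance, the child part of the stdout-capturing example above would become something like this, with $program and @args standing for whatever command you want to run; the pipe creation, the parent-side code, and the reaping stay exactly as before:

my $pid = fork_child(sub {
    close $readable or die "Child failed to close readable end: $!\n";
    # Connect standard error (descriptor 2), rather than standard output,
    # to the pipe; stdout is left alone, so the child keeps the parent's
    dup2(fileno $writable, 2)
        or die "Child failed to reopen stderr to pipe: $!\n";
    close $writable or die "Child failed to close original: $!\n";
    exec $program, @args or die "Failed to execute $program: $!\n";
});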

Capturing a program’s standard output and standard error merged together is nearly as easy: connect both stdout and stderr in the child process to the pipe:

pipe my ($readable, $writable)
    or die "Failed to create a pipe: $!\n";

my $pid = fork_child(sub {
    close $readable or die "Child failed to close readable end: $!\n";
    dup2(fileno $writable, 1)
        or die "Child failed to reopen stdout to pipe: $!\n";
    dup2(fileno $writable, 2)
        or die "Child failed to reopen stderr to pipe: $!\n";
    close $writable or die "Child failed to close original: $!\n";
    exec $program, @args or die "Failed to execute $program: $!\n";
});

close $writable or die "Failed to close pipe: $!\n";
print while <$readable>;
close $readable or die "Failed to close pipe: $!\n";

waitpid $pid, 0;
die "$program failed\n" if $? != 0;

Capturing the two separately, on the other hand, is much trickier: the obvious approach (using one pipe for stdout and another for stderr) is fraught with risks of deadlock. The problem is how the parent process should decide which pipe to read from at any given point in time: if it tries to read from one that has no data available, it blocks; and meanwhile, the child process could be writing data to the other pipe until it blocks.

One way of dealing with that is to read from a given pipe only when the kernel believes it is ready for reading. From Perl, this normally involves using the standard IO::Select module, or perhaps the four-argument select builtin for which it’s a wrapper. The idea is that, instead of simply reading data from a file descriptor, you first call select on all the file descriptors you’re interested in. That waits until at least one of the file descriptors becomes ready, and then returns a list of all that are ready. Then you can read from all of those; and then loop until all the descriptors have reached end-of-file.

The big disadvantage of using select is that you can no longer use the ordinary Perl <> operator for reading from the file. Instead, you have to use the sysread builtin; that’s because <> buffers the input, which interferes with your ability to read from file descriptors when and only when they become ready.
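
To give a flavour of that approach, here’s a rough sketch rather than a drop-in solution. It assumes that $out_readable and $err_readable are the readable ends of two pipes which a child process (with ID $pid) has connected to its standard output and standard error respectively, using the same fork_child and dup2 technique as before.

use IO::Select;

# Assumes $out_readable, $err_readable and $pid were set up beforehand,
# with the child's stdout and stderr each connected to its own pipe
my $stdout_fd = fileno $out_readable;
my $selector  = IO::Select->new($out_readable, $err_readable);
my %output    = (stdout => '', stderr => '');

while ($selector->count) {
    for my $fh ($selector->can_read) {
        my $bytes = sysread $fh, my $chunk, 4096;
        die "Failed to read from child: $!\n" if !defined $bytes;
        if ($bytes == 0) {
            # End-of-file on this pipe; stop watching it
            $selector->remove($fh);
            close $fh;
        }
        else {
            $output{ fileno $fh == $stdout_fd ? 'stdout' : 'stderr' } .= $chunk;
        }
    }
}

waitpid $pid, 0;
die "Child failed\n" if $? != 0;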

Given that, in many circumstances it’s easier to use a temporary file for one of the file descriptors — probably standard error, on the assumption that there shouldn’t be enough error messages to make the disk-space requirements an issue.

my $temp_stderr = tempfile();

pipe my ($readable, $writable)
    or die "Failed to create pipe: $!\n";

my $pid = fork_child(sub {
    close $readable or die "Child close readable: $!\n";
    dup2(fileno $writable, 1)
        or die "Child failed to reopen stdout to pipe: $!\n";
    dup2(fileno $temp_stderr, 2)
        or die "Child failed to reopen stderr to temporary: $!\n";
    exec $program, @arguments
        or die "Failed to execute $program: $!\n";
});

close $writable or die "Failed to close pipe: $!\n";

# Process child's standard output
print "stdout: $_" while <$readable>;

waitpid $pid, 0;
die "$program failed\n" if $? != 0;

# Child has now exited, so no more output will appear on $temp_stderr
seek $temp_stderr, 0, 0 or die "Rewind temporary: $!\n";

# Process child's standard error
print "stderr: $_" while <$temp_stderr>;

How do I pass multiple open files to a child process?

Suppose you want to run a child program whose standard input, standard output, and standard error are all connected to the same places as those of the parent, but which also has access to an additional open file, perhaps a temporary file. That sounds easy enough. The child should probably take the file descriptor number as an argument, so it knows where to find this extra file; if it’s written in Perl, open lets you create a filehandle which corresponds to that file descriptor:

my $file_descriptor = $ARGV[0];
open my $temp_fh, "<&=$file_descriptor"
    or die "Failed to fdopen $file_descriptor: $!\n";

Then the parent just needs to open the temporary, and fork/exec the child, right?

my $temp_fh = tempfile();
fork_child(sub {
    exec 'use_tempfile', fileno $temp_fh
        or die "Failed to exec use_tempfile: $!\n";
});

Sadly, that doesn’t work: you just get a message saying Failed to fdopen 3: Bad file descriptor, or some such. The reason this happens is that File::Temp explicitly arranges for the temporaries it creates to be inaccessible to child processes. (As it happens, security considerations make this a reasonable default for File::Temp, but it’s still somewhat surprising.)

The way it does that is by setting the file descriptor flags for that descriptor to include the FD_CLOEXEC flag, which tells the kernel to close the file descriptor whenever a new program is executed. That is, the child process created with fork initially has access to the temporary file, but the child’s file descriptor that is open on the temporary is closed as soon as exec succeeds. (FD_CLOEXEC stands for “file descriptor close on exec”.)

Meanwhile, the parent process hasn’t executed anything, so its version of the file descriptor is left unchanged.

So, to pass an open file to a child process, you need to explicitly remove the FD_CLOEXEC flag from the file descriptor that’s open on the temporary file. For safety, you should first get the current file descriptor flags, then clear FD_CLOEXEC in those flags, and finally set the descriptor’s flags to the new value:

use POSIX qw<F_SETFD F_GETFD FD_CLOEXEC>;

my $flags = fcntl $temp_fh, F_GETFD, 0;
$flags &= ~FD_CLOEXEC;
fcntl $temp_fh, F_SETFD, $flags;

You can do this either in the child process, before executing the new program, or in the parent process; the latter might be particularly useful if you need to pass the same open file to multiple different child programs.
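
Putting the pieces together, a working version of the failing example above might look like this, clearing the flag in the parent; use_tempfile is still the hypothetical helper program from before:

my $temp_fh = tempfile();

# Clear FD_CLOEXEC in the parent, so that every program exec'd from now on
# inherits the descriptor.  (Uses the POSIX constants imported above, and
# File::Temp's tempfile from earlier.)
my $flags = fcntl $temp_fh, F_GETFD, 0
    or die "Failed to get descriptor flags: $!\n";
fcntl $temp_fh, F_SETFD, $flags & ~FD_CLOEXEC
    or die "Failed to set descriptor flags: $!\n";

my $pid = fork_child(sub {
    exec 'use_tempfile', fileno $temp_fh
        or die "Failed to exec use_tempfile: $!\n";
});
waitpid $pid, 0;
die "use_tempfile failed\n" if $? != 0;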

This raises the question of why all the previous examples work even though they don’t do this. The answer is that file descriptor flags like FD_CLOEXEC are tied to a specific file descriptor number. Earlier code did things like this:

my $temp_fh = tempfile();

fork_child(sub {
    dup2(fileno $temp_fh, 1);
    exec $program, @arguments;
});

In this case, the file descriptor underlying the filehandle returned by tempfile() is closed when $program is executed. But it’s already been cloned as file descriptor 1 when that happens, and FD_CLOEXEC only applies to fileno $temp_fh, not to the clone.