Question

How does a pipe work in Linux?

How does piping work? If I run a program from the command line and redirect its output to a file, will I be able to pipe that file into another program as it is being written?

Basically when one line is written to the file I would like it to be piped immediately to my second application (I am trying to dynamically draw a graph off an existing program). Just unsure if piping completes the first command before moving on to the next command.

Any feedback would be greatly appreciated!


1 Jan 1970

Solution

If you want to redirect the output of one program into the input of another, just use a simple pipeline:

program1 arg arg | program2 arg arg

If you want to save the output of program1 into a file and pipe it into program2, you can use tee(1):

program1 arg arg | tee output-file | program2 arg arg

All programs in a pipeline run simultaneously. Most programs use blocking I/O: when they try to read their input and nothing is there, they block. That is, they stop, and the operating system de-schedules them until more input becomes available (to avoid eating up the CPU). Similarly, if a program earlier in the pipeline writes data faster than a later program can read it, the pipe's buffer eventually fills up and the writer blocks: the OS de-schedules it until the reader drains some of the pipe's buffer, and then it can continue writing.
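A quick way to see this from the shell is a sketch like the one below. The function names slow_producer and stamp_lines are made up for illustration and stand in for program1 and program2; the one-second delay is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical producer: emits one line per second, like a program
# that reports measurements as it goes.
slow_producer() {
    for i in 1 2 3; do
        echo "sample $i"
        sleep 1
    done
}

# Hypothetical consumer: prefixes each line with the time at which it
# arrived. `read` blocks until a full line is available on the pipe.
stamp_lines() {
    while IFS= read -r line; do
        echo "$(date +%T) $line"
    done
}

# Both sides of the pipe start together. The timestamps should be
# about one second apart, rather than all identical after the
# producer has finished.
slow_producer | stamp_lines
```

If the second command only started after the first had finished, all three lines would carry the same timestamp.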

EDIT

If you want to use the output of program1 as the command-line parameters of program2, you can use backquotes or the $() syntax:

# Runs "program1 arg", and uses the output as the command-line arguments for
# program2
program2 `program1 arg`

# Same as above
program2 $(program1 arg)

The $() syntax should be preferred, since it is clearer and can be nested.
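As a small illustration of nesting, here is a sketch using the standard dirname and basename utilities. The path is an arbitrary example and the file does not need to exist.

```shell
#!/bin/sh
# The inner $(...) runs first; its output becomes the argument of the
# outer command.
parent=$(basename "$(dirname /usr/local/bin/tool)")
echo "$parent"

# The backquote form of the same thing needs the inner backquotes
# escaped, which is why it gets hard to read:
#   parent=`basename \`dirname /usr/local/bin/tool\``
```

Here dirname yields /usr/local/bin, and basename of that is bin.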

2009-07-02

Solution

Piping does not complete the first command before running the second. Unix (and Linux) pipelines run all commands concurrently. A command will be suspended if:

It is starved for input.
It has produced significantly more output than its successor is ready to consume.

For most programs output is buffered, which means that the program's I/O library (stdio, for a C program) accumulates a substantial amount of output (often 4096 or 8192 bytes) before writing it to the pipe for the next stage of the pipeline. This buffering avoids making too many system calls and switching back and forth too often between the processes and the kernel.

If you want output on a pipeline to be sent right away, you can flush or disable that buffering. In C this means calling fflush() so that any buffered output is immediately sent on to the next process, or turning buffering off for the stream with setvbuf(). Unbuffered input is also possible but is generally unnecessary, because a process that is starved for input typically does not wait for a full buffer but processes whatever input it can get.

For typical applications unbuffered output is not recommended; you generally get the best performance with the defaults. In your case, however, where you want to do dynamic graphing as soon as the first process has the data available, you definitely do not want output sitting in a buffer. If you're using C, calling fflush(stdout) whenever you want output sent will be sufficient.
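If you cannot change the source of the first program, one possible workaround on Linux is the stdbuf utility from GNU coreutils, which asks a program to line-buffer its standard output. This is only a sketch: the helper name line_buffered is made up, stdbuf may not be installed everywhere, and it only affects external programs that rely on the default C stdio buffering (not shell functions, and not programs that set up their own buffering).

```shell
#!/bin/sh
# Hypothetical helper: run an external command with line-buffered
# stdout when stdbuf is available; otherwise run it unchanged.
line_buffered() {
    if command -v stdbuf >/dev/null 2>&1; then
        stdbuf -oL "$@"
    else
        "$@"
    fi
}

# Example: grep typically buffers its output in blocks when writing
# to a pipe, so matches could otherwise reach the next command late.
printf 'keep 1\nskip\nkeep 2\n' | line_buffered grep keep | cat
```

With the placeholder names from the answers above, the same idea would be written as: line_buffered program1 arg arg | program2 arg arg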

2009-07-02