Talk:Fork–exec

Fork–exec vs. spawn

Why is this superior to something more intuitive like spawn in Windows? — Preceding unsigned comment added by SDX2000 (talk · contribs) 06:12, 15 December 2011 (UTC)

It's a different paradigm. fork()/exec() divides launching a new program into two steps, creating a new process and executing a program from the filesystem, each of which can also occur in isolation. Using fork() by itself is a "lightweight" way to launch a process, similar to creating a thread (e.g., a web server can launch several identical processes/threads). (The lack of fork() on some platforms is a prime reason for the popularity of threads.) This means that the sentence

This is probably of limited use, so the child usually transfers to another program using the system call exec()

should probably be modified. 87.222.26.125 (talk) 10:54, 17 December 2011 (UTC)
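For concreteness, the two-step launch looks like this in C. This is a minimal sketch with most error handling omitted; ls -l is just an arbitrary example program.

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();                       /* step 1: create a new process */
    if (pid == -1) {
        perror("fork");
        exit(1);
    }
    if (pid == 0) {
        /* child: arbitrary code can run here before the exec */
        execlp("ls", "ls", "-l", (char *)NULL);  /* step 2: replace this process with a program */
        perror("execlp");                     /* reached only if the exec failed */
        _exit(127);
    }
    waitpid(pid, NULL, 0);                    /* parent: wait for the child to finish */
    return 0;
}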
The reason it's better is that each system call only does one thing and does it well: all fork does is create a process, all exec does is execute a program.
Programs doing one thing and doing it well isn't necessarily better: it's only better if the operating system also provides some way to combine those programs together. Unix does: there's | for communication, && and || for conditional execution, and ; for unconditional execution.
Syscalls doing one thing and doing it well isn't necessarily better, either: it's only better if you have some way to control what order those syscalls are made in. A Turing-complete language such as C allows exactly this.
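That control matters in practice: a shell can only implement | because it gets to run plumbing code in the child between the fork and the exec. A rough sketch of how a shell might wire up ls | wc:

#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int fds[2];
    pipe(fds);                        /* fds[0] = read end, fds[1] = write end */

    if (fork() == 0) {                /* first child: ls */
        dup2(fds[1], 1);              /* its stdout now feeds the pipe */
        close(fds[0]);
        close(fds[1]);
        execlp("ls", "ls", (char *)NULL);
        _exit(127);                   /* exec failed */
    }
    if (fork() == 0) {                /* second child: wc */
        dup2(fds[0], 0);              /* its stdin now drains the pipe */
        close(fds[0]);
        close(fds[1]);
        execlp("wc", "wc", (char *)NULL);
        _exit(127);
    }
    close(fds[0]);                    /* the parent must close both ends, */
    close(fds[1]);                    /* or wc would never see end-of-file */
    while (wait(NULL) > 0)            /* reap both children */
        ;
    return 0;
}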
Consider what happens in Unix when you use & to background a process. First consider what happens when you don't: the forker (the shell) calls wait() to wait for the forkee to terminate, then prints a prompt and read()s the next command line to be typed.
When you do use &, it's the exact same procedure, except that the shell never calls wait(). Prior to the invention of fork, operating systems had an entire subsystem dedicated to running jobs in the background. In Unix it's just a handful of lines of code. So which is better?
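A toy version of that loop, to make the point concrete: one command name per line, no argument parsing, and a trailing & marks a background job. Backgrounding really is nothing more than skipping the wait().

#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    char line[256];
    for (;;) {
        fputs("$ ", stdout);                      /* print a prompt */
        fflush(stdout);
        if (!fgets(line, sizeof line, stdin))     /* read a command line */
            break;
        size_t n = strcspn(line, "\n");
        line[n] = '\0';
        int background = 0;
        if (n > 0 && line[n - 1] == '&') {        /* trailing & requests background */
            background = 1;
            line[--n] = '\0';
        }
        while (n > 0 && line[n - 1] == ' ')       /* trim trailing blanks */
            line[--n] = '\0';
        if (n == 0)
            continue;
        if (fork() == 0) {
            execlp(line, line, (char *)NULL);
            _exit(127);
        }
        if (!background)
            wait(NULL);  /* foreground: wait before prompting again */
        /* background: simply don't wait - that is the entire "job subsystem" */
    }
    return 0;
}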
Another example: consider setting an environment variable for all processes from that point onward:
export x=123
awk
troff
ssh
versus setting it just for one awk instance:
x=123 awk
troff
ssh
In the first example, the setenv() call (a C library function that edits the process's own environment block) is made by the shell in its own process, so every child forked afterwards inherits the variable. In the second example, it is made in the shell's child process, in which awk is to be run, so the shell itself is untouched. This is possible because we are able to control when these calls happen and what order they are made in, and *that* is possible because fork and exec are separate system calls. In a different OS, to set a variable for just one process, the OS itself would have to anticipate your intention and allow you to pass an argument to its spawn call to arrange this. The Unix way allows you to write whatever program you like, whether the OS's designer foresaw your need for that type of program or not: network servers, in the modern sense of the word, hadn't even been thought of when fork() was invented.
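In code, the difference is just which side of the fork makes the call. A sketch, using awk's ENVIRON array to show the variable arriving:

#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();
    if (pid == 0) {
        /* child: this affects only the process awk is about to run in;
           "export x=123" would be this same call made by the shell in its
           own process, before any fork */
        setenv("x", "123", 1);
        execlp("awk", "awk", "BEGIN { print ENVIRON[\"x\"] }", (char *)NULL);
        _exit(127);
    }
    waitpid(pid, NULL, 0);   /* the parent's environment is unchanged */
    return 0;
}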
Also note that when you type a command without any specific redirection (|, <, <<, >, >>, $(), etc.), the child process your program runs in simply inherits its standard streams from its parent process, the shell. This is a natural feature of fork. The shell doesn't have any code that says "oh, no redirections, he must want the command to read from the terminal"; it does nothing at all, and fork is what connects the command to the terminal.
The very first process that runs on a Unix box, init, opens /dev/tty for its standard streams, and all other processes, because they were started by forking off init, inherit that configuration for free. So relatively few programs need to contain the boilerplate code of setting up their own standard I/O.
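And when you do ask for redirection, the shell implements it with the same mechanism: it rearranges the child's descriptors between the fork and the exec, and the command never knows. A sketch of ls > out.txt:

#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    pid_t pid = fork();
    if (pid == 0) {
        /* with no redirection the child would exec straight away and ls
           would inherit the terminal; "> out.txt" just splices in: */
        int fd = open("out.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        dup2(fd, 1);          /* stdout now points at the file */
        close(fd);
        execlp("ls", "ls", (char *)NULL);  /* ls contains no redirection code */
        _exit(127);
    }
    waitpid(pid, NULL, 0);
    return 0;
}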
82.0.106.250 (talk) 17:43, 14 January 2014 (UTC)
`nice ffmpeg`: the shell forks itself and execs nice (the command), which calls nice() (the syscall) to raise the niceness of the process it's running in (being "nice" to the other users on the system), then execs ffmpeg in that same process, which therefore runs at the same niceness.
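A sketch of nice(1)'s core, assuming a fixed increment of 10 rather than parsing a -n option:

#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    if (argc < 2)
        return 2;
    nice(10);                    /* nice(2): raise this process's own niceness */
    execvp(argv[1], &argv[1]);   /* replace ourselves; the niceness survives the exec */
    perror("execvp");
    return 127;
}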
Stuart Morrow (talk) 21:57, 28 January 2014 (UTC)
You can write high-availability servers that upgrade to the latest version without dropping any connections. The server writes out to disk everything that the next incarnation needs to know, then execs the new version into the current process. This keeps all the file descriptors to the outside world open, because the FD table is a characteristic of the process, not of the program, and we're using the exact same process. Network connections are file descriptors; which request is paired with which descriptor is part of the state that's written out to disk.
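A rough sketch of the hand-off. The binary path and the --listen-fd flag are invented for illustration, and the state-saving step is elided:

#include <stdio.h>
#include <unistd.h>

/* called once the new binary is installed and all state is safely on disk */
void upgrade(int listen_fd) {
    char buf[16];
    snprintf(buf, sizeof buf, "%d", listen_fd);
    /* same process, same FD table: the listening socket stays open across
       the exec, as long as FD_CLOEXEC is not set on it */
    execl("/srv/app/server.new", "server", "--listen-fd", buf, (char *)NULL);
    perror("execl");   /* reached only on failure; the old version keeps serving */
}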
Stuart Morrow (talk) 13:59, 12 February 2014 (UTC)
So, long story short, they're two separate syscalls because there are plenty of reasons to use only one or the other, and sometimes both.
Stuart Morrow (talk) 21:45, 22 February 2014 (UTC)