Splice (system call): Difference between revisions
Undo the previous undo, it does have basic support for pipe -> user memory. It just does a copy, not a map. Older kernels did not, but that doesn't apply anymore.
Revision as of 10:51, 22 June 2008
splice() is a Linux system call that moves data between a file descriptor and a pipe, or between a pipe and user space. Where possible it does so without copying the data through a user-space buffer, in contrast to a conventional read()/write() loop, thereby improving I/O performance.
Workings
splice() works by using the pipe buffer. Its crucial service is moving data from one file descriptor to another without copying it between kernel space and user space. A conventional read()/write() loop incurs those copies, which help enforce system security and keep a simple and elegant interface for processes to read and write files. The key insight that splice() exploits is that a pipe buffer is effectively an in-kernel memory buffer that is opaque to the user-space process. A process can therefore splice the contents of a source file into the pipe buffer and then splice the contents of the pipe buffer into the destination file, and at no point is the data copied into the process's own memory.
Loading and unloading data into and from an opaque in-kernel buffer in this way gives a general method for moving data from file to file when the program performs no operations on the data and only moves it.
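The two-step movement described above can be sketched as follows. This is a minimal illustration, not the kernel's or any library's reference usage. It assumes a Linux system with glibc 2.5 or later, which declares splice() in <fcntl.h> when _GNU_SOURCE is defined. The helper name splice_copy and its return convention are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* assumption: glibc exposes splice() under this macro */
#endif
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: move up to len bytes from fd_in to fd_out through an
 * intermediate pipe. Returns the number of bytes moved, or -1 with errno set. */
ssize_t splice_copy(int fd_in, int fd_out, size_t len)
{
    int p[2];
    ssize_t total = 0;
    int saved = 0;

    if (pipe(p) < 0)
        return -1;

    while (len > 0) {
        /* Step 1: source -> pipe buffer. The data stays in kernel memory. */
        ssize_t n = splice(fd_in, NULL, p[1], NULL, len, SPLICE_F_MOVE);
        if (n < 0) {
            saved = errno;
            break;
        }
        if (n == 0)             /* end of input */
            break;
        len -= (size_t)n;

        /* Step 2: pipe buffer -> destination. Drain fully before refilling,
         * so the pipe never fills up and stalls step 1. */
        while (n > 0) {
            ssize_t m = splice(p[0], NULL, fd_out, NULL, (size_t)n, SPLICE_F_MOVE);
            if (m <= 0) {
                saved = (m < 0) ? errno : EIO;
                goto done;
            }
            n -= m;
            total += m;
        }
    }
done:
    close(p[0]);
    close(p[1]);
    if (saved) {
        errno = saved;
        return -1;
    }
    return total;
}
```

The intermediate pipe is needed because splice() requires at least one of its two descriptors to be a pipe. Passing NULL for both offsets makes each call use and advance the descriptor's current file position.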
This is an article where Linus Torvalds explicitly describes splice(): [1]
Origins
The Linux splice implementation borrows some ideas from an original proposal by Larry McVoy in 1998 [2]. The splice system calls first appeared in Linux kernel version 2.6.17 and were written by Jens Axboe.
Prototype
Documentation on splice() is currently scarce. The following prototype works in Linux 2.6.19.1:
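/* Our call to splice (no header currently). */
#define _GNU_SOURCE             /* for syscall() and loff_t */
#include <sys/types.h>
#include <sys/syscall.h>        /* __NR_splice, where the kernel headers define it */
#include <unistd.h>

static inline int splice(int fdin, loff_t *off_in, int fdout, loff_t *off_out,
                         size_t len, unsigned int flags)
{
    return syscall(__NR_splice, fdin, off_in, fdout, off_out, len, flags);
}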
Some constants that are of interest are:
/* Splice flags (not laid down in stone yet). */
#ifndef SPLICE_F_MOVE
#define SPLICE_F_MOVE 0x01
#endif
#ifndef SPLICE_F_NONBLOCK
#define SPLICE_F_NONBLOCK 0x02
#endif
#ifndef SPLICE_F_MORE
#define SPLICE_F_MORE 0x04
#endif
#ifndef SPLICE_F_GIFT
#define SPLICE_F_GIFT 0x08
#endif
#ifndef __NR_splice
#define __NR_splice 313 /* i386 syscall number; other architectures differ (x86-64 uses 275) */
#endif
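As an illustration of one of these flags, the sketch below probes SPLICE_F_NONBLOCK. It assumes glibc 2.5 or later, where the flags and splice() come from <fcntl.h> under _GNU_SOURCE instead of the hand-written definitions above. The helper name probe_nonblock is hypothetical. The source pipe is empty but its write end is still open, so a plain splice() would block. With the flag set, the call is expected to fail with EAGAIN instead.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* assumption: glibc exposes splice() under this macro */
#endif
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: try to splice from an empty pipe without blocking.
 * Returns the errno value reported by splice(), 0 if splice() did not fail,
 * or -1 if the pipes could not be created. */
int probe_nonblock(void)
{
    int src[2], dst[2];
    int result;
    ssize_t n;

    if (pipe(src) < 0)
        return -1;
    if (pipe(dst) < 0) {
        close(src[0]);
        close(src[1]);
        return -1;
    }

    /* Nothing has been written to src, and its write end is still open,
     * so there is no data and no end-of-file to report. */
    n = splice(src[0], NULL, dst[1], NULL, 4096, SPLICE_F_NONBLOCK);
    result = (n < 0) ? errno : 0;

    close(src[0]);
    close(src[1]);
    close(dst[0]);
    close(dst[1]);
    return result;
}
```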
Example
This is an example of splice in action:
/* Transfer from disk to a log. */
int log_blocks (struct log_handle * handle, int fd, loff_t offset, size_t size)
{
    int filedes [2];
    int ret;
    int err = 0;    /* errno saved before close () can overwrite it */
    size_t to_write = size;

    ret = pipe (filedes);
    if (ret < 0) {
        err = errno;
        goto out;
    }
    while (to_write > 0) {
        size_t in_pipe;

        /* splice the file into the pipe (data in kernel memory). */
        ret = splice (fd, &offset, filedes [1], NULL, to_write,
                      SPLICE_F_MORE | SPLICE_F_MOVE);
        if (ret <= 0) {
            /* 0 means end of file before size bytes were read;
               it is treated as an I/O error here. */
            err = (ret < 0) ? errno : EIO;
            goto pipe;
        }
        in_pipe = ret;
        to_write -= ret;
        /* splice the data in the pipe (in kernel memory) into the file.
           Draining each chunk before reading more keeps the first
           splice from blocking once the pipe is full. */
        while (in_pipe > 0) {
            ret = splice (filedes [0], NULL, handle->fd,
                          &(handle->fd_offset), in_pipe,
                          SPLICE_F_MORE | SPLICE_F_MOVE);
            if (ret <= 0) {
                err = (ret < 0) ? errno : EIO;
                goto pipe;
            }
            in_pipe -= ret;
        }
    }
pipe:
    close (filedes [0]);
    close (filedes [1]);
out:
    return -err;
}
Complementary system calls
sys_splice() is one of three system calls that make up the splice() architecture. sys_vmsplice() transfers data between a pipe and user memory, where sys_splice() transfers between a file descriptor and a pipe. It can map an application data area into a pipe, while in the other direction (pipe to user memory) it copies the data rather than mapping it. sys_tee() is the last part of the trilogy. It duplicates one pipe to another, enabling forks in the way applications are connected with pipes.
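The two complementary calls can be sketched together. The example below is illustrative only and assumes glibc 2.5 or later, which declares vmsplice() and tee() in <fcntl.h> under _GNU_SOURCE. The helper name vmsplice_and_tee is hypothetical. It feeds a user buffer into one pipe with vmsplice(), duplicates that pipe's contents into a second pipe with tee(), and then reads both copies back.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* assumption: glibc exposes vmsplice()/tee() under this macro */
#endif
#include <fcntl.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: push len bytes of msg into pipe a with vmsplice(),
 * duplicate them into pipe b with tee(), then read both copies back.
 * out_a and out_b must each hold len bytes; len is assumed to be small
 * enough to fit in a pipe. Returns 0 on success, -1 on failure. */
int vmsplice_and_tee(const char *msg, size_t len, char *out_a, char *out_b)
{
    int a[2], b[2];
    int rc = -1;
    struct iovec iov = { .iov_base = (void *)msg, .iov_len = len };

    if (pipe(a) < 0)
        return -1;
    if (pipe(b) < 0) {
        close(a[0]);
        close(a[1]);
        return -1;
    }

    /* User memory -> pipe a. */
    if (vmsplice(a[1], &iov, 1, 0) != (ssize_t)len)
        goto done;
    /* Pipe a -> pipe b. tee() does not consume the data in pipe a. */
    if (tee(a[0], b[1], len, 0) != (ssize_t)len)
        goto done;
    /* Both pipes now hold the message, so each can be read independently. */
    if (read(a[0], out_a, len) != (ssize_t)len)
        goto done;
    if (read(b[0], out_b, len) != (ssize_t)len)
        goto done;
    rc = 0;
done:
    close(a[0]);
    close(a[1]);
    close(b[0]);
    close(b[1]);
    return rc;
}
```

Because tee() leaves the source pipe intact, one consumer can keep reading from the first pipe while a second consumer reads the duplicate, which is the fork in the pipeline described above.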