User:Jgrahn/computer file draft

From Wikipedia, the free encyclopedia

File contents

(Draft section for the article Computer file.)

Although other paradigms have existed in the past, such as the record-oriented files of some mainframe operating systems, modern operating systems treat the contents of a file as a sequence of 8-bit bytes, or octets. Any additional structure is left to applications to define.

The definition of such a structure is called a file format, and thousands of them exist, defined with varying degrees of formality. Commonly used file formats include the text file, in which the octets represent printable characters in some character encoding interspersed with line-break characters; JPEG, for storing compressed raster images; ELF and PE, for storing executable programs; and gzip and ZIP, for storing compressed data.
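Because the operating system sees only octets, a program reading a file must interpret them itself, often recognizing the format from a few characteristic leading bytes (a "magic number"). The following minimal C sketch checks only the well-known signatures of the binary formats named above; the function name guess_format is hypothetical, and text files, which have no signature, fall through to the default.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: guess a file format from the first bytes of its
   contents. Real identifiers (such as the Unix file(1) utility) know far
   more formats and inspect more than the first few bytes. */
const char *guess_format(const unsigned char *buf, size_t len)
{
    /* "\x7f" "ELF" is split so that 'E' and 'F' are not read as hex digits. */
    if (len >= 4 && memcmp(buf, "\x7f" "ELF", 4) == 0)
        return "ELF executable";
    if (len >= 3 && buf[0] == 0xFF && buf[1] == 0xD8 && buf[2] == 0xFF)
        return "JPEG image";
    if (len >= 2 && buf[0] == 0x1F && buf[1] == 0x8B)
        return "gzip compressed data";
    if (len >= 4 && memcmp(buf, "PK\x03\x04", 4) == 0)
        return "ZIP archive";
    if (len >= 2 && memcmp(buf, "MZ", 2) == 0)
        return "DOS/PE executable"; /* PE files begin with an "MZ" DOS stub */
    return "unknown (perhaps plain text)";
}
```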

Since almost all persistent storage in a modern computer consists of local disks (hard disk drives or solid-state drives) or network file systems, almost all persistent data is arranged into files or sets of files. This includes the operating system itself, applications (executable programs and their internal data), and users' own data, such as word-processor documents, electronic mail and images.

Programs frequently create temporary files as part of their processing, either to conserve primary memory or to pass data between subprocesses. Typically, a specific part of the file system is reserved for this purpose, such as the /tmp directory on Unix-like systems. In some operating systems, this temporary area is cleared at reboot.
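On POSIX systems, a program can create a uniquely named temporary file with mkstemp(). The minimal C sketch below assumes such a system with the conventional /tmp directory; the function name temp_file_roundtrip and the file-name prefix are hypothetical. Unlinking the file right after creating it is a common idiom: the data remains accessible through the open descriptor, and the storage is released automatically once the descriptor is closed.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Sketch: create a private temporary file, write `data` to it, read it back
   into `out`, and dispose of it. Returns the number of bytes read back,
   or -1 on error. */
ssize_t temp_file_roundtrip(const char *data, char *out, size_t outsize)
{
    char path[] = "/tmp/exampleXXXXXX"; /* mkstemp replaces the X's */
    int fd = mkstemp(path);             /* creates and opens a unique file */
    if (fd == -1)
        return -1;

    /* Remove the name at once; the file itself lives on until fd is closed. */
    unlink(path);

    size_t len = strlen(data);
    if (write(fd, data, len) != (ssize_t)len || lseek(fd, 0, SEEK_SET) == -1) {
        close(fd);
        return -1;
    }
    ssize_t n = read(fd, out, outsize); /* read back from the beginning */
    close(fd);                          /* the file's storage is freed here */
    return n;
}
```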

Interfaces to file contents

Programming languages typically define their own interfaces for accessing files, which are in turn built on the primitives provided by the operating system.
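In C, for example, the standard I/O library's buffered FILE streams are, on POSIX systems, layered over operating-system file descriptors. The minimal sketch below, assuming a POSIX system, obtains a descriptor with the low-level open() call and then wraps it with fdopen() so that it can be used with fprintf(); the function name write_line_via_stdio is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write `line` plus a newline to `path` through a
   stdio stream built on top of a POSIX file descriptor.
   Returns 0 on success, -1 on error. */
int write_line_via_stdio(const char *path, const char *line)
{
    /* Operating-system primitive: obtain a file descriptor. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    /* Language-level interface: wrap the descriptor in a buffered FILE. */
    FILE *fp = fdopen(fd, "w");
    if (fp == NULL) {
        close(fd);
        return -1;
    }

    /* fprintf() only fills the stdio buffer; fclose() flushes it to the
       descriptor and then closes the descriptor as well. */
    int ok = fprintf(fp, "%s\n", line) >= 0;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```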

Many of these primitives resemble those of the widely used POSIX API, in which the main operations on file contents are:

  • open(); opening a file for reading or writing; when writing, typically either appending to the end or truncating (erasing) any existing content
  • read(); reading a number of octets from the file
  • write(); writing a number of octets at the current position, either extending the file at its end or overwriting existing parts of it
  • lseek(); moving the current read/write position (the file offset)
  • close(); closing the file, i.e. ending access to it
  • mmap(); mapping a file into the program's address space so that it can be read or written as ordinary memory
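The following minimal C sketch, assuming a POSIX system, exercises the operations listed above: one function writes a file sequentially, moves the cursor back with lseek() and reads from the new position; another modifies an existing file in place through mmap(). Both function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Sketch: replace the contents of `path` with `text`, then use lseek() to
   read `n` bytes starting at `offset` into `out`. Returns bytes read or -1. */
ssize_t write_then_read_at(const char *path, const char *text,
                           off_t offset, char *out, size_t n)
{
    /* O_TRUNC erases existing content; O_APPEND would append instead. */
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd == -1)
        return -1;

    size_t len = strlen(text);
    ssize_t got = -1;
    if (write(fd, text, len) == (ssize_t)len && /* cursor is now at the end */
        lseek(fd, offset, SEEK_SET) != -1)      /* move the cursor back */
        got = read(fd, out, n);                 /* read from the new position */

    close(fd);                                  /* release the descriptor */
    return got;
}

/* Sketch: upper-case the first byte of an existing, non-empty file by
   mapping it into memory; the store through the pointer updates the file. */
int capitalize_first_byte(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;

    off_t size = lseek(fd, 0, SEEK_END);        /* file length */
    if (size <= 0) {
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, (size_t)size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                                  /* the mapping stays valid */
    if (p == MAP_FAILED)
        return -1;

    if (p[0] >= 'a' && p[0] <= 'z')
        p[0] = (char)(p[0] - 'a' + 'A');        /* an ordinary memory write */

    int rc = msync(p, (size_t)size, MS_SYNC);   /* flush the change to the file */
    munmap(p, (size_t)size);
    return rc;
}
```

For example, write_then_read_at(path, "hello, world", 7, buf, 5) leaves the five bytes "world" in buf.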

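This interface provides no way to insert data at the beginning or in the middle of a file; a program that needs to do so must rewrite everything after the insertion point, often by writing a complete new copy of the file.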
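Because the primitives can only overwrite or append, inserting bytes amounts to rewriting the tail of the file. A minimal sketch, assuming a POSIX system and a file small enough that its tail fits in memory; insert_bytes is a hypothetical helper, not a standard call.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: insert `len` bytes from `data` at byte `offset` of
   the file at `path` by saving the tail in memory and writing it back after
   the new data. A short read() or write() is treated as an error rather than
   retried. Returns 0 on success, -1 on error. */
int insert_bytes(const char *path, off_t offset, const void *data, size_t len)
{
    int fd = open(path, O_RDWR);
    if (fd == -1)
        return -1;

    off_t size = lseek(fd, 0, SEEK_END);
    if (size == -1 || offset > size) {
        close(fd);
        return -1;
    }

    size_t tail_len = (size_t)(size - offset);
    char *tail = malloc(tail_len ? tail_len : 1);
    int rc = -1;
    if (tail != NULL &&
        /* 1. Save everything from `offset` to the end of the file. */
        lseek(fd, offset, SEEK_SET) != -1 &&
        read(fd, tail, tail_len) == (ssize_t)tail_len &&
        /* 2. Overwrite from `offset` with the new data... */
        lseek(fd, offset, SEEK_SET) != -1 &&
        write(fd, data, len) == (ssize_t)len &&
        /* 3. ...then the saved tail; the file grows by `len` bytes. */
        write(fd, tail, tail_len) == (ssize_t)tail_len)
        rc = 0;

    free(tail);
    close(fd);
    return rc;
}
```

For a file containing abcdef, inserting "XYZ" at offset 3 yields abcXYZdef. For large files, the same idea would be applied in chunks, or the program would write a new file and then have it replace the original.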

In practice, files are most commonly written with a simple series of write() calls that build the file from beginning to end, and they are likewise usually read linearly from beginning to end. This sequential pattern is useful because it works equally well on octet streams that are not files, such as terminals, pipes and sockets, where operations like lseek() are meaningless.
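The sequential style can be sketched in C as a copy loop that uses only read() and write(), together with a check showing that lseek() fails on a pipe. This assumes a POSIX system, where lseek() on a pipe, FIFO or socket fails with the error ESPIPE; the function names copy_stream and is_unseekable are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <unistd.h>

/* Sketch: copy everything from descriptor `in` to descriptor `out` using
   only sequential read() and write() calls. Because it never seeks, the same
   loop works for files, pipes, sockets and terminals.
   Returns the number of bytes copied, or -1 on error. */
long copy_stream(int in, int out)
{
    char buf[4096];
    long total = 0;
    for (;;) {
        ssize_t n = read(in, buf, sizeof buf);
        if (n == 0)
            return total;             /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;             /* interrupted by a signal; retry */
            return -1;
        }
        /* write() may accept fewer bytes than offered, so loop until done. */
        for (ssize_t done = 0; done < n; ) {
            ssize_t w = write(out, buf + done, (size_t)(n - done));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            done += w;
        }
        total += n;
    }
}

/* Returns 1 if seeking is meaningless on `fd` (lseek() fails with ESPIPE,
   as it does for a pipe, FIFO or socket), 0 otherwise. */
int is_unseekable(int fd)
{
    return lseek(fd, 0, SEEK_CUR) == -1 && errno == ESPIPE;
}
```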

Special files

Most operating systems have named objects that appear to programs as files but serve some special purpose. Among mainstream operating systems, Unix makes particularly heavy use of this feature, through /dev/null (which reads as empty and discards anything written to it), /dev/tty (the process's controlling terminal), the proc filesystem (in which operating-system information is available as text files), named pipes, character devices and block devices. Often the only operations these special files support are reading and writing. Frequently, they also expose additional, device-specific features through functions such as ioctl().
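A minimal sketch, assuming a Unix-like system, of two of these behaviours: /dev/null discarding writes and reporting end-of-file on reads, and an ioctl() request for a device-specific feature. The request used here, TIOCGWINSZ (query a terminal's window size), is one illustrative choice; it is widely available on Unix-like systems but is not part of POSIX. The function names are hypothetical.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Sketch: returns 1 if /dev/null accepts a write (discarding the data)
   and then reports end-of-file on read, 0 otherwise. */
int dev_null_behaves(void)
{
    int fd = open("/dev/null", O_RDWR);
    if (fd == -1)
        return 0;
    char buf[16];
    int ok = write(fd, "discarded", 9) == 9    /* the write "succeeds"... */
             && read(fd, buf, sizeof buf) == 0; /* ...but nothing comes back */
    close(fd);
    return ok;
}

/* Sketch: ask a terminal device for its width via ioctl(). Returns the
   number of columns, or -1 on failure (for example, when `fd` is not a
   terminal). */
int terminal_columns(int fd)
{
    struct winsize ws;
    if (ioctl(fd, TIOCGWINSZ, &ws) == -1)
        return -1;
    return ws.ws_col;
}
```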