Shebang (Unix)

From Wikipedia, the free encyclopedia

In computing, a shebang (also called a sha-bang[1][2][3], hashbang[4][5], pound-bang[6][7], hash-exclam[8], or hash-pling[9][10]) is the character sequence consisting of the characters number sign and exclamation point (#!), when it occurs as the first two characters on the first line of a text file. In this case, the program loader in Unix-like operating systems parses the rest of the first line as an interpreter directive and invokes the program specified after the character sequence with any command line options specified as parameters. The name of the file being executed is passed as the final argument.[11]

For example, a file starting with the line:

#!/bin/sh

invokes the Bourne shell or a compatible shell. This is the standard starting line of a shell script.

The shebang line is usually ignored by the interpreter itself, because the # character is a comment marker in many scripting languages. Some language interpreters that do not use the hash mark to begin comments, such as Scheme, may still ignore the shebang line.[12]

A shebang or hashbang fragment identifier is a URL fragment identifier that starts with an exclamation mark, so that the combination (...url#!fragment_id...) in fact contains a shebang. Such identifiers have been suggested for representing state in Ajax applications. Google Webmaster Central specifies that fragment identifiers starting with an exclamation mark (...url#!state...) are indexed specially by the Googlebot.

Syntax

The syntax of the feature consists of the character sequence #!, i.e. the number sign and an exclamation point character. This initial sequence may be followed by whitespace, and then by the (absolute) path to the interpreter program that will provide the interpretation. The shebang is looked for and used when a script is invoked directly (as with a regular executable); its main purpose is to make scripts look and act like regular executables, both to the operating system and to the user.
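As an illustration only, the rule described above can be sketched in Python. The function name interpreter_path is hypothetical, and real kernels differ in details such as length limits and the whitespace they accept, so this is a simplified model rather than any system's actual implementation.

```python
import os
import re
from typing import Optional


def interpreter_path(first_line: bytes) -> Optional[str]:
    """Return the interpreter named by a shebang line, or None if there is none.

    Hypothetical helper: a simplified model of the syntax described in the text.
    """
    # The two magic bytes must be the very first bytes of the line.
    if not first_line.startswith(b"#!"):
        return None
    # Drop the magic bytes and the line feed, then any whitespace before the path.
    # A trailing carriage return, if present, is deliberately left in place.
    rest = first_line[2:].rstrip(b"\n").lstrip(b" \t")
    if not rest:
        return None
    # The interpreter path ends at the first space or tab; options may follow it.
    path = re.split(rb"[ \t]+", rest, maxsplit=1)[0]
    return os.fsdecode(path)
```

A caller that wants to enforce the absolute-path convention could additionally check the result with os.path.isabs.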

Etymology and name history

The name shebang comes from an inexact contraction of SHArp bang or haSH bang, referring to the two typical Unix names of the two characters. Unix jargon uses sharp or hash (and sometimes even mesh) to refer to the number sign character and bang to refer to the exclamation point, hence shebang. Another theory is that the sh in shebang comes from the default shell, sh, which is usually invoked with a shebang.[13]

The initial two characters ("#!") of the interpreter directive have a range of jargon names. One, "shebang",[14] is representative (with an American bias) but far from universal. An executable file starting with an interpreter directive is simply called a script, often prefaced with the name or general classification of the intended interpreter.

When asked what he would call the feature (i.e. "What do you personally call that first line?"), Dennis Ritchie answered:

From: "Ritchie, Dennis M (Dennis)** CTR **" <dmr@[redacted]>
To: <[redacted]@talisman.org>
Date: Thu, 19 Nov 2009 18:37:37 -0600
Subject: RE: What do -you- call your #!<something> line?

 I can't recall that we ever gave it a proper name.
It was pretty late that it went in--I think that I
got the idea from someone at one of the UCB conferences
on Berkeley Unix; I may have been one of the first to
actually install it, but it was an idea that I got
from elsewhere.

As for the name: probably something descriptive like
"hash-bang" though this has a specifically British flavor, but
in any event I don't recall particularly using a pet name
for the construction.

   Regards,
   Dennis

History

The shebang was introduced by Dennis Ritchie between Edition 7 and Edition 8 at Bell Laboratories. It was also added to the BSD releases from Berkeley's Computer Science Research Group (present in 2.8BSD[15] and activated by default by 4.2BSD).[16] As AT&T Bell Laboratories Edition 8 Unix, and later editions, were not released to the public, the first widely known appearance of this feature was on BSD.

The lack of an interpreter directive, but support for shell scripts, is apparent in the documentation from Version 7 Unix in 1979,[17] which instead describes a facility of the Bourne shell whereby files with execute permission were handled specially by the shell, which would (sometimes depending on initial characters in the script, such as ":" or "#") spawn a subshell to interpret and run the commands contained in the file. In this model, scripts behaved like other commands only if called from within a Bourne shell. An attempt to execute such a file directly via the operating system's own exec() system call would fail, preventing scripts from behaving uniformly as normal system commands.

In later versions of Unix-like systems, this inconsistency was removed. Dennis Ritchie introduced kernel support for interpreter directives in January 1980, for Version 8 Unix, with the following description:[18][19]

From uucp Thu Jan 10 01:37:58 1980
>From dmr Thu Jan 10 04:25:49 1980 remote from research
The system has been changed so that if a file being executed
begins with the magic characters #! , the rest of the line is understood
to be the name of an interpreter for the executed file.
Previously (and in fact still) the shell did much of this job;
it automatically executed itself on a text file with executable mode
when the text file's name was typed as a command.
Putting the facility into the system gives the following
benefits.

1) It makes shell scripts more like real executable files,
because they can be the subject of 'exec.'

2) If you do a 'ps' while such a command is running, its real
name appears instead of 'sh'.
Likewise, accounting is done on the basis of the real name.

3) Shell scripts can be set-user-ID.

4) It is simpler to have alternate shells available;
e.g. if you like the Berkeley csh there is no question about
which shell is to interpret a file.

5) It will allow other interpreters to fit in more smoothly.

To take advantage of this wonderful opportunity,
put

        #! /bin/sh

at the left margin of the first line of your shell scripts.
Blanks after ! are OK.  Use a complete pathname (no search is done).
At the moment the whole line is restricted to 16 characters but
this limit will be raised.

Kernel support for interpreter directives spread to other versions of Unix, and one modern implementation can be seen in the Linux kernel source in fs/binfmt_script.c.[20]

This mechanism allows scripts to be used in virtually any context normal compiled programs can be, including as full system programs, and even as interpreters of other scripts. As a caveat, though, some early versions of kernel support limited the length of the interpreter directive to roughly 32 characters (just 16 in its first implementation), would fail to split the interpreter name from any parameters in the directive, or had other quirks. Additionally, some modern systems allow the entire mechanism to be constrained or disabled for security purposes (for example, set-user-id support has been disabled for scripts on many systems).

Note that, even in systems with full kernel support for the #! magic number, some scripts lacking interpreter directives (although usually still requiring execute permission) are still runnable by virtue of the legacy script handling of the Bourne shell, still present in many of its modern descendants.

Examples

Some typical shebang lines:

  • #!/bin/sh — Execute the file using sh, the Bourne shell, or a compatible shell
  • #!/bin/csh — Execute the file using csh, the C shell, or a compatible shell
  • #!/usr/bin/perl -T — Execute using Perl with the option for taint checks
  • #!/usr/bin/ruby — Execute using Ruby
  • #!/usr/bin/python -O — Execute using Python with optimizations to code
  • #!/usr/bin/php — Execute the file using the PHP command line interpreter

On many systems, /bin/sh is a symbolic or hard link to bash. When invoked in this manner, bash enters POSIX mode, which changes some of its behavior to conform to the standard.[21]

Shebang lines may include specific options that are passed to the interpreter (see the Perl example above). However, implementations vary in the parsing behavior of options.

Purpose

Interpreter directives allow scripts and data files to be used as system commands, hiding the details of their implementation from users and other programs, by removing the need to prefix scripts with their interpreter on the command line.

Hence, assuming a Bourne shell script in /usr/local/bin/foo with a first line of

#!/bin/sh -x

...which is run from the command line with the following (where "$" is just one possible prompt):

$ foo bar

...the result would be actual command execution equivalent (except for argv[0] being set to the filename) to:

$ /bin/sh -x /usr/local/bin/foo bar

Since sh reads commands from a filename provided on its command line (instead of from the user, as it would normally), the end result is that all the shell commands in /usr/local/bin/foo are run automatically, with bar provided as a parameter, $1, to those commands to use as they see fit.
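The rewriting of the command line described above can be sketched as follows. This is a simplified model in Python, not kernel code; the function name effective_argv and its parameters are hypothetical, and it assumes a system that passes at most one option from the shebang line.

```python
from typing import List, Optional


def effective_argv(interpreter: str, option: Optional[str],
                   script: str, args: List[str]) -> List[str]:
    """Model the argument list the interpreter receives for a shebang script.

    Hypothetical helper: interpreter and option come from the shebang line,
    script is the path of the file being executed, args are the user's arguments.
    """
    argv = [interpreter]
    if option is not None:
        # The option from the shebang line comes before the script path.
        argv.append(option)
    # The script's own path is inserted so the interpreter knows what to read.
    argv.append(script)
    # The arguments typed by the user follow unchanged.
    argv.extend(args)
    return argv


if __name__ == "__main__":
    # The "foo bar" example from the text, with "#!/bin/sh -x" as the first line.
    print(effective_argv("/bin/sh", "-x", "/usr/local/bin/foo", ["bar"]))
```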

Since the initial number sign is also the character introducing comments in the Bourne shell and many other interpreters, that interpreter directive itself is considered by the interpreter to be merely a comment, and skipped. However, it is up to the interpreter to ignore the shebang line; thus a script consisting of the following two lines:

#!/bin/cat
Hello world!

will echo both lines to standard output.
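The behaviour can be observed with a short Python sketch that writes the two-line script to a temporary file and executes it directly. It assumes a Unix-like system where /bin/cat exists and where the temporary directory permits execution; the function name run_cat_script and the file name hello are hypothetical.

```python
import os
import subprocess
import tempfile


def run_cat_script() -> str:
    """Create the two-line script from the text, run it directly, return its output."""
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "hello")  # hypothetical file name
        with open(script, "w", newline="\n") as handle:
            handle.write("#!/bin/cat\nHello world!\n")
        # Direct invocation requires the execute permission bit.
        os.chmod(script, 0o755)
        # No interpreter is named here: the operating system reads the shebang
        # line and runs /bin/cat with the script's path as its argument.
        result = subprocess.run([script], capture_output=True, text=True, check=True)
        return result.stdout


if __name__ == "__main__":
    print(run_cat_script(), end="")
```

Because cat does not treat # as a comment marker, the shebang line itself is part of the output.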

Portability

Shebangs must specify absolute paths to system executables; this can cause problems on systems that have a non-standard file system layout. Even when systems have fairly standard paths, it is quite possible for variants of the same operating system to have different locations for the desired interpreter. Python, for example, might be in /usr/bin/python, /usr/local/bin/python, or even something like /home/username/bin/python if being tested by a user.

As a result, the shebang line often needs to be edited after a script is copied from one computer to another, because the path coded into the script may not apply on the new machine, depending on how consistently the interpreter has been placed by convention. For this and other reasons, POSIX does not standardize the feature.

Often, the program /usr/bin/env can be used to circumvent this limitation by introducing a level of indirection. #! is followed by /usr/bin/env, followed by the desired command without full path, as in this example:

#!/usr/bin/env sh

This mostly works because the path /usr/bin/env is commonly used for the utility, and env invokes the first sh found in the user's $PATH, typically /bin/sh, if the user's path is correctly configured.

This approach may introduce vulnerabilities that expose information or allow unauthorized root access, and it does not guarantee complete portability.[22] For example, OpenServer 5.0.6 and Unicos 9.0.2 have only /bin/env and no /usr/bin/env.[23]

Another portability problem is the interpretation of the command arguments. Some systems, including Linux, do not split up the arguments.[24] Consider, for example, a script whose first line is:

#!/usr/bin/env python -c

Here, python -c is passed to /usr/bin/env as one argument rather than two. Cygwin also behaves this way. Some other systems handle the arguments differently.
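The two behaviours can be contrasted with a small Python sketch. The function name directive_argv and the split_arguments switch are hypothetical; the sketch models only the difference described above and makes no claim about which further systems fall into which group.

```python
from typing import List


def directive_argv(first_line: str, split_arguments: bool) -> List[str]:
    """Return the interpreter and its arguments as taken from a shebang line.

    Hypothetical helper. With split_arguments=False, everything after the
    interpreter path is kept as a single argument; with True, it is split
    on whitespace.
    """
    rest = first_line[2:].strip()  # drop the leading "#!"
    interpreter, _, tail = rest.partition(" ")
    tail = tail.strip()
    if not tail:
        return [interpreter]
    if split_arguments:
        return [interpreter] + tail.split()
    return [interpreter, tail]


if __name__ == "__main__":
    line = "#!/usr/bin/env python -c"
    print(directive_argv(line, split_arguments=False))
    print(directive_argv(line, split_arguments=True))
```

In the unsplit case, env is asked to find a program literally named "python -c", which normally does not exist.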

Another common problem is scripts containing a carriage return character immediately after the shebang, perhaps as a result of being edited on a system that uses DOS line breaks, such as Microsoft Windows. Some systems interpret the carriage return character as part of the interpreter command, resulting in an error message.
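A script can be checked for this problem before it is deployed. The following Python sketch, in which the function name shebang_has_carriage_return is hypothetical, reports whether the shebang line ends with a carriage return.

```python
def shebang_has_carriage_return(data: bytes) -> bool:
    """Report whether a script's shebang line ends with a carriage return.

    Hypothetical helper: data is the raw content of the script file.
    """
    # The first line is everything up to, but excluding, the first line feed.
    first_line = data.split(b"\n", 1)[0]
    return first_line.startswith(b"#!") and first_line.endswith(b"\r")
```

On an affected system, the interpreter looked up for such a script would be, for example, "/bin/sh" followed by a carriage return character, which does not exist.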

POSIX requires that sh is a shell capable of a syntax similar to the Bourne shell, although it does not require it to be located at /bin/sh; for example, some systems such as Solaris have the POSIX-compatible shell at /usr/xpg4/bin/sh.[25] In many Linux systems and recent releases of Mac OS X, /bin/sh is a hard or symbolic link to /bin/bash, the Bourne Again shell.

Using syntax specific to bash while maintaining a shebang pointing to the Bourne shell is not portable.[26]

Magic number

The shebang is actually a human-readable instance of a magic number in the executable file, the magic byte string being 0x23 0x21, the two-character encoding in ASCII. (Executable files that do not require an interpreter program start with other magic combinations. See File format for more details of magic numbers.)

Nonetheless, interpreted text files using the shebang are still text files, not binary files; a text editor that introduces superfluous leading bytes will break the construction, as the file would no longer start with 0x23 0x21. In particular, UTF-8, the standard character encoding for text files on many Unix-like systems, is ASCII-compatible, assigning all characters in the ASCII character set to the same one-byte codes; but UTF-8 files on Windows usually begin with a three-byte byte order mark (0xEF 0xBB 0xBF). These bytes change the magic number and thus the interpreter will not be run (unless this other magic number is also recognized). For this and other reasons, use of the byte order mark is strongly discouraged on POSIX (Unix-like) systems.[27][28] A byte order mark is unneeded for UTF-8 (as opposed to UTF-16) since UTF-8 can reliably be recognised as such by a simple algorithm.
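The byte-level check can be illustrated in Python. This is a minimal sketch; the function name classify_start and the labels it returns are hypothetical.

```python
SHEBANG_MAGIC = bytes([0x23, 0x21])  # the ASCII encoding of "#!"
UTF8_BOM = bytes([0xEF, 0xBB, 0xBF])  # UTF-8 byte order mark


def classify_start(data: bytes) -> str:
    """Classify the first bytes of a file (hypothetical helper and labels)."""
    if data.startswith(SHEBANG_MAGIC):
        return "shebang"
    if data.startswith(UTF8_BOM + SHEBANG_MAGIC):
        # The "#!" is present, but it is no longer at offset 0.
        return "bom-before-shebang"
    return "other"
```

A file in the second category looks correct in a text editor, yet its first two bytes are 0xEF 0xBB rather than 0x23 0x21.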

There have been rumors that some old versions of UNIX look for the normal shebang followed by a space and a slash ("#! /"), but this appears to be untrue.[29]

On Unix-like operating systems, new image files are started by the exec family of functions. This is where the operating system detects whether an image file is a script or an executable binary. The presence of the shebang results in the execution of the specified executable (usually a scripting language interpreter). This is described in the Solaris and Linux man pages for execve.

Security issues

On some systems, scripts can be marked with the setuid (set-user-ID) attribute, a Unix feature which causes a program to be executed with the access rights of the program file's owner instead of the rights of the user running it. Although this mechanism may be safe for compiled code, the extra step introduced by the interpreter directive provides an extra window of opportunity for attack[30] along the following lines:

  1. An attacker makes a symbolic link in, say, /tmp/sneaky to a system shell script with setuid enabled, say /usr/bin/admintool (a hypothetical example).
  2. The attacker then runs /tmp/sneaky, but pauses its execution immediately.
  3. If the new process had already gotten as far as opening sneaky, stop and start over, otherwise:
  4. The new process has already set its user ID to the owner of /usr/bin/admintool, so it's probably now running as root with full system rights (if not, start over)
  5. The attacker now removes the symbolic link pointing to /usr/bin/admintool
  6. The attacker creates a new script at /tmp/sneaky but with his own illicit commands therein
  7. The attacker now resumes the paused process, and the shell then opens sneaky and executes the illicit command file with root access rights.

This problem has been corrected on some modern systems, namely those supporting the /dev/fd filesystem, by opening the script first, producing a file descriptor which is safe from attack, and then invoking the interpreter with that safe file descriptor as input. However, the discovery of the problem led many system administrators and developers to the conclusion that scripts couldn't be made secure, a case made more compelling by issues with the shell's internal field separator (also since corrected on modern systems); as a result, setuid functionality is often made unavailable to scripts.

As a result of these issues, setuid scripts are unsafe on older Unix-like systems, which comprise the majority of such installations. Appropriate research into the security implications of setuid scripts is therefore necessary before permitting their use. The sudo command is a widely used alternative for providing similar functionality.

Strengths

When compared to the use of global association lists between command name extensions and the interpreting applications, the interpreter directive method allows users to use interpreters not known at a global system level, and without administrator rights. It also allows specific selection of interpreter, without overloading the filename extension namespace, and allows the implementation language of a script to be changed without changing its invocation syntax by other programs.

References

  1. ^ "Advanced Bash Scripting Guide". http://tldp.org/LDP/abs/html/sha-bang.html. Retrieved 2012-01-19. 
  2. ^ "The #! magic, details about the shebang/hash-bang mechanism". http://www.in-ulm.de/~mascheck/various/shebang/. Retrieved 2012-01-19. 
  3. ^ Cooper, Mendel (November 5, 2010). Advanced Bash Scripting Guide 5.3 Volume 1. lulu.com. p. 5. ISBN 978-1435752184. http://books.google.com/books?id=WPXkgFRd4OEC&lpg=PA5&dq=sha-bang&pg=PA5#v=onepage&q=sha-bang&f=false. 
  4. ^ MacDonald, Matthew (2011). HTML5: The Missing Manual. Sebastopol, California: O'Reilly Media. p. 373. ISBN 9781449302399. http://books.google.com/books?id=SR7HXy2XvBEC&pg=PA373. 
  5. ^ Lutz, Mark (September 2009). Learning Python (4th ed.). O'Reilly Media. p. 48. ISBN 9780596158064. http://books.google.com/books?id=1HxWGezDZcgC&pg=PA48. 
  6. ^ Lie Hetland, Magnus (October 4, 2005). Beginning Python: From Novice to Professional. Apress. p. 21. ISBN 978-1590595190. http://books.google.co.uk/books?id=S0l1YFpRFVAC&lpg=PA21&dq=pound%20bang&pg=PA21#v=onepage&q=pound%20bang&f=false. 
  7. ^ "The #! magic, details about the shebang/hash-bang mechanism". http://www.in-ulm.de/~mascheck/various/shebang/. Retrieved 2012-01-19. 
  8. ^ "The #! magic, details about the shebang/hash-bang mechanism". http://www.in-ulm.de/~mascheck/various/shebang/. Retrieved 2012-01-19. 
  9. ^ Schitka, John (December 24, 2002). Linux+ Guide to Linux Certification. Course Technology. p. 353. ISBN 978-0619130046. http://books.google.co.uk/books?id=l7JhL9rJLEgC&lpg=PA353&dq=hashpling&pg=PA353#v=onepage&q=hashpling&f=false. 
  10. ^ "The #! magic, details about the shebang/hash-bang mechanism". http://www.in-ulm.de/~mascheck/various/shebang/. Retrieved 2012-01-19. 
  11. ^ "execve(2) - Linux man page". http://linux.die.net/man/2/execve. Retrieved 2010-10-21. 
  12. ^ SRFI 22
  13. ^ "Jargon File entry for shebang". Catb.org. http://catb.org/jargon/html/S/shebang.html. Retrieved 2010-06-16. 
  14. ^ http://www.catb.org/~esr/jargon/html/S/shebang.html The Jargon File: shebang
  15. ^ http://www.mckusick.com/csrg CSRG Archive CD-ROMs
  16. ^ "extracts from 4.0BSD /usr/src/sys/newsys/sys1.c". In-ulm.de. http://www.in-ulm.de/~mascheck/various/shebang/sys1.c.html. Retrieved 2010-06-16. 
  17. ^ http://cm.bell-labs.com/7thEdMan/v7vol2a.pdf UNIX TIME-SHARING SYSTEM: UNIX PROGRAMMER’S MANUAL Seventh Edition, Volume 2A, January, 1979
  18. ^ http://www.in-ulm.de/~mascheck/various/shebang/sys1.c.html The '#!' magic - details about the shebang mechanism on various Unix flavours
  19. ^ http://www.mckusick.com/csrg CSRG Archive CD-ROMs
  20. ^ http://www.linuxjournal.com/article/2568 Playing with Binary Formats, January 1998
  21. ^ GNU bash manual page: "If bash is invoked with the name sh, it tries to mimic the startup behavior of historical versions of sh as closely as possible, while conforming to the POSIX standard as well. [...] When invoked as sh, bash enters posix mode after the startup files are read."
  22. ^ "Secure Programs HowTo - Environment Variables". Dwheeler.com. 2003-03-03. http://www.dwheeler.com/secure-programs/Secure-Programs-HOWTO/environment-variables.html#ENV-VAR-SOLUTION. Retrieved 2010-11-18. 
  23. ^ "Details about '#!'". In-ulm.de. http://www.in-ulm.de/~mascheck/various/shebang/. 
  24. ^ "/usr/bin/env behaviour". Mail-index.netbsd.org. 2008-11-09. http://mail-index.netbsd.org/netbsd-users/2008/11/09/msg002388.html. Retrieved 2010-11-18. 
  25. ^ "The Open Group Base Specifications Issue 7". 2008. http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sh.html#tag_20_117_16. Retrieved 2010-04-05. 
  26. ^ pixelbeat.org: Common shell script mistakes "It's much better to test scripts directly in a POSIX compliant shell if possible. The `bash --posix` option doesn't suffice as it still accepts some 'bashisms'"
  27. ^ "FAQ - UTF-8, UTF-16, UTF-32 & BOM: Can a UTF-8 data stream contain the BOM character (in UTF-8 form)? If yes, then can I still assume the remaining UTF-8 bytes are in big-endian order?". http://unicode.org/faq/utf_bom.html#bom5. Retrieved 2009-01-04. 
  28. ^ Markus Kuhn (2007). "UTF-8 and Unicode FAQ for Unix/Linux: What different encodings are there?". http://www.cl.cam.ac.uk/~mgk25/unicode.html#ucsutf. Retrieved 20 January 2009. "Adding a UTF-8 signature at the start of a file would interfere with many established conventions such as the kernel looking for “#!” at the beginning of a plaintext executable to locate the appropriate interpreter." 
  29. ^ "32 bit shebang myth". In-ulm.de. http://www.in-ulm.de/~mascheck/various/shebang/#details. Retrieved 2010-06-16. 
  30. ^ http://docstore.mik.ua/orelly/other/puis3rd/0596003234_puis3-chp-6-sect-5.html
