Here document

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Nbarth at 13:37, 31 August 2013 (→ See also: remove tr, just used as eg, and docstring, which is covered at string literal). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

In computer science, a here document (here-document, heredoc, hereis, here-string or here-script) is a file literal or input stream literal: it is a section of a source code file that is treated as if it were a separate file. The term is also used for a form of multiline string literals that use similar syntax, preserving line breaks and other whitespace (including indentation) in the text.

Here documents originate in the Unix shell, and are found in sh, csh, ksh, Bash and zsh, among others. Here document-style string literals are found in various high-level languages, notably Perl (syntax inspired by Unix shell) and languages influenced by Perl, such as PHP and Ruby. Other high-level languages such as Python and Tcl have other facilities for multiline strings.
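For comparison, Python's closest equivalent is the triple-quoted string literal, which may span several lines and keeps line breaks and indentation as written. The sketch below pairs it with textwrap.dedent from the standard library to remove the common indentation, roughly what an indented here document needs. It illustrates the idea only and does not reproduce any particular language's here-document rules.

```python
import textwrap

# The backslash right after the opening quotes suppresses the first newline,
# so the string starts directly with the first line of text.
raw = """\
    one two three
    uno dos tres
"""

# textwrap.dedent removes the whitespace prefix shared by every line.
text = textwrap.dedent(raw)

print(text.upper(), end="")
```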

Some languages treat here documents, whether they are used as files or as strings, as format strings, allowing variable substitution and command substitution inside the literal.

The most common syntax for here documents, originating in Unix shells, is << followed by a delimiting identifier, followed, starting on the next line, by the text to be quoted, and then closed by the same identifier on its own line. This syntax reflects the fact that here documents are formally stream literals: the content of the document is redirected to stdin (standard input) of the preceding command. The here document syntax is by analogy with the syntax for input redirection, where < means "take input from the following file".

Other languages often use substantially similar syntax, but details of syntax and actual functionality can vary significantly. When used simply for string literals, the << does not indicate redirection, but is simply a starting delimiter convention. In some languages, such as Ruby, << is also used as an operator for writing to a stream, so writing a here document string literal to a file uses << twice, in different senses.

File literals

Narrowly speaking, here documents are file literals or stream literals.

Unix shells

Here documents are available in many Unix shells, though the DOS/Windows Command Shell does not have an equivalent to here documents.

In the following example, text is passed to the tr command using a here document. This could be in a shell file, or entered interactively at a prompt.

 tr a-z A-Z << END_TEXT
 one two three
 uno dos tres
 END_TEXT

This yields the output:

ONE TWO THREE
UNO DOS TRES

END_TEXT was used as the delimiting identifier. It specified the start and end of the here document. The redirect and the delimiting identifier do not need to be separated by a space: <<END_TEXT or << END_TEXT both work equally well.
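How the shell collects the text can be modelled roughly as follows: after the command line, each following line is taken verbatim until a line consisting solely of the delimiter is found, and the collected lines, each with its newline, become the command's standard input. This Python sketch is a simplified model rather than any shell's actual implementation; the function name read_heredoc is hypothetical, and expansion and the <<- form are ignored.

```python
def read_heredoc(lines, delimiter):
    """Split a here document off the lines that follow the command.

    `lines` holds the input lines after the command line, without their
    newline characters. Returns (body, number_of_lines_consumed).
    """
    body = []
    for count, line in enumerate(lines, start=1):
        if line == delimiter:  # the delimiter must make up the whole line
            return "".join(text + "\n" for text in body), count
        body.append(line)
    # Real shells usually just warn and use the rest of the input;
    # this sketch treats a missing delimiter as an error instead.
    raise ValueError(f"missing delimiter {delimiter!r}")


script = ["one two three", "uno dos tres", "END_TEXT", "echo next command"]
stdin_text, consumed = read_heredoc(script, "END_TEXT")

# tr a-z A-Z upper-cases ASCII letters; str.upper() does the same here.
print(stdin_text.upper(), end="")
```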

Appending a minus sign to the << (that is, <<-) causes leading tab characters to be stripped from each line of the here document, including the line containing the closing delimiter. This allows here documents in shell scripts to be indented without changing their value. (At an interactive prompt, a literal tab usually has to be entered by typing Ctrl-V followed by Tab. The example below shows the tabs as spaces, so it will not work if copied and pasted.)

 tr a-z A-Z <<- END_TEXT
         one two three
         uno dos tres
         END_TEXT

This yields the same output, notably not indented:

ONE TWO THREE
UNO DOS TRES
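The <<- rule can be modelled in Python, with tab characters written as \t so they survive copying. The helper name read_heredoc_dash is hypothetical; it strips leading tabs from every line, including the delimiter line, before comparing, and returns None if no delimiter line is found.

```python
def read_heredoc_dash(lines, delimiter):
    """Model of <<-: strip leading tabs from each line, then look for the delimiter.

    Returns the body text, or None if no delimiter line is found.
    """
    body = []
    for line in lines:
        line = line.lstrip("\t")  # tabs only; leading spaces are kept
        if line == delimiter:
            return "".join(text + "\n" for text in body)
        body.append(line)
    return None


tabbed = ["\t\tone two three", "\t\tuno dos tres", "\tEND_TEXT"]
print(read_heredoc_dash(tabbed, "END_TEXT").upper(), end="")

# Indented with spaces instead of tabs: nothing is stripped, so the
# "    END_TEXT" line never matches and the delimiter is not found.
spaced = ["    one two three", "    END_TEXT"]
print(read_heredoc_dash(spaced, "END_TEXT"))
```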

By default, variables are interpolated and commands in backticks are evaluated.

 cat << EOF
 Working dir $PWD
 EOF

yields:

Working dir /home/user

This can be disabled by quoting any part of the delimiter, for example by enclosing it in single or double quotes:

 cat << "EOF"
 Working dir $PWD
 EOF

yields:

Working dir $PWD
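The rule can be modelled as follows: if any character of the delimiter word is quoted, the body is passed through literally; otherwise the shell expands it. The Python sketch below models only variable expansion, using the standard string.Template class as a stand-in for the shell's $NAME syntax. Command substitution is left out, and the function name heredoc_text is hypothetical.

```python
from string import Template


def heredoc_text(delimiter_word, body, env):
    """Return the text a command would read from a here document.

    Quoting any part of the delimiter word ('EOF', "EOF", \\EOF, E"O"F)
    disables expansion of the body.
    """
    quoted = any(ch in delimiter_word for ch in "'\"\\")
    if quoted:
        return body
    # safe_substitute leaves unknown $names untouched instead of failing;
    # a real shell would expand an unset variable to the empty string.
    return Template(body).safe_substitute(env)


env = {"PWD": "/home/user"}
body = "Working dir $PWD\n"
print(heredoc_text("EOF", body, env), end="")    # expanded
print(heredoc_text('"EOF"', body, env), end="")  # passed through literally
```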

Here strings

A here string (available in bash, ksh, and zsh) is syntactically similar, consisting of <<<, and effects input redirection from a word (a sequence treated as a unit by the shell, in this context generally a string literal). The word uses the usual shell syntax (there is no special "here string syntax"); the only new syntax is the redirection operator itself.

A single word need not be quoted:

 tr a-z A-Z <<< one

yields:

ONE

In case of a string with spaces, it must be quoted:

 tr a-z A-Z <<< 'one two three'

yields:

ONE TWO THREE

This could also be written as:

 FOO='one two three'
 tr a-z A-Z <<< "$FOO"

Multiline strings are also acceptable:

 tr a-z A-Z <<< 'one
 two three'

yields:

ONE
TWO THREE

Note that leading and trailing newlines, if present, are included:

 tr a-z A-Z <<< '
 one
 two three
 '

yields:

 
ONE
TWO THREE
 

The key difference from here documents is that in a here document the delimiters are on separate lines, which are not part of the text (so no extra leading or trailing newline is added), and the terminating delimiter can be chosen freely.

Note that here string behavior can also be accomplished (reversing the order) via piping and the echo command, as in:

 echo 'one two three' | tr a-z A-Z
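The text actually delivered to the command can be modelled in Python. The Bash manual states that a here string's word is passed on standard input with a newline appended (echo also appends a newline, which is why the pipeline form gives the same result), whereas a here document delivers only its body lines, each ending in a newline. The helper names here_string_input and here_doc_input are hypothetical, and expansions are ignored.

```python
def here_string_input(word):
    """Text a command reads from `cmd <<< word`: the word plus one newline."""
    return word + "\n"


def here_doc_input(body_lines):
    """Text a command reads from a here document with these body lines."""
    return "".join(line + "\n" for line in body_lines)


# tr a-z A-Z <<< '
# one
# two three
# '
s = here_string_input("\none\ntwo three\n")

# The same two lines written as a here document body.
d = here_doc_input(["one", "two three"])

print(repr(s))  # '\none\ntwo three\n\n'
print(repr(d))  # 'one\ntwo three\n'
```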

Microsoft NMAKE

In Microsoft NMAKE, here documents are referred to as inline files. Inline files are referenced as << or <<pathname: the first notation creates a temporary file, the second notation creates (or overwrites) the file with the specified pathname. An inline file is terminated with << on a line by itself, optionally followed by the (case-insensitive) keyword KEEP or NOKEEP to indicate whether the created file should be kept.

target0: dependent0
    command0 <<
temporary inline file
...
<<

target1: dependent1
    command1 <<
temporary, but preserved inline file
...
<<KEEP

target2: dependent2
    command2 <<filename2
named, but discarded inline file
...
<<NOKEEP

target3: dependent3
    command3 <<filename3
named inline file
...
<<KEEP

R

R does not have file literals, but provides equivalent functionality by combining string literals with a string-to-file function. R allows arbitrary whitespace, including newlines, in strings. A string can then be turned into a connection (R's file-like object) using the textConnection() function. For example, the following turns a data table embedded in the source code into a data-frame variable:

str <-
"State          Population Income Illiteracy Life.Exp Murder HS.Grad Frost
Alabama              3615   3624        2.1    69.05   15.1    41.3    20
Alaska                365   6315        1.5    69.31   11.3    66.7   152
Arizona              2212   4530        1.8    70.55    7.8    58.1    15
Arkansas             2110   3378        1.9    70.66   10.1    39.9    65"
x <- read.table(textConnection(str), header=TRUE, row.names=1)
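For comparison, the same technique of wrapping a string literal in a file-like object can be sketched in Python with the standard io.StringIO class. This is an analogy rather than part of R, and the parsing is deliberately simplistic: it splits on whitespace and uses the first column as row names, roughly what read.table(..., header=TRUE, row.names=1) does for this table.

```python
import io

table = """\
State          Population Income Illiteracy Life.Exp Murder HS.Grad Frost
Alabama              3615   3624        2.1    69.05   15.1    41.3    20
Alaska                365   6315        1.5    69.31   11.3    66.7   152
Arizona              2212   4530        1.8    70.55    7.8    58.1    15
Arkansas             2110   3378        1.9    70.66   10.1    39.9    65
"""

# io.StringIO turns the string into a file-like object, much as
# textConnection() does in R.
f = io.StringIO(table)

header = f.readline().split()[1:]  # drop "State"; it labels the row names
rows = {}
for line in f:
    name, *values = line.split()
    rows[name] = dict(zip(header, map(float, values)))

print(rows["Alaska"]["Frost"])  # 152.0
```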

Data URI Scheme

As further explained in Data URI scheme, all major web browsers understand URIs that start with data: as inline data, in effect as here documents.
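A data: URI embeds its content directly in the URI, much as a here document embeds a file in a script. As a rough illustration, the Python sketch below builds a base64-encoded data: URI with the standard base64 module and decodes it again. The helper names make_data_uri and read_data_uri are hypothetical, and only the simple data:<mediatype>;base64,<data> form is handled.

```python
import base64


def make_data_uri(text, mediatype="text/plain"):
    """Encode `text` as UTF-8 and wrap it in a base64 data: URI."""
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return f"data:{mediatype};base64,{payload}"


def read_data_uri(uri):
    """Decode a data: URI of the form data:<mediatype>;base64,<data>."""
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("only base64 data: URIs are handled in this sketch")
    return base64.b64decode(payload).decode("utf-8")


uri = make_data_uri("one two three\n")
print(uri)
print(read_data_uri(uri), end="")
```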

Multiline string literals

The term "here document" or "here string" is also used for multiline string literals in various programming languages, notably Perl (syntax influenced by Unix shell), and languages influenced by Perl, such as PHP and Ruby. The shell-style << syntax is often retained, despite not being used for input redirection.

Perl-influenced

Perl

In Perl there are several different ways to invoke here docs.[1] The delimiters around the tag have the same effect within the here doc as they would in a regular string literal: For example using double quotes around the tag allows variables to be interpolated, but using single quotes doesn't, and using the tag without either behaves like double quotes. Using backticks as the delimiter runs the contents of the heredoc as a shell script. It is necessary to make sure that the end tag is at the beginning of the line or the tag will not be recognized by the interpreter.

Note that the here doc does not start at the tag but on the next line, so the statement containing the tag continues after the tag.

Here is an example with double quotes:

my $sender = "Buffy the Vampire Slayer";
my $recipient = "Spike";

print <<"END";
Dear $recipient,

I wish you to leave Sunnydale and never return.

Not Quite Love,
$sender
END

Output:

Dear Spike,

I wish you to leave Sunnydale and never return.

Not Quite Love,
Buffy the Vampire Slayer

Here is an example with single quotes:

print <<'END';
Dear $recipient,

I wish you to leave Sunnydale and never return.

Not Quite Love,
$sender
END

Output:

Dear $recipient,

I wish you to leave Sunnydale and never return.

Not Quite Love,
$sender

And an example with backticks (may not be portable):

my $shell_script_stdout = <<`END`;
echo foo
echo bar
END

It is possible to start multiple heredocs on the same line:

use feature 'say';
say(<<BEGIN . "this is the middle\n" . <<END);
This is the beginning:
BEGIN
And now it is over!
END

# this is equivalent to:
say("This is the beginning:\nthis is the middle\nAnd now it is over!\n");
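The order in which the bodies are read can be modelled in Python: the bodies follow the statement in the same order as their tags appear on the line, each ending at its own terminator. The helper read_stacked_heredocs is a hypothetical illustration, not Perl's parser.

```python
def read_stacked_heredocs(tags, lines):
    """Return one body per tag, taking the bodies in the order the tags appear."""
    bodies = []
    remaining = iter(lines)
    for tag in tags:
        body = []
        for line in remaining:
            if line == tag:
                break
            body.append(line + "\n")
        else:  # ran out of lines without finding this tag's terminator
            raise ValueError(f"can't find terminator {tag!r}")
        bodies.append("".join(body))
    return bodies


first, second = read_stacked_heredocs(
    ["BEGIN", "END"],
    ["This is the beginning:", "BEGIN", "And now it is over!", "END"],
)
print(first + "this is the middle\n" + second, end="")
```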

The tag itself may contain whitespace, which may allow heredocs to be used without breaking indentation.

  say <<'  END';
Hello World
  END

In addition to these strings, Perl also features file literals, namely the contents of the file following __DATA__ (formerly __END__) on a line by itself. This is accessible as the file object PACKAGE::DATA such as main::DATA, and can be viewed as a form of data segment.

PHP

In PHP, here documents are referred to as heredocs.

<?php
 
$name       = "Joe Smith";
$occupation = "Programmer";
echo <<<EOF
This is a heredoc section.
For more information talk to $name, your local $occupation.

Thanks!

EOF;

$toprint = <<<EOF

Hey $name! You can actually assign the heredoc section to a variable!

EOF;
echo $toprint;

?>

Outputs

This is a heredoc section.
For more information talk to Joe Smith, your local Programmer.

Thanks!

Hey Joe Smith! You can actually assign the heredoc section to a variable!

The line containing the closing identifier must not contain any other characters, except an optional ending semicolon. Otherwise, it will not be considered to be a closing identifier, and PHP will continue looking for one. If a proper closing identifier is not found, a parse error will result at the last line of the script.[2]
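The closing-identifier rule can be expressed as a simple line test. The Python sketch below, with the hypothetical helper is_closing_line, follows the rule exactly as stated above. PHP 7.3 later relaxed the rule, allowing the closing identifier to be indented and followed by other code, so the sketch describes the older behaviour only.

```python
def is_closing_line(line, identifier):
    """Closing-identifier test as described above for PHP before 7.3.

    The line may hold only the identifier, optionally followed by ';'.
    `line` is given without its trailing newline.
    """
    return line in (identifier, identifier + ";")


for candidate in ["EOF;", "EOF", "  EOF;", "EOF; // done", "EOF ;"]:
    print(repr(candidate), is_closing_line(candidate, "EOF"))
```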

In PHP 5.3 and later, like Perl, it is possible to not interpolate variables by surrounding the tag with single quotes; this is called a nowdoc:[3]

$x = <<<'END'
Dear $recipient,

I wish you to leave Sunnydale and never return.

Not Quite Love,
$sender
END;

In PHP 5.3 and later it is also possible to surround the tag with double quotes, which, as in Perl, has the same effect as not quoting the tag at all.

Ruby

The following Ruby code displays a grocery list by using a here document.

puts <<GROCERY_LIST
Grocery list
----
1. Salad mix.
2. Strawberries.*
3. Cereal.
4. Milk.*
 
* Organic
GROCERY_LIST

The result:

$ ruby grocery-list.rb
Grocery list
----
1. Salad mix.
2. Strawberries.*
3. Cereal.
4. Milk.*

* Organic

The << in a here document does not indicate input redirection, but Ruby also uses << as the operator for writing (appending) to an I/O stream, so writing a here document to a file involves using << twice, in different senses:

File.open("grocery-list", "w") do |f|
  f << <<GROCERY_LIST
Grocery list
----
1. Salad mix.
2. Strawberries.*
3. Cereal.
4. Milk.*
 
* Organic
GROCERY_LIST
end

As with Unix shells, Ruby allows the closing identifier not to start in the first column of a line if the here document is introduced with the slightly different starter <<-. In addition, Ruby treats here documents as double-quoted strings by default, so the #{} construct can be used to interpolate code. The following example illustrates both features:

now = Time.now
puts <<-EOF
  It's #{now.hour} o'clock John, where are your kids?
  EOF
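One point the example does not show: Ruby's <<- only relaxes where the closing identifier may appear, while the body keeps its indentation, so the line above is printed with its two leading spaces. The shell's <<- instead strips leading tabs from every line. The Python models below (hypothetical names ruby_dash_heredoc and shell_dash_heredoc, with interpolation assumed to be already applied) contrast the two.

```python
def ruby_dash_heredoc(lines, identifier):
    """Model of Ruby's <<-ID: the closing ID may be indented; the body is kept as is."""
    body = []
    for line in lines:
        if line.lstrip(" \t") == identifier:  # indentation allowed before the ID
            return "".join(text + "\n" for text in body)
        body.append(line)  # body lines keep their indentation
    return None


def shell_dash_heredoc(lines, delimiter):
    """Model of the shell's <<-: leading tabs are stripped from every line."""
    body = []
    for line in lines:
        line = line.lstrip("\t")
        if line == delimiter:
            return "".join(text + "\n" for text in body)
        body.append(line)
    return None


# The Ruby example above, with #{now.hour} already interpolated as 13.
ruby_lines = ["  It's 13 o'clock John, where are your kids?", "  EOF"]
print(repr(ruby_dash_heredoc(ruby_lines, "EOF")))

shell_lines = ["\tIt's 13 o'clock John, where are your kids?", "\tEOF"]
print(repr(shell_dash_heredoc(shell_lines, "EOF")))
```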

Like Perl, Ruby allows for starting multiple here documents in one line:

puts <<BEGIN + "<--- middle --->\n" + <<END
This is the beginning:
BEGIN
And now it is over!
END

# this equals this expression:
puts "This is the beginning:\n<--- middle --->\nAnd now it is over!"

As with Perl, Ruby features file literals, namely the contents of the file following __END__ on a line by itself. This is accessible as the file object DATA and can be viewed as a form of data segment.

Others

D

Since version 2.0, D has support for here document-style strings using the 'q' prefix character. These strings begin with q"IDENT followed immediately by a newline (for an arbitrary identifier IDENT), and end with IDENT" at the start of a line.

import std.stdio;

void main() {
    string list = q"IDENT
1. Item One
2. Item Two
3. Item Three
IDENT";
    write(list);
}

D also supports several bracket-delimited quoting forms with similar syntax: such a string starts with q"[ and ends with ]", and likewise for the other bracket pairs (), <> and {}.

Racket

Racket's here strings start with #<< followed by characters that define a terminator for the string.[4] The content of the string includes all characters between the #<< line and a line whose only content is the specified terminator. More precisely, the content of the string starts after a newline following #<<, and it ends before a newline that is followed by the terminator.

#lang racket

(displayln
 #<<HERESTRING
This is a simple here string in Racket.
  * One
  * Two
  * Three
HERESTRING
 )

Outputs:

This is a simple here string in Racket.
  * One
  * Two
  * Three

No escape sequences are recognized between the starting and terminating lines; all characters are included in the string (and terminator) literally.
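The content rule described above can be modelled in Python. The sketch takes the source text that follows the #<< characters; the name racket_here_string is hypothetical, and the model covers only how the string's content is delimited.

```python
def racket_here_string(source):
    """Parse the text that follows the #<< characters of a Racket here string.

    The rest of the first line is the terminator (spaces included). The string
    runs from the next line up to, but not including, the newline that precedes
    a line consisting only of the terminator. No escapes are processed.
    """
    terminator, _, rest = source.partition("\n")
    lines = rest.split("\n")
    for i, line in enumerate(lines):
        if line == terminator:
            return "\n".join(lines[:i])
    raise ValueError(f"no line consisting of {terminator!r} found")


src = "HERESTRING\nThis is a simple here string in Racket.\n  * One\nHERESTRING\n )"
print(racket_here_string(src))  # print adds the final newline, like displayln
```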

#lang racket

(displayln
 #<<A here string in Racket 
This string spans for multiple lines
and can contain any Unicode symbol.
So things like λ, ☠, α, β, are all fine.

In the next line comes the terminator. It can contain any Unicode symbol as well, even spaces and smileys!
A here string in Racket 
 )

Outputs:

This string spans for multiple lines
and can contain any Unicode symbol.
So things like λ, ☠, α, β, are all fine.

In the next line comes the terminator. It can contain any Unicode symbol as well, even spaces and smileys!

Here strings can be used wherever normal strings can:

#lang racket

(printf #<<END
Dear ~a,

Thanks for the insightful conversation ~a.

                ~a

END
        "Isaac"
        "yesterday"
        "Carl")

Outputs:

Dear Isaac,

Thanks for the insightful conversation yesterday.

                Carl

An interesting alternative is to use the language extension at-exp to write @-expressions.[5] They look like this:

#lang at-exp racket

(displayln @string-append{
  This is a long string,
  very convenient when a
  long chunk of text is
  needed.
  
  No worries about escaping
  "quotes". It's also okay
  to have λ, γ, θ, ...
  
  Embed code: @|(number->string (+ 3 4))|
  })

Outputs:

This is a long string,
very convenient when a
long chunk of text is
needed.

No worries about escaping
"quotes". It's also okay
to have λ, γ, θ, ...

Embed code: 7

@-expressions are not specific to strings: they are a general syntax form that can be composed with the rest of the language.

Windows PowerShell

In Windows PowerShell, here documents are referred to as here-strings. A here-string is a string which starts with an open delimiter (@" or @') and ends with a close delimiter ("@ or '@) on a line by itself, which terminates the string. All characters between the open and close delimiter are considered the string literal. A here-string with double quotes allows variables to be interpolated; one with single quotes does not. Variable interpolation occurs only with simple variables (e.g. $x but NOT $x.y or $x[0]). A set of statements can be executed by putting them in $() (e.g. $($x.y) or $(Get-Process | Out-String)).

In the following PowerShell code, text is passed to a function using a here-string. The function ConvertTo-UpperCase is defined as follows:

PS> function ConvertTo-UpperCase($string) { $string.ToUpper() }

PS> ConvertTo-UpperCase @'
>> one two three
>> eins zwei drei
>> '@
>>
ONE TWO THREE
EINS ZWEI DREI

Here is an example that demonstrates variable interpolation and statement execution using a here-string with double quotes:

$doc, $marty = 'Dr. Emmett Brown', 'Marty McFly'
$time = [DateTime]'Friday, October 25, 1985 8:00:00 AM'
$diff = New-TimeSpan -Minutes 25
@"
$doc : Are those my clocks I hear?
$marty : Yeah! Uh, it's $($time.Hour) o'clock!
$doc : Perfect! My experiment worked! They're all exactly $($diff.Minutes) minutes slow.
$marty : Wait a minute. Wait a minute. Doc... Are you telling me that it's $(($time + $diff).ToShortTimeString())?
$doc : Precisely.
$marty : Damn! I'm late for school!
"@

Output (the time format produced by ToShortTimeString() depends on the current culture settings):

Dr. Emmett Brown : Are those my clocks I hear?
Marty McFly : Yeah! Uh, it's 8 o'clock!
Dr. Emmett Brown : Perfect! My experiment worked! They're all exactly 25 minutes slow.
Marty McFly : Wait a minute. Wait a minute. Doc... Are you telling me that it's 08:25?
Dr. Emmett Brown : Precisely.
Marty McFly : Damn! I'm late for school!

Using a here-string with single quotes instead, the output would look like this:

$doc : Are those my clocks I hear?
$marty : Yeah! Uh, it's $($time.Hour) o'clock!
$doc : Perfect! My experiment worked! They're all exactly $($diff.Minutes) minutes slow.
$marty : Wait a minute. Wait a minute. Doc... Are you telling me that it's $(($time + $diff).ToShortTimeString())?
$doc : Precisely.
$marty : Damn! I'm late for school!

See also

References