Leaning toothpick syndrome
In computer programming, leaning toothpick syndrome (LTS) is the situation in which a quoted expression becomes unreadable because it contains a large number of escape characters, usually backslashes ("\"), to avoid delimiter collision.
The official Perl documentation[1] introduced the term to wider usage; there, the phrase is used to describe regular expressions that match Unix-style paths in which the elements are separated by forward slashes.
LTS appears in many programming languages and in many situations, including in patterns that match Uniform Resource Identifiers (URIs) and in programs that output quoted text. Many quines fall into the latter category.
Pattern example
Consider the following Perl regular expression intended to match URIs which identify files under the pub directory of an FTP site:
m/ftp:\/\/[^\/]*\/pub\//
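Because the forward slash is the default delimiter, every slash inside the pattern has to be escaped with a backslash. Perl avoids this by allowing many other characters to serve as delimiters for a regular expression. For example, the following three expressions are equivalent to the one given above: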
m{ftp://[^/]*/pub/}
m#ftp://[^/]*/pub/#
m!ftp://[^/]*/pub/!
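For comparison, here is a minimal sketch of the same pattern in Python, whose re module takes the pattern as an ordinary string with no surrounding delimiter, so the forward slashes need no escaping. The sample URIs are hypothetical and serve only as illustration:

```python
import re

# Same pattern as the Perl example. There is no regex delimiter in Python,
# so "/" is an ordinary character and needs no backslash.
pub_pattern = re.compile(r"ftp://[^/]*/pub/")

# Hypothetical sample URIs (not from the original article).
uris = [
    "ftp://ftp.example.org/pub/README",
    "ftp://ftp.example.org/private/README",
]

# Keep only the URIs that point under the pub directory.
matches = [uri for uri in uris if pub_pattern.search(uri)]
print(matches)
```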
Quoted text example
A Perl program to print an HTML link tag, where the URL and link text are stored in variables $url and $text respectively, might look like this. Notice the use of backslashes to escape the quoted double-quote characters:
print "<a href=\"$url\">$text</a>";
Using single quotes to delimit the string is not feasible, because Perl does not interpolate variables inside single-quoted strings; the following prints the names $url and $text literally rather than their values:
print '<a href="$url">$text</a>';
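The printf function, available in many languages (Perl, C, PHP), is a viable alternative. The following form is valid in Perl and PHP; C requires double-quoted string literals, so the inner quotes would still need escaping there: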
printf('<a href="%s">%s</a>', $url, $text);
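The same idea can be sketched in Python, which accepts either quote character as a string delimiter and offers both printf-style and str.format templates. The url and text values below are placeholders chosen for illustration:

```python
# Placeholder values, assumed for this example only.
url = "http://example.com/"
text = "Example"

# Single-quoted template: the double quotes inside need no escaping.
link = '<a href="%s">%s</a>' % (url, text)

# Equivalent result using str.format.
link_formatted = '<a href="{}">{}</a>'.format(url, text)

print(link)
```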
Perl's qq operator behaves like double quotes, including variable interpolation, but lets the programmer choose the delimiter:
print qq{<a href="$url">$text</a>};
print qq|<a href="$url">$text</a>|;
print qq(<a href="$url">$text</a>);
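Here documents are especially well suited for multi-line strings. However, the body and terminator of a traditional here document must start at the left margin, which disrupts the indentation of the surrounding code (Perl 5.26 and later offer an indented form, <<~). This example shows the traditional Perl syntax: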
print <<HERE_IT_ENDS;
<a href="$url">$text</a>
HERE_IT_ENDS
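The indentation problem has a close analogue in Python's triple-quoted strings. One common workaround, sketched below, is to indent the literal along with the code and strip the shared leading whitespace with textwrap.dedent. The function name render_link and the sample values are assumptions made for this example:

```python
import textwrap

def render_link(url, text):
    # The literal is indented to match the function body; dedent() removes
    # the whitespace common to all lines. The backslash after the opening
    # quotes suppresses the initial newline.
    template = """\
        <p>
          <a href="{url}">{text}</a>
        </p>
        """
    return textwrap.dedent(template).format(url=url, text=text)

# Placeholder values, assumed for this example only.
html = render_link("http://example.com/", "Example")
print(html, end="")
```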
Other languages
C#
The C# programming language addresses LTS with verbatim string literals, which are marked by an '@' symbol before the opening quotation mark, e.g.
string filePath = @"C:\Foo\Bar.txt";
rather than otherwise requiring:
string filePath = "C:\\Foo\\Bar.txt";
C++
The C++11 standard adds raw strings:
std::string filePath = R"(C:\Foo\Bar.txt)";
If the string contains the character sequence )", which would otherwise end the literal, an optional delimiter can be added, such as d in the following example, which stores a sed command:
std::string sedCommand = R"d(s/"\([^"]*\)"/'\1'/g)d";
Python
Python has a similar construct, the raw string literal, marked by an 'r' prefix:
filePath = r"C:\Foo\Bar.txt"
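A short sketch of why the raw form helps: the raw and escaped literals denote the same string, and the saving grows when the string is itself a regular expression that must match a backslash. The variable names here are illustrative only:

```python
import re

# Both literals produce the same string of characters.
escaped_path = "C:\\Foo\\Bar.txt"
raw_path = r"C:\Foo\Bar.txt"

# A regex matching one literal backslash: four backslashes without the
# raw prefix, two with it.
separator_escaped = re.compile("\\\\")
separator_raw = re.compile(r"\\")

# Split the Windows-style path on its backslash separators.
parts = separator_raw.split(raw_path)
print(parts)
```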
Scala
Scala allows strings to be delimited by triple quotes, inside which backslashes and double quotes need no escaping:
val filePath = """C:\Foo\Bar.txt"""
val pubPattern = """ftp://[^/]*/pub/""".r
The triple quotes also allow for multi-line strings, as shown here:
val text = """First line,
second line."""
Sed
Sed regular expressions, particularly with the 's' (substitute) command, face the same problem as Perl; indeed, sed is a predecessor of Perl. The default delimiter is '/', as in "s/regexp/replacement/", but other delimiters can be used, so "s,regexp,replacement," is equally valid. For example, to match a URI up to and including the "pub" directory (as in the Perl example) and replace it with "foo", the default form, with escaped slashes, is:
s/ftp:\/\/[^\/]*\/pub\//foo/
Using a comma (',') as delimiter instead yields:
s,ftp://[^/]*/pub/,foo,
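As a rough Python counterpart, re.sub takes the pattern and the replacement as separate arguments, so no delimiter is involved and no collision arises. This is a sketch rather than an exact equivalent of sed's behavior; the input line is a made-up sample, and count=1 mimics sed replacing only the first match on a line when the g flag is absent:

```python
import re

# Hypothetical input line, for illustration only.
line = "ftp://ftp.example.org/pub/README"

# Pattern and replacement are separate arguments; "/" needs no escaping.
result = re.sub(r"ftp://[^/]*/pub/", "foo", line, count=1)
print(result)
```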