Icon (programming language)

Paradigm: multi-paradigm (structured, text-oriented)
Designed by: Ralph Griswold
First appeared: 1977
Stable release: 9.5.1 / September 27, 2018
Typing discipline: dynamic
Website: www.cs.arizona.edu/icon
Major implementations: Icon, Jcon
Dialects: Unicon
Influenced by: SNOBOL, SL5, ALGOL
Influenced: Unicon, Python, Goaldi

Icon is a very high-level programming language featuring goal-directed execution and many facilities for managing strings and textual patterns. It is related to SNOBOL and SL5, both string processing languages. Icon is not object-oriented, but an object-oriented extension called Idol was developed in 1996 and eventually became Unicon.

Basic syntax

The Icon language is derived from the ALGOL class of structured programming languages, and thus has syntax similar to C or Pascal. Icon is most similar to Pascal, using := syntax for assignments, the procedure keyword and similar syntax. On the other hand, Icon uses C-style braces for structuring execution groups, and programs start by running a procedure called main.

In many ways Icon also shares features with most scripting languages (as well as SNOBOL and SL5, from which they were taken): variables do not have to be declared, types are cast automatically, and numbers can be converted to strings and back automatically. Another feature common to many scripting languages, but not all, is the lack of a line-ending character; in Icon, lines not ended by a semicolon get ended by an implied semicolon if it makes sense.
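A minimal sketch of these conversions, assuming a standard Icon implementation; the variable names are illustrative only:

```icon
procedure main()
   n := "3" + 4            # the string "3" is converted to the integer 3
   write(n)                # writes 7
   s := "total: " || 10    # the integer 10 is converted to the string "10"
   write(s)                # writes total: 10
   x := 1; y := 2          # explicit semicolons separate expressions on one line
   write(x + y)            # writes 3
end
```

None of the variables are declared, and every line that is not ended by a semicolon is terminated implicitly.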

Procedures are the basic building blocks of Icon programs. Although they use Pascal naming, they work more like C functions and can return values; there is no function keyword in Icon.

 procedure doSomething(aString)
   write(aString)
 end
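As a sketch of a procedure that returns a value, the following complete program uses a hypothetical procedure named double; the name and the argument are assumptions chosen for illustration:

```icon
# double is a hypothetical helper, not part of Icon itself
procedure double(n)
   return 2 * n
end

procedure main()
   write(double(21))     # writes 42
end
```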

Goal-directed execution

One of Icon's key concepts is that control structures are based on the "success" or "failure" of expressions, rather than on boolean logic, as in most other programming languages. This feature derives directly from SNOBOL, in which any pattern match or replacement operation could be followed by success and failure clauses that specified a statement label to branch to under the corresponding condition. Under the goal-directed branching model, a simple comparison like if a < b does not mean "if the expression evaluates to true", as it would in most languages; instead, it means something more like "if the expression succeeds". In this case the < operator succeeds if the comparison is true, so the end result is the same. In addition, the < operator returns its second argument if it succeeds, allowing things like if a < b < c, a common type of comparison that in most languages must be written as a conjunction of two inequalities like if (a < b) && (b < c).
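The behaviour of comparisons can be sketched with a short program; the values assigned to a, b and c are arbitrary assumptions:

```icon
procedure main()
   a := 1
   b := 5
   c := 10
   # a < b succeeds and produces b, which is then compared with c
   if a < b < c then write("in order")
   write(a < b)                    # writes 5, the right-hand operand
   # c < b fails, so the else branch is taken
   if c < b then write("unexpected") else write("c < b failed")
end
```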

Icon uses success or failure for all flow control, so this simple code:

if a := read() then write(a)

will copy one line of standard input to standard output. It will work even if read() cannot return a line, for instance, because the input is already at end of file. In that case the expression a := read() will fail, and write will simply not be called.

Success and failure are passed "up" through functions, meaning that a failure inside a nested function will cause the functions calling it to fail as well. For instance, here is a program that copies an entire file:

while write(read())

When read() fails, at the end of file for instance, the failure is passed up the call chain: write() is not called and the expression as a whole fails. The while, being a control structure, stops on failure. A similar example written in pseudocode (using syntax close to Java):

 try {
   while ((a = read()) != EOF) {
     write(a);
   }
 } catch (Exception e) {
   // do nothing, exit the loop
 }

This case needs two checks: one for end of file (EOF) and another for all other errors. Since Java does not allow exceptions to be tested as part of an ordinary conditional expression, as failure can be in Icon, the lengthier try/catch syntax must be used instead. Try blocks may also impose a performance penalty in some implementations, even if no exception is thrown, a distributed cost that Icon avoids.

Icon refers to this concept as goal-directed execution, referring to the way that execution continues until some goal is reached. In the example above the goal is to read the entire file; the read command succeeds when information has been read, and fails when it hasn't. The goal is thus coded directly in the language, instead of by checking return codes or similar constructs.
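A slightly longer sketch of the same file-copying goal, written as a complete program. The line counter and the message written to &errout are additions made here for illustration:

```icon
procedure main()
   count := 0
   # the loop ends when read() fails at end of input
   while line := read() do {
      write(line)
      count +:= 1
   }
   # report on standard error so the copied output stays clean
   write(&errout, count, " lines copied")
end
```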

Generators

Expressions in Icon often return a single value; for instance, x < 5 will evaluate and succeed if the value of x is less than 5, or else fail. However, many expressions can produce a series of values: they return one value, and each time they are resumed they return another, failing only when no values remain. This drives expressions that combine every and to, such as every write(1 to 10); every causes to to continue to return values until it fails.

This is a key concept in Icon, known as generators. Generators drive much of the loop functionality in the language, but without the need for an explicit loop comparing values at each iteration.

Within the parlance of Icon, the evaluation of an expression or function produces a result sequence. A result sequence contains all the possible values that can be generated by the expression or function. When the result sequence is exhausted the expression or function fails. Iteration over the result sequence is achieved either implicitly via Icon's goal-directed evaluation or explicitly via the every clause.
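As an illustration of result sequences, the sketch below lets every exhaust an expression built from two generators. The particular operands are arbitrary; the order shown in the comments follows from Icon resuming the most recently suspended generator first:

```icon
procedure main()
   # result sequence of 1 to 3 is 1, 2, 3
   every write(1 to 3)
   # both operands generate; the right one is resumed first
   every write((1 | 2) + (10 | 20))    # writes 11, 21, 12, 22
end
```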

Icon includes several generator-builders. The alternator syntax generates the results of a series of items in sequence, skipping any item that fails:

 1 | "hello" | x < 5

can generate "1", "hello", and "5" if x is less than 5. Alternators can be read as "or" in many cases, for instance:

 if y < (x | 5) then write("y=", y)

will write out the value of y if it is smaller than x or 5. Internally Icon checks every value from left to right until one succeeds or the list empties and it returns a failure. Functions will not be called unless evaluating their parameters succeeds, so this example can be shortened to:

 write("y=", (x | 5) > y)
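The alternation examples can be combined into one runnable sketch; the values of x and y are assumptions picked so that each case is exercised:

```icon
procedure main()
   x := 3
   every write(1 | "hello" | (x < 5))   # writes 1, hello, 5
   y := 4
   # 4 < 3 fails, so the alternation is resumed and 4 < 5 succeeds
   if y < (x | 5) then write("y=", y)   # writes y=4
   # same test folded into the argument list; > produces its right operand
   write("y=", (x | 5) > y)             # writes y=4
end
```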

Another simple generator is to, which generates sequences of integers; every write(1 to 10) will call write() ten times. The bang syntax generates every item of a list or every character of a string; every write(!aString) will output each character of aString on a new line.

This concept is powerful for string operations. Most languages include a function known as find or indexOf that returns the location of one string within another. For example:

 s = "All the world's a stage. And all the men and women merely players";
 i = indexOf("the", s)

This code will return 4, the position of the first occurrence of the word "the" (assuming the indices start at 0). To get the next instance of "the" an alternate form must be used,

 i = indexOf("the", s, 5)

the 5 at the end saying it should look from position 5 on. So to extract all the occurrences of "the", a loop must be used:

 s = "All the world's a stage. And all the men and women merely players";
 i = indexOf("the", s)
 while i != -1 {
   write(i);
   i =  indexOf("the", s, i+1);
 }

Under Icon the find function is a generator, and will return the next instance of the string each time it is resumed before failing when it reaches the end of the string. The same code can be written:

 s := "All the world's a stage. And all the men and women merely players"
 every write(find("the", s))

find will return the index of the next instance of "the" each time it is resumed by every, eventually reaching the end of the string and failing.
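A complete version of this example might look like the following sketch. The list named positions is an addition for illustration, showing that the same generator can feed any function, not only write. Icon positions start at 1, so the results differ by one from the zero-based pseudocode above:

```icon
procedure main()
   s := "All the world's a stage. And all the men and women merely players"
   every write(find("the", s))             # writes 5, then 34
   positions := []
   every put(positions, find("the", s))    # collect every match position
   write(*positions, " matches")           # writes 2 matches
end
```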

Of course there are times where one wants to find a string after some point in input, for instance, if scanning a text file containing data in multiple columns. Goal-directed execution works here as well:

 write(5 < find("the", s))

The position will only be returned if "the" appears after position 5; the comparison will fail otherwise. Comparisons that succeed return the right-hand result, so it is important to put the find on the right-hand side of the comparison. If it were written:

 write(find("the", s) > 5)

then "5" would be written instead of the result of find.
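The difference between the two operand orders can be checked with a short sketch that uses the same string; the threshold of 100 in the last test is an arbitrary value chosen so that no match qualifies:

```icon
procedure main()
   s := "All the world's a stage. And all the men and women merely players"
   # 5 < 5 fails, find is resumed, and 5 < 34 succeeds with value 34
   write(5 < find("the", s))        # writes 34
   # here the comparison produces its right operand, the constant 5
   write(find("the", s) > 5)        # writes 5
   # every candidate position is tried before the comparison finally fails
   if not (100 < find("the", s)) then write("no match after position 100")
end
```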

Icon adds several control structures for looping through generators. The every control structure is similar to while, looping through every item returned by a generator and exiting on failure:

 every k := i to j do
    write(someFunction(k))

while re-evaluates its expression on each iteration and uses only the first result, whereas every resumes the expression to produce all of its results. The every syntax actually injects values into the function in a fashion similar to blocks under Smalltalk. For instance, the above loop can be re-written this way:

 every write(someFunction(i to j))

Generators can be defined as procedures using the suspend keyword:

 procedure findOnlyOdd(pattern, theString)
   every i := find(pattern, theString) do
     if i % 2 = 1 then suspend i
 end

This example loops over theString using find to look for pattern. When one is found, and the position is odd, the location is returned from the function with suspend. Unlike return, suspend saves the state of the generator, allowing it to pick up where it left off on the next iteration.
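To see the generator in use, the procedure can be wrapped in a complete program. The local declaration and the test string "abracadabra" are assumptions added here; in that string the letter "a" occurs at positions 1, 4, 6, 8 and 11:

```icon
procedure findOnlyOdd(pattern, theString)
   local i
   every i := find(pattern, theString) do
      if i % 2 = 1 then suspend i
end

procedure main()
   # every resumes findOnlyOdd until it has no more results
   every write(findOnlyOdd("a", "abracadabra"))   # writes 1, then 11
end
```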

Strings

Icon has features to make working with strings easier. The scanning system sets up a string as the subject on which subsequent function calls operate:

s ? write(find("the"))

is a short form of the examples shown earlier. In this case the subject of the find function is placed outside the parameters in front of the question mark. Icon function signatures identify the subject parameter so that it can be hoisted in this fashion.

Substrings can be extracted from a string by using a range specification within brackets. A range specification can refer to a single character or to a slice of the string. Strings can be indexed from either the left or the right. Positions within a string are defined to lie between the characters: for the string "ABC" the positions are 1 A 2 B 3 C 4 counting from the left, and −3 A −2 B −1 C 0 counting from the right.

For example,

 "Wikipedia"[1]     ==> "W"
 "Wikipedia"[3]     ==> "k"
 "Wikipedia"[-1]    ==> "a"
 "Wikipedia"[1:3]   ==> "Wi"
 "Wikipedia"[-2:0]  ==> "ia"
 "Wikipedia"[2+:3]  ==> "iki"

where the last example uses a length instead of an ending position.

The subscripting specification can be used as a lvalue within an expression. This can be used to insert strings into another string or delete parts of a string. For example,

    s := "abc"
    s[2] := "123"       # s now has a value of "a123c"
    s := "abcdefg"
    s[3:5] := "ABCD"    # s now has a value of "abABCDefg"
    s := "abcdefg"
    s[3:5] := ""        # s now has a value of "abefg"

Icon's subscript indices are between the elements. Given the string s := "ABCDEFG", the indexes are: 1A2B3C4D5E6F7G8. The slice s[3:5] is the string between the indices 3 and 5, which is the string "CD".
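The subscripting rules can be exercised in one short program. This is a sketch only; the variable names s and t are arbitrary:

```icon
procedure main()
   s := "Wikipedia"
   write(s[1])          # writes W
   write(s[-1])         # writes a, the last character
   write(s[1:3])        # writes Wi
   write(s[-2:0])       # writes ia
   write(s[2+:3])       # writes iki (start at position 2, length 3)
   t := "abcdefg"
   t[3:5] := "ABCD"     # replace the slice between positions 3 and 5
   write(t)             # writes abABCDefg
   t[3:7] := ""         # assigning an empty string deletes the slice
   write(t)             # writes abefg
end
```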

Other structures

Icon also has syntax to build lists (or arrays):

aCat := ["muffins", "tabby", 2002, 8]

The items within a list can be of any type, including other structures. To build larger lists, Icon includes the list function; list(10, "word") generates a list containing 10 copies of "word".

Like arrays in other languages, Icon allows items to be looked up by position, e.g., aCat[4]. As with strings, the indices are between the elements, and a slice of a list can be obtained by specifying the range, e.g., aCat[2:4] produces the list ["tabby",2002]. Unlike for strings, a slice of a list is not an lvalue.

The bang syntax enumerates the elements. For example, every write(!aCat) will print out four lines, each with one element.

Icon includes stack-like functions, push and pop, to allow lists to form the basis of stacks and queues.
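A brief sketch of these list operations follows. The names words and stack are illustrative assumptions, and the list of three copies is used only to keep the example small:

```icon
procedure main()
   aCat := ["muffins", "tabby", 2002, 8]
   write(aCat[1])                 # writes muffins
   every write(!(aCat[2:4]))      # writes tabby, then 2002
   words := list(3, "word")       # a list holding three copies of "word"
   write(*words)                  # writes 3, the size of the list
   stack := []
   push(stack, 1)
   push(stack, 2)
   write(pop(stack))              # writes 2, the most recently pushed item
end
```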

Icon also includes functionality for sets and associative arrays with tables:

 symbols := table(0)
 symbols["there"] := 1
 symbols["here"] := 2

This code creates a table that will use zero as the default value of any unknown key. It then adds two items into it, with the keys "there" and "here", and values 1 and 2.
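The effect of the default value can be sketched as follows; the key "nowhere" is a hypothetical key that is never assigned:

```icon
procedure main()
   symbols := table(0)
   symbols["there"] := 1
   symbols["here"] := 2
   write(symbols["here"])        # writes 2
   write(symbols["nowhere"])     # writes 0, the default for unknown keys
   write(*symbols)               # writes 2; looking up a key does not add it
   symbols["here"] +:= 10        # update an existing entry in place
   write(symbols["here"])        # writes 12
end
```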

String scanning

One of the powerful features of Icon is string scanning. The scan string operator, ?, saves the current string scanning environment and creates a new string scanning environment. The string scanning environment consists of two keyword variables, &subject and &pos, where &subject is the string being scanned, and &pos is the cursor or current position within the subject string.

For example,

  s := "this is a string"
  s ? write("subject=[",&subject,"] pos=[",&pos,"]")

would produce

subject=[this is a string] pos=[1]

Built-in and user-defined functions can be used to move around within the string being scanned. Many of the built-in functions will default to &subject and &pos (for example the find function). The following, for example, will write all blank-delimited "words" in a string.

  s := "this is a string"
  s ? {                               # Establish string scanning environment
      while not pos(0) do  {          # Test for end of string
          tab(many(' '))              # Skip past any blanks
          word := tab(upto(' ') | 0)  # the next word is up to the next blank -or- the end of the line
          write(word)                 # write the word
      }
  }
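A more compact variant of the word loop is sketched below. It treats a word as a run of letters rather than as blank-delimited text, which is an assumption that differs slightly from the loop above:

```icon
procedure main()
   s := "this is a string"
   # move to the next letter, then consume and write the run of letters
   s ? while tab(upto(&letters)) do
          write(tab(many(&letters)))    # writes this, is, a, string
end
```

The loop ends when upto fails to find another letter, so no explicit test for the end of the string is needed.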

A more complex example demonstrates the integration of generators and string scanning within the language.

 procedure main()
     s := "Mon Dec 8"
     s ? write(Mdate() | "not a valid date")
 end
 # Define a matching function that returns
 # a string that matches a day month dayofmonth
 procedure Mdate()
     # Define some initial values
     static dates
     static days
     initial {
         days := ["Mon","Tue","Wed","Thu","Fri","Sat","Sun"]
         dates := ["Jan","Feb","Mar","Apr","May","Jun",
                   "Jul","Aug","Sep","Oct","Nov","Dec"]
     }
     every suspend   (retval <-  tab(match(!days)) ||     # Match a day
                                 =" " ||                  # Followed by a blank
                                 tab(match(!dates)) ||    # Followed by the month
                                 =" " ||                  # Followed by a blank
                                 matchdigits(2)           # Followed by at most 2 digits
                     ) &
                     (=" " | pos(0) ) &                   # Either a blank or the end of the string
                     retval                               # And finally return the string
 end
 # Matching function that returns a string of at most n digits
 procedure matchdigits(n)
     suspend (v := tab(many(&digits)) & *v <= n) & v
 end

The idiom expr1 & expr2 & expr3 succeeds only if every expression succeeds, and then returns the value of the last expression.
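A small sketch of the conjunction idiom outside of string scanning; the value of x and the strings are arbitrary assumptions:

```icon
procedure main()
   x := 3
   # both comparisons succeed, so the result is the last operand
   write((x > 0) & (x < 10) & "in range")       # writes in range
   # x > 5 fails, so the whole conjunction fails
   if not ((x > 5) & "big") then write("conjunction failed")
end
```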

References

The definitive work is The Icon Programming Language (third edition) by Griswold and Griswold, ISBN 1-57398-001-3. It is out of print but can be downloaded as a PDF.

Icon also has co-expressions, providing non-local exits for program execution. See The Icon Programming Language and also Shamim Mohamed's article Co-expressions in Icon.
