Jump to content

Scheme (programming language)

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 80.56.174.125 (talk) at 19:19, 26 November 2008 (→‎External links). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Scheme
Paradigmmulti-paradigm
Designed byGuy L. Steele and Gerald Jay Sussman
First appeared1970s
Typing disciplinestrong, dynamic
Filename extensions.scm, .ss
Websitewww.scheme-reports.org
Major implementations
PLT Scheme, MIT/GNU Scheme, Scheme 48, Chicken, Gambit, Gauche, Guile, Bigloo, Chez Scheme, STk, STklos, Larceny, SCM, Kawa, Ikarus, Mosh, Ypsilon
Dialects
T
Influenced by
Lisp, ALGOL
Influenced
Common Lisp, JavaScript, Ruby, Dylan, Lua

Scheme is a multi-paradigm programming language. It is one of the two main dialects of Lisp and supports a number of programming paradigms but is best known for its support of functional programming. It was developed by Guy L. Steele and Gerald Jay Sussman in the 1970s. Scheme was introduced to the academic world via a series of papers now referred to as Sussman and Steele's Lambda Papers. There are two standards that define the Scheme language: the official IEEE standard, and a de facto standard called the Revisedn Report on the Algorithmic Language Scheme, nearly always abbreviated RnRS, where n is the number of the revision. The most widely implemented standard is R5RS[1], and on August 28, 2007, R6RS,[2] the next major revision of the Scheme language was ratified,[3] with about two thirds of the voters in favor of R6RS.

Scheme's philosophy is minimalist. Scheme provides as few primitive notions as possible and, where practical, lets everything else be provided by programming libraries.

Scheme was the first dialect of Lisp to choose static (a.k.a. lexical) over dynamic variable scope. It was also one of the first programming languages to support first-class continuations.

Origin

Scheme started as an attempt to understand Carl Hewitt's Actor model.[4] Scheme was originally called "Schemer", in the tradition of other Lisp-derived languages like Planner or Conniver. The current name resulted from the authors' use of the ITS operating system, which limited filenames to two components of at most six characters each. Currently, "Schemer" is commonly used to refer to a Scheme programmer.

Distinguishing features

Like all Lisp dialects, Scheme has a very simple syntax. There are no operator precedence rules because fully nested and parenthesized notation is used for all compound forms. Example (the recursive factorial function):

 (define (fact n)
    (if (= n 0)
        1
        (* n (fact (- n 1)))))

Scheme is a minimalist language. The R5RS language standard[1] is only 50 pages, including a denotational semantics for the language core. The latest revision of the standard, R6RS, has been expanded[2] to describe several libraries.

In contrast with Common Lisp, Scheme is a "Lisp-1". All data and functions share a common namespace in Scheme, whereas in Common Lisp functions and data have separate namespaces and it is thus possible (in Common Lisp) for a function and a variable to have the same name.

Procedures in Scheme are first-class values, as are continuations. Scheme's call-with-current-continuation procedure (also known as call/cc) captures the current continuation, enabling the programmer to create non-local control constructs that must be built into other languages, such as iterators, coroutines, and backtracking.

A simple use of call/cc is as follows:

(define (add-if-all-numbers lst)
  (call/cc
    (lambda (exit)
      (let loop ((lst lst) (sum 0))
        (if (null? lst) sum
          (if (not (number? (car lst))) (exit #f)
            (loop (cdr lst) (+ sum (car lst)))))))))

This adds an arbitrary list of numbers, but if a non-numeric value is found in the list the procedure is aborted immediately and the constant value #f (false) is returned. This is achieved by capturing the current continuation in the variable exit and using it as an "escape procedure".

Scheme supports lazy evaluation through the delay and force forms.

(define a 10)
(define eval-aplus2 (delay (+ a 2)))
(define a 20)
(force eval-aplus2)
=> 22

delay and force have been the subject of much discussion within the Scheme community because implementing many popular forms of lazy evaluation is actually quite difficult using the Scheme primitives.[5] For example, a Scheme Request For Implementation, SRFI-40, describes a "streams" library which defines a lazily-evaluated list type; this was withdrawn by its author, Philip L. Bewig, as a result of discussion that unveiled a serious space leak in the specification. The revised version, SRFI-41, is currently in draft status.[6]

Scheme's high level macro system allows the user to add new syntactic constructs to the language. It respects the lexical scoping of the rest of the language, which avoids common programming errors that can occur in the macro systems of other programming languages. Many implementations also provide a more conventional low level macro system.

Tail recursion

Scheme has looping constructs, but it is idiomatic to use tail recursion to express loops. Scheme implementations are required to optimize tail calls to run in constant space.[1]

Taking the factorial example above:

 (define (fact n)
    (if (= n 0)
        1
        (* n (fact (- n 1)))))

This is not tail recursive because factorial n is evaluated recursively by first evaluating factorial n-1 as an intermediate value, then multiplying the result by n. The last operation in the evaluation (the "tail") is the multiplication.

A tail recursive version can be written as follows:

  (define (fact n)
    (define (fact2 n m)
      (if (= n 0)
          m
          (fact2 (- n 1) (* m n))))
    (fact2 n 1))

Although this is written in a recursive form, the recursion is the last operation in evaluating the procedure (a "tail call"), and in effect replaces the procedure invocation by another (for instance, (fact2 10 1) is replaced by (fact2 9 10)).

A tail call is sometimes described as "a goto with parameters" because its effect is the same as branching to the start of the procedure and replacing the old parameters with new ones. It is this characteristic that makes it possible for Scheme compilers and interpreters to guarantee that tail recursive procedures will always be evaluated in constant space.

Language elements

Comments

Each comment is preceded by a semicolon (;) and extends for the rest of the line. Some implementations allow comments to span multiple lines by wrapping them with a #|...|# (possibly nested). Other implementations allow an entire s-expression to be commented out by prepending it with #;.[7] These two comment forms are included in the R6RS.

Variables

Variables are dynamically typed. Variables are bound by a define, a let expression, and a few other Scheme forms. Variables bound at the top level with a define are in global scope.

  (define var1 value)

Variables bound in a let are in scope for the body of the let.

  (let ((var1 value))
    ...
    ; scope of var1
    ...)

let is a convenient syntax that is not fundamentally necessary. A let expression can be implemented using procedures directly. For example, the above is equivalent to:

  ((lambda (var1)
    ...
    ; scope of var1
    ...) value)

Functions

1 (define fun
   (lambda (arg1 arg2)
     ...))
2 (define (fun arg1 arg2)
   ...)
3 (fun value1 value2)
4 (apply fun (list value1 value2))

Functions (often called procedures) are first-class objects in Scheme. They can be arguments to other functions and be returned by them. They can be assigned to variables. Functions are created by lambda forms. For example, a function with two arguments arg1 and arg2 is defined in line 1. Line 2 is a shorter, equivalent form; line 3 shows how functions are applied. Note that the function being applied is in the first position of the list while the rest of the list contains the arguments. The apply function will take its first argument and apply it to a given list of arguments, so the previous function call can also be written as seen in line 4.

In Scheme, functions are divided into two basic categories: procedures and primitives. All primitives are procedures, but not all procedures are primitives. Primitives are pre-defined functions in the Scheme language. These include +, -, *, /, set!, car, cdr, and other basic procedures. Procedures are user-defined functions. In several variations of Scheme, a user can redefine a primitive. For example, the code

(define (+ x y)
  (- x y))

or simply

(define + -)

actually redefines the + primitive to perform subtraction, rather than addition.

Lists

Scheme uses the singly-linked list data structure, implemented using a primitive data type called the pair, with accessors: getters car and cdr and setters set-car! and set-cdr!. list-ref provides access to an arbitrary member of a list, length gives its length, and the list constructor is list. There are also procedures to reverse a list, to obtain the tail of a list, to check for list membership, and to perform key-value lookups (association lists).

Data types

Besides procedures, continuations, pairs and lists, Scheme provides the following data types: atomic symbols, numbers, booleans, characters, strings, vectors and input/output ports.[1] Association lists are provided by standard procedures, and many Scheme implementations also offer hash tables and such structures.[8] Extensions are standardized through a system of "Scheme Requests for Implementation" (SRFIs).[9]

Since the IEEE Scheme standard and the R4RS Scheme standard, Scheme has asserted that all of the above types are disjoint, that is no value can belong to more than one of these types; however some very old implementations of Scheme may predate these standards such that #f and '() refer to the same value, as is the case in traditional Lisp including Common Lisp. All currently active implementations use the R4RS interpretation.

The numeric type is further divided into a numerical tower, with subtypes complex, real, rational and integer. (Note that these subtypes are not disjoint; in fact each type is a subset of the previous one). While it is not required that a Scheme implementation support the entire numerical tower, most implementations do. In addition to these traditional properties, Scheme numbers may have the property of "exactness". Integers and rational numbers are exact. An arithmetic operation involving numbers one or more of which is inexact has an inexact result.[1]

The boolean type represents true and false by #t and #f respectively. For historical reasons, however, any value can be used where a boolean is expected - any value other than #f is considered to be true, including the empty list (in traditional Lisp and Common Lisp, the empty list is considered to be false).[1]

Symbols can be created in at least the following ways:

'hello
(string->symbol "hello")

Symbols have historically been regarded as case-insensitive ('Aa is the same symbol as 'AA) and this was guaranteed in the standard up to R5RS, but many Scheme implementations have provided case-sensitive symbols, and a major change in R6RS is to switch to case-sensitive symbols as standard. Implementation of case-insensitivity is a relatively trivial matter, usually involving only a conversion of the case of incoming symbols in the reader procedure which serves as the lexical scanner and parser in most Scheme implementations.

Equality

Scheme has three different types of equality between arbitrary objects denoted by three different relational operators for testing equality, eq?, eqv? and equal?:

  • eq? evaluates to #f unless its parameters represent the same data object in memory;
  • eqv? is generally the same as eq? but treats primitive objects (eg. characters and numbers) specially so that numbers that represent the same value are eqv? even if they do not refer to the same object;
  • equal? compares data structures such as lists, vectors and strings to determine if they have congruent structure and eqv? contents.[1]

Type dependent equivalence operations also exist in Scheme: string=?; compares two strings; char=? compares characters; = compares numbers.[1]

Control structures

Conditional evaluation

(if test then-expr else-expr)

The test expression is evaluated, and if the evaluation result is true (anything other than #f) then the then-expr is evaluated, otherwise else-expr is evaluated.

A form that is more convenient when conditionals are nested is cond:

(cond (test1 expr1 ...)
      (test2 expr2 ...)
      ...
      (else exprn))

The first expression for which the test evaluates to true will be evaluated. If all tests result in #f, the else clause is evaluated.

A variant of the cond clause is

(cond ...
      (test => expr)
      ...)

In this case, expr should evaluate to a function that takes one argument. If test evaluates to true, the function is called with the return value of test.

(case expr0 ...
      ((datum1) expr1a expr1b...)
      ((datum2) expr2a expr2b...)
      ...
      (else exprna exprnb...))

The expression is evaluated and compared, in sequence, to each datum. If a match is found (using eqv?) then the corresponding sequence of expressions is evaluated in turn and the result of the case is the value of the final expression. If no match is found, the else arm of the case is evaluated. The else clause may be omitted altogether, in which case the value of the expression is unspecified if there is no match.

(and expr1 expr2...)
(or expr1 expr2...)

And and or are counted as conditionals in R5RS because they are frequently used for this purpose in actual code. And evaluates its operands from left to right until it gets to the end or one of them evaluates to the value #f. The form evaluates to the value of the last-evaluated operand. Or has the same semantics with the exception that it stops when it evaluates a value that is not #f. They are similar to the C short-circuit evaluation operators && and ||, which are also found in many programming languages such as Java and Perl, where they are also often used for conditional evaluation.

Input/output

Scheme has the concept of ports to read from or to write to.[1] R5RS defines two default ports, accessible with the functions current-input-port and current-output-port, which correspond to the Unix notions of stdin and stdout. Most implementations also provide current-error-port. Redirection of input and standard output is supported in the standard, by standard procedures such as with-input-from-file and with-output-to-file.

Implementations

Current implementations include: Bigloo, Chez Scheme, Chicken, Gambit, Gauche, Guile, Ikarus, JScheme, Kawa, Larceny, MIT/GNU Scheme, Mosh, PLT Scheme, Pvts, RScheme, Scheme 48, SCM, SISC, Stalin, STk, STklos, TinyScheme, Ypsilon.

Almost all implementations provide a traditional Lisp-style read-eval-print loop for development and debugging. Most also compile Scheme programs to executable binary. Support for embedding Scheme code in programs written in other languages is also common, as the relative simplicity of Scheme implementations make Scheme a popular choice for adding scripting capabilities to larger systems developed in languages such as C. Gambit, Chicken and Bigloo work by compiling Scheme to C, which makes embedding particularly easy. In addition, Bigloo's compiler can be configured to generate JVM bytecode, and it also features an experimental bytecode generator for .Net.

Some implementations support additional features. For example, Kawa and JScheme provide integration with Java classes, and the Scheme to C compilers often make it easy to use external libraries written in C, up to allowing the embedding of actual C code in the Scheme source. Another example is PVTS which offers a set of visual tools for supporting the learning of Scheme.

R6RS

A new language standardization process began at the 2003 Scheme workshop, with the goal of producing an R6RS standard in 2006. This process broke with the earlier RnRS approach of unanimity.

R6RS features a standard module system, allowing a split between the core language and libraries. A number of drafts of the R6RS specification were released, the final version being R5.97RS. A successful vote resulted in the ratification of the new standard, announced on August 28, 2007.[2]

Currently the newest releases of various Scheme implementations, e.g. Ikarus, Larceny, PLT Scheme, and Ypsilon support the R6RS standard. There is a portable reference implementation of the proposed implicitly-phased libraries for R6RS, loading and bootstrapping itself properly on various older Scheme implementations.[10]

R6RS introduces numerous significant changes to the language, which include the following: [11]

  • The source code is now specified in Unicode, and a large subset of unicode characters may now appear in Scheme symbols and identifiers, and there are other minor changes to the lexical rules.
  • Two new styles for comments have been added for multiline comments and expression comments.
  • Character data is now specified in Unicode.
  • Many standard procedures have been moved to the new standard libraries, which themselves form a large expansion of the standard, containing procedures and syntactic forms that were formerly not part of the standard. Supporting the libraries and the production of further libraries.
  • The new standard introduces a module system.
  • An exception system is now specified.
  • A more expressive syntactic abstraction facility (syntax-case) has been added which allows the use of all of Scheme at macro expansion time. Previously, only a limited pattern-based language (syntax-rules) was required.
  • Compliant implementations are now required to support Scheme's full numeric tower, and the semantics of numbers have been expanded, mainly in the direction of support for the IEEE 754 standard for floating point numerical representation.

Usage

Scheme is widely used by a number[12] of schools; in particular, a number of introductory Computer Science courses use Scheme in conjunction with the textbook Structure and Interpretation of Computer Programs.[13] For the past 12 years, PLT has run the TeachScheme! project, which has exposed close to 600 high school teachers and thousands of high school students to rudimentary Scheme programming. MIT's old introductory programming class 6.001 was taught in Scheme, but that class has been replaced with 6.01, which uses Python.[14] Caltech's introductory computer course CS1 is taught using Scheme. The introductory class at UC Berkeley, CS 61A, is taught entirely in Scheme, save minor diversions into Logo to demonstrate dynamic scope; all course materials, including lecture webcasts, are available online free of charge.[15] The introductory computer science course at Yale is also taught in Scheme.[16] Several introductory Computer Science courses at Rice University are also taught in Scheme.[17] Programming Design Paradigms,[18] a mandatory course for the Computer science Graduate Students at Northeastern University, also extensively uses Scheme.

There are relatively few examples of Scheme in apparent usage[19] for non-pedagogical purposes. However, the Document Style Semantics and Specification Language (DSSSL), which provides a method of specifying SGML stylesheets, uses a Scheme subset.[20] In addition, the well-known open source raster graphics editor, the GIMP uses Scheme as a scripting language.[21] Guile has been adopted by GNU project as its official scripting language, and that implementation of Scheme is embedded in such applications as GNU LilyPond and GnuCash as a scripting language for extensions. Chez Scheme has been used at Disney World in Florida for controlling virtual rides (Kent Dybvig, invited talk at International Conference on Functional Programming, 2006). Elk Scheme is used by Synopsys as a scripting language for its technology CAD (TCAD) tools.[22] Shiro Kawai used Scheme to glue Final Fantasy: The Spirits Within together.[23]

See also

References

  1. ^ a b c d e f g h i Richard Kelsey, William Clinger, Jonathan Rees; et al. (1998). "Revised5 Report on the Algorithmic Language Scheme". Higher-Order and Symbolic Computation. 11 (1): 7–105. doi:10.1023/A:1010051815785. {{cite journal}}: Explicit use of et al. in: |author= (help); Unknown parameter |month= ignored (help)CS1 maint: multiple names: authors list (link)
  2. ^ a b c "R6RS.org".
  3. ^ R6RS ratification-voting results
  4. ^ "We wanted to better understand Hewitt's actors model but were having trouble relating the actors model and its unusual terminology to familiar programming notions. We decided to construct a toy implementation of an actor language so that we could play with it. Using MacLisp as a working environment, we wrote a tiny Lisp interpreter and then added mechanisms for creating actors and sending messages." Gerald Jay Sussman and Guy L. Steele, Jr. (1998). "The First Report on Scheme Revisited" (PDF). Higher-Order and Symbolic Computation. 11 (4): 399–404. doi:10.1023/A:1010079421970. ISSN 1388-3690. Retrieved 2006-06-19. {{cite journal}}: Unknown parameter |month= ignored (help)
  5. ^ srfi-45: Primitives for Expressing Iterative Lazy Algorithms by André van Tonder
  6. ^ srfi-41: Streams by Philip L. Bewig
  7. ^ Taylor Campbell (2005-07-21). "SRFI 62: S-expression comments".
  8. ^ Matthew Flatt (2006). "PLT MzScheme Language Manual". {{cite web}}: Unknown parameter |month= ignored (help)
  9. ^ Scheme Requests for Implementation, accessed 14 September, 2007
  10. ^ R^6RS Libraries and syntax-case system
  11. ^ Revised^6 Report on the Algorithmic Language Scheme, Appendix E: language changes by the R6RS Steering Committee, 26 September 2007
  12. ^ schemers.comlist of Scheme-using schools
  13. ^ MIT Press list of SICP-using schools
  14. ^ "I talked to Professor Sussman on the phone ... He said that he'd actually been trying to have 6.001 replaced for the last ten years (and I read somewhere that Professor Abelson was behind the move too). Understanding the principles is not essential for an introduction to the subject matter anymore. He sees 6.001 as obsolete." From MIT Admissions Blog, 'The End of an Era', retrieved 2008-08-05
  15. ^ Computer Science 61A, Berkeley
  16. ^ [1] Introduction to Computer Science (CPSC 201)
  17. ^ Computer Science Courses 100-400, Rice University
  18. ^ [2]Programming Design Paradigms CSG107 Course Readings
  19. ^ scheme-faq-general; see What is Scheme used for?
  20. ^ DSSSL Technology report
  21. ^ "The major scripting language for the GIMP that has been attached to it today is Scheme." From The GIMP Basic Scheme Tutorial.
  22. ^ http://www.synopsys.com/products/tcad/pdfs/sde_ds.pdf
  23. ^ http://practical-scheme.net/docs/ILC2002.html

Further reading

External links