Jump to content

List comprehension

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by BG19bot (talk | contribs) at 23:45, 26 January 2016 (BRFA test 2 - double blank lines using AWB (11842)). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

A list comprehension is a syntactic construct available in some programming languages for creating a list based on existing lists. It follows the form of the mathematical set-builder notation (set comprehension) as distinct from the use of map and filter functions.

Overview

Consider the following example in set-builder notation.

This can be read, " is the set of all numbers "2 times " where is an item in the set of natural numbers (), for which squared is greater than ."

The smallest natural number, x = 1, fails to satisfy the condition x2>3 (the condition 12>3 is false) so 2 ·1 is not included in S. The next natural number, 2, does satisfy the condition (22>3) as does every other natural number. Thus S consists of 2 ·2, 2 ·3, 2 ·4... so S = 4, 6, 8, 10,... ie, all even numbers greater than 2.

In this annotated version of the example:

  • is the variable representing members of an input set.
  • represents the input set, which in this example is the set of natural numbers
  • is a predicate expression acting as a filter on members of the input set.
  • is an output expression producing members of the new set from members of the input set that satisfy the predicate expression.
  • braces indicate that the result is a set
  • the vertical bar and the comma are separators.

A list comprehension has the same syntactic components to represent generation of a list in order from an input list or iterator:

  • A variable representing members of an input list.
  • An input list (or iterator).
  • An optional predicate expression.
  • And an output expression producing members of the output list from members of the input iterable that satisfy the predicate.

The order of generation of members of the output list is based on the order of items in the input.

In Haskell's list comprehension syntax, this set-builder construct would be written similarly, as:

s = [ 2*x | x <- [0..], x^2 > 3 ]

Here, the list [0..] represents , x^2>3 represents the predicate, and 2*x represents the output expression.

List comprehensions give results in a defined order (unlike the members of sets); and list comprehensions may generate the members of a list in order, rather than produce the entirety of the list thus allowing, for example, the previous Haskell definition of the members of an infinite list.

History

The SETL programming language (later 1960s) had a set formation construct, and the computer algebra system AXIOM (1973) has a similar construct that processes streams, but the first use of the term "comprehension" for such constructs was in Rod Burstall and John Darlington's description of their functional programming language NPL from 1977.

Smalltalk block context messages which constitute list comprehensions have been in that language since at least Smalltalk-80.

Burstall and Darlington's work with NPL influenced many functional programming languages during the 1980s, but not all included list comprehensions. An exception was the influential pure lazy functional programming language Miranda, which was released in 1985. The subsequently developed standard pure lazy functional language Haskell includes many of Miranda's features, including list comprehensions.

Comprehensions were proposed as a query notation for databases[1] and were implemented in the Kleisli database query language.[2]

Examples in different programming languages

The following provides a few examples of specific syntax used in programming languages.

Although the original example denotes an infinite list, few languages can express that, so in some of those cases we show how to take a subset of rather than a subset of .

B-Prolog

L @= [Y : X in 1..100, [Y], (X*X>3, Y is 2*X)]

A list of the form [T : E1 in D1, ..., En in Dn, LocalVars, Goal] is interpreted as a list comprehension in calls to @=/2 and constraints. A list comprehension is translated into a foreach construct with an accumulator.

C#

from x in Enumerable.Range(0, 100)
where x * x > 3
select x * 2

C# lazily generates results on demand. Results can be automatically processed in parallel on a multi-core system using Parallel LINQ.

Ceylon

{ for (x in 0..100) if ( x**2 > 3) x * 2 }

Clojure

Clojure generates infinite lazy sequences (similar to Haskell's lazy lists or Python's generators). Use take to get the first N results from the infinite sequence.

 (take 20
   (for [x (range) :when (> (* x x) 3)]
     (* 2 x)))
 ;; ⇒ (4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42)

An example without the infinite sequence:

 (for [x (range 20) :when (> (* x x) 3)]
   (* 2 x))

CoffeeScript

CoffeeScript brings list comprehensions to JavaScript.

 (x * 2 for x in [0..20] when x*x > 3)

Common Lisp

List comprehensions can be expressed with the loop macro's collect keyword. Conditionals are expressed with if, as follows:

(loop for x from 0 to 100 if (> (* x x) 3) collect (* 2 x))

An infinite lazy sequence can be created in a variety of ways, such as the CLOS object system or a yield macro.

Elixir

The same example in Elixir:

for x <- 0..100, x*x > 3, do: x*2

Erlang

The same example in Erlang:

 S = [2*X || X <- lists:seq(0,100), X*X > 3].

F#

The F# generator comprehension has the list comprehension syntax elements. Generator comprehensions can be used to generate Lists, Sequences (lazy lists) and Arrays (not discussed here).

Generators are of the form [for x in collection do ... yield expr] for lists and seq {for x in collection do ... yield expr} for sequences. For example:

> seq { for x in 0..100 do
          if x*x > 3 then yield 2*x } ;;
val it : seq<int> = seq [4; 6; 8; 10; ...]

Falcon

The "comp" generic method family provides wide support for comprehension. For example, the "mfcomp" method can be applied to an array:

  s = [].mfcomp( { i => if i*i > 3: return 2*i; return oob(1)}, [1:101] )

Falcon can also use functional generators to provide input lists. For example, the following code uses a continuation to create a set of pairs.

  gen = Continuation( function( max, c )
         i = 0
         while i < max: c(++i)
         return oob(0)
      end )
  data = [10,11,12]
  s = Set().mfcomp( {x, y => x+y}, .[gen 3], data )

Method "comp" was introduced in version 0.9.6, and methods "mcomp" and "mfcomp" in version 0.9.6.2.

Groovy

Groovy supports list comprehension style expressions for any kind of Java Collection including lists, sets, and maps.

s = (1..100).grep { it ** 2 > 3 }.collect { it * 2 }

The "it" variable is shorthand for the implicit parameter to a closure. The above is equivalent to:

s = (1..100).grep { x -> x ** 2 > 3 }.collect { x -> x * 2 }

Haskell

Please refer to the main example in the overview.

s = [ 2*x | x <- [0..], x^2 > 3 ]

Here, the list [0..] generates natural numbers one by one which get bound to variable x, x^2>3 represents the predicate that either accepts or rejects a given variable's value, and 2*x represents the result expression. There might be several generators and test predicates in one list comprehension expression in Haskell, in effect defining nested loops, e.g.:

s = [ 2*x*y | x <- [0..], x^2 > 3, y <- [1,3..x], y^2 < 100-x^2]
--   for each x from 0 by 1: 
--     if x^2 > 3:
--       for each y from 1 by 2 upto x:
--         if y^2 < 100 - x^2:
--           produce 2*x*y

The above expression becomes unproductive ("stuck") at some point, when new xs keep being generated only to be rejected later on. This is so because any test can only reject a value it is given, not any future ones (there is no cut mechanism here, in Prolog terms - a generator in general might produce its values unordered, like e.g. the above expression itself). This can be dealt with using bounded list generators always or by enclosing a generator inside a take or takeWhile call, limiting the number of generated values.

Haxe

Haxe 3 released with array and map comprehension.[3]

  var s = [for(x in [0, 1, 2, 3, 4, 5, 6, 7]) if(x * x < 5) x];

However, Haxe 2's syntax required use of Lambda:

  var a = [0, 1, 2, 3, 4, 5, 6, 7];
  var s = Lambda.array(Lambda.filter(a, function(x) return x * x < 5));

JavaScript 1.7

JavaScript 1.7 has array comprehensions. The SpiderMonkey JavaScript engine of the popular Firefox browser from Mozilla Foundation supports them.[4] For example:

 js> [2*x for each (x in [0,1,2,3,4,5,6,7]) if (x*x<5)]
 [0, 2, 4]

The sequence of integers can be obtained by prototyping the Number object,

Number.prototype.__iterator__=function(){for (var i=0; i<this; i++) yield i}
var s = [2*x for (x in 100) if (x*x<5)]

Or introducing a range function,

var range = function(start,end){for (var i=start; i<=end; i++) yield i}
var s = [2*x for (x in range(0,100)) if (x*x<5)]

Julia

Julia supports comprehensions using the syntax:

 y = [x^2+1 for x in 1:10]

and multidimensional comprehensions like:

 z = [(x-5)^2+(y-5)^2 for x = 0:10, y = 0:10]

Mathematica

The Cases command with a RuleDelayed in the second argument provides a list comprehension mechanism:

  s = Cases[Range[0,100], i_ /; i^2 > 3 :> 2i]

Alternatively

  Table[If[i^2 > 3, 2i, Unevaluated[]], {i, 0, 100}]

  Do[If[i^2 > 3, Sow[2i]], {i, 0, 100}] // Reap

OCaml

- With Batteries

OCaml Batteries Included has uniform comprehension syntax for lists, arrays, enumerations (like streams), lazy lists (like lists but evaluated on-demand), sets, hashtables, etc.

Comprehension are of the form [? expression | x <- enumeration ; condition; condition ; ... ?].

For instance,

#   [? 2 * x | x <- 0 -- max_int ; x * x > 3 ?];;
- : int Enum.t = <abstr>

or, to compute a list,

#   [? List: 2 * x | x <- 0 -- 5 ; x * x > 3 ?];;
- : int list = [4; 6; 8; 10]

or, to compute a set,

#   [? PSet: 2 * x | x <- 0 -- 5 ; x * x > 3 ?];;
- : int PSet.t = <abstr>

etc.

For example :

#    #use "topfind";;
#    #require "batteries";;
#    #require "pa_comprehension";;
#    module Enum = BatEnum;;
#    [? 2 * x | x <- 0 -- 10; (x * x) mod 3 = 0 ?] |> Enum.iter (fun n -> Printf.printf "%d\n" n);;
0, 6, 12, 18,
- : unit = ()

- With Camlp4 extension

You can also use the Camlp4 extension :

#   #use "topfind";;
#   #camlp4o;;
#   #load "Camlp4Parsers/Camlp4ListComprehension.cmo";;
#   let rec range a b = if a > b then [] else a :: (range (a+1) b);;
- : val range : int -> int -> int list = <fun>
#   let triples n = [ (a,b,c) | a <- range 0 n; b <- range a n; c <- range b n; a*a + b*b = c*c ];;
- : val triples : int -> (int * int * int) list = <fun>
#   triples 10;;
- : (int * int * int) list = [(0, 0, 0); (0, 1, 1); (0, 2, 2); (0, 3, 3); (0, 4, 4); (0, 5, 5); (0, 6, 6); (0, 7, 7); (0, 8, 8); (0, 9, 9); (0, 10, 10); (3, 4, 5); (6, 8, 10)]

Octave

GNU Octave can do list (vector) comprehensions in the form (vector expression)(vector condition).

For example,

 octave:1> x=0:100; s=(2*x)(x.**2<5)
 s =
  0 2 4

Perl 6

Perl 6 provides more than one way to implement list comprehensions.

my @s = ($_ * 2 if $_ ** 2 > 3 for ^100);

Or, using gather:

my @s = gather { for ^100 { take 2 * $_ if $_ ** 2 > 3 } };

Picat

[2*X : X in 1..100, X*X>3]

A list comprehension in Picat takes the form [T : E1 in D1, Cond1, ..., En in Dn, Condn]. A list comprehension is compiled into a foreach loop, which is further compiled into a tail-recursive predicate.

PowerShell

0..100 | Where {$_ * $_ -gt 3} | ForEach {$_ * 2}

Pure

The same example in Pure:

s = [2*n | n=1..100; n*n > 3];

Python

The Python programming language (starting in version 2.0) has a corresponding syntax for expressing list comprehensions.[5] The Python equivalent of the above example is

s = [2 * x for x in range(101) if x ** 2 > 3]

Racket

Racket provides functional versions of for-loops, which are essentially list comprehension syntax:

;; (in-range 11) would also work. 
;; in-range produces a stream; range produces a list. 
;; Since its finite, either is fine here.
(for/list ([x (range 101)] #:when (> (* x x) 3))
  (* 2 x))

The imperative for can also be used, combined with Racket's generator library to produce an infinite generator:

(require racket/generator)
(generator ()
  (for ([x (in-naturals)] #:when (> (* x x) 3))
    (yield (* 2 x))))

Ruby

In the Ruby language you can use multiple ways to simulate this function, for example:

(1..100).select{|x| x ** 2 > 3 }.collect{|x| 2 * x}

Or you can define your own method:

module Enumerable
  def comprehend(&block)
    return self if block.nil?
    collect(&block).compact
  end
end

(1..100).comprehend {|x| 2 * x if x ** 2 > 3}

Scala

Using a for-expression:

val s = for (x <- Stream from 0 if x * x > 3) yield 2 * x
val s = for {x <- 0 to 100 if x * x > 3} yield  x * 2

Using a collections:

val s = (0 to 100).filter(x => x * x > 3).map(_ * 2)

Scheme

Although there is no standard list comprehension syntax in R5RS, many implementations provide an extension for this. For example, in Chicken Scheme:

(require-extension list-of)
(list-of (* 2 x) (x range 0 101) (> (* x x) 3))

There is also a portable library SRFI/42 "Eager Comprehensions", which in particular includes list comprehensions:

(require srfi/42) ; example import into Racket Scheme
(list-ec (: x 101) (if (> (* x x) 3)) (* 2 x))

SETL

s := {2*x : x in {0..100} | x**2 > 3 };

Smalltalk

((1 to: 100) select: [:x|x*x>3]) collect: [:x|2*x]

SuperCollider

In SuperCollider list comprehensions are implemented as Routines, whose results can be collected with the message 'all'. A shortcut syntax is provided for defining list comprehensions, which internally translates to a routine.

all {: x * 2, x <- (1..100), x ** 2 > 3 }

Visual Prolog

S = [ 2*X || X = std::fromTo(1, 100), X^2 > 3 ]

Similar constructs

Monad comprehension

In Haskell, a monad comprehension is a generalization of the list comprehension to other monads in functional programming.

Set comprehension

Version 3.x and 2.7 of the Python language introduces syntax for set comprehensions. Similar in form to list comprehensions, set comprehensions generate Python sets instead of lists.

>>> s = {v for v in 'ABCDABCD' if v not in 'CB'}
>>> print(s)
{'A', 'D'}
>>> type(s)
<class 'set'>
>>>

Racket set comprehensions generate Racket sets instead of lists.

(for/set ([v "ABCDABCD"] #:unless (member v (string->list "CB")))
         v))

Dictionary comprehension

Version 3.x and 2.7 of the Python language introduced a new syntax for dictionary comprehensions, similar in form to list comprehensions but which generate Python dicts instead of lists.

>>> s = {key: val for key, val in enumerate('ABCD') if val not in 'CB'}
>>> s
{0: 'A', 3: 'D'}
>>>

Racket hash table comprehensions generate Racket hash tables (one implementation of the Racket dictionary type).

(for/hash ([(val key) (in-indexed "ABCD")]
           #:unless (member val (string->list "CB")))
  (values key val))

Parallel list comprehension

The Glasgow Haskell Compiler has an extension called parallel list comprehension (also known as zip-comprehension) that permits multiple independent branches of qualifiers within the list comprehension syntax. Whereas qualifiers separated by commas are dependent ("nested"), qualifier branches separated by pipes are evaluated in parallel (this does not refer to any form of multithreadedness: it merely means that the branches are zipped).

-- regular list comprehension
a = [(x,y) | x <- [1..5], y <- [3..5]]
-- [(1,3),(1,4),(1,5),(2,3),(2,4) ...

-- zipped list comprehension
b = [(x,y) | (x,y) <- zip [1..5] [3..5]]
-- [(1,3),(2,4),(3,5)]

-- parallel list comprehension
c = [(x,y) | x <- [1..5] | y <- [3..5]]
-- [(1,3),(2,4),(3,5)]

Racket's comprehensions standard library contains parallel and nested versions of its comprehensions, distinguished by "for" vs "for*" in the name. For example, the vector comprehensions "for/vector" and "for*/vector" create vectors by parallel versus nested iteration over sequences. The following is Racket code for the Haskell list comprehension examples.

> (for*/list ([x (in-range 1 6)] [y (in-range 3 6)]) (list x y))
'((1 3) (1 4) (1 5) (2 3) (2 4) (2 5) (3 3) (3 4) (3 5) (4 3) (4 4) (4 5) (5 3) (5 4) (5 5))
> (for/list ([x (in-range 1 6)] [y (in-range 3 6)]) (list x y))
'((1 3) (2 4) (3 5))

In Python we could do as follows:

# regular list comprehension
>>> a = [(x, y) for x in range(1, 6) for y in range(3, 6)]
[(1, 3), (1, 4), (1, 5), (2, 3), (2, 4), ...

# parallel/zipped list comprehension
>>> b = [x for x in zip(range(1, 6), range(3, 6))]
[(1, 3), (2, 4), (3, 5)]

XQuery and XPath

Like the original NPL use, these are fundamentally database access languages.

This makes the comprehension concept more important, because it is computationally infeasible to retrieve the entire list and operate on it (the initial 'entire list' may be an entire XML database).

In XPath, the expression:

 /library/book//paragraph[@style='first-in-chapter']

is conceptually evaluated as a series of "steps" where each step produces a list and the next step applies a filter function to each element in the previous step's output.[6]

In XQuery, full XPath is available, but FLWOR statements are also used, which is a more powerful comprehension construct.[7]

 for $b in //book
 where $b[@pages < 400]
 order by $b//title
 return
   <shortBook>
     <title>{$b//title}</title>
     <firstPara>{($book//paragraph)[1]}</firstPara>
   </shortBook>

Here the XPath //book is evaluated to create a sequence (aka list); the where clause is a functional "filter", the order by sorts the result, and the <shortBook>...</shortBook> XML snippet is actually an anonymous function that builds/transforms XML for each element in the sequence using the 'map' approach found in other functional languages.

So, in another functional language the above FLWOR statement may be implemented like this:

 map(
   newXML(shortBook, newXML(title, $1.title), newXML(firstPara, $1...))
   filter(
     lt($1.pages, 400),
     xpath(//book)
   )
 )

LINQ in C#

C# 3.0 has a group of related features called LINQ, which defines a set of query operators for manipulating object enumerations.

var s = Enumerable.Range(0, 100).Where(x => x*x > 3).Select(x => x*2);

It also offers an alternative comprehension syntax, reminiscent of SQL:

var s = from x in Enumerable.Range(0, 100) where x*x > 3 select x*2;

LINQ provides a capability over typical List Comprehension implementations. When the root object of the comprehension implements the IQueryable interface, rather than just executing the chained methods of the comprehension, the entire sequence of commands are converted into an Abstract Syntax Tree (AST) object, which is passed to the IQueryable object to interpret and execute.

This allows, amongst other things, for the IQueryable to

  • rewrite an incompatible or inefficient comprehension
  • translate the AST into another query language (e.g. SQL) for execution

C++

C++ does not have any language features directly supporting list comprehensions but operator overloading (e.g., overloading |, >>, >>=) has been used successfully to provide expressive syntax for "embedded" query DSLs. Alternatively, list comprehensions can be constructed using the erase-remove idiom to select elements in a container and the STL algorithm for_each to transform them.

#include <algorithm>
#include <list>

using namespace std;

template<class C, class P, class T>
C&& comprehend(C&& source, const P& predicate, const T& transformation)
{
  // initialize destination
  C d = forward(source);

  // filter elements
  d.erase(remove_if(begin(d), end(d), predicate), end(d));

  // apply transformation
  for_each(begin(d), end(d), transformation);

  return d;
}

int main()
{
  list<int> range(10);  
      // range is a list of 10 elements, all zero
  iota(begin(range), end(range), 1);
      // range now contains 1,2,...,10

  list<int> result = comprehend(
      range,
      [](int x){return x*x <= 3;},
      [](int &x){x *= 2;});
      // result now contains 4,6,...,20
}

There is some effort in providing C++ with list-comprehension constructs/syntax similar to the set builder notation.

  • In Boost.Range [1] library there is a notion of adaptors [2] that can be applied to any range and do filtering, transformation etc. With this library, the original Haskell example would look like (using Boost.Lambda [3] for anonymous filtering and transforming functions):
counting_range(1,10) | filtered( _1*_1 > 3 ) | transformed(ret<int>( _1*2 ))

Full example is here: http://codepad.org/y4bpgLJu

  • This[8] implementation uses a macro and overloads the << operator. It evaluates any expression valid inside an 'if', and any variable name may be chosen. It's not threadsafe, however. Usage example:
list<int> N;
list<double> S;

for (int i = 0; i < 10; i++)
     N.push_back(i);

S << list_comprehension(3.1415 * x, x, N, x*x > 3)
  • This[9] implementation provides head/tail slicing using classes and operator overloading, and the | operator for filtering lists (using functions). Usage example:
bool even(int x) { return x % 2 == 0; }
bool x2(int &x) { x *= 2; return true; }

list<int> l, t;
int x, y;

for (int i = 0; i < 10; i++)
     l.push_back(i);

(x, t) = l | x2;
(t, y) = t;

t = l < 9;
t = t < 7 | even | x2;
  • Language for Embedded Query and Traversal (LEESA[10]) is an embedded DSL in C++ that implements X-Path-like queries using operator overloading. The queries are executed on richly typed xml trees obtained using xml-to-c++ binding from an XSD. There is absolutely no string encoding. Even the names of the xml tags are classes and therefore, there is no way for typos. If a LEESA expression forms an incorrect path that does not exist in the data model, the C++ compiler will reject the code.

Consider a catalog xml.

<catalog>
  <book>
    <title>Hamlet</title>
    <price>9.99</price>
    <author>
      <name>William Shakespeare</name>
      <country>England</country>
    </author>
  </book>
  <book>...</book>
...
</catalog>

LEESA provides >> for X-Path's / separator. Interestingly, X-Path's // separator that "skips" intermediate nodes in the tree is implemented in LEESA using what's known as Strategic Programming. In the example below, catalog_, book_, author_, and name_ are instances of catalog, book, author, and name classes, respectively.

// Equivalent X-Path: "catalog/book/author/name"
std::vector<name> author_names = 
evaluate(root, catalog_ >> book_ >> author_ >> name_);

// Equivalent X-Path: "catalog//name"
std::vector<name> author_names = 
evaluate(root, catalog_ >> DescendantsOf(catalog_, name_));

// Equivalent X-Path: "catalog//author[country=="England"]"
std::vector<name> author_names = 
evaluate(root, catalog_  >> DescendantsOf(catalog_, author_)
                         >> Select(author_, [](const author & a) { return a.country()=="England"; })
                         >> name_);

See also

Notes and references

  1. ^ Comprehensions, a query notation for DBPLs
  2. ^ The functional guts of the Kleisli query system
  3. ^ http://haxe.org/manual/haxe3/features#array-comprehension
  4. ^ https://developer.mozilla.org/en-US/docs/JavaScript/Guide/Predefined_Core_Objects#Array_comprehensions
  5. ^ Warsaw, Barry (2 October 2008). "PEP 202 – List Comprehensions". Python Software Foundation.
  6. ^ "2.1 Location Steps". XML Path Language (XPath). W3C. 16 November 1999.
  7. ^ "XQuery FLWOR Expressions". W3Schools.
  8. ^ "Single-variable List Comprehension in C++ using Preprocessor Macros".
  9. ^ "C++ list comprehensions".
  10. ^ "Language for Embedded Query and Traversal (LEESA)".

Haskell

OCaml

Python

Common Lisp

Clojure

Axiom