|This article needs additional citations for verification. (October 2014)|
|Paradigms||imperative (procedural), structured, parallel|
|Parallel Studio 2010 / September 2, 2010|
|static, weak, manifest|
|Influenced by||C, Cilk|
Cilk Plus adds fine-grained task support to the C and C++ programming languages, making it easier to write parallel programs that exploit the multiple processors and vector instructions available on modern CPUs. It provides simple language extensions to express data and task parallelism to the C and C++ language. Cilk Plus can be used for building IA-32 and Intel® 64 architecture programs (32-bit and 64-bit) that run on Windows, Linux, and OS X.
Cilk plus provides support for both task and data parallelism. It is particularly well suited for, but not limited to, divide and conquer algorithms. This strategy solves problems by breaking them into sub-problems (tasks) that can be solved independently, then combining the results. Recursive functions are often used for divide and conquer algorithms.
Cilk Plus has been implemented in a number of compilers:
- Intel C++ Compiler (ICC) - Intel announced support of the Cilk Plus extensions for C and C++ with the release of the Intel compiler in Intel Composer XE 2010.
- GNU Compiler Collection (GCC) - Intel has implemented the Cilk Plus extensions in the GCC C and C++ compilers, and the full implementation (except for the feature _Cilk_for) has been merged into GCC mainline as of version 4.9. 
- Clang/LLVM – Intel announced in February 2013 that it was working on an implementation of Cilk Plus as an extension to Clang/LLVM. The work thus far is available at GitHub.
The Cilk Plus runtime library is available under two licenses:
- Commercial, when shipped with the Intel C++ Compiler.
- BSD-3 when included with the GCC C and C++ compilers and Clang/LLVM compiler. The sources are also available from the cilkplus.org website under the BSD-3 license.
Cilk Plus offers a number of improvements over OpenMP, for example guaranteed maximum memory usage scaling. The developer community and forum is located on the Cilk Plus Website.
- 1 History
- 2 The core elements
- 3 Sample programs
- 4 See also
- 5 References
- 6 External links
The Cilk programming language grew out of three separate projects at the MIT Laboratory for Computer Science:
- Theoretical work on scheduling multi-threaded applications.
- StarTech – a parallel chess program built to run on the Thinking Machines Corporation's Connection Machine model CM-5.
- PCM/Threaded-C – a C-based package for scheduling continuation-passing-style threads on the CM-5
In April 1994 the three projects were combined and christened "Cilk". The name Cilk is not an acronym, but an allusion to "nice threads" (silk) and the C programming language.
MIT Cilk is an extension of C, and implemented as a source-to-source translator. The Cilk-1 system was released in September 1994. The current implementation, Cilk-5.3, is available from the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL), though it is no longer supported.
In 2006, Cilk Arts licensed the Cilk technology from MIT with the goal of developing a commercial C++ implementation. Cilk++ v1.0 was released in December 2008 with support for both Windows* Visual Studio and GCC/C++ on Linux.
On July 31, 2009, Cilk Arts, announced that its products and engineering team were now part of Intel Corp. Intel and Cilk Arts integrated and advanced the technology further, resulting in a September 2010 release of Intel Cilk Plus. Intel Cilk Plus differs from Cilk and Cilk++ by adding array extensions, being incorporated in a commercial compiler (from Intel), compatibility with existing debuggers, and using standard calling conventions. Intel has stated its desire to refine Cilk Plus and to enable it to be implemented by other compilers to gain industry wide adoption. In November 2010, Intel published a language specification and an ABI specification to enable other compilers to implement Cilk Plus and to optionally utilize the Intel runtime.
The core elements
- cilk_for - Allows iterations of the loop body to be executed in parallel.
- cilk_spawn - Specifies that a function call can execute asynchronously, without requiring the caller to wait for it to return. This is an expression of an opportunity for parallelism, not a command that mandates parallelism. The Intel Cilk Plus runtime will choose whether to run the function in parallel with its caller.
- cilk_sync - Specifies that all spawned calls in a function must complete before execution continues. There is an implied cilk_sync at the end of every function that contains a cilk_spawn.
Intel Cilk Plus includes reducers to help make parallel programming easier. Traditional parallel programs use locks to protect shared variables, which can be problematic. Incorrect lock use can result in deadlocks. Contention for locked regions of code can slow a program down. And while locks can prevent races, there is no way to enforce ordering, resulting in non-deterministic results. Reducers provide a lock-free mechanism that allows parallel code to use private "views" of a variable which are merged at the next sync. The merge is done in an ordered manner to maintain the serial semantics of the Intel Cilk Plus application.
Task Parallelism Tools
Parallelism introduces new types of errors that require specific tools for finding them. The Download page of the Cilk Plus community website offers a Race Detector and Scalability Analyzer, as well as a toolkit to allow you to build your own tools:
Cilk Plus SDK
The Intel Cilk Plus SDK (Software Development Kit) supplies additional tools for Intel Cilk Plus developers working on the Microsoft Windows* and Linux* operating systems. These tools support the Intel Cilk Plus implementation provided by both the Intel Composer C/C++ compilers and the GCC "cilkplus" branch C/C++ compilers.
The Intel Cilk Plus SDK provides the following tools:
- Intel Cilk screen race detector (Cilk screen) – Monitors the actual operation of an Intel Cilk Plus program as run with your test input. Cilk screen reports all data races introduced by the Intel Cilk Plus language constructs that could be encountered during execution. By monitoring program execution, Cilk screen can detect races in your production binary, and can even detect races produced by third-party libraries for which you may not have source code.
- Intel Cilk view scalability analyzer (Cilk view) – Helps you understand the parallel performance of your Intel Cilk Plus program. Cilk view reports parallel statistics about an Intel Cilk Plus program and predicts how the performance will scale on multiple processor systems. In addition, Cilk view can automatically benchmark an Intel Cilk Plus program running on one or more processors. Integration of Cilk screen and Cilk view into Microsoft Visual Studio* 2008 and 2010 on Microsoft Windows* operating systems. Integration into the user’s development environment allows users to view error locations at the click of a mouse.
In addition to the Language Specification and Application Binary Interface, Cilk defines a mechanism for low overhead tool annotations, called metadata. libzca provides access to the metadata without having to parse the metadata section of an application that uses Intel Cilk Plus.
Intel Cilk Plus includes a set of notations that allow users to express high-level operations on entire arrays or sections of arrays. These notations help the compiler to effectively vectorize the application. Intel Cilk Plus allows C/C++ operations to be applied to multiple array elements in parallel, and also provides a set of built-in functions that can be used to perform vectorized shifts, rotates, and reductions.
An elemental function is a regular function which can be invoked either on scalar arguments or on array elements in parallel. They are most useful when combined with array notation or #pragma simd.
This pragma gives the compiler permission to vectorize a loop even in cases where auto-vectorization might fail. It is the simplest way to manually apply vectorization.
Fibonacci – cilk_spawn and cilk_sync
The following function calculates a Fibonacci number. There are certainly more efficient algorithms to calculate Fibonacci numbers, but this one provides a simple recursive function and makes a good example:
int fib(int n)
if (n < 2)
int x = fib(n-1);
int y = fib(n-2);
return x + y;
The key idea here is that the calculation of fib(n-1) can be executed in parallel with the calculation of fib(n-2) without interference. The parallelism can be expressed in Cilk Plus with the following modifications:
int fib(int n)
if (n < 2)
int x = cilk_spawn fib(n-1);
int y = fib(n-2);
return x + y;
The cilk_spawn keyword specifies that the child function may execute in parallel with the caller (sometimes referred to as the parent). The code following a cilk_spawn statement is referred to as a continuation. It is important to note that cilk_spawn permits parallelism. It does not command it. It is also important to note that cilk_spawn does not create a thread. The runtime creates a pool of worker threads, and cilk_spawn allows the runtime to steal the continuation to execute in one of the worker threads.
The cilk_sync statement specifies that all child functions spawned from this function must complete before execution continues past this statement. There is always an implied cilk_sync at the end of every function that contains a cilk_spawn. It is important to note that cilk_sync only affects the child functions spawned from this function. Functions spawned higher on the call tree will continue to run in parallel with the function executing the cilk_sync.
Simple for loop – cilk_for
The cilk_for statement converts a simple for loop into a parallel for loop. That is, one where iterations of the for loop body can be executed in parallel. Consider the following loop:
for (int i = 0; i < 8; ++i)
A better approach is to use a cilk_for loop:
cilk_for (int i = 0; i < 8; ++i)
The Intel Cilk Plus compiler and runtime cooperate to divide the work of the loop in half, and then divide it in half again, until there are enough pieces to keep the cores busy, but at the same time minimize the overhead imposed by cilk_spawn. Like the recursive implementation of fib() above, this efficiently spreads the work across the available cores and minimizes steals.
- Grand Central Dispatch
- Threading Building Blocks (TBB)
- Intel Concurrent Collections (CnC)
- Intel Parallel Building Blocks (PBB)
- Intel Array Building Blocks (ArBB)
- Intel Parallel Studio
- Intel Developer Zone (Intel DZ; support and discussion)
- Parallel computing
- Sieve C++ Parallel Programming System
- "GCC 4.9 Release Series Changes, New Features, and Fixes", Free Software Foundation, Inc. Retrieved on 2014-06-29.
- Official website
- Official website at Intel
- Arch D. Robison, “Cilk Plus: Language Support for Thread and Vector Parallelism” and “Parallel Programming with Cilk Plus”, Jul 16, 2012.