
C (programming language)

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Kate at 08:58, 9 June 2004 (edit summary: "Programming tools: rm g++, = c++ compiler, & gcc already listed"). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

C is a programming language developed by Ken Thompson and Dennis Ritchie in the early 1970s for use on the UNIX operating system. It is now used on practically every operating system. It is the most popular language for writing system software, though it is also used for writing applications, and it is commonly used in computer science education.

Features

C is a relatively minimalist programming language and is more similar to assembly language than most other programming languages. It has been described as anything from a "high-level assembler" to a low-level, medium-level, or high-level language.

C has significant differences from assembly language. C source code is generally easier to read and much less burdensome to write, especially for lengthy programs. Assembly language code is usually applicable only to a specific computer architecture, whereas C code can generally be ported to any architecture for which a C compiler and any required libraries exist. On the other hand, the efficiency of C code depends partly on the compiler's ability to optimize the generated machine code, which is largely outside the programmer's control. In contrast, the efficiency of assembly code is precisely determined, since assembly is just a human-readable notation for machine language. For this reason, some programs, such as operating system kernels, though mostly written in C, may contain "hand-tuned" fragments of assembly language where performance is especially crucial. However, modern processors often have complex timing behavior (for example, pipelining and cache effects) that makes assembly code hard to tune by hand, and an optimizing compiler can often handle these issues more easily.

Similar advantages and disadvantages distinguish C from higher-level languages: the efficiency of C code can be controlled more closely, at the cost of code that is generally more troublesome to read and write. C is typically at least as portable as higher-level languages, because most computer architectures are equipped with a C compiler and libraries; in fact, the compilers, libraries, and interpreters of higher-level languages are often themselves implemented in C.

Data storage in C is handled in three basic ways: by static memory allocation (essentially at compile time), by automatic allocation on the program stack, and by dynamic allocation through library calls (such as malloc) from an area of memory called the heap. There is also a data type called a pointer that can hold a reference to allocated memory, for example, the memory allocated for a variable. A pointer is, roughly speaking, an abstraction of what assembly programmers call an index register. Pointers have been criticized because the simplicity of their implementation makes it easy to introduce malfunctions and vulnerabilities into C programs. The Java and C# languages, both descendants of C, use safer ways of referring to variables that make it much harder to write incorrect programs. These languages also have run-time systems that can detect most of the remaining problems. Run-time checking introduces overhead, but in most applications it is considered worthwhile because of the greater assurance of correct operation it provides.

Pointer variables exist independently of the variables they refer to, which leads to various wild pointer problems; for example, a pointer can still hold the address of memory that has already been freed. The programmer is responsible for deallocating dynamic memory, so it is easy to produce the kind of problem known as a memory leak, where dynamically allocated memory is never deallocated during the lifetime of the executing program. Some languages reduce this problem with automatic garbage collection. They do not completely prevent it because, for instance, an allocation cannot be garbage collected while it is still reachable through a static reference, even if the program will never use it again.

In C, memory can be referenced through pointer arithmetic (adding an integer to, or subtracting one from, a pointer), but there is generally no check on whether the result is valid. Array indexing in C is defined in terms of pointer arithmetic (a[i] means *(a + i)), so it is possible to refer to elements beyond the bounds of an array, which were never allocated. Languages that perform run-time array bounds checking are protected from the buffer overflow vulnerabilities that can result when fixed-size buffers allocated on the stack are overrun. Tools have been created to help C programmers avoid memory errors, including libraries for performing array bounds checking and automatic garbage collection, but they are not a standard part of C. Automated source code checking and auditing is fruitful in any language, and many such tools exist for C, for example Lint.

Some of the specific features of C are:

History

Early developments

The initial development of C occurred at AT&T Bell Labs between 1969 and 1973; according to Ritchie, the most creative period occurred in 1972. It was named "C" because many of its features were derived from an earlier language called "B". Accounts differ regarding the origins of the name "B": Ken Thompson credits the BCPL programming language, but he had also created a language called Bon in honor of his wife Bonnie.

By 1973, the C language had become powerful enough that most of the UNIX kernel, originally written in PDP-11/20 assembly language, was rewritten in C. This was one of the first operating system kernels implemented in a language other than assembly; an earlier instance was the Multics system (written in PL/I).

K&R C

In 1978, Ritchie and Brian Kernighan published the first edition of The C Programming Language. This book, known to C programmers as "K&R", served for many years as an informal specification of the language. The version of C that it describes is commonly referred to as "K&R C." (The second edition of the book covers the later ANSI C standard, described below.)

K&R introduced the following features to the language:

  • struct data types
  • long int data type
  • unsigned int data type
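  • The =+ operator was changed to +=, and so forth, because forms such as =+ were confusing the C compiler's lexical analyzer: an expression like i=-10 could be read either as i =- 10 (decrement i by 10) or as i = -10.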

K&R C is often considered the most basic part of the language that is necessary for a C compiler to support. For many years, even after the introduction of ANSI C, it was considered the "lowest common denominator" that C programmers stuck to when maximum portability was desired, since not all compilers were updated to fully support ANSI C, and reasonably well-written K&R C code is also legal ANSI C.

In the years following the publication of K&R C, several "unofficial" features were added to the language, supported by compilers from AT&T and some other vendors. These included:

  • void functions and void * data type
  • functions returning struct or union types
  • struct field names in a separate name space for each struct type
  • assignment for struct data types
  • const qualifier to make an object read-only
  • a standard library incorporating most of the functionality implemented by various vendors
  • enumerations
  • the single-precision float type

ANSI C and ISO C

During the late 1970s, C began to replace BASIC as the leading microcomputer programming language. During the 1980s, it was adopted for use with the IBM PC, and its popularity began to increase significantly. At the same time, Bjarne Stroustrup and others at Bell Labs began work on adding object-oriented programming language constructs to C. The language they produced, called C++, is now the most common application programming language on the Microsoft Windows operating system; C remains more popular in the Unix world.

In 1983, the American National Standards Institute (ANSI) formed a committee, X3J11, to establish a standard specification of C. After a long and arduous process, the standard was completed in 1989 and ratified as ANSI X3.159-1989 "Programming Language C". This version of the language is often referred to as ANSI C. In 1990, the ANSI C standard (with a few minor modifications) was adopted by the International Organization for Standardization (ISO) as ISO/IEC 9899:1990.

One of the aims of the ANSI C standardization process was to produce a superset of K&R C, incorporating many of the unofficial features subsequently introduced. However, the standards committee also included several new features, such as function prototypes (borrowed from C++), and a more capable preprocessor.

ANSI C is now supported by almost all the widely used compilers. Most of the C code being written nowadays is based on ANSI C. A program written only in standard C, without relying on implementation-defined or undefined behavior, should work correctly on any platform with a conforming C implementation. However, many programs have been written that will only compile on a certain platform, or with a certain compiler, due to (i) the use of non-standard libraries, e.g. for graphical displays, and (ii) some compilers not adhering to the ANSI C standard, or its successor, in their default mode.

C99

After the ANSI standardization process, the C language specification remained relatively static for some time, whereas C++ continued to evolve. (Normative Amendment 1 created a new version of the C language in 1995, but this version is rarely acknowledged.) However, the standard underwent revision in the late 1990s, leading to the publication of ISO/IEC 9899:1999 in 1999. This standard is commonly referred to as "C99". It was adopted as an ANSI standard in March 2000.

The new features in C99 include:

Interest in supporting the new C99 features appears to be mixed. Whereas GCC and several other compilers now support most of the new features of C99, the compilers maintained by Microsoft and Borland do not, and these two companies do not seem to be interested in adding such support.

"Hello, World!" in C

The following simple application prints out "Hello, World!" to standard output (which is usually the screen, but might be a file or some other hardware device or perhaps even the bit bucket depending on how standard output is mapped at the time the program is executed). A version of this program appeared for the first time in K&R.


#include <stdio.h>

int main(void)
{
    printf("Hello, World!\n");
    return 0;
}

The first line of the program is a #include preprocessing directive, which causes the preprocessor to replace that line with the entire text of the file (or other entity) it refers to; in this case, the standard header stdio.h replaces that line. The angle brackets indicate that stdio.h is to be found in whatever place the implementation designates for standard include files.

The next (non-blank) line indicates that a function named "main" is being defined. The function named "main" is special in C programs, as it is the function that is first run when the program starts (for hosted implementations of C, and leaving aside "housekeeping" code). The curly brackets delimit the extent of the function. The int declares that "main" returns an integer value; the void in the parentheses indicates that main takes no arguments from its caller.

The next line "calls", or executes, a function named printf; the included header, stdio.h, contains the declaration describing how the printf function is to be called. In this call, the printf function is passed a single argument, the constant string "Hello, World!\n"; the \n is an escape sequence that is translated to a "newline" character, which when displayed causes a line break. printf returns the number of characters it wrote (or a negative value if an error occurred); here that value is simply discarded.

The return statement tells the program to exit the current function (in this case main), returning the value zero to the function that called it. Since the current function is main, the caller is whatever started the program, and a return value of zero conventionally reports successful termination. Finally, the closing curly bracket indicates the end of the function main.

Relation to C++

The C++ programming language was originally derived from C. However, as C and C++ have evolved independently, the division between the two has widened.

C99 introduced a number of features that conflict with their C++ counterparts. Today, the primary differences between the two languages include:

  • inline - in C++, an inline function with external linkage must be defined in every translation unit (or file) that uses it, and under C++'s "One Definition Rule" (ODR) all of those definitions must be equivalent (each overload of a function counts as a separate function). In C99, an inline definition belongs to its own translation unit and does not provide an external definition, so the same inline function may be defined differently in different translation units; if the function is called, exactly one translation unit must supply its external definition.
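  • bool - in C++, bool is a built-in keyword. C99 instead adds the keyword _Bool, and the name bool (along with true and false) is a macro defined in its own header, <stdbool.h>. Previous standards of C did not define a Boolean type, and various (incompatible) methods were used to simulate one.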

Some features originally developed in C++ have also appeared in C. Among them are:

  • Prototyping.
  • Line comments (introduced by // and ending at the next newline character).
  • inline keyword.
  • void type.
  • Stronger typing.


An early version of this article contained material from FOLDOC, used with permission.