Programming language generations

From Wikipedia, the free encyclopedia

Programming languages have been classified into several programming language generations. Historically, this classification was used to indicate increasing power of programming styles. Later writers have somewhat redefined the meanings as distinctions previously seen as important became less significant to current practice.

Historical view of first three generations

First generation

The terms "first-generation" and "second-generation" programming language were not used prior to the coining of the term "third-generation." In fact, none of these three terms is mentioned in early compendiums of programming languages. The introduction of a third generation of computer technology coincided with the creation of a new generation of programming languages. The marketing for this generational shift in machines correlated with several important changes in what were called high-level programming languages, discussed below. This gave technical content to the second/third-generation distinction among high-level programming languages, and assembly languages were retroactively renamed first-generation.

Second generation

Second-generation programming languages, originally just called high-level programming languages, were created to simplify the burden of programming by making its expression more like the normal mode of expression for thoughts used by the programmer. They were introduced in the late 1950s, with FORTRAN reflecting the needs of scientific programmers and ALGOL reflecting an attempt to produce a European/American standard view.

The most important issue faced by the developers of second-generation languages was convincing customers that the code produced by the compilers performed well enough to justify abandonment of assembly programming. In view of the widespread skepticism about the possibility of producing efficient programs with an automatic programming system and the fact that inefficiencies could no longer be hidden, the developers were convinced that the kind of system they had in mind would be widely used only if they could demonstrate that it would produce programs almost as efficient as hand-coded ones and do so on virtually every job. The FORTRAN compiler was seen as a tour-de-force in the production of high-quality code, even including "… a Monte Carlo simulation of its execution … so as to minimize the transfers of items between the store and the index registers."

Third generation

The introduction of a third generation of computer technology coincided with the creation of a new generation of programming languages.[1] The essential feature of third-generation languages is their hardware-independence, i.e. the expression of an algorithm in a way that is independent of the characteristics of the machine on which the algorithm will run.

3GLs also incorporated some or all of several other developments that occurred around the same time.

Interpretation was introduced. Some 3GLs were compiled, a process analogous to the creation of a complete machine code executable from assembly code, the difference being that in higher-level languages there is no longer a one-to-one, or even linear, relationship between source code instructions and machine code instructions. Compilers are able to target different hardware by producing different translations of the same source code commands.
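The point about one source statement yielding different translations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two "machines" and their instruction names (PUSH, POP, LOAD, STORE and so on) are invented for the example and do not correspond to any real hardware, and the source statement is given as an already-parsed tree.

```python
# Hypothetical illustration: one source statement, two made-up target machines.
# Neither instruction set corresponds to real hardware.

# The source statement "x = a + b * c", already parsed into a tree.
SOURCE = ("assign", "x", ("add", "a", ("mul", "b", "c")))


def compile_stack(node):
    """Emit code for an imaginary stack machine."""
    if isinstance(node, str):                  # a variable reference
        return [f"PUSH {node}"]
    op, left, right = node
    if op == "assign":
        return compile_stack(right) + [f"POP {left}"]
    return compile_stack(left) + compile_stack(right) + [op.upper()]


def compile_register(node):
    """Emit code for an imaginary register machine and return it as a list."""
    code = []
    next_register = 0

    def emit(n):
        nonlocal next_register
        if isinstance(n, str):                 # load a variable into a fresh register
            reg = f"r{next_register}"
            next_register += 1
            code.append(f"LOAD {reg}, {n}")
            return reg
        op, left, right = n
        if op == "assign":
            reg = emit(right)
            code.append(f"STORE {left}, {reg}")
            return reg
        left_reg = emit(left)
        right_reg = emit(right)
        code.append(f"{op.upper()} {left_reg}, {left_reg}, {right_reg}")
        return left_reg

    emit(node)
    return code
```

Calling `compile_stack(SOURCE)` and `compile_register(SOURCE)` yields two different instruction lists for the same statement, and in both cases the single source statement expands into several target instructions.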

Interpreters, on the other hand, essentially execute the source code instructions themselves — if one encounters an "add" instruction, it performs an addition itself, rather than outputting an addition instruction to be executed later. Machine-independence is achieved by having different interpreters in the machine codes of the targeted platforms, i.e. the interpreter itself generally has to be compiled. Interpretation was not a linear "advance", but an alternative model to compilation, which continues to exist alongside it, and other, more recently developed, hybrids. Lisp is an early interpreted language.
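The contrast between executing an "add" and emitting one can be shown with a toy example. The sketch below assumes a hypothetical accumulator-based mini-language invented for this illustration; `interpret` and `translate` are made-up names, not part of any real system.

```python
# Hypothetical mini-language: a program is a list of (operation, operand)
# pairs acting on a single accumulator.
PROGRAM = [("load", 2), ("add", 3), ("mul", 4)]


def interpret(program):
    """Execute each instruction immediately and return the final accumulator."""
    acc = 0
    for op, operand in program:
        if op == "load":
            acc = operand
        elif op == "add":
            acc = acc + operand        # the interpreter performs the addition itself
        elif op == "mul":
            acc = acc * operand
        else:
            raise ValueError(f"unknown operation: {op}")
    return acc


def translate(program):
    """Compiler-style counterpart: emit instructions to be executed later."""
    return [f"{op.upper()} #{operand}" for op, operand in program]
```

Here `interpret(PROGRAM)` computes a value directly, while `translate(PROGRAM)` only produces text for some other machine to run. The interpreter itself is ordinary code that has to exist in runnable form on each target platform.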

The earliest 3GLs, such as Fortran and COBOL, were spaghetti coded, i.e. they had the same style of flow of control as assembler and machine code, making heavy use of the goto statement. Structured programming[2] introduced a model where a program was seen as a hierarchy of nested blocks rather than a linear list of instructions. For instance, structured programmers were to conceive of a loop as a block of code that is repeated, rather than so many commands followed by a backwards jump or goto. Structured programming is less about power (in the sense of one higher-level command expanding into many lower-level ones) than about safety: programmers following it were much less prone to making mistakes. The division of code into blocks, subroutines and other modules with clearly defined interfaces also had productivity benefits, allowing many programmers to work on one project. Once introduced (in the ALGOL language), structured programming was incorporated into almost all languages, and retrofitted to languages that did not originally have it, such as Fortran.
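The two ways of thinking about a loop can be put side by side. Python has no goto statement, so the first function below imitates one with an explicit program counter; that device, and both function names, are assumptions made for this sketch rather than anything taken from the languages discussed.

```python
def sum_goto_style(n):
    """Sum 1..n as a flat list of numbered 'lines' joined by jumps."""
    total, i = 0, 1
    pc = 0                             # program counter: which line runs next
    while True:
        if pc == 0:                    # line 0: if i > n goto 3
            pc = 3 if i > n else 1
        elif pc == 1:                  # line 1: total = total + i
            total += i
            pc = 2
        elif pc == 2:                  # line 2: i = i + 1; goto 0 (backwards jump)
            i += 1
            pc = 0
        elif pc == 3:                  # line 3: halt
            return total


def sum_structured(n):
    """Sum 1..n with the loop expressed as a single repeated block."""
    total = 0
    for i in range(1, n + 1):          # one entry point, one exit point
        total += i
    return total
```

Both functions compute the same result, but in the first the reader has to trace the jumps to discover that a loop exists at all, whereas in the second the loop is visible in the structure of the code.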

Block structure was also associated with deprecation of global variables, a similar source of error to goto. Instead, the structured languages introduced lexical scoping and automated management of storage with a stack.
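A rough sketch of the difference between global state and block-local state follows. Python's scoping rules are only similar in spirit to those of the ALGOL family, and the names `counter`, `bump_global` and `count_items` are hypothetical, chosen for this example.

```python
counter = 0                            # a global: any code anywhere may change it


def bump_global():
    """Depends on, and alters, state shared by the whole program."""
    global counter
    counter += 1
    return counter


def count_items(items):
    """Keeps its state in a local variable that exists only during the call."""
    count = 0                          # created on entry, discarded on return

    def visit(item):                   # nested block
        nonlocal count                 # resolved lexically to the enclosing 'count'
        count += 1

    for item in items:
        visit(item)
    return count
```

Repeated calls to `count_items` with the same argument always give the same answer, because each call gets fresh storage for `count`. Repeated calls to `bump_global` do not, since every call changes the shared variable.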

Another high-level feature was the development of type systems that went beyond the data types of the underlying machine code, adding types such as strings, arrays and records.
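As a loose illustration, the sketch below defines a record containing a string and an array, then flattens it into the raw bytes that are all the underlying machine deals with. The `Employee` record and the byte layout are invented for this example and do not represent any particular language's storage format.

```python
import struct
from dataclasses import dataclass


@dataclass
class Employee:                        # a record: named fields of different types
    name: str                          # a string
    age: int
    scores: list                       # an array of small integers


def to_machine_bytes(e):
    """Flatten the record into raw bytes using a made-up layout.

    Layout (an assumption for this sketch): 1-byte name length, the name
    in ASCII, 1-byte age, 1-byte score count, then each score as a
    little-endian 16-bit unsigned integer.
    """
    name_bytes = e.name.encode("ascii")
    return (
        struct.pack("B", len(name_bytes))
        + name_bytes
        + struct.pack("B", e.age)
        + struct.pack("B", len(e.scores))
        + struct.pack(f"<{len(e.scores)}H", *e.scores)
    )
```

The programmer works with `e.name` and `e.scores`; keeping track of which byte holds which field is left to the language implementation.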

Where early 3GLs were special-purpose (e.g. for science or commerce), attempts were made to create general-purpose languages, such as C and Pascal. While these enjoyed great success, domain-specific languages did not disappear.

An alternative characterization of the first three generations

Since at least 1979, many authors have used a different characterization of programming language generations.

First generation

In this categorization, a first-generation programming language refers to numeric machine code, i.e. numerical instructions directly corresponding to individual hardware instructions.

Originally, no translator (compiler or assembler) was involved in producing the numeric machine code. First-generation instructions were entered directly through the front panel switches of the computer system.

The main benefit of programming in machine code is that the code a user writes can run very fast and efficiently, since it is directly executed by the CPU. However, machine code is much more difficult to learn than higher-generation programming languages, and it is far more difficult to edit if errors occur. In addition, if instructions need to be added into memory at some location, then all the instructions after the insertion point need to be moved down to make room in memory to accommodate the new instructions. Doing so on a front panel with switches can be very difficult.
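The cost of inserting an instruction can be sketched as follows. The program representation is hypothetical: each list entry stands for one memory word, and `("JUMP", n)` stands for a jump to absolute address `n`. The need to patch jump targets is not stated in the text above; it is added here as a further consequence of moving instructions, on the assumption that jumps use absolute addresses.

```python
# Hypothetical numeric program: one list entry per memory word.
# ("JUMP", n) means "continue execution at absolute address n".


def insert_instruction(memory, address, instruction):
    """Return a new program with 'instruction' inserted at 'address'.

    Every later instruction moves down by one word, so any jump whose
    target is at or beyond the insertion point must be patched as well.
    """
    patched = []
    for op, arg in memory:
        if op == "JUMP" and arg >= address:
            arg += 1                   # the target moved one word further down
        patched.append((op, arg))
    return patched[:address] + [instruction] + patched[address:]


program = [("LOAD", 7), ("ADD", 1), ("JUMP", 1), ("HALT", 0)]
updated = insert_instruction(program, 1, ("SUB", 2))
```

In `updated`, the jump that used to target address 1 now targets address 2, so it still lands on the `ADD` instruction. With front panel switches, each of these moves and patches would have to be made by hand.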

Third generation

Third-generation programming languages (3GL) originally referred to all programming languages at a level higher than assembly. Whereas individual instructions of a second-generation language are in one-to-one correspondence to individual machine instructions (i.e. they are close to the machine's domain), a third-generation language aims to be closer to the human domain. Instructions operate at a higher, abstract level, closer to the human way of thinking, and each individual instruction can be translated into a (possibly large) number of machine-level instructions. Third-generation languages are intended to be easier to use than second-generation languages. In order to run on an actual computer, code written in a third-generation language must be compiled either directly into machine code or into assembly, which is then assembled. Code written in a third-generation language can generally be compiled to run on many different computers using a variety of hardware architectures.

First introduced in the late 1950s, FORTRAN, ALGOL and COBOL are early examples of third-generation languages.

Third-generation languages tend to be entirely (or almost entirely) independent of the underlying hardware, as with general-purpose languages like Pascal, Java and FORTRAN, although some have been targeted at specific processors or processor families. Examples include PL/M, which was targeted at Intel processors, and even C, some of whose auto-increment and auto-decrement idioms, such as *(c++), are commonly attributed to the hardware of the PDP-11, on which C was first developed and which supports auto-increment and auto-decrement indirect addressing modes.

Most "modern" languages (BASIC, C, C++, C#, Pascal, Ada and Java) are also third-generation languages.

Many 3GLs support structured programming.

Later generations

Initially, all programming languages at a higher level than assembly were termed "third-generation", but later on, the term "fourth-generation" was introduced to try to differentiate the (then) new declarative languages (such as Prolog and domain-specific languages) which claimed to operate at an even higher level, and in a domain even closer to the user (e.g. at a natural language level) than the original, imperative high level languages such as Pascal, C, ALGOL, Fortran, BASIC, etc.

"Generational" classification of high level languages (3rd generation and later) was never fully precise and was later perhaps abandoned, with more precise classifications gaining common usage, such as object-oriented, declarative and functional. C gave rise to C++ and later to Java and C#, Lisp to CLOS, Ada to Ada 2012, and even COBOL to COBOL2002, and new languages have emerged in that "generation" as well.


  1. Rico, DF; HH Sayani; RF Field (2008). "History of computers, electronic commerce and agile methods". Advances in Computers (Academic Press). 73: Emerging Technologies.
  2. Heralded by Edsger W. Dijkstra's letter to the editor of Communications of the ACM, published in March 1968.