Low-level programming language

From Wikipedia, the free encyclopedia
Jump to: navigation, search

In computer science, a low-level programming language is a programming language that provides little or no abstraction from a computer's instruction set architecture. Generally this refers to either machine code or assembly language. The word "low" refers to the small or nonexistent amount of abstraction between the language and machine language; because of this, low-level languages are sometimes described as being "close to the hardware".

Low-level languages can be converted to machine code without using a compiler or interpreter, and the resulting code runs directly on the processor. A program written in a low-level language can be made to run very quickly, and with a very small memory footprint; an equivalent program in a high-level language will be more heavyweight. Low-level languages are simple, but are considered difficult to use, due to the numerous technical details which must be remembered.

By comparison, a high-level programming language isolates the execution semantics of a computer architecture from the specification of the program, which simplifies development.

Low-level programming languages are sometimes divided into two categories: first generation, and second generation.

Machine code[edit]

Machine code is the only language a microprocessor can process directly without a previous transformation. Currently, programmers almost never write programs directly in machine code, because it requires attention to numerous details which a high-level language would handle automatically, and also requires memorizing or looking up numerical codes for every instruction that is used. For this reason, second generation programming languages provide one abstraction level on top of the machine code. Even in the early days of coding on computers like the TX-0 and PDP-1, the first thing that the MIT hackers did was write assemblers[1]

Example: A function in 32-bit x86 machine code to calculate the nth Fibonacci number:

8B542408 83FA0077 06B80000 0000C383
FA027706 B8010000 00C353BB 01000000
B9010000 008D0419 83FA0376 078BD98B
C84AEBF1 5BC3

Assembly[edit]

Assembly language has no semantics and no specification, being only a mapping of human-readable symbols, including symbolic addresses, to opcodes, addresses, numeric constants, strings and so on. Typically, one machine instruction is represented as one line of assembly code. Assemblers produce object files which may be linked with other object files or loaded on their own.

Most assemblers provide macros.

Example: The same Fibonacci number calculator as above, but in x86 assembly language using MASM syntax:

fib:
    mov edx, [esp+8]
    cmp edx, 0
    ja @f
    mov eax, 0
    ret
 
    @@:
    cmp edx, 2
    ja @f
    mov eax, 1
    ret
 
    @@:
    push ebx
    mov ebx, 1
    mov ecx, 1
 
    @@:
        lea eax, [ebx+ecx]
        cmp edx, 3
        jbe @f
        mov ebx, ecx
        mov ecx, eax
        dec edx
    jmp @b
 
    @@:
    pop ebx
    ret

In this code example, hardware features of the x86 processor (its registers) are named and manipulated directly. The function loads its input from a precise location in the stack (8 bytes higher than the location stored in the ESP stack pointer) and performs its calculation by manipulating values in the EAX, EBX, ECX and EDX registers until it has finished and returns. Note that in this assembly language, there is no concept of returning a value. The result having been stored in the EAX register, the RET command simply moves code processing to the code location stored on the stack (usually the instruction immediately after the one which called this function) and it is up to the author of the calling code to know that this function stores its result in EAX and to retrieve it from there. x86 assembly language imposes no standard for returning values from a function (and so, in fact, has no concept of a function); it is up to the calling code to examine state after the procedure returns if it needs to extract a value.

Compare this with the same function in C:

unsigned int fib(unsigned int n)
{
    if (n <= 0)
        return 0;
    else if (n <= 2)
        return 1;
    else {
        int a,b,c;
        a = 1;
        b = 1;
        while (true) {
            c = a + b;
            if (n <= 3) return c;
            a = b;
            b = c;
            n--;
        }
    }
}

This code is very similar in structure to the asembly language example but there are significant differences in terms of abstraction:

  • While the input (parameter n) will be loaded from the stack, its precise position on the stack is not specified. The C compiler will calculate this based on the calling conventions of the target architecture.
  • The assembly language version loads the input parameter from the stack into a register and in each iteration of the loop decrements the value in the register, never altering the value in the memory location on the stack. The C compiler could do the same or could update the value in the stack; which it chooses is an implementation decision completely hidden from the code author (and one with no side effects, thanks to the standards specified by the C language).
  • The local variables a, b and c are abstractions which do not specify any specific storage location on the hardware. How they are actually stored is up to the C compiler for the target architecture.
  • The return function specifies the value to be returned but does not dictate how it is returned. It is up to the C compiler for any specific architecture to implement a standard mechanism for returning the value. As it happens, on x86 architecture the compiler will return the value in the EAX register, as in the assembly language example (the author of the assembly language example has chosen to copy the C convention but assembly language does not require this).

These abstractions make the C code compilable without modification on any architecture for which a C compiler has been written. The x86 assembly language code is specific to the x86 architecture.

Low-level programming in high-level languages[edit]

Experiments with hardware support in high-level languages in the late 1960s led to such languages as PL/S, BLISS, BCPL, extended ALGOL (for Burroughs large systems) and C being used for some degree of low-level programming. The most authentically low-level method for this is Inline assembly, allowing assembly code to be embedded in a high-level language which supports this feature. Most of these languages also allow architecture-dependent compiler optimization directives to adjust the way in which a compiler uses the target processor architecture.

References[edit]

  1. ^ Levy, Stephen (1994). Hackers: Heroes of the Computer Revolution, Penguin Books. p. 32. ISBN 0-14-100051-1