Branch table

In computer programming, a branch table (sometimes known as a jump table) is a term used to describe an efficient method of transferring program control (branching) to another part of a program using a table of branch instructions. The branch table construction is commonly used when programming in assembly language but may also be generated by a compiler.

A branch table consists of a serial list of unconditional branch instructions that is branched into using an offset created by multiplying a sequential index by the instruction length (the number of bytes in memory occupied by each branch instruction). It makes use of the fact that machine code instructions for branching have a fixed length and can be executed extremely efficiently by most hardware, and is most useful when dealing with raw data values that may be easily converted to sequential index values. Given such data, a branch table can be extremely efficient; it usually consists of the following steps: optionally validating the input data to ensure it is acceptable; transforming the data into an offset into the branch table, this usually involves multiplying or shifting it to take into account the instruction length; and branching to an address made up of the base of the table and the generated offset: this often involves an addition of the offset onto the program counter register.

Another method of implementing a branch table is with an array of addresses from which the required address is retrieved. This method can result in smaller code, and avoids the indirect jump. The method used to implement a branch table is usually based on the architecture of the processor on which the code is to be run.

Branch tables are often used in operating system development. Both system calls and library functions may be referenced by an integer index into a branch table. This can improve compatibility with subsequent software versions: if the code of a function and the address of its entry point is changed, only the branch instruction in the branch table needs to be adjusted, application software compiled against the library or for the operating system does not need modification. In addition, calling functions by number (the index into the table) can be useful for some cases when programming. In addition, some computer architectures use branch tables for dispatching interrupts.

Example

A simple example of branch table use in the 8-bit Microchip PIC assembly language is:

    movf    INDEX,W     ; move the index value into the W (working) register from the INDEX memory location
    addwf   PCL,F       ; add it onto the program counter register (PCL). each PIC instruction is one byte
                        ; so there is no need to perform any multiplication. most architectures will transform
                        ; the index in some way before adding it to the program counter

table                   ; the branch table begins here with this label
    goto    index_zero  ; each of these goto instructions is an unconditional branch to a different section
    goto    index_one   ; of code
    goto    index_two
    goto    index_three

index_zero
    ; code is added here to perform whatever action is required when INDEX was equal to zero
    return

index_one
...

History

Reason for using

The use of branch tables and the mapping of raw data representations to an index value was common in the early days of computing when memory was expensive, CPUs were slower and compact data representation and efficient choice of alternatives were important. It is also frequently seen in modern embedded programming, which often requires code to fit into tightly constrained space and operate efficiently.

Advantages

Encode first time only
High data compression ratios
Performance
Code efficiency
Table driven code

Encode first time only

It was believed in the early days of computing that it was very much more efficient to process raw data once only - on its first data entry. Thereafter, the coded data would be used instead of the raw data. This led to compact data representations internally and on databases (except mainly in 'free form' text such as proper names and parts of addresses containing house names which tended to be more variable).

Compression of raw data

The main advantages of such conversion is that, once an encoded value has been converted to an index, it can often be used to branch into a branch table or retrieve a value from a lookup table efficiently at any time in the future without having to be converted again. For example, there are a relatively few countries in the world; countries may easily be represented by a country code. Countries can therefore be stored simply as a country number rather than as a text string, such as "Central African Republic," resulting in massive savings in storage and processing time.

Performance improvements

In this example an eight bit country code reduces storage by a massive factor of 24. Numeric comparisons (especially single byte comparisons) are significantly faster than text string comparisons, and indexed retrievals are significantly faster than string searches or retrieval from a file.

Code efficiency

A further advantage of using index values rather than raw text strings is that the same index value used for accessing the full text of the country name for instance, can equally be used for accessing an image of the countries flag, its population and so on.

Table driven code

Where raw data has been reduced to coded values, efficient embedded tables can be built that define program logic to a large extent. Adding additional data types frequently involves merely adding an extra table entry and perhaps a small amount of associated code.

Disadavantages

Extra level of indirection
Lack of familiarity
Programming Language restrictions

Extra level of indirection

This is of little importance to a computer, but adds to code ^{[citation needed]}. Also, as the Millennium bug has shown, this approach can sometimes lead to later problems when the space requirement for the index or representation outgrows the space allocated to the task.

Future proofing

The future ability to extend the number of countries, in the above example, beyond the storage limit of the chosen (fixed length) data field is important and a serious design consideration. Where the programmer has ignored potential expansion issues, it can lead to later predictable problems. Using two bytes for a country code in the same example would extend the range of future countries from 256 to 65,535 while still giving a significant compression factor of 13 in memory usage. Changing the size of the memory holding the index at a later date, to allow extension of the range, does not in any way further alter the complexity of the program but might simply increase the total record size by a small fraction of its original size.

Lack of familiarity

For some programmers, who might be unfamiliar with the technique of using a branch table, its use might initially add slightly to data complexity. Once learned however, the technique can be used in almost every situation to reduce the size of programs significantly and make them easier to maintain and extend.

Programming Language restrictions

Not all programming languages support the use of Branch Tables, although they may be implemented using similar techniques that are fairly efficient. The Standard ANSI C programming language, for example, does not permit indirect branching and is instead 'simulated' with a Switch statement (or case statement or a select statement in some languages) . However there is a further restriction allowing only branches within functions. Although this means that a label can be provided within the function with an additional function call at the address branched to, additional housekeeping (stack processing etc) is inevitably involved in calling the extra function - which reduces its efficiency significantly.