Relocation is the process of assigning load addresses to various parts of a program and adjusting the code and data in the program to reflect the assigned addresses. A linker usually performs relocation in conjunction with symbol resolution, the process of searching files and libraries to replace symbolic references or names of libraries with actual usable addresses in memory before running a program.
Relocation is typically done by the linker at link time, but it can also be done at run time by a relocating loader, or by the running program itself. Some architectures avoid relocation entirely by deferring address assignment to run time; this is known as zero address arithmetic.
Relocation is typically done in two steps:
- Each object file has various sections like code, data, .bss etc. To combine all the objects to a single executable, the linker merges all sections of similar type into a single section of that type. The linker then assigns run time addresses to each section and each symbol. At this point, the code (functions) and data (global variables) will have unique run time addresses.
- Each section contains references to one or more symbols. These references must be modified so that they point to the correct run time addresses, using information stored in a relocation table in the object file.
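The two steps above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the object-file layout (dictionaries with `sections`, `symbols` and `relocs` keys), the function name `link`, and the word-addressed memory model are all hypothetical and do not correspond to any real object format.

```python
# Minimal sketch of two-step relocation on a hypothetical object model.
# Memory is word-addressed: each list element occupies one address.

def link(objects, base=0):
    order = [".text", ".data", ".bss"]
    merged = {name: [] for name in order}
    placement = {}  # (object index, section name) -> offset in merged section

    # Step 1a: merge all sections of the same type into one section.
    for i, obj in enumerate(objects):
        for name in order:
            placement[(i, name)] = len(merged[name])
            merged[name].extend(obj["sections"].get(name, []))

    # Step 1b: assign a run time address to each merged section.
    section_addr = {}
    addr = base
    for name in order:
        section_addr[name] = addr
        addr += len(merged[name])

    # Step 1c: assign a run time address to each symbol.
    symbols = {}
    for i, obj in enumerate(objects):
        for sym, (sec, off) in obj["symbols"].items():
            symbols[sym] = section_addr[sec] + placement[(i, sec)] + off

    # Step 2: patch every reference listed in the relocation tables.
    for i, obj in enumerate(objects):
        for sec, off, sym in obj["relocs"]:
            merged[sec][placement[(i, sec)] + off] = symbols[sym]

    return merged, section_addr, symbols


# Two tiny hypothetical object files; zeros mark words awaiting relocation.
main_o = {
    "sections": {".text": [0, 0, 0], ".data": [7]},
    "symbols": {"main": (".text", 0)},
    "relocs": [(".text", 1, "helper"), (".text", 2, "counter")],
}
util_o = {
    "sections": {".text": [0, 0], ".data": [42]},
    "symbols": {"helper": (".text", 0), "counter": (".data", 0)},
    "relocs": [(".text", 1, "main")],
}

merged, section_addr, symbols = link([main_o, util_o], base=100)
print(symbols)          # run time address of each symbol
print(merged[".text"])  # code with references patched
```

With a base of 100, the merged code occupies addresses 100 to 104 and the merged data follows at 105, so `helper` resolves to 103 and `counter` to 106.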
The relocation table is a list of pointers created by the compiler or assembler and stored in the object or executable file. Each entry in the table, or "fixup", is a pointer to an address in the object code that must be changed when the loader relocates the program. Fixups are designed to support relocation of the program as a complete unit. In some cases, each fixup in the table is itself relative to a base address of zero, so the fixups themselves must be changed as the loader moves through the table.
In some architectures a fixup that crosses certain boundaries (such as a segment boundary) or that is not aligned on a word boundary is illegal and flagged as an error by the linker.
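As a rough illustration of how a loader might use such a table, the following sketch treats the program image as a list of words linked for base address zero and the relocation table as a list of offsets into that image. The function name `load` and the data layout are assumptions made for this example only.

```python
# Minimal sketch: applying a relocation table when loading an image.
# Each fixup is the offset of a word that holds an address and must
# therefore be adjusted by the distance the program was moved.

def load(image, fixups, load_base, link_base=0):
    delta = load_base - link_base
    loaded = list(image)          # leave the original image untouched
    for offset in fixups:
        loaded[offset] += delta   # adjust the stored address
    return loaded


# Words 1 and 3 hold addresses (3 and 0); words 0 and 2 are plain data.
image = [10, 3, 20, 0]
fixups = [1, 3]

print(load(image, fixups, load_base=500))
```

Only the words named in the table change; the data words 10 and 20 are left alone, which is why the table is needed at all: the loader cannot otherwise tell an address from an ordinary value.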
Far pointers (32-bit pointers in segment:offset form, used to address the 20-bit, 1 MB real-mode address space, of which 640 KB is conventionally available to DOS programs) that point to code or data within a DOS executable (EXE) do not have absolute segments, because the actual address of the code or data depends on where the program is loaded in memory, which is not known until load time.
Instead, segments are relative values in the DOS EXE file. These segments need to be corrected when the executable has been loaded into memory. The EXE loader uses a relocation table to find the segments that need to be adjusted.
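The segment correction can be sketched as follows. This is a simplified model, not a parser for the real EXE format: the relocation table is assumed to be a plain list of byte offsets into the load image, and the names `linear_address` and `relocate_segments` are hypothetical.

```python
import struct

# Minimal sketch of DOS-style segment fixups on a simplified image.

def linear_address(segment, offset):
    # Real-mode address: segment * 16 + offset, wrapped to 20 bits.
    return ((segment << 4) + offset) & 0xFFFFF


def relocate_segments(image, reloc_offsets, load_segment):
    out = bytearray(image)
    for pos in reloc_offsets:
        # Each entry locates a 16-bit little-endian segment value.
        (seg,) = struct.unpack_from("<H", out, pos)
        struct.pack_into("<H", out, pos, (seg + load_segment) & 0xFFFF)
    return bytes(out)


# A far pointer stored as offset 0x0010, then relative segment 0x0002.
image = struct.pack("<HH", 0x0010, 0x0002)
relocated = relocate_segments(image, reloc_offsets=[2], load_segment=0x1000)

offset, segment = struct.unpack("<HH", relocated)
print(hex(segment), hex(linear_address(segment, offset)))
```

Only the segment half of the far pointer is touched; the offset half is already correct because it is relative to the start of its segment.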
With 32-bit Windows operating systems it is not mandatory to provide relocation tables for EXE files, since they are the first image loaded into the virtual address space and thus will be loaded at their preferred base address.
For DLLs, and for EXEs that opt into address space layout randomization (ASLR), an exploit mitigation technique introduced with Windows Vista, relocation tables once again become mandatory, because the binary may be dynamically moved before being executed, even though an EXE is still the first image loaded into the virtual address space.
When running native 64-bit binaries on Windows Vista and above, ASLR (Address Space Layout Randomization) is mandatory, and thus relocation sections cannot be omitted by the compiler.
The ELF format, used for both executables and shared libraries (.so files) on most Unix-like systems, allows several types of relocation to be defined.
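To give a sense of what different relocation types mean, the sketch below evaluates three types defined for x86-64 ELF, using the conventional calculation symbols: S is the value of the symbol, A is the addend, P is the address of the place being patched, and B is the base address at which the object is loaded. The function `relocation_value` is a hypothetical helper written for this example, and only these three types are covered.

```python
# Minimal sketch: value computed for three x86-64 ELF relocation types.

MASK64 = 0xFFFFFFFFFFFFFFFF
MASK32 = 0xFFFFFFFF

def relocation_value(rtype, S=0, A=0, P=0, B=0):
    if rtype == "R_X86_64_64":        # absolute 64-bit address
        return (S + A) & MASK64
    if rtype == "R_X86_64_PC32":      # 32-bit offset relative to the place
        return (S + A - P) & MASK32
    if rtype == "R_X86_64_RELATIVE":  # load base plus addend
        return (B + A) & MASK64
    raise ValueError("unsupported relocation type: " + rtype)


# A PC-relative reference from 0x400010 to a symbol at 0x401000, addend -4.
print(hex(relocation_value("R_X86_64_PC32", S=0x401000, A=-4, P=0x400010)))

# A base-relative entry in a shared object loaded at a hypothetical base.
print(hex(relocation_value("R_X86_64_RELATIVE", B=0x7F0000000000, A=0x2000)))
```

The type tells the linker or loader which formula to apply and how wide the patched field is, so the same symbol can be referenced absolutely in one place and relative to the program counter in another.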
The following example uses Donald Knuth's MIX architecture and MIXAL assembly language. The principles are the same for any architecture, though the details will change.
- (A) Program SUBR is compiled to produce object file (B), shown as both machine code and assembler. The compiler may start the compiled code at an arbitrary location, often location zero as shown. Location 13 contains the machine code for the jump instruction to statement ST in location 5.
- (C) If SUBR is later linked with other code it may be stored at a location other than zero. In this example the linker places it at location 120. The address in the jump instruction, which is now at location 133, must be relocated to point to the new location of the code for statement ST, now 125. [1 61 shown in the instruction is the MIX machine code representation of 125].
- (D) When the program is loaded into memory to run it may be loaded at some location other than the one assigned by the linker. This example shows SUBR now at location 300. The address in the jump instruction, now at 313, needs to be relocated again so that it points to the updated location of ST, 305. [4 49 is the MIX machine representation of 305].
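The arithmetic in this example can be checked with a short sketch. It assumes, as the figures quoted above imply, that a MIX address is stored in two bytes of 64 values each; the helper names `relocate` and `mix_address_bytes` are invented for this illustration.

```python
# Minimal sketch of the address arithmetic in the MIX example.

def relocate(address, base):
    # Move an address assigned relative to location zero to a new base.
    return address + base


def mix_address_bytes(address):
    # Split an address into two MIX bytes, assuming 64 values per byte.
    return divmod(address, 64)


ST, JUMP = 5, 13  # locations in the object file, which starts at zero

for base in (0, 120, 300):  # as compiled, as linked, as loaded
    target = relocate(ST, base)
    print(relocate(JUMP, base), target, mix_address_bytes(target))
```

Each line of output gives the location of the jump instruction, the relocated address of ST, and the two bytes that encode that address in the instruction.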
- Linker (computing)
- Library (computing)
- Object file
- Static library
- Position-independent code
- John R. Levine (October 1999). "Chapter 1: Linking and Loading". Linkers and Loaders. Morgan Kaufmann. p. 5. ISBN 1-55860-496-0.
- John R. Levine (October 1999). "Chapter 3: Object Files". Linkers and Loaders. Morgan Kaufmann. ISBN 1-55860-496-0.
- "Borland article #15961: Coping with 'Fixup Overflow' messages". Archived from the original on 2007-03-24. Retrieved 2007-01-15.