Tracing is a just-in-time (JIT) compilation technique that is used to optimize the execution of a program at runtime. This is done by recording a sequence of frequently executed operations and compiling it to native machine code. A different approach is taken by Method JITs, which generate machine code for frequently executed methods instead of traces.
Overview
Programs can be executed in two different ways: whereas statically typed languages (e.g. C, C++) are usually compiled, dynamically typed languages (e.g. JavaScript, Python, PHP) are typically interpreted. Interpreters are often chosen for dynamic languages because of their complexity; for example, types cannot always be resolved at compile time. The drawback is that interpreted programs are often slower than compiled programs, since they are not executed directly on the CPU and cannot benefit from the optimizations a compiler can provide. Just-in-time compilation reduces this overhead and increases execution speed by analysing the program at runtime and compiling parts of it to native machine code, which can then run directly on the CPU without interpretation overhead. Which parts are compiled depends on the just-in-time compiler: while Method JITs compile frequently executed methods, Tracing JITs record sequences of operations.
Goals
The main goal of a Tracing JIT is to gain a performance speedup for so-called "hot loops". A "hot" loop is a loop that is executed frequently and is thus a good candidate for optimization. When a loop is recorded, only one execution path is handled. Since variable types or branches can change in later iterations, guards have to be placed at the appropriate places. This procedure keeps the program sound.
Assumptions
Just-in-time compilers are based on the following assumptions:
- programs spend most of their time in loops
- subsequent loop iterations take similar paths
All implementation work is based on these assumptions. Loops in particular are the main focus of Tracing JITs; therefore, the techniques of tracing are developed around loop iterations and building traces of loops.
Applications
There are several implementations of Tracing JITs. Most of them focus on dynamic languages such as JavaScript or Python. Other work applies tracing directly to native code.
Following a trace
All implementations of Tracing JITs have roughly the following steps in common: first, profiling information about loops is collected. Then, after a certain number of iterations, the "hot loop" is recorded, optimized and compiled to machine code (a trace). When this loop is encountered again under the same preconditions (loop variable, etc.), the compiled trace is called instead of its program counterpart. These steps are explained in detail in the following:
Profiling phase
During profiling, information about loop executions is collected. For each execution a counter is incremented. If its value surpasses a certain threshold, the tracing phase is initiated.
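This counting scheme can be illustrated with a small Python sketch; the threshold value and all names are made up for illustration:

```python
# Illustrative hot-loop counter; real JITs tune the threshold carefully.
HOT_THRESHOLD = 1000

loop_counters = {}  # maps a loop header position to its execution count

def profile_loop(pc):
    """Count one execution of the loop starting at position pc.
    Returns True once the loop has become "hot" and should be traced."""
    loop_counters[pc] = loop_counters.get(pc, 0) + 1
    return loop_counters[pc] >= HOT_THRESHOLD
```

The interpreter would call such a function at every backward branch; only once it returns True does the (more expensive) tracing machinery start.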
Tracing phase
In the tracing phase, each executed program command is recorded and often translated into an intermediate representation. Since a loop can follow another path (or variables can change their type) in later iterations, guards have to be placed. In the intermediate representation, guards are represented by their own instructions. The recording finishes when the loop branches back to its start; it fails if the loop takes a side exit.
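The key point is that a conditional branch is not recorded as a branch: the direction actually taken is baked into the trace and protected by a guard. A minimal sketch of this, with invented names:

```python
def record_branch(trace, cond_value, cond_repr):
    """While tracing, record a guard for the branch direction actually
    taken; the guard fails if a later iteration takes the other one."""
    if cond_value:
        trace.append(("guard_true", cond_repr))
    else:
        trace.append(("guard_false", cond_repr))
    return cond_value  # interpretation continues along the observed path

trace = []
i = 5
if record_branch(trace, i < 10000, "i < 10000"):
    trace.append(("int_mul", "i", "i"))  # record the loop body as well
```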
Optimization phase
In modern Tracing JITs the optimization phase is interleaved with the tracing phase. For example, TraceMonkey uses a two-stage optimization, whose first stage is performed in a forward pass: while the trace is being recorded, the intermediate code is optimized on the fly. These optimizations prepare further optimizations in the second stage, in which, besides other techniques, machine code is emitted to run directly on the processor.
Typical optimizations are
- common subexpression elimination
- dead code elimination
- register allocation
- loop-invariant code motion
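Such optimizations are particularly simple on a trace, because it is a single linear sequence of operations with no control-flow merges. A toy common-subexpression elimination pass over such a linear trace (the operation format is invented for illustration) could look like this:

```python
def common_subexpression_elimination(ops):
    """Toy common-subexpression elimination over a linear trace.
    Each op is a (result, operator, operands) triple; all operators
    are assumed to be pure (no side effects)."""
    seen = {}    # (operator, operands) -> name of the earlier result
    alias = {}   # eliminated result name -> surviving result name
    out = []
    for result, op, args in ops:
        args = tuple(alias.get(a, a) for a in args)
        key = (op, args)
        if key in seen:
            alias[result] = seen[key]  # reuse the earlier computation
        else:
            seen[key] = result
            out.append((result, op, args))
    return out

trace = [
    ("i3", "int_mul", ("i1", "i1")),
    ("i5", "int_mul", ("i1", "i1")),  # duplicate of i3, eliminated
    ("i6", "int_add", ("i5", "i3")),  # rewritten to use i3 twice
]
optimized = common_subexpression_elimination(trace)
```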
Execution
When a trace has been compiled to machine code, it is stored in a trace cache for fast access. The cached traces are often linked together to avoid unnecessary jumps between traces and the interpretation level. When a loop is recognized as a recorded trace in the cache, the recorded trace is executed. If a trace takes a side exit to a program part for which a trace also exists, execution stays in the trace cache and does not need to return to the interpreter level. After the loop finishes, or if a guard fails, execution returns to the interpreter level. To avoid losing variable assignments (and similar effects) made in the aborted trace, the program state has to be transferred back to the interpreter level as well. Of course these back-jumps to the interpreter level cause significant performance losses.
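The dispatch between cached traces and the interpreter can be sketched as follows. This is a simplified model with invented names, in which a "compiled trace" is a Python function that mutates the program state and reports where execution continues:

```python
trace_cache = {}  # entry position -> compiled trace

def run(pc, state, interpret):
    """Prefer cached traces; only fall back to the interpreter when no
    trace exists for the current position."""
    while pc in trace_cache:
        status, pc = trace_cache[pc](state)  # stay at machine-code level
        if status == "done":
            return state
    return interpret(pc, state)  # back-jump to the interpreter (costly)

def square_loop(state):
    # Compiled hot loop: y = i*i; i = i + 1, guarded by i < 3.
    while state["i"] < 3:
        state["y"] = state["i"] * state["i"]
        state["i"] += 1
    return ("exit", 1)  # guard i < 3 failed: side-exit to position 1

def after_loop(state):
    state["result"] = state["y"]
    return ("done", None)

trace_cache[0] = square_loop
trace_cache[1] = after_loop  # side exit linked to another trace
```

Because the side exit of `square_loop` is itself in the cache, the `run` loop moves from one trace to the next without ever calling `interpret`.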
History
Whereas the idea of JITs reaches back to the 1960s, Tracing JITs are quite young. The first mention of an idea similar to today's Tracing JITs was by Mitchell et al., who observed that compiled code could be derived from an interpreter at run-time by simply storing the actions performed during interpretation.
However, the first implementation of a Tracing JIT was Dynamo, "a software dynamic optimization system that is capable of transparently improving the performance of a native instruction stream as it executes on the processor". To do this, the native instruction stream is interpreted until a "hot" instruction sequence is found. For this sequence an optimized version is generated, cached and executed.
Another implementation of a Tracing JIT is DynamoRIO (2003). DynamoRIO is a framework for interpreter construction that combines a JIT and partial evaluation. It is used to “dynamically remove interpreter overhead from language implementations”.
In 2006, a JIT compiler for a Java VM was developed that was small enough to fit on resource-constrained devices. This VM, called HotpathVM, was capable of dynamically identifying frequently executed bytecode instructions, which are then compiled using Static Single Assignment (SSA) construction.
A more famous example of a Tracing JIT is Mozilla's JavaScript interpreter for Firefox (2009). With TraceMonkey, the tracing extension of their SpiderMonkey engine, Mozilla presented a technique to compile frequently executed loop traces in dynamic languages at run-time and specialize the generated code for the actual dynamic types occurring on each path.
Another well-known project that utilizes Tracing JITs is PyPy. It enables the use of Tracing JITs for language implementations that were written with PyPy's translation toolchain, thus improving the performance of any program that is executed using such an interpreter. This was made possible by tracing the interpreter itself, instead of the program that is executed by the interpreter. The problem with this approach is that in an interpreter the only hot loop is the bytecode dispatch loop. Therefore not single opcodes but sequences of several opcodes were traced, unrolling the bytecode in such a way that it corresponded to the user program.
Tracing JITs have also been explored by Microsoft for their Common Intermediate Language (CIL). Their goal was to enable the optimizations of the JIT in any program that is compiled to that platform.
Concepts
Guards
Guards are special instructions that ensure, through conditions, that an already jitted path is still valid. While recording a trace, guard instructions are placed at decision points (control-flow points), making sure that the recorded path is the one executed later on. Guards also implicitly mark so-called "side exits": if a guard condition does not match and execution is about to leave the jitted path, the interpreter regains control over the execution at bytecode level, exiting the trace. Side exits therefore mark starting points for non-jitted paths.
Trace Trees
The data structure for storing multiple jitted paths linked to one another is called a trace tree. Any bytecode can be the root of such a trace tree, but usually it is the header of a hot loop.
Trace trees are expanded once certain side exits are taken too often, marking the alternative path as hot and therefore worth compiling. The new path is then compiled and attached to the trace tree, creating a branch where the failed guard was placed before. The guard is removed, since it is now superfluous.
The structure of the tree enables execution to jump from one trace to another without the interpreter interfering, which speeds up execution.
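A trace tree can be modelled as nodes holding one compiled path each, with child traces attached where guards used to fall back to the interpreter. The following sketch uses invented names:

```python
class TraceNode:
    """One compiled path in a trace tree (illustrative sketch)."""
    def __init__(self, ops):
        self.ops = ops        # the recorded operations of this path
        self.branches = {}    # guard index -> trace attached at that exit

    def attach(self, guard_index, node):
        # A side exit became hot: instead of falling back to the
        # interpreter, the failing guard now branches into a new trace.
        self.branches[guard_index] = node

# Root of the tree: the trace of the hot loop itself.
root = TraceNode(["guard_true(i < n)", "i = i + 1", "jump(loopstart)"])
# The exit at guard 0 (loop condition false) became hot as well.
root.attach(0, TraceNode(["return i"]))
```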
Applications
Many applications already make use of the tracing JIT technique, such as the Mozilla Firefox web browser or the Python implementation PyPy. One of the most prominent projects using the techniques of tracing JITs is TraceMonkey, the first JIT compiler for JavaScript.
Example of a Trace
Consider the following Python program that calculates the square of all numbers from 1 to 10000:
i = 0
while i < 10000:
    y = i*i
    i = i + 1
A trace for this program could look something like this:
label(loopstart)
i1 = getvalue(Integer, i)
i2 = int_lt(i1, 10000)     # i < 10000
guard_true(i2)
i3 = int_mul_ovf(i1, i1)   # i*i
guard_no_overflow(i3)
p1 = new(Integer)
setvalue(p1, i3)           # y = i*i
i6 = getvalue(Integer, i)
i7 = int_add_ovf(i6, 1)    # i = i+1
guard_no_overflow(i7)
p2 = new(Integer)
setvalue(p2, i7)
jump(loopstart)
The trace starts inside the while loop by checking the condition i < 10000. First the value of i is read and then compared to 10000 (in dynamic languages, reading the value of i would usually also require unboxing it from an object, but this is ignored here for better comprehension). The result of the comparison is stored on the stack and then checked by a guard. The guard must not fail, otherwise the trace is left and execution is returned to the interpreter. In the next instructions the value of i is multiplied with itself and stored in the variable y. For this a new integer object is created and the result is stored in one of its attributes. Then the index variable i is increased while a guard checks for an integer overflow. The jump at the end ensures that the trace is executed until one of the guards fails.