Bytecode: Difference between revisions

Revision as of 02:54, 7 August 2014

Bytecode, also known as p-code (portable code), is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references (normally numeric addresses) which encode the result of parsing and semantic analysis of things like type, scope, and nesting depths of program objects. They therefore allow much better performance than direct interpretation of source code.

The name bytecode stems from instruction sets that have one-byte opcodes followed by optional parameters. Intermediate representations such as bytecode may be output by programming language implementations to ease interpretation, or they may be used to reduce hardware and operating system dependence by allowing the same code to run on different platforms. Bytecode may be executed directly on a virtual machine (i.e. an interpreter), or it may be further compiled into machine code for better performance.
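The "one-byte opcode followed by optional parameters" layout can be illustrated with a small decoder. This is only a sketch: the opcode numbers, the mnemonics (HALT, PUSH, ADD, MUL) and the `decode` helper are invented for this example and do not correspond to any real instruction set.

```python
# Hypothetical opcode table: opcode byte -> (mnemonic, number of operand bytes).
OPCODES = {
    0x00: ("HALT", 0),
    0x01: ("PUSH", 1),
    0x02: ("ADD", 0),
    0x03: ("MUL", 0),
}


def decode(code: bytes) -> list:
    """Split a byte string into (offset, mnemonic, operands) triples."""
    instructions = []
    pc = 0
    while pc < len(code):
        mnemonic, n_operands = OPCODES[code[pc]]
        # The operand bytes, if any, directly follow the opcode byte.
        operands = tuple(code[pc + 1 : pc + 1 + n_operands])
        instructions.append((pc, mnemonic, operands))
        pc += 1 + n_operands
    return instructions


# Encodes the expression (2 + 3) * 4 for an assumed stack machine.
program = bytes([0x01, 2, 0x01, 3, 0x02, 0x01, 4, 0x03, 0x00])
listing = decode(program)
for offset, mnemonic, operands in listing:
    print(offset, mnemonic, *operands)
```

Because each opcode fixes how many parameter bytes follow it, the decoder needs no separators between instructions, which is part of what keeps the format compact.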

Since bytecode instructions are processed by software, they may be arbitrarily complex, but are nonetheless often akin to traditional hardware instructions; virtual stack machines are the most common, but virtual register machines have also been built.^[1]^[2] Different parts of a program may often be stored in separate files, similar to object modules, but dynamically loaded during execution.

Execution

A bytecode program may be executed by parsing and directly executing the instructions, one at a time. This kind of bytecode interpreter is very portable. Some systems, called dynamic translators or "just-in-time" (JIT) compilers, translate bytecode into machine language as necessary at runtime: this makes the virtual machine hardware-specific, but does not lose the portability of the bytecode itself. For example, Java and Smalltalk code is typically stored in bytecoded format and then JIT-compiled to machine code before execution. This introduces a delay before a program runs, while the bytecode is compiled to native machine code, but improves execution speed considerably compared to direct interpretation of the source code, normally by several orders of magnitude.^{[citation needed]}
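Executing instructions one at a time can be sketched as a fetch-decode-execute loop. The following is a minimal illustration for a stack machine, not the design of any particular virtual machine; the four opcodes and the `run` function are assumptions made up for this example.

```python
# Hypothetical one-byte opcodes; PUSH is followed by a one-byte operand.
HALT, PUSH, ADD, MUL = 0x00, 0x01, 0x02, 0x03


def run(code: bytes) -> int:
    """Interpret the bytecode and return the value left on top of the stack."""
    stack = []
    pc = 0  # program counter: index of the next byte to fetch
    while True:
        op = code[pc]
        pc += 1
        if op == PUSH:
            stack.append(code[pc])  # operand byte follows the opcode
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op:#04x} at offset {pc - 1}")


# (2 + 3) * 4
result = run(bytes([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT]))
print(result)
```

The loop itself is ordinary portable code, which is why such an interpreter runs wherever its host language does. A JIT compiler would instead emit native instructions for each opcode, which this sketch does not attempt.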

Because of this performance advantage, many language implementations today execute a program in two phases: first compiling the source code into bytecode, and then passing the bytecode to the virtual machine. There are bytecode-based virtual machines of this sort for Java, Python, PHP,^[3] Tcl, and Forth (however, Forth is not ordinarily compiled via bytecodes in this way, and its virtual machine is more generic instead). The implementations of Perl and Ruby 1.8 instead work by walking an abstract syntax tree representation derived from the source code.
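The two phases can be observed directly in CPython, the reference Python implementation, using the built-in `compile` and `exec` functions and the standard `dis` module. The source string and the variable names below are arbitrary choices for this illustration, and the instruction names printed differ between CPython versions.

```python
import dis

source = "result = (2 + x) * 4"

# Phase 1: compile the source text into a code object that holds bytecode.
code = compile(source, "<example>", "exec")
print(type(code.co_code))  # the raw bytecode is stored as a bytes object
for instruction in dis.get_instructions(code):
    print(instruction.offset, instruction.opname, instruction.argrepr)

# Phase 2: pass the compiled code to the virtual machine for execution.
namespace = {"x": 3}
exec(code, namespace)
print(namespace["result"])
```

The same code object can be executed repeatedly with different namespaces, so the cost of parsing is paid only once.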

More recently, the authors of the V8^[4] JavaScript engine and the Dart^[5] language have challenged the notion that intermediate bytecode is a necessity for fast and efficient VM implementation. Both of these implementations currently do direct JIT compilation from source code to machine code without any bytecode intermediary.^[6]

Examples

ActionScript executes in the ActionScript Virtual Machine (AVM), which is part of Flash Player and AIR. ActionScript code is typically transformed into bytecode format by a compiler. Examples of compilers include the one built into Adobe Flash Professional and the one that is built into Adobe Flash Builder and available in the Adobe Flex SDK.
Adobe Flash objects
BANCStar, originally bytecode for an interface-building tool but used as a language in its own right.
Byte Code Engineering Library
C to Java Virtual Machine compilers
CLISP implementation of Common Lisp used to compile only to bytecode for many years; however, now it also supports compilation to native code with the help of GNU lightning.
CMUCL and Scieneer Common Lisp implementations of Common Lisp can compile either to bytecode or to native code; bytecode is much more compact
Common Intermediate Language executed by Common Language Runtime. Used by Microsoft .NET languages such as C#.
Dalvik bytecode, designed for the Android platform, is executed by the Dalvik virtual machine.
Dis bytecode, designed for the Inferno operating system, is executed by the Dis virtual machine.
EiffelStudio for the Eiffel programming language
Emacs is a text editor with a majority of its functionality implemented by its specific dialect of Lisp. These features are compiled into bytecode. This architecture allows users to customize the editor with a high level language, which after compilation into bytecode yields reasonable performance.
Embeddable Common Lisp implementation of Common Lisp can compile to bytecode or C code
Ericsson implementation of Erlang uses BEAM bytecodes
Icon^[7] and Unicon^[8] programming languages
Infocom used the Z-machine to make its software applications more portable.
Java bytecode, which is executed by the Java Virtual Machine
- ASM
- BCEL
- Javassist
- JMangler
LLVM, a modular bytecode compiler and virtual machine
Lua uses a register-based bytecode virtual machine.
m-code of the MATLAB programming language^[9]
O-code of the BCPL programming language
OCaml programming language optionally compiles to a compact bytecode form
p-code of UCSD Pascal implementation of the Pascal programming language
Parrot virtual machine
The R environment for statistical computing offers a byte code compiler through the compiler package, now standard with R version 2.13.0. It is possible to compile this version of R so that the base and recommended packages take advantage of this.^[10]
Scheme 48 implementation of Scheme using bytecode interpreter
Bytecodes of many implementations of the Smalltalk programming language
The SPIN interpreter built into the Parallax Propeller Microcontroller
SWEET16
Visual FoxPro compiles to bytecode
YARV and Rubinius for Ruby.

Notes

[1] The Implementation of Lua 5.0 involves a register-based virtual machine.
[2] "Dalvik VM". Dalvik is register-based.
[3] PHP opcodes, however, are generated each time the program is launched, and are always interpreted rather than just-in-time compiled.
[4] "Dynamic Machine Code Generation". Google.
[5] Loitsch, Florian. "Why Not a Bytecode VM?". Google.
[6] "JavaScript myth: JavaScript needs a standard bytecode".
[7] The Implementation of the Icon Programming Language.
[8] The Implementation of Icon and Unicon: a Compendium.
[9] For the details refer to "United States Patent 6,973,644".
[10] For the details refer to "R Installation and Administration".

@@ Line 22: / Line 22: @@
 *[[Byte Code Engineering Library]]
 *[[Java Virtual Machine#C to bytecode compilers|C to Java Virtual Machine compilers]]
-*[[CLISP]] implementation of [[Common Lisp]] compiles only to bytecode
+*[[CLISP]] implementation of [[Common Lisp]] used to compile only to bytecode for many years; however, now it also supports compilation to native code with the help of [[GNU lightning]].
 *[[CMUCL]] and [[Scieneer Common Lisp]] implementations of [[Common Lisp]] can compile either to bytecode or to native code; bytecode is much more compact
 *[[Common Intermediate Language]] executed by [[Common Language Runtime]]. Used by [[.NET Framework|Microsoft .NET]] languages such as [[C Sharp (programming language)|C#]].