Bytecode

From Wikipedia, the free encyclopedia

Bytecode, also known as portable code or p-code, is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references (normally numeric addresses) that encode the result of parsing and semantic analysis of properties such as the type, scope, and nesting depth of program objects. Because this analysis has already been done, bytecode can be interpreted much faster than source code can be interpreted directly.
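The difference between source text and bytecode can be sketched with a toy compiler. In this minimal example, the opcode numbers, the postfix input syntax, and the function `compile_postfix` are all hypothetical. Variable names are resolved to numeric slot indices and literals to constant-pool indices once, at compile time, so the output is only a list of small integers plus a constant table.

```python
# Hypothetical opcode numbers for a toy stack machine (not any real VM's encoding).
LOAD_CONST, LOAD_VAR, ADD, MUL = 0, 1, 2, 3


def compile_postfix(source: str, variables: list[str]) -> tuple[list[int], list[int]]:
    """Compile a whitespace-separated postfix expression into (code, constants).

    Names are resolved to slot numbers and literals to constant-pool indices
    here, once, so an interpreter never has to re-parse text or look up names.
    """
    slots = {name: i for i, name in enumerate(variables)}
    code: list[int] = []
    constants: list[int] = []
    for token in source.split():
        if token == "+":
            code.append(ADD)
        elif token == "*":
            code.append(MUL)
        elif token in slots:
            code += [LOAD_VAR, slots[token]]  # opcode followed by a slot index
        else:
            constants.append(int(token))
            code += [LOAD_CONST, len(constants) - 1]  # opcode followed by a pool index
    return code, constants


code, constants = compile_postfix("x y + 2 *", ["x", "y"])
print(code)       # [1, 0, 1, 1, 2, 0, 0, 3]
print(constants)  # [2]
```

Nothing in the output refers to the names `x` and `y` any more; that information has been folded into the numeric operands.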

The name bytecode stems from instruction sets that have one-byte opcodes followed by optional parameters. Programming language implementations may emit intermediate representations such as bytecode to ease interpretation, or to reduce hardware and operating system dependence by allowing the same code to run on different platforms. Bytecode is often either executed directly on a virtual machine (a p-code machine, i.e., an interpreter) or further compiled into machine code for better performance.
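The byte-oriented layout described above can be illustrated with a small disassembler. This is a minimal sketch under an invented encoding: the opcode values, the mnemonics, and the single-byte operand are hypothetical and not taken from any real VM. Each instruction starts with a one-byte opcode, and a table records how many parameter bytes follow it.

```python
# Hypothetical one-byte opcodes, each mapped to (mnemonic, number of operand bytes).
OPCODES = {
    0x01: ("PUSH", 1),  # the next byte is a small integer constant
    0x02: ("ADD", 0),
    0x03: ("MUL", 0),
    0xFF: ("HALT", 0),
}


def disassemble(code: bytes) -> list[tuple[int, str, list[int]]]:
    """Split a byte string into (offset, mnemonic, operands) triples."""
    result = []
    pc = 0
    while pc < len(code):
        opcode = code[pc]
        if opcode not in OPCODES:
            raise ValueError(f"unknown opcode 0x{opcode:02x} at offset {pc}")
        name, n_operands = OPCODES[opcode]
        operands = list(code[pc + 1 : pc + 1 + n_operands])
        if len(operands) != n_operands:
            raise ValueError(f"truncated {name} instruction at offset {pc}")
        result.append((pc, name, operands))
        pc += 1 + n_operands  # skip the opcode byte plus its parameters
    return result


program = bytes([0x01, 2, 0x01, 3, 0x02, 0xFF])
for offset, name, operands in disassemble(program):
    print(offset, name, *operands)
# 0 PUSH 2
# 2 PUSH 3
# 4 ADD
# 5 HALT
```

Because instructions have different lengths, the decoder must know each opcode's operand count before it can find the start of the next instruction.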

Since bytecode instructions are processed by software, they may be arbitrarily complex, but they are nonetheless often akin to traditional hardware instructions. Virtual stack machines are the most common, but virtual register machines have also been built.[1][2] Different parts of a program may be stored in separate files, similar to object modules, and loaded dynamically during execution.
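The two virtual machine styles can be compared on the same expression, `a + b * c`. This is a minimal sketch with hypothetical instruction tuples; the register convention (r0 = a, r1 = b, r2 = c, result left in r0) is an assumption made only for this example.

```python
# Stack machine: operands are implicit (they live on an operand stack).
STACK_CODE = [("load", "a"), ("load", "b"), ("load", "c"), ("mul",), ("add",)]

# Register machine: each instruction names a destination and two source registers.
# Assumed convention: r0 = a, r1 = b, r2 = c, and the result is left in r0.
REGISTER_CODE = [("mul", 1, 1, 2), ("add", 0, 0, 1)]


def run_stack(code, env):
    stack = []
    for op, *args in code:
        if op == "load":
            stack.append(env[args[0]])
        elif op == "add":
            right = stack.pop()
            stack.append(stack.pop() + right)
        elif op == "mul":
            right = stack.pop()
            stack.append(stack.pop() * right)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()


def run_register(code, initial):
    regs = list(initial)  # copy so the caller's register values are untouched
    for op, dst, src1, src2 in code:
        if op == "add":
            regs[dst] = regs[src1] + regs[src2]
        elif op == "mul":
            regs[dst] = regs[src1] * regs[src2]
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return regs[0]


print(run_stack(STACK_CODE, {"a": 2, "b": 3, "c": 4}))  # 14
print(run_register(REGISTER_CODE, [2, 3, 4]))            # 14
```

Here the register version needs two instructions against five, but each one carries three operand fields; register machines typically trade fewer dispatched instructions for larger instructions.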

Execution

A bytecode program may be executed by parsing and directly executing its instructions, one at a time. This kind of bytecode interpreter is very portable. Some systems, called dynamic translators or "just-in-time" (JIT) compilers, translate bytecode into machine language as needed at runtime. This makes the virtual machine hardware-specific but does not sacrifice the portability of the bytecode itself. For example, Java and Smalltalk code is typically stored in bytecode form, which is then JIT-compiled to machine code before execution. This compilation step introduces a delay before the program runs, but it improves execution speed considerably compared to direct interpretation of the source code, normally by several orders of magnitude.[citation needed]
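The "one instruction at a time" strategy is the classic fetch-decode-execute loop. The sketch below is a minimal, hypothetical stack-machine interpreter (the opcode names, the `(opcode, argument)` tuple encoding, and the program `SUM_PROGRAM` are invented for illustration) that sums the integers from 1 to n using local-variable slots and conditional jumps. A JIT compiler would instead translate such sequences into native code, which is not shown here.

```python
# Each instruction is an (opcode, argument) pair; argument is None when unused.
# Local slot 0 holds n, slot 1 holds the running total. Assumes n >= 0.
SUM_PROGRAM = [
    ("PUSH", 0),           # 0
    ("STORE", 1),          # 1  total = 0
    ("LOAD", 0),           # 2  loop head: push n
    ("JUMP_IF_ZERO", 13),  # 3  exit the loop when n == 0
    ("LOAD", 1),           # 4
    ("LOAD", 0),           # 5
    ("ADD", None),         # 6
    ("STORE", 1),          # 7  total = total + n
    ("LOAD", 0),           # 8
    ("PUSH", 1),           # 9
    ("SUB", None),         # 10
    ("STORE", 0),          # 11 n = n - 1
    ("JUMP", 2),           # 12 back to the loop head
    ("LOAD", 1),           # 13
    ("RETURN", None),      # 14 return total
]


def run(program, local_vars):
    """Execute one instruction per loop iteration until RETURN."""
    stack = []
    local_vars = list(local_vars)  # copy so the caller's list is not mutated
    pc = 0                         # program counter: index of the next instruction
    while True:
        op, arg = program[pc]      # fetch and decode
        pc += 1
        if op == "PUSH":
            stack.append(arg)
        elif op == "LOAD":
            stack.append(local_vars[arg])
        elif op == "STORE":
            local_vars[arg] = stack.pop()
        elif op == "ADD":
            right = stack.pop()
            stack.append(stack.pop() + right)
        elif op == "SUB":
            right = stack.pop()
            stack.append(stack.pop() - right)
        elif op == "JUMP":
            pc = arg
        elif op == "JUMP_IF_ZERO":
            if stack.pop() == 0:
                pc = arg
        elif op == "RETURN":
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op!r} at index {pc - 1}")


print(run(SUM_PROGRAM, [10, 0]))  # 55
```

Every pass through the `while` loop pays the cost of fetching and branching on an opcode; removing that per-instruction overhead is the main source of the speedup a JIT provides.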

Because of this performance advantage, many language implementations today execute a program in two phases: first compiling the source code into bytecode, and then passing the bytecode to the virtual machine. There are bytecode-based virtual machines of this sort for Java, Python, PHP,[3] Tcl, awk and Forth (however, Forth is not ordinarily compiled via bytecodes in this way, and its virtual machine is more generic instead). The implementations of Perl and Ruby 1.8 instead work by walking an abstract syntax tree representation derived from the source code.
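CPython is one implementation that follows this two-phase model, and its standard library exposes both phases. The sketch below uses only the built-ins `compile` and `eval` and the standard `dis`, `ast`, and `operator` modules. The tree-walking evaluator `walk` is a hypothetical, heavily simplified illustration of the AST-walking strategy, not how Perl or Ruby 1.8 are actually implemented. The opcode names that `dis` reports differ between CPython versions (3.11, for instance, replaced the separate `BINARY_ADD`-style opcodes with a single `BINARY_OP`), so no listing is shown.

```python
import ast
import dis
import operator

SOURCE = "(a + b) * 2 - c"
ENV = {"a": 3, "b": 4, "c": 5}

# Phase 1: compile the source text into a code object (CPython bytecode).
code = compile(SOURCE, "<expr>", "eval")
# Phase 2: hand the compiled code to the virtual machine.
bytecode_result = eval(code, {}, ENV)

# The raw bytecode is an ordinary bytes object; dis decodes it into instructions.
opnames = [instr.opname for instr in dis.get_instructions(code)]

# For contrast: a tiny tree-walking evaluator over the same expression's AST.
# It handles only names, constants, and the operators + - *.
BINOPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def walk(node, env):
    if isinstance(node, ast.Expression):
        return walk(node.body, env)
    if isinstance(node, ast.BinOp):
        return BINOPS[type(node.op)](walk(node.left, env), walk(node.right, env))
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.Constant):
        return node.value
    raise NotImplementedError(type(node).__name__)


tree_result = walk(ast.parse(SOURCE, mode="eval"), ENV)

print(bytecode_result, tree_result)  # 9 9
print(type(code.co_code))            # <class 'bytes'>
print(opnames)                       # version-dependent
```

Both strategies compute the same value; the difference is that the tree walker re-traverses the node objects on every evaluation, while the compiled code object can be executed repeatedly without revisiting the tree.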

More recently, the authors of V8[4] and Dart[5] have challenged the notion that intermediate bytecode is necessary for a fast and efficient VM implementation. At the time, both of these language implementations performed direct JIT compilation from source code to machine code without any bytecode intermediary.[6]

Examples

References

  1. ^ "The Implementation of Lua 5.0" describes a register-based virtual machine.
  2. ^ "Dalvik VM". Dalvik is register-based.
  3. ^ PHP opcodes are generated each time the program is launched and are always interpreted rather than just-in-time compiled.
  4. ^ "Dynamic Machine Code Generation". Google. 
  5. ^ Loitsch, Florian. "Why Not a Bytecode VM?". Google. 
  6. ^ "JavaScript myth: JavaScript needs a standard bytecode". 
  7. ^ "The Implementation of the Icon Programming Language".
  8. ^ "The Implementation of Icon and Unicon: a Compendium".
  9. ^ Paul, Matthias (2001-12-30). "KEYBOARD.SYS internal structure". comp.os.msdos.programmer. Retrieved 2016-09-17. […] In fact, the format is basically the same in MS-DOS 3.3 - 8.0, PC DOS 3.3 - 2000, including Russian, Lithuanian, Chinese and Japanese issues, as well as in Windows NT, 2000, and XP […]. There are minor differences and incompatibilities, but the general format has not changed over the years. […] Some of the data entries contain normal tables […]. However, most entries contain "executable code" interpreted by some kind of P-code interpreter at *runtime*, including conditional branches and the like. This is why the KEYB driver has such a huge memory footprint compared to table-driven keyboard drivers which can be done in 3 - 4 Kb getting the same level of functionality except for the interpreter. […] 
  10. ^ Mendelson, Edward (2001-07-20). "How to Display the Euro in MS-DOS and Windows DOS". Display the euro symbol in full-screen MS-DOS (including Windows 95 or Windows 98 full-screen DOS). Archived from the original on 2016-09-17. Retrieved 2016-09-17. […] Matthias Paul […] warns that the IBM PC DOS version of the keyboard driver uses some internal procedures that are not recognized by the Microsoft driver, so, if possible, you should use the IBM versions of both KEYB.COM and KEYBOARD.SYS instead of mixing Microsoft and IBM versions […]  (NB. What is meant by "procedures" here are some additional byte codes in the IBM KEYBOARD.SYS file not supported by the Microsoft version of the KEYB driver.)
  11. ^ For details, see "United States Patent 6,973,644".
  12. ^ For details, see "R Installation and Administration".
  13. ^ "The SQLite Bytecode Engine".