WikiProject Computing / Software (Rated Start-class, High-importance)


Some pretentious people making something of nothing. Bytecode is nonsense (just invented to make your dicks seem bigger). This article is total tosh and needs completely rewriting in terms of hex vs binary and interpreting for virtual machines. Better still, delete it all. -- Preceding unsigned comment added by (talk) 12:39, 13 April 2009 (UTC)

Hey guys, if you are going to talk about the advantages of bytecode, you'd better point out some of its disadvantages too!

This article is, umm, rather incomprehensible to someone who doesn't already know everything about this topic. k.lee

Bytecode may be used as an intermediate code of a compiler, or may be the saved 'tokenized' form used by an interpreter

How can this sentence be related to virtual machine? -- HJH

Bytecode may be used as an intermediate code of a compiler, or may be the saved 'tokenized' form used by an interpreter or a virtual machine

"Byte code", "byte-code", and "bytecode" seem to be fighting it out. Specifically, there is an entry for the Java Bytecode. Anyone have a strong preference as to which the final version should be? Charles Merriam 21:10, 4 February 2006 (UTC)

  • Bytecode. Ruud 01:19, 11 February 2006 (UTC)
  • I also vote for bytecode; a quick Google search seems to confirm its greater popularity. More importantly, "The Java Virtual Machine Specification" by Lindholm and Yellin (1997) spells it "bytecode". This is surely the ultimate reference, at least for the Java version. --Mike Van Emmerik 12:37, 11 February 2006 (UTC)
  • Bytecode, definitely. Wouter Lievens 12:51, 11 February 2006 (UTC)
  • Bytecode, I don't think "byte" works as an adjective njaard 15:16, 23 March 2006 (UTC)
  • Bytecode is more common. Obviously, make redirects. --Leapfrog314 04:58, 20 April 2006 (UTC)
  • Bytecode | Acaciz 18:01, 3 June 2006 (UTC)
  • Bytecode. ais523 16:55, 15 June 2006 (UTC)
Page moved. Eugène van der Pijll 21:14, 24 June 2006 (UTC)

"The current reference implementation of the Ruby programming language does not use bytecode, however it relies on tree-like structures which resemble intermediate representations used in compilers.". Is it relevant to talk about Ruby not using bytecode in this article? - Philoctet

incorrect use of term bytecode

I believe that this entire article is a misuse of the term bytecode. I have worked near machine level in computer science for many years, and in my experience, bytecode applies specifically to the Java Virtual Machine, whose instruction set does indeed consist of one-byte opcodes. For other programming languages, the correct term for what this article described is "intermediate language". Visual Basic compiles to an intermediate language, as did Pascal, Smalltalk, and others. These were NEVER to my knowledge called "bytecode."

I think this needs to be fixed, hopefully by the author of this article.

I agree. / HenkeB (talk) 21:40, 13 January 2008 (UTC)

I disagree. See PuerExMachina (talk) 04:51, 14 January 2008 (UTC)

Ok, is there a fundamental difference between "bytecode" and other intermediate representations then, as you see it? / HenkeB (talk) 14:47, 14 January 2008 (UTC)
Yes, an intermediate representation does not have to store its tokens in a single byte per token. It's quite possible that a completely different scheme is used. A "bytecode" representation, however, always uses a single byte for a single pseudo-opcode token. Mahjongg (talk) 16:43, 14 January 2008 (UTC)
A quite superficial distinction then, if I understand you correctly? Sounds like, perhaps, this article should be renamed intermediate code, with the word bytecode redirecting there (instead of the other way round), or better, a separate article should be created for intermediate code. However, I had the feeling this Java term had begun to mean just about any representation that is more similar (isomorphic) to ordinary machine code than, say, tree-structured code, "quadruples", or stack code? / HenkeB (talk) 17:47, 14 January 2008 (UTC)

Smalltalk-80 used the term bytecode as well, and it was always an inconsistent notion. Smalltalk bytecodes do not use a fixed size to encode opcodes, but 4 to 8 bits, and there are instructions which are encoded in 2 bytes. So, in essence, the term bytecode is usually used to name a VM instruction set which is designed with a hardware instruction set architecture in mind. 2009-08-20 -- Preceding unsigned comment added by (talk) 12:09, 20 August 2009 (UTC)

Well, I will just comment that personally I find both this article, as well as the one on interpreters, to be rather vague / misleading.

For example, usually it is not "semantic analysis" (as I understand the term) which produces bytecode, rather, it is more commonly the process of flattening an AST which produces bytecode, with this process driving the remainder of the compiler logic, and often with little or no "semantic analysis" (at least for many dynamic languages, where most of this is left to be figured out at runtime).
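The flattening step described above can be sketched roughly as follows. This is only an illustration: the three opcodes (PUSH, ADD, MUL), their byte values, and the tuple-based tree are all made up here and do not come from any real VM. A post-order walk emits the operands first and the operator last, which is what turns a tree into a linear byte sequence.

```python
# Hypothetical opcodes for illustration only (not from any real VM).
PUSH, ADD, MUL = 0x01, 0x02, 0x03
BINARY_OPS = {"+": ADD, "*": MUL}

def flatten(node, out=None):
    """Post-order walk: emit both operands, then the operator byte."""
    if out is None:
        out = bytearray()
    if isinstance(node, int):
        # Leaf: PUSH followed by a one-byte immediate operand.
        if not 0 <= node <= 255:
            raise ValueError("this sketch only encodes one-byte constants")
        out += bytes([PUSH, node])
    else:
        # Inner node: (operator, left subtree, right subtree).
        op, left, right = node
        flatten(left, out)
        flatten(right, out)
        out.append(BINARY_OPS[op])
    return out

# (2 + 3) * 4 written as a nested-tuple AST.
tree = ("*", ("+", 2, 3), 4)
code = flatten(tree)
```

Note that no type checking or other analysis happens here; the walk alone produces the bytecode, which matches the point about dynamic languages leaving most decisions to runtime.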

As for Java and Bytecode, I think Java popularized the term, but they by no means own it. Generally, it refers to a byte-centric opcode-based structure, with 1 (or more) bytes for an opcode, and usually any arguments directly following. Usually, it is understood to be interpreted linearly as well (similar to machine code), and often handling control flow via offsets and jumps, rather than being tree or graph structured or using high-level control flow.

Its main property then is usually that of being similar to, but at the same time usually far less complex and bit-twiddly than, machine code (as well as traditionally interpreted or JIT-compiled rather than being directly run on a piece of hardware). —Preceding unsigned comment added by (talk) 18:56, 2 October 2009 (UTC)
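To make the description above concrete, here is a minimal sketch of a byte-oriented stack interpreter. The opcode names and numbering are assumptions invented for this example, not taken from the JVM or any other real machine. It shows the three properties mentioned: one-byte opcodes, operands directly following, and control flow via a relative jump offset rather than a tree structure.

```python
# Hypothetical one-byte opcodes; operands, if any, follow directly.
HALT, PUSH, ADD, SUB, DUP, JNZ = range(6)

def run(code):
    """Interpret `code` linearly and return the final stack."""
    stack, pc = [], 0
    while True:
        op = code[pc]
        pc += 1
        if op == HALT:
            return stack
        elif op == PUSH:
            stack.append(code[pc])  # one-byte immediate operand
            pc += 1
        elif op in (ADD, SUB):
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == ADD else a - b)
        elif op == DUP:
            stack.append(stack[-1])
        elif op == JNZ:
            # Signed 8-bit offset, relative to the next instruction.
            offset = int.from_bytes(code[pc:pc + 1], "big", signed=True)
            pc += 1
            if stack.pop() != 0:
                pc += offset
        else:
            raise ValueError(f"unknown opcode {op:#04x} at {pc - 1}")

program = bytes([
    PUSH, 3,     # address 0: counter = 3
    PUSH, 1,     # address 2: loop start
    SUB,         # address 4: counter -= 1
    DUP,         # address 5: keep a copy for the test
    JNZ, 0xFA,   # address 6: offset -6, back to address 2 while nonzero
    HALT,        # address 8
])
result = run(program)
```

The loop counts the value on the stack down to zero; a JIT compiler would consume the same byte stream but translate it instead of stepping through it.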

The focus should probably be on 'byte-oriented', in the sense of simplifying instruction decoding. The opcode is only one of several fields; it is not a great benefit if the opcode is easy to extract while the other fields are complex. I've always thought the instruction encoding used for the EM-1 'machine' was a good example: the opcode is one byte, the escape sequence is one byte, and the address fields are one or two bytes. There are a few exceptions where the instructions and arguments were encoded into one byte, but this was to speed execution of very common instructions. (See Informatica Report IR-81 (from 1983) by Andrew S Tanenbaum et al.: Description of a machine architecture for use with block structured languages.) Although the term 'bytecode' is not used by the authors, it has been used in descriptions of the Amsterdam Compiler Kit, of which EM-1 was a central concept. Athulin (talk) 09:46, 10 January 2011 (UTC)
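A loose sketch of why byte-aligned fields make decoding simple. The layout below is invented for illustration and is not the actual EM-1 encoding: it assumes a hypothetical escape byte 0xFF that announces a two-byte little-endian operand, while every other instruction carries a one-byte operand. Every field starts on a byte boundary, so the decoder needs no shifting or masking.

```python
ESCAPE = 0xFF  # hypothetical prefix meaning "a 16-bit operand follows"

def decode(code):
    """Yield (opcode, operand) pairs from a byte string.

    Truncated input is not handled in this sketch.
    """
    pc = 0
    while pc < len(code):
        if code[pc] == ESCAPE:
            # Escape byte, opcode byte, then a two-byte operand.
            opcode = code[pc + 1]
            operand = int.from_bytes(code[pc + 2:pc + 4], "little")
            pc += 4
        else:
            # Opcode byte, then a one-byte operand.
            opcode = code[pc]
            operand = code[pc + 1]
            pc += 2
        yield opcode, operand

stream = bytes([0x10, 0x05, ESCAPE, 0x10, 0x34, 0x12])
decoded = list(decode(stream))
```

The short form keeps common instructions compact, and the escape form covers operands that do not fit in one byte.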

Layman's terms

I have to agree with the above comments about how the article needs to be easier to understand. I've been a part-time developer in various languages for the past 10 years, and I don't even understand what bytecode is, nor has this article helped. I'm not suggesting we compromise and make a 'for dummies' article, but just add a sentence here and there to help clarify.

I also agree with this. Specifically, in the sentence, "Since it is processed by software, [bytecodes are] usually more abstract than machine code". In what sense is the word "abstract" being used? When comparing it to "machine code", do you mean more abstract than binary code or more abstract than assembly language? So, is bytecode higher level compared to one of these or lower level than one of these, or just different syntactically?

The following sentence is also similarly confusing: "Compared to source code (intended to be human-readable), bytecodes are less abstract, more compact, and more computer-centric."

From what I understand, being "more abstract" usually means lower level. So how could bytecode be more human-readable (less abstract) but then more abstract than machine code? From the current description, I interpret the former sentence to mean that bytecode is lower level than assembly, or possibly even lower level than binary, which isn't possible! I am also not familiar with how the word "computer-centric" is generally used when referring to levels of computer code, but I think this needs to be described more simply.

In normal computer terminology, the more abstract the code the further it is removed from the physical implementation on the hardware. Usually more abstract code is therefore easier for a human to understand in everyday concepts instead of machine concepts. -- RTC 18:47, 7 February 2007 (UTC)
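One way to see the three levels side by side is CPython's standard `dis` module, used here purely as an illustration (the function `add` is a made-up example). The source line is the most abstract form, the raw `co_code` bytes are the compact bytecode, and the disassembly gives mnemonics that look like assembly but target the interpreter rather than any hardware.

```python
import dis

def add(a, b):
    return a + b

# Raw bytecode: a compact bytes object, not meant for human reading.
raw = add.__code__.co_code

# Disassembled view: one mnemonic per instruction. The exact opcodes
# vary between CPython versions, so none are hard-coded here.
names = [ins.opname for ins in dis.get_instructions(add)]
```

Calling `dis.dis(add)` prints the same listing in a readable table. None of these instructions mention registers or memory addresses, which is the sense in which bytecode is more abstract than machine code while being less abstract than the source.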

Bytecode execution techniques?

Neither this article nor virtual machine explains any of the techniques used to execute bytecode. I've sketched out some basic thoughts in a blog entry of mine at KernelTrap, but I don't know how current-day implementations operate, whether there's generally optimization, whether instruction ordering counts, etc.

binary requirement

Could a textual language be considered bytecode? Obviously not, but this article lists CIL as an example of bytecode, and many other articles call it "bytecode". The article for CIL even calls it both "human-readable" and "bytecode". CIL example Herorev 21:22, 19 November 2006 (UTC)

It seems CIL is not in itself bytecode, but can be assembled into bytecode. So CIL itself is not a form of bytecode. The only "human-readable bytecode" I can think of is one which just uses the letters of the alphabet, where each letter stands for the mnemonic of one opcode, for example the letter 'G' (47 hex) for 'Goto'. That would be "readable", and would use one byte for each instruction. Mahjongg (talk) 11:57, 14 January 2008 (UTC)
Another human-readable bytecode: the original wiki mentions "sed scripts don't need to be tokenized; they already are ... All sed commands are one byte long, not including arguments. ... More languages than just sed have this property or a similar one." -- WikiWikiWeb: LittleLanguage. -- (talk) 01:43, 25 October 2008 (UTC)
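The letter-per-opcode idea can be sketched as below. The instruction set is invented for this example (digits push themselves, 'A' adds, 'M' multiplies) and is not sed or any existing language; the point is only that each instruction is a single printable ASCII byte, so the program is both a byte sequence and readable text.

```python
def run_text(code: bytes):
    """Execute a program in which every byte is one printable opcode."""
    stack = []
    for byte in code:
        ch = chr(byte)
        if ch in "0123456789":
            stack.append(int(ch))        # digit: push its value
        elif ch == "A":                  # 'A' (41 hex): add
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif ch == "M":                  # 'M' (4D hex): multiply
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {ch!r}")
    return stack

# (2 + 3) * 4, one byte per instruction.
answer = run_text(b"23A4M")
```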

Old page history

For old page history that used to be at this title, see Talk:Bytecode/old. Graham87 08:33, 5 February 2009 (UTC)

Merge neologisms

So far as I can tell, “bytecode” appears to be nothing but a euphemism for a slightly lower-level interpreted language. (talk) 03:43, 19 June 2010 (UTC)

Bytecode vs. machine code

I have suggested on the machine code page's discussion page that bytecode and machine code be merged. My point is that the two words are interchangeable in all cases: there is no way to tell them apart, nor do they differ from one another in any way. Bytecode languages are just ordinary languages that we pretend are not, by implementing them in software rather than hardware. There is no reason why any language cannot be run in hardware as well as software. There are plenty of examples of hardware implementations of bytecode languages (various Java processors) and software implementations of machine code languages (QEMU, Bochs, VMware, etc.). Besides, the bytecode article appears mostly to be a list of example languages that are considered to be bytecode. I would also like to note that it is entirely possible to translate languages which are typically translated into a "bytecode" language into a "machine code" language and vice versa (GCJ, for instance, translates Java source code into native machine code, and I'm certain you can find C compilers that target any particular "bytecode" machine). FrederikHertzum (talk) 13:35, 19 June 2010 (UTC)

The main difference is that bytecodes are specifically designed to run on systems independent of the native machine code of the system. You are right that the bytecode could be the native machine code, but that is often done to run such bytecode on a platform optimized to run it. Bytecode is NOT the same thing as machine code. Mahjongg (talk) 14:21, 19 June 2010 (UTC)
From a computer scientist's point of view, there is no difference between a virtual and a real machine, and their instruction sets are clearly equally powerful (given that both are Turing complete). My point is not about what is being done, but about the technical difference between the two terms, which is that there is none. Being designed to be run on top of another, unknown, machine is hardly a good criterion for distinction. FrederikHertzum (talk) 03:41, 21 June 2010 (UTC)
Keep Separate — Well, the difference is that a "virtual machine" needs a real machine... Also, all computer related articles cannot have an abstract computer "science" point of view; although basic principles are indeed important to clarify, real world usage, practical aspects, and conventions are equally (or more) important in most articles on computing, electronics and science. (talk) 08:03, 26 July 2010 (UTC)
Keep Separate — But this article needs to be refined and its category has to be determined. Is it related to virtual machines, interpreters, or process virtual machines? The page on VMs has to explain the type of code --Melab±1 22:45, 19 June 2010 (UTC)
Keep Separate The term Bytecode has been in use ever since its canonical example (or maybe even earlier) of a byte-code in the form of the p-code used by UCSD Pascal, which was one of the contenders for an operating system for the upcoming IBM PC. So at the very least the term has historical significance. Mahjongg (talk) 17:52, 21 June 2010 (UTC)

List of examples

It seems to me that the list of examples will grow indefinitely, as most interpreters use bytecode as an intermediate representation for interpretation, including many of those which can also optionally produce native code or compile to another language (e.g. to C or LLVM). I can myself immediately think of various examples which are missing from the list, but am not sure it would be wise to add them... (talk) 15:13, 5 July 2013 (UTC)
It wouldn't.