Talk:Java class file

WikiProject Java (Rated Start-class, Low-importance)
This article is within the scope of WikiProject Java, a collaborative effort to improve the coverage of Java on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as Start-Class on the project's quality scale.
This article has been rated as Low-importance on the project's importance scale.

Structure section needs a lot of work

The structure sections need some work; they are incorrect as they stand. The "arrays", such as the constant pool, are not arrays in the typical sense (the entries are not all the same size); they are more like a stream. Thus the formulas calculating its size are misleading at best; they don't compute the number of bytes used. They also don't account for the strange fact that long and double entries consume two slots (leaving a phantom unused slot behind) rather than one slot like all other types. It should also be mentioned that the JVM's UTF-8 string type is not quite standard UTF-8; it is a modified, non-standard form. The C representation is also by no means able to "fully represent" the file structure as stated. First, one must assume big-endian layout, no padding bytes, and the IEEE 754 floating-point storage format. And as mentioned, the arrays are nothing like C arrays at all (they contain variable-width members), so that part is pseudocode at best. Also, the structure shown is only the outermost layer; none of the dependent inner types are shown in the code (so it cannot be a complete representation). The table is also visually cluttered; I'm sure it could be cleaned up a lot. - Dmeranda 05:20, 23 October 2007 (UTC)

I've made some attempts at cleaning up the large table and elaborating on some of the details. I also removed the fake "C structure" section entirely; it was not even close to being correct C code, and was not at all useful to this article. - Dmeranda 21:09, 23 October 2007 (UTC)
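
For anyone following this thread, the stream-like layout and the two-slot behaviour of long and double entries can be illustrated with a short sketch. It is only an illustration under stated assumptions: the class name ConstantPoolWalk, the helper names, and the three-entry sample pool are made up for this example, and the tag numbers and payload sizes are taken from the published class file format rather than from the article.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical illustration class; not part of any library.
class ConstantPoolWalk {

    // Payload size in bytes (excluding the 1-byte tag) for fixed-width entries.
    static int fixedPayloadSize(int tag) {
        switch (tag) {
            case 7: case 8: case 16: case 19: case 20:
                return 2;
            case 15:
                return 3;
            case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                return 4;
            case 5: case 6: // long and double
                return 8;
            default:
                throw new IllegalArgumentException("unknown constant pool tag " + tag);
        }
    }

    // Walks the pool and returns {physical entries, bytes consumed}.
    static int[] walk(DataInputStream in, int constantPoolCount) throws IOException {
        int entries = 0;
        int bytes = 0;
        // Valid indices run from 1 to constantPoolCount - 1.
        for (int index = 1; index < constantPoolCount; index++) {
            int tag = in.readUnsignedByte();
            int payload;
            if (tag == 1) { // CONSTANT_Utf8: 2-byte length, then that many bytes
                int length = in.readUnsignedShort();
                in.readFully(new byte[length]);
                payload = 2 + length;
            } else {
                payload = fixedPayloadSize(tag);
                in.readFully(new byte[payload]);
            }
            bytes += 1 + payload;
            entries++;
            if (tag == 5 || tag == 6) {
                index++; // long and double occupy two slots; the second is unusable
            }
        }
        return new int[] {entries, bytes};
    }

    // Made-up sample pool: a Utf8 entry, a long entry, and an int entry.
    static byte[] samplePool() throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeByte(1); out.writeUTF("Hi"); // slot 1, 5 bytes
        out.writeByte(5); out.writeLong(42L); // slots 2 and 3, 9 bytes
        out.writeByte(3); out.writeInt(7);    // slot 4, 5 bytes
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(samplePool()));
        int[] result = walk(in, 5); // constant_pool_count is 5 for four slots
        System.out.println(result[0] + " entries, " + result[1] + " bytes");
    }
}
```

With the sample pool, constant_pool_count is 5, yet only 3 entries are physically present and they occupy 19 bytes, so neither figure can be derived from the count by a simple multiplication.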

C programming language representation belongs in article

The Java Virtual Machine Specification introduces the class file format with a C-like representation rather than a binary table format for a reason. Some find this more readable and easier to comprehend. That is why both are present in this article. @modi 05:06, 8 November 2007 (UTC)

I appreciate the effort to make this article more readable, but in this case I disagree. The reason is apparently that the JVM spec authors don't know C, or didn't care, or that they follow the code with dozens of pages of clarifying text which then goes on to explain why the format is actually not like the pseudo-C code they just presented. This example code is not even remotely close to being correct C code (or even Java code), and trying to present it as such is quite misleading. Most of the "types" are of undefined variable length (C types are always fixed length). No padding bytes are ever introduced (as C frequently does). A definite byte order is mandated, which C can never indicate. The things which look like C arrays are stored absolutely nothing like real C arrays. Even the indicated "length" of the arrays is completely wrong in the constant_pool case due to the double-slot flaw. In fact, just about everything in the file format is so completely unlike C that I wonder what in the world the writers of the specification were thinking. If a reader can't understand a table of simple file offsets and wants to pretend that they can use a C structure, then they can read the spec; that's why it's listed in the references. This article is not trying to replace the entire, quite lengthy specification; it is only showing a flavor of the format. Copying large blocks of poorly authored text verbatim from the referenced document (which shouldn't be done regardless of quality), taken out of context (away from all the clarifying text that follows), provides no extra value here; it only clutters the article and adds about a dozen ways the reader can be easily misled. - Dmeranda 14:08, 8 November 2007 (UTC)

Re structure and representation

One thing that isn't mentioned in the structure section, and is a very convenient thing to know in practice, is the assurance from JVMS chapter 4 that the concrete representations of various data types in the class file are exactly the representations consumed and produced by java.io.DataInputStream and java.io.DataOutputStream, respectively.--128.210.4.213 (talk) 22:47, 3 January 2008 (UTC)
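
A small round trip may make the point above concrete. This is a sketch only: the class name DataStreamEncoding and its helpers are invented for the illustration. It writes the class file magic number and a one-character string through java.io.DataOutputStream and prints the resulting bytes, which shows both the big-endian byte order and the modified UTF-8 encoding of the NUL character discussed earlier on this page.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical illustration class; not part of any library.
class DataStreamEncoding {

    // Encodes the magic number followed by a string holding a single NUL char.
    static byte[] encode() throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeInt(0xCAFEBABE); // written most significant byte first
        out.writeUTF("\0");       // 2-byte length, then modified UTF-8 bytes
        return buffer.toByteArray();
    }

    // Formats bytes as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] encoded = encode();
        System.out.println(hex(encoded)); // prints "CA FE BA BE 00 02 C0 80"

        // Reading the same bytes back yields the original values.
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded));
        System.out.println(Integer.toHexString(in.readInt())); // prints "cafebabe"
        System.out.println(in.readUTF().length());             // prints "1"
    }
}
```

Standard UTF-8 would encode U+0000 as the single byte 00; the two-byte form C0 80 is what makes the class file string encoding "modified".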

Re history

It might be very useful for the history section to summarize the differences between class file versions in terms of which new attributes become mandatory in 46.0, 49.0, and 50.0 (JSR202 JVMS revised p. 126), and which opcodes become illegal in 51.0 (p. 166). I haven't said more because I am not sure how much would be disclosable under the annoying EULA I had to accept before reading JSR202, but if others think such a brief summary could be allowed, I think it would be a valuable addition to the page. Also, footnote 1 on revised p. 96 supplies a relationship between class version and JDK version for JDK versions 1.2 and onward.--128.210.4.213 (talk) 22:47, 3 January 2008 (UTC)

Is this sort of history noteworthy, or is it sufficient to just link to more detailed external specifications? Certainly I don't think this article should get to the level of mentioning specific JVM opcodes. As far as using non-free sources of information, remember that anything here must be able to meet both the WP:V and WP:C rules. Many people, including myself, will never cross the EULA barrier, so it is hard to offer advice on what's allowed or not in this case. Perhaps you can get better answers from Wikipedia:Copyright assistance. - Dmeranda (talk) 15:54, 4 January 2008 (UTC)
What I mostly thought would be useful was a high-level, one-liner sort of overview of what has happened between class-format/JVM revisions. Of course from one JDK version to the next there are always hundreds of changes to the Java library APIs and often changes to the source syntax as well; but changes to the virtual machine and the class file format are less common (a good thing, because they affect deployability of compiled code). When such changes are made, they tend to reveal interesting fundamental things about what new Java features were unimplementable on the older VM, and what minimal set of VM changes were required to support them. The information also helps in understanding which capabilities you are likely to lose when using -target to generate an older class format.
I did find a nice EULA-free description of what happened in class format 50.0: 50.0 type checking verifier.--128.210.4.214 (talk) 19:12, 8 January 2008 (UTC)

Major and Minor Version values in Class File header

What is the source for the description given for the Major version in the class file header?

I checked a number of JDK Java compilers and found the following class file versions:

  * 45.3:  javac 1.2.2
  * 46.0:  javac 1.4.2_12, 1.4.2_14
  * 49.0:  javac 1.5.0_11, 1.5.0_12
  * 50.0:  javac 1.6.0_06

So these results differ from the article's description of the major version: at least some JDK 1.4.x compilers emit major version 46 rather than 48, and at least some 1.2.x compilers emit 45 rather than 46.
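
To make a check like the one above reproducible, here is a rough sketch of a header reader. The class name ClassFileVersion and the method readVersion are invented for this example; the field order (magic, then minor version, then major version) is taken from the class file format. The result may depend on the compiler's -target setting mentioned in the history thread, which could account for some of the differences listed, though that is an assumption rather than something verified here.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical illustration class; not part of any library.
class ClassFileVersion {

    // Reads the first 8 bytes of a class file and returns "major.minor".
    static String readVersion(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        int magic = in.readInt();
        if (magic != 0xCAFEBABE) {
            throw new IOException("not a class file: bad magic number");
        }
        int minor = in.readUnsignedShort(); // minor_version is stored first
        int major = in.readUnsignedShort(); // major_version follows it
        return major + "." + minor;
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Report the version of the class file named on the command line.
            try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
                System.out.println(readVersion(in));
            }
        } else {
            // Made-up 8-byte header with minor version 3 and major version 45.
            byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 3, 0, 45};
            System.out.println(readVersion(new ByteArrayInputStream(header))); // prints "45.3"
        }
    }
}
```

Running it with the path of a compiled .class file as the only argument prints that file's version in the same major.minor form used in the list above.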

boolean, byte, and short in constant pool

The article states, "Other integral types appearing in the high-level language, such as boolean, byte, and short must be represented as an integer constant."

While it's true that including a constant of one of these types in the constant pool requires an integer entry, a standard compiler will never do so, as there are instructions (bipush and sipush) for using small integer values directly without the need for a constant pool entry at all.

Since Java bytecode does not distinguish between these integral types, all int values fitting into the short value range are encoded this way. Only values outside this range, or values needed for initializing compile-time constants (e.g. static final fields), will end up in the constant pool. - Preceding unsigned comment added by 77.188.94.56 (talk) 15:56, 22 February 2011 (UTC)

Indeed, byte, short, and small char constant values used in the code will never appear in the constant pool (and maybe this is worth mentioning in the article). But for other uses of constant pool entries, e.g. the compile-time values of static final variables, these types require an int entry. There is another difference between the general class file format and the bytecode instructions: as the article states, there is no padding in the class file format, but the two switch instructions are an exception to this rule, as their tables are aligned. 77.12.173.11 (talk) - Preceding undated comment added 09:40, 16 April 2014 (UTC)
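
The two points in this thread can be sketched in code. This is a rough illustration of how a compiler might pick an instruction for an int constant, not a description of any particular compiler's source; the class name IntConstantEncoding and both method names are made up. The iconst_m1 to iconst_5 short forms are not mentioned in the comments above and are included here as an addition, and the padding rule assumes the switch operands must start at an offset that is a multiple of four from the start of the method's code.

```java
// Hypothetical illustration class; not part of any library.
class IntConstantEncoding {

    // Chooses the instruction that pushes the given int constant.
    static String loadInstruction(int value) {
        if (value >= -1 && value <= 5) {
            // Dedicated one-byte opcodes for -1 through 5 (added assumption).
            return value == -1 ? "iconst_m1" : "iconst_" + value;
        }
        if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) {
            return "bipush " + value; // 1-byte immediate operand
        }
        if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) {
            return "sipush " + value; // 2-byte immediate operand
        }
        // Only now is a constant pool entry needed.
        return "ldc <constant pool entry for " + value + ">";
    }

    // Padding bytes after a tableswitch or lookupswitch opcode at the given
    // offset, so that the operands begin at a multiple of four.
    static int switchPadding(int opcodeOffset) {
        return (4 - (opcodeOffset + 1) % 4) % 4;
    }

    public static void main(String[] args) {
        System.out.println(loadInstruction(5));      // prints "iconst_5"
        System.out.println(loadInstruction(100));    // prints "bipush 100"
        System.out.println(loadInstruction(1000));   // prints "sipush 1000"
        System.out.println(loadInstruction(100000)); // falls back to ldc
        System.out.println(switchPadding(0));        // prints "3"
        System.out.println(switchPadding(3));        // prints "0"
    }
}
```

The same selection applies whether the source-level type was boolean, byte, short, char, or int, since the bytecode treats all of them as int values on the operand stack.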