Java performance

From Wikipedia, the free encyclopedia

Objectively comparing the performance of a Java program and an equivalent one written in another language such as C++ is a convoluted and controversial task. The target platform of Java's bytecode compiler is the Java platform, and the bytecode is either interpreted or compiled into machine code by the JVM. Other compilers almost always target a specific hardware/software platform, producing machine code that will stay virtually unchanged during its execution. Very different and hard-to-compare scenarios arise from these two approaches: static versus dynamic compilation and recompilation, the availability of precise information about the runtime environment, and others.

The performance of a compiled Java program depends on how smartly its particular tasks are managed by the host JVM, and how well the JVM takes advantage of the features of the hardware and OS in doing so. Thus, any Java performance test or comparison must always report the version, vendor, OS and hardware architecture of the JVM used. Similarly, the performance of the equivalent natively compiled program depends on the quality of its generated machine code, so the test or comparison must also report the name, version and vendor of the compiler used, and its activated optimization directives.

Historically, Java programs' execution speed improved significantly due to the introduction of Just-In-Time compilation (in 1997/1998 for Java 1.1),[1][2][3] the addition of language features supporting better code analysis, and optimizations in the Java Virtual Machine itself (such as HotSpot becoming the default for Sun's JVM in 2000). Hardware execution of Java bytecode, such as that offered by ARM's Jazelle, can also offer significant performance improvements.

Virtual machine optimization techniques

Many optimizations have improved the performance of the Java Virtual Machine over time. Although Java was often the first virtual machine to implement a given technique successfully, these techniques have often been used in other similar platforms as well.

Just-In-Time compilation

Early Java Virtual Machines always interpreted bytecodes. This had a huge performance penalty (a factor of 10 to 20 for Java versus C in average applications).[4] To combat this, a just-in-time (JIT) compiler was introduced into Java 1.1. Due to the high cost of compilation, an additional system called HotSpot was introduced into Java 1.2 and was made the default in Java 1.3. Using this framework, the Virtual Machine continually analyzes the program's performance for "hot spots" which are frequently or repeatedly executed. These are then targeted for optimization, leading to high-performance execution with a minimum of overhead for less performance-critical code.[5][6] Some benchmarks show a 10-fold speed gain from this technique.[7] Time constraints may prevent the JIT compiler from performing certain optimizations, but other optimizations, such as lock elision, are enabled by the run-time information available to a JIT compiler; such information would not be available to a non-profiling ahead-of-time compiler.
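The warm-up behavior is easy to observe in a microbenchmark. The sketch below is a minimal illustration only (the class name, kernel and iteration counts are arbitrary choices, not taken from any cited benchmark); on a HotSpot-style JVM the first rounds typically run interpreted, and later rounds run faster once the method has been JIT-compiled:

public class WarmupDemo {
    // a small numeric kernel that the JVM can detect as a "hot spot"
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    public static void main(String[] args) {
        // early rounds are typically interpreted; once the method is
        // JIT-compiled, later rounds run significantly faster
        for (int round = 0; round < 5; round++) {
            long start = System.nanoTime();
            long result = sumOfSquares(10000000);
            long elapsedMs = (System.nanoTime() - start) / 1000000;
            System.out.println("round " + round + ": " + elapsedMs
                    + " ms (result " + result + ")");
        }
    }
}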

Adaptive optimization

Adaptive optimization is a technique in computer science that performs dynamic recompilation of portions of a program based on the current execution profile. In its simplest form, an adaptive optimizer may merely trade off between just-in-time compilation and interpretation of instructions. At another level, adaptive optimization may exploit local data conditions to optimize away branches and to use inline expansion.

A virtual machine like HotSpot is also able to deoptimize previously JIT-compiled code. This allows it to perform aggressive (and potentially unsafe) optimizations, while still being able to deoptimize the code and fall back to a safe path later on.[8][9]
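As an illustrative sketch (the types here are hypothetical, not drawn from the cited sources): while only one implementation of an interface has been loaded, HotSpot can speculatively devirtualize and inline calls through it, deoptimizing later if that assumption is invalidated.

interface Shape {
    double area();
}

class Square implements Shape {
    public double area() { return 1.0; }
}

class AreaSummer {
    // While Square is the only loaded Shape, the JVM may devirtualize
    // and inline s.area(). If a second implementation is loaded later,
    // this compiled code is deoptimized and recompiled conservatively.
    static double total(Shape[] shapes) {
        double total = 0.0;
        for (Shape s : shapes) {
            total += s.area(); // virtual call, speculatively inlined
        }
        return total;
    }
}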

Garbage collection

The 1.0 and 1.1 virtual machines used a mark-sweep collector, which could fragment the heap after a garbage collection. Starting with Java 1.2, the virtual machines switched to a generational collector, which has much better defragmentation behavior.[10] Modern virtual machines use a variety of techniques that have further improved garbage collection performance.[11]
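Sun's HotSpot virtual machines of this era, for example, let the deployer choose among collectors with command-line flags (a sketch; available flags vary by JVM vendor and version, and MyApp is a placeholder class):

java -XX:+UseSerialGC MyApp          # single-threaded collector
java -XX:+UseParallelGC MyApp        # parallel (throughput) collector
java -XX:+UseConcMarkSweepGC MyApp   # concurrent low-pause (CMS) collector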

Other optimization techniques

Split bytecode verification

Prior to executing a class, the Sun JVM verifies its bytecodes (see bytecode verifier). This verification is performed lazily: a class's bytecodes are loaded and verified only when that class is loaded and prepared for use, not at the beginning of the program. (Other verifiers, such as the Java/400 verifier for IBM System i, can perform most verification in advance and cache verification information from one use of a class to the next.) However, as the Java class libraries are themselves regular Java classes, they must also be loaded when they are used, which means that the start-up time of a Java program is often longer than for C++ programs, for example.

A technique named split-time verification, first introduced in the Java Platform, Micro Edition (J2ME), has been used in the Java Virtual Machine since Java version 6. It splits the verification of bytecode into two phases:[12]

  • Design time - during the compilation of the class from source to bytecode
  • Runtime - when loading the class.

In practice this technique works by capturing the knowledge that the Java compiler has of class flow and annotating the compiled method bytecodes with a synopsis of the class-flow information. This does not make runtime verification appreciably less complex, but it does allow some shortcuts.[citation needed]
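In the Java 6 class-file format, this synopsis is stored as the StackMapTable attribute, which can be inspected with the standard javap tool (Foo is a placeholder class name):

javac -target 1.6 Foo.java   # emit class files in the Java 6 format
javap -verbose Foo           # the dump includes the StackMapTable entries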

Escape analysis and lock coarsening

Java is able to manage multithreading at the language level. Multithreading is a technique that allows one to

  • improve the user's perception of program speed, by allowing user actions while the program performs tasks, and
  • take advantage of multi-core architectures, enabling multiple unrelated tasks to be performed at the same time by multiple cores.

However, programs that use multithreading must take extra care with objects shared between threads, locking access to shared methods or blocks while they are used by one of the threads. Locking a block or an object is a time-consuming operation, owing to the nature of the underlying operating-system-level operation involved (see concurrency control and lock granularity).

As the Java library does not know which methods will be used by more than one thread, the standard library always locks blocks that might be needed in a multithreaded environment.

Prior to Java 6, the virtual machine always locked objects and blocks when asked to by the program, even if there was no risk of an object being modified by two different threads at the same time. For example, in the following code a local Vector is locked before each add operation to ensure that it is not modified by other threads (Vector is synchronized), but because it is strictly local to the method, the locking is unnecessary:

public String getNames() {
     Vector<String> v = new Vector<String>(); // java.util.Vector is synchronized
     v.add("Me");    // each add() acquires and releases the Vector's lock
     v.add("You");
     v.add("Her");
     return v.toString(); // v never escapes this method, so no locking is needed
}

Starting with Java 6, code blocks and objects are locked only when necessary,[1][2] so in the above case the virtual machine would not lock the Vector object at all.
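A related optimization named in this section's title, lock coarsening, merges adjacent lock regions on the same object. A sketch of the pattern it targets, mirroring the example above (the synchronized StringBuffer stands in for Vector):

public String getNames() {
    StringBuffer sb = new StringBuffer(); // StringBuffer's methods are synchronized
    sb.append("Me");   // each append would lock and unlock sb, so the JVM
    sb.append("You");  // may coarsen the three lock/unlock pairs into a
    sb.append("Her");  // single pair around all three calls, or, since sb
    return sb.toString(); // never escapes, elide the lock entirely
}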

As of version 6u14, Java includes experimental support for escape analysis.[3]
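In that version the analysis must be enabled explicitly with a HotSpot flag (MyApp is a placeholder class):

java -XX:+DoEscapeAnalysis MyApp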

Register allocation improvements

Prior to Java 6, allocation of registers was very primitive in the "client" virtual machine (registers did not live across blocks), which was a problem on architectures with few registers available, such as x86. If no more registers are available for an operation, the compiler must copy from register to memory (or memory to register), which takes time (registers are significantly faster to access). The "server" virtual machine, however, used a graph-coloring allocator and did not suffer from this problem.

An optimization of register allocation was introduced in Sun's JDK 6;[13] it became possible to use the same registers across blocks (when applicable), reducing memory accesses. This led to a reported performance gain of approximately 60% in some benchmarks.[14]

Class data sharing

Class data sharing (called CDS by Sun) is a mechanism which reduces the startup time of Java applications and also reduces their memory footprint. When the JRE is installed, the installer loads a set of classes from the system JAR file (rt.jar, the JAR file containing the whole Java class library) into a private internal representation, and dumps that representation to a file called a "shared archive". During subsequent JVM invocations, this shared archive is memory-mapped in, saving the cost of loading those classes and allowing much of the JVM's metadata for these classes to be shared among multiple JVM processes.[15]

The corresponding improvement in start-up time is more noticeable for small programs.[16]
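On Sun's JVM, the shared archive can be regenerated and its use controlled from the command line (MyApp is a placeholder class):

java -Xshare:dump        # (re)generate the shared archive
java -Xshare:on MyApp    # require class data sharing
java -Xshare:off MyApp   # disable class data sharing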

Sun Java versions performance improvements

Apart from the improvements listed here, each Sun Java version introduced many performance improvements in the Java API.

JDK 1.1.6 : First just-in-time compilation (Symantec's JIT compiler)[1][3]

J2SE 1.2 : Use of a generational collector.

J2SE 1.3 : Just-in-time compilation by HotSpot.

J2SE 1.4 : See Sun's overview of performance improvements between the 1.3 and 1.4 versions.

Java SE 5.0 : Class data sharing[17]

Java SE 6 : See Sun's overview of performance improvements between Java 5 and Java 6.[20]

Java SE 6 Update 10

  • Java Quick Starter reduces application start-up time by preloading part of the JRE data at OS startup into the disk cache.[21]
  • When the JRE is not installed, the parts of the platform needed to execute a web-launched application are now downloaded first. The entire JRE is 12 MB; a typical Swing application needs only 4 MB of it to start. The remaining parts are then downloaded in the background.[22]
  • Graphics performance on Windows has improved through extensive use of Direct3D by default[23] and the use of shaders on the GPU to accelerate complex Java 2D operations.[24]

Future improvements

Future performance improvements are planned for an update of Java 6 or for Java 7:[25]

  • Provide JVM support for dynamic languages, following the prototyping work currently done on the Multi Language Virtual Machine,[26]
  • Enhance the existing concurrency library by managing parallel computing on multi-core processors,[27][28]
  • Allow the virtual machine to use both the client and server compilers in the same session with a technique called tiered compilation (see the command lines after this list):[29]
    • The client compiler would be used at startup (because it is good at startup and for small applications),
    • The server compiler would be used for the long-term running of the application (because it outperforms the client compiler for this).
  • Replace the existing concurrent low-pause garbage collector (also called CMS, the Concurrent Mark-Sweep collector) with a new collector called G1 (Garbage First) to ensure consistent pauses over time.[30][31]
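On JDK builds where these features already exist in experimental form, they can be tried with HotSpot flags (a sketch only; flag availability depends on the JDK version, and MyApp is a placeholder class):

java -XX:+TieredCompilation MyApp                          # client and server compilers together
java -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC MyApp   # opt in to the G1 collector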

Comparison to other languages

Java is often just-in-time compiled at runtime by the Java Virtual Machine, but may also be compiled ahead-of-time, just like C++. When just-in-time compiled, its performance is generally:[4]

  • 1-4 times slower than compiled languages such as C or C++,
  • close to that of other just-in-time compiled languages such as C#,
  • much higher than that of languages without an effective native-code compiler (JIT or AOT), such as Perl, Ruby, PHP and Python.

Program speed

Java is in some cases equal to C++ on low-level and numeric benchmarks.[32] Such results must be interpreted with care. The source cited is interactive and can also be used to compare C++, but it does not show what a typical object-oriented C++ implementation would do. Some Java benchmarks run as fast as their C++ counterparts on nominally the "same" task, yet demonstrating that two programs in different languages really perform the same task is decidedly non-trivial. In particular, it is hard to say when the two programs start and finish if one of them has not yet collected its garbage and the task is not long enough to force at least some garbage collection. Readers are therefore advised to examine the benchmark data for themselves before drawing conclusions.

Benchmarks often measure performance for small, numerically intensive programs. In some real-life programs Java out-performs C, and often there is no performance difference at all.[citation needed] One example is the benchmark of Jake2, a clone of Quake 2 written in Java by translating the original GPL C code. The Java 5.0 version performs better on some hardware configurations than its C counterpart.[33] While it is not specified how the data was measured (for example, whether the original Quake 2 executable compiled in 1997 was used, which may be considered unfavorable, as current C compilers may achieve better optimizations for Quake), it shows how the same Java source code can gain a large speed boost simply from updating the VM, something impossible to achieve with a fully static approach.

Also, some optimizations that are possible in Java and similar languages are not possible in C++:[34]

  • C-style pointers make optimization hard in languages that support them,
  • The use of escape analysis is limited in C++, because the compiler cannot determine as accurately where an object will be used (again because of pointers).

Likewise, since C and C++ allow direct access to the target CPU via inline assembly, processor-specific optimization is not possible in Java without using the Java Native Interface. Nor can memory be accessed directly in Java, which rules out direct access to system-level functionality and performance optimization via function pointers.
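For illustration, a minimal sketch of the JNI boundary (the class, method and library names are hypothetical): the Java side declares a native method and loads the library that implements it, and that native implementation, typically written in C or C++, is free to use inline assembly.

public class NativeSum {
    // implemented in native code, e.g. C compiled into libnativesum.so
    // (nativesum.dll on Windows); the implementation may use inline assembly
    public static native int sum(int a, int b);

    static {
        System.loadLibrary("nativesum"); // resolves the platform-specific library name
    }
}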

However, results for microbenchmarks between Java and C++ depend strongly on which operations are compared. For example, one series of microbenchmarks compared Java 5.0 with C++ and C# across 32-bit integer arithmetic, 64-bit double arithmetic, file I/O, exceptions, hash maps, object creation/destruction and method calls, arrays, and trigonometric functions.[35][36][37][38][39][40][41][42][43]

It is again worth noting that the accuracy of some of this benchmarking is questionable: the measured speed of various versions of Java varies by a factor of 8, and it is not the latest, supposedly fastest, versions of Java that test as fastest.

Startup time

Java startup time is often much slower than that of C or C++, because many classes (first of all, classes from the platform class libraries) must be loaded before being used.

Compared with similar popular runtimes, for small programs running on a Windows machine, the startup time appears to be similar to Mono's and slightly worse than .NET's.

Much of the startup time appears to be due to I/O-bound operations rather than JVM initialization or class loading (the rt.jar class data file alone is 40 MB, and the JVM must seek much of its data within this huge file).[21] Some tests showed that although the new split bytecode verification technique improved class loading by roughly 40%, it translated to only about a 5% startup improvement for large programs.[44]

Though small, this improvement is more visible in small programs that perform a simple operation and then exit, because loading the Java platform data can represent many times the load of the actual program's operation.

Beginning with Java SE 6 Update 10, the Sun JRE comes with a Quick Starter that preloads class data at OS startup so that it is read from the disk cache rather than from the disk.

Excelsior JET approaches the problem from the other side. Its Startup Optimizer reduces the amount of data that must be read from the disk on application startup, and makes the reads more sequential.

Memory usage

Java memory usage is heavier than C++'s, because:

  • there is an 8-byte overhead for each object[45] and 12 bytes for each array[46] in Java (32-bit; twice as much in 64-bit Java). If the size of an object is not a multiple of 8 bytes, it is rounded up to the next multiple of 8, so an object containing a single byte field occupies 16 bytes and requires a 4-byte reference (see the worked example after this list). However, C++ also allocates a pointer (usually 4 or 8 bytes) for every object that declares virtual functions.[47]
  • parts of the Java library must be loaded prior to program execution (at least the classes that are used "under the hood" by the program).[48] This leads to a significant memory overhead for small applications compared to Java's best-known competitors, Mono and .NET.
  • both the Java bytecode and its native recompilation will typically be in memory,
  • the virtual machine itself consumes memory,
  • in Java, a composite object (a class A which uses instances of B and C) is created using references to allocated instances of B and C; in C++ the cost of these references can be avoided,
  • the lack of address arithmetic makes creating memory-efficient containers, such as tightly packed structures and XOR linked lists, impossible.
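As a worked example of the layout arithmetic in the first point (32-bit figures; a sketch, since exact layouts are implementation-dependent):

class OneByte {
    byte b; // 1 byte of field data
}
// header:   8 bytes
// field:    1 byte
// padding:  7 bytes (total size is rounded up to a multiple of 8)
// total:   16 bytes per instance, plus a 4-byte reference wherever it is used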

However, it can be difficult to strictly compare the impact on memory of using Java versus C++. Some reasons why are:

  • In C++, memory deallocation happens synchronously, whereas Java can deallocate asynchronously, possibly when the program is otherwise idle. A program with periods of inactivity might perform better with Java because it is not deallocating memory during its active phase. However, memory deallocation is a very fast operation, especially in C++, where containers make extensive use of allocators, so any garbage-collector advantage here is at best negligible.[49]
  • In C++, there can be a question of which part of the code "owns" an object and is therefore responsible for deallocating it. For example, a container of objects might make copies of objects inserted into it, relying on the calling code to free its own copy; or it might insert the original object, creating an ambiguity of whether the calling code is handing the object off to the container (in which case the container should free the object when it is removed) or only asking the container to remember the object (in which case the calling code, not the container, frees the object later). The C++ standard containers (in the STL), for example, make copies of inserted objects.[50] In Java, none of this is necessary because neither the calling code nor the container "owns" the object. So while the memory needed for a single object can be heavier than in C++, actual Java programs may create fewer objects, depending on the memory strategies of the C++ code; if so, the time required for creating, copying, and deleting these objects is also not present in a Java program.

The consequences of these and other differences are highly dependent on the algorithms involved, the actual implementations of the memory allocation systems (free, delete, or the garbage collector), and the specific hardware.

As a result, for applications in which memory is a critical factor in choosing between languages, a deep analysis is required.

Trigonometric functions

Performance of the trigonometric functions can be bad compared to C, because Java has strict specifications for the results of mathematical operations, which may not correspond to the underlying hardware implementation.[51] On x87, Java since 1.4 has done argument reduction for sin and cos in software,[52] causing a big performance hit for values outside the hardware-supported range.[53]
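A sketch of the effect (illustrative only; the iteration count is arbitrary, and the gap appears on x87-based JVMs, where large arguments force software argument reduction):

public class TrigDemo {
    public static void main(String[] args) {
        final int N = 1000000;
        double sum = 0.0;

        long t0 = System.nanoTime();
        for (int i = 0; i < N; i++) sum += Math.sin(i * 1.0e-6);  // small arguments
        long t1 = System.nanoTime();
        for (int i = 0; i < N; i++) sum += Math.sin(1.0e10 + i);  // need reduction
        long t2 = System.nanoTime();

        System.out.println("small args: " + (t1 - t0) / 1000000 + " ms");
        System.out.println("large args: " + (t2 - t1) / 1000000 + " ms");
        System.out.println("checksum: " + sum); // keeps the loops from being removed
    }
}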

Java Native Interface

The Java Native Interface has a high overhead, making it costly to cross the boundary between code running on the JVM and native code.[54][55] Java Native Access (JNA) provides Java programs easy access to native shared libraries (DLLs on Windows) without writing anything but Java code; no JNI or native code is required. This functionality is comparable to Windows' Platform/Invoke and Python's ctypes. Access is dynamic at runtime, without code generation. But this convenience comes at a cost: JNA is usually slower than JNI.[56]
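A minimal JNA sketch, following the introductory example in JNA's own documentation, binds the C runtime's printf with no JNI or native code:

import com.sun.jna.Library;
import com.sun.jna.Native;

public interface CLibrary extends Library {
    // bind the C runtime ("msvcrt" would be used on Windows instead of "c")
    CLibrary INSTANCE = (CLibrary) Native.loadLibrary("c", CLibrary.class);

    void printf(String format, Object... args); // maps straight onto C's printf
}

A call such as CLibrary.INSTANCE.printf("hello %s\n", "world") is dispatched dynamically at runtime; that per-call dispatch is the overhead that makes JNA usually slower than hand-written JNI.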

User interface

Swing has been perceived as slower than native widget toolkits because it delegates the rendering of widgets to the pure-Java Java 2D API. However, benchmarks comparing the performance of Swing with that of the Standard Widget Toolkit, which delegates rendering to the native GUI libraries of the operating system, show no clear winner; the results depend greatly on the context and the environment.[57]

Use for high performance computing

Recent independent studies seem to show that Java performance for high-performance computing (HPC) is similar to Fortran's on computation-intensive benchmarks, but that JVMs still have scalability issues for performing intensive communication on a grid network.[58]

However, high-performance computing applications written in Java have recently won benchmark competitions. In 2008, Apache Hadoop, an open-source high-performance computing project written in Java, was able to sort a terabyte of integers the fastest, although the hardware setup in the competition was not fixed and the Hadoop cluster had more than 9 times as many processing cores as the next-closest contestant.[59][60]

In programming contests

As Java solutions run slower than solutions in other compiled languages,[61][62] online judge administrators typically increase time limits for Java solutions significantly, by a factor of two or more, to be fair to contestants using Java.[63][64][65]

Notes

  1. ^ a b "Symantec's Just-In-Time Java Compiler To Be Integrated Into Sun JDK 1.1".
  2. ^ "Apple Licenses Symantec's Just In Time (JIT) Compiler To Accelerate Mac OS Runtime For Java".
  3. ^ a b "Java gets four times faster with new Symantec just-in-time compiler".
  4. ^ http://www.shudo.net/jit/perf/
  5. ^ Kawaguchi, Kohsuke (2008-03-30). "Deep dive into assembly code from Java". Retrieved 2008-04-02.
  6. ^ "Fast, Effective Code Generation in a Just-In-Time Java Compiler" (PDF). Intel Corporation. Retrieved 2007-06-22.
  7. ^ This article shows that the performance gain between interpreted mode and Hotspot amounts to more than a factor of 10.
  8. ^ "The Java HotSpot Virtual Machine, v1.4.1". Sun Microsystems. Retrieved 2008-04-20.
  9. ^ Nutter, Charles (2008-01-28). "Lang.NET 2008: Day 1 Thoughts". Retrieved 2008-04-20. Deoptimization is very exciting when dealing with performance concerns, since it means you can make much more aggressive optimizations...knowing you'll be able to fall back on a tried and true safe path later on
  10. ^ IBM DeveloperWorks Library
  11. ^ For example, the duration of pauses is less noticeable now. See for example this clone of Quake 2 written in Java: Jake2.
  12. ^ New Java SE 6 Feature: Type Checking Verifier at java.net
  13. ^ Bug report: new register allocator, fixed in Mustang (JDK 6) b59
  14. ^ Mustang's HotSpot Client gets 58% faster! in Osvaldo Pinali Doederlein's Blog at java.net
  15. ^ Class Data Sharing at java.sun.com
  16. ^ Class Data Sharing in JDK 1.5.0 in Java Buzz Forum at artima developer
  17. ^ Sun overview of performance improvements between 1.4 and 5.0 versions.
  18. ^ STR-Crazier: Performance Improvements in Mustang in Chris Campbell's Blog at java.net
  19. ^ See here for a benchmark showing an approximately 60% performance boost from Java 5.0 to 6 for the application JFreeChart
  20. ^ Java SE 6 Performance White Paper at http://java.sun.com
  21. ^ a b Haase, Chet (May 2007). "Consumer JRE: Leaner, Meaner Java Technology". Sun Microsystems. Retrieved 2007-07-27. At the OS level, all of these megabytes have to be read from disk, which is a very slow operation. Actually, it's the seek time of the disk that's the killer; reading large files sequentially is relatively fast, but seeking the bits that we actually need is not. So even though we only need a small fraction of the data in these large files for any particular application, the fact that we're seeking all over within the files means that there is plenty of disk activity.
  22. ^ Haase, Chet (May 2007). "Consumer JRE: Leaner, Meaner Java Technology". Sun Microsystems. Retrieved 2007-07-27.
  23. ^ Haase, Chet (May 2007). "Consumer JRE: Leaner, Meaner Java Technology". Sun Microsystems. Retrieved 2007-07-27.
  24. ^ Campbell, Chris (2007-04-07). "Faster Java 2D Via Shaders". Retrieved 2008-04-26.
  25. ^ Haase, Chet (May 2007). "Consumer JRE: Leaner, Meaner Java Technology". Sun Microsystems. Retrieved 2007-07-27.
  26. ^ "JSR 292: Supporting Dynamically Typed Languages on the Java Platform". jcp.org. Retrieved 2008-05-28.
  27. ^ Goetz, Brian (2008-03-04). "Java theory and practice: Stick a fork in it, Part 2". Retrieved 2008-03-09.
  28. ^ Lorimer, R.J. (2008-03-21). "Parallelism with Fork/Join in Java 7". infoq.com. Retrieved 2008-05-28.
  29. ^ "New Compiler Optimizations in the Java HotSpot Virtual Machine" (PDF). Sun Microsystems. May 2006. Retrieved 2008-05-30.
  30. ^ Humble, Charles (2008-05-13). "JavaOne: Garbage First". infoq.com. Retrieved 2008-09-07.
  31. ^ Coward, Danny (2008-11-12). "Java VM: Trying a new Garbage Collector for JDK 7". Retrieved 2008-11-15.
  32. ^ Computer Language Benchmarks Game
  33. ^ 260/250 frame/s versus 245 frame/s (see benchmark)
  34. ^ Lewis, J.P.; Neumann, Ulrich. "Performance of Java versus C++". Computer Graphics and Immersive Technology Lab, University of Southern California.
  35. ^ "Microbenchmarking C++, C#, and Java: 32-bit integer arithmetic". Dr. Dobb's Journal. 2005-07-01. Retrieved 2007-11-17.
  36. ^ "Microbenchmarking C++, C#, and Java: 64-bit double arithmetic". Dr. Dobb's Journal. 2005-07-01. Retrieved 2007-11-17.
  37. ^ "Microbenchmarking C++, C#, and Java: File I/O". Dr. Dobb's Journal. 2005-07-01. Retrieved 2007-11-17.
  38. ^ "Microbenchmarking C++, C#, and Java: Exception". Dr. Dobb's Journal. 2005-07-01. Retrieved 2007-11-17.
  39. ^ "Microbenchmarking C++, C#, and Java: Single Hash Map". Dr. Dobb's Journal. 2005-07-01. Retrieved 2007-11-17.
  40. ^ "Microbenchmarking C++, C#, and Java: Multiple Hash Map". Dr. Dobb's Journal. 2005-07-01. Retrieved 2007-11-17.
  41. ^ "Microbenchmarking C++, C#, and Java: Object creation/ destruction and method call". Dr. Dobb's Journal. 2005-07-01. Retrieved 2007-11-17.
  42. ^ "Microbenchmarking C++, C#, and Java: Array". Dr. Dobb's Journal. 2005-07-01. Retrieved 2007-11-17.
  43. ^ "Microbenchmarking C++, C#, and Java: Trigonometric functions". Dr. Dobb's Journal. 2005-07-01. Retrieved 2007-11-17.
  44. ^ "How fast is the new verifier?". 2006-02-07. Retrieved 2007-05-09.
  45. ^ http://java.sun.com/docs/books/performance/1st_edition/html/JPRAMFootprint.fm.html#24456
  46. ^ http://www.javamex.com/tutorials/memory/object_memory_usage.shtml
  47. ^ http://www.informit.com/guides/content.aspx?g=cplusplus&seqNum=195
  48. ^ http://www.tommti-systems.de/go.html?http://www.tommti-systems.de/main-Dateien/reviews/languages/benchmarks.html
  49. ^ http://gcc.gnu.org/onlinedocs/libstdc++/manual/bk01pt04ch11.html#id462480
  50. ^ http://www.sgi.com/tech/stl/Container.html
  51. ^ "Math (Java Platform SE 6)". Sun Microsystems. Retrieved 2008-06-08.
  52. ^ Gosling, James (2005-07-27). "Transcendental Meditation". Retrieved 2008-06-08.
  53. ^ W. Cowell-Shah, Christopher (2004-01-08). "Nine Language Performance Round-up: Benchmarking Math & File I/O". Retrieved 2008-06-08.
  54. ^ Wilson, Steve; Kesselman, Jeff (2001). "Java Platform Performance: Using Native Code". Sun Microsystems. Retrieved 2008-02-15.
  55. ^ Kurzyniec, Dawid; Sunderam, Vaidy. "Efficient Cooperation between Java and Native Codes - JNI Performance Benchmark" (PDF). Retrieved 2008-02-15.
  56. ^ "How does JNA performance compare to custom JNI?". Sun Microsystems. Retrieved 2009-12-26.
  57. ^ Križnar, Igor (2005-05-10). "SWT Vs. Swing Performance Comparison" (PDF). cosylab.com. Retrieved 2008-05-24. It is hard to give a rule-of-thumb where SWT would outperform Swing, or vice versa. In some environments (e.g., Windows), SWT is a winner. In others (Linux, VMware hosting Windows), Swing and its redraw optimization outperform SWT significantly. Differences in performance are significant: factors of 2 and more are common, in either direction
  58. ^ Amedro, Brian; Bodnartchouk, Vladimir; Caromel, Denis; Delbe, Christian; Huet, Fabrice; Taboada, Guillermo L. (August 2008). "Current State of Java for HPC". INRIA. Retrieved 2008-09-04. We first perform some micro benchmarks for various JVMs, showing the overall good performance for basic arithmetic operations(...). Comparing this implementation with a Fortran/MPI one, we show that they have similar performance on computation intensive benchmarks, but still have scalability issues when performing intensive communications.
  59. ^ Owen O'Malley - Yahoo! Grid Computing Team (July 2008). "Apache Hadoop Wins Terabyte Sort Benchmark". Retrieved 2008-12-21. This is the first time that either a Java or an open source program has won.
  60. ^ Chris Nyberg and Mehul Shah. "Sort Benchmark Home Page".
  61. ^ http://topcoder.com/home/tco10/2010/06/08/algorithms-problem-writing/
  62. ^ http://acm.timus.ru/help.aspx?topic=java&locale=en
  63. ^ http://acm.pku.edu.cn/JudgeOnline/faq.htm#q11
  64. ^ http://acm.tju.edu.cn/toj/faq.html#qj
  65. ^ http://m-judge.maximum.vc/faq.cgi#java
