Memory locality - Objects are allocated on the heap with little or no consideration for locality. While this approach may be appropriate for uniprocessors or small-scale SMPs, it is unlikely to work well on a cluster of workstations where remote memory access is one or two orders of magnitude slower than local memory access.
Parallel garbage collection - Garbage collection can consume a considerable amount of application time. Typically, JVMs employ "stop-the-world" garbage collectors, where program threads are halted during garbage collection. This approach will not work for large numbers of processors, for two reasons. First, the cost of "stopping the world" is considerably higher when the number of processors is large. Second, using a single thread to collect garbage results in an unacceptably large sequential fraction for any application.
Memory consistency model - To achieve scalable performance on a large number of processors, it is important to exploit the "relaxed" Java Memory Model (JMM). Presently no JVM implements the JMM faithfully, and indeed many implement it incorrectly, leading to lack of coherence and loss of optimization opportunities. The specification of the JMM was also revised in 2004 (JSR-133, adopted in Java 5).
Efficient threads and synchronization - With a large number of processors, it is critical to provide efficient threading support and synchronization mechanisms that scale well.
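The "sequential fraction" argument in the parallel garbage collection item can be made concrete with Amdahl's law: if a fraction s of the run time is inherently serial, the speedup on p processors is bounded by 1 / (s + (1 - s) / p). The sketch below is illustrative only. The class name `AmdahlSketch` and the 5% collector share are assumptions chosen for the example, not figures from the text above.

```java
// Minimal sketch: Amdahl's law applied to a single-threaded garbage collector.
// AmdahlSketch and the 5% figure below are hypothetical, for illustration only.
final class AmdahlSketch {

    /** Upper bound on speedup when serialFraction of the work cannot be parallelized. */
    static double speedup(double serialFraction, int processors) {
        if (serialFraction < 0.0 || serialFraction > 1.0 || processors < 1) {
            throw new IllegalArgumentException(
                "need 0 <= serialFraction <= 1 and processors >= 1");
        }
        return 1.0 / (serialFraction + (1.0 - serialFraction) / processors);
    }

    public static void main(String[] args) {
        // Assumed: 5% of single-processor run time is spent in a serial collector.
        double gcFraction = 0.05;
        for (int p : new int[] {1, 4, 16, 64, 256}) {
            System.out.printf("processors=%d speedup bound=%.2f%n",
                              p, speedup(gcFraction, p));
        }
    }
}
```

Under that assumed 5% share, the bound can never exceed 1 / 0.05 = 20 however many processors are added, which is why a collector that runs on a single thread limits scaling for any application.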