|Max. CPU clock rate||1.10 GHz to 1.35 GHz|
|Instruction set||SPARC V9|
The server series consists of the SPARC64 V+, VI, VI+, VII, VII+, X, and X+. The SPARC64 VI and its successors up to the VII+ were used in the Fujitsu and Sun (later Oracle) SPARC Enterprise M-Series servers. In addition to servers, a version of the SPARC64 VII was also used in the commercially available Fujitsu FX1 supercomputer. As of July 2016, the SPARC64 X+ is the latest server processor, and it is used in the Fujitsu and Oracle M10 servers.
The supercomputer series is based on the SPARC64 VII and consists of the SPARC64 VIIIfx, IXfx, and XIfx. The SPARC64 VIIIfx was used in the K computer, and the SPARC64 IXfx in the commercially available PRIMEHPC FX10. As of July 2016, the SPARC64 XIfx is the latest supercomputer processor, and it is used in the Fujitsu PRIMEHPC FX100 supercomputer.
- 1 History
- 2 Description
- 3 SPARC64 V+
- 4 SPARC64 VI
- 5 SPARC64 VII
- 6 SPARC64 VII+
- 7 SPARC64 X
- 8 SPARC64 X+
- 9 HPC-specialized processors (fx)
- 10 See also
- 11 Notes
- 12 References
- 13 External links
In the late 1990s, HAL Computer Systems, a subsidiary of Fujitsu, was designing a successor to the SPARC64 GP, also called the SPARC64 V. This would have been a complex design with a very wide superscalar organization with superspeculation, an L1 instruction trace cache, a small but very fast 8 KB L1 data cache, and separate L2 caches for instructions and data. It was designed in Fujitsu's CS85 process, a 0.17 μm CMOS process with six levels of copper interconnect, and would have consisted of 65 million transistors on a 380 mm² die. It was canceled in mid-2001, when Fujitsu closed HAL, and was replaced by a Fujitsu design.
The first Fujitsu SPARC64 Vs were fabricated in December 2001. They operated at 1.1 to 1.35 GHz. Fujitsu's 2003 SPARC64 roadmap showed that the company planned a 1.62 GHz version for release in late 2003 or early 2004, but it was canceled in favor of the SPARC64 V+. The SPARC64 V was used by Fujitsu in their PRIMEPOWER servers.
The SPARC64 V was presented at Microprocessor Forum 2002 by Aiichiro Inoue, the director of the Processor Development Division of the Development Department at Fujitsu. At introduction, it had the highest clock frequency among both SPARC implementations and 64-bit server microprocessors in production, and the highest SPEC rating of any SPARC implementation.
The SPARC64 V fetches up to eight instructions from the instruction cache during the first stage and places them into a 48-entry instruction buffer. In the next stage, four instructions are taken from this buffer, decoded and issued to the appropriate reserve stations. The SPARC64 V has six reserve stations: two that serve the integer units, one for the address generators, two for the floating-point units, and one for branch instructions. Each integer, address generator and floating-point unit has an eight-entry reserve station. Each reserve station can dispatch an instruction to its execution unit. Which instruction is dispatched first depends on operand availability and then on age: older instructions are given higher priority than newer ones. The reserve stations can dispatch instructions speculatively (speculative dispatch). That is, instructions can be dispatched to the execution units even when their operands are not yet available, provided they will be available when execution begins. During stage six, up to six instructions are dispatched.
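The selection rule described above (ready operands first, then the oldest instruction) can be illustrated with a small sketch. The `Entry` class, its field names and the `select_for_dispatch` function are hypothetical names chosen for illustration; the sketch ignores speculative dispatch and is not a description of the actual hardware logic.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Entry:
    """One reserve-station entry (hypothetical model, not the real hardware layout)."""
    name: str
    age: int              # cycles spent waiting; a larger value means an older instruction
    operands_ready: bool  # True once all source operands are available


def select_for_dispatch(station: Sequence[Entry]) -> Optional[Entry]:
    """Return the oldest entry whose operands are ready, or None if none is ready."""
    ready = [entry for entry in station if entry.operands_ready]
    if not ready:
        return None
    return max(ready, key=lambda entry: entry.age)


station = [
    Entry("add", age=5, operands_ready=False),  # oldest, but still waiting on an operand
    Entry("sub", age=3, operands_ready=True),
    Entry("xor", age=4, operands_ready=True),
]
picked = select_for_dispatch(station)
print(picked.name)  # xor
```

In this example the oldest entry is skipped because its operands are not ready, so the older of the two ready entries is chosen.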
The register files are read during stage seven. The SPARC architecture has separate register files for integer and floating-point instructions. The integer register file has eight register windows. The JWR contains 64 entries and has eight read ports and two write ports. The JWR contains a subset of the eight register windows: the previous, current and next register windows. Its purpose is to reduce the size of the register file so that the microprocessor can operate at higher clock frequencies. The floating-point register file contains 64 entries and has six read ports and two write ports.
Execution begins during stage nine. There are six execution units, two for integer, two for loads and stores, and two for floating-point. The two integer execution units are designated EXA and EXB. Both have an arithmetic logic unit (ALU) and a shift unit, but only EXA has multiply and divide units. Loads and stores are executed by two address generators (AGs) designated AGA and AGB. These are simple ALUs used to calculate virtual addresses.
The two floating-point units (FPUs) are designated FLA and FLB. Each FPU contains an adder and a multiplier, but only FLA has a graphics unit attached. They execute add, subtract, multiply, divide, square root and multiply–add instructions. Unlike its successor SPARC64 VI, the SPARC64 V performs the multiply–add with separate multiplication and addition operations, thus with up to two rounding errors. The graphics unit executes Visual Instruction Set (VIS) instructions, a set of single instruction, multiple data (SIMD) instructions. All instructions are pipelined except for divide and square root, which are executed using iterative algorithms. The FMA instruction is implemented by reading three operands from the operand register, multiplying two of the operands, forwarding the result and the third operand to the adder, and adding them to produce the final result.
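The difference between a multiply–add performed as two separate operations and a fused one can be shown numerically. The following is a minimal sketch using ordinary IEEE 754 double-precision arithmetic in Python; the function names and the operand values are chosen for illustration, and exact rational arithmetic stands in for a hardware fused unit.

```python
from fractions import Fraction


def multiply_add_unfused(a: float, b: float, c: float) -> float:
    """Multiply, round, then add and round again (two roundings)."""
    product = a * b      # first rounding
    return product + c   # second rounding


def multiply_add_fused(a: float, b: float, c: float) -> float:
    """Compute a*b + c exactly, then round once, as a fused unit would."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # single rounding


# Illustrative operands: the exact product is 1 + 2**-26 + 2**-54,
# whose last term is lost when the product is rounded to double precision.
a = 1.0 + 2.0**-27
b = 1.0 + 2.0**-27
c = -(1.0 + 2.0**-26)

print(multiply_add_unfused(a, b, c))  # 0.0
print(multiply_add_fused(a, b, c) == 2.0**-54)  # True
```

With these operands the two-step version loses the low-order term of the product before the addition, while the single-rounding version preserves it.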
Results from the execution units and loads are not written directly to the register file. To maintain program order, they are written to update buffers, where they reside until committed. The SPARC64 V has separate update buffers for the integer and floating-point units. Each has 32 entries. The integer update buffer has eight read ports and four write ports. Half of the write ports are used for results from the integer execution units and the other half for data returned by loads. The floating-point update buffer has six read ports and four write ports.
Commit takes place during stage ten at the earliest. The SPARC64 V can commit up to four instructions per cycle. During stage eleven, results are written to the register file, where they become visible to software.
The SPARC64 V has a two-level cache hierarchy. The first level consists of two caches, an instruction cache and a data cache. The second level consists of an on-die unified cache.
The level 1 (L1) caches each have a capacity of 128 KB. They are both two-way set associative and have 64-byte line size. They are virtually indexed and physically tagged. The instruction cache is accessed via a 256-bit bus. The data cache is accessed with two 128-bit buses. The data cache consists of eight banks separated by 32-bit boundaries. It uses a write-back policy. The data cache writes to the L2 cache with its own 128-bit unidirectional bus.
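The L1 cache parameters given above determine the number of sets and how an address divides into tag, index and offset fields. The sketch below assumes conventional power-of-two indexing with the index taken from the address bits directly above the line offset; the constant and function names are illustrative. In a virtually indexed, physically tagged cache the index bits would come from the virtual address and the tag from the physical address, which this single-address sketch does not model.

```python
CACHE_BYTES = 128 * 1024  # 128 KB capacity
WAYS = 2                  # two-way set associative
LINE_BYTES = 64           # 64-byte lines

NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)
OFFSET_BITS = LINE_BYTES.bit_length() - 1  # log2 of the line size
INDEX_BITS = NUM_SETS.bit_length() - 1     # log2 of the number of sets


def split_address(addr: int) -> tuple[int, int, int]:
    """Split an address into (tag, set index, byte offset within the line)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


print(NUM_SETS, OFFSET_BITS, INDEX_BITS)  # 1024 6 10
print(split_address(0x12345))             # (1, 141, 5)
```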
The second level cache has a capacity of 1 or 2 MB and the set associativity depends on the capacity.
The microprocessor has a 128-bit system bus that operates at 260 MHz. The bus can operate in two modes, single data rate (SDR) or double data rate (DDR), yielding a peak bandwidth of 4.16 or 8.32 GB/s, respectively.
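The two peak bandwidth figures follow from the bus width, the clock frequency and the number of transfers per clock. A minimal sketch of the arithmetic, with an illustrative function name and decimal gigabytes (10⁹ bytes) assumed:

```python
def peak_bandwidth_gb_s(width_bits: int, clock_mhz: float, transfers_per_clock: int) -> float:
    """Peak bandwidth in GB/s (10**9 bytes per second) for a parallel bus."""
    bytes_per_transfer = width_bits // 8
    return bytes_per_transfer * clock_mhz * 1e6 * transfers_per_clock / 1e9


sdr = peak_bandwidth_gb_s(128, 260, 1)  # one transfer per clock
ddr = peak_bandwidth_gb_s(128, 260, 2)  # two transfers per clock
print(f"SDR: {sdr:.2f} GB/s, DDR: {ddr:.2f} GB/s")  # SDR: 4.16 GB/s, DDR: 8.32 GB/s
```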
The SPARC64 V consisted of 191 million transistors, of which 19 million are contained in logic circuits. It was fabricated by an unnamed foundry in a 0.13 µm, eight-layer copper metallization, complementary metal–oxide–semiconductor (CMOS) silicon on insulator (SOI) process. The die measured 18.14 mm by 15.99 mm for a die area of 290 mm².
At 1.3 GHz, the SPARC64 V has a power dissipation of 34.7 W. The Fujitsu PrimePower servers that use the SPARC64 V supply a slightly higher voltage to the microprocessor to enable it to operate at 1.35 GHz. The increased power supply voltage and operating frequency raised the power dissipation to ~45 W.
|Max. CPU clock rate||1.65 GHz to 2.16 GHz|
|Instruction set||SPARC V9|
The SPARC64 V+, code-named "Olympus-B", is a further development of the SPARC64 V. Improvements over the SPARC64 V included higher clock frequencies of 1.65–2.16 GHz and a larger 3 or 4 MB L2 cache.
The first SPARC64 V+, a 1.89 GHz version, was shipped in September 2004 in the Fujitsu PrimePower 650 and 850. In December 2004, a 1.82 GHz version was shipped in the PrimePower 2500. These versions have a 3 MB L2 cache. In February 2006, four versions were introduced: 1.65 and 1.98 GHz versions with 3 MB L2 caches shipped in the PrimePower 250 and 450; and 2.08 and 2.16 GHz versions with 4 MB L2 caches shipped in mid-range and high-end models.
|L1 cache||128 KB per core|
|L2 cache||4–6 MB per core|
|Transistors||90 nm transistors|
The SPARC64 VI, code-named Olympus-C, is a two-core processor (the first multi-core SPARC64 processor) which succeeded the SPARC64 V+. It is fabricated by Fujitsu in a 90 nm, 10-layer copper, CMOS silicon on insulator (SOI) process, which enabled two cores and an L2 cache to be integrated on a die. Each core is a modified SPARC64 V+ processor. One of the main improvements is the addition of two-way coarse-grained multi-threading (CMT), which Fujitsu called vertical multi-threading (VMT). In CMT, the thread that is executed is determined by time-sharing; in addition, if the running thread executes a long-latency operation, execution is switched to the other thread. The addition of CMT required duplication of the program counter and the control, integer, and floating-point registers so there is one set of each for every thread. A floating-point fused multiply-add (FMA) instruction was also added, making the SPARC64 VI the first SPARC processor to have one.
The cores share a 6 MB on-die unified L2 cache. The L2 cache is 12-way set associative and has 256-byte lines. The cache is accessed via two unidirectional buses, a 256-bit read bus and a 128-bit write bus. The SPARC64 VI has a new system bus, the Jupiter Bus. The SPARC64 VI consisted of 540 million transistors. The die measures 20.38 mm by 20.67 mm (421.25 mm²).
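The thread-switching policy of two-way coarse-grained multi-threading can be illustrated with a toy model. The function name, the per-cycle event list and the time-slice length below are all hypothetical; the sketch only shows the two switch conditions (time slice expired, or a long-latency operation) and does not reflect the actual switch latency or time-slice length of the SPARC64 VI.

```python
def run_cmt(long_latency_events: list[bool], time_slice: int) -> list[int]:
    """Return which of two threads (0 or 1) occupies the core in each cycle.

    long_latency_events[i] is True if the thread running in cycle i starts a
    long-latency operation in that cycle (an assumed input for illustration).
    """
    active = 0  # thread currently running
    used = 0    # cycles the active thread has run in its current slice
    schedule = []
    for long_latency in long_latency_events:
        schedule.append(active)
        used += 1
        # Switch when the time slice expires or a long-latency operation begins.
        if long_latency or used == time_slice:
            active ^= 1
            used = 0
    return schedule


events = [False, False, True, False, False, False, False, False]
print(run_cmt(events, time_slice=4))  # [0, 0, 0, 1, 1, 1, 1, 0]
```

Thread 0 gives up the core early in cycle 2 because of the long-latency event, whereas thread 1 runs for its full four-cycle slice before control returns to thread 0.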
The SPARC64 VI was originally to have been introduced in mid-2004 in Fujitsu's PrimePower servers. Development of the PrimePowers was canceled after Fujitsu and Sun Microsystems announced in June 2004 that they would collaborate on new servers called the Advanced Product Line (APL). These servers were scheduled to be introduced in mid-2006, but were delayed until April 2007, when they were introduced as the SPARC Enterprise. The SPARC64 VI processors featured in the SPARC Enterprise at its announcement were a 2.15 GHz version with a 5 MB L2 cache, and 2.28 and 2.4 GHz versions with 6 MB L2 caches.
The SPARC64 VII (previously called the SPARC64 VI+), code-named Jupiter, is a further development of the SPARC64 VI announced in July 2008. It is a quad-core microprocessor. Each core is capable of two-way simultaneous multithreading (SMT), which replaces two-way coarse-grained multithreading, termed vertical multithreading (VMT) by Fujitsu. Thus, it can execute eight threads simultaneously. Other changes include more RAS features. The integer register file is now protected by ECC, and the number of error checkers has been increased to around 3,400. It consists of 600 million transistors, measures 21.31 mm by 20.86 mm (444.63 mm²), and is fabricated by Fujitsu in its 65 nm CMOS, copper interconnect process.
The SPARC64 VII was featured in the SPARC Enterprise, with the first versions operating at 2.4 or 2.52 GHz. It is socket-compatible with its predecessor, the SPARC64 VI, and is field-upgradeable. SPARC64 VIIs could coexist, whilst operating at their native clock frequency, alongside SPARC64 VIs.
The SPARC64 VII+ (Jupiter-E), referred to as the M3 by Oracle, is a further development of the SPARC64 VII. The clock frequency was increased up to 3 GHz and the L2 cache size was doubled to 12 MB. This version was announced on 2 December 2010 for the high-end SPARC Enterprise M8000 and M9000 servers. These improvements resulted in an approximately 20% increase in overall performance. A 2.66 GHz version was offered for the mid-range M4000 and M5000 models. On 12 April 2011, a 2.86 GHz version with two or four cores and a 5.5 MB L2 cache was announced for the low-end M3000. The VII+ is socket-compatible with its predecessor, the VII. Existing high-end SPARC Enterprise M-Series servers can be upgraded to the VII+ processors in the field.
The SPARC64 X is a 16-core server microprocessor announced in 2012 and used in Fujitsu's M10 servers (which are also marketed by Oracle). The SPARC64 X is based on the SPARC64 VII+ with significant enhancements to its core and chip organization. The cores were improved by the inclusion of a pattern history table for branch prediction, load address speculation, more execution units, support for the HPC-ACE extension (originally from the SPARC64 VIIIfx) and IEEE 754-2008 decimal floating-point numbers, a deeper pipeline for a 3.0 GHz clock frequency, and accelerators for cryptography and database functions. The 16 cores share a unified, 24 MB, 24-way set-associative L2 cache. Chip organization improvements include four integrated DDR3 SDRAM memory controllers, glueless four-way symmetrical multiprocessing, ten SERDES channels for symmetrical multiprocessing scalability to 64 sockets, and two integrated PCI Express 3.0 controllers. The SPARC64 X contains 2.95 billion transistors, measures 23.5 mm by 25 mm (637.5 mm²), and is fabricated in a 28 nm CMOS process with copper interconnects.
The SPARC64 X+ is an enhanced SPARC64 X processor announced in 2013. It features minor improvements to the core organization, and a higher 3.5 GHz clock frequency obtained through better circuit design and layout. It contains 2.99 billion transistors, measures 24 mm by 25 mm (600 mm²), and is fabricated in the same process as the SPARC64 X. On 8 April 2014, 3.7 GHz speed-binned parts became available in response to the introduction of new Xeon E5 and E7 models by Intel and the impending introduction of the POWER8 by IBM.
HPC-specialized processors (fx)
These processors are designed by Fujitsu for high-performance computing (HPC) and include a Fujitsu-designed extension to the SPARC V9 architecture called High Performance Computing-Arithmetic Computational Extensions (HPC-ACE).
The SPARC64 VIIIfx, code-named Venus, is an eight-core version of the SPARC64 VII. It includes a memory controller and 760 million transistors. The processor's peak performance is 128 GFLOPS and it is fabricated using Fujitsu's 45 nm process technology.
- Registers: 192 integer, 256 floating point; 8 FP ops, or 4 FMA ops, per cycle; 3 interrupt.
- Physical address range: 41 bits
- Translation lookaside buffer: 16 fetch + 256 4-way store instruction, 512 4-way store data, no victim cache
- Page sizes: 8 KB, 64 KB, 512 KB, 4 MB, 32 MB, 256 MB, 2 GB
- Translation storage buffer: Not supported in hardware
- SIMD: Up to two SIMD instructions per cycle. Each SIMD instruction can operate on four single-precision or two double-precision floating-point numbers, for up to eight floating-point operations per cycle. The 128-bit SIMD registers can be used for integer operations as well.
- Eight DDR3 SDRAM memory channels; 64 GB/s peak bandwidth
The K computer is a supercomputer manufactured by Fujitsu and located at the RIKEN Advanced Institute for Computational Science campus in Kobe, Japan. It obtains its performance from 88,128 SPARC64 VIIIfx processors. In June 2011, the TOP500 Project Committee announced that the K computer (still incomplete, with only 68,544 processors) topped the LINPACK benchmark at 8.162 PFLOPS, realizing 93% of peak performance, making it the fastest supercomputer in the world at that time.
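The 93% figure can be checked from the numbers in this section: 68,544 processors, 128 GFLOPS peak per processor, and a LINPACK result of 8.162 PFLOPS. A minimal sketch of the arithmetic, with illustrative variable names:

```python
PROCESSORS = 68_544           # processors installed at the time of the June 2011 list
PEAK_PER_PROCESSOR = 128e9    # 128 GFLOPS peak per SPARC64 VIIIfx
LINPACK_RESULT = 8.162e15     # 8.162 PFLOPS measured

system_peak = PROCESSORS * PEAK_PER_PROCESSOR
efficiency = LINPACK_RESULT / system_peak

print(f"System peak: {system_peak / 1e15:.3f} PFLOPS")  # System peak: 8.774 PFLOPS
print(f"Efficiency: {efficiency:.1%}")                  # Efficiency: 93.0%
```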
The SPARC64 IXfx is an improved version of the SPARC64 VIIIfx designed by Fujitsu and LSI, first revealed in the announcement of the PRIMEHPC FX10 supercomputer on 7 November 2011. It, along with the PRIMEHPC FX10, is a commercialization of the technologies that first appeared in the VIIIfx and K computer. Compared to the VIIIfx, organizational improvements included doubling the number of cores to 16, doubling the amount of shared L2 cache to 12 MB, and increasing peak DDR3 SDRAM memory bandwidth to 85 GB/s. The IXfx operates at 1.848 GHz, has a peak performance of 236.5 GFLOPS, and consumes 110 W for a power efficiency of more than 2 GFLOPS per watt. It consisted of 1 billion transistors and was implemented in a 40 nm CMOS process with copper interconnects.
Fujitsu introduced the SPARC64 XIfx in August 2014 at the Hot Chips symposium. It is used in the Fujitsu PRIMEHPC FX100 supercomputer, which succeeded the PRIMEHPC FX10. The XIfx operates at 2.2 GHz and has a peak performance of 1.1 TFLOPS. It consists of 3.75 billion transistors and is fabricated by the Taiwan Semiconductor Manufacturing Company in its 20 nm high-κ metal gate (HKMG) process. The Microprocessor Report estimated the die to have an area of 500 mm² and a typical power consumption of 200 W.
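The peak-performance figures quoted for the VIIIfx and IXfx can be reproduced as cores × floating-point operations per cycle × clock frequency. The sketch below makes two assumptions that are not stated in the text: the 2.0 GHz VIIIfx clock is taken from the TOP500 listing cited in the notes, and the IXfx is assumed to retain the VIIIfx's eight floating-point operations per cycle per core. The function name is illustrative.

```python
def peak_gflops(cores: int, flops_per_cycle: int, clock_ghz: float) -> float:
    """Peak performance in GFLOPS for a multi-core processor."""
    return cores * flops_per_cycle * clock_ghz


# VIIIfx: 8 cores, 8 FP operations per cycle, 2.0 GHz (clock from the TOP500 listing)
viiifx = peak_gflops(cores=8, flops_per_cycle=8, clock_ghz=2.0)

# IXfx: 16 cores at 1.848 GHz, assuming the same 8 FP operations per cycle per core
ixfx = peak_gflops(cores=16, flops_per_cycle=8, clock_ghz=1.848)

print(f"VIIIfx: {viiifx:.1f} GFLOPS")                 # VIIIfx: 128.0 GFLOPS
print(f"IXfx: {ixfx:.1f} GFLOPS")                     # IXfx: 236.5 GFLOPS
print(f"IXfx efficiency: {ixfx / 110:.2f} GFLOPS/W")  # IXfx efficiency: 2.15 GFLOPS/W
```

Under these assumptions the results match the stated 128 GFLOPS and 236.5 GFLOPS, and the IXfx figure divided by its 110 W consumption is consistent with "more than 2 GFLOPS per watt".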
The XIfx has 34 cores, 32 of which are compute cores used to run user applications, and 2 assistant cores used to run the operating system and other system services. The delegation of user applications and operating system to dedicated cores improves performance by ensuring that the private caches of the compute cores are not shared with or disrupted by non-application instructions and data. The 34 cores are further organized into two Core Memory Groups, each consisting of 16 compute cores and 1 assistant core sharing a 12 MB L2 unified cache. The division of the cores into two Core Memory Groups enabled 34 cores to be integrated on a single die by easing the implementation of cache coherence and avoiding the need for the L2 cache to be shared between 34 cores.
The XIfx cores have an improved organization. The XIfx implements an improved version of the HPC-ACE extensions (HPC-ACE2), which doubled the width of the SIMD units to 256 bits and added new SIMD instructions. Compared to the SPARC64 IXfx, the XIfx has an improvement of a factor of 3.2 for double precision and 6.1 for single precision. To complement the increased width of the SIMD units, the L1 cache bandwidth was increased to 4.4 TB/s. Improvements to the SoC organization include the replacement of the integrated memory controllers with four Hybrid Memory Cube (HMC) interfaces for decreased memory latency and improved memory bandwidth, and replacement of the ten Tofu interconnect ports with the second-generation Tofu2 interconnect, which has a 25 GB/s full-duplex bandwidth (12.5 GB/s per direction, 125 GB/s for ten ports). The XIfx supports 32 GB of memory. Each CMG has two HMC interfaces, each connected to two HMCs via its own port. The HMCs are 16-lane, 15 Gbit/s per lane versions. Each CMG has 240 GB/s (120 GB/s in and 120 GB/s out) of memory bandwidth. According to the Microprocessor Report, the XIfx was the first processor to use HMCs.
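The interconnect and memory bandwidth figures above are mutually consistent, as the following sketch shows. It assumes raw lane rates with no encoding or protocol overhead, and that the four HMC ports of a CMG (two interfaces, each connected to two HMCs) aggregate linearly; the variable names are illustrative.

```python
# Tofu2 interconnect: 12.5 GB/s per direction per port, ten ports
TOFU2_PER_DIRECTION_GB_S = 12.5
TOFU2_PORTS = 10
tofu2_total = TOFU2_PER_DIRECTION_GB_S * TOFU2_PORTS

# HMC links: 16 lanes at 15 Gbit/s per lane, in each direction
LANES = 16
LANE_GBIT_S = 15
port_gb_s = LANES * LANE_GBIT_S / 8  # GB/s per direction per HMC port

# Each CMG: two HMC interfaces, each connected to two HMCs via its own port
PORTS_PER_CMG = 2 * 2
cmg_per_direction = port_gb_s * PORTS_PER_CMG
cmg_total = 2 * cmg_per_direction    # in plus out

print(tofu2_total)        # 125.0
print(port_gb_s)          # 30.0
print(cmg_per_direction)  # 120.0
print(cmg_total)          # 240.0
```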
- "Fujitsu Draws Sparc64 Roadmap Past 2010"
- "Microarchitecture and Performance Analysis of a SPARC-V9 Microprocessor for Enterprise Server Systems".
- "Fujitsu-Siemens upgrades PrimePower Unix servers"
- "Fujitsu's SPARC64 V Is Real Deal", p. 1.
- "SPARC64 V Processor For UNIX Server"
- "Fujitsu's SPARC64 V Is Real Deal", p. 2.
- "SPARC64 VI Extensions" page 56, Fujitsu Limited, Release 1.3, 27 March 2007
- "Microarchitecture and Performance Analysis of a SPARC-V9 Microprocessor for Enterprise Server Systems", p. 4.
- "A 1.3GHz Fifth Generation SPARC64 Microprocessor", p. 702.
- "Fujitsu's SPARC64 V Is Real Deal", p. 3.
- "A 1.3GHz Fifth Generation SPARC64 Microprocessor", p. 702.
- "A 1.3GHz Fifth Generation SPARC64 Microprocessor", p. 705.
- Morgan 2004
- "Fujitsu-Siemens Cranks the Clock on Sparc V Chips for PrimePowers"
- Fujitsu Limited (27 March 2007). "SPARC64 VI Extensions, Release 1.3". pp. 45–46.
- Morgan 2007
- "SPARC's Still Going Strong", p. 1.
- Morgan 2008
- "Hot Chips: Fujitsu shows off SPARC64 VII"
- "Sun SPARC Enterprise Server Family Architecture: Flexible, Mainframe-Class Compute Power for the Datacenter" (PDF). Sun Microsystems. Retrieved 21 April 2008.
- Morgan 2011
- Fujitsu 2010
- Fujitsu 2011
- "Ellison: Sparc T4 due next year: Sparc64-VII+ clock and cache bumps now". The Register. Retrieved 3 December 2010.
- Halfhill, Tom R. (17 September 2012). "Fujitsu and Oracle Ignite SPARCs". Microprocessor Report.
- Maruyama, Takumi (29 August 2012). "SPARC64 X: Fujitsu's New Generation 16 Core Processor for the next generation UNIX servers".
- Gwennap, Linley (7 October 2013). "Fujitsu, Oracle Processors Evolve". Microprocessor Report.
- Yoshida, Toshio (27 August 2013). "SPARC64 X+: Fujitsu's Next Generation Processor for UNIX servers".
- Morgan, Timothy Prickett (8 April 2014). "Oracle Unfolds Sparc Roadmap, Fujitsu boosts SPARC64 X Clocks". EnterpriseTech.
- "Fujitsu unveils world’s fastest CPU". The Inquirer. Retrieved 14 May 2009.
- Takumi Maruyama (2009). SPARC64 VIIIfx: Fujitsu's New Generation Octo Core Processor for PETA Scale computing (PDF). Proceedings of Hot Chips 21. IEEE Computer Society.
- "Japanese supercomputer 'K' is world's fastest". The Telegraph. 20 June 2011. Retrieved 20 June 2011.
- "Japanese ‘K’ Computer Is Ranked Most Powerful". The New York Times. 20 June 2011. Retrieved 20 June 2011.
- "Supercomputer "K computer" Takes First Place in World". Fujitsu. Retrieved 20 June 2011.
- "Supercomputer "K computer" Takes First Place in World". RIKEN. Retrieved 20 June 2011.
- "Japan Reclaims Top Ranking on Latest TOP500 List of World’s Supercomputers", top500.org, retrieved 20 June 2011
- "K computer, SPARC64 VIIIfx 2.0 GHz, Tofu interconnect", top500.org, retrieved 20 June 2011
- Byrne 2011
- Fujitsu Launches PRIMEHPC FX10 Supercomputer
- Morgan, Timothy Prickett (7 November 2011). "Fujitsu readies 23 petaflops Sparc FX10 super beast". The Register.
- Halfhill 2014
- Sparc-Prozessor für 100-Petaflop-Rechner Heise Newsticker, 6 August 2014
- Next Generation PRIMEHPC Fujitsu Ltd., 2014
- Fujitsu guns for faster supercomputers with new chip Agam Shah, PC World, 6 August 2014
- Morgan, Timothy Prickett. "Inside Japan's Future Exascale ARM Supercomputer". The Next Platform. Retrieved 13 July 2016.
- "Hot Chips: Fujitsu shows off SPARC64 VII". (27 August 2008). The H.
- Ando, Hisashige; et al. (November 2003). "A 1.3GHz Fifth Generation SPARC64 Microprocessor". IEEE Journal of Solid-State Circuits, Volume 38, Issue 11. pp. 1896–1905.
- Byrne, Joseph (5 December 2011). "Sparc64 IXfx Burns Through FP Code". Microprocessor Report.
- Fujitsu Limited (August 2004). SPARC64 V Processor For UNIX Server.
- Fujitsu Limited (2 December 2010). Fujitsu and Oracle Enhance SPARC Enterprise M-Series with New Processor.
- Fujitsu Limited (14 April 2011). Fujitsu and Oracle Deliver Enhanced SPARC Enterprise M3000 Server.
- Halfhill, Tom R. (22 September 2014). "Sparc64 XIfx Uses Memory Cubes". Microprocessor Report.
- Diefendorff, Keith (15 November 1999). "Hal Makes Sparcs Fly". Microprocessor Report, Volume 13, Number 5.
- Krewell, Kevin (21 October 2002). "Fujitsu's SPARC64 V Is Real Deal". Microprocessor Report.
- Krewell, Kevin (24 November 2003). "Fujitsu Makes SPARC See Double". Microprocessor Report.
- Krewell, Kevin (24 June 2004). "SPARC's New Roadmap". Microprocessor Report.
- Krewell, Kevin (25 October 2004). "SPARC Turns 90nm". Microprocessor Report.
- Krewell, Kevin (14 November 2005). "SPARC's Still Going Strong". Microprocessor Report.
- McGhan, Harlan (25 September 2006). "The Sun-Fujitsu APL Alliance". Microprocessor Report.
- McGhan, Harlan (23 October 2006). "SPARC64 VI Ready for PrimeTime". Microprocessor Report.
- Morgan, Timothy Prickett (24 June 2004). "Fujitsu-Siemens Upgrades PrimePower Unix Servers". The Unix Guardian.
- Morgan, Timothy Prickett (9 February 2006). "Fujitsu-Siemens Cranks the Clock on Sparc V Chips for PrimePowers". The Unix Guardian, Volume 3, Number 5.
- Morgan, Timothy Prickett (23 February 2006). "Fujitsu Draws Sparc64 Roadmap Past 2010". The Unix Guardian.
- Morgan, Timothy Prickett (19 April 2007). "Fujitsu, Sun Deliver Joint Sparc Enterprise Server Line". The Unix Guardian.
- Morgan, Timothy Prickett (17 July 2008). "Fujitsu and Sun Flex Their Quads with New Sparc Server Lineup". The Unix Guardian.
- Morgan, Timothy Prickett (12 April 2011). "Oracle, Fujitsu goose Sparc M3000 entry box". The Register.
- Sakamoto, Mariko et al. (2003). "Microarchitecture and Performance Analysis of a SPARC-V9 Microprocessor for Enterprise Server Systems". Proceedings of the 9th International Symposium on High-Performance Computer Architecture. pp. 141–152.