Larrabee (microarchitecture)
Larrabee is the codename for a GPGPU chip that Intel is developing separately from its current line of integrated graphics accelerators. Larrabee is expected to compete with GeForce and Radeon products from NVIDIA and AMD respectively. Larrabee will also compete in the GPGPU and high-performance computing markets. The first video cards featuring Larrabee are likely to be released in the first half of 2010.[1][2][3][4]
Comparison with competing products
Larrabee can be considered a hybrid between a multi-core CPU and a GPU, and has similarities to both. Its coherent cache hierarchy and x86 architecture compatibility are CPU-like, while its wide SIMD vector units and texture sampling hardware are GPU-like.
As a GPU, Larrabee will support traditional rasterized 3D graphics (DirectX/OpenGL) for games. However, Larrabee's hybrid of CPU and GPU features should be suitable for general purpose GPU (GPGPU) or stream processing tasks.[1] For example, Larrabee might perform ray tracing or physics processing,[5] in real time for games or offline for scientific research as a component of a supercomputer.[6]
DreamWorks Animation has partnered with Intel and is planning to use Larrabee in movie production. DreamWorks Animation CEO Jeffrey Katzenberg states "we are well on the way of upgrading our software to really take advantage of Larrabee and in terms of speed, flexibility, capacity, it just raises the bar of what we can do by not 2 or 3x, but 20x."[7]
Larrabee's early presentation has drawn some criticism from GPU competitors. At NVISION 08, several NVIDIA employees called Intel's SIGGRAPH paper about Larrabee "marketing puff" and told the press that the Larrabee architecture was "like a GPU from 2006".[8] As of June 2009, prototypes of Larrabee have been claimed to be on par with the NVIDIA GeForce GTX 285.[9]
Differences with current GPUs
Larrabee will differ from other discrete GPUs currently on the market such as the GeForce 200 Series and the Radeon 4000 series in three major ways:
- Larrabee will feature cache coherency across all its cores.[10]
- Larrabee will include very little specialized graphics hardware, instead performing tasks like z-buffering, clipping, and blending in software, using a tile-based rendering approach.[10] A renderer implemented in software can more easily be modified, allowing more differentiation in appearance between games or other 3D applications. Intel's SIGGRAPH 2008 paper[10] mentions order-independent transparency, irregular Z-buffering, and real-time raytracing as rendering features that can be implemented with Larrabee.
- Because of the previous point, some effects could potentially run much faster on Larrabee than on a conventional GPU. Conventional GPUs generally cannot read the current render target, or can do so only with significant limitations, because the ROPs that access the render target are implemented in fixed-function hardware. Effects that use the current image as an input therefore require multiple passes: render, switch to another render target (or copy the result into a second render target in VRAM), then render again using the previous image as an input texture, with each render-target switch being a slow operation that flushes the whole GPU pipeline. Because Larrabee is completely programmable, one could read and write the render target directly, without requiring a copy of it (see the sketch after this list).
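The following minimal C++ sketch illustrates why this is straightforward for a software renderer: the render target is just an array in ordinary memory, so a custom blend operation can read each destination pixel and write the result back in place, with no extra pass or copy. This is an illustration only, not Intel's renderer; the names `Framebuffer`, `blend_half`, and `shade_tile` are invented for the example.

```cpp
// Minimal sketch (not Intel's actual renderer): with a software rasterizer the
// render target is an array in memory, so a "programmable blend" can read the
// destination pixel directly instead of copying it to a texture first.
#include <cstdint>
#include <vector>

struct Framebuffer {
    int width, height;
    std::vector<uint32_t> pixels;               // packed 8-bit RGBA
    uint32_t& at(int x, int y) { return pixels[y * width + x]; }
};

// A custom blend that needs the current destination value as an input --
// something fixed-function ROPs typically restrict, but trivial in software.
static uint32_t blend_half(uint32_t dst, uint32_t src) {
    // Average each channel of src and dst (a stand-in for an arbitrary formula).
    return ((dst >> 1) & 0x7F7F7F7F) + ((src >> 1) & 0x7F7F7F7F);
}

void shade_tile(Framebuffer& fb, int x0, int y0, int x1, int y1, uint32_t src) {
    for (int y = y0; y < y1; ++y)
        for (int x = x0; x < x1; ++x) {
            uint32_t dst = fb.at(x, y);           // read the live render target
            fb.at(x, y) = blend_half(dst, src);   // write it back in place
        }
}

int main() {
    Framebuffer fb{64, 64, std::vector<uint32_t>(64 * 64, 0xFF000000u)};
    shade_tile(fb, 0, 0, 64, 64, 0xFF4080FFu);    // blend one tile in place
    return 0;
}
```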
Differences with CPUs
The x86 processor cores in Larrabee will be different in several ways from the cores in current Intel CPUs such as the Core 2 Duo or Core i7:
- Larrabee's x86 cores will be based on the much simpler Pentium P54C design, which is still maintained for use in embedded applications.[11] The P54C-derived core is superscalar but does not include out-of-order execution, though it has been updated with modern features such as x86-64 support,[10] similar to Intel Atom. In-order execution means lower performance for individual cores, but since they are smaller, more can fit on a single chip, increasing overall throughput. Execution is also more deterministic, so instruction and task scheduling can be done by the compiler.
- Each Larrabee core contains a 512-bit vector processing unit, able to process 16 single precision floating point numbers at a time. This is similar to but four times larger than the SSE units on most x86 processors, with additional features like scatter/gather instructions and a mask register designed to make using the vector unit easier and more efficient. Larrabee derives most of its number-crunching power from these vector units.[10]
- Larrabee includes one major fixed-function graphics hardware feature: texture sampling units. These perform trilinear and anisotropic filtering and texture decompression.[10]
- Larrabee has a 1024-bit (512-bit each way) ring bus for communication between cores and to memory.[10] This bus can be configured in two modes to support Larrabee products with 16 cores or more, or fewer than 16 cores.[12]
- Larrabee includes explicit cache control instructions to reduce cache thrashing during streaming operations which only read/write data once.[10] Explicit prefetching into L2 or L1 cache is also supported.
- Each core supports 4-way simultaneous multithreading, with 4 copies of each processor register.[10]
Theoretically, Larrabee's x86 processor cores can run existing PC software, even operating systems. However, Larrabee's video card will not include all the features of a PC-compatible motherboard, so PC operating systems and applications will not run without modifications. A different version of Larrabee might sit in motherboard CPU sockets using QuickPath,[13] but Intel has not yet announced plans for this. Though Larrabee Native's C/C++ compiler includes auto-vectorization and many applications can execute correctly after recompiling, maximum efficiency may require code optimization using C++ vector intrinsics or inline Larrabee assembly code.[10] However, as with all GPGPU workloads, not all software benefits from a vector processing unit.
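As a rough illustration of this programming style, the sketch below uses ordinary 4-wide SSE intrinsics as a stand-in, since those are publicly documented; Larrabee's vector unit processes 16 lanes per instruction and adds write masks and scatter/gather, but the shape of the code (explicit vector loads, arithmetic, stores, and cache hints) is similar. This is an analogy under stated assumptions, not LRBni code.

```cpp
// Illustration only: 4-wide SSE as a stand-in for Larrabee's 16-wide vector unit.
#include <xmmintrin.h>   // SSE intrinsics
#include <cstddef>

// y[i] = a * x[i] + y[i] over n floats (n assumed to be a multiple of 4 here).
void saxpy_sse(float a, const float* x, float* y, std::size_t n) {
    const __m128 va = _mm_set1_ps(a);
    for (std::size_t i = 0; i < n; i += 4) {
        // Prefetch a little ahead with a non-temporal hint so streamed data does
        // not displace useful cache lines -- loosely analogous to Larrabee's
        // explicit cache-control hints. Prefetch is only a hint and cannot fault.
        _mm_prefetch(reinterpret_cast<const char*>(x + i + 64), _MM_HINT_NTA);
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 vy = _mm_loadu_ps(y + i);
        vy = _mm_add_ps(_mm_mul_ps(va, vx), vy);  // on Larrabee: one multiply-add
        _mm_storeu_ps(y + i, vy);
    }
}

int main() {
    float x[64], y[64];
    for (int i = 0; i < 64; ++i) { x[i] = 1.0f; y[i] = 2.0f; }
    saxpy_sse(3.0f, x, y, 64);    // each y[i] becomes 5.0f
    return 0;
}
```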
Comparison with the Cell Broadband Engine
Larrabee's philosophy of using many small, simple cores is similar to the ideas behind the Cell processor. There are some further commonalities, such as the use of a high-bandwidth ring bus to communicate between cores.[10] However, there are many significant differences in implementation which should make programming Larrabee simpler.
- The Cell processor includes one main processor which controls many smaller processors; the main processor can also run an operating system. In contrast, all of Larrabee's cores are the same, and Larrabee is not expected to run an OS.
- Each compute core in the Cell (SPE) has a local store, and explicit (DMA) operations are used for all transfers to and from DRAM; ordinary reads and writes to DRAM are not allowed. In Larrabee, all on-chip and off-chip memory sits under an automatically managed, coherent cache hierarchy, so its cores effectively share a uniform memory space through standard load/store instructions. Larrabee cores do, however, each have 256 KB of local L2 cache, and L2 segments belonging to other cores take longer to access, which is somewhat similar in principle to the Cell SPUs.[10]
- Because of the cache coherency noted above, each program running on Larrabee sees what is effectively a large linear memory, just as on a traditional general-purpose CPU; an application for Cell, by contrast, must be written around the limited memory footprint of the local store attached to each SPE, which in exchange offers theoretically higher bandwidth. However, since the local L2 is faster to access, an advantage can still be gained from using Cell-style programming methods.
- Cell uses DMA for data transfer to and from its on-chip local memories, which offers flexibility and high throughput; Larrabee instead uses special instructions for cache manipulation (notably cache eviction hints and prefetch instructions), which lets it keep cache coherence, and hence the standard memory hierarchy, while still boosting performance for rendering pipelines and other stream-like computations.[10] (A schematic sketch contrasting the two models follows this list.)
- Each compute core in the Cell runs only one thread at a time, in-order. A core in Larrabee runs up to four threads, and Larrabee's hyperthreading helps hide latencies and compensates for the lack of out-of-order execution.
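The schematic C++ sketch below contrasts the two programming models described above. It is an illustration only: the helper names are invented, and memcpy merely stands in for Cell's DMA engine. On a Cell-style machine the data must be staged through a small local buffer explicitly, while on a cache-coherent design such as Larrabee the same loop simply loads and stores DRAM-backed memory and lets the caches do the staging.

```cpp
// Schematic contrast of the two models (illustration only; these are not Cell
// or Larrabee APIs -- memcpy stands in for a DMA transfer).
#include <cstring>
#include <cstddef>

constexpr std::size_t kLocalStoreFloats = 4096;   // hypothetical SPE-style local store

// Cell-style: stage a tile into the local store, process it, write it back.
void scale_local_store_style(float* data, std::size_t n, float s) {
    static float local[kLocalStoreFloats];
    for (std::size_t base = 0; base < n; base += kLocalStoreFloats) {
        std::size_t count = (n - base < kLocalStoreFloats) ? n - base : kLocalStoreFloats;
        std::memcpy(local, data + base, count * sizeof(float));   // "DMA in"
        for (std::size_t i = 0; i < count; ++i) local[i] *= s;
        std::memcpy(data + base, local, count * sizeof(float));   // "DMA out"
    }
}

// Larrabee-style: ordinary loads and stores; the coherent caches do the staging.
void scale_cache_style(float* data, std::size_t n, float s) {
    for (std::size_t i = 0; i < n; ++i) data[i] *= s;
}

int main() {
    static float data[10000];
    for (auto& v : data) v = 1.0f;
    scale_local_store_style(data, 10000, 2.0f);
    scale_cache_style(data, 10000, 0.5f);
    return 0;
}
```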
Comparison with Intel GMA
Intel currently sells a line of GPUs under the Intel GMA brand. These chips are not sold separately but are integrated onto motherboards. Though the low cost and power consumption of Intel GMA chips make them suitable for small laptops and less demanding tasks, they lack the 3D graphics processing power to compete with NVIDIA and AMD/ATI for a share of the high-end gaming computer market, the HPC market, or a place in popular video game consoles. In contrast, Larrabee will be sold as a discrete GPU, separate from motherboards, and is expected to perform well enough for consideration in the next generation of video game consoles.[14]
The team working on Larrabee is separate from the Intel GMA team. The hardware is being designed by Intel's Hillsboro, Oregon design team, whose last major design was Nehalem. The software and drivers are being written by a newly formed team, with the 3D stack specifically being written by developers at RAD Game Tools (including Michael Abrash).[15]
The Intel Visual Computing Institute will research basic and applied technologies that could be applied to Larrabee-based products.[16]
Preliminary performance data
Intel's SIGGRAPH 2008 paper describes cycle-accurate simulations (limitations of memory, caches, and texture units were included) of Larrabee's projected performance.[10] Graphs show how many 1 GHz Larrabee cores are required to maintain 60 FPS at 1600x1200 resolution in several popular games: roughly 25 cores for Gears of War with no antialiasing, 25 cores for F.E.A.R. with 4x antialiasing, and 10 cores for Half-Life 2: Episode 2 with 4x antialiasing. Larrabee is likely to run faster than 1 GHz, so these numbers represent virtual timeslices of cores rather than counts of actual cores.[17] Another graph shows that performance on these games scales nearly linearly with the number of cores up to 32 cores; at 48 cores, performance drops to 90% of what the linear relationship would predict.
A June 2007 PC Watch article suggests that the first Larrabee chips will feature 32 x86 processor cores and come out in late 2009, fabricated on a 45 nanometer process. Chips with a few defective cores due to yield issues will be sold as a 24-core version. Later in 2010, Larrabee will be shrunk for a 32 nanometer fabrication process, which will enable a 48-core version.[18]
Tech news site Fudzilla has posted several short articles about Larrabee, claiming that Larrabee may have a TDP as large as 300 W,[19] that it will use a 12-layer PCB and a cooling system that "is meant to look similar to what you can find on high-end Nvidia cards today,"[20] that it will use GDDR5 memory, and that it is targeted to have 2 single-precision teraflops of computing power.[21]
The last figure is a theoretical peak and can be derived as follows: 32 cores × 16 single-precision SIMD lanes per core × 2 floating-point operations per lane per cycle (one multiply-add) × 2 GHz ≈ 2 TFLOPS.
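A one-line check of that arithmetic, assuming as above that each vector lane retires one multiply-add (two floating-point operations) per cycle:

```cpp
// Peak single-precision throughput under the assumptions stated above.
#include <cstdio>

int main() {
    const double cores = 32, lanes = 16, flops_per_lane_per_cycle = 2, clock_hz = 2e9;
    double peak = cores * lanes * flops_per_lane_per_cycle * clock_hz;
    std::printf("%.3f TFLOPS\n", peak / 1e12);   // prints 2.048 TFLOPS
    return 0;
}
```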
References
- ^ a b "First Details on a Future Intel Design Codenamed 'Larrabee'". Intel. Retrieved 2008-09-01.
{{cite web}}
: Italic or bold markup not allowed in:|publisher=
(help) - ^ "Exclusive: Jon Peddie predicts great second half of 2009 for graphics market". Hexus.
{{cite web}}
: Italic or bold markup not allowed in:|publisher=
(help) - ^ "Intel Corp. Q1 2009 Earnings Call Transcript". Seeking Alpha.
- ^ "Intel Confirms 'Larrabee' First Half 2010; No Delay". Tom's Hardware.
- ^ Stokes, Jon. "Intel picks up gaming physics engine for forthcoming GPU product". Ars Technica. Retrieved 2007-09-17.
{{cite web}}
: Italic or bold markup not allowed in:|publisher=
(help) - ^ Stokes, Jon. "Clearing up the confusion over Intel's Larrabee". Ars Technica. Retrieved 2007-06-01.
{{cite web}}
: Italic or bold markup not allowed in:|publisher=
(help) - ^ [1]
- ^ Larrabee performance--beyond the sound bite
- ^ Intel's 'Larrabee' on Par With GeForce GTX 285
- ^ a b c d e f g h i j k l m n o "Larrabee: A Many-Core x86 Architecture for Visual Computing". Intel. doi:10.1145/1360612.1360617. Retrieved 2008-08-06.
{{cite web}}
: Italic or bold markup not allowed in:|publisher=
(help) - ^ "Intel's Larrabee GPU based on secret Pentagon tech, sorta [Updated]". Ars Technica. Retrieved 2008-08-06.
{{cite web}}
: Italic or bold markup not allowed in:|publisher=
(help) - ^ Glaskowsky, Peter. "Intel's Larrabee--more and less than meets the eye". CNET. Retrieved 2008-08-20.
{{cite web}}
: Italic or bold markup not allowed in:|publisher=
(help) - ^ Stokes, Jon. "Clearing up the confusion over Intel's Larrabee, part II". Ars Technica. Retrieved 2008-01-16.
{{cite web}}
: Italic or bold markup not allowed in:|publisher=
(help) - ^ Chris Leyton (2008-08-13). "Intel's Larrabee Shaping Up For Next-Gen Consoles?". Retrieved 2008-08-24.
- ^ AnandTech: Intel's Larrabee Architecture Disclosure: A Calculated First Move
- ^ Ng, Jansen (2009-05-13). "Intel Visual Computing Institute Opens, Will Spur "Larrabee" Development". DailyTech. Retrieved 2009-05-13.
- ^ Steve Seguin (August 20, 2008). "Intel's 'Larrabee' to Shakeup AMD, Nvidia". Tom's Hardware. Retrieved 2008-08-24.
- ^ "Intel is promoting the 32 core CPU "Larrabee"". pc.watch.impress.co.jp. Retrieved 2008-08-06.
{{cite web}}
: Italic or bold markup not allowed in:|publisher=
(help)Template:Jatranslation - ^ "Larrabee to launch at 300W TDP". fudzilla.com. Retrieved 2008-08-06.
{{cite web}}
: Italic or bold markup not allowed in:|publisher=
(help) - ^ "Larrabee will use a 12-layer PCB". fudzilla.com. Retrieved 2008-08-06.
{{cite web}}
: Italic or bold markup not allowed in:|publisher=
(help) - ^ "Larrabee will use GDDR5 memory". fudzilla.com. Retrieved 2008-08-06.
{{cite web}}
: Italic or bold markup not allowed in:|publisher=
(help)
External links
- Rasterization on Larrabee
- A First Look at the Larrabee New Instructions (LRBni)
- C++ implementation of the Larrabee new instructions
- Game Physics Performance on Larrabee
- Intel fact sheet about Larrabee
- Intel's SIGGRAPH 2008 paper on Larrabee
- Techgage.com - Discusses how Larrabee differs from normal GPUs, includes block diagram illustration
- Intel's Larrabee Architecture Disclosure: A Calculated First Move