Bulldozer (microarchitecture)

From Wikipedia, the free encyclopedia

Bulldozer is the codename Advanced Micro Devices (AMD) gave to one of the CPU cores based on the AMD family 15h microarchitecture, the successor to the family 10h (K10) microarchitecture. It is part of the company's M-SPACE design methodology, and the core is aimed at computing products with a thermal design power (TDP) of 10 to 125 watts. Bulldozer was designed from scratch rather than developed from earlier processors.[1] AMD claims dramatic performance-per-watt improvements in high-performance computing (HPC) applications with Bulldozer cores. Desktop processors with the Bulldozer core were released on October 12, 2011.

Bulldozer cores support most of the instruction sets implemented by the Intel processors available at their introduction (including SSE4.1, SSE4.2, AES, CLMUL, and AVX), as well as new instruction sets proposed by AMD (XOP and FMA4).[2][3]
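On Linux, you can check whether a processor exposes these extensions by reading the `flags` line of `/proc/cpuinfo`. The sketch below is a minimal illustration. It assumes Linux's flag names (`sse4_1`, `sse4_2`, `aes`, `pclmulqdq` for CLMUL, `avx`, `xop`, `fma4`). Other operating systems report CPU features differently, and the helper names are hypothetical.

```python
# Minimal sketch: report which of the extensions named above a Linux system
# advertises. Assumption: Linux /proc/cpuinfo flag names; other OSes differ.
from pathlib import Path

# Extension name as used in the text -> Linux cpuinfo flag (CLMUL is "pclmulqdq")
BULLDOZER_EXTENSIONS = {
    "SSE4.1": "sse4_1",
    "SSE4.2": "sse4_2",
    "AES": "aes",
    "CLMUL": "pclmulqdq",
    "AVX": "avx",
    "XOP": "xop",
    "FMA4": "fma4",
}


def parse_flags(cpuinfo_text: str) -> set[str]:
    """Return the flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no x86 'flags' line found (e.g. on non-x86 systems)


def bulldozer_support(flags: set[str]) -> dict[str, bool]:
    """Map each extension name to whether its cpuinfo flag is present."""
    return {name: flag in flags for name, flag in BULLDOZER_EXTENSIONS.items()}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        support = bulldozer_support(parse_flags(cpuinfo.read_text()))
        for name, present in support.items():
            print(f"{name:7} {'yes' if present else 'no'}")
```

A Bulldozer-based system would be expected to report "yes" for all seven entries. Later Intel processors, which never implemented XOP or FMA4, would report "no" for those two.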


Basic description

According to AMD, Bulldozer-based CPUs are built on GlobalFoundries' 32 nm silicon-on-insulator (SOI) process technology. They use a new approach to multithreaded performance that, according to press notes, "balances dedicated and shared computer resources to provide a highly compact, high core count design that is easily replicated on a chip for performance scaling."[4] In other words, AMD hoped to eliminate some of the redundancies that naturally creep into multicore designs, making better use of its hardware while using less power.

Bulldozer-based implementations built on 32 nm SOI with HKMG arrived in October 2011 for both servers and desktops. The server segment included the dual-chip 16-core Opteron processor codenamed Interlagos (for Socket G34) and the single-chip 4–8-core Valencia (for Socket C32). The 4–8-core Zambezi targeted desktops on Socket AM3+.[5][6]

Bulldozer is the first major redesign of AMD's processor architecture since 2003, when the firm launched its Athlon 64/Opteron (K8) processors. It features two 128-bit FMA-capable FPUs that can be combined into one 256-bit FPU. These are paired with two integer cores, each with four pipelines. The two integer cores share the fetch/decode stage, and Bulldozer also introduces an L2 cache shared between them. AMD calls this design a "Bulldozer module". A 16-core processor design would feature eight of these modules,[7] and the operating system recognizes each module as two physical cores.

The module, described as two cores, can be contrasted with a single Intel core with HyperThreading. Bulldozer provides dedicated schedulers and integer units for each thread, whereas in Intel's core both threads must compete for the same execution resources.

Architecture

Bulldozer Module

Block diagram of a Bulldozer module
Block diagram of an "8-core" CPU with four modules
  • AMD has introduced a new microarchitecture building block called a module. In hardware complexity and functionality, a module sits midway between two designs. One is a dual-core processor, in which each core is fully independent. The other is a single processor core running two SMT threads, in which each thread shares most of the hardware resources with the other.
    • A module consists of two tightly coupled, "conventional" x86 out-of-order processing engines. Each engine shares the early pipeline stages (e.g., instruction fetch and decode), the FPUs, and the L2 cache with its sibling in the module.
  • Each module has the following independent hardware resources:[8][9]
    • up to 2048 kB L2 cache per module (shared between the cores in a module)
    • 16 kB four-way L1 data cache (way-predicted) per core and two-way 64 kB L1 instruction cache per module, one way for each of the two cores[10][11][12]
    • Two dedicated integer cores
      - each consists of two ALUs and two AGUs, together capable of four independent arithmetic and memory operations per clock
      - duplicating the integer schedulers and execution pipelines gives each of the two threads dedicated hardware, which significantly increases performance in multithreaded integer applications
      - the second integer core increases the Bulldozer module's die area by around 12%, which at chip level adds about 5% to the total die area[13]
    • Two symmetrical 128-bit FMAC (fused multiply–add capable) floating-point pipelines per module. They can be unified into one large 256-bit-wide unit when one of the integer cores dispatches an AVX instruction. The module also has two symmetrical x87/MMX/SSE-capable floating-point pipelines for backward compatibility with software not optimized for SSE2
  • Multiple modules share an L3 cache as well as an advanced dual-channel memory subsystem with an integrated memory controller (IMC).
  • A module has 213 million transistors in an area of 30.9 mm² (including 2 MB L2 cache) on an Orochi die[14]
  • A dual-core Bulldozer processor has a single module, a quad-core processor has two modules and an octo-core processor has four modules.
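The per-module figures above scale linearly with module count. The following sketch tallies them for one-, two- and four-module parts. It assumes exactly the numbers in the list: 2 integer cores, 2 × 16 kB L1 data cache, 64 kB L1 instruction cache, 2 MB L2 and two 128-bit FMAC pipelines per module. It treats every FMAC pipeline as busy every cycle, so the FLOP figure is a theoretical peak, not a benchmark result. The class and function names are illustrative.

```python
# Sketch: chip totals implied by the per-module figures in the list above.
# Assumed per module: 2 integer cores, 2 x 16 kB L1D, 64 kB L1I, 2 MB L2,
# and two 128-bit FMAC pipelines. FLOPs are a theoretical peak only.
from dataclasses import dataclass


@dataclass(frozen=True)
class ChipResources:
    modules: int
    integer_cores: int
    l1d_kb: int
    l1i_kb: int
    l2_mb: int
    peak_dp_flops_per_cycle: int


def chip_resources(modules: int) -> ChipResources:
    fmac_pipes = 2 * modules
    doubles_per_pipe = 128 // 64  # a 128-bit pipe holds two 64-bit doubles
    flops_per_fma = 2             # one multiply plus one add
    return ChipResources(
        modules=modules,
        integer_cores=2 * modules,
        l1d_kb=2 * 16 * modules,
        l1i_kb=64 * modules,
        l2_mb=2 * modules,
        peak_dp_flops_per_cycle=fmac_pipes * doubles_per_pipe * flops_per_fma,
    )


for modules in (1, 2, 4):  # "dual-core", "quad-core" and "octo-core" parts
    print(chip_resources(modules))
```

For four modules this gives 8 integer cores, 8 MB of L2 (matching the L2 figure for the eight-core parts in the Zambezi table) and 32 double-precision FLOPs per cycle. Multiplying by a clock rate gives a peak figure that real code rarely reaches.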

Instruction set extensions

  • Support for Intel's Advanced Vector Extensions (AVX) instruction set, which supports 256-bit floating-point operations, and for SSE4.1, SSE4.2, AES and CLMUL. Bulldozer also supports new 128-bit instruction sets proposed by AMD (XOP, FMA4 and CVT16).[15] These have the same functionality as the SSE5 instruction set formerly proposed by AMD, but use an encoding compatible with the AVX coding scheme.
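"Fused" in FMA4 means that a × b + c is computed with a single rounding at the end, instead of rounding the product and then the sum. FMA4 is a four-operand form that writes a separate destination register (d = a × b + c). The sketch below emulates only the arithmetic difference, in pure Python with exact rational numbers. It does not execute any FMA4 instruction, and the function names are illustrative.

```python
# Sketch of fused vs. unfused multiply-add, emulated with exact rationals.
# Models the rounding behaviour only; no FMA4 instruction is executed.
from fractions import Fraction


def unfused_mul_add(a: float, b: float, c: float) -> float:
    # Two roundings: a*b is rounded to a double, then the sum is rounded.
    return a * b + c


def fused_mul_add(a: float, b: float, c: float) -> float:
    # One rounding: a*b + c is computed exactly, then rounded once.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0
print(unfused_mul_add(a, b, c))  # 0.0: the exact product 1 - 2**-60 rounds to 1.0
print(fused_mul_add(a, b, c))    # -2**-60 (about -8.67e-19), the exact result
```

The unfused version loses the small term entirely. The fused version keeps it, which is why FMA units matter for accuracy as well as for throughput.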

Process technology and clock frequency

  • 11-metal-layer 32 nm SOI process with GlobalFoundries' first-generation high-k metal gate (HKMG) technology
  • Turbo Core performance boost, which increases the clock frequency by 500 MHz with all cores active (for most workloads) and further as TDP headroom permits[16]
  • The chip operates at 0.8 to 1.3 V, achieving clock frequencies of 3.5 GHz or more[14]
  • Power usage from 10 W (minimum) to 125 W (maximum)
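The relationship between active cores and boost clocks can be modeled, very roughly, as a lookup on the number of busy cores. The sketch below is hypothetical. It uses the FX-8150 base, full-load turbo and half-load turbo clocks from the Zambezi table in this article, and it interprets "half load" as at most half of the cores being busy. It also reduces "TDP headroom" to a single boolean, whereas real Turbo Core logic weighs power draw and temperature. Note that the table's all-core boost for this model (3.6 to 3.9 GHz) is smaller than the 500 MHz figure above.

```python
# Hypothetical model of a Turbo Core-style clock choice. Figures are the
# FX-8150 entries of the Zambezi table; real Turbo Core also weighs power
# draw and temperature, which this sketch collapses into one boolean.
FX_8150 = {
    "cores": 8,
    "base_ghz": 3.6,
    "full_load_turbo_ghz": 3.9,  # all cores busy
    "half_load_turbo_ghz": 4.2,  # assumed: at most half of the cores busy
}


def turbo_frequency(active_cores: int, tdp_headroom: bool, part: dict = FX_8150) -> float:
    """Return the clock (GHz) this simplified model would pick."""
    if not 0 <= active_cores <= part["cores"]:
        raise ValueError("active_cores out of range")
    if not tdp_headroom:
        return part["base_ghz"]
    if active_cores <= part["cores"] // 2:
        return part["half_load_turbo_ghz"]
    return part["full_load_turbo_ghz"]


print(turbo_frequency(8, tdp_headroom=True))  # 3.9
print(turbo_frequency(3, tdp_headroom=True))  # 4.2
```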

Cache and memory interface

  • Up to 8 MB of L3 cache per silicon die, shared among all modules on that die (8 MB for four modules; 16 MB for an eight-module dual-die MCM). The L3 is divided into four 2 MB subcaches capable of operating at 2.4 GHz or more at 1.1 V[14]
  • Native DDR3-1866 memory support[17]
  • Dual-channel DDR3 integrated memory controller (supporting PC3-15000/DDR3-1866) for desktops; quad-channel DDR3 integrated memory controller (supporting PC3-12800/DDR3-1600 and registered DDR3)[18] for servers and workstations (the new Opteron Valencia and Interlagos)
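Peak memory bandwidth is the product of three factors: the transfer rate, the 64-bit (8-byte) width of a DDR3 channel, and the number of channels. The PC3 module rating approximates the per-module rate in MB/s (for example, 1600 × 8 = 12800 for PC3-12800). The sketch below applies this arithmetic to the desktop and server configurations listed above. The results are theoretical peaks, and the function name is illustrative.

```python
# Sketch: theoretical peak DRAM bandwidth = transfers/s x bytes per transfer
# x channels. Assumes 64-bit (8-byte) DDR3 channels; sustained bandwidth is lower.
def peak_bandwidth_gb_s(mega_transfers: float, channels: int, bus_bytes: int = 8) -> float:
    return mega_transfers * 1e6 * bus_bytes * channels / 1e9


print(peak_bandwidth_gb_s(1866, channels=2))  # desktop, dual-channel DDR3-1866: ~29.9 GB/s
print(peak_bandwidth_gb_s(1600, channels=4))  # server, quad-channel DDR3-1600: 51.2 GB/s
```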

I/O and socket interface

Chipset and I/O for the first generation
  • HyperTransport technology rev. 3.1 (3.20 GHz, 6.4 GT/s, 25.6 GB/s, 16-bit uplink/16-bit downlink), first implemented in the HY-D1 revision "Magny-Cours" on the Socket G34 Opteron platform in March 2010 and in "Lisbon" on the Socket C32 Opteron platform in June 2010
  • Socket AM3+ (AM3b)
    • 942-pin, DDR3 support
    • AM3+ processors may work in Socket AM3 motherboards, at the motherboard manufacturer's discretion and if BIOS updates are provided,[19][20] although this is not officially supported by AMD. AM3+ motherboards are backward-compatible with AM3 processors.[21]
  • For the server segment, the existing Socket G34 (LGA1974) and Socket C32 (LGA1207) are used.
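The HyperTransport figures above are consistent with one another. A 3.2 GHz link transfers on both clock edges, giving 6.4 GT/s. Sixteen bits per transfer yields 12.8 GB/s in each direction, and counting both directions gives the quoted 25.6 GB/s. The function below is a minimal sketch of that arithmetic, and its name is illustrative.

```python
# Sketch: HyperTransport 3.1 link arithmetic. HT transfers data on both clock
# edges, so a 3.2 GHz link runs at 6.4 GT/s.
def ht_link_bandwidth_gb_s(clock_ghz: float, width_bits: int = 16) -> tuple[float, float]:
    """Return (GB/s in one direction, GB/s counting both directions)."""
    gigatransfers = clock_ghz * 2             # double data rate
    one_way = gigatransfers * width_bits / 8  # bits per transfer -> bytes
    return one_way, 2 * one_way               # uplink + downlink


print(ht_link_bandwidth_gb_s(3.2))  # (12.8, 25.6)
```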

Processors

The first revenue shipments of Bulldozer-based Opteron processors were announced on September 7, 2011.[22] The FX-4100, FX-6100, FX-8120 and FX-8150 were released towards the end of 2011. AMD said that the remaining FX-series processors would be released at the end of the first quarter of 2012.

The expected Zambezi parts are summarized in the table below:

Model    Integer cores/modules  Base clock  Turbo (full load)  Turbo (half load)  L2 cache
FX-8170  8/4                    3.9 GHz     4.2 GHz            4.5 GHz            8 MB
FX-8150  8/4                    3.6 GHz     3.9 GHz            4.2 GHz            8 MB
FX-8120  8/4                    3.1 GHz     3.4 GHz            4.0 GHz            8 MB
FX-8100  8/4                    2.8 GHz     3.1 GHz            3.7 GHz            8 MB
FX-6200  6/3                    3.8 GHz     4.0 GHz            4.1 GHz            6 MB
FX-6120  6/3                    3.6 GHz     3.9 GHz            4.2 GHz            6 MB
FX-6100  6/3                    3.3 GHz     3.6 GHz            3.9 GHz            6 MB
FX-4170  4/2                    4.2 GHz     4.2 GHz            4.3 GHz            4 MB
FX-4150  4/2                    3.8 GHz     3.9 GHz            4.0 GHz            4 MB
FX-4120  4/2                    3.9 GHz     4.0 GHz            4.1 GHz            4 MB
FX-4100  4/2                    3.6 GHz     3.7 GHz            3.8 GHz            4 MB

Common to all models: code name Zambezi; 8 MB L3 cache; DDR3-1866 memory; Turbo Core 2.0; Socket AM3+; 32 nm HKMG SOI process.
TDP (grouped across the models in table order): 125 W; 125 W/95 W; 125 W; 95 W; 125 W; 95 W
Unlocked (grouped across the models in table order): Yes; No; Yes

Main source: CPU-World[23]

AMD plans two series of Bulldozer based processors for servers: Opteron 4200 series (code named Valencia, with up to 8 cores) and Opteron 6200 series (code named Interlagos, with up to 16 cores).[24]

"FX" Release

On 12 October 2011, AMD released the first four FX-series processors of the Bulldozer line (FX-8150, FX-8120, FX-6100, FX-4100) and lifted their NDA on official reviews.[25]

The first Bulldozer CPUs met with a mixed response. The FX-8150 performed poorly in benchmarks that were not highly threaded. It fell behind second-generation Intel Core i3/i5/i7 processors and was matched or even outperformed by AMD's own Phenom II X6 at lower clock speeds. In highly threaded benchmarks, the FX-8150 performed on par with the Phenom II X6 or the Intel Core i7 2600K, depending on the benchmark. Given the more consistent overall performance of the Intel Core i5 2500K at a lower price, these results left many reviewers underwhelmed. The processor was also found to be extremely power-hungry under load compared with Intel's Sandy Bridge, especially when overclocked.[26]

The Tom's Hardware website commented that the lower-than-expected performance in multi-threaded workloads may be because of the way Windows 7 currently schedules threads to the cores. They point out that "if Windows were able to utilize an FX-8150's four modules first, and then backfill each module's second core, it'd maximize performance with up to four threads running concurrently." This is similar to what happens on Intel CPUs with HyperThreading – Windows 7 "schedules to physical cores before utilizing logical (HyperThreaded) cores."[27]
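The module-first placement that Tom's Hardware describes can be expressed as an ordering of CPUs. This sketch assumes, hypothetically, that logical CPUs 2k and 2k + 1 are the two cores of module k. Actual numbering depends on the operating system and firmware, so a real scheduler or affinity tool would read the topology rather than assume it. The function names are illustrative.

```python
# Sketch of "one thread per module first, then backfill the second core".
# Hypothetical numbering: logical CPUs 2k and 2k+1 belong to module k.
def module_first_order(modules: int, cores_per_module: int = 2) -> list[int]:
    """Return CPU ids in the order new threads would be placed."""
    order = []
    for slot in range(cores_per_module):  # first core of every module, then the second
        for module in range(modules):
            order.append(module * cores_per_module + slot)
    return order


def placement(threads: int, modules: int) -> list[int]:
    """CPUs used by the first `threads` threads under module-first placement."""
    return module_first_order(modules)[:threads]


print(module_first_order(4))  # [0, 2, 4, 6, 1, 3, 5, 7]
print(placement(4, 4))        # [0, 2, 4, 6]: four threads, one per module
```

Under this placement, four threads on an FX-8150 each get a module's shared front end, FPU and L2 cache to themselves. Filling CPUs 0 to 3 in order would instead pack the four threads into only two modules.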

Overclocking was found to improve performance, but increase power draw significantly.[28]

On 13 October, AMD stated on its blog that "there are some in our community who feel the product performance did not meet their expectations", but showed benchmarks on actual applications where it outperformed "Sandy Bridge i7 2600k" and "AMD X6 1100T".[29]

Post-2011

2nd Generation

At AMD Financial Analyst Day 2010,[30] AMD revealed that the second generation was scheduled for 2012. AMD referred to this generation as Enhanced Bulldozer. This later generation of the Bulldozer core is codenamed Piledriver and is intended for specific desktop and notebook markets:

  • Desktop Performance market (Volan platform):[31] Zambezi's replacement is Vishera, with up to eight cores and Turbo Core 3.0. It uses the existing Socket AM3+ format and the 9xx-series chipsets of the first-generation FX-series Zambezi processors. AMD says that this second-generation FX-series processor, based on the Piledriver core, would offer 20% to 30% better performance under digital media workloads. It is also expected to use a quad-channel DDR3 memory interface.[32][33]
  • Desktop Budget and Mainstream market (Virgo platform):[34] The Stars-based Llano Fusion APU line will be replaced by 2- to 4-core Socket FM2 Trinity, Weatherford, and Richland Fusion APUs, sold at various price points in the desktop market.[35]
  • Notebook Mainstream and Performance market (Comal platform):[36] the same product lines as in the Desktop Budget and Mainstream market.

At the AMD Fusion Developer Summit (AFDS) 2011, AMD said that the notebook variant of Trinity would offer 50% more computational capacity than Llano.[37][38][39]

For the server market, two versions were known to be under development as of November 2011:[40][41]

  • Cost-effective, energy-efficient server (1 to 2 CPUs) market: the Opteron 4200 series (Valencia; 6 or 8 cores) will be replaced by Sepang (up to 10 cores). Sepang will use a socket format called C2012. Its memory controller will support triple-channel DDR3 memory, and it will include a PCI Express 3.0 controller.
  • Enterprise and Mainstream server (2 to 4 CPUs) market: the Opteron 6200 series (Interlagos; 8, 12, and 16 cores) will be replaced by Terramar (up to 20 cores). Terramar will use a socket format called G2012. Like Sepang, it will have a PCI Express 3.0 controller, but it differs in supporting quad-channel DDR3 memory.

3rd Generation

As of 2011, AMD had mentioned by name a third-generation Bulldozer-based line for 2013,[40] with the working title Next Generation Bulldozer, on a 22 nm FD-SOI manufacturing process.[42]

On 21 September 2011, leaked AMD slides indicated that this third generation of the Bulldozer core was codenamed Steamroller[43][44] and would be used in specific desktop and notebook markets:

  • Desktop Budget and Mainstream market (??? platform): the Trinity Fusion APU line will be replaced by the Kaveri Fusion APU line as the third-generation A8, A6 and A4 series for the desktop market.
  • Notebook Mainstream and Performance market (Indus platform): the same product lines as in the Desktop Budget and Mainstream market. The FCH chipset will be codenamed Bolton.

For the server market, two versions were planned:[45]

  • Cost-effective, energy efficient server (1 to 2 CPUs) market: Opteron 4200-series Sepang (up to 10 cores) to be replaced by Macau (up to 10 cores), re-using the C2012 socket format.
  • Enterprise and Mainstream server (2 to 4 CPUs) market: Opteron 6200-series Terramar (up to 20 cores) to be replaced by Dublin (up to 20 cores), re-using the G2012 socket format.

4th Generation

On 12 October 2011, AMD revealed Excavator to be the codename for the 4th generation Bulldozer core, scheduled for 2014 release.[46]

References

  1. ^ Bulldozer 50% Faster than Core i7 and Phenom II, techPowerUp, http://www.techpowerup.com/138328/Bulldozer-50-Faster-than-Core-i7-and-Phenom-II.html, retrieved 2012-01-23 
  2. ^ AMD64 Architecture Programmer’s Manual Volume 6: 128-Bit and 256-Bit XOP, and FMA4 Instructions, AMD, May 1, 2009, http://support.amd.com/us/Processor_TechDocs/43479.pdf, retrieved 2009-05-08 
  3. ^ Striking a balance, Dave Christie, AMD Developer blogs, May 7, 2009, http://forums.amd.com/devblog/blogpost.cfm?threadid=112934&catid=208, retrieved 2009-05-08 
  4. ^ AMD Sets New Mark in x86 Innovation with First Detailed Disclosures of Two New Core Designs, AMD, August 24, 2010, p. 1, http://www.amd.com/us/press-releases/pages/amd-x86-innovation-new-core-designs-2010aug24.aspx, retrieved 2011-09-18 
  5. ^ Analyst Day 2009 Summary, AMD, November 11, 2009, http://www.amd.com/us/press-releases/Pages/amd-analyst-day-2009nov11.aspx, retrieved 2009-11-14 
  6. ^ (in German) AMD confirms: "Zambezi" is incompatible with Socket AM3, Planet3dnow.de, http://www.planet3dnow.de/cgi-bin/newspub/viewnews.cgi?id=1282840508, retrieved 2012-01-23 
  7. ^ Analyst Day 2009 Presentations, AMD, November 11, 2009, http://phx.corporate-ir.net/phoenix.zhtml?c=74093&p=irol-analystday, retrieved 2009-11-14 
  8. ^ Bulldozer microarchitecture block, AnandTech, August 24, 2010, http://images.anandtech.com/reviews/cpu/amd/hotchips2010/bulldozeruarch.jpg 
  9. ^ Bulldozer module functional schematic, AMD, August 24, 2010, http://www.xbitlabs.com/images/news/2010-08/bulldozer_3_aug2010.png 
  10. ^ More On Bulldozer, Tomshardware.com, 2010-08-24, http://www.tomshardware.com/reviews/bulldozer-bobcat-hot-chips,2724-2.html, retrieved 2012-01-23 
  11. ^ AMD Reveals Details About Bulldozer Microprocessors, Xbitlabs.com, http://www.xbitlabs.com/news/cpu/display/20100824154814_AMD_Unveils_Details_About_Bulldozer_Microprocessors.html, retrieved 2012-01-23 
  12. ^ Real World Technologies (2010-08-26), AMD's Bulldozer Microarchitecture, Realworldtech.com, http://www.realworldtech.com/page.cfm?ArticleID=RWT082610181333&p=4, retrieved 2012-01-23 
  13. ^ Bulldozer design power efficiency, AMD, August 24, 2010, http://images.anandtech.com/reviews/cpu/amd/hotchips2010/bulldozerefficient.jpg 
  14. ^ a b c ISSCC 2011 Advance Program (PDF), isscc.org, http://isscc.org/doc/2011/isscc2011.advanceprogrambooklet_abstracts.pdf, retrieved 2012-01-23 
  15. ^ XOP and FMA4 Instruction set in SSE5, Techreport.com, 2009-05-06, http://techreport.com/discussions.x/16871, retrieved 2012-01-23 
  16. ^ AMD Financial Analyst Day 2010, Server Platforms Presentation, Ir.amd.com, 2010-11-09, http://ir.amd.com/phoenix.zhtml?c=74093&p=irol-2010analystday, retrieved 2012-01-23 
  17. ^ AMD Roadmap, http://news.ati-forum.de/images/stories/Szymanski/News/2010/zambezi_roadmap.jpg, retrieved 2012-01-23 
  18. ^ Timothy Prickett Morgan (2010-11-15), AMD laughs at Intel with Opteron Bulldozers, theregister.co.uk, http://www.theregister.co.uk/2010/11/15/amd_bulldozer_opteron_rollout/page2.html, retrieved 2012-01-25 
  19. ^ ASUS confirms AM3+ compatibility on AM3 boards, Event.asus.com, http://event.asus.com/2011/mb/AM3_PLUS_Ready/, retrieved 2012-01-23 
  20. ^ MSI confirms AM3+ compatibility on AM3 boards, Event.msi.com, http://event.msi.com/mb/am3+/, retrieved 2012-01-23 
  21. ^ AM3 processors will work in the AM3+ socket, but Bulldozer chips will not work in non-AM3+ motherboards[dead link]
  22. ^ AMD Ships First "Bulldozer" Processors, http://finance.yahoo.com/news/AMD-Ships-First-Bulldozer-iw-1483835751.html?x=0 
  23. ^ AMD Bulldozer processor families, Cpu-world.com, 2011-12-30, http://www.cpu-world.com/CPUs/Bulldozer/index.html, retrieved 2012-01-23 
  24. ^ What Is Bulldozer?, 2010-08-02, http://blogs.amd.com/work/2010/08/02/what-is-bulldozer/ 
  25. ^ Unlock Your Record Setting AMD FX Series Processor Today, Amd.com, http://www.amd.com/us/press-releases/Pages/unlock-your-record-setting-2011oct12.aspx/, retrieved 2012-01-23 
  26. ^ Bulldozer Has Arrived: AMD FX-8150 Processor Review, X-bit labs, 2011-10-11, p. 13, http://www.xbitlabs.com/articles/cpu/display/amd-fx-8150_13.html#sect0, retrieved 2012-01-23 
  27. ^ Tom's Hardware review, Tomshardware.com, 2011-10-12, http://www.tomshardware.com/reviews/fx-8150-zambezi-bulldozer-990fx,3043-3.html, retrieved 2012-01-23 
  28. ^ Bulldozer Has Arrived: AMD FX-8150 Processor Review, X-bit labs, 2011-10-11, p. 14, http://www.xbitlabs.com/articles/cpu/display/amd-fx-8150_14.html#sect0, retrieved 2012-01-23 
  29. ^ Our Take on AMD FX, Blogs.amd.com, 2011-10-13, http://blogs.amd.com/play/2011/10/13/our-take-on-amd-fx/, retrieved 2012-01-23 
  30. ^ AMD financial analyst day 2010 press kit, Blogs.amd.com, http://blogs.amd.com/press/2010/11/09/amd-financial-analyst-day-2010-press-kit/, retrieved 2012-01-23 
  31. ^ AMD Cancels Next-Gen Komodo Processor, Corona Platform in Favour of New Chips, X-bit labs, 2012-01-19, http://www.xbitlabs.com/news/cpu/display/20110906193303_AMD_Cancels_Next_Gen_Komodo_Processor_Corona_Platform_in_Favour_of_New_Chips.html, retrieved 2012-01-23 
  32. ^ btarunr (2012-01-20), AMD Vishera Packs Quad-Channel DDR3 IMC, http://www.techpowerup.com/forums/showthread.php?t=159062, retrieved 2012-01-25 
  33. ^ Anton Shilov (2011-10-26), AMD Expects Trinity to Offer 20-30% Performance Increase, xbitlabs.com, http://www.xbitlabs.com/news/cpu/display/20111026223104_AMD_Expects_Trinity_to_Offer_20_30_Performance_Increase.html, retrieved 2012-01-23 
  34. ^ Clive Webster (2011-06-14), AMD Reveals 2012 Roadmap, bit-tech.net, http://www.bit-tech.net/hardware/cpus/2011/06/14/amd-reveals-2012-roadmap/1, retrieved 2012-01-23 
  35. ^ (in Chinese) APUs remain the mainstay: AMD's product lines for each platform next year revealed, Mb.zol.com.cn, 2011-06-30, http://mb.zol.com.cn/240/2405453.html, retrieved 2012-01-23 
  36. ^ (in Turkish) AMD's next-generation Fusion platforms planned for 2012 detailed, Donanimhaber.com, 2011-05-31, http://www.donanimhaber.com/islemci/haberleri/AMDnin-2012-icin-planladigi-yeni-nesil-Fusion-platformlari-detaylandi.htm, retrieved 2012-01-23 
  37. ^ AMD's Trinity to Be at Least 50% Faster than Llano - Company, X-bit labs, 2011-06-14, http://www.xbitlabs.com/news/cpu/display/20110614211754_AMD_s_Trinity_to_Be_at_Least_50_Faster_than_Llano_Company.html, retrieved 2012-01-23 
  38. ^ (in French) AFDS: +50% for Trinity and 10 Tflops in 2020 - Processors, HardWare.fr, 2011-06-14, http://www.hardware.fr/news/11647/afds-50-trinity-10-tflops-2020.html, retrieved 2012-01-23 
  39. ^ Marcus Pollice (2011-06-15), AMD Demonstrates Trinity, Promises 10 TFLOPS APU by 2020, Brightsideofnews.com, http://www.brightsideofnews.com/news/2011/6/15/amd-demonstrates-trinity2c-promises-10tflops-apu-by-2020.aspx, retrieved 2012-01-23 
  40. ^ a b Anton Shilov (2012-01-19), AMD Plans to Release Twenty-Core Microprocessor in 2012, X-bit labs, http://www.xbitlabs.com/news/cpu/display/20101109113213_AMD_Plans_to_Release_Twenty_Core_Microprocessor_in_2012.html, retrieved 2012-01-23 
  41. ^ AMD Codename Decoder, Blogs.amd.com, 2010-11-10, http://blogs.amd.com/work/fadcodenames/, retrieved 2012-01-23 
  42. ^ The Next Transistor: planar, fins, and SoI at 22nm, Eetimes.com, 2011-07-19, http://www.eetimes.com/design/eda-design/4217997/The-next-transistor--planar--fins--and-SoI-at-22nm, retrieved 2012-01-23 
  43. ^ (in Hungarian) Long-term mobile roadmap leaked from AMD - PROHARDVER! Processor news, Prohardver.hu, 2011-09-21, http://prohardver.hu/hir/amd_hosszutavu_mobil_utiterv.html, retrieved 2012-01-23 
  44. ^ (in Italian) New AMD roadmaps for future APUs planned for 2012 and 2013 for the mobile market, Xtremehardware.it, 2011-09-21, http://www.xtremehardware.it/news/hardware/nuove-roadmap-amd-sulle-future-apu-in-programma-nel-2012-e-nel-2013-per-il-mercato-mobile-201109215761/, retrieved 2012-01-23 
  45. ^ (in Chinese) AMD reportedly plans to launch Opteron Dublin/Macau in 2013, Inpai (INPAI.COM.CN), Inpai.com.cn, 2011-08-05, http://www.inpai.com.cn/doc/hard/154678.htm, retrieved 2012-01-23 
  46. ^ The Bulldozer Review: AMD FX-8150 Tested, AnandTech, 2011-10-12, http://www.anandtech.com/show/4955/the-bulldozer-review-amd-fx8150-tested, retrieved 2012-01-23 
