Bulldozer (microarchitecture)

Bulldozer
Produced From late 2011 to present
Common manufacturer(s)
Min. feature size 32 nm
Instruction set AMD64
Predecessor K10
Successor Piledriver
Socket(s)
Core name(s)

Bulldozer is the codename for a microprocessor microarchitecture developed by AMD for the desktop and server markets. It was released on 12 October 2011 as the successor to the K10 microarchitecture.

Bulldozer is designed from scratch, not a development of earlier processors.[1] The core is specifically aimed at 10–125 watt TDP computing products. AMD claims dramatic performance-per-watt efficiency improvements in high-performance computing (HPC) applications with Bulldozer cores.

The Bulldozer cores support most of the instruction sets implemented by Intel processors available at its introduction (including SSE4.1, SSE4.2, AES, CLMUL, and AVX) as well as new instruction sets proposed by AMD (XOP and FMA4).[2][3]

Basic description

According to AMD, Bulldozer-based CPUs are built on GlobalFoundries' 32 nm silicon-on-insulator (SOI) process technology and reuse DEC's approach to multitasking performance. In the words of AMD's press notes, the design "balances dedicated and shared computer resources to provide a highly compact, high units count design that is easily replicated on a chip for performance scaling."[4] In other words, by eliminating some of the "redundant" elements that naturally creep into multicore designs, AMD hoped to make better use of its hardware capabilities while using less power.

Bulldozer-based implementations built on 32 nm SOI with HKMG arrived in October 2011 for both servers and desktops. The server segment included the dual-chip (16-thread) Opteron processor codenamed Interlagos (for Socket G34) and the single-chip (4-, 6- or 8-thread) Valencia (for Socket C32), while Zambezi (4, 6 and 8 threads) targeted desktops on Socket AM3+.[5][6]

Bulldozer is the first major redesign of AMD's processor architecture since 2003, when the firm launched its K8 processors. Its building block features two 128-bit FMA-capable FPUs, which can be combined into one 256-bit FPU, alongside two integer clusters, each with 4 pipelines (the fetch/decode stage is shared). Bulldozer also introduces a shared L2 cache. AMD's marketing calls this building block a "module". A 16-thread processor design features eight of these "modules",[7] and the operating system recognizes each "module" as two logical cores.

The "module", described as two logical cores, can be contrasted with a single Intel core with HyperThreading. A key difference between the two approaches is that Bulldozer provides dedicated schedulers and integer units for each thread, whereas in Intel's core all threads must compete for the available execution resources.
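The thread and pipeline counts given above follow from simple multiplication. The sketch below is illustrative only: the class name `BulldozerChip` and the two resource tuples are hypothetical labels introduced here, and the grouping of resources is a simplification of the description in this section rather than an AMD specification.

```python
from dataclasses import dataclass

# Resources named in the text above. Splitting them into two tuples is an
# illustrative simplification, not an AMD specification.
SHARED_PER_MODULE = ("fetch/decode", "FPU (2 x 128-bit FMA)", "L2 cache")
DEDICATED_PER_THREAD = ("integer scheduler", "integer cluster (4 pipelines)")


@dataclass(frozen=True)
class BulldozerChip:
    """Hypothetical helper: a chip described only by its module count."""
    modules: int

    @property
    def logical_cores(self) -> int:
        # The operating system sees each module as two logical cores.
        return 2 * self.modules

    @property
    def integer_pipelines(self) -> int:
        # Two integer clusters per module, four pipelines per cluster.
        return self.modules * 2 * 4


chip = BulldozerChip(modules=8)
print(chip.logical_cores)      # 16
print(chip.integer_pipelines)  # 64
```

With eight modules the helper reports 16 logical cores, matching the 16-thread design mentioned above.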

Architecture

Bulldozer core

Block diagram of a complete Bulldozer dual-clustered Integer Core (Module)
Block diagram of the 8 Integer clusters
Memory topology of a Bulldozer server
  • AMD has re-introduced the "Clustered Integer Core" micro-architecture, an architecture developed by DEC in 1996 with the RISC microprocessor Alpha 21264. This technology is informally called CMT (Clustered Multi-Thread), and AMD formally calls the unit a "module". In terms of hardware complexity and functionality, a "module" sits midway between a dual-core processor, in which each thread has a fully independent integer core, and a single-core processor with SMT, which runs two threads but with the resources of one core (each thread shares the core's resources with the other thread).
    • A "module" consists of two coupled "conventional" x86 out-of-order processing cores. Each core shares the early pipeline stages (e.g. L1i, fetch, decode), the FPUs, and the L2 cache with the other core in the "module".
  • Each "module" has the following independent hardware resources:[8][9]
    • 2 MB of L2 per "module" (shared between the two integer clusters in the module)
    • 16 KB of 4-way, way-predicted L1d per cluster and 64 KB of 2-way L1i per module, one way for each of the two clusters[10][11][12]
    • Two dedicated integer clusters
      - each one consists of two ALUs and two AGUs, allowing a total of 4 independent arithmetic and memory operations per clock per cluster
      - duplicating the integer schedulers and execution pipelines gives each of the two threads dedicated hardware, which increases performance in some multi-threaded integer workloads
      - the second integer cluster increases the Bulldozer core die area by around 12%, which at chip level adds about 5% of total die space[13]
    • Two symmetrical 128-bit FMAC (fused multiply–add capability) floating-point pipelines per module, which can be unified into one 256-bit-wide unit when one of the integer cores dispatches an AVX instruction, and two symmetrical x87/MMX/SSE-capable FPPs for backward compatibility with software not optimized for SSE2
  • All "modules" present share the L3 cache as well as an Advanced Dual-Channel Memory Sub-System (IMC - Integrated Memory Controller).
  • A "module" has 213 million transistors in an area of 30.9 mm² (including the 2 MB shared L2 cache) on an Orochi die[14]
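A few of the per-module figures in the list above can be combined arithmetically. The following is a minimal sketch using only numbers quoted in this section; the constant and function names are hypothetical, and the transistor density is a simple average that ignores how logic and cache differ in density.

```python
# Figures quoted in the list above.
ALUS_PER_CLUSTER = 2
AGUS_PER_CLUSTER = 2
CLUSTERS_PER_MODULE = 2
FMAC_WIDTH_BITS = 128
FMACS_PER_MODULE = 2
MODULE_TRANSISTORS = 213_000_000
MODULE_AREA_MM2 = 30.9


def int_ops_per_clock_per_module() -> int:
    """Peak arithmetic plus memory operations per clock across both clusters."""
    return (ALUS_PER_CLUSTER + AGUS_PER_CLUSTER) * CLUSTERS_PER_MODULE


def unified_fp_width_bits() -> int:
    """Width of the FP unit when both FMAC pipelines are combined."""
    return FMAC_WIDTH_BITS * FMACS_PER_MODULE


def transistor_density_per_mm2() -> float:
    """Average transistors per square millimetre of one module."""
    return MODULE_TRANSISTORS / MODULE_AREA_MM2


print(int_ops_per_clock_per_module())               # 8
print(unified_fp_width_bits())                      # 256
print(round(transistor_density_per_mm2() / 1e6, 2)) # 6.89 (million per mm²)
```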

Instruction set extensions

  • Support for Intel's Advanced Vector Extensions (AVX) instruction set, which supports 256-bit floating-point operations, and for SSE4.1, SSE4.2, AES and CLMUL, as well as the new 128-bit instruction sets proposed by AMD (XOP, FMA4 and CVT16),[15] which provide the same functionality as the SSE5 instruction set formerly proposed by AMD but are compatible with the AVX coding scheme.
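On a Linux system, one way to check which of these extensions a processor reports is to look at the `flags` line of `/proc/cpuinfo`. The sketch below is an assumption-laden illustration: the flag spellings follow Linux kernel naming as far as known, the function name `supported_extensions` is hypothetical, CVT16 is left out, and the sample string is made up for demonstration rather than copied from a real machine.

```python
# Linux /proc/cpuinfo flag names for the extensions listed above.
# Assumption of this sketch: spellings follow Linux kernel conventions.
EXTENSION_FLAGS = {
    "SSE4.1": "sse4_1",
    "SSE4.2": "sse4_2",
    "AES": "aes",
    "CLMUL": "pclmulqdq",
    "AVX": "avx",
    "XOP": "xop",
    "FMA4": "fma4",
}


def supported_extensions(cpuinfo_text: str) -> dict[str, bool]:
    """Map each extension name to whether its flag appears in the text."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return {name: flag in flags for name, flag in EXTENSION_FLAGS.items()}


# Made-up sample input, not output captured from real hardware.
sample = "model name : example\nflags : fpu sse4_1 sse4_2 aes avx xop fma4\n"
for name, present in supported_extensions(sample).items():
    print(f"{name}: {'yes' if present else 'no'}")
```

On a real Linux machine the same function could be given the contents of `/proc/cpuinfo` instead of the sample string.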

Process technology and clock frequency

  • 11-metal layer 32 nm SOI process with implemented first generation GlobalFoundries's High-K Metal Gate (HKMG)
  • Turbo Core 2 performance boost, which raises clock frequency by up to 500 MHz with all threads active (for most workloads) and by up to 1 GHz with half of the threads active, within the TDP limit.[16]
  • The chip operates at 0.775 to 1.425 V, achieving clock frequencies of 3 GHz or more[14]
  • Min-Max TDP: 25–140 watts
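The Turbo Core limits quoted above can be checked against a model's rated clocks. The sketch below is a hedged illustration: the function names are hypothetical, and the FX-8150 figures (3.6 GHz normal, 3.9 GHz full-load turbo, 4.2 GHz half-load turbo) are taken from the desktop table in this article.

```python
FULL_LOAD_CAP_MHZ = 500    # quoted cap with all threads active
HALF_LOAD_CAP_MHZ = 1000   # quoted cap with half of the threads active


def turbo_uplift_mhz(base_ghz: float, turbo_ghz: float) -> int:
    """Clock increase over the base frequency, rounded to whole MHz."""
    return round((turbo_ghz - base_ghz) * 1000)


def within_turbo_caps(base_ghz: float, full_ghz: float, half_ghz: float) -> bool:
    """True if both turbo levels stay inside the quoted uplift limits."""
    return (turbo_uplift_mhz(base_ghz, full_ghz) <= FULL_LOAD_CAP_MHZ
            and turbo_uplift_mhz(base_ghz, half_ghz) <= HALF_LOAD_CAP_MHZ)


# FX-8150 clocks as listed in the desktop table of this article.
print(turbo_uplift_mhz(3.6, 3.9))        # 300
print(turbo_uplift_mhz(3.6, 4.2))        # 600
print(within_turbo_caps(3.6, 3.9, 4.2))  # True
```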

Cache and memory interface

  • Up to 8 MB of L3 shared among all cores on the same silicon die (8 MB for 4 cores in the desktop segment and 16 MB for 8 cores in the server segment), divided into four subcaches of 2 MB each, capable of operating at 2.2 GHz at 1.1125 V[14]
  • Native DDR3 memory support up to DDR3-1866[17]
  • Dual Channel DDR3 integrated memory controller for Desktop and Server/Workstation Opteron 42xx "Valencia";[18] Quad Channel DDR3 Integrated Memory Controller [19] for Server/Workstation Opteron 62xx "Interlagos"
  • AMD claims support for two DIMMs of DDR3-1600 per channel. Two DIMMs of DDR3-1866 on a single channel will be down-clocked to DDR3-1600.
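The memory figures above translate into theoretical peak bandwidth with a short calculation. This is a rough sketch under stated assumptions: each DDR3 channel is taken to be 64 bits (8 bytes) wide, the nominal transfer rate in the module name is used as-is, and the function names are hypothetical. Real sustained bandwidth is lower than these peaks.

```python
BYTES_PER_TRANSFER = 8  # assumption: standard 64-bit DDR3 channel


def effective_speed(rated_mt_s: int, dimms_per_channel: int) -> int:
    """Apply the down-clocking rule quoted above for two DIMMs per channel."""
    if dimms_per_channel >= 2 and rated_mt_s > 1600:
        return 1600
    return rated_mt_s


def peak_bandwidth_gb_s(mt_s: int, channels: int) -> float:
    """Theoretical peak in GB/s (decimal) for the given channel count."""
    return mt_s * BYTES_PER_TRANSFER * channels / 1000


# Dual-channel desktop configuration, one DDR3-1866 DIMM per channel.
print(peak_bandwidth_gb_s(effective_speed(1866, 1), channels=2))  # 29.856
# Same DIMMs, two per channel: down-clocked to DDR3-1600.
print(peak_bandwidth_gb_s(effective_speed(1866, 2), channels=2))  # 25.6
```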

I/O and socket interface

  • HyperTransport Technology rev. 3.1 (3.20 GHz, 6.4 GT/s, 25.6 GB/s over a 16-bit-wide link), first implemented in the HY-D1 revision "Magny-Cours" on the Socket G34 Opteron platform in March 2010 and "Lisbon" on the Socket C32 Opteron platform in June 2010
  • Socket AM3+ (AM3r2)
    • 942-pin, DDR3 support only
    • retains backward compatibility with Socket AM3 motherboards (at the motherboard manufacturer's discretion and if BIOS updates are provided[20][21]); however, this is not officially supported by AMD. AM3+ motherboards are backward-compatible with AM3 processors.[22]
  • For the server segment, the existing Socket G34 (LGA1974) and Socket C32 (LGA1207) are used.
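The three HyperTransport figures quoted above (3.20 GHz, 6.4 GT/s, 25.6 GB/s on a 16-bit link) are related by simple arithmetic. The sketch below assumes double-data-rate signalling and reads the 25.6 GB/s figure as the aggregate of both link directions; the function names are hypothetical.

```python
def ht_transfer_rate_gt_s(clock_ghz: float) -> float:
    """Double data rate: two transfers per clock cycle."""
    return clock_ghz * 2


def ht_bandwidth_gb_s(clock_ghz: float, link_width_bits: int,
                      directions: int = 2) -> float:
    """Aggregate link bandwidth in GB/s across the given directions."""
    per_direction = ht_transfer_rate_gt_s(clock_ghz) * link_width_bits / 8
    return per_direction * directions


print(ht_transfer_rate_gt_s(3.2))                # 6.4
print(ht_bandwidth_gb_s(3.2, 16, directions=1))  # 12.8
print(ht_bandwidth_gb_s(3.2, 16))                # 25.6
```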

Processors

chipset and I/Os for 1st. CMT generation

The first revenue shipments of Bulldozer-based Opteron processors were announced on September 7, 2011.[23] The FX-4100, FX-6100, FX-8120 and FX-8150 were released in October 2011, with the remaining FX-series processors released at the end of the first quarter of 2012.

Desktop

Model    Cores  Clusters  Normal freq.  Full-load turbo  Half-load turbo (max)
FX-8170  4      8         3.9 GHz       4.2 GHz          4.5 GHz
FX-8150  4      8         3.6 GHz       3.9 GHz          4.2 GHz
FX-8140  4      8         3.2 GHz       3.6 GHz          4.1 GHz
FX-8120  4      8         3.1 GHz       3.4 GHz          4.0 GHz
FX-8100  4      8         2.8 GHz       3.1 GHz          3.7 GHz
FX-6200  3      6         3.8 GHz       4.0 GHz          4.1 GHz
FX-6130  3      6         3.6 GHz       3.8 GHz          3.9 GHz
FX-6120  3      6         3.5 GHz       3.8 GHz          4.1 GHz
FX-6100  3      6         3.3 GHz       3.6 GHz          3.9 GHz
FX-4200  2      4         3.3 GHz       3.7 GHz          4.0 GHz
FX-4170  2      4         4.2 GHz       N/A              4.3 GHz
FX-4150  2      4         3.8 GHz       3.9 GHz          4.0 GHz
FX-4130  2      4         3.8 GHz       N/A              3.9 GHz
FX-4120  2      4         3.9 GHz       4.0 GHz          4.1 GHz
FX-4100  2      4         3.6 GHz       3.7 GHz          3.8 GHz

Architecture: Bulldozer
Code name: Zambezi
Memory supported: DDR3 1866 MHz
Turbo Core: Yes (2.0)
Socket: AM3+
Process technology: 32 nm High-K Metal-Gate SOI
Unlocked Yes No Yes
L2 Cache 4×2 MB 3×2 MB 3×2 MB 2×2 MB
L3 Cache 8 MB 4 MB 8 MB
TDP 140 W 125 W 125/95 W 95 W 125 W 95 W 125 W 95 W 125 W 95 W

Major sources: CPU-World,[24] X-bit labs[25]

AMD plans two series of Bulldozer-based processors for servers: Opteron 4200 series (code named Valencia, with up to eight cores) and Opteron 6200 series (code named Interlagos, with up to 16 cores).[26]

Performance

Performance on Linux

On 24 October 2011, first-generation tests by Phoronix found that the performance of the Bulldozer CPU was somewhat lower than expected.[27] In many tests the CPU performed at the same level as the older-generation Phenom 1060T.

The performance later substantially increased, as various compiler optimizations and CPU driver fixes were released.[28][29]

On 23 October 2012, an updated Bulldozer design codenamed "Vishera" was released, and Phoronix benchmarks using real applications showed the Vishera CPU performing very efficiently under Linux compared to Intel products.[30]

Performance on Windows

The first Bulldozer CPUs were met with a mixed response. The FX-8150 performed poorly in benchmarks that were not highly threaded, falling behind the second-generation Intel Core i* series processors and being matched or even outperformed by AMD's own Phenom II X6 at lower clock speeds. In highly threaded benchmarks, the FX-8150 performed on par with the Phenom II X6 and the Intel Core i7 2600K, depending on the benchmark. Given the more consistent overall performance of the Intel Core i5 2500K at a lower price, these results left many reviewers underwhelmed. The processor was also found to be extremely power-hungry under load, especially when overclocked, compared to Intel's Sandy Bridge.[31][32]

The Tom's Hardware website commented that the lower-than-expected performance in multi-threaded workloads may be because of the way Windows 7 currently schedules threads to the cores. They point out that "if Windows were able to utilize an FX-8150's four modules first, and then backfill each module's second core, it'd maximize performance with up to four threads running concurrently." This is similar to what happens on Intel CPUs with HyperThreading – Windows 7 "schedules to physical cores before utilizing logical (HyperThreaded) cores."[33]
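The placement policy that Tom's Hardware describes (use one core in each module first, then backfill the second cores) can be sketched as follows. This is an illustration only: the function names are hypothetical, and `naive_placement` is a made-up baseline that fills logical cores in index order, not a description of how Windows 7 actually schedules threads.

```python
from collections import Counter


def module_aware_placement(num_threads: int, modules: int = 4) -> list[tuple[int, int]]:
    """Return (module, core) slots: first core of every module, then backfill."""
    slots = [(m, c) for c in range(2) for m in range(modules)]
    if num_threads > len(slots):
        raise ValueError("more threads than logical cores")
    return slots[:num_threads]


def naive_placement(num_threads: int, modules: int = 4) -> list[tuple[int, int]]:
    """Hypothetical baseline: fill logical cores in index order."""
    slots = [(m, c) for m in range(modules) for c in range(2)]
    if num_threads > len(slots):
        raise ValueError("more threads than logical cores")
    return slots[:num_threads]


def modules_shared(placement: list[tuple[int, int]]) -> int:
    """Count modules in which both cores are occupied."""
    counts = Counter(module for module, _ in placement)
    return sum(1 for n in counts.values() if n == 2)


# Four threads on a four-module FX-8150.
print(modules_shared(module_aware_placement(4)))  # 0
print(modules_shared(naive_placement(4)))         # 2
```

With four threads, the module-aware order leaves every thread alone in its module, while the index-order baseline makes two pairs of threads share a module's front end, FPUs and L2 cache.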

On 13 October 2011, AMD stated on its blog that "there are some in our community who feel the product performance did not meet their expectations", but showed benchmarks on actual applications where it outperformed the Sandy Bridge i7 2600k and AMD X6 1100T.[34]

On 6 March 2012, AMD posted a knowledge base article stating that there was a compatibility issue between FX processors and certain games on the widely used digital game distribution platform Steam. AMD stated that it had provided a BIOS update to several motherboard manufacturers (namely Asus, Gigabyte Technology, MSI, and ASRock) that would fix the issue.[35]

Overclocking

On 31 August 2011, AMD and a group of well-known overclockers including Brian McLachlan, Sami Mäkinen, Aaron Schradin, and Simon Solotko managed to set a new world record for CPU frequency using the unreleased and overclocked FX-8150 Bulldozer processor. Before that day, the record sat at 8.309 GHz, but the Bulldozer combined with liquid helium cooling reached a new high of 8.429 GHz. The record has since been beaten.[36]

Future revisions

2nd Generation (Piledriver)

AMD plans to update their 2nd generation FX-CPUs with the revised Piledriver cores used in the Richland APUs.[37] The Vishera 2.0 FX-CPUs are planned to be launched in June 2013.[38]

3rd Generation (Steamroller)

In 2011, AMD mentioned (by name) a third-generation Bulldozer-based line for 2013,[39] with the working title Next Generation Bulldozer, on the 28 nm manufacturing process.[40] On 21 September 2011, leaked AMD slides indicated that this third generation of the Bulldozer core was codenamed Steamroller.[41][42] Steamroller will still feature the two-core modules found in the Bulldozer and Piledriver designs.[43] The focus of Steamroller is greater parallelism.[44] Improvements will center on independent instruction decoders for each core within a module, better instruction schedulers, larger and smarter caches, more internal register resources and an improved memory controller. AMD estimates that these improvements will increase instructions per cycle by up to 30%.[43]

  • 3rd Generation FX-series CPU - Desktop Performance market (Unknown platform): AMD has stated that the FX-series will receive a Socket AM3+ version of a Steamroller based CPU in 2014.[40][45]
  • Kaveri A-series APU - Desktop Budget and Mainstream market (Unknown platform): The Trinity/Richland APU line is scheduled to be replaced by the Kaveri APU line as the 3rd generation A10, A8, A6, and A4 series for the desktop market in late 2013. The new APUs will feature 2-4 Steamroller cores and an improved graphics core.[46] Reports have also claimed it will feature both a DDR3 and GDDR5 integrated memory controller, although it appears that both memory types cannot be used together.[47]
  • Kaveri A-series APU - Notebook Mainstream and Performance market (Indus platform): Will be the same as mentioned in Desktop Budget/Mainstream market. The FCH chipset will be codenamed Bolton.

4th Generation (Excavator)

On 12 October 2011, AMD revealed Excavator to be the codename for the 4th generation Bulldozer core, scheduled for 2014 release.[48] Excavator will initially be implemented in the 4th Generation A-series Fusion APU line in 2014, while a revised version will be adopted in 2015 for the FX-series and Opteron lines.[40]

References

  1. ^ Bulldozer 50% Faster than Core i7 and Phenom II, techPowerUp, retrieved 2012-01-23 
  2. ^ AMD64 Architecture Programmer’s Manual Volume 6: 128-Bit and 256-Bit XOP, and FMA4 Instructions, AMD, 1 May 2009, retrieved 2009-05-08 
  3. ^ Striking a balance, Dave Christie, AMD Developer blogs, 7 May 2009, retrieved 2009-05-08 
  4. ^ AMD Sets New Mark in x86 Innovation with First Detailed Disclosures of Two New Core Designs, AMD, August 24, 2011, p. 1, retrieved September 18, 2011 
  5. ^ Analyst Day 2009 Summary, AMD, November 11, 2009, retrieved 2009-11-14 
  6. ^ AMD bestätigt: "Zambezi" ist inkompatibel zum Sockel AM3, Planet3dnow.de, retrieved 2012-01-23 
  7. ^ Analyst Day 2009 Presentations, AMD, November 11, 2009, retrieved 2009-11-14 
  8. ^ Bulldozer microarchitecture block, AnandTech, August 24, 2010 
  9. ^ Bulldozer module functional schematic, AMD, August 24, 2010 
  10. ^ More On Bulldozer, Tomshardware.com, 2010-08-24, retrieved 2012-01-23 
  11. ^ AMD Reveals Details About Bulldozer Microprocessors, Xbitlabs.com, retrieved 2012-01-23 
  12. ^ Real World Technologies (2010-08-26), AMD's Bulldozer Microarchitecture, Realworldtech.com, retrieved 2012-01-23 
  13. ^ Bulldozer design power efficiency, AMD, August 24, 2010 
  14. ^ a b c AP (PDF), retrieved 2012-01-23 
  15. ^ XOP and FMA4 Instruction set in SSE5, Techreport.com, 2009-05-06, retrieved 2012-01-23 
  16. ^ AMD Financial Analyst Day 2010, Server Platforms Presentation, Ir.amd.com, 2010-11-09, retrieved 2012-01-23 
  17. ^ AMD Roadmap, retrieved 2012-01-23 
  18. ^ AMD (2012-05-14), AMD Opteron™ 4200 Series Processor Quick Reference Guide, www.amd.com, retrieved 2012-08-15 
  19. ^ AMD (2012-05-14), AMD Opteron™ 6200 Series Processor Quick Reference Guide, www.amd.com, retrieved 2012-08-15 
  20. ^ ASUS confirms AM3+ compatibility on AM3 boards, Event.asus.com, retrieved 2012-01-23 
  21. ^ MSI confirms AM3+ compatibility on AM3 boards, Event.msi.com, retrieved 2012-01-23 
  22. ^ AM3 processors will work in the AM3+ socket, but Bulldozer chips will not work in non-AM3+ motherboards[dead link]
  23. ^ AMD Ships First "Bulldozer" Processors 
  24. ^ AMD FX-Series processor families, Cpu-world.com, 2012-10-02, retrieved 2012-10-21 
  25. ^ Shilov, Anton (2012-09-21). "AMD Sets the FX "Vishera" Launch Date". X-bit laboratories. X-bit labs. Retrieved 2012-09-23.
  26. ^ What Is Bulldozer?, 2010-08-02 
  27. ^ AMD FX-8150 Bulldozer On Ubuntu Linux, phoronix.com, 2011-10-24, retrieved 2012-12-13 
  28. ^ AMD Bulldozer Cache Aliasing Issue Fix, phoronix.com 
  29. ^ AMD's FX-8150 Bulldozer Benefits From New Compilers, Tuning, phoronix.com 
  30. ^ AMD FX-8350 "Vishera" Linux Benchmarks, phoronix.com, 2012-10-23, retrieved 2012-12-13 
  31. ^ Bulldozer Has Arrived: AMD FX-8150 Processor Review, X-bit labs, 2011-10-11, p. 13, retrieved 2012-01-23 
  32. ^ Bulldozer Has Arrived: AMD FX-8150 Processor Review, X-bit labs, 2011-10-11, p. 14, retrieved 2012-01-23 
  33. ^ Tom's Hardware review, Tomshardware.com, 2011-10-12, retrieved 2012-01-23 
  34. ^ Our Take on AMD FX, Blogs.amd.com, 2011-10-13, retrieved 2012-01-23 
  35. ^ STEAM Games on AMD FX platforms, support.amd.com, 2012-06-12, retrieved 2012-10-11 
  36. ^ AMD Bulldozer CPU beats world record again achieving 8.461ghz, geek.com, 2011-11-01, retrieved 2012-10-16 
  37. ^ http://blogs.amd.com/work/2012/02/02/your-new-amd-decoder-key/
  38. ^ http://wccftech.com/amd-richland-apu-feature-piledriver-cores-launching-2013-kabini-apu-radeon-hd-8000-series/
  39. ^ Anton Shilov (2012-01-19), AMD Plans to Release Twenty-Core Microprocessor in 2012, X-bit labs, retrieved 2012-01-23 
  40. ^ a b c http://ir.amd.com/phoenix.zhtml?c=74093&p=irol-2012analystday
  41. ^ Hosszútávú mobil útiterv szivárgott ki az AMD-től - PROHARDVER! Processzor hír, Prohardver.hu, 2011-09-21, retrieved 2012-01-23 
  42. ^ Nuove roadmap AMD sulle future APU in programma nel 2012 e nel 2013 per il mercato mobile, Xtremehardware.it, 2011-09-21, retrieved 2012-01-23 
  43. ^ a b http://www.xbitlabs.com/news/cpu/display/20130331080217_AMD_We_Are_On_Track_With_Steamroller_Micro_Architecture_in_2013.html
  44. ^ Su, Lisa (2012-02-02). "Consumerization, Cloud, Convergence." (Portable Document Format). AMD 2012 Financial Analyst Day. Sunnyvale, California: Advanced Micro Devices. p. 26. Retrieved 2012-02-04. 
  45. ^ http://www.theinquirer.net/inquirer/news/2208525/amd-sticks-with-socket-am3-for-steamroller
  46. ^ http://www.amd.com/us/press-releases/Pages/amd_unveils_new_apus.aspx
  47. ^ http://www.brightsideofnews.com/news/2013/3/5/amd-kaveri-unveiled-pc-architecture-gets-gddr5.aspx
  48. ^ The Bulldozer Review: AMD FX-8150 Tested, AnandTech, 2011-10-12, retrieved 2012-01-23 
