NetBurst: Difference between revisions

From Wikipedia, the free encyclopedia

Revision as of 08:39, 16 June 2009

The Intel NetBurst microarchitecture, called P68 inside Intel, was the successor to the P6 microarchitecture in the x86 family of CPUs made by Intel. The first CPU to use this architecture was the Willamette core, released in November 2000 as the first of the Pentium 4 CPUs; all subsequent Pentium 4 and Pentium D variants have also been based on NetBurst. In mid-2001, Intel released the Foster core, which was also based on NetBurst, thus switching the Xeon CPUs to the new architecture as well. Pentium 4-based Celeron CPUs also use the NetBurst architecture.

NetBurst is sometimes referred to as the Intel P7, Intel 80786, or i786 architecture when compared with previous generations.[citation needed] These are not official names; P7 was in fact used internally at Intel for what became the Itanium architecture.

Intel CPU core roadmaps from NetBurst and Pentium M to Sandy Bridge. NetBurst processors are those with the yellow background. Names in red text are cancelled processors.

Technology

The NetBurst architecture introduced several features that were new to Intel's x86 designs, among them Hyper Pipelined Technology and the Rapid Execution Engine.

Hyper Pipelined Technology

Intel chose this name for the 20-stage pipeline of the Willamette core. This is a significant increase over the Pentium III, which had only 10 stages in its pipeline. The later Prescott core has a 31-stage pipeline. Although a deeper pipeline has disadvantages, chiefly a larger branch misprediction penalty, the greater number of stages allows the CPU to reach higher clock speeds, which in principle offsets the loss in performance. A lower instructions per clock (IPC) figure is an indirect consequence of pipeline depth and a matter of design compromise: a design with a small number of long pipelines achieves a lower IPC than one with a greater number of short pipelines. In addition, when the branch predictor makes a mistake, more stages have to be flushed and refilled, which increases the penalty paid for each misprediction. To address this issue, Intel devised the Rapid Execution Engine and invested a great deal in its branch prediction technology, which Intel claims reduces mispredictions by 33% over the Pentium III.[1]
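The trade-off described above can be illustrated with a small back-of-the-envelope model. This is only a sketch under stated assumptions, not Intel's own performance model: the function name, the branch fraction, the misprediction rate, the clock speeds, and the rule that a misprediction costs one cycle per pipeline stage are all hypothetical values chosen for illustration.

```python
def throughput(clock_ghz, base_ipc, branch_fraction, mispredict_rate, penalty_cycles):
    """Return (effective IPC, instructions per nanosecond) for a simple stall model.

    Assumption: every mispredicted branch stalls the pipeline for
    `penalty_cycles` cycles and nothing else ever stalls.
    """
    base_cpi = 1.0 / base_ipc
    stall_cpi = branch_fraction * mispredict_rate * penalty_cycles
    effective_ipc = 1.0 / (base_cpi + stall_cpi)
    return effective_ipc, clock_ghz * effective_ipc


# Hypothetical workload: 20% branches, 5% of them mispredicted.
# Hypothetical rule: the misprediction penalty equals the pipeline depth.
shallow_ipc, shallow_rate = throughput(1.0, 1.0, 0.20, 0.05, penalty_cycles=10)
deep_ipc, deep_rate = throughput(2.0, 1.0, 0.20, 0.05, penalty_cycles=20)

print(f"10-stage at 1.0 GHz: IPC {shallow_ipc:.3f}, {shallow_rate:.3f} instr/ns")
print(f"20-stage at 2.0 GHz: IPC {deep_ipc:.3f}, {deep_rate:.3f} instr/ns")
```

With these made-up numbers the deeper pipeline has the lower IPC but still completes more instructions per nanosecond, because the clock speed gain outweighs the larger misprediction penalty. The advantage shrinks as the misprediction rate rises or as the achievable clock speed falls short of the target.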

Rapid Execution Engine

With this technology, the ALUs in the core of the CPU operate at twice the core clock frequency. In a 3.5 GHz CPU, for example, the ALUs effectively operate at 7 GHz. The purpose is to make up for the low IPC; it also considerably enhances the integer performance of the CPU. The downside is that certain instructions are now much slower (both relatively and absolutely) than before, making optimization for multiple target CPUs difficult. Examples are the shift and rotate operations, which suffer from the lack of a barrel shifter, a unit that was present on every x86 CPU beginning with the 386 (and is also present on Athlon and Hammer).
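The arithmetic behind the doubled ALU clock can be written out as a short sketch. The function names are hypothetical, and the only fact taken from the text is the factor of two between the core clock and the ALU clock.

```python
def alu_clock_ghz(core_clock_ghz, multiplier=2):
    """Effective ALU frequency when the ALUs run at `multiplier` times the core clock."""
    return core_clock_ghz * multiplier


def alu_tick_ns(core_clock_ghz, multiplier=2):
    """Duration of one ALU clock tick in nanoseconds (1 GHz corresponds to 1 ns)."""
    return 1.0 / alu_clock_ghz(core_clock_ghz, multiplier)


core = 3.5  # GHz, the example figure used in the text
print(alu_clock_ghz(core))               # ALU frequency in GHz
print(alu_tick_ns(core) < 1.0 / core)    # an ALU tick is shorter than a core cycle
```

For the 3.5 GHz example this gives an ALU clock of 7 GHz, so one ALU tick lasts half of a core clock cycle.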

Execution Trace Cache

Within the L1 cache of the CPU, Intel has incorporated what it calls an Execution Trace Cache. This cache stores decoded micro-operations, so that when a previously decoded instruction is executed again, the CPU can read the decoded micro-ops directly from the trace cache instead of fetching and decoding the instruction a second time, thereby saving a considerable amount of time. Moreover, the micro-ops are cached along their predicted path of execution, which means that when the CPU fetches them from the cache, they are already present in the expected order of execution.
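The idea of caching decoded micro-ops along a predicted path can be sketched as a toy software model. This is a deliberately simplified illustration, not a description of the real hardware: the class `ToyTraceCache`, the decoder `toy_decode`, the instruction strings, and the choice to index traces by start address alone are all assumptions made for the example.

```python
from typing import Callable, Dict, List


def toy_decode(instr: str) -> List[str]:
    """Hypothetical decoder: an instruction with a memory destination
    becomes load/op/store micro-ops, anything else a single micro-op."""
    op, _, operands = instr.partition(" ")
    if operands.startswith("["):
        return ["load", op, "store"]
    return [op]


class ToyTraceCache:
    """Stores decoded micro-ops for a predicted instruction path, keyed by start address."""

    def __init__(self, decode: Callable[[str], List[str]]) -> None:
        self._decode = decode
        self._traces: Dict[int, List[str]] = {}
        self.decode_calls = 0  # counts how often the slow decoder runs

    def fetch(self, start_addr: int, predicted_path: List[str]) -> List[str]:
        # Hit: return the stored micro-ops without decoding anything.
        if start_addr in self._traces:
            return self._traces[start_addr]
        # Miss: decode each instruction in predicted order and store the trace.
        trace: List[str] = []
        for instr in predicted_path:
            trace.extend(self._decode(instr))
            self.decode_calls += 1
        self._traces[start_addr] = trace
        return trace


cache = ToyTraceCache(toy_decode)
path = ["add [x], eax", "cmp eax, 0", "jne loop"]
cache.fetch(0x100, path)   # first pass: three instructions are decoded
cache.fetch(0x100, path)   # second pass: served from the trace, no decoding
print(cache.decode_calls)  # still 3 after the second fetch
```

The second fetch returns the micro-ops already laid out in predicted execution order, which is the time saving the text describes. A real trace cache would also have to handle mispredicted paths and limited capacity, both of which this sketch ignores.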

Despite all these enhancements, the NetBurst architecture created obstacles for engineers trying to scale up its performance. With this architecture, Intel was looking to attain clock speeds of 10 GHz, but as clock speeds rose, Intel found it increasingly difficult to keep power dissipation within acceptable limits. Intel reached its limit at 3.8 GHz and encountered problems achieving even that. As a result, Intel decided to abandon NetBurst and has since developed a newer microarchitecture, known as the Core microarchitecture (inspired by the P6 core used from the Pentium Pro to the Tualatin Pentium III-S, and most directly by the Pentium M), to meet its performance goals.

Revisions

Intel replaced the original Willamette core with a redesigned version of the NetBurst architecture called Northwood in January 2002. The Northwood design combined an increased cache size, a smaller 130 nm fabrication process, and hyper-threading technology (although initially all models but the 3.06 GHz model had this feature disabled) to produce a more modern, higher-performing version of the NetBurst architecture.

In February 2004, Intel introduced another, more radical revision of the architecture called Prescott. Prescott was produced on a 90 nm process and included several major design changes: an even larger cache (from 512 KiB in Northwood to 1 MiB, and later 2 MiB), a much deeper instruction pipeline (31 stages compared with 20 in Northwood), a heavily improved branch predictor, the SSE3 SIMD instructions, and later Intel 64, Intel's branding for its compatible implementation of the x86-64 64-bit version of the x86 architecture. As with hyper-threading, all Prescott chips have the hardware to support Intel 64, but it was initially enabled only on high-end Xeon processors before being officially introduced in processors with the Pentium brand. Despite having many new features, Prescott often performed worse than a similarly clocked Northwood, and many engineers felt that the real-world performance of the processor was compromised by the attempt to achieve the highest clock speed possible.[citation needed] Power consumption and heat dissipation also became a major issue with Prescott, as it is one of the hottest-running and most power-hungry x86 microprocessors in history. Power and heat concerns have thus far prevented Intel from releasing a Prescott clocked above 3.8 GHz, or a mobile version of the core.

Intel has also released a dual-core version of the NetBurst architecture called Smithfield, which is two Prescott cores on a single die, and later Presler, which consists of two Cedar Mill cores on two separate dies (Cedar Mill being the 65 nm die-shrink of Prescott).

Future

Intel has replaced NetBurst with the Intel Core microarchitecture, released in July 2006, which is more directly derived from 1995's Pentium Pro or 2001's Pentium III-S than from NetBurst. August 8, 2008 marked the end of Intel's NetBurst-based processors.

Presler, a Pentium D core released in early 2006, was widely described by analysts as the last in the NetBurst line, although the actual final NetBurst chip was the Cedar Mill-core Celeron D 365, clocked at 3.60 GHz. The Conroe version of the Intel Core 2 processor, using the Core microarchitecture, is the successor to Presler.

NetBurst based chips

See also

References

  1. ^ [1]