Hyper-threading

Hyper-threading (officially Hyper-Threading Technology, abbreviated HT Technology, HTT or HT) is Intel's term for its simultaneous multithreading implementation in its Atom, Core i3, Core i5, Core i7, Itanium, Pentium 4 and Xeon CPUs.

Hyper-threading is an Intel-proprietary technology used to improve parallelization of computations (doing multiple tasks at once) performed on microprocessors. For each processor core that is physically present, the operating system addresses two virtual (logical) processors, and shares the workload between them when possible. Hyper-threading requires that the operating system support multiple processors. Intel additionally recommends an operating system that is specifically optimized for HTT, and recommends disabling HTT when using operating systems that have not been so optimized.^[1]

Details

Hyper-threading works by duplicating certain sections of the processor (those that store the architectural state) without duplicating the main execution resources. A hyper-threading processor therefore appears to the host operating system as two "logical" processors, and the operating system can schedule two threads or processes on it simultaneously. When the current task leaves execution resources unused, and especially when the processor is stalled, a hyper-threading-equipped processor can use those resources to execute another scheduled task; a processor without hyper-threading would leave them idle. (The processor may stall due to a cache miss, branch misprediction, or data dependency.)
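The effect of filling stall cycles can be illustrated with a toy model. The sketch below is not a description of Intel's implementation: it assumes a single shared execution unit, and it represents each thread as a string in which "X" is a cycle that needs the execution unit and "S" is a cycle spent stalled (for example, waiting on a cache miss). The function names and the string encoding are invented for this illustration.

```python
def run_serial(threads):
    """One hardware context: threads run back to back, so stall cycles are wasted."""
    return sum(len(t) for t in threads)


def run_smt(threads):
    """Several hardware contexts sharing one execution unit (toy model)."""
    pos = [0] * len(threads)  # next operation index for each thread
    cycles = 0
    while any(p < len(t) for p, t in zip(pos, threads)):
        unit_free = True
        # Rotate priority every cycle so that no thread is starved.
        order = [(cycles + i) % len(threads) for i in range(len(threads))]
        for i in order:
            if pos[i] >= len(threads[i]):
                continue  # this thread has finished
            op = threads[i][pos[i]]
            if op == "S":
                pos[i] += 1  # a stall elapses without using the execution unit
            elif unit_free:
                pos[i] += 1  # an "X" operation claims the unit for this cycle
                unit_free = False
        cycles += 1
    return cycles


stall_heavy = ["XSSXSS", "XSSXSS"]
compute_bound = ["XXXX", "XXXX"]
print(run_serial(stall_heavy), run_smt(stall_heavy))
print(run_serial(compute_bound), run_smt(compute_bound))
```

In this model the stall-heavy pair finishes in 7 cycles instead of 12, because each thread executes while the other is stalled. The compute-bound pair takes 8 cycles either way, since there are no idle execution cycles to fill.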

This technology is transparent to operating systems and programs. All that is required to take advantage of hyper-threading is symmetric multiprocessing (SMP) support in the operating system, as the logical processors appear as standard separate processors.
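Because logical processors look like ordinary processors, a program that simply asks the operating system for a processor count receives the number of logical processors. As a rough sketch of how the two views can be told apart, the code below groups logical processors by physical core using the sibling lists that Linux exposes under /sys/devices/system/cpu/cpuN/topology/thread_siblings_list. The helper names are invented for this example, and other operating systems expose the same information through different interfaces.

```python
import glob
import os


def parse_cpu_list(text):
    """Parse a Linux cpulist string such as "0,4" or "0-1,8-9" into a sorted tuple."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return tuple(sorted(cpus))


def sibling_groups(sibling_lists):
    """Collapse per-CPU sibling lists into one group per physical core."""
    return sorted({parse_cpu_list(s) for s in sibling_lists})


def read_linux_sibling_lists():
    """Read the sibling lists from sysfs (Linux only; returns [] elsewhere)."""
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    lists = []
    for path in glob.glob(pattern):
        with open(path) as f:
            lists.append(f.read())
    return lists


# Number of logical processors as reported by the operating system.
print("logical processors:", os.cpu_count())

# Sample data standing in for a two-core, four-thread machine.
sample = ["0,2\n", "1,3\n", "0,2\n", "1,3\n"]
print("physical cores:", sibling_groups(sample))
```

On a Linux machine, sibling_groups(read_linux_sibling_lists()) gives the grouping for the actual hardware. With hyper-threading disabled or absent, each group contains a single logical processor.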

It is possible to optimize operating system behavior on multi-processor systems that support hyper-threading. For example, consider an SMP system with two physical processors that are both hyper-threaded (for a total of four logical processors). If the operating system's process scheduler is unaware of hyper-threading, it will treat all four logical processors as equivalent. If only two processes are eligible to run, it might schedule them on the two logical processors that happen to belong to the same physical processor; that processor would become extremely busy while the other sat idle, giving poorer performance than better scheduling would achieve. This problem can be avoided by improving the scheduler to distinguish logical processors from physical processors; in a sense, this is a limited form of the scheduler changes that are required for NUMA systems.
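The scheduling example can be sketched in a few lines. This is a simplified illustration rather than the algorithm of any real scheduler: it assumes the topology is given as (logical CPU, physical processor) pairs, and the function names are invented here.

```python
from collections import defaultdict


def place_naive(logical_cpus, n_tasks):
    """HT-unaware placement: take the first n logical CPUs, wherever they sit."""
    return [cpu for cpu, _pkg in logical_cpus[:n_tasks]]


def place_ht_aware(logical_cpus, n_tasks):
    """Use one logical CPU on every physical processor before using a second."""
    by_pkg = defaultdict(list)
    for cpu, pkg in logical_cpus:
        by_pkg[pkg].append(cpu)
    chosen = []
    depth = 0  # how many logical CPUs per physical processor are already in use
    while len(chosen) < n_tasks and any(depth < len(v) for v in by_pkg.values()):
        for pkg in sorted(by_pkg):
            if depth < len(by_pkg[pkg]) and len(chosen) < n_tasks:
                chosen.append(by_pkg[pkg][depth])
        depth += 1
    return chosen


# Two physical processors (0 and 1), each exposing two logical CPUs.
topology = [(0, 0), (1, 0), (2, 1), (3, 1)]
print(place_naive(topology, 2))     # both tasks land on physical processor 0
print(place_ht_aware(topology, 2))  # one task per physical processor
```

With two runnable tasks, the naive placement picks logical CPUs 0 and 1, which share physical processor 0, while the hyper-threading-aware placement picks logical CPUs 0 and 2. Sibling logical CPUs are used only once every physical processor already has a task.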

History

Hyper-threading was first introduced in the Foster MP-based Xeon in 2002. It appeared on the 3.06 GHz Northwood-based Pentium 4 in the same year, and then in every Pentium 4 HT, Pentium 4 Extreme Edition and Pentium Extreme Edition processor. Intel's processors based on the Core microarchitecture, the generation preceding Nehalem, do not have hyper-threading, because the Core microarchitecture is a descendant of the P6 microarchitecture, which does not implement it. P6 was used in the Pentium line from the Pentium Pro through the Pentium III, in the Celeron (Covington, Mendocino, Coppermine and Tualatin-based), and in the Pentium II Xeon and Pentium III Xeon models.

Hyper-threading returned with Nehalem (Core i7), which Intel released in November 2008. The first-generation Nehalem has four cores and runs eight threads. Since then, two- and six-core models have been released, running four and twelve threads respectively.^[3]

The Intel Atom is an in-order processor with hyper-threading, intended for low-power mobile PCs and low-price desktop PCs.^[4]

The Itanium 9300 launched with eight threads per processor (2 threads per core) through enhanced hyper-threading technology. Poulson, the next-generation Itanium, is scheduled to have additional hyper-threading enhancements.^[5]

The Intel Xeon 5500 server chips also use two-way hyper-threading.^[6]^[7]

Performance

The advantages listed for hyper-threading are improved support for multi-threaded code, the ability to run multiple threads simultaneously, and improved reaction and response time.

According to Intel, the first implementation used only 5% more die area than the comparable non-hyper-threaded processor, but its performance was 15–30% better.
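One way to read these two figures together is as performance per unit of die area. The short calculation below uses only the 5% and 15–30% figures quoted above. The function name is invented for this example, and it assumes, as a simplification, that the two figures can be combined as a simple ratio.

```python
def perf_per_area_gain(speedup, area_increase):
    """Relative gain in performance per unit of die area."""
    return (1 + speedup) / (1 + area_increase) - 1


low = perf_per_area_gain(0.15, 0.05)   # 1.15 / 1.05 - 1
high = perf_per_area_gain(0.30, 0.05)  # 1.30 / 1.05 - 1
print(f"{low:.1%} to {high:.1%}")
```

Under that assumption, Intel's figures correspond to roughly 9.5% to 23.8% more performance per unit of die area.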

Intel claims up to a 30% performance improvement compared with an otherwise identical Pentium 4 without simultaneous multithreading. Tom's Hardware states "In Some Cases a P4 running at 3.0 GHz with HT on can even beat a P4 running at 3.6 GHz without HT turned on".^[8]^[9] Intel also claims significant performance improvements with a hyper-threading-enabled Pentium 4 processor in some artificial intelligence algorithms.

The performance improvement is very application-dependent. When running two programs that both require the full attention of the processor, one or both of them can actually appear to slow down slightly when hyper-threading is turned on. This is because the Pentium 4's replay system ties up valuable execution resources and the processor equalizes its resources between the two programs, which adds a varying amount of execution time. (The Pentium 4 Prescott core gained a replay queue, which reduces the execution time needed for the replay system. This is enough to completely overcome that performance hit.)

Drawbacks

When the 3.06 GHz Intel Pentium 4 HT was released, some application programmers found it difficult to decide whether hyper-threading was worth using for their applications,^[10] because they were still testing their programs on operating systems that were not optimized for it (e.g. Windows 2000).^[11] Furthermore, most computers at the time had single-threaded rather than two-threaded processors.

In 2006, hyper-threading was criticised for being energy-inefficient. For example, ARM, a company specializing in low-power CPU design, has stated that SMT can use up to 46% more power than dual-core designs. (A dual-core processor places two cores on one chip, which is different from a system with two separate CPUs.) ARM also claims that SMT increases cache thrashing by 42%, whereas dual core results in a 37% decrease.^[12]

Security

In May 2005, Colin Percival demonstrated that a malicious thread operating with limited privileges can monitor the execution of another thread through the two threads' influence on a shared data cache, allowing for the theft of cryptographic keys.^[13] Although the attack described in the paper was demonstrated on an Intel Pentium 4 processor with HTT, the same techniques could theoretically apply to any system where caches are shared between two or more execution threads that do not trust each other; see also side channel attack.
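The principle behind such an attack can be shown with a purely simulated cache. The sketch below is a toy model and not the attack from the paper: a real attacker infers cache misses from access timings, whereas here the simulated cache reports hits and misses directly. The class, the function names and the direct-mapped, owner-tagged cache layout are all assumptions made for this illustration.

```python
class SharedCache:
    """Toy direct-mapped cache: one line per set, tagged with the thread that loaded it."""

    def __init__(self, n_sets):
        self.lines = [None] * n_sets

    def access(self, owner, set_index):
        """Return True on a hit; on a miss, evict whatever occupied the set."""
        hit = self.lines[set_index] == owner
        self.lines[set_index] = owner
        return hit


def victim_step(cache, secret_nibble):
    """A table lookup whose cache set depends on secret data."""
    cache.access("victim", secret_nibble)


def spy_recover(cache, run_victim):
    """Fill the cache, let the victim run, then see which sets were evicted."""
    n = len(cache.lines)
    for s in range(n):
        cache.access("spy", s)  # fill every set with the spy's own data
    run_victim()
    # A miss here stands in for the slow access a real spy would measure.
    return [s for s in range(n) if not cache.access("spy", s)]


cache = SharedCache(16)
leaked = spy_recover(cache, lambda: victim_step(cache, 11))
print(leaked)
```

In this model the spy learns that the victim touched set 11, and therefore learns the secret value that selected it, without ever reading the victim's data directly.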

References

[1] Operating Systems that include optimizations for Hyper-Threading Technology

[2] Intel Processor Spec Finder: SL6WK

[3] Intel explains the new Core i7 CPU

[4] http://www.intel.com/technology/atom/microarchitecture.htm

[5] http://microelectronics.cbronline.com/news/intel_launches_new_itanium_9300_series_processor_100208

[6] http://www.intel.com/p/en_US/products/server/processor

[7] http://www.intel.com/business/resources/demos/xeon5500/performance/demo.htm

[8] http://www.tomshardware.com/reviews/single-cpu-dual-operation,549-25.html

[9] http://www.youtube.com/watch?v=Is7frW9Z-rw

[10] http://forums.anandtech.com/archive/index.php/t-1273937.html

[11] http://www.intel.com/support/processors/sb/CS-017343.htm

[12] www.theinquirer.net

[13] Cache Missing for Fun and Profit

External links

Intel's high level overview of Hyper-threading
Hyper-threading on MSDN Magazine
HyperThreading Overview from OSDEV Community
An introductory article from Ars Technica
Hyper-Threading Technology Architecture and Microarchitecture, technical description of Hyper-Threading (1.2 MB PDF-file)
Enter Patent Number 4,847,755
Merom, Conroe, Woodcrest lose HyperThreading

Security

KernelTrap discussion: Hyper-Threading Vulnerability

Performance

ZDnet: Hyperthreading hurts server performance, say developers
ARM is no fan of HyperThreading - Outlines problems of SMT solutions
Replay: Unknown Features of the NetBurst Core
