Hyper-threading (officially Hyper-Threading Technology or HT Technology, abbreviated HTT or HT) is Intel's proprietary simultaneous multithreading (SMT) implementation used to improve parallelization of computations (doing multiple tasks at once) performed on x86 microprocessors. It first appeared in February 2002 on Xeon server processors and in November 2002 on Pentium 4 desktop CPUs. Later, Intel included this technology in Itanium, Atom, and Core 'i' Series CPUs, among others.
For each processor core that is physically present, the operating system addresses two virtual or logical cores, and shares the workload between them when possible. The main function of hyper-threading is to increase the number of independent instructions in the pipeline; it takes advantage of superscalar architecture, in which multiple instructions operate on separate data in parallel. With HTT, one physical core appears to the OS as two processors, so the OS can schedule two processes at once. In addition, two or more processes can use the same resources: if resources for one process are not available, then another process can continue if its resources are available.
Taking full advantage of hyper-threading requires not only that the operating system support SMT, but also that it be specifically optimized for HTT; Intel recommends disabling HTT when using operating systems that have not been optimized for this chip feature.
Hyper-Threading Technology is a form of simultaneous multithreading technology introduced by Intel. Architecturally, a processor with Hyper-Threading Technology consists of two logical processors per core, each of which has its own processor architectural state. Each logical processor can be individually halted, interrupted or directed to execute a specified thread, independently from the other logical processor sharing the same physical core.
Unlike a traditional dual-processor configuration that uses two separate physical processors, the logical processors in a hyper-threaded core share the execution resources. These resources include the execution engine, the caches, the system-bus interface and the firmware. Sharing them allows the two logical processors to work with each other more efficiently, and lets one borrow resources from the other when that one is stalled. A processor stalls when it must wait for data it has requested before it can continue processing the current thread. The degree of benefit seen when using a hyper-threaded or multi-core processor depends on the needs of the software, and on how well it and the operating system are written to manage the processor efficiently.
Hyper-threading works by duplicating certain sections of the processor—those that store the architectural state—but not duplicating the main execution resources. This allows a hyper-threading processor to appear as the usual "physical" processor and an extra "logical" processor to the host operating system (HTT-unaware operating systems see two "physical" processors), allowing the operating system to schedule two threads or processes simultaneously and appropriately. When execution resources would not be used by the current task in a processor without hyper-threading, and especially when the processor is stalled, a hyper-threading equipped processor can use those execution resources to execute another scheduled task. (The processor may stall due to a cache miss, branch misprediction, or data dependency.)
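The effect of filling stalled cycles can be illustrated with a deliberately simplified model. The sketch below assumes a single shared execution unit and represents each thread as a string in which `X` is a cycle that needs the unit and `-` is a cycle spent stalled (for example, waiting on a cache miss). The function names and the trace format are invented for this illustration and do not describe any real pipeline.

```python
def cycles_back_to_back(traces):
    """Cycles needed when the threads run one after another.

    Without hyper-threading in this toy model, a stalled cycle still
    occupies the core, so every trace entry costs one cycle.
    """
    return sum(len(trace) for trace in traces)


def cycles_shared(trace_a, trace_b):
    """Cycles needed when two threads share one execution unit.

    Each cycle, a thread whose current entry is '-' simply lets its stall
    elapse. At most one thread whose current entry is 'X' may use the
    execution unit; the other waits and retries in the next cycle.
    Priority alternates between the threads so neither is starved.
    """
    traces = [trace_a, trace_b]
    pos = [0, 0]          # next entry to process in each trace
    cycles = 0
    prefer = 0            # thread that gets first pick this cycle
    while pos[0] < len(traces[0]) or pos[1] < len(traces[1]):
        cycles += 1
        unit_free = True
        for t in (prefer, 1 - prefer):
            if pos[t] >= len(traces[t]):
                continue                  # this thread has finished
            if traces[t][pos[t]] == "-":
                pos[t] += 1               # stall cycle elapses
            elif unit_free:
                pos[t] += 1               # thread uses the execution unit
                unit_free = False
        prefer = 1 - prefer
    return cycles


if __name__ == "__main__":
    # Threads whose stalls interleave: the shared core hides the stalls.
    a, b = "XX--XX--", "--XX--XX"
    print(cycles_back_to_back([a, b]), cycles_shared(a, b))
    # Threads that never stall: sharing the unit saves nothing.
    print(cycles_back_to_back(["XXXX", "XXXX"]), cycles_shared("XXXX", "XXXX"))
```

In this model the benefit comes entirely from overlapping one thread's stalls with the other's useful work; two threads that never stall gain nothing, which is consistent with the application-dependent results reported for real processors.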
This technology is transparent to operating systems and programs. The minimum that is required to take advantage of hyper-threading is symmetric multiprocessing (SMP) support in the operating system, as the logical processors appear as standard separate processors.
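Because logical processors are presented as ordinary processors, a program normally sees only their total count. As a hedged illustration, the sketch below uses Python's `os.cpu_count()`, which reports logical CPUs, and, on Linux only, reads the `thread_siblings_list` files under `/sys/devices/system/cpu` to group logical CPUs that share a physical core. The helper names are made up for this example, and on systems without that sysfs layout the grouping function simply returns an empty list.

```python
import glob
import os


def parse_cpu_list(text):
    """Parse a Linux CPU list such as '0,4' or '0-3,8' into sorted ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def physical_core_groups(sysfs_root="/sys/devices/system/cpu"):
    """Return sorted tuples of logical CPU IDs that share a physical core.

    Assumes the Linux sysfs topology layout; returns [] if it is absent.
    """
    pattern = os.path.join(
        sysfs_root, "cpu[0-9]*", "topology", "thread_siblings_list"
    )
    groups = set()
    for path in glob.glob(pattern):
        with open(path) as handle:
            groups.add(tuple(parse_cpu_list(handle.read())))
    return sorted(groups)


if __name__ == "__main__":
    print("logical processors:", os.cpu_count())
    print("sibling groups:", physical_core_groups())
```

On a hyper-threaded machine each group would be expected to contain two logical CPU IDs; with the feature disabled or absent, each group would contain one.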
It is possible to optimize operating system behavior on multi-processor hyper-threading capable systems. For example, consider an SMP system with two physical processors that are both hyper-threaded (for a total of four logical processors). If the operating system's thread scheduler is unaware of hyper-threading, it will treat all four logical processors the same. If only two threads are eligible to run, it might choose to schedule those threads on the two logical processors that happen to belong to the same physical processor; that processor would become extremely busy while the other would idle, leading to poorer performance than is possible by scheduling the threads onto different physical processors. This problem can be avoided by improving the scheduler to treat logical processors differently from physical processors; in a sense, this is a limited form of the scheduler changes that are required for NUMA systems.
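The scheduling problem described above can be sketched in a few lines. The example assumes a hypothetical topology table that maps each logical processor ID to the physical processor it belongs to; both placement functions are illustrations written for this article, not the algorithm of any particular operating system.

```python
def naive_placement(topology, n_threads):
    """Pick the first n logical CPUs in ID order, ignoring siblings.

    This models a scheduler that treats all logical processors alike
    and happens to fill them in numerical order.
    """
    return sorted(topology)[:n_threads]


def ht_aware_placement(topology, n_threads):
    """Use one logical CPU per physical processor before doubling up."""
    by_physical = {}
    for cpu in sorted(topology):
        by_physical.setdefault(topology[cpu], []).append(cpu)

    chosen = []
    round_index = 0
    while len(chosen) < n_threads:
        added = False
        for physical in sorted(by_physical):
            siblings = by_physical[physical]
            if round_index < len(siblings) and len(chosen) < n_threads:
                chosen.append(siblings[round_index])
                added = True
        if not added:
            break                 # no logical CPUs left to hand out
        round_index += 1
    return chosen


if __name__ == "__main__":
    # Two physical processors, each exposing two logical processors.
    topology = {0: 0, 1: 0, 2: 1, 3: 1}
    print(naive_placement(topology, 2))      # both on physical processor 0
    print(ht_aware_placement(topology, 2))   # one per physical processor
```

With two runnable threads, the naive choice places both on logical processors 0 and 1, which share physical processor 0, while the topology-aware choice selects logical processors 0 and 2 and so keeps both physical processors busy.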
Denelcor, Inc. introduced multi-threading with the HEP (Heterogeneous Element Processor) in 1982. The HEP pipeline could not hold multiple instructions from the same process: only one instruction from a given process was allowed to be present in the pipeline at any point in time. Should an instruction from a given process block in the pipe, instructions from the other processes would continue after the pipeline drained.
Intel implemented hyper-threading on an x86 architecture processor in 2002 with the Foster MP-based Xeon. It was also included on the 3.06 GHz Northwood-based Pentium 4 in the same year, and then remained as a feature in every Pentium 4 HT, Pentium 4 Extreme Edition and Pentium Extreme Edition processor since. Previous generations of Intel's processors based on the Core microarchitecture do not have Hyper-Threading, because the Core microarchitecture is a descendant of the P6 microarchitecture used in iterations of Pentium since the Pentium Pro through the Pentium III and the Celeron (Covington, Mendocino, Coppermine and Tualatin-based) and the Pentium II Xeon and Pentium III Xeon models.
Intel released the Nehalem (Core i7) in November 2008, in which hyper-threading made a return. The first-generation Nehalem contained four cores and supported eight threads. Since then, both two- and six-core models have been released, supporting four and twelve threads respectively.
The Itanium 9300 launched with eight threads per processor (two threads per core) through enhanced hyper-threading technology. Poulson, the next-generation Itanium, is scheduled to have additional hyper-threading enhancements.
The advantages of hyper-threading are listed as improved support for multi-threaded code, the ability to run multiple threads simultaneously, and improved reaction and response time.
Intel claims up to a 30% performance improvement compared with an otherwise identical, non-simultaneous multithreading Pentium 4. Tom's Hardware states "In some cases a P4 running at 3.0 GHz with HT on can even beat a P4 running at 3.6 GHz with HT turned off." Intel also claims significant performance improvements with a hyper-threading-enabled Pentium 4 processor in some artificial intelligence algorithms.
Overall the performance history of hyper-threading was a mixed one in the beginning. As one commentary on high performance computing from November 2002 notes:
"Hyper-Threading can improve the performance of some MPI applications, but not all. Depending on the cluster configuration and, most importantly, the nature of the application running on the cluster, performance gains can vary or even be negative. The next step is to use performance tools to understand what areas contribute to performance gains and what areas contribute to performance degradation."
As noted above, the performance improvement seen is very application-dependent; however, when running two programs that require the full attention of the processor, one or both of the programs may appear to slow down slightly when Hyper-Threading Technology is turned on. This is due to the replay system of the Pentium 4 tying up valuable execution resources and equalizing the processor resources between the two programs, which adds a varying amount of execution time. The Pentium 4 "Prescott" and the Xeon "Nocona" processors received a replay queue, which reduces the execution time needed for the replay system. This is enough to completely overcome that performance hit.
When the first HT processors were released, many operating systems were not optimized for hyper-threading technology (e.g. Windows 2000 and Linux older than 2.4).
In 2006, hyper-threading was criticised for energy inefficiency. For example, specialist low-power CPU design company ARM stated simultaneous multithreading (SMT) can use up to 46% more power than ordinary dual-core designs. Furthermore, they claimed SMT increases cache thrashing by 42%, whereas dual core results in a 37% decrease. Intel disputed this claim, stating that hyper-threading is highly efficient because it uses resources that would otherwise be idle.
In May 2005 Colin Percival demonstrated that on the Pentium 4, a malicious thread can use a timing attack to monitor the memory access patterns of another thread with which it shares a cache, allowing the theft of cryptographic information. Potential solutions to this include the processor changing its cache eviction strategy, or the operating system preventing the simultaneous execution, on the same physical core, of threads with different privileges.