Transactional Synchronization Extensions: Difference between revisions

From Wikipedia, the free encyclopedia
with unlocked multiplier ('''K''' versions: i7-4770K, i5-4670K, etc.)
→‎Implementation: combined for future
== Performance ==
According to benchmarks, TSX can provide around 40% faster application execution in specific workloads, and 4&ndash;5 times more database [[transactions per second]] (TPS).<ref>http://sc13.supercomputing.org/schedule/event_detail.php?evid=pap260</ref><ref>http://www.sisoftware.co.uk/?d=qa&f=ben_mem_hle</ref><ref>http://pcl.intel-research.net/publications/SC13-TSX.pdf</ref>

== Implementation ==
The main idea of a transactional memory (TM) implementation is to track the load and store instructions executed inside a transaction and to detect intersections (conflicts) between read and write sets. There are two kinds of conflicts: a read-set (RS) conflict, in which a transaction reads data changed by another thread, and a write-set (WS) conflict, in which a transaction tries to write data used or changed by another transaction. Either kind of conflict requires aborting the transaction and rolling back all changes it made.<ref name="kanter-2012-08-21-p1">{{cite web|url=http://www.realworldtech.com/haswell-tm-alt/|title=Haswell Transactional Memory Alternatives|date=August 21, 2012|accessdate=12 November 2013|last=Kanter|first=David|publisher=Real World Technologies }}</ref>

According to David Kanter of Real World Technologies (RWT), TSX may be implemented in the L1D cache alone or in the L1D and L2 caches, because read-set and write-set tracking has cache-line granularity (64 bytes).<ref>{{cite web|url=http://www.realworldtech.com/haswell-tm/3/|title=Analysis of Haswell’s Transactional Memory|last=Kanter|first=David |date=February 15, 2012|publisher=Real World Technologies|accessdate=12 November 2013}}</ref> This variant would allow Intel to support large transactions, because the L1D has 512 cache lines and the L2 has 4K lines (some capacity is lost due to 8-way associativity).<ref name="kanter-2012-08-21-p1"/>

Another way to implement TSX is to extend the out-of-order (OOO) pipeline structures: the Memory Ordering Buffer (MOB) and the Re-Order Buffer (ROB). In this variant, transactions may be handled directly in the out-of-order hardware with small, simple modifications to the pipeline and the store buffer, but only small transactions can be supported (no more than several hundred).<ref name="kanter-2012-08-21-p2">{{cite web|url=http://www.realworldtech.com/haswell-tm-alt/2/|title=Haswell Transactional Memory Alternatives|date=August 21, 2012|accessdate=12 November 2013|last=Kanter|first=David|publisher=Real World Technologies }}</ref>

Kanter concludes that the cache-based approach is more probable for Haswell, and that future microarchitectures (Skylake or later) could use a combined cache- and MOB-based approach.<ref name="kanter-2012-08-21-p3">{{cite web|url=http://www.realworldtech.com/haswell-tm-alt/3/|title=Haswell Transactional Memory Alternatives|date=August 21, 2012|accessdate=12 November 2013|last=Kanter|first=David|publisher=Real World Technologies }}: "Overall, Haswell is more likely to use the cache-based TM system."</ref>


==See also==

Revision as of 02:27, 12 November 2013

Transactional Synchronization Extensions (TSX) is an extension to the x86 instruction set architecture that adds hardware transactional memory support, speeding up execution of multi-threaded software through lock elision.

TSX was documented by Intel in February 2012, and debuted in June 2013 in Intel's microprocessors based on the Haswell microarchitecture.[1][2][3] The flagship Core chips with unlocked multiplier (K versions: i7-4770K, i5-4670K, etc.), and several other Haswell CPUs, do not support TSX.[4]

Features

TSX provides two software interfaces for designating code regions for transactional execution. Hardware Lock Elision (HLE) is an instruction prefix-based interface designed to be backward compatible with processors without TSX support. Restricted Transactional Memory (RTM) is a new instruction set interface that provides greater flexibility for programmers.[5]

TSX enables optimistic execution of transactional code regions. The hardware monitors multiple threads for conflicting memory accesses, and aborts and rolls back transactions that cannot be successfully completed. Mechanisms are provided for software to detect and handle failed transactions.[5]

In other words, lock elision through transactional execution uses memory transactions as a fast path where possible, while the slow (fallback) path is still a normal lock.

Hardware Lock Elision

Hardware Lock Elision (HLE) adds two new instruction prefixes, XACQUIRE and XRELEASE. These two prefixes reuse the opcodes of the existing REPNE / REPE prefixes (F2H / F3H). On processors that do not support TSX, REPNE / REPE prefixes are ignored on instructions for which the XACQUIRE / XRELEASE are valid, thus enabling backward compatibility.[6]

The XACQUIRE prefix hint can only be used with the following instructions with an explicit LOCK prefix: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCHG8B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG. The XCHG instruction can be used without the LOCK prefix as well.

The XRELEASE prefix hint can only be used with the following instructions with an explicit LOCK prefix: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCHG8B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG. The XCHG instruction can be used without the LOCK prefix as well. The MOV mem, reg and MOV mem, imm instructions can be used as well.

HLE allows optimistic execution of a critical section by eliding the write to a lock, so that the lock appears to be free to other threads. A failed transaction results in execution restarting from the XACQUIRE-prefixed instruction, but treating the instruction as if the XACQUIRE prefix were not present.
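As a rough illustration of how software might request elision, the following C sketch builds a spinlock on GCC's atomic builtins. It assumes a GCC-compatible compiler: on x86, GCC defines the `__ATOMIC_HLE_ACQUIRE` and `__ATOMIC_HLE_RELEASE` hint flags, which are meant to produce an XACQUIRE-prefixed XCHG and an XRELEASE-prefixed MOV. The type and function names (`elided_lock`, `elided_lock_acquire`, `elided_lock_release`) are hypothetical, and where the flags are unavailable the sketch falls back to an ordinary spinlock with no elision.

```c
#include <assert.h>

/* GCC on x86 defines these hint flags. Elsewhere we fall back to plain
   atomics (an assumption made only to keep the sketch portable). */
#ifdef __ATOMIC_HLE_ACQUIRE
#define ELIDE_ACQUIRE __ATOMIC_HLE_ACQUIRE
#define ELIDE_RELEASE __ATOMIC_HLE_RELEASE
#else
#define ELIDE_ACQUIRE 0
#define ELIDE_RELEASE 0
#endif

/* Hypothetical lock type: 0 = free, 1 = held. */
typedef struct { int locked; } elided_lock;

void elided_lock_acquire(elided_lock *l) {
    assert(l != 0);
    /* Atomic exchange carrying the acquire hint. If elision is attempted,
       the write to the lock word is elided and other threads still see
       the lock as free. */
    while (__atomic_exchange_n(&l->locked, 1,
                               __ATOMIC_ACQUIRE | ELIDE_ACQUIRE)) {
        /* Lock is really held: wait until it looks free, then retry. */
        while (__atomic_load_n(&l->locked, __ATOMIC_RELAXED)) { }
    }
}

void elided_lock_release(elided_lock *l) {
    assert(l != 0);
    /* Plain store carrying the release hint, ending the elided region. */
    __atomic_store_n(&l->locked, 0, __ATOMIC_RELEASE | ELIDE_RELEASE);
}
```

A critical section is then bracketed by `elided_lock_acquire(&lock)` and `elided_lock_release(&lock)`. On processors without TSX the prefixes are ignored, so the same binary behaves as a conventional spinlock.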

Restricted Transactional Memory

Restricted Transactional Memory (RTM) is an alternative implementation to HLE which gives the programmer the flexibility to specify a fallback code path that is executed when a transaction cannot be successfully executed.

RTM adds three new instructions: XBEGIN, XEND and XABORT. The XBEGIN and XEND instructions mark the start and the end of a transactional code region; the XABORT instruction explicitly aborts a transaction. Transaction failure redirects the processor to the fallback code path specified by the XBEGIN instruction, with the abort status returned in the EAX register.

EAX register bit position, followed by its meaning:
0 Set if abort caused by XABORT instruction.
1 If set, the transaction may succeed on a retry. This bit is always clear if bit 0 is set.
2 Set if another logical processor conflicted with a memory address that was part of the transaction that aborted.
3 Set if an internal buffer overflowed.
4 Set if debug breakpoint was hit.
5 Set if an abort occurred during execution of a nested transaction.
23:6 Reserved.
31:24 XABORT argument (only valid if bit 0 set, otherwise reserved).
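The bit layout above can be decoded with a few masks. The following C sketch is illustrative only: the constant, struct and function names (`ABORT_EXPLICIT`, `abort_info`, `decode_abort_status`, `worth_retrying`) are assumptions made for this example, although compiler intrinsic headers provide comparable macros. It operates on a plain 32-bit value, so it runs on any machine and does not itself start a transaction.

```c
#include <assert.h>
#include <stdint.h>

/* Masks for the abort status bits (names are illustrative). */
enum {
    ABORT_EXPLICIT = 1 << 0,  /* abort caused by XABORT */
    ABORT_RETRY    = 1 << 1,  /* transaction may succeed on a retry */
    ABORT_CONFLICT = 1 << 2,  /* memory conflict with another logical processor */
    ABORT_CAPACITY = 1 << 3,  /* an internal buffer overflowed */
    ABORT_DEBUG    = 1 << 4,  /* debug breakpoint was hit */
    ABORT_NESTED   = 1 << 5   /* abort occurred in a nested transaction */
};

typedef struct {
    int explicit_abort, may_retry, conflict, capacity, debug, nested;
    uint8_t xabort_code;  /* bits 31:24, meaningful only if explicit_abort */
} abort_info;

/* Split the status word returned in EAX into its documented fields. */
void decode_abort_status(uint32_t eax, abort_info *out) {
    assert(out != 0);
    out->explicit_abort = (eax & ABORT_EXPLICIT) != 0;
    out->may_retry      = (eax & ABORT_RETRY) != 0;
    out->conflict       = (eax & ABORT_CONFLICT) != 0;
    out->capacity       = (eax & ABORT_CAPACITY) != 0;
    out->debug          = (eax & ABORT_DEBUG) != 0;
    out->nested         = (eax & ABORT_NESTED) != 0;
    /* The XABORT argument is reserved unless bit 0 is set. */
    out->xabort_code = out->explicit_abort ? (uint8_t)(eax >> 24) : 0;
}

/* One possible policy: retry only when the hardware hints it may help. */
int worth_retrying(uint32_t eax) {
    return (eax & ABORT_RETRY) != 0;
}
```

A fallback handler might call `worth_retrying` a bounded number of times before taking the ordinary lock; that retry limit is a software policy choice, not something the table prescribes.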

XTEST instruction

TSX provides a new XTEST instruction that returns whether the processor is executing a transactional region.

Performance

According to benchmarks, TSX can provide around 40% faster application execution in specific workloads, and 4–5 times more database transactions per second (TPS).[7][8][9]

Implementation

The main idea of a transactional memory (TM) implementation is to track the load and store instructions executed inside a transaction and to detect intersections (conflicts) between read and write sets. There are two kinds of conflicts: a read-set (RS) conflict, in which a transaction reads data changed by another thread, and a write-set (WS) conflict, in which a transaction tries to write data used or changed by another transaction. Either kind of conflict requires aborting the transaction and rolling back all changes it made.[10]
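The conflict rule can be modelled in software. The C sketch below is a simplified model, not a description of how the hardware is built: it represents each transaction by arrays of addresses it read and wrote, compares them at an assumed 64-byte line granularity, and reports a conflict when one transaction's write set overlaps the other's read or write set. All names (`tx_sets`, `lines_intersect`, `tx_conflict`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_SHIFT 6  /* assumed 64-byte tracking granularity */

/* Addresses a transaction has read and written (hypothetical model). */
typedef struct {
    const uintptr_t *reads;  size_t n_reads;
    const uintptr_t *writes; size_t n_writes;
} tx_sets;

/* Returns 1 if any address in a shares a 64-byte line with one in b. */
int lines_intersect(const uintptr_t *a, size_t na,
                    const uintptr_t *b, size_t nb) {
    for (size_t i = 0; i < na; i++)
        for (size_t j = 0; j < nb; j++)
            if ((a[i] >> LINE_SHIFT) == (b[j] >> LINE_SHIFT))
                return 1;
    return 0;
}

/* Two transactions conflict when a write of one overlaps a read or a
   write of the other. Overlapping reads alone are not a conflict. */
int tx_conflict(const tx_sets *x, const tx_sets *y) {
    assert(x != 0 && y != 0);
    return lines_intersect(x->writes, x->n_writes, y->reads,  y->n_reads)
        || lines_intersect(x->writes, x->n_writes, y->writes, y->n_writes)
        || lines_intersect(x->reads,  x->n_reads,  y->writes, y->n_writes);
}
```

Because comparison happens per line rather than per byte, two transactions that touch different variables in the same 64-byte line are still reported as conflicting in this model.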

According to David Kanter of Real World Technologies (RWT), TSX may be implemented in the L1D cache alone or in the L1D and L2 caches, because read-set and write-set tracking has cache-line granularity (64 bytes).[11] This variant would allow Intel to support large transactions, because the L1D has 512 cache lines and the L2 has 4K lines (some capacity is lost due to 8-way associativity).[10]
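The line counts follow from simple arithmetic. The C sketch below assumes a 32 KiB L1D and a 256 KiB L2, sizes that are consistent with the 512 and 4K figures but are not stated in the text, and it assumes simple address-bit set indexing. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Cache geometry (hypothetical helper type). */
typedef struct {
    size_t size_bytes;  /* total capacity */
    size_t line_bytes;  /* cache-line size */
    size_t ways;        /* associativity */
} cache_geom;

/* Total number of cache lines: capacity divided by line size. */
size_t cache_lines(const cache_geom *c) {
    assert(c != 0 && c->line_bytes > 0);
    return c->size_bytes / c->line_bytes;
}

/* Number of sets: lines divided by associativity. */
size_t cache_sets(const cache_geom *c) {
    assert(c != 0 && c->ways > 0);
    return cache_lines(c) / c->ways;
}

/* Addresses this many bytes apart map to the same set, assuming the set
   index is taken directly from the address bits above the line offset. */
size_t same_set_stride(const cache_geom *c) {
    return cache_sets(c) * c->line_bytes;
}
```

With `{32 * 1024, 64, 8}` this gives 512 lines in 64 sets, and with `{256 * 1024, 64, 8}` it gives 4096 lines. Under these assumptions, nine lines spaced 4096 bytes apart would all compete for one 8-way L1D set, which is one way to read the remark that some capacity is lost to associativity.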

Another way to implement TSX is to extend the out-of-order (OOO) pipeline structures: the Memory Ordering Buffer (MOB) and the Re-Order Buffer (ROB). In this variant, transactions may be handled directly in the out-of-order hardware with small, simple modifications to the pipeline and the store buffer, but only small transactions can be supported (no more than several hundred).[12]

Kanter concludes that the cache-based approach is more probable for Haswell, and that future microarchitectures (Skylake or later) could use a combined cache- and MOB-based approach.[13]

See also

References

  1. ^ "Transactional Synchronization in Haswell". Software.intel.com. Retrieved 2012-02-07.
  2. ^ "Transactional memory going mainstream with Intel Haswell". Ars Technica. 2012-02-08. Retrieved 2012-02-09.
  3. ^ "The Core i7-4770K Review". Tom's Hardware. 2013-06-01. Retrieved 2012-06-03.
  4. ^ "Intel® Core™ i7-4770K Processor". Intel. 2013-06-01. Retrieved 2012-06-05.
  5. ^ a b Johan De Gelas (2012-09-20). "Making Sense of the Intel Haswell Transactional Synchronization eXtensions". AnandTech. Retrieved 2013-10-20.
  6. ^ "Hardware Lock Elision Overview". intel.com. Retrieved 2013-10-27.
  7. ^ http://sc13.supercomputing.org/schedule/event_detail.php?evid=pap260
  8. ^ http://www.sisoftware.co.uk/?d=qa&f=ben_mem_hle
  9. ^ http://pcl.intel-research.net/publications/SC13-TSX.pdf
  10. ^ a b Kanter, David (August 21, 2012). "Haswell Transactional Memory Alternatives". Real World Technologies. Retrieved 12 November 2013.
  11. ^ Kanter, David (February 15, 2012). "Analysis of Haswell's Transactional Memory". Real World Technologies. Retrieved 12 November 2013.
  12. ^ Kanter, David (August 21, 2012). "Haswell Transactional Memory Alternatives". Real World Technologies. Retrieved 12 November 2013.
  13. ^ Kanter, David (August 21, 2012). "Haswell Transactional Memory Alternatives". Real World Technologies. Retrieved 12 November 2013.: "Overall, Haswell is more likely to use the cache-based TM system."

External links