Test and Test-and-Set Lock

In computer science, test and test-and-set lock (TTSL) implementation is a synchronization primitive supported by the hardware for multiprocessor environments. It is an improvement over the Test and Set lock that uses the atomic test-and-set instruction. TTSL attempts to reduce the traffic incurred in test-and-set by attempting to acquire the lock only when there is a good chance of succeeding.

Lock Mechanism

The TTSL lock mechanism first checks with a load instruction to check if the lock is available. In contrast to Test-and-Set, while a lock is being held by another processor, the lock spins until the lock is free instead of attempting the atomic test-and-set instruction directly. When the lock becomes available i.e. lock is false, then the thread tries to execute the test-and-set instruction. The lock could be acquired by another processor meanwhile before the attempt at test-and-set and in that case, the code jumps back into the spin mode again until the lock is available.

boolean locked = false; // shared lock variable
procedure lock() {
  do {
    while (locked == true) {}; // spin until lock seems free
  } while TestAndSet(locked) // actual atomic locking
}

Exit protocol is:

 procedure unlock() {
  locked = false;
}

Thus the expensive atomic memory operations happen less often than in simple spin around test-and-set and result in less failed lock acquisitions.

TTSL vs. Test and Set

Advantages

Bus transactions are reduced in TTSL as when the lock is in acquired state (i.e. lock is true), any attempt at acquiring the lock does not happen, thus avoiding invalidation of all other cache blocks in the process as it happens in test-and-set lock. As the number of threads/processors trying to acquire the lock increases the performance enhancement becomes more profound. Since the bus bandwidth is reduced, the resource contention^[1] at any other critical section also is avoided. Since the lock uses only one variable, the storage costs are same as compared to the test-and-set lock.

Disadvantages

TTSL always checks lock availability before trying to lock it. This causes higher "uncontended" latency, because this extra step isn't needed when the lock is free (i.e. uncontended).

Another drawback is specific to processors requiring separate bus lines for atomic instructions.^[2] Such instructions are very specific to these kinds of processors and only works for bus based systems. Also, if the locks being used are too fine grained and use a lot of lock variables, the single bus line system for atomicity causes serialization of the multiple lock requests, thus reducing the performance.

Assembly language implementation of critical sections

 lockit:
  LD R2,0(R1)               //load of lock
  BNZ R2,lockit             //not available-spin
  DADDUI R2,R0,#1           //load locked value
  EXCH R2,0(R1)             //swap
  BNZ R2,lockit             //branch if lock wasn’t 0

unlockit:
  ST lockit,0               //set lockit as 0
  ret

A simple locking mechanism^[3] is used, where value 0 is used to indicate that lock is free, and 1 is used to indicate that lock is unavailable. In order to set a lock a thread tries to read the lock variable, if the value of read indicates that lock is 0, then the thread successfully acquires a lock of the critical section by storing a value of 1 and invalidates all other caches in other threads. The other threads keep on reading the cache block and execute a test-and-set instruction only when the lock is released. When the thread which had acquired a lock wants to exit the critical section, it stores a value of 0 in R2, the threads which are spinning across the variable then tries to execute a test-and-set instruction and acquires a lock.

References

Category:Concurrency control Category:Computer arithmetic

^ Andrews, Gregory R. (2000). Foundations of Multithreaded, Parallel, and Distributed Programming. Addison-Wesley. pp. 100–101. ISBN 0-201-35752-6.
^ Solihin, Yan. Fundamentals of Parallel Multicore Architecture. Chapman and Hall/CRC. p. 256. ISBN 9781482211184.
^ "Hardware and Software Synchronisation" (PDF). COMP 140 Advanced Computer Architecture. 26 June 2014. Retrieved 16 November 2016.

[1] Andrews, Gregory R. (2000). Foundations of Multithreaded, Parallel, and Distributed Programming. Addison-Wesley. pp. 100–101. ISBN 0-201-35752-6.

[2] Solihin, Yan. Fundamentals of Parallel Multicore Architecture. Chapman and Hall/CRC. p. 256. ISBN 9781482211184.

[3] "Hardware and Software Synchronisation" (PDF). COMP 140 Advanced Computer Architecture. 26 June 2014. Retrieved 16 November 2016.

[1]

[2]

[3]