Ticket lock

In computer science, a ticket lock is a synchronization mechanism, or locking algorithm, that is a type of spinlock that uses "tickets" to control which thread of execution is allowed to enter a critical section.

Overview

The basic concept of a ticket lock is similar to the ticket queue management system. This is the method that many bakeries and delis use to serve customers in the order that they arrive, without making them stand in a line. Generally, there is some type of dispenser from which customers pull sequentially numbered tickets upon arrival. The dispenser usually has a sign above or near it stating something like "Please take a number". There is also typically a dynamic sign, usually digital, that displays the ticket number that is now being served. Each time the next ticket number (customer) is ready to be served, the "Now Serving" sign is incremented and the number called out. This allows all of the waiting customers to know how many people are still ahead of them in the queue or line.

Like this system, a ticket lock is a first in first out (FIFO) queue-based mechanism. It adds the benefit of fairness of lock acquisition and works as follows; there are two integer values which begin at 0. The first value is the queue ticket, the second is the dequeue ticket. The queue ticket is the thread's position in the queue, and the dequeue ticket is the ticket, or queue position, that now has the lock (Now Serving).

When a thread arrives, it atomically obtains and then increments the queue ticket. The atomicity of this operation is required to prevent two threads from simultaneously being able to obtain the same ticket number. It then compares its ticket value, before the increment, with the dequeue ticket's value. If they are the same, the thread is permitted to enter the critical section. If they are not the same, then another thread must already be in the critical section and this thread must busy-wait or yield. When a thread leaves the critical section controlled by the lock, it atomically increments the dequeue ticket. This permits the next waiting thread, the one with the next sequential ticket number, to enter the critical section.^[1]

Fairness of lock acquisition

The notion of fairness in lock acquisition applies to the order in which threads acquire a lock successfully.^[2] If some type of fairness is implemented, it prevents a thread from being starved out of execution for a long time due to inability to acquire a lock in favor of other threads. With no fairness guarantees, a situation can arise where a thread (or multiple threads) can take a disproportionately long time to execute as compared to others. A simple example will now be presented to show how a thread could be excessively delayed due to a lack of fairness in lock acquisition.

Assume a case where three threads, each executing on one of three processors, are executing the following pseudocode that uses a lock with no consideration for fairness.

while (1) {
    lock {
        …
        critical section
        …
    }
    unlock
}

Now further assume the physical arrangement of the three processors, P1, P2, and P3, results in a non-uniform memory access time to the location of the shared lock variable. The order of increasing access time to the lock variable for the three processors is P1 < P2 < P3. So P1 is always the most advantaged at acquiring the lock, followed by P2, with P3 being most disadvantaged. How this situation leads to thread starvation in the absence of a fairness guarantee is shown in the following illustration of the execution of the above pseudocode by these three processors.

Starvation of P3
Time	P1	P2	P3
1	lock attempt (success)	lock attempt (failed)	lock attempt (failed)
2	critical section	spin	spin
3	release lock	lock attempt (success)	lock attempt (failed)
4	...	critical section	spin
5	lock attempt (failed)	...	...
6	lock attempt (success)	release lock	lock attempt (failed)
7	critical section	spin	spin
...	...	...	...

Initially, the lock is free and all three processors attempt to acquire the lock simultaneously (Time 1). Due to P1 having the fastest access time to the lock, it acquires it first and enters the critical section. P2 and P3 now spin while P1 is in the critical section (Time 2). Upon exiting the critical section (Time 3), P1 executes an unlock, releasing the lock. Since P2 has faster access to the lock than P3, it acquires the lock next and enters the critical section (Time 4). While P2 is in the critical section, P1 once again attempts to acquire the lock but can’t (Time 5), forcing it to spin wait along with P3. Once P2 finishes the critical section and issues an unlock, both P1 and P3 simultaneously attempt to acquire it once again (Time 6). But P1, with its faster access time wins again, thus entering the critical section (Time 7). This pattern of P3 being unable to obtain the lock will continue indefinitely until either P1 or P2 stops attempting to acquire it.

This illustrates the need to ensure some level of fairness in lock acquisition in certain circumstances. Not all locks have mechanisms that ensure any level of fairness, leaving the potential for situations similar to that illustrated above. See the Comparison of locks section below for examples of locks that don't implement any fairness guarantees.

Implementation of ticket lock

In a Non-Uniform Memory Architecture (NUMA) system it is important to have a lock implementation that guarantees some level of fairness of lock acquisition. The ticket lock is an implementation of spinlock that adds this desired attribute. The following pseudocode^[1]^[3] shows the operations for initializing the lock, acquiring the lock, and releasing the lock. A call to ticketLock_acquire would precede the critical section of the code and ticketLock_release would follow it. Each processor will keep track of its turn via the value of each processor's my_ticket.

Yan Solihin's pseudocode example is listed in the diagram below.^[1]

ticketLock_init(int *next_ticket, int *now_serving)
{
    *now_serving = *next_ticket = 0;
}

ticketLock_acquire(int *next_ticket, int *now_serving)
{
    my_ticket = fetch_and_inc(next_ticket);
    while (*now_serving != my_ticket) {}
}

ticketLock_release(int *now_serving)
{
    ++*now_serving;
}

Following along with the pseudocode above we can see that each time a processor tries to acquire a lock with ticketLock_acquire(), fetch_and_inc is called, returning the current value of next_ticket into the thread private my_ticket and incrementing the shared next_ticket. It is important to note that the fetch and increment is done atomically, thereby not allowing any other concurrent attempts at access. Once my_ticket has been received, each thread will spin in the while loop while now_serving isn't equal to its my_ticket. Once now_serving becomes equal to a given thread's my_ticket they are allowed to return from ticketLock_acquire() and enter the critical section of code. After the critical section of the code, the thread performs ticketLock_release() which increments now_serving. This allows the thread with the next sequential my_ticket to exit from ticketLock_acquire() and enter the critical section.^[3] Since the my_ticket values are acquired in the order of thread arrival at the lock, subsequent acquisition of the lock is guaranteed to also be in this same order. Thus, fairness of lock acquisition is ensured, enforcing a FIFO ordering.

The following table shows an example of ticket lock in action in a system with four processors (P1, P2, P3, P4) competing for access to the critical section. Following along with the "Action" column, the outcome based on the above pseudocode can be observed. For each row, the variable values shown are those after the indicated action(s) have completed. The key point to note from the example is that the initial attempts by all four processors to acquire the lock results in only the first to arrive actually getting the lock. All subsequent attempts, while the first still holds the lock, serves to form the queue of processors waiting their turn in the critical section. This is followed by each getting the lock in turn, allowing the next in line to acquire it as the previous holder leaves. Also note that another processor can arrive at any time during the sequence of lock acquire/releases by other processors, and simply waits its turn.

**Four Processor Ticket Lock Example**
Row	Action	next_ticket	now_serving	P1 my_ticket	P2 my_ticket	P3 my_ticket	P4 my_ticket
1	Initialized to 0	0	0	-	-	-	-
2	P1 tries to acquire lock (succeed)	1	0	0	-	-	-
3	P3 tries to acquire lock (fail + wait)	2	0	0	-	1	-
4	P2 tries to acquire lock (fail + wait)	3	0	0	2	1	-
5	P1 releases lock, P3 acquires lock	3	1	0	2	1	-
6	P3 releases lock, P2 acquires lock	3	2	0	2	1	-
7	P4 tries to acquire lock (fail + wait)	4	2	0	2	1	3
8	P2 releases lock, P4 acquires lock	4	3	0	2	1	3
9	P4 releases lock	4	4	0	2	1	3
10	...	4	4	0	2	1	3

The first step, prior to use of the lock, is initialization of all lock variables (Row 1). Having next_ticket and now_serving initialized to 0 ensures that the first thread that attempts to get the lock will get ticket 0, thus acquiring the lock due to its ticket matching now_serving. So when P1 tries to acquire the lock it immediately succeeds and next_ticket is incremented to 1 (Row 2). When P3 tries to acquire the lock it gets 1 for its my_ticket, next ticket is incremented to 2, and it must wait since now_serving is still 0 (Row 3). Next, when P2 attempts to acquire the lock it gets 2 for its my_ticket, next_ticket is incremented to 3, and it must also wait due to now_serving still being 0 (Row 4). P1 now releases the lock by incrementing now_serving to 1, thus allowing P3 to acquire it due its my_ticket value of 1 (Row 5). Now P3 releases the lock, incrementing now_serving to 2, allowing P2 to acquire it (Row 6). While P2 has the lock, P4 attempts to acquire it, gets a my_ticket value of 3, increments next_ticket to 4, and must wait since now_serving is still 2 (Row 7). When P2 releases the lock, it increments now_serving to 3, allowing P4 to get it (Row 8). Finally, P4 releases the lock, incrementing now_serving to 4 (Row 9). No currently waiting threads have this ticket number, so the next thread to arrive will get 4 for its ticket and immediately acquire the lock.

Comparison of locks

The Linux kernel implementation can have lower latency than the simpler test-and-set or exchange based spinlock algorithms on modern machines. Consider the table below when comparing various types of spin based locks. The more basic locking mechanisms have lower uncontended latency than the advanced locking mechanisms.^[1]

Comparing Performance of Different Locking Mechanisms^[1]
Criteria	test-and-set	Test and Test-and-set	Load-link/store-conditional	Ticket	ABQL
Uncontended latency	Lowest	Lower	Lower	Higher	Higher
1 Release max traffic	Ө(p)	Ө(p)	Ө(p)	Ө(p)	Ө(1)
Wait traffic	High	-	-	-	-
Storage	Ө(1)	Ө(1)	Ө(1)	Ө(1)	Ө(p)
Fairness guarantee	No	No	No	Yes	Yes

Advantages

One advantage of a ticket lock over other spinlock algorithms is that it is fair. The waiting threads are processed in a first-in first-out basis as the dequeue ticket integer increases, thus preventing starvation.
A ticket lock algorithm also prevents the thundering herd problem occurring since only one thread at a time tries to enter the critical section.
Storage is not necessarily a problem as all threads spin on one variable, unlike array-based queueing locks (ABQL) who have threads spin on individual elements of an array.^[1]

Disadvantages

One disadvantage is that there is a higher uncontended latency due to the extra instructions required to read and test the value that all threads are spinning on before a lock is released.^[1]
Another major disadvantage of a ticket lock is that it is non-scalable. Research indicated that as processors are added to a Ticket Lock system, performance appears to decay exponentially.^[4]
Another problem comes from releasing a lock. All threads are spinning on one variable, so when the lock is released there are Ө(p) invalidations (as well as Ө(p) acquisitions). This is because all threads must reload their block into the cache and perform a test to determine their admittance to the critical section.^[1]

History

The ticket lock was introduced by Mellor-Crummey and Scott in 1991.^[3] This algorithm was introduced into the Linux kernel in 2008 due to its advantages,^[5] but was omitted in paravirtualized environments where it had disadvantages.^[6] As of July 2010^[update], work is in progress to enable the use of ticket locks in paravirtualization.^[7] As of March 2015 this type of locking scheme has been reemployed by Red Hat Enterprise Linux in their system.^[8]

Related work

Lamport's bakery algorithm uses a similar concept of a "ticket" or "counter" but does not make the use of atomic hardware operations. It was designed for fault tolerance rather than performance. Rather than all processors continuously examining the release counter, the bakery lock spins on examining the tickets of its peers.^[3]
Queue-based spin locks guarantee fairness by maintaining a queue of waiters and by granting the lock to the first waiter during an unlock operation.^[9]

References

^ ^a ^b ^c ^d ^e ^f ^g ^h Solihin, Yan (2009). Fundamentals of parallel computer architecture : multichip and multicore systems. Solihin Pub. pp. 262–269. ISBN 978-0-9841630-0-7.
^ Sottile, Matthew J.; Mattson, Timothy G.; Rasmussen, Craig E (Sep 28, 2009). Introduction to Concurrency in Programming Languages. Boca Raton, FL, USA: CRC Press. p. 56. ISBN 978-1-4200-7214-3.
^ ^a ^b ^c ^d John M. Mellor-Crummey and Michael L. Scott; et al. (February 1991). "Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors". ACM TOCS.
^ Boyd-Wickizer, Silas, et al. "Non-scalable locks are dangerous." Proceedings of the Linux Symposium. 2012. http://pdos.csail.mit.edu/papers/linux:lock.pdf
^ Jonathan Corbet, Ticket spinlocks, Linux Weekly News, February 6, 2008. Retrieved 23. July, 2010.
^ Jeremy Fitzhardinge, paravirt: introduce a "lock-byte" spinlock implementation, Linux kernel, July 7, 2008
^ Revert "Merge branch 'xen/pvticketlock' into xen/next"
^ "Ticket Spinlocks".
^ Michael L. Scott and William N. Scherer III. Scalable Queue-Based Spin Locks with Timeout. Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming, pp. 44-52, 2001.

[:0-1] ^ ^a ^b ^c ^d ^e ^f ^g ^h Solihin, Yan (2009). Fundamentals of parallel computer architecture : multichip and multicore systems. Solihin Pub. pp. 262–269. ISBN 978-0-9841630-0-7.

[Sottile-2] Sottile, Matthew J.; Mattson, Timothy G.; Rasmussen, Craig E (Sep 28, 2009). Introduction to Concurrency in Programming Languages. Boca Raton, FL, USA: CRC Press. p. 56. ISBN 978-1-4200-7214-3.

[:1-3] John M. Mellor-Crummey and Michael L. Scott; et al. (February 1991). "Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors". ACM TOCS.

[4] Boyd-Wickizer, Silas, et al. "Non-scalable locks are dangerous." Proceedings of the Linux Symposium. 2012. http://pdos.csail.mit.edu/papers/linux:lock.pdf

[5] Jonathan Corbet, Ticket spinlocks, Linux Weekly News, February 6, 2008. Retrieved 23. July, 2010.

[6] Jeremy Fitzhardinge, paravirt: introduce a "lock-byte" spinlock implementation, Linux kernel, July 7, 2008

[7] Revert "Merge branch 'xen/pvticketlock' into xen/next"

[8] "Ticket Spinlocks".

[9] Michael L. Scott and William N. Scherer III. Scalable Queue-Based Spin Locks with Timeout. Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming, pp. 44-52, 2001.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]