This article includes a list of general references, but it remains largely unverified because it lacks sufficient corresponding inline citations. (February 2015) (Learn how and when to remove this template message)
In computer science, load-linked/store-conditional (LL/SC), sometimes known as load-reserved/store-conditional (LR/SC), are a pair of instructions used in multithreading to achieve synchronization. Load-link returns the current value of a memory location, while a subsequent store-conditional to the same memory location will store a new value only if no updates have occurred to that location since the load-link. Together, this implements a lock-free atomic read-modify-write operation.
Comparison of LL/SC and compare-and-swap
If any updates have occurred, the store-conditional is guaranteed to fail, even if the value read by the load-link has since been restored. As such, an LL/SC pair is stronger than a read followed by a compare-and-swap (CAS), which will not detect updates if the old value has been restored (see ABA problem).
Real implementations of LL/SC do not always succeed even if there are no concurrent updates to the memory location in question. Any exceptional events between the two operations, such as a context switch, another load-link, or even (on many platforms) another load or store operation, will cause the store-conditional to spuriously fail. Older implementations will fail if there are any updates broadcast over the memory bus. This is often called weak LL/SC by researchers, as it breaks many theoretical LL/SC algorithms. Weakness is relative, and some weak implementations can be used for some algorithms.
LL/SC is more difficult to emulate than CAS. Additionally, stopping running code between paired LL/SC instructions, such as when single-stepping through code, can prevent forward progress, making debugging tricky.
LL/SC instructions are supported by:
- Alpha: ldl_l/stl_c and ldq_l/stq_c
- PowerPC/Power ISA: lwarx/stwcx and ldarx/stdcx
- MIPS: ll/sc
- ARM: ldrex/strex (ARMv6 and v7), and ldxr/stxr (ARM version 8)
- RISC-V: lr/sc
- ARC: LLOCK/SCOND
Some CPUs[which?] require the address being accessed exclusively to be configured in write-through mode.
Typically, CPUs track the load-linked address at a cache-line or other granularity, such that any modification to any portion of the cache line (whether via another core's store-conditional or merely by an ordinary store) is sufficient to cause the store-conditional to fail.
All of these platforms provide weak[clarification needed] LL/SC. The PowerPC implementation allows an LL/SC pair to wrap loads and even stores to other cache lines (although this approach is vulnerable to false cache line sharing). This allows it to implement, for example, lock-free reference counting in the face of changing object graphs with arbitrary counter reuse (which otherwise requires double compare-and-swap, DCAS). RISC-V provides an architectural guarantee of eventual progress for LL/SC sequences of limited length.
Some ARM implementations define platform dependent blocks, ranging from 8 bytes to 2048 bytes, and an LL/SC attempt in any given block fails if there is between the LL and SC a normal memory access inside the same block. Other ARM implementations fail if there is a modification anywhere in the whole address space. The former implementation is the stronger and most practical.
LL/SC has two advantages over CAS when designing a load-store architecture: reads and writes are separate instructions, as required by the design philosophy (and pipeline architecture); and both instructions can be performed using only two registers (address and value), fitting naturally into common 2-operand ISAs. CAS, on the other hand, requires three registers (address, old value, new value) and a dependency between the value read and the value written. x86, being a CISC architecture, does not have this constraint; though modern chips may well translate a CAS instruction into separate LL/SC micro-operations internally.
Hardware LL/SC implementations typically do not allow nesting of LL/SC pairs. A nesting LL/SC mechanism can be used to provide a MCAS primitive (multi-word CAS, where the words can be scattered). In 2013, Trevor Brown, Faith Ellen, and Eric Ruppert have implemented in software a multi-address LL/SC extension (which they call LLX/SCX) that relies on automated code generation; they have used it to implement one of the best performing concurrent binary search tree (actually a chromatic tree), slightly beating the JDK CAS-based skip list implementation.
- "S-1 project". Stanford Computer Science wiki. 2018-11-30.
- Andrew Waterman; Krste Asanović, eds. (2017-05-07). "7.2 Load-Reserved/Store-Conditional Instructions". The RISC-V Instruction Set Manual, Volume 1: User-Level ISA, Version 2.2 (PDF).
- Keno Fischer (2020-05-02). "Julia 1.5 Feature Preview: Time Traveling (Linux) Bug Reporting". Retrieved 2020-05-14.
- James H. Anderson; Mark Moir (1995). "Universal constructions for multi-object operations". PODC '95 Proceedings of the fourteenth annual ACM symposium on Principles of distributed computing. ACM. pp. 184–193. doi:10.1145/224964.224985. ISBN 0-89791-710-3. See their Table 1, Figures 1 & 2 and Section 2 in particular.
- James R. Larus; Ravi Rajwar (2007). Transactional Memory. Morgan & Claypool. p. 55. ISBN 978-1-59829-124-7.
- Keir Fraser (February 2004). Practical lock-freedom (PDF) (Technical report). University of Cambridge Computer Laboratory. p. 20. UCAM-CL-TR-579.
- Brown, Trevor; Ellen, Faith; Ruppert, Eric (2013). "Pragmatic primitives for non-blocking data structures" (PDF). PODC '13 Proceedings of the 2013 ACM symposium on Principles of distributed computing. ACM. pp. 13–22. doi:10.1145/2484239.2484273. ISBN 978-1-4503-2065-8. See also slides
- Trevor Brown; Faith Ellen; Eric Ruppert (2014). "A general technique for non-blocking trees" (PDF). PPoPP '14 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM. pp. 329–342. doi:10.1145/2555243.2555267. ISBN 978-1-4503-2656-8.
- Jensen, Eric H.; Hagensen, Gary W.; Broughton, Jeffrey M. (November 1987). A New Approach to Exclusive Data Access in Shared Memory Multiprocessors (PDF) (Technical report). Lawrence Livermore National Laboratory. UCRL-97663. Archived from the original (PDF) on 2017-02-02. Retrieved 2012-02-22.
- Bruner, John D.; Hagensen, Gary W.; Jensen, Eric H.; Pattin, Jay C.; Broughton, Jeffrey M. (11 November 1987). Cache Coherence on the S-1 AAP (PDF) (Technical report). Lawrence Livermore National Laboratory. UCRL-97646. Archived from the original (PDF) on 2 February 2017. Retrieved 10 November 2013.
- Detlefs, D.; Martin, P.; Moir, M.; Steele, Jr., Guy L. (2001). "Lock-free reference counting". PODC '01 Proceedings of the twentieth annual ACM symposium on Principles of distributed computing. ACM. pp. 190–9. doi:10.1145/383962.384016. ISBN 1-58113-383-9.
- Reinholtz, Kirk (December 2004). "Atomic Reference Counting Pointers". C/C++ Users Journal.[permanent dead link]
- Sites, R. L. (February 1993). "Alpha AXP architecture". Comm. ACM. 36 (2): 33–44. doi:10.1145/151220.151226.