ABA problem

From Wikipedia, the free encyclopedia
Jump to: navigation, search

In multithreaded computing, the ABA problem occurs during synchronization, when a location is read twice, has the same value for both reads, and "value is the same" is used to indicate "nothing has changed". However, another thread can execute between the two reads and change the value, do other work, then change the value back, thus fooling the first thread into thinking "nothing has changed" even though the second thread did work that violates that assumption.

The ABA problem occurs when multiple threads (or processes) accessing shared memory interleave. Below is the sequence of events that will result in the ABA problem:

  • Process P_1 reads value A from shared memory,
  • P_1 is preempted, allowing process P_2 to run,
  • P_2 modifies the shared memory value A to value B and back to A before preemption,
  • P_1 begins execution again, sees that the shared memory value has not changed and continues.

Although P_1 can continue executing, it is possible that the behavior will not be correct due to the "hidden" modification in shared memory.

A common case of the ABA problem is encountered when implementing a lock-free data structure. If an item is removed from the list, deleted, and then a new item is allocated and added to the list, it is common for the allocated object to be at the same location as the deleted object due to optimization. A pointer to the new item is thus sometimes equal to a pointer to the old item which is an ABA problem.

Examples[edit]

The first example of the ABA problem is a real-life scenario:

Charlie's boss wants to send some valuable documents to another city, so he places them in a briefcase, locks it, and hands it to Charlie. Charlie takes the briefcase to the train station, sets it on the seat next to his, and keeps an eye on it to ensure it is still there. Albert and his partner Bill have an identical looking briefcase with nothing important inside, and they notice Charlie's behavior. Bill approaches Charlie and asks for directions. While Charlie is answering, Albert swaps briefcases and walks away with Charlie's briefcase. When Charlie finishes talking to Bill, he checks again, sees what looks like his briefcase, grabs it, and boards a train. Later, when Charlie delivers the briefcase to a colleague in another city, the briefcase is opened, and Charlie is both confused and in trouble.

A second example also occurs in everyday life:

Natalie is waiting in her car at a red traffic light with her children. Her children start fighting with each other while waiting, and she leans back to scold them. Once their fighting stops, Natalie checks the light again and notices that it's still red. However, while she was focusing on her children, the light had changed to green, and then back again. Natalie doesn't think the light ever changed, but the people waiting behind her are very mad and honking their horns now.

In this scenario, the 'A' state is when the traffic light is red, and the 'B' state is when it's green. Originally, the traffic light starts in 'A' state. If Natalie looked at the light she would have noticed the change. But she only looked when the light was red (state 'A'). There is no way to tell if the light turned green during the time of no observation.

See the workaround section to find out how Natalie could have prevented her misfortune.

Next, consider a software example of ABA using a lock-free stack:

  /* Naive lock-free stack which suffers from ABA problem.*/
  class Stack {
    std::atomic<Obj*> top_ptr;
    //
    // Pops the top object and returns a pointer to it.
    //
    Obj* Pop() {
      while(1) {
        Obj* ret_ptr = top_ptr;
        if (!ret_ptr) return std::nullptr;
        // For simplicity, suppose that we can ensure that this dereference is safe
        // (i.e., that no other thread has popped the stack in the meantime).
        Obj* next_ptr = ret_ptr->next;
        // If the top node is still ret, then assume no one has changed the stack.
        // (That statement is not always true because of the ABA problem)
        // Atomically replace top with next.
        if (top_ptr.compare_exchange_weak(ret_ptr, next_ptr)) {
          return ret_ptr;
        }
        // The stack has changed, start over.
      }
    }
    //
    // Pushes the object specified by obj_ptr to stack.
    //
    void Push(Obj* obj_ptr) {
      while(1) {
        Obj* next_ptr = top_ptr;
        obj_ptr->next = next_ptr;
        // If the top node is still next, then assume no one has changed the stack.
        // (That statement is not always true because of the ABA problem)
        // Atomically replace top with obj.
        if (top_ptr.compare_exchange_weak(next_ptr, obj_ptr)) {
          return;
        }
        // The stack has changed, start over.
      }
    }
  };

This code can normally prevent problems from concurrent access, but suffers from ABA problems. Consider the following sequence:

Stack initially contains top → A → B → C

Thread 1 starts running pop:

 ret = A;
 next = B;

Thread 1 gets interrupted just before the compare_exchange_weak...

  { // Thread 2 runs pop:
    ret = A;
    next = B;
    compare_exchange_weak(A, B)  // Success, top = B
    return A;
  } // Now the stack is top → B → C
  { // Thread 2 runs pop again:
    ret = B;
    next = C;
    compare_exchange_weak(B, C)  // Success, top = C
    return B;
  } // Now the stack is top → C
  delete B;
  { // Thread 2 now pushes A back onto the stack:
    A->next = C;
    compare_exchange_weak(C, A)  // Success, top = A
  }

Now the stack is top → A → C

When Thread 1 resumes:

 compare_exchange_weak(A, B)

This instruction succeeds because it finds top == ret (both are A), so it sets top to next (which is B). As B has been deleted the program will access freed memory when it tries to look the first element on the stack. In C++, as shown here, accessing freed memory is undefined behavior: this may result in crashes, data corruption or even just silently appear to work correctly. ABA bugs, such as this, can be difficult to debug.

Workarounds[edit]

Going back to the previous example of Charlie and his briefcase, what could Charlie have done differently?

There are a number of ways that Charlie could have prevented this from happening, even though he can't open the briefcase. For one, he could've chained the real briefcase to his arm. That way, Albert would have to cut the chain to steal the briefcase, and Charlie would notice the cut chain and sound the alarm. This is what the LL/SC instruction on some processors attempts to do. Another solution would be for Charlie to write down the serial number of his real briefcase, and check it after every time he looks away from it. This is how Double-Word Compare-And-Swap Tagging works. Automatic Garbage Collection, along with other techniques like Hazard Pointers, deal with this problem by ensuring that there is no other briefcase in the world that looks like Charlie's briefcase. When Charlie, his boss, and whoever else cares about the briefcase don't need it anymore, it is destroyed. Then, and only then, is another briefcase that looks like his allowed to be created.

Natalie should have been more observant and watched the traffic light at all times. But such an attempt is inefficient in programming. It would be better to use a traffic system that notifies all waiting motorists about all changes. Then ABA problem can be circumvented when all motorists and the traffic system itself are actors in a system where all changes are sent to all parties that wish to be informed.

Below are examples of code mechanisms that implement the ideas above.

Tagged state reference[edit]

A common workaround is to add extra "tag" or "stamp" bits to the quantity being considered. For example, an algorithm using compare and swap on a pointer might use the low bits of the address to indicate how many times the pointer has been successfully modified. Because of this, the next compare-and-swap will fail, even if the addresses are the same, because the tag bits will not match. This does not completely solve the problem, as the tag bits will eventually wrap around, but helps to avoid it. Some architectures provide a double-word compare and swap, which allows for a larger tag. This is sometimes called ABAʹ since the second A is made slightly different from the first. Such tagged state references are also use in transactional memory.

Intermediate nodes[edit]

A correct but expensive approach is to use intermediate nodes that are not data elements and thus assure invariants as elements are inserted and removed [Valois].

Deferred reclamation[edit]

Another approach is to defer reclamation of removed data elements. One way to defer reclamation is to run the algorithm in an environment featuring an automatic garbage collector. Another way to defer reclamation is to use one or more hazard pointers, which are pointers to locations that otherwise cannot appear in the list. Each hazard pointer represents an intermediate state of an in-progress change; the presence of the pointer assures further synchronization [Doug Lea]. Yet another way to defer reclamation is to use read-copy update (RCU), which involves enclosing the update in an RCU read-side critical section and then waiting for an RCU grace period before freeing any removed data elements. Using RCU in this way guarantees that any data element removed cannot reappear until all currently executing operations have completed.

Some architectures provide "larger" atomic operations such that, as example, both forward and backward links in a doubly linked list can be updated atomically.

Some architectures provide a load linked, store conditional instruction in which the store is performed only when there are no other stores of the indicated location. This effectively separates the notion of "storage contains value" from "storage has been changed". Examples include DEC Alpha, MIPS, PowerPC and ARM (v6 and later). However no practical implementations of load linked will directly solve the ABA problem. [Michael]

Actor model[edit]

Further information: Actor model

Actors only send messages and contain all states inside the actors. There is no global state that would allow ABA problems.

See also[edit]

References[edit]