Synchronization (computer science)
In computer science, synchronization refers to one of two distinct but related concepts: synchronization of processes and synchronization of data. Process synchronization refers to the idea that multiple processes are to join up or handshake at a certain point, in order to reach an agreement or commit to a certain sequence of actions. Data synchronization refers to the idea of keeping multiple copies of a dataset in coherence with one another, or of maintaining data integrity. Process synchronization primitives are commonly used to implement data synchronization.
- 1 Thread or process synchronization
- 1.1 Classic Problems of Synchronization
- 1.2 Hardware Synchronization
- 1.3 Synchronization strategies in programming languages
- 1.4 Synchronization examples
- 1.5 See also
- 2 Data synchronization
- 3 Mathematical foundations
- 4 See also
- 5 References
- 6 External links
Thread or process synchronization
Thread synchronization or serialization, strictly defined, is the application of particular mechanisms to ensure that two concurrently executing threads or processes do not execute specific portions of a program at the same time; this property is referred to as mutual exclusion. If one thread has begun to execute a serialized portion of the program, called a critical section, any other thread trying to execute this portion must wait until the first thread finishes. If such synchronization measures are not taken, the result can be a race condition, in which variable values are unpredictable and depend on the timing of thread or process context switches.
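As a minimal sketch of mutual exclusion (in Java, whose synchronized construct is discussed later in this article), the following hypothetical example lets several threads increment a shared counter inside a critical section; the class and method names are illustrative, not taken from any particular system:

```java
// A minimal sketch of mutual exclusion in Java; names are illustrative.
public class CriticalSectionDemo {
    private static int counter = 0;
    private static final Object lock = new Object();

    public static int incrementConcurrently(int threads, int perThread) {
        counter = 0;
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    synchronized (lock) { // serialized: one thread at a time
                        counter++;        // the critical section
                    }
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) t.join(); // wait for all to finish
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return counter;
    }

    public static void main(String[] args) {
        // Without the synchronized block the result would be unpredictable
        // (a race condition); with it, the count is always exact.
        System.out.println(incrementConcurrently(4, 100_000)); // prints 400000
    }
}
```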
To further clarify thread or process synchronization, consider the following example: three processes, Process 1, Process 2, and Process 3, execute concurrently and need to access a shared resource (the critical section), as shown in Figure 1. To avoid conflicts, the processes must be synchronized: when Process 1 is accessing the shared resource and Process 2 also tries to access it, Process 2's access must be prevented by some synchronization technique until Process 1 finishes its operation and leaves the critical section, as shown in Figure 2.
One very common synchronization requirement encountered by programmers is a strict ordering of two tasks: Task A must be completed before Task B can start. For example, we cannot board a train until we have bought a ticket; we cannot enter an intersection until the traffic light is green; and we cannot withdraw money from an ATM until we have entered our PIN.
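This kind of strict ordering can be expressed directly with thread joining; in the following Java sketch (with illustrative names), a hypothetical Task B is not started until Task A has completed:

```java
// Sketch: enforcing "Task A completes before Task B starts" with join().
public class TaskOrderDemo {
    private static final StringBuilder log = new StringBuilder();

    public static String runInOrder() {
        log.setLength(0);
        Thread taskA = new Thread(() -> log.append("A")); // e.g. "buy ticket"
        Thread taskB = new Thread(() -> log.append("B")); // e.g. "board train"
        try {
            taskA.start();
            taskA.join();  // wait until Task A has finished...
            taskB.start(); // ...before Task B is allowed to begin
            taskB.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(runInOrder()); // prints "AB"
    }
}
```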
In addition to providing mutual exclusion, synchronization must also guard against the following:
- Deadlock: This happens when two or more processes or threads each wait for a lock that another of them is holding; none can proceed, so all of them stall indefinitely and make no further progress.
- Starvation: In some instances, a process or thread may wait an unbounded time to acquire a lock and so never make progress.
- Priority inversion: In systems that assign priorities to processes and threads, a high-priority task can be blocked by a low-priority task that holds a lock it needs, allowing medium-priority tasks to run ahead of it and thereby violating the system's priority rules. This is especially severe in real-time systems, where it can cause deadlines to be missed.
- Busy waiting: This occurs when a thread repeatedly checks whether it may proceed, consuming processing time that other threads could use.
Synchronization is used to control access to state both in small-scale multiprocessing systems—in multithreaded environments and multiprocessor computers—and in distributed computers consisting of thousands of units—in banking and database systems, in web servers, and so on.
Classic Problems of Synchronization
The following are classic problems of synchronization:
- The Producer-Consumer Problem (also called the Bounded Buffer Problem)
- The Readers-Writers Problem
- The Dining Philosophers Problem
These problems are used to test nearly every newly proposed synchronization scheme or primitive.
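As an illustration, the bounded-buffer (producer-consumer) problem can be sketched in Java using wait()/notifyAll(); the class and method names below are illustrative:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the bounded-buffer (producer-consumer) problem.
public class BoundedBuffer {
    private final Deque<Integer> buffer = new ArrayDeque<>();
    private final int capacity;

    public BoundedBuffer(int capacity) { this.capacity = capacity; }

    public synchronized void put(int item) throws InterruptedException {
        while (buffer.size() == capacity) wait(); // producer blocks when full
        buffer.addLast(item);
        notifyAll();                              // wake a waiting consumer
    }

    public synchronized int take() throws InterruptedException {
        while (buffer.isEmpty()) wait();          // consumer blocks when empty
        int item = buffer.removeFirst();
        notifyAll();                              // wake a waiting producer
        return item;
    }

    public static int demo() {
        BoundedBuffer buf = new BoundedBuffer(2); // capacity smaller than item count
        int[] sum = {0};
        Thread producer = new Thread(() -> {
            try { for (int i = 1; i <= 10; i++) buf.put(i); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        Thread consumer = new Thread(() -> {
            try { for (int i = 0; i < 10; i++) sum[0] += buf.take(); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        try {
            producer.start(); consumer.start();
            producer.join(); consumer.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return sum[0];                            // 1 + 2 + ... + 10
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 55
    }
}
```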
Hardware Synchronization
Many systems provide hardware support for critical section code.
A single-processor (uniprocessor) system can disable interrupts so that the currently running code executes without preemption, but this approach is impractical on multiprocessor systems. The key ability required to implement synchronization on a multiprocessor is a set of hardware primitives that can atomically read and modify a memory location. Without such a capability, the cost of building basic synchronization primitives is too high, and it increases as the processor count increases.

There are a number of alternative formulations of the basic hardware primitives, all of which provide the ability to atomically read and modify a location, together with some way to tell whether the read and write were performed atomically. These hardware primitives are the basic building blocks used to construct a wide variety of user-level synchronization operations, including locks and barriers. In general, architects do not expect users to employ the basic hardware primitives directly; instead, they expect system programmers to use them to build a synchronization library, a process that is often complex and tricky. Much modern hardware provides special atomic instructions, such as test-and-set, which atomically reads and modifies a memory word, and compare-and-swap, which atomically compares a memory word with an expected value and exchanges it for a new value if they match.
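To illustrate how such a primitive supports user-level locks, the following Java sketch builds a simple test-and-set style spinlock on top of compareAndSet, an atomic read-modify-write operation. This is a teaching sketch, not a production lock; it also exhibits the busy waiting discussed above:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch: a spinlock built from an atomic read-modify-write primitive.
public class SpinLock {
    private final AtomicBoolean held = new AtomicBoolean(false);

    public void lock() {
        // Atomically flip false -> true; busy-wait (spin) until we succeed.
        while (!held.compareAndSet(false, true)) {
            Thread.onSpinWait(); // hint to the CPU that we are spinning
        }
    }

    public void unlock() {
        held.set(false);
    }

    private static int counter = 0;

    public static int demo(int threads, int perThread) {
        counter = 0;
        SpinLock lock = new SpinLock();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    lock.lock();
                    counter++;   // critical section
                    lock.unlock();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) t.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 50_000)); // prints 200000
    }
}
```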
Synchronization strategies in programming languages
- Synchronized method: In languages such as Java, a method that includes the synchronized keyword in its declaration. When a thread invokes a synchronized method, it automatically acquires the intrinsic lock for that method's object and releases it when the method returns, even if the return was caused by an uncaught exception.
- Synchronized statement: A block of code declared to be synchronized. Unlike synchronized methods, synchronized statements must specify the object that provides the intrinsic lock. These statements are useful for improving concurrency through fine-grained synchronization, as they make it possible to avoid unnecessary blocking.
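A short Java sketch contrasting the two forms follows; the class and field names are illustrative, and the separate lock object for b is what enables the fine-grained synchronization mentioned above:

```java
// Sketch: synchronized method vs. synchronized statement in Java.
public class Counters {
    private long a = 0;
    private long b = 0;
    private final Object bLock = new Object(); // separate lock for fine-grained sync

    // Synchronized method: acquires the intrinsic lock of `this` on entry
    // and releases it on return (even a return caused by an exception).
    public synchronized void incrementA() {
        a++;
    }

    // Synchronized statement: names the object whose intrinsic lock is
    // taken, so updates to `b` never block callers of incrementA().
    public void incrementB() {
        synchronized (bLock) {
            b++;
        }
    }

    public synchronized long getA() { return a; }
    public long getB() { synchronized (bLock) { return b; } }

    public static long demo() {
        Counters c = new Counters();
        Thread t1 = new Thread(() -> { for (int i = 0; i < 10_000; i++) c.incrementA(); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 10_000; i++) c.incrementB(); });
        try {
            t1.start(); t2.start();
            t1.join(); t2.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return c.getA() + c.getB(); // 10000 + 10000
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 20000
    }
}
```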
The .NET framework provides synchronization primitives with which multi-threaded applications can be controlled without race conditions. Synchronization is designed to be cooperative: every thread or process must follow the synchronization mechanism before accessing a protected resource (critical section) for the results to be consistent. Locking, signaling, lightweight synchronization types, spin waiting, and interlocked operations are the mechanisms related to synchronization in .NET.
Synchronization examples
The following are some synchronization examples on different platforms:
Synchronization in Windows
- Uses interrupt masks to protect access to global resources (critical section) on uniprocessor systems.
- Uses spinlocks on multiprocessor systems: a thread holding a spinlock is never preempted.
- Also provides dispatcher objects in user-land, which may act as mutexes, semaphores, events, and timers:
- Events: An event acts much like a condition variable.
- Timers: Timers notify one or more threads when the set time has expired.
- Dispatcher objects: A dispatcher object is in either a signaled state (the object is available) or a non-signaled state (a thread attempting to acquire it will block).
Synchronization in Linux
- Prior to kernel version 2.6, Linux disabled interrupts to implement short critical sections.
- Version 2.6 and later is fully preemptive.
- Linux provides:
- semaphores
- spinlocks
- reader-writer versions of both
- On single-CPU systems, spinlocks are replaced by enabling and disabling kernel preemption.
Synchronization in Solaris
- Solaris controls access to critical sections using five tools:
- semaphores
- condition variables
- adaptive mutexes
- reader-writer locks
- turnstiles
Adaptive mutexes are basically binary semaphores that are implemented differently depending on the conditions:
- On a single-processor system, a thread that finds the mutex held sleeps until the lock is released.
- On a multi-processor system, if the thread holding the mutex is running on the same processor as the blocked thread, or if the holding thread is not running at all, then the blocked thread sleeps just as on a single-processor system.
- However, if the holding thread is currently running on a different processor than the blocked thread, the blocked thread spins, on the assumption that the lock will soon be released.
- Adaptive mutexes are only used for protecting short critical sections, where the benefit of not doing context switching is worth a short bit of spinlocking. Otherwise traditional semaphores and condition variables are used.
Reader-writer locks are used only for protecting longer sections of code which are accessed frequently but which are changed infrequently.
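Conceptually similar reader-writer locks exist at user level as well; the following sketch uses Java's ReentrantReadWriteLock as a stand-in for the kernel-level locks described above (the class and method names are illustrative):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch: a reader-writer lock lets many readers proceed concurrently
// while a writer gets exclusive access.
public class SharedConfig {
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    private int value = 0;

    public int read() {
        rw.readLock().lock();  // many threads may hold the read lock at once
        try { return value; }
        finally { rw.readLock().unlock(); }
    }

    public void write(int v) {
        rw.writeLock().lock(); // exclusive: blocks both readers and writers
        try { value = v; }
        finally { rw.writeLock().unlock(); }
    }

    public static int demo() {
        SharedConfig cfg = new SharedConfig();
        cfg.write(42);
        return cfg.read();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 42
    }
}
```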
A turnstile is a queue of threads waiting on a lock.
- Each synchronized object which has threads blocked waiting for access to it needs a separate turnstile. For efficiency, however, the turnstile is associated with the thread currently holding the object, rather than the object itself.
- In order to prevent priority inversion, the thread holding a lock for an object will temporarily acquire the highest priority of any process in the turnstile waiting for the blocked object. This is called a priority-inheritance protocol.
- User threads are controlled the same as for kernel threads, except that the priority-inheritance protocol does not apply.
Synchronization in Pthreads
The Pthreads API is OS-independent. It provides:
- mutex locks
- condition variables
- read-write locks
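The mutex-plus-condition-variable pattern that Pthreads exposes as pthread_mutex_t and pthread_cond_t recurs across threading libraries; below is an illustrative Java analogue using Lock and Condition (the class name OneShotGate is hypothetical):

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: a mutex paired with a condition variable, Java-style.
public class OneShotGate {
    private final Lock lock = new ReentrantLock();
    private final Condition opened = lock.newCondition();
    private boolean open = false;

    public void awaitOpen() throws InterruptedException {
        lock.lock();
        try {
            while (!open) opened.await(); // loop guards against spurious wakeups
        } finally {
            lock.unlock();
        }
    }

    public void open() {
        lock.lock();
        try {
            open = true;
            opened.signalAll();           // wake every waiting thread
        } finally {
            lock.unlock();
        }
    }

    public static boolean demo() {
        OneShotGate gate = new OneShotGate();
        boolean[] passed = {false};
        Thread waiter = new Thread(() -> {
            try { gate.awaitOpen(); passed[0] = true; }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        try {
            waiter.start();
            gate.open();
            waiter.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return passed[0];
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints true
    }
}
```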
See also
- Lock (computer science) and mutex
- Monitor (synchronization)
- Semaphore (programming)
- Simple Concurrent Object-Oriented Programming (SCOOP)
- Java concurrency
Data synchronization
A distinctly different (but related) concept is that of data synchronization. This refers to the need to keep multiple copies of a set of data coherent with one another, or to maintain data integrity, as shown in Figure 3. For example, database replication is used to keep multiple copies of data synchronized with database servers that store data in different locations.
Challenges in data synchronization
The following are some of the challenges in data synchronization:
- data formats complexity
- real-timeliness
- data security
- data quality
- performance
Data Formats Complexity
As an organization grows and evolves, its data formats vary, which creates a need not only to build a simple interface between the two applications (source and target), but also to transform the data while passing it to the target application(s). ETL (extract, transform, load) tools can be helpful at this stage.
Real-timeliness
Today's systems are increasingly real-time. Customers want to see the current status of their order in an e-shop, the status of a parcel delivery (real-time parcel tracking), the current balance of their account, and so on. The systems behind them must therefore also be updated in real time, for example to enable a smooth manufacturing process by ordering material when the enterprise is running out of stock, or by synchronizing customer orders with the manufacturing process. In many real-life cases, real-time operation is either a competitive advantage or a necessity.
Data security
Different systems may have different policies to enforce data security and access levels. Even if security is maintained correctly in the source system that captures the data, the security and information-access privileges must be enforced on the target systems as well, to prevent any potential misuse of the information. This is particularly an issue when handling personal information or any confidential information covered by a non-disclosure agreement (NDA). Any intermediate results of the data transfer, as well as the data transfer itself, must be encrypted.
Data quality
Maintaining data in one place and sharing it with other applications is best practice for managing and improving data quality. It prevents inconsistencies caused by the same data being updated independently in multiple systems.
Performance
The data synchronization process consists of five basic phases:
- Data extraction from the source (master/main) system
- Data transfer
- Data transformation
- Data transfer
- Data load to the target system
When the data set is large, each of these steps may affect performance; therefore, the synchronization process needs to be carefully planned to avoid any negative impact.
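The five phases can be caricatured as a toy pipeline. In this deliberately simplified Java sketch, plain lists stand in for real data stores, and an upper-casing step stands in for real transformation logic; nothing here reflects an actual synchronization product:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Toy sketch of the five data-synchronization phases.
public class SyncPipeline {
    public static List<String> synchronize(List<String> sourceSystem, List<String> targetSystem) {
        List<String> extracted = List.copyOf(sourceSystem);   // 1. extract from the source system
        List<String> inTransit = List.copyOf(extracted);      // 2. transfer (e.g. to middleware)
        List<String> transformed = inTransit.stream()         // 3. transform to the target format
                .map(String::toUpperCase)
                .collect(Collectors.toList());
        List<String> delivered = List.copyOf(transformed);    // 4. transfer to the target side
        targetSystem.clear();                                 // 5. load into the target system
        targetSystem.addAll(delivered);
        return targetSystem;
    }

    public static void main(String[] args) {
        List<String> target = new ArrayList<>();
        System.out.println(synchronize(List.of("a", "b"), target)); // prints [A, B]
    }
}
```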
Examples include:
- File synchronization, such as syncing a hand-held MP3 player to a desktop computer.
- Cluster file systems, which are file systems that maintain data or indexes in a coherent fashion across a whole computing cluster.
- Cache coherency, maintaining multiple copies of data in sync across multiple caches.
- RAID, where data is written in a redundant fashion across multiple disks, so that the loss of any one disk does not lead to a loss of data.
- Database replication, where copies of data on a database are kept in sync, despite possible large geographical separation.
- Journaling, a technique used by many modern file systems to make sure that file metadata are updated on a disk in a coherent, consistent manner.
Mathematical foundations
Synchronization was originally a process-based concept whereby a lock could be obtained on an object. Its primary usage was in databases. There are two types of (file) lock: read-only and read-write. Read-only locks may be obtained by many processes or threads, while read-write locks are exclusive and may only be held by a single process or thread at a time.
Although locks were derived for file databases, data is also shared in memory between processes and threads. Sometimes more than one object (or file) must be locked at a time; if the individual locks are not acquired simultaneously, or in a consistent order, the acquisitions of different processes can overlap, causing a deadlock.
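One standard way to avoid such overlapping acquisitions is to take multiple locks in a fixed global order. The following Java sketch (with illustrative names; tie-breaking for equal identity hash codes is omitted for brevity) demonstrates the idea with two opposite-direction transfers that could deadlock under naive lock ordering:

```java
// Sketch: a fixed global lock order prevents overlapping acquisitions.
public class OrderedLocking {
    // Always lock the object that is smaller in a global order first, so
    // two threads can never each hold one lock while waiting for the other.
    public static void lockBoth(Object a, Object b, Runnable criticalSection) {
        Object first = System.identityHashCode(a) <= System.identityHashCode(b) ? a : b;
        Object second = (first == a) ? b : a;
        synchronized (first) {
            synchronized (second) {
                criticalSection.run();
            }
        }
    }

    private static int balanceA;
    private static int balanceB;
    private static final Object lockA = new Object();
    private static final Object lockB = new Object();

    public static int demo() {
        balanceA = 100;
        balanceB = 100;
        // Two transfers in opposite directions: with naive lock order this
        // could deadlock; with a global order it cannot.
        Thread t1 = new Thread(() ->
            lockBoth(lockA, lockB, () -> { balanceA -= 10; balanceB += 10; }));
        Thread t2 = new Thread(() ->
            lockBoth(lockB, lockA, () -> { balanceB -= 20; balanceA += 20; }));
        try {
            t1.start(); t2.start();
            t1.join(); t2.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return balanceA + balanceB; // the total is conserved
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 200
    }
}
```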
An abstract mathematical foundation for synchronization primitives is given by the history monoid. There are also many higher-level theoretical devices, such as process calculi and Petri nets, which can be built on top of the history monoid.
See also
- Futures and promises, synchronization mechanisms in pure functional paradigms.
References
- Janssen, Cory. "Thread Synchronization". Techopedia. Retrieved 23 November 2014.
- Fatheisian, Halleh; Rosenberger, Eric. "Synchronization". Department of Computer Science, George Mason University. Retrieved 23 November 2014.
- Silberschatz, Abraham; Gagne, Greg; Galvin, Peter Baer (July 11, 2008). "Chapter 6: Process Synchronization". Operating System Concepts (Eighth ed.). John Wiley & Sons. ISBN 978-0-470-12872-5.
- Hennessy, John L.; Patterson, David A. (September 30, 2011). "Chapter 5: Thread-Level Parallelism". Computer Architecture: A Quantitative Approach (Fifth ed.). Morgan Kaufmann. ISBN 978-0-123-83872-8.
- "Synchronization Primitives in .Net framework". MSDN - The Microsoft Developer Network. Microsoft. Retrieved 23 November 2014.
- Silberschatz, Abraham; Gagne, Greg; Galvin, Peter Baer (December 7, 2012). "Chapter 5: Process Synchronization". Operating System Concepts (Ninth ed.). John Wiley & Sons. ISBN 978-1-118-06333-0.
- "Data Synchronization". Javlin Inc. Retrieved 23 November 2014.
- Schneider, Fred B. (1997). On concurrent programming. Springer-Verlag New York, Inc. ISBN 0-387-94942-9.
External links
- Anatomy of Linux synchronization methods at IBM developerWorks
- The Little Book of Semaphores, by Allen B. Downey