Synchronization (computer science)
In computer science, synchronization refers to one of two distinct but related concepts: synchronization of processes, and synchronization of data. Process synchronization refers to the idea that multiple processes join up or handshake at a certain point in order to reach an agreement or commit to a certain sequence of actions. Data synchronization refers to the idea of keeping multiple copies of a dataset coherent with one another, or of maintaining data integrity. Process synchronization primitives are commonly used to implement data synchronization.
- 1 Thread or process synchronization
- 1.1 Classic Problems of Synchronization
- 1.2 Hardware Synchronization
- 1.3 Synchronization strategies in programming languages
- 1.4 Synchronization examples
- 1.5 See also
- 2 Data synchronization
- 3 Mathematical foundations
- 4 See also
- 5 References
- 6 External links
Thread or process synchronization
Thread synchronization, also known as serialization, is a mechanism which ensures that two or more concurrent processes or threads do not simultaneously execute a particular program segment, known as the critical section. This guarantee is called mutual exclusion. When one thread starts executing the critical section (the serialized segment of the program), any other thread must wait until the first thread finishes. Without proper synchronization, a race condition may occur, in which the values of variables are unpredictable and vary with the timing of context switches between the processes or threads.
For example, suppose that there are three processes, namely 1, 2 and 3. All three are executing concurrently and need to share a common resource (critical section), as shown in Figure 1. Synchronization should be used here to avoid conflicts when accessing this shared resource. Hence, when Processes 1 and 2 both try to access the resource, it should be assigned to only one process at a time. If it is assigned to Process 1, the other process (Process 2) must wait until Process 1 frees the resource (as shown in Figure 2).
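The scenario above can be sketched in Python, with threads standing in for the three processes. This is a minimal illustration rather than a reference implementation: the names SharedCounter and run_workers are hypothetical, and threading.Lock is used as the mutual-exclusion primitive.

```python
import threading


class SharedCounter:
    """A shared resource whose update is treated as a critical section."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time can hold the lock, so this
        # read-modify-write cannot interleave with another thread's.
        with self._lock:
            current = self.value
            self.value = current + 1


def run_workers(num_threads=3, increments=10_000):
    """Start num_threads workers that all update the same counter."""
    counter = SharedCounter()

    def work():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(run_workers())
```

Because every increment happens while the lock is held, the final value equals the number of threads multiplied by the number of increments, regardless of how the threads are scheduled.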
Another synchronization requirement which needs to be considered is the order in which particular processes or threads should be executed. For example, we cannot board a plane until we buy the required ticket. Similarly, we cannot check emails before validating our credentials (i.e., user name and password). In the same way, an ATM will not provide any service until we provide it with a correct PIN.
Other than mutual exclusion, synchronization also deals with the following:
- Deadlock: This occurs when several processes are each waiting for a shared resource (critical section) that is held by another process. The processes then keep waiting and execute no further.
- Starvation: This occurs when a process is waiting to enter the critical section, but other processes keep entering it ahead of the waiting process, which may wait indefinitely.
- Priority inversion: This occurs when a high-priority process is effectively delayed by a lower-priority one, for example when a medium-priority process preempts the low-priority process holding a resource that the high-priority process is waiting for. This violates the intended priority ordering and may have serious consequences in real-time systems.
- Busy waiting: This occurs when a waiting process repeatedly checks whether its turn has come. This polling takes processing time away from other processes.
Access by processes to a critical section is controlled by synchronization techniques. This applies in a number of domains.
Classic Problems of Synchronization
The following are some classic problems of synchronization:
- The Producer-Consumer Problem (also called the Bounded Buffer Problem)
- The Readers-Writers Problem
- The Dining Philosophers Problem
These problems are used to test nearly every newly proposed synchronization scheme or primitive.
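As an illustration of the first of these problems, the following is a minimal sketch of a bounded buffer in Python. The names BoundedBuffer and produce_and_consume are hypothetical; the sketch assumes one producer and one consumer and uses two condition variables that share a single lock.

```python
import threading
from collections import deque


class BoundedBuffer:
    """Fixed-capacity FIFO buffer shared by producers and consumers."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        lock = threading.Lock()
        # Both conditions share one lock, so the buffer state is
        # always inspected and changed under mutual exclusion.
        self._not_full = threading.Condition(lock)
        self._not_empty = threading.Condition(lock)

    def put(self, item):
        with self._not_full:
            while len(self._items) >= self._capacity:
                self._not_full.wait()      # producer sleeps while full
            self._items.append(item)
            self._not_empty.notify()       # wake one waiting consumer

    def get(self):
        with self._not_empty:
            while not self._items:
                self._not_empty.wait()     # consumer sleeps while empty
            item = self._items.popleft()
            self._not_full.notify()        # wake one waiting producer
            return item


def produce_and_consume(n_items=20, capacity=4):
    """Run one producer and one consumer; return the items consumed."""
    buffer = BoundedBuffer(capacity)
    consumed = []

    def producer():
        for i in range(n_items):
            buffer.put(i)

    def consumer():
        for _ in range(n_items):
            consumed.append(buffer.get())

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed


if __name__ == "__main__":
    print(produce_and_consume())
```

Waiting on a condition variable inside a while loop, rather than an if, means the predicate is rechecked after every wake-up, which keeps the buffer correct even if a thread is woken when it cannot yet proceed.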
Hardware Synchronization
Many systems provide hardware support for critical section code.
A uniprocessor system could disable interrupts so that the currently running code executes without preemption; this approach is very inefficient on multiprocessor systems. "The key ability we require to implement synchronization in a multiprocessor is a set of hardware primitives with the ability to atomically read and modify a memory location. Without such a capability, the cost of building basic synchronization primitives will be too high and will increase as the processor count increases. There are a number of alternative formulations of the basic hardware primitives, all of which provide the ability to atomically read and modify a location, together with some way to tell if the read and write were performed atomically. These hardware primitives are the basic building blocks that are used to build a wide variety of user-level synchronization operations, including things such as locks and barriers. In general, architects do not expect users to employ the basic hardware primitives, but instead expect that the primitives will be used by system programmers to build a synchronization library, a process that is often complex and tricky." Much modern hardware provides special atomic instructions that either test and set a memory word or compare and swap the contents of two memory words.
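The way a lock can be built on top of a test-and-set primitive can be modelled in Python. This is only a simulation under stated assumptions: Python exposes no hardware test-and-set instruction, so the hypothetical SimulatedAtomicFlag class uses a private lock to stand in for the atomicity that real hardware provides in a single instruction. The names SpinLock and count_with_spinlock are likewise illustrative.

```python
import threading
import time


class SimulatedAtomicFlag:
    """Models one memory word with an atomic test-and-set operation.

    Assumption: the private lock below only imitates the atomicity
    that a real test-and-set instruction provides in hardware.
    """

    def __init__(self):
        self._flag = False
        self._atomic = threading.Lock()

    def test_and_set(self):
        """Atomically set the flag and return its previous value."""
        with self._atomic:
            old = self._flag
            self._flag = True
            return old

    def clear(self):
        with self._atomic:
            self._flag = False


class SpinLock:
    """A lock that busy-waits on test-and-set until the flag was clear."""

    def __init__(self):
        self._flag = SimulatedAtomicFlag()

    def acquire(self):
        # A previous value of True means another thread holds the lock.
        while self._flag.test_and_set():
            time.sleep(0)  # yield so the holder can make progress

    def release(self):
        self._flag.clear()


def count_with_spinlock(num_threads=4, increments=500):
    lock = SpinLock()
    total = [0]

    def work():
        for _ in range(increments):
            lock.acquire()
            try:
                total[0] += 1
            finally:
                lock.release()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


if __name__ == "__main__":
    print(count_with_spinlock())
```

The acquire loop is a form of busy waiting, which is why spinlocks of this kind are generally reserved for very short critical sections.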
Synchronization strategies in programming languages
- Synchronized method: In Java, the method declaration includes the synchronized keyword. When a thread invokes a synchronized method, it automatically acquires the intrinsic lock for that method's object and releases it when the method returns, even if the return was caused by an uncaught exception.
- Synchronized statement: A block of code is declared to be synchronized. Unlike synchronized methods, synchronized statements must specify the object that provides the intrinsic lock. Synchronized statements are useful for improving concurrency with fine-grained synchronization, because they avoid unnecessary blocking.
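The difference between the two idioms can be approximated in Python, which has no synchronized keyword. The sketch below is an analogy, not Java's mechanism: the synchronized decorator, the _monitor attribute and the Account class are all hypothetical names. A reentrant lock is used because Java's intrinsic locks are reentrant.

```python
import threading
from functools import wraps


def synchronized(method):
    """Hypothetical decorator: hold the instance's lock for the whole call."""

    @wraps(method)
    def wrapper(self, *args, **kwargs):
        # The lock is released on normal return and on an exception alike.
        with self._monitor:
            return method(self, *args, **kwargs)

    return wrapper


class Account:
    def __init__(self, balance=0):
        self._monitor = threading.RLock()   # stands in for the intrinsic lock
        self._log_lock = threading.Lock()   # separate lock for the log only
        self.balance = balance
        self.audit_log = []

    @synchronized
    def deposit(self, amount):
        # Method-level locking: the entire body is the critical section.
        self.balance += amount

    def record(self, message):
        entry = message.upper()             # needs no lock
        # Block-level locking: only the shared append is guarded, and it
        # names its own lock object, so it never blocks deposit().
        with self._log_lock:
            self.audit_log.append(entry)


def demo(num_threads=4, deposits=1000):
    account = Account()

    def work():
        for _ in range(deposits):
            account.deposit(1)
        account.record("done")

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance, account.audit_log


if __name__ == "__main__":
    print(demo())
```

Guarding the log with its own lock is the kind of fine-grained choice that a synchronized statement allows: threads that are logging do not delay threads that are depositing.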
In the .NET Framework, synchronization primitives can be used to control multi-threaded applications so that they run without race conditions. "Synchronization is designed to be cooperative, demanding that every thread or process follow the synchronization mechanism before accessing protected resources (critical section) for consistent results." In .NET, locking, signaling, lightweight synchronization types, SpinWait and interlocked operations are some of the mechanisms related to synchronization.
Synchronization examples
The following are some synchronization examples for different platforms:
Synchronization in Windows
- Interrupt masks are used to protect access to global resources (critical section) on uniprocessor systems.
- Spinlocks: On multiprocessor systems, spinlocks are used, and a thread holding a spinlock is never preempted.
- "Also provides dispatcher objects user-land which may act mutexes, semaphores, events, and timers":
- Events: "An event acts much like a condition variable."
- Timers: "Timers notify one or more thread when time expired."
- Dispatcher: "Dispatcher objects either signaled-state (object available) or non-signaled state (thread will block)."
Synchronization in Linux
- "Prior to kernel Version 2.6, disables interrupts to implement short critical sections."
- "Version 2.6 and later, fully preemptive."
- Linux provides:
- semaphores
- spinlocks
- reader-writer versions of both
- Enabling and disabling of kernel preemption replaced spinlocks on single-CPU systems.
Synchronization in Solaris
To control access to critical sections, Solaris uses the following five tools:
- semaphores
- condition variables
- adaptive mutexes
- reader-writer locks
- turnstiles
"Adaptive mutexes are basically binary semaphores that are implemented differently depending upon the conditions":
- "On a single processor system, the semaphore sleeps when it is blocked, until the block is released."
- "On a multi-processor system, if the thread that is blocking the semaphore is running on the same processor as the thread that is blocked, or if the blocking thread is not running at all, then the blocked thread sleeps just like a single processor system."
- "However if the blocking thread is currently running on a different processor than the blocked thread, then the blocked thread does a spinlock, under the assumption that the block will soon be released."
- "Adaptive mutexes are only used for protecting short critical sections, where the benefit of not doing context switching is worth a short bit of spinlocking. Otherwise traditional semaphores and condition variables are used."
Reader-writer locks are used for longer sections of code that access data which is read frequently but modified rarely.
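The general idea of a reader-writer lock can be sketched in Python as follows. This is a simple reader-preferring design built on one condition variable, offered as an illustration only; it is not how Solaris implements its locks, and the names ReadWriteLock and demo are hypothetical.

```python
import threading


class ReadWriteLock:
    """Many concurrent readers, or one exclusive writer."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0       # number of threads currently reading
        self._writer = False    # True while a writer holds the lock

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()   # a waiting writer may proceed

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()       # wake readers and writers


def demo(num_readers=4, num_writes=50):
    lock = ReadWriteLock()
    data = {"version": 0}
    seen = [[] for _ in range(num_readers)]   # one list per reader

    def reader(index):
        for _ in range(num_writes):
            lock.acquire_read()
            try:
                seen[index].append(data["version"])
            finally:
                lock.release_read()

    def writer():
        for _ in range(num_writes):
            lock.acquire_write()
            try:
                data["version"] += 1
            finally:
                lock.release_write()

    threads = [threading.Thread(target=reader, args=(i,))
               for i in range(num_readers)]
    threads.append(threading.Thread(target=writer))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return data["version"], seen


if __name__ == "__main__":
    version, _ = demo()
    print(version)
```

Because new readers are admitted whenever no writer is active, a steady stream of readers can starve a writer in this design; implementations that must avoid this usually give waiting writers priority.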
A turnstile is a queue of threads that are waiting on an acquired lock.
- "Each synchronized object which has threads blocked waiting for access to it needs a separate turnstile. For efficiency, however, the turnstile is associated with the thread currently holding the object, rather than the object itself."
- "In order to prevent priority inversion, the thread holding a lock for an object will temporarily acquire the highest priority of any process in the turnstile waiting for the blocked object. This is called a priority-inheritance protocol."
- "User threads are controlled the same as for kernel threads, except that the priority-inheritance protocol does not apply."
Pthreads is an OS-independent API that provides:
- mutex locks
- condition variables
- read-write locks
See also
- Lock (computer science) and mutex
- Monitor (synchronization)
- Semaphore (programming)
- Simple Concurrent Object-Oriented Programming (SCOOP)
- Java concurrency
Data synchronization
A distinctly different (but related) concept is that of data synchronization. This refers to the need to keep multiple copies of a set of data coherent with one another or to maintain data integrity (Figure 3). For example, database replication is used to keep multiple copies of data synchronized with database servers that store data in different locations.
Examples include:
- File synchronization, such as syncing a hand-held MP3 player to a desktop computer.
- Cluster file systems, which are file systems that maintain data or indexes in a coherent fashion across a whole computing cluster.
- Cache coherency, maintaining multiple copies of data in sync across multiple caches.
- RAID, where data is written in a redundant fashion across multiple disks, so that the loss of any one disk does not lead to a loss of data.
- Database replication, where copies of data on a database are kept in sync, despite possible large geographical separation.
- Journaling, a technique used by many modern file systems to make sure that file metadata are updated on a disk in a coherent, consistent manner.
Challenges in data synchronization
Some of the challenges that users may face in data synchronization:
- Data Formats Complexity
- Real-timeliness
- Data Security
- Data Quality
- Performance
Data Formats Complexity
Data usually starts out in a very simple format. The format changes over time as the organization grows and evolves, which "results not only in building a simple interface between the two applications (source and target), but also in a need to transform the data while passing them to the target application(s)". ETL (extraction, transformation, loading) tools can be very helpful at this stage for managing data format complexity.
Real-timeliness
Systems are increasingly expected to operate in real time. "Customers want to see what the status of their order in e-shop is; the status of a parcel delivery - a real time parcel tracking; what the current balance on their account is; etc." This shows the need for a real-time system that is itself kept up to date, so that the manufacturing process can also run smoothly in real time, "e.g. ordering material when enterprise is running out stock; synchronizing customer orders with manufacturing process, etc." There are many real-life examples in which real-time processing provides a competitive advantage.
Data Security
There are no fixed rules or policies for enforcing data security; they vary depending on the system in use. "Even though the security is maintained correctly in the source system which captures the data, the security and information access privileges must be enforced on the target systems as well to prevent any potential misuse of the information." This is a serious issue, particularly when handling secret, confidential and personal information. Because of this sensitivity, data transfers and all intermediate information must be encrypted.
Data Quality
Data quality is another serious constraint. To manage data well and maintain its quality, the common practice is to store the data in one location and share it with different people, systems and applications in different locations. This helps prevent inconsistencies in the data.
Performance
The data synchronization process involves five phases:
- Data extraction from the source (master/main) system
- Data transfer
- Data transformation
- Data transfer
- Data load to the target system
Each of these steps is very critical. In case of large amounts of data, the synchronization process needs to be carefully planned and executed to avoid any negative impact on performance.
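The five phases can be sketched as a small Python pipeline. Everything here is an illustrative assumption: the function names, the record fields (customer_id, full_name) and the use of a JSON round trip as a stand-in for a network transfer are not taken from any particular tool.

```python
import json


def extract(source):
    """Phase 1: copy the records out of the source (master) system."""
    return [dict(row) for row in source]


def transfer(payload):
    """Phases 2 and 4: a JSON round trip stands in for a network hop."""
    return json.loads(json.dumps(payload))


def transform(records):
    """Phase 3: convert source fields into the target system's format."""
    return [
        {"id": r["customer_id"], "name": r["full_name"].strip().title()}
        for r in records
    ]


def load(target, records):
    """Phase 5: write the converted records into the target system."""
    for r in records:
        target[r["id"]] = r
    return target


def synchronize(source, target):
    staged = transfer(extract(source))      # extract, then first transfer
    converted = transform(staged)           # transform
    return load(target, transfer(converted))  # second transfer, then load


if __name__ == "__main__":
    source_rows = [{"customer_id": 1, "full_name": " ada lovelace "}]
    print(synchronize(source_rows, {}))
```

Keeping each phase as a separate function makes it easier to measure where time is spent, which matters when large volumes of data have to be synchronized.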
Mathematical foundations
Synchronization was originally a process-based concept whereby a lock could be obtained on an object. Its primary usage was in databases. There are two types of (file) lock: read-only and read-write. Read-only locks may be held by many processes or threads at once. Read-write locks are exclusive, as they may only be held by a single process or thread at a time.
Although locks were derived for file databases, data is also shared in memory between processes and threads. Sometimes more than one object (or file) is locked at a time. If the locks are not acquired simultaneously, the acquisitions can overlap, causing a deadlock.
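One common way to avoid this kind of deadlock is to acquire multiple locks in a fixed global order. The sketch below illustrates that technique in Python; it is one possible approach rather than the only one, and the names Resource, move and run_opposing_moves are hypothetical.

```python
import threading


class Resource:
    """An object (or file) protected by its own lock."""

    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.lock = threading.Lock()


def move(src, dst, amount):
    """Update two resources while holding both of their locks."""
    # Always lock in the same global order (here: by name), so two
    # threads can never each hold one lock while waiting for the other.
    first, second = sorted((src, dst), key=lambda r: r.name)
    with first.lock:
        with second.lock:
            src.value -= amount
            dst.value += amount


def run_opposing_moves(rounds=1000):
    a = Resource("a", 1000)
    b = Resource("b", 1000)

    def a_to_b():
        for _ in range(rounds):
            move(a, b, 1)

    def b_to_a():
        for _ in range(rounds):
            move(b, a, 1)

    threads = [threading.Thread(target=a_to_b),
               threading.Thread(target=b_to_a)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.value, b.value


if __name__ == "__main__":
    print(run_opposing_moves())
```

If each thread instead locked its own source first and its destination second, the two threads could each take one lock and then wait forever for the other.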
An abstract mathematical foundation for synchronization primitives is given by the history monoid. There are also many higher-level theoretical devices, such as process calculi and Petri nets, which can be built on top of the history monoid.
See also
- Futures and promises, synchronization mechanisms in pure functional paradigms.
References
- Janssen, Cory. "Thread Synchronization". Techopedia. Retrieved 23 November 2014.
- Fatheisian, Halleh; Rosenberger, Eric. "Synchronization". Department of Computer Science, George Mason University. Retrieved 23 November 2014.
- Silberschatz, Abraham; Gagne, Greg; Galvin, Peter Baer (July 11, 2008). "Chapter 6: Process Synchronization". Operating System Concepts (Eighth ed.). John Wiley & Sons. ISBN 978-0-470-12872-5.
- Hennessy, John L.; Patterson, David A. (September 30, 2011). "Chapter 5: Thread-Level Parallelism". Computer Architecture: A Quantitative Approach (Fifth ed.). Morgan Kaufmann. ISBN 978-0-123-83872-8.
- "Synchronization Primitives in .Net framework". MSDN - The Microsoft Developer Network. Microsoft. Retrieved 23 November 2014.
- Silberschatz, Abraham; Gagne, Greg; Galvin, Peter Baer (December 7, 2012). "Chapter 5: Process Synchronization". Operating System Concepts (Ninth ed.). John Wiley & Sons. ISBN 978-1-118-06333-0.
- "Data Synchronization". Javlin Inc. Retrieved 23 November 2014.
- Schneider, Fred B. (1997). On concurrent programming. Springer-Verlag New York, Inc. ISBN 0-387-94942-9.
External links
- Anatomy of Linux synchronization methods at IBM developerWorks
- The Little Book of Semaphores, by Allen B. Downey