Asked By : user2555595
Answered By : Gilles
Low-level atomicity primitives
At the level of the interface between hardware and software, atomicity is typically provided by an operation that combines a read and a write operation. Here are two common examples:
- test-and-set: read the old value of a memory location, and write a new value. This is an atomic operation: the read is always executed together with the subsequent write. If two threads execute test-and-set on the same location then one thread performs its read and its write first, then the other thread performs its read and its write.
- compare-and-swap: read the current value of a memory location, and if it has a given value, write a new value to it. This is an atomic operation: the read and comparison are always executed together with the subsequent write if any. If two threads execute compare-and-swap on the same location, then one thread performs its read and comparison and may or may not perform a write based on the current value, then the other thread compares the value that may or may not have changed and may or may not perform its write.
These primitives are awkward to use in applications for several reasons.
- They are not portable: different CPU types provide different atomic primitives. Unless you’re writing software that’s only meant for use on one particular CPU type, these primitives need to be wrapped inside a higher-level interface.
- They are not the whole story. This aspect is often neglected in introductory concurrency courses, which is understandable because it can get complicated. However, I’ve found that programmers who write low-level concurrent code are not always of it, which is more problematic. These primitives are not always sufficient to implement concurrency between threads that are running on different processors on a multiprocessor machine. (Here, by processor, I mean separate execution threads — you may prefer to use the word “core” instead.) If different processors share the same memory but have separate caches, then processor 1 may perform an atomic test-and-set to its cache while processor 2 performs an atomic test-and-set to its cache — and then there is no synchronization between the two processors. Depending on the processor type, the software may need to trigger a cache flush or invalidation.
- Continuing on “not the whole story”, even on a single processor, memory instructions may be executed out of order, to optimize the use of the memory bus. Consider for example a program uses one a CAS instruction to start a critical section that contains assignments. If the processor decides to reorder the assignments before the CAS instruction, the desired synchronization isn’t happening! The software may need to generate appropriate memory barrier instructions.
Furthermore, these primitives only provide atomicity. Synchronization is more than that.
Synchronization is atomicity plus events
Synchronization problems broadly fall into two categories:
- Avoid certain orders of operations which lead to wrong results — a common case is performing a series of elementary operations atomically.
- React to an event: do something when something else happens.
Consider what happens when two threads are competing for a critical section. One thread gets in, and the other thread doesn’t thanks to the use of a low-level primitive. Now what? The other thread needs to wait until the resource is free. At the hardware level, the most common mechanism to wait for an event is to wait for an interrupt. This is the pretty much universal mechanism to trigger software events from hardware events. Interrupts are rarely applicable to synchronization between software. Waiting for a software event is usually provided by the operating system scheduler or the runtime library: the thread that needs to wait is suspended and placed on an event queue that is associated with the resource that the thread is waiting on. When the resource is released by its current holder, the thread at the front of the event queue is woken up. Depending on the design, it may either be assigned the resource at this point, or it may need to try acquiring the resource again. There are cases when it is a good strategy to busy-wait: if the critical section is short and the threads are executing on separate processors in a multiprocessor machine, then it may be more efficient for the thread to try and try again until it managed to acquire the resource, than to go through the whole business with an event queue (which itself must be synchronized between the processors — see the part about cache coherency above). This is only a good strategy for short critical sections, otherwise the busy wait hogs the CPU (which can’t be used to run other threads, and is consuming energy). This usually doesn’t apply to single-processor systems as the active waiting is hogging the CPU, delaying the other thread from finishing, although it can sometimes be applicable to system with cooperative multiprocessing where the waiting thread yields execution to other threads. A synchronization primitive such as a mutex is typically not executed as a critical section. For example, consider a typical spinlock operation (a mutex with busy-waiting) based on compare-and-swap in a coherent memory model (with no operation reordering and no need for cache operations). The mutex variable is initialized with the value 0. To acquire the mutex:
- With CAS, set the mutex variable to 1 if it was equal to 0.
- If CAS returns 0, the lock acquisition succeeded, so return to the caller.
- Otherwise, the lock acquisition failed. Jump to step 1.
There is no need for the logic in steps 2 and 3 to be synchronized with what other threads are doing.
Best Answer from StackOverflow
Question Source : http://cs.stackexchange.com/questions/17945