Question

Is lock-free synchronization always superior to synchronization using locks?

In C++, there is one atomic type, std::atomic<T>. It may or may not be lock-free depending on the type T and on the current platform. If a lock-free implementation is available on a platform for a type T, most compilers will provide a lock-free atomic<T>. In that case, even if I want a non-lock-free atomic<T>, I can't have it.
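For concreteness, here is a minimal sketch of how to query this property (the Big struct is a hypothetical type invented for illustration, not anything from the standard):

    #include <atomic>
    #include <iostream>

    // A hypothetical type too large for the hardware's atomic instructions,
    // so std::atomic<Big> typically falls back to a lock-based implementation.
    struct Big { char data[64]; };

    int main() {
        std::atomic<int> small{0};
        std::atomic<Big> big{};

        std::cout << std::boolalpha
                  << "atomic<int> is lock-free: " << small.is_lock_free() << '\n'  // usually true
                  << "atomic<Big> is lock-free: " << big.is_lock_free()   << '\n'; // usually false

        // Since C++17 the same property is also available as a compile-time constant:
        std::cout << "atomic<int> always lock-free: "
                  << std::atomic<int>::is_always_lock_free << '\n';
    }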

The C++ standards committee decided to keep only one std::atomic<T> instead of both a std::atomic<T> and a std::lock_free<T> (implemented only for specific types). Does this imply that 'there is no case where using a non-lock-free atomic type would be a better choice over using a lock-free atomic type when the latter is available'? (Mainly in terms of performance rather than ease of use.)


Solution


Does this imply that 'there is no case where using a non-lock-free atomic type would be a better choice over using a lock-free atomic type when the latter is available'? (Mainly in terms of performance rather than ease of use.)

No. And that is, in general, not true.

Suppose you have two cores and three threads that are ready to run. Assume threads A and B are accessing the same collection and will contend significantly, while thread C is accessing totally different data and will see essentially no contention.

If threads A and B use locks, one of those threads will rapidly wind up being descheduled, and thread C will run on one core. This will allow whichever thread gets scheduled, A or B, to run with nearly no contention at all.

By contrast, with a lock-free collection, the scheduler never gets a chance to deschedule thread A or B. It is entirely possible that threads A and B will run concurrently through their entire timeslice, ping-ponging the same cache lines between their L2 caches the whole time.

In general, locks are more efficient than lock-free code. That's why locks are used so much more often in threaded code. However, std::atomic types are generally not used in contexts like this. It would likely be a mistake to use a std::atomic type in a context where you have reason to think a lock would be more efficient.
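As a minimal sketch of the two approaches being contrasted (a shared counter stands in for the collection, purely for illustration; the names are invented for the example):

    #include <atomic>
    #include <mutex>
    #include <thread>

    std::mutex m;
    long counter_with_lock = 0;

    void bump_with_lock() {
        // Under contention the losing thread blocks; the OS can deschedule it
        // and run an unrelated thread (thread C above) on the freed core.
        std::lock_guard<std::mutex> guard(m);
        ++counter_with_lock;
    }

    std::atomic<long> counter_lock_free{0};

    void bump_lock_free() {
        // No thread ever blocks, so the scheduler has no reason to intervene;
        // contending threads keep bouncing the cache line between cores.
        counter_lock_free.fetch_add(1, std::memory_order_relaxed);
    }

    int main() {
        std::thread a(bump_with_lock), b(bump_with_lock);
        a.join(); b.join();
        std::thread c(bump_lock_free), d(bump_lock_free);
        c.join(); d.join();
    }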

2023-02-28

Solution


Further to David Schwartz's excellent answer, I'd note that a great deal can depend upon what's scarce in your system overall.

If you have more threads that are ready to run than cores to run them, then what you generally want to do is detect as quickly as possible that there's contention over some resource, and put all but one of those contending threads to sleep, so you can schedule other threads onto those cores.

Lock-free tends to work out better in more or less the opposite situation: you have more hardware available at any given time than you have threads to run. In that case, a busy-wait with lock-free code can react very quickly when a resource becomes free, make its modification, and keep moving forward.
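As an illustration, here is a hedged sketch of that busy-wait pattern, a compare-exchange retry loop (the names are invented for the example):

    #include <atomic>

    std::atomic<int> value{0};

    // Retry the compare-exchange until it succeeds. With spare cores and
    // brief contention, a retry usually succeeds within a few iterations,
    // with no trip through the kernel scheduler.
    void add_one(std::atomic<int>& v) {
        int expected = v.load(std::memory_order_relaxed);
        while (!v.compare_exchange_weak(expected, expected + 1,
                                        std::memory_order_release,
                                        std::memory_order_relaxed)) {
            // On failure, "expected" is reloaded with the current value; retry.
        }
    }

    int main() { add_one(value); }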

The second consideration is how long contention is likely to last when it does happen. If you have lots of threads constantly "fighting" over a few resources, you're almost certainly better off putting most of them to sleep, letting a few (often only one) make progress as fast as possible, then switching to another and repeating.

But putting one thread to sleep and scheduling another means a trip into kernel mode and the scheduler. If the contention is expected to be short-lived, the constant switching between threads can add so much overhead that the system as a whole slows down considerably.
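One common compromise, sketched below under those assumptions (the retry count of 100 is an arbitrary illustrative value), is to spin briefly in user space and only block, with its trip through the kernel scheduler, if the contention persists:

    #include <mutex>

    void locked_work(std::mutex& m) {
        // Spin briefly in case the lock frees up within a few iterations.
        for (int i = 0; i < 100; ++i) {
            if (m.try_lock()) {
                // ... critical section ...
                m.unlock();
                return;
            }
        }
        // Contention persisted: give up the core and block.
        std::lock_guard<std::mutex> guard(m);
        // ... critical section ...
    }

    int main() {
        std::mutex m;
        locked_work(m);
    }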

2023-02-28