Question
Deeply understanding Volatile and Memory Barriers
I have been reading a ton on this topic over the past few days, but I would really like clarification on what I have learned so far in relation to C# and C. Firstly, atomicity seems to be fairly intrinsic in modern times, where by atomicity I am strictly referring to the ability to read or write a value within a single instruction. It seems that C# always naturally aligns its data (which also seems to be done in C, but I heard there are some cases where this is not true). By alignment I mean that, for example, on x64 a long is 8 bytes, so its virtual memory address will be a multiple of 8. This way the CPU can read and write it over its 8-byte-wide bus in a single cycle. If, however, the value were not aligned correctly and were off by a byte, the CPU might need 2 cycles to read or write it, which not only slows the program down but also removes atomicity, since another thread could observe partially written data. It's also worth mentioning that this seems to work for other types too: a 4-byte int can also be read and written within a single cycle, which makes it atomic as well. Lastly, this also seems to enable an extra optimization: when reading an array, we can read or write 2 ints at a time over the 8-byte-wide bus.
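To make the alignment point concrete, here is a minimal C11 sketch; `struct Sample`, `is_aligned`, and `report_alignment` are hypothetical names used only for illustration. It shows the compiler padding a struct so that an `int64_t` member starts at a multiple of its alignment, and prints `ATOMIC_LLONG_LOCK_FREE`, which reports whether the implementation can perform atomic `long long` operations without a lock. One caveat: even where an aligned 8-byte load is a single indivisible access in hardware, the C standard only promises race-free atomic access between threads for `_Atomic` objects, so plain aligned accesses are atomic in practice rather than by language guarantee.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical struct: the compiler inserts padding after `tag`
   so that `counter` starts at a multiple of alignof(int64_t). */
struct Sample {
    char tag;
    int64_t counter;
};

/* Compile-time check that the member is naturally aligned. */
static_assert(offsetof(struct Sample, counter) % alignof(int64_t) == 0,
              "counter must be naturally aligned");

/* Returns true if `ptr` is a multiple of `alignment`. */
static bool is_aligned(const void *ptr, size_t alignment) {
    return ((uintptr_t)ptr % alignment) == 0;
}

static void report_alignment(void) {
    struct Sample s = {0};
    printf("alignof(int64_t)          = %zu\n", alignof(int64_t));
    printf("offsetof(Sample, counter) = %zu\n", offsetof(struct Sample, counter));
    printf("counter is aligned        = %d\n",
           is_aligned(&s.counter, alignof(int64_t)));
    /* 2 = always lock-free, 1 = sometimes lock-free, 0 = never lock-free */
    printf("ATOMIC_LLONG_LOCK_FREE    = %d\n", ATOMIC_LLONG_LOCK_FREE);
}
```

The exact numbers printed depend on the target ABI; only the "aligned" result is guaranteed.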
Now, volatile in C# seems to do 2 things. Firstly, it prevents the CPU from keeping copies of that data in its registers. It does not, however, prevent the CPU from keeping copies in its caches such as L1, L2, or L3. Secondly, it provides a memory barrier, which I will talk about in the next paragraph. In C, on the other hand, volatile seems to just prevent caching in registers without providing a memory barrier.
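A minimal C11 sketch of the difference described above, assuming a compiler with `<stdatomic.h>`; the flag and function names (`plain_volatile_flag`, `csharp_like_flag`, `publish`, `observe`, `self_check`) are hypothetical. My understanding of the C# specification is that a read of a `volatile` field has acquire semantics and a write has release semantics, so the closer C counterpart of a C# `volatile bool` is an `_Atomic` variable accessed with `memory_order_acquire`/`memory_order_release`, not a C `volatile` variable.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* C `volatile`: the compiler must actually perform every read and write of
   this object and keep them in order relative to other volatile accesses,
   but it emits no CPU fence and provides no ordering between threads.
   Unsynchronized access from two threads is still a data race in C11. */
volatile bool plain_volatile_flag = false;

/* Closer C analogue of a C# `volatile bool` field (assumption: C# volatile
   reads behave as acquire loads, volatile writes as release stores). */
atomic_bool csharp_like_flag = false;

void publish(void) {
    plain_volatile_flag = true; /* no ordering guarantee for other threads */
    atomic_store_explicit(&csharp_like_flag, true, memory_order_release);
}

bool observe(void) {
    return atomic_load_explicit(&csharp_like_flag, memory_order_acquire);
}

/* Single-threaded smoke check: a thread always sees its own writes. */
void self_check(void) {
    publish();
    assert(plain_volatile_flag);
    assert(observe());
}
```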
Continuing on, a memory barrier seems to primarily prevent reordering of instructions. This is notable since modern CPUs do something called speculative execution, which can run code prematurely: if the guess was correct, we have saved some time, and if the guess was wrong, the changes can simply be rolled back. This, however, could cause problems if multiple threads are modifying data in a time-sensitive manner. It's also worth noting that there are read, write, and full memory barriers, where Thread.MemoryBarrier() in C# is a full memory barrier, which means it works for both read and write instructions.
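A full barrier can be illustrated with the classic store-buffering litmus test, sketched here in C11 assuming POSIX threads are available; `x`, `y`, `r1`, `r2`, `thread_a`, `thread_b`, and `run_once` are illustrative names. `atomic_thread_fence(memory_order_seq_cst)` is used as a rough C counterpart of `Thread.MemoryBarrier()`. With both full fences in place, the outcome `r1 == 0 && r2 == 0` is forbidden; without them it can be observed on x86-64, because each CPU may keep its own store in a store buffer while its later load goes ahead, a kind of reordering that does not involve speculation at all.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static atomic_int x, y;
static int r1, r2;

static void *thread_a(void *arg) {
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst); /* full barrier */
    r1 = atomic_load_explicit(&y, memory_order_relaxed);
    return NULL;
}

static void *thread_b(void *arg) {
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst); /* full barrier */
    r2 = atomic_load_explicit(&x, memory_order_relaxed);
    return NULL;
}

/* Runs the litmus test once. Returns true if both threads read 0, an
   outcome the two seq_cst fences forbid (no store->load reordering). */
static bool run_once(void) {
    pthread_t a, b;
    atomic_store(&x, 0);
    atomic_store(&y, 0);
    int rc = pthread_create(&a, NULL, thread_a, NULL);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, thread_b, NULL);
    assert(rc == 0);
    pthread_join(a, NULL); /* join makes r1/r2 safe to read here */
    pthread_join(b, NULL);
    return r1 == 0 && r2 == 0;
}
```

Calling `run_once()` in a loop and checking that it never returns `true` exercises the guarantee; removing the two fences and running many iterations is how the reordering usually shows up, although how often it appears is entirely hardware-dependent.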
Of course, volatile and memory barriers are not a one-size-fits-all solution, but they seem to be very useful when a single thread writes while other threads read. If multiple threads write to the data protected by these measures, we can run into issues. Also, cache coherency is the reason why this data can still be cached in L1 to L3.
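The multiple-writer problem can be sketched in C11 as follows, assuming POSIX threads; `worker`, `run_two_writers`, and `INCREMENTS` are hypothetical names. `counter++` on a `volatile long` would compile to a separate load and store, so two writers can interleave and lose updates; `atomic_fetch_add_explicit` makes the increment one indivisible read-modify-write (C#'s `Interlocked.Increment` plays the same role).

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define INCREMENTS 100000

static atomic_long counter;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        /* One indivisible read-modify-write. With a plain `volatile long`,
           counter++ would be a separate load and store, so two writers could
           both read the same old value and one increment would be lost.
           Relaxed order is enough: only the final count matters, and
           pthread_join makes that count visible to the joining thread. */
        atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
    }
    return NULL;
}

/* Starts two writer threads and returns the final count. */
static long run_two_writers(void) {
    pthread_t t1, t2;
    atomic_store(&counter, 0);
    int rc = pthread_create(&t1, NULL, worker, NULL);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, worker, NULL);
    assert(rc == 0);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return atomic_load(&counter);
}
```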
Question:
So now that I have covered what I assume to be correct, I am primarily confused about why we need a volatile signal flag in C#, and why the C code looks so similar to the C# code despite volatile in C not using a memory barrier by default. For example, in the following C# code, if I were to remove the isDataReady signal flag, I assume the code would still work fine, with the only issue being the possibility of stale copies sitting in the CPU registers for at most a few milliseconds. This would probably be fine if my specific use case had a tolerance of a few milliseconds, but how exactly does adding the volatile signal flag fix that?
```csharp
using System;
using System.Threading;

class ProducerConsumerExample
{
    private int[] data = new int[10];
    private volatile bool isDataReady = false;

    public void Producer()
    {
        for (int i = 0; i < data.Length; i++)
        {
            data[i] = i;
        }
        Thread.MemoryBarrier(); // We need a memory barrier after we write
        isDataReady = true;
    }

    public void Consumer()
    {
        while (!isDataReady)
        {
            Thread.Sleep(1);
        }
        Thread.MemoryBarrier(); // We need a memory barrier before we read
        for (int i = 0; i < data.Length; i++)
        {
            Console.WriteLine(data[i]);
        }
    }
}
```
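For comparison, here is a minimal C11 sketch of the same producer/consumer shape in which the flag itself carries the ordering, assuming POSIX threads and assuming that C# volatile reads and writes correspond to acquire loads and release stores; `is_data_ready`, `DATA_LEN`, and `run_producer_consumer` are illustrative names. The release store on `is_data_ready` keeps the earlier writes to `data[]` before it, and the acquire load that observes `true` makes those writes visible to the consumer, so no separate fences are needed.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define DATA_LEN 10

static int data[DATA_LEN];
/* Plays the role of the C# `volatile bool isDataReady` field. */
static atomic_bool is_data_ready;

static void *producer(void *arg) {
    (void)arg;
    for (int i = 0; i < DATA_LEN; i++) {
        data[i] = i; /* ordinary, non-atomic writes */
    }
    /* Release store: the writes above cannot be moved after it. */
    atomic_store_explicit(&is_data_ready, true, memory_order_release);
    return NULL;
}

static void *consumer(void *arg) {
    long *sum = arg;
    /* Acquire load: once it returns true, every write the producer made
       before its release store is visible here. */
    while (!atomic_load_explicit(&is_data_ready, memory_order_acquire)) {
        sched_yield(); /* let the producer run */
    }
    *sum = 0;
    for (int i = 0; i < DATA_LEN; i++) {
        assert(data[i] == i);
        *sum += data[i];
    }
    return NULL;
}

/* Runs one producer and one consumer; returns the sum the consumer saw. */
static long run_producer_consumer(void) {
    long sum = -1;
    pthread_t p, c;
    for (int i = 0; i < DATA_LEN; i++) {
        data[i] = -1; /* reset before any thread starts */
    }
    atomic_store(&is_data_ready, false);
    int rc = pthread_create(&c, NULL, consumer, &sum);
    assert(rc == 0);
    rc = pthread_create(&p, NULL, producer, NULL);
    assert(rc == 0);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return sum;
}
```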
Now, in C, I was told that memory barriers work the same way, even though in this case the volatile flag does not use a memory barrier behind the scenes. Does this mean that in C, if I just wanted to use a normal flag (not related to this example), I should use a volatile field along with a memory barrier for reads and writes, essentially making it work like a volatile field in C#? Furthermore, in the following C example I was given, why does it look so similar to the C# example, with the same use of a volatile signal flag, even though the flag is not protected by a memory barrier? Lastly, in both examples, how does the volatile signal flag help at all, since I assume the array being looped over can still be copied into the CPU's registers?
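```c
#include <stdatomic.h>

void process(int value); // assumed to be defined elsewhere

volatile int isDataReady = 0;
int data[10];

void producer(void) {
    for (int i = 0; i < 10; i++) {
        data[i] = i;
    }
    atomic_thread_fence(memory_order_release); // publish the writes to data[]
    isDataReady = 1;
}

void consumer(void) {
    while (isDataReady == 0) {
        // busy-wait until the producer sets the flag
    }
    atomic_thread_fence(memory_order_acquire); // intended to pair with the release fence
    for (int i = 0; i < 10; i++) {
        process(data[i]);
    }
}
```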
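For comparison with the C example above, here is a sketch that keeps the fence-based structure but makes `isDataReady` an `atomic_int` read and written with `memory_order_relaxed`, assuming POSIX threads. In C11, release and acquire fences only synchronize through an atomic object (a release fence followed by an atomic store, and an atomic load that reads that value followed by an acquire fence); a plain `volatile int` accessed by two threads is still a data race. `process` here is a hypothetical stub that sums the values so the result can be checked, and `run_fenced_example` is likewise illustrative.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

static int data[10];
/* Atomic instead of volatile: the fences below can only synchronize
   through an atomic object, and a racy volatile int is undefined behaviour. */
static atomic_int isDataReady;

/* Hypothetical stand-in for the example's process(): sums the values. */
static long processed_sum;
static void process(int value) { processed_sum += value; }

static void *producer(void *arg) {
    (void)arg;
    for (int i = 0; i < 10; i++) {
        data[i] = i;
    }
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&isDataReady, 1, memory_order_relaxed);
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    while (atomic_load_explicit(&isDataReady, memory_order_relaxed) == 0) {
        sched_yield();
    }
    /* The load above read the value stored after the release fence, so this
       acquire fence synchronizes with it and data[] is fully visible. */
    atomic_thread_fence(memory_order_acquire);
    for (int i = 0; i < 10; i++) {
        process(data[i]);
    }
    return NULL;
}

/* Runs one producer and one consumer; returns the sum passed to process(). */
static long run_fenced_example(void) {
    pthread_t p, c;
    processed_sum = 0;
    atomic_store(&isDataReady, 0);
    int rc = pthread_create(&c, NULL, consumer, NULL);
    assert(rc == 0);
    rc = pthread_create(&p, NULL, producer, NULL);
    assert(rc == 0);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return processed_sum;
}
```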