Question

Deeply understanding Volatile and Memory Barriers

I have been reading a ton on this topic over the past few days but would really like clarification on what I have learned so far in relation to C# and C. Firstly, atomicity seems to be fairly intrinsic in modern times, where I am strictly referring to the ability to read and write within a single instruction. It seems that C# always naturally aligns its data (which also seems to be done in C, though I heard there are some cases where this is not true). By alignment I mean that, for example, on x64 a long will be 8 bytes, meaning its virtual memory address will be a multiple of 8. This way the CPU can read and write with its 8-byte-wide bus in a single cycle. If, however, we were not aligned correctly and were off by a byte, the CPU may require 2 cycles to read and write the data, which not only slows our program down but also removes atomicity, since partial data could be observed. It's also good to mention that this seems to work for other data too, like a 4-byte int, which can also be read and written within a single cycle, making it atomic as well. Lastly, this also seems to yield extra optimization: when doing something like reading an array we can now read/write two ints at a time over our 8-byte-wide bus.

Now volatile in C# seems to do 2 things. Firstly, it prevents the CPU from storing copies of that data in its registers. It does not, however, prevent the CPU from storing copies in its caches like L1, L2, or L3. Secondly, it provides a memory barrier, which I will talk about in the next paragraph. In C, it seems that volatile just prevents caching in registers without providing the memory barrier.

Continuing on, a memory barrier seems to primarily prevent reordering of instructions. This is notable since modern CPUs do something called speculative execution, which can lead to prematurely running code: if the guess was correct we will have saved some time, while if the guess was wrong the changes can just be rolled back. This, however, could cause problems if multiple threads are modifying data in a time-sensitive manner. It's also worth noting that there are read, write, and full memory barriers, where Thread.MemoryBarrier() in C# is a full memory barrier, meaning it works for both read and write instructions.

Of course, volatile and memory barriers are not a one-size-fits-all solution; they seem to be most useful when a single thread writes while other threads read. If multiple threads write to the data being protected by these measures, we can run into issues. Also, cache coherency is the reason why we can still cache this data in L1-L3.

Question: So now that I have covered what I assume to be correct, I am primarily confused about why we need a volatile signal flag in C#, and why the C code looks so similar to the C# code despite volatile in C not using a memory barrier by default. For example, in the following C# code, if I were to remove the isDataReady signal flag, I assume this code would still work fine, with the only issue being the possibility of stale copies being in the CPU registers for a max of a few milliseconds. This would probably be fine if my specific use case had a tolerance of a few milliseconds, but how does adding the volatile signal flag fix that exactly?

class ProducerConsumerExample
{
    private int[] data = new int[10];
    private volatile bool isDataReady = false;

    public void Producer()
    {
        for (int i = 0; i < data.Length; i++)
        {
            data[i] = i;  
        }

        Thread.MemoryBarrier();  // We need a memory barrier after we write
        isDataReady = true;      
    }

    public void Consumer()
    {
        while (!isDataReady)    
        {
            Thread.Sleep(1);    
        }

        Thread.MemoryBarrier(); // We need a memory barrier before we read

        for (int i = 0; i < data.Length; i++)
        {
            Console.WriteLine(data[i]);  
        }
    }
}

Now in C I was told the use of a memory barrier works the same way, even though in this case the volatile flag will not use a memory barrier behind the scenes. Does this mean that in C, if I wanted to just use a normal flag (not related to this example), I should also use a volatile field along with a memory barrier for read and write, essentially making it work like a volatile field in C#? Furthermore, in the following C example I was given, why does it look so similar to the C# example, with the same use of volatile signal fields, despite these signal fields not being protected by a memory barrier? Lastly, in both examples, how does this volatile signal flag even help at all, since I assume the array being looped over can still be copied into the CPU's registers?

#include <stdatomic.h>

void process(int value);  // assumed to be defined elsewhere

volatile int isDataReady = 0;
int data[10];

void producer() {
    for (int i = 0; i < 10; i++) {
        data[i] = i;  
    }
    atomic_thread_fence(memory_order_release);  
    isDataReady = 1;  
}

void consumer() {
    while (isDataReady == 0) {
        /* spin until the producer sets the flag */
    }
    atomic_thread_fence(memory_order_acquire);  
    for (int i = 0; i < 10; i++) {
        process(data[i]);  
    }
}

Solution


You have some misconceptions:

a memory barrier seems to primarily prevent reordering of instructions. This is notable since modern CPUs do something called speculative execution, which can lead to prematurely running code: if the guess was correct we will have saved some time, while if the guess was wrong the changes can just be rolled back. This, however, could cause problems if multiple threads are modifying data in a time-sensitive manner.

Memory Barriers are a compiler thing, preventing the ASM compiler (the JITter in C#) from reordering reads and writes, where it would normally assume the effect is not observable from a single thread. This is separate from a memory fence CPU instruction.

Whether a memory fence instruction needs to be inserted as well is dependent on the CPU architecture, some of which have stronger guarantees than others. Branch prediction is just one of those architecture issues that may or may not be a problem: some CPUs guarantee consistency even with branch prediction.

I assume this code would still work fine with the only issue being the possibility of stale copies being in the CPU registers for a max of a few milliseconds. This would probably be fine if my specific use case had a tolerance of a few milliseconds but how does adding the volatile signal flag fix that exactly?

No, you have a fundamental mistake here: if you don't use volatile then the compiler/JITter is free to keep the data in registers indefinitely. It doesn't have to read from or store into main memory, because the effect is not observable from a single thread. This is the classic mistake of using while (_someGlobalFlag) { }. You need volatile to avoid this and force reads/stores to main memory. The CPU then sorts out cache coherency from there, but if the value stays in registers then the CPU can't handle that, as you haven't told it to.

Thread.MemoryBarrier();  // We need a memory barrier after we write

No, you don't need an explicit memory barrier here, as volatile in C# already provides half-barriers, which are enough in this case. You are correct that C doesn't provide barriers.

Again: volatile in and of itself really only refers to forcing a main-memory read/store, and that goes for C too. That C# also provides a half-barrier on top is just nice, but in many cases you actually need a full barrier. You need to analyze your code to see what it needs. And MemoryBarrier() by itself does not do what volatile does: it does not guarantee usage of main memory.

2024-07-16
Charlieface