Question

Dereferencing type-punned pointer will break strict-aliasing rules

I used the following piece of code to read data from files as part of a larger program.

double data_read(FILE *stream,int code) {
        char data[8];
        switch(code) {
        case 0x08:
            return (unsigned char)fgetc(stream);
        case 0x09:
            return (signed char)fgetc(stream);
        case 0x0b:
            data[1] = fgetc(stream);
            data[0] = fgetc(stream);
            return *(short*)data;
        case 0x0c:
            for(int i=3;i>=0;i--)
                data[i] = fgetc(stream);
            return *(int*)data;
        case 0x0d:
            for(int i=3;i>=0;i--)
                data[i] = fgetc(stream);
            return *(float*)data;
        case 0x0e:
            for(int i=7;i>=0;i--)
                data[i] = fgetc(stream);
            return *(double*)data;
        }
        die("data read failed");
        return 1;
    }

Now I am told to use -O2 and I get following gcc warning: warning: dereferencing type-punned pointer will break strict-aliasing rules

Googleing I found two orthogonal answers:

vs

In the end I don't want to ignore the warnings. What would you recommend?

[update] I substituted the toy example with the real function.

 45  79866  45
1 Jan 1970

Solution

 41

The problem occurs because you access a char-array through a double*:

char data[8];
...
return *(double*)data;

But gcc assumes that your program will never access variables though pointers of different type. This assumption is called strict-aliasing and allows the compiler to make some optimizations:

If the compiler knows that your *(double*) can in no way overlap with data[], it's allowed to all sorts of things like reordering your code into:

return *(double*)data;
for(int i=7;i>=0;i--)
    data[i] = fgetc(stream);

The loop is most likely optimized away and you end up with just:

return *(double*)data;

Which leaves your data[] uninitialized. In this particular case the compiler might be able to see that your pointers overlap, but if you had declared it char* data, it could have given bugs.

But, the strict-aliasing rule says that a char* and void* can point at any type. So you can rewrite it into:

double data;
...
*(((char*)&data) + i) = fgetc(stream);
...
return data;

Strict aliasing warnings are really important to understand or fix. They cause the kinds of bugs that are impossible to reproduce in-house because they occur only on one particular compiler on one particular operating system on one particular machine and only on full-moon and once a year, etc.

2012-10-12

Solution

 26

It looks a lot as if you really want to use fread:

int data;
fread(&data, sizeof(data), 1, stream);

That said, if you do want to go the route of reading chars, then reinterpreting them as an int, the safe way to do it in C (but not in C++) is to use a union:

union
{
    char theChars[4];
    int theInt;
} myunion;

for(int i=0; i<4; i++)
    myunion.theChars[i] = fgetc(stream);
return myunion.theInt;

I'm not sure why the length of data in your original code is 3. I assume you wanted 4 bytes; at least I don't know of any systems where an int is 3 bytes.

Note that both your code and mine are highly non-portable.

Edit: If you want to read ints of various lengths from a file, portably, try something like this:

unsigned result=0;
for(int i=0; i<4; i++)
    result = (result << 8) | fgetc(stream);

(Note: In a real program, you would additionally want to test the return value of fgetc() against EOF.)

This reads a 4-byte unsigned from the file in little-endian format, regardless of what the endianness of the system is. It should work on just about any system where an unsigned is at least 4 bytes.

If you want to be endian-neutral, don't use pointers or unions; use bit-shifts instead.

2010-07-14