Question

writing directly to std::string internal buffers

I was looking for a way to stuff some data into a string across a DLL boundary. Because we use different compilers, all our dll interfaces are simple char*.

Is there a correct way to pass a pointer into the dll function such that it is able to fill the string buffer directly?

string stringToFillIn(100, '\0');
FunctionInDLL( stringToFillIn.c_str(), stringToFillIn.size() );   // definitely WRONG!
FunctionInDLL( const_cast<char*>(stringToFillIn.data()), stringToFillIn.size() );    // WRONG?
FunctionInDLL( &stringToFillIn[0], stringToFillIn.size() );       // WRONG?
stringToFillIn.resize( strlen( stringToFillIn.c_str() ) );

The one that looks most promising is &stringToFillIn[0] but is that a correct way to do this, given that you'd think that string::data() == &string[0]? It seems inconsistent.

Or is it better to swallow an extra allocation and avoid the question:

vector<char> vectorToFillIn(100);
FunctionInDLL( &vectorToFillIn[0], vectorToFillIn.size() );
string dllGaveUs( &vectorToFillIn[0] );
 45  46364  45
1 Jan 1970

Solution

 26

I'm not sure the standard guarantees that the data in a std::string is stored as a char*. The most portable way I can think of is to use a std::vector, which is guaranteed to store its data in a continuous chunk of memory:

std::vector<char> buffer(100);
FunctionInDLL(&buffer[0], buffer.size());
std::string stringToFillIn(&buffer[0]);

This will of course require the data to be copied twice, which is a bit inefficient.

2009-06-25

Solution

 22

Update (2021): C++11 cleared this up and the concerns expressed here are no longer relevant.

After a lot more reading and digging around I've discovered that string::c_str and string::data could legitimately return a pointer to a buffer that has nothing to do with how the string itself is stored. It's possible that the string is stored in segments for example. Writing to these buffers has an undefined effect on the contents of the string.

Additionally, string::operator[] should not be used to get a pointer to a sequence of characters - it should only be used for single characters. This is because pointer/array equivalence does not hold with string.

What is very dangerous about this is that it can work on some implementations but then suddenly break for no apparent reason at some future date.

Therefore the only safe way to do this, as others have said, is to avoid any attempt to directly write into the string buffer and use a vector, pass a pointer to the first element and then assign the string from the vector on return from the dll function.

2009-06-25