Without any additional set-up, wifstream will use the C locale, which is probably not what you want.
There are multiple possible solutions:
1. Using the default locale
To read files using the default locale (the user's preferred locale, taken from the environment), use std::wifstream with std::locale(""). This is how most Linux programs work.
Example:
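#include <fstream>
#include <iostream>
#include <locale>
#include <ranges>

int main()
{
    std::wifstream stream("sample.txt");
    if (!stream)
    {
        std::cerr << "Failed to open file\n";
        return 1;
    }

    // Switch from the C locale to the environment's default locale
    // before anything is read from the stream.
    stream.imbue(std::locale(""));

    // Print the code of each extracted wide character in hex.
    // Note that formatted extraction skips whitespace.
    for (auto c : std::views::istream<wchar_t>(stream))
    {
        std::cout << std::hex << static_cast<int>(c) << std::endl;
    }

    if (stream.bad())
    {
        std::cerr << "Failed to extract character\n";
    }
}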
Most Linux distros default to UTF-8. Windows can be configured to use UTF-8 too, although I believe that's not the default.
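If you're not sure what the default locale is on a given machine, you can ask for its name. This is only a minimal sketch: default_locale_name is a hypothetical helper (not a standard function), and the "(unavailable)" placeholder is my own choice for the case where the environment names a locale that isn't installed, in which case std::locale("") throws std::runtime_error.

```cpp
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the name of the environment's default locale,
// or a placeholder if that locale cannot be constructed.
std::string default_locale_name()
{
    try
    {
        return std::locale("").name();
    }
    catch (const std::runtime_error&)
    {
        // Thrown when the environment requests a locale that isn't available.
        return "(unavailable)";
    }
}
```

Printing the result with std::cout << default_locale_name() << '\n'; should show whether the name mentions UTF-8. The exact spelling of the name is platform-specific, so treat it as a diagnostic rather than something to parse.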
2. Using a UTF-8 locale
While there isn't really a fully portable way to do this, the following seems to work on both Windows and Linux (as long as the en_US.utf-8 locale is available):
stream.imbue(std::locale("en_US.utf-8"));
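Since std::locale throws std::runtime_error when the requested name isn't available, it may be worth guarding that call. Below is a rough sketch, not a definitive solution: first_available_locale and imbue_utf8 are hypothetical helpers, and the candidate names other than en_US.utf-8 are assumptions on my part about what a given system might provide.

```cpp
#include <initializer_list>
#include <ios>
#include <locale>
#include <optional>
#include <stdexcept>

// Hypothetical helper: returns the first locale in `names` that the system
// can construct, or std::nullopt if none of them is available.
std::optional<std::locale> first_available_locale(std::initializer_list<const char*> names)
{
    for (const char* name : names)
    {
        try
        {
            return std::locale(name);
        }
        catch (const std::runtime_error&)
        {
            // Unknown locale name on this system; try the next candidate.
        }
    }
    return std::nullopt;
}

// Hypothetical helper: imbues `stream` with a UTF-8 locale if one is found.
// The candidate list is an assumption; adjust it for your target platforms.
bool imbue_utf8(std::wios& stream)
{
    auto loc = first_available_locale({"en_US.utf-8", "en_US.UTF-8", "C.UTF-8"});
    if (loc)
    {
        stream.imbue(*loc);
    }
    return loc.has_value();
}
```

With this, you would call imbue_utf8(stream) right after opening the file and report an error (or fall back to another option) if it returns false, instead of letting the exception escape.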
3. Using the deprecated (C++17) / removed (C++26) codecvt_utf8_utf16
WARNING: Do NOT use this in production. The facilities in <codecvt> have issues related to error handling and are set to be removed from the standard. I also wouldn't trust them to properly handle malformed (or potentially malicious) input.
stream.imbue(std::locale(std::locale::classic(), new std::codecvt_utf8_utf16<wchar_t>));
4. Using a library
This is realistically the best option for production code that can't simply rely on the default locale.
So which approach should you choose? It depends:
- If you just want to do some quick testing, option #2 is probably the simplest. Just don't use it in production;
- If it's acceptable (or even desirable, if you're targeting Linux) for your program to use the default locale, use option #1;
- Otherwise, use #4.