Question
fwscanf failing to read UTF-8 CSV file correctly in C
The program may only use the C standard library.
I'm trying to read a UTF-8 encoded CSV file in C using fwscanf, but I'm encountering issues with the reading process. The file contains rows with a string and a float value separated by a comma. Here's a minimal example demonstrating the problem:
#include <stdio.h>
#include <wchar.h>
#include <locale.h>

#define MAX_STRING_LENGTH 31

int main() {
    setlocale(LC_ALL, "en_US.UTF-8");
    FILE *file = fopen("input.csv", "r, ccs=UTF-8");
    if (file == NULL) {
        fwprintf(stderr, L"Error opening file.\n");
        return 1;
    }

    wchar_t string[MAX_STRING_LENGTH];
    float frequency;
    int row = 0;

    while (!feof(file)) {
        row++;
        int result = fwscanf(file, L"%30[^,],%f,", string, &frequency);
        if (result == 2) {
            wprintf(L"Row %d: String = '%ls', Frequency = %.4f\n", row, string, frequency);
        } else if (result == 1) {
            wprintf(L"Row %d: String = '%ls', Frequency not read\n", row, string);
        } else if (result == EOF) {
            break;
        } else {
            wprintf(L"Error reading row %d\n", row);
            wchar_t c;
            // Skip the rest of the line
            while ((c = fgetwc(file)) != L'\n' && c != WEOF);
        }
    }

    fclose(file);
    return 0;
}
Sample input.csv:
hello,1.0000
world,0.5000
how,0.7500
are,0.2500
you,1.0000
?,0.5000
Expected output:
Row 1: String = 'hello', Frequency = 1.0000
Row 2: String = 'world', Frequency = 0.5000
Row 3: String = 'how', Frequency = 0.7500
Row 4: String = 'are', Frequency = 0.2500
Row 5: String = 'you', Frequency = 1.0000
Row 6: String = '?', Frequency = 0.5000
The issue I'm facing is that fwscanf is not reading the file correctly. It either reads incorrect values or fails to read at all. I've tried using different locale settings and file opening modes, but the problem persists.
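For reference, two things in the format string look suspicious to me, going by the C standard's description of the wide scanf family rather than anything I have verified on this exact setup. First, `%[` without an `l` length modifier stores multibyte characters into a `char` array even in fwscanf, so passing a `wchar_t` array does not match; `%l[` is the form that stores wide characters. Second, the trailing `,` after `%f` never matches the newline that ends each row, so the newline stays in the stream and becomes part of the next string field. Below is a minimal sketch of a format that addresses both, applied to a single row already held in a wide string. The helper name `parse_row` is hypothetical and not part of the original program.

```c
#include <stddef.h>
#include <wchar.h>

#define MAX_STRING_LENGTH 31

/* Hypothetical helper (not from the original program): parses one
 * "string,float" row held in a wide string.
 * Returns the number of fields converted, so 2 means both were read. */
int parse_row(const wchar_t *line, wchar_t name[MAX_STRING_LENGTH], float *frequency)
{
    /* Leading space: skips whitespace, e.g. a newline left from the previous row.
     * %30l[^,\n]  : the 'l' makes the scanset store wchar_t, at most 30 of them
     *               plus the terminating null; it stops at ',' or '\n'.
     * ,%f         : matches the separator, then reads the float. */
    return swscanf(line, L" %30l[^,\n],%f", name, frequency);
}
```

The same format string should also be usable directly with fwscanf on the file. Alternatively, each line could be read with fgetws into a wide buffer and handed to a helper like this one, which keeps a malformed row from disturbing the rows after it.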