Question

MATLAB cannot read a Parquet file and only reports "Unable to read Parquet file". How can I still read it?

I have created a Parquet file using the .write_parquet method of Python's polars library. Python can read it back without a problem, and MATLAB can also read the file's metadata using parquetinfo without a problem.

However, when I run parquetread in MATLAB to actually load the data, it fails quickly with the error "Unable to read Parquet file" without further details.

I've searched around and only found a MathWorks forum post without a solution.

How can I create a Parquet file using Python that is readable by MATLAB?

1 Jan 1970

Solution


It turns out the compression used by the Parquet file was not compatible with MATLAB R2024a.

In my Python code I wrote:

df.write_parquet("./file.parquet", compression="lz4")

I chose lz4 because the docs describe it as the faster option. Reading on, I found that the docs of the compression parameter also state (emphasis mine):

Choose “zstd” for good compression performance. Choose “lz4” for fast compression/decompression. Choose “snappy” for more backwards compatibility guarantees when you deal with older parquet readers.

After setting the compression option to "snappy", the resulting file could be read by MATLAB. So the line of Python code becomes:

df.write_parquet("./file.parquet", compression="snappy")
2024-07-24
Saaru Lindestøkke