Saving a DataTable to Parquet with Parquet.NET 3

A while ago I wrote a post about extracting data from SQL into Parquet. It was aimed mainly at on-premises systems where Data Gateway or similar tools are not an option, but you still want to get your data into a format that can be used by tools such as Azure Databricks.

Since that post, Parquet.NET has come along and is now at version 3. It has better convenience methods and provides attributes that make persisting collections of objects to Parquet even easier. But this post is for those who don't want to use the easy parts: you've got your data back into a System.Data.DataTable in C# and want to write it out to Parquet without creating an intermediate class. Fortunately this is still very much doable with the latest changes, although the way you have to do it is now slightly different.

In previous versions of Parquet.NET you would typically create a schema and then add rows of data as object arrays. With the latest version you now create a row group (with a defined number of columns) and then write your data to that row group a column at a time. There are some great examples in the documentation and I strongly encourage you to go and check them out.

The example solution takes data in a DataTable and uses each column's type to determine the type of list to build for it. It also converts DateTime values to DateTimeOffset, handles DBNull values, and splits the results up into multiple row groups.
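
To give a feel for those steps, here is a minimal sketch of the DataTable side only. It is not the code from the repository and it deliberately makes no Parquet.NET calls. The names ColumnSketch, ToColumnArray and RowGroups are hypothetical, only three column types are handled, and DateTime values are assumed to be UTC.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class ColumnSketch
{
    // Split totalRows into (start, count) chunks, one chunk per row group.
    public static IEnumerable<(int Start, int Count)> RowGroups(int totalRows, int rowGroupSize)
    {
        for (int start = 0; start < totalRows; start += rowGroupSize)
            yield return (start, Math.Min(rowGroupSize, totalRows - start));
    }

    // Map values of a value-type column to a nullable array, turning DBNull into null.
    private static T?[] Map<T>(IEnumerable<object> values, Func<object, T> convert) where T : struct
        => values.Select(v => v is DBNull ? (T?)null : convert(v)).ToArray();

    // Copy one column, for rows [start, start + count), into a typed array.
    public static Array ToColumnArray(DataTable table, DataColumn column, int start, int count)
    {
        var values = Enumerable.Range(start, count).Select(i => table.Rows[i][column]);

        if (column.DataType == typeof(int))
            return Map(values, v => (int)v);
        if (column.DataType == typeof(string))
            return values.Select(v => v is DBNull ? null : (string)v).ToArray();
        if (column.DataType == typeof(DateTime))
            // Assumption: stored values are UTC, so the offset is always zero.
            return Map(values, v => new DateTimeOffset(DateTime.SpecifyKind((DateTime)v, DateTimeKind.Utc)));

        throw new NotSupportedException($"Unhandled column type: {column.DataType}");
    }

    public static void Main()
    {
        var table = new DataTable();
        table.Columns.Add("id", typeof(int));
        table.Columns.Add("name", typeof(string));
        table.Columns.Add("created", typeof(DateTime));
        table.Rows.Add(1, "alpha", new DateTime(2018, 1, 1));
        table.Rows.Add(2, DBNull.Value, new DateTime(2018, 1, 2));
        table.Rows.Add(DBNull.Value, "gamma", DBNull.Value);

        // One pass per row group, then one typed array per column within it.
        foreach (var (start, count) in RowGroups(table.Rows.Count, 2))
        {
            Console.WriteLine($"Row group starting at row {start} ({count} rows)");
            foreach (DataColumn column in table.Columns)
            {
                Array data = ToColumnArray(table, column, start, count);
                var text = data.Cast<object>().Select(x => x?.ToString() ?? "null");
                Console.WriteLine($"  {column.ColumnName}: {string.Join(", ", text)}");
            }
        }
    }
}
```

Each array produced this way holds the values for a single column within a single row group, which is the shape you need when writing a column at a time. In a real solution the array would be handed to the Parquet writer rather than printed.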

The code is over on GitHub and is available under an MIT license, so feel free to grab a copy and have a go.

https://github.com/dazfuller/datatable-to-parquet