E-commerce company BuyFashion.com, an up-and-coming online fashion retailer, has experienced enormous sustained growth over the past five years. It has recently decided to build a team of Data Scientists in order to take advantage of the analytics edge that it has long been following in th
Parquet.Net is a .NET library to read and write Apache Parquet files.
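For readers new to the library, a minimal round trip looks roughly like this — a sketch assuming the v3-era low-level API (`ParquetWriter`, `ParquetReader`, `DataColumn`); the file name is illustrative:

```csharp
using System.IO;
using Parquet;
using Parquet.Data;

// Define a single-column schema.
var idField = new DataField<int>("id");
var schema = new Schema(idField);

// Write one row group to a file.
using (Stream fs = File.Create("demo.parquet"))
using (var writer = new ParquetWriter(schema, fs))
using (ParquetRowGroupWriter rg = writer.CreateRowGroup())
{
    rg.WriteColumn(new DataColumn(idField, new[] { 1, 2, 3 }));
}

// Read the column back into memory.
int[] values;
using (Stream fs = File.OpenRead("demo.parquet"))
using (var reader = new ParquetReader(fs))
{
    values = (int[])reader.ReadEntireRowGroup(0)[0].Data;
}
```

Exact type and method names shifted between major versions, so check the README for the release you are on.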
https://github.com/elastacloud/parquet-dotnet is about to be released in the next few days. Since v3.0 was pushed to the public it has seen a lot of interest and praise for its incredible performance boost; however, there were problems as well. To reiterate, v3.0 was a complete rewrite of 2.0 a
Some pretty good news on the speed front, indicative of our gut feeling that parquet-dotnet is a fast implementation of Parquet for the .NET Framework thanks to our approach. It's worth noting that we will be tuning for performance optimisation in the coming months, so things will get better still.
This is a big release, including a lot of speed and stability improvements: https://github.com/elastacloud/parquet-dotnet/releases/tag/1.4.0

New features:
- data representation internally changed to columnar format, resulting in a much smaller memory footprint and better performance (#238)
- added su
Spark abstracts the idea of a schema from us by enabling us to read a directory of files which can be similar or identical in schema. A key characteristic is that a superset schema is needed on many occasions. Spark will infer the schema automatically for timestamps, dates, numeric and string types
After several years of watching the Data Science ecosystem develop, we decided to contribute back and try to influence it towards the myriad capabilities that exist in .NET and Azure. The first pass at this was Parquet.NET, Parquet.USQL and the Parq application (now available through Chocolatey).
The Azure data platform is awash with ways of querying data. Whether the data is relational, NoSQL or unstructured, you generally want to minimise the movement of data at the point of querying. At Elastacloud we're big fans of Microsoft's Azure Data Lake Store (ADLS), which offers us a highly performant way of
Latest @ApacheParquet for .NET introduces support for appending data to existing files and reading into CSV with type inferring. This is great news for big data scenarios because now you can generate massive files without spending much RAM by streaming data in chunks.
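A hedged sketch of the append feature described above, assuming the v3-era `ParquetWriter` constructor with its `append` flag; the batch contents and file name here are illustrative. Only one chunk is held in RAM at a time:

```csharp
using System.IO;
using System.Linq;
using Parquet;
using Parquet.Data;

var valueField = new DataField<double>("value");
var schema = new Schema(valueField);

for (int chunk = 0; chunk < 10; chunk++)
{
    // Illustrative batch — in a real job this would come from a stream
    // or a query, one chunk at a time.
    double[] batch = Enumerable.Range(0, 1000)
                               .Select(i => (double)(chunk * 1000 + i))
                               .ToArray();

    using (Stream fs = File.Open("big.parquet", FileMode.OpenOrCreate))
    // append: false on the first chunk creates the file; true thereafter
    // adds a new row group instead of overwriting.
    using (var writer = new ParquetWriter(schema, fs, append: chunk > 0))
    using (ParquetRowGroupWriter rg = writer.CreateRowGroup())
    {
        rg.WriteColumn(new DataColumn(valueField, batch));
    }
}
```

Because each loop iteration disposes the writer, peak memory stays proportional to one batch rather than the whole file.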
As part of our efforts to drive down the barriers to working with Parquet in the .NET and Microsoft ecosystem, the Parq command-line tool is our first go-to tool for quickly inspecting the contents and structure of a Parquet file. Until now, the process for running Parq was to git clone the repository,
Parq is a tool for Windows that allows the inspection of Parquet files. There are precious few tools that fit this category, so when we were investing in parquet-dotnet we thought we'd build a console application that at least begins to address the deficit. There are three distinct output for
We've been working hard on Parquet.NET to give developers high-level abstractions over Parquet so that there is an easy entry point into developing with Parquet that is not onerous for developers new to the format. As such, we've created ParquetConvert, which allows the trivial creation of Parquet f
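The ParquetConvert abstraction can be sketched as follows — the `Sale` record type and file name are hypothetical, and the `Serialize`/`Deserialize` signatures assume the v3-era API:

```csharp
using Parquet;

// Hypothetical record type: plain CLR properties map straight to
// Parquet columns.
class Sale
{
    public int Id { get; set; }
    public string Product { get; set; }
    public double Amount { get; set; }
}

var sales = new[]
{
    new Sale { Id = 1, Product = "shoes", Amount = 59.99 },
    new Sale { Id = 2, Product = "hat",   Amount = 12.50 }
};

// One call to write, one call to read — no schema declaration needed,
// as it is inferred from the class.
ParquetConvert.Serialize(sales, "sales.parquet");
Sale[] back = ParquetConvert.Deserialize<Sale>("sales.parquet");
```

This trades some control (encodings, row-group sizing) for convenience; the low-level writer remains available when you need it.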
Did you know that it's possible to extract data from Parquet files in Azure Data Lake Analytics? Well, it is, and the library has just received a couple of updates — check it out over on its GitHub page. First of all, the library has received an update to bring it up to the latest version of Parquet .
In order to reach a wider audience and engage more developers to use Parquet, even in their spare time, we're happy to announce that Apache Parquet Viewer works on Xbox One! This is possible because the viewer is a Universal Windows app and therefore works on all Windows-enabled devices. Now when you
Traditionally, viewing .parquet files requires some sort of online service, be it Apache Spark, Impala, Amazon AWS etc. However, when working in your local development environment it's really hard to see them, unless you write some sort of script printing them to a console. Even then, it's not reall
New features:
- INT64 (C# long) type is supported (#194)
- Decimal datatype is fully supported (#209). This includes support for simple System.Decimal, and decimal types with different scales and precisions. Decimals are encoded by utilising all three encodings from parquet specs, however this can b
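Declaring the decimal fields described above might look like this — a sketch assuming the v3-era `DecimalDataField` type, with illustrative precision and scale values:

```csharp
using Parquet.Data;

// A decimal column with explicit precision and scale, as allowed by
// the Parquet decimal logical type.
var price = new DecimalDataField("price", precision: 38, scale: 18);

// A plain System.Decimal column using the library's defaults.
var simple = new DataField<decimal>("simple");

var schema = new Schema(price, simple);
```

Precision is the total number of significant digits and scale the number of digits after the decimal point, so `precision` must always be at least `scale`.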
New features:
- Reader supports nested structures.
- Parquet output is now compatible with AWS Athena
- Writer can append data to an existing file

Improvements:
- Parquet metadata sets page sizes according to the standard
- Schema and SchemaElement have a Show method allowing you to get a user-readable representat
I'm happy to announce that @ApacheParquet for .NET has reached its stable status and is now officially released as v1.0 on NuGet. Check out our official GitHub page https://github.com/elastacloud/parquet-dotnet and NuGet feed https://www.nuget.org/packages/Parquet.Net.
We're very pleased that Microsoft's TechNet UK blog recently published an article on Parquet .Net, outlining what it is, how it works and what situations it is useful for. With that blog post as a starting point, this article will constantly update with handy links and resources for getting started
We have been working on loads of things in Parquet-dotnet over the last few days and are increasingly happy with how it is progressing. There are a few idiosyncrasies that we've found in the implementations you can find in the field. The Spark implementation is the de facto standard we're working t