E-commerce company BuyFashion.com, an up-and-coming online fashion retailer, has experienced enormous sustained growth over the past five years. It has recently decided to build a team of Data Scientists in order to take advantage of the analytics edge that it has long been following in th
Parquet.Net is a .NET library to read and write Apache Parquet files.
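For readers new to the library, a minimal round trip looks roughly like this — a sketch assuming the v3-era low-level API (`ParquetWriter`, `ParquetReader`, `DataColumn`); the file name is illustrative:

```csharp
using System.IO;
using Parquet;
using Parquet.Data;

// Define a single-column schema.
var idField = new DataField<int>("id");
var schema = new Schema(idField);

// Write one row group to a file.
using (Stream fs = File.Create("demo.parquet"))
using (var writer = new ParquetWriter(schema, fs))
using (ParquetRowGroupWriter rg = writer.CreateRowGroup())
{
    rg.WriteColumn(new DataColumn(idField, new[] { 1, 2, 3 }));
}

// Read the column back into memory.
int[] values;
using (Stream fs = File.OpenRead("demo.parquet"))
using (var reader = new ParquetReader(fs))
{
    values = (int[])reader.ReadEntireRowGroup(0)[0].Data;
}
```

Exact type and method names shifted between major versions, so check the README for the release you are on.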
https://github.com/elastacloud/parquet-dotnet is about to be released in the next few days. Since v3.0 was pushed to the public it has seen a lot of interest and praise for its incredible performance boost; however, there were problems as well. To reiterate, v3.0 was a complete rewrite of 2.0 a
Some pretty good news on the speed front, indicative of our gut feeling that parquet-dotnet is a fast implementation of Parquet for the .NET Framework thanks to our approach. It's worth noting that we will be tuning for performance optimisation in the coming months, so things will get better still.
This is a big release, including a lot of speed and stability improvements: https://github.com/elastacloud/parquet-dotnet/releases/tag/1.4.0

New features:
- data representation internally changed to columnar format, resulting in a much smaller memory footprint and better performance (#238)
- added su
Spark abstracts the idea of a schema from us by enabling us to read a directory of files which can be similar or identical in schema. A key characteristic is that a superset schema is needed on many occasions. Spark will infer the schema automatically for timestamps, dates, numeric and string types
After several years of watching the Data Science ecosystem develop, we decided to contribute back and try to influence it towards the myriad capabilities that exist in .NET and Azure. The first pass at this was Parquet.NET, Parquet.USQL and the Parq application (now available through Chocolatey).
The Azure data platform is awash with ways of querying data. Whether the data is relational, NoSQL or unstructured, you generally want to minimise the movement of data at the point of querying. At Elastacloud we're big fans of Microsoft's Azure Data Lake Store (ADLS), which offers us a highly performant way of
Latest @ApacheParquet for .NET introduces support for appending data to existing files and reading into CSV with type inferring. This is great news for big data scenarios because now you can generate massive files without spending much RAM by streaming data in chunks.
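A hedged sketch of the append feature described above, assuming the v3-era `ParquetWriter` constructor with its `append` flag; the batch contents and file name here are illustrative. Only one chunk is held in RAM at a time:

```csharp
using System.IO;
using System.Linq;
using Parquet;
using Parquet.Data;

var valueField = new DataField<double>("value");
var schema = new Schema(valueField);

for (int chunk = 0; chunk < 10; chunk++)
{
    // Illustrative batch — in a real job this would come from a stream
    // or a query, one chunk at a time.
    double[] batch = Enumerable.Range(0, 1000)
                               .Select(i => (double)(chunk * 1000 + i))
                               .ToArray();

    using (Stream fs = File.Open("big.parquet", FileMode.OpenOrCreate))
    // append: false on the first chunk creates the file; true thereafter
    // adds a new row group instead of overwriting.
    using (var writer = new ParquetWriter(schema, fs, append: chunk > 0))
    using (ParquetRowGroupWriter rg = writer.CreateRowGroup())
    {
        rg.WriteColumn(new DataColumn(valueField, batch));
    }
}
```

Because each loop iteration disposes the writer, peak memory stays proportional to one batch rather than the whole file.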
As part of our efforts to drive down the barriers to working with Parquet in the .NET and Microsoft ecosystem, the Parq command-line tool is our first go-to tool for quickly inspecting the contents and structure of a Parquet file. Until now, the process for running Parq was to git clone the repository,
Parq is a tool for Windows that allows the inspection of Parquet files. There are precious few tools that fit this category, so when we were investing in parquet-dotnet we thought we'd build a console application that at least begins to address the deficit. There are three distinct output for
We've been working hard on Parquet.NET to give developers high-level abstractions over Parquet so that there is an easy entry point into developing with Parquet that is not onerous for developers new to the format. As such, we've created ParquetConvert, which allows the trivial creation of Parquet f
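The ParquetConvert abstraction can be sketched as follows — the `Sale` record type and file name are hypothetical, and the `Serialize`/`Deserialize` signatures assume the v3-era API:

```csharp
using Parquet;

// Hypothetical record type: plain CLR properties map straight to
// Parquet columns.
class Sale
{
    public int Id { get; set; }
    public string Product { get; set; }
    public double Amount { get; set; }
}

var sales = new[]
{
    new Sale { Id = 1, Product = "shoes", Amount = 59.99 },
    new Sale { Id = 2, Product = "hat",   Amount = 12.50 }
};

// One call to write, one call to read — no schema declaration needed,
// as it is inferred from the class.
ParquetConvert.Serialize(sales, "sales.parquet");
Sale[] back = ParquetConvert.Deserialize<Sale>("sales.parquet");
```

This trades some control (encodings, row-group sizing) for convenience; the low-level writer remains available when you need it.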
Did you know that it's possible to extract data from Parquet files in Azure Data Lake Analytics? Well, it is, and the library has just received a couple of updates — check it out over on its GitHub page. First of all, the library has received an update to bring it up to the latest version of Parquet .
In order to reach a wider audience and engage more developers to use Parquet, even in their spare time, we're happy to announce that Apache Parquet Viewer works on Xbox One! This is possible because the viewer is a Universal Windows app and therefore works on all Windows-enabled devices. Now when you
Traditionally, viewing .parquet files requires some sort of online service, be it Apache Spark, Impala, Amazon AWS etc. However, when working in your local development environment it's really hard to see them, unless you write some sort of script printing them to a console. Even then, it's not reall
New features:
- INT64 (C# long) type is supported (#194)
- Decimal datatype is fully supported (#209). This includes support for simple System.Decimal, and decimal types with different scales and precisions. Decimals are encoded by utilising all three encodings from parquet specs, however this can b
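Declaring the decimal fields described above might look like this — a sketch assuming the v3-era `DecimalDataField` type, with illustrative precision and scale values:

```csharp
using Parquet.Data;

// A decimal column with explicit precision and scale, as allowed by
// the Parquet decimal logical type.
var price = new DecimalDataField("price", precision: 38, scale: 18);

// A plain System.Decimal column using the library's defaults.
var simple = new DataField<decimal>("simple");

var schema = new Schema(price, simple);
```

Precision is the total number of significant digits and scale the number of digits after the decimal point, so `precision` must always be at least `scale`.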
New features:
- Reader supports nested structures.
- Parquet output is now compatible with AWS Athena
- Writer can append data to an existing file

Improvements:
- Parquet metadata sets page sizes according to the standard
- Schema and SchemaElement have a Show method allowing you to get a user-readable representat
I'm happy to announce that @ApacheParquet for .NET has reached its stable status and is now officially released as v1.0 on NuGet. Check out our official GitHub page https://github.com/elastacloud/parquet-dotnet and NuGet feed https://www.nuget.org/packages/Parquet.Net.
We're very pleased that Microsoft's TechNet UK blog recently published an article on Parquet .Net, outlining what it is, how it works and what situations it is useful for. With that blog post as a starting point, this article will constantly update with handy links and resources for getting started
We have been working on loads of things in Parquet-dotnet over the last few days and are increasingly happy with how it is progressing. There are a few idiosyncrasies that we've found in the implementations you can find in the field. The Spark implementation is the de facto standard we're working t