    Parquet.Net

    Parquet.Net is a .NET library to read and write Apache Parquet files.

    Lucy Bealing

    Parquet.Net Use Case: Online Fashion Retailer

    E-commerce company BuyFashion.com, an up-and-coming online fashion retailer, have experienced enormous sustained growth over the past five years. They have recently decided to build a team of Data Scientists in order to take advantage of the analytics edge that they have long been following in the
    Recent Activity:
    Jul 12, 2017
    Ivan Gavryliuk

    What's coming in Parquet.Net 3.1

    https://github.com/elastacloud/parquet-dotnet is about to be released in the following few days. Since v3.0 was pushed to the public, it has seen a lot of interest and praise for its incredible performance boost; however, there were problems as well. To reiterate, v3.0 was a complete rewrite of 2.0
    Recent Activity:
    Oct 03, 2018
    Andy Cross

    Faster parquet dotnet!

    Some pretty good news on the speed front - indicative of our gut feeling that parquet-dotnet is a fast implementation of Parquet for the dotnet framework due to our approach. It's worth noting that we will be tuning for performance optimisation in the coming months, so things will get better still.
    Recent Activity:
    Oct 27, 2017
    Ivan Gavryliuk

    Parquet.Net 1.4 released

    This is a big release, including a lot of speed and stability improvements: https://github.com/elastacloud/parquet-dotnet/releases/tag/1.4.0 new features: - data representation internally changed to columnar format, resulting in much smaller memory footprint and better performance (#238) - added sup
    Recent Activity:
    Oct 23, 2017
    Richard Conway

    Learning from the world of Apache Spark part 1

    Spark abstracts the idea of a schema from us by enabling us to read a directory of files which can be similar or identical in schema. A key characteristic is that a superset schema is needed on many occasions. Spark will infer the schema automatically for timestamps, dates, numeric and string types
    Recent Activity:
    Sep 09, 2017
    Richard Conway

    Towards a new Data Science stack in .NET

    After several years of watching the Data Science ecosystem develop we decided to contribute back and try and influence it towards the myriad of capability that exists in .NET and Azure. The first pass at this was Parquet.NET, Parquet.USQL and the Parq application (now available through Chocolatey).
    Recent Activity:
    Aug 28, 2017
    Richard Conway

    Big Data and Parquet (the Microsoft way)

    The Azure data platform is awash with ways of querying data. From relational, nosql and unstructured data you generally want to minimise the movement of data at the point of querying. At Elastacloud we're big fans of Microsoft's Azure Data Lake Store (ADLS) which offers us a highly performant way of
    Recent Activity:
    Aug 13, 2017
    Ivan Gavryliuk

    Appending to file

    The latest @ApacheParquet for .NET introduces support for appending data to existing files and reading into CSV with type inference. This is great news for big data scenarios because now you can generate massive files without spending much RAM by streaming data in chunks.
    Recent Activity:
    Aug 03, 2017
    Andy Cross

    Chocolatey Parq

    As part of our efforts to drive down the barriers to working with Parquet in the .NET and Microsoft ecosystem, the Parq command line is our first go-to tool for quickly inspecting the contents and structure of a Parquet file. Until now, the process for running Parq was to git clone the repository, c
    Recent Activity:
    Jul 27, 2017
    Andy Cross

    SNAPPY support!

    Things are moving rapidly forward with a new PR merged today that brings full SNAPPY support into parquet-dotnet. This pull request came from our newest collaborator, Sandy May - it's fantastic to see the team grow again.
    Recent Activity:
    Jul 11, 2017
    Andy Cross

    Parq outputs

    Parq is a tool for Windows that allows the inspection of Parquet files. There are precious few tools that fit this category, so when we were investing in parquet-dotnet we thought we'd build a console application that at least begins to address the deficit. There are three distinct output form
    Recent Activity:
    Jul 10, 2017
    Andy Cross

    Milestone 1 achieved

    The team working on parquet-dotnet has completed Milestone 1, including the reading of a specific input file and handling of optional nullable values in Parquet. You can track the number of features we have implemented so far.
    Recent Activity:
    Jul 06, 2017
    Andy Cross

    Simplified Serialization with ParquetConvert

    We've been working hard on Parquet.NET to give developers high level abstractions over Parquet so that there is an easy entrypoint into developing with Parquet that is not onerous for developers new to the format. As such, we've created ParquetConvert which allows the trivial creation of Parquet
    Recent Activity:
    Oct 26, 2018
    Darren Fuller

    Parquet and U-SQL

    Did you know that it's possible to extract data from Parquet files in Azure Data Lake Analytics? Well, it is, and the library just received a couple of updates; check it out over on its GitHub page. First of all the library has just received an update to bring it up to the latest version of Parqu
    Recent Activity:
    Feb 13, 2018
    Ivan Gavryliuk

    Parquet on Xbox One

    In order to reach a wider audience and engage more developers to use Parquet even in their spare time, we're happy to announce that Apache Parquet Viewer works on Xbox One! This is possible because the viewer is a Universal Windows app and therefore works on all Windows-enabled devices. Now when you ar
    Recent Activity:
    Oct 24, 2017
    Ivan Gavryliuk

    How about viewing Parquet files?

    Traditionally, viewing .parquet files requires some sort of online service, be it Apache Spark, Impala, Amazon AWS etc. However, when working in your local development environment it's really hard to see them, unless you write some sort of script printing it on a console. Even then, it's not reall
    Recent Activity:
    Oct 23, 2017
    Ivan Gavryliuk

    Parquet.Net v1.2 Released

    new features: - INT64 (C# long) type is supported (#194) - Decimal datatype is fully supported (#209). This includes support for simple System.Decimal, and decimal types with different scales and precisions. Decimals are encoded by utilising all three encodings from parquet specs, however this can b
    Recent Activity:
    Sep 06, 2017
    Ivan Gavryliuk

    Parquet.Net v1.1 released

    new features: - Reader supports nested structures. - Parquet output is now compatible with AWS Athena - Writer can append data to existing file improvements: - Parquet metadata sets page sizes according to standard - Schema and SchemaElement has Show method allowing to get user readable representati
    Recent Activity:
    Aug 15, 2017
    Ivan Gavryliuk

    It's fast

    Writing 1 million records with @ApacheParquet for .NET takes 9 seconds. And we haven't worked on performance optimisations yet.
    Recent Activity:
    Aug 03, 2017
    Ivan Gavryliuk

    Apache Parquet for .NET released

    I'm happy to announce that @ApacheParquet for .NET has reached its stable status and is now officially released as v1.0 on NuGet. Check out our official GitHub page https://github.com/elastacloud/parquet-dotnet and NuGet feed https://www.nuget.org/packages/Parquet.Net.
    Recent Activity:
    Aug 01, 2017
    Lucy Bealing

    Microsoft TechNet UK Blog: Parquet.Net

    We're very pleased that Microsoft's TechNet UK blog recently published an article on Parquet.Net, outlining what it is, how it works and what situations it is useful for. With that blog post as a starting point, this article will constantly update with handy links and resources for getting star
    Recent Activity:
    Jul 18, 2017
    Andy Cross

    GZIP support!

    Yesterday we completed a commit that includes full (reader and writer) GZIP support for the parquet-dotnet library. We're working on other compression algorithms, including SNAPPY. As always, find the latest features here.
    Recent Activity:
    Jul 11, 2017
    Andy Cross

    Progress in parquet-dotnet

    We have been working on loads of things in parquet-dotnet over the last few days and are increasingly happy with how it is progressing. There are a few idiosyncrasies that we've found in the implementations you can find in the field. The Spark implementation is the de facto standard we're working to
    Recent Activity:
    Jul 10, 2017

    Visit the Elastacloud website