After several years of watching the data science ecosystem develop, we decided to contribute back and try to steer it towards the wealth of capability that exists in .NET and Azure. The first pass at this included Parquet.USQL and the Parq application (now available through Chocolatey). As a long-time user of Apache Spark and a consultant, I've always followed the mantra of letting customers dictate the technology they're interested in, using the skills they already know. Over the past few years Microsoft has offered data scientists a large number of options, but most of these are for popular computing frameworks where you have to know R, Python or Scala.
But what about the .NET developer? There are a few pretty good libraries out there, and the F# community has had a few attempts at creating libraries too. We've decided to accelerate the process and have begun to build Dataframe.Utils in C#, which already has a reasonable stats and maths library. The idea is eventually to have a C# and SQL expression parser that will allow data shaping and wrangling for data scientists. We also plan to abstract and operationalize this process, allowing code to be transformed and pushed to HDInsight and Azure Data Lake Analytics, so that basic prototypes built on a laptop can be reused at scale in the cloud without any additional effort.
The Dataframe.Utils name is just a placeholder for the time being, since it's tightly coupled to our work on Parquet, but you should be able to use it to load and view Parquet files and get summary statistics. In the coming days you'll see some graphing capability as well, in the form of binned histograms, correlation matrices and distribution overlays. Check out the initial version, but watch for name changes in the short term!
One important thing here is the ability to follow a process. Our teams at Elastacloud follow the CRISP-DM lifecycle, and as a first pass for this new approach and application stack we're modelling the lifecycle of our customised view of CRISP-DM and Agile, which has worked stunningly well for us. You'll be hearing more about our approach from @andyelastacloud and @lauradatasci over the next few months, while @aloneguid and I will be posting a lot of updates on our tooling and on the technicalities of how we see building technology to support the underlying process.