E-commerce company , an up and coming online fashion retailer, have experienced enormous sustained growth over the past five years. They have recently decided to build a team of Data Scientists in order to take advantage of the analytics edge that they have long been following in the tech media.
Their existing platform is a Microsoft .NET Framework ecommerce system and the majority of their existing team are skilled in C# and familiar with building large scale applications in this stack. The company’s new Data Science team team have chosen to use Apache Spark on top of Azure HDInsight and are skilled in Scala and related JVM languages.
The Data Science team need to consume feeds that come from the core engineering team so that the system interoperates in a seamless manner. The initial implementation interoperates based on a JSON message format. Whilst this is simple from the point of view of the core engineering team, it places a significant burden on the Data Science team, as they have to wrangle the data.
Thankfully, the core engineering team discovered meaning that they can eliminate the need for wrangling the Data Science team need to do by natively writing the interop feed in Parquet directly from .NET.