Right, so I promised our awesome marketing lead Lucy that I would write a post every day for a month, so here is the first one!
Right at the start of my Spark journey, somewhere between versions 0.4 and 0.7, there were a bunch of things available, but they were all disconnected and unpackaged from one another. Most of the tools were very discrete: all you could really use were RDDs, SQL was disconnected and MLlib was in tiny pieces. The Spark we know today was just a bunch of bits back then.
However, Tachyon was an ever-popular and fairly enticing idea. With most Spark applications you had to pull all of your teeth out and kiss goodbye to your weekends, because deployment was full of mishaps and took ages (and the software was buggy).
Fast forward some 3-4 years: Tachyon has become Alluxio and picked up a bunch of new features, one of which we'll cover in this post. Ordinarily you can only store files in the cache against keys, but an experimental feature of Alluxio allows you to store keys and values directly, giving much faster lookups backed by cluster memory, with durability.
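To make the "fast lookup in memory, with durability" idea concrete before we get to Alluxio itself, here is a tiny single-process sketch. It is purely illustrative: `DurableKVStore` is a hypothetical class invented for this example, it is not Alluxio's API, and it ignores everything a real cluster store has to deal with (partitioning across workers, replication, concurrency). Reads are served straight from an in-memory dict, while every write is appended to a log file first so the data can be rebuilt after a restart.

```python
import json
import os
import tempfile


class DurableKVStore:
    """Hypothetical toy store (not Alluxio's API): reads hit an in-memory dict,
    writes are appended to a log file so the data survives a restart."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        # Rebuild the in-memory state by replaying the log, if one exists.
        if os.path.exists(log_path):
            with open(log_path, "r", encoding="utf-8") as f:
                for line in f:
                    record = json.loads(line)
                    self.data[record["k"]] = record["v"]

    def put(self, key, value):
        # Persist the write to the log first, then update memory,
        # so any value we have accepted can be recovered later.
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"k": key, "v": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.data[key] = value

    def get(self, key, default=None):
        # Pure in-memory lookup: no file I/O on the read path.
        return self.data.get(key, default)


if __name__ == "__main__":
    log = os.path.join(tempfile.mkdtemp(), "kv.log")
    store = DurableKVStore(log)
    store.put("100", "foo")
    store.put("200", "bar")
    print(store.get("100"))          # prints: foo
    restarted = DurableKVStore(log)  # simulate a restart from the log
    print(restarted.get("200"))      # prints: bar
```

The trade-off this shows is the same one that makes an in-memory key-value store attractive: lookups never touch disk, and the log is only read once, at startup.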
To create a store you would probably use the following code.