Azure Data Factory | elastacloud-channels

Here at Elastacloud we have been using Azure Data Factory (ADF) since its initial release, and I've recently been catching up on some of the new features released over the past few months. Some of them are pretty neat, given some of the issues we've encountered across a number of projects over the years.

I'm particularly pleased that we can now use on-demand Spark clusters as a compute resource; previously we could only link always-on clusters to ADF. This gives us the flexibility of Spark for data transformation and augmentation without the added expense of running an always-on cluster. Other additions I'm excited about include compression/decompression support, service principal authentication for Azure Data Lake Store, and enhanced support for manipulating JSON.

These new features (among others) have been integrated into Microsoft's new ADF offering, aptly named Azure Data Factory V2 (preview). The major enhancements are the ability to orchestrate SSIS packages in Azure, plus additional control flow and scale capabilities. The Azure-SSIS Integration Runtime (IR) is a fully managed cluster of Azure VMs (nodes) dedicated to running your SSIS packages in the cloud. The control flow and scale capabilities are very interesting, providing features such as chaining and branching of activities in a pipeline, passing parameters and custom state between activities, trigger-based flows that allow (thankfully) on-demand execution of pipelines, and delta flows that allow a delta copy to a relational database.
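To make the chaining and branching idea more concrete, here is a minimal sketch that builds a V2-style pipeline definition in Python and prints it as JSON. The property names (`dependsOn`, `dependencyConditions`, `IfCondition`, `ifTrueActivities`, `ifFalseActivities`, `Wait`) follow the ADF V2 pipeline JSON schema as we understand it during the preview, so check them against the current documentation before relying on them. The pipeline, activity and parameter names (`ExamplePipeline`, `Prepare`, `IsFullLoad`, `loadMode` and so on) are hypothetical, and the Wait activities are placeholders standing in for real work such as a copy or Spark activity.

```python
import json


def wait(name, seconds, depends_on=None):
    """Build a Wait activity (a placeholder for real work; names are hypothetical)."""
    activity = {
        "name": name,
        "type": "Wait",
        "typeProperties": {"waitTimeInSeconds": seconds},
    }
    if depends_on:
        # Chaining: this activity runs only after the upstream one succeeds.
        activity["dependsOn"] = [
            {"activity": depends_on, "dependencyConditions": ["Succeeded"]}
        ]
    return activity


def build_pipeline():
    """Return a pipeline dict: one activity chained into an IfCondition branch."""
    branch = {
        "name": "IsFullLoad",
        "type": "IfCondition",
        # Chained after "Prepare", as with the helper above.
        "dependsOn": [{"activity": "Prepare", "dependencyConditions": ["Succeeded"]}],
        "typeProperties": {
            # Branching: the expression reads a pipeline parameter at run time.
            "expression": {
                "value": "@equals(pipeline().parameters.loadMode, 'full')",
                "type": "Expression",
            },
            "ifTrueActivities": [wait("FullLoad", 5)],
            "ifFalseActivities": [wait("DeltaLoad", 1)],
        },
    }
    return {
        "name": "ExamplePipeline",
        "properties": {
            # Parameter passed in when the pipeline is run or triggered.
            "parameters": {"loadMode": {"type": "String", "defaultValue": "delta"}},
            "activities": [wait("Prepare", 1), branch],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_pipeline(), indent=2))
```

Running the pipeline with `loadMode` set to `full` would take the true branch; with the default value of `delta` it would take the false branch.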