A few days ago, Microsoft announced Azure Event Grid in preview, a new service that routes events from publishers to subscribers and integrates with many other Azure services. Tom Kerkhove has done a great write-up of the basics over at https://www.codit.eu/blog/2017/08/21/exploring-azure-event-grid.
Aside from events raised by the Azure platform itself, we can use Event Grid to support applications and data pipelines with our own event publishers. According to the documentation, the Grid can handle up to 10 million events per second, so on paper it would outperform any other messaging technology available in Azure. Does that mean Event Hubs, the current prime candidate for a 'fat pipe', is no longer going to be useful? Let's take a look at some of the key differences between Event Grid and Event Hubs.
Let's start off with that bandwidth. Microsoft is committing to the promise of an Event Grid that can handle 10 million events per second, per region. There are no throughput units to pay for, and no machines to host. It is an impressive figure, but its usefulness is hampered by the pricing. In preview, 1 million operations will cost you $0.30. Unlike Event Hubs, any operation on the Grid adds to that cost. Sending a million messages and having them received by a single subscriber doubles the cost to $0.60. One more subscriber and that'll be $0.90. Let's assume an extreme case, where a big data pipeline is built to handle 1 million events per second. Even with just a single subscriber, costs will balloon to $0.60 per second, or about $1.6 million per month.
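The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the calculation: the function name is made up for illustration, and the 31-day month is an assumption chosen because it lands on the $1.6 million figure.

```python
# Preview pricing quoted above: $0.30 per million operations.
PRICE_PER_MILLION_OPS = 0.30
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 31  # assumption: a 31-day month


def event_grid_monthly_cost(events_per_second, subscribers, days=DAYS_PER_MONTH):
    """Estimate the monthly bill, counting each publish and each delivery
    as one billable operation (hypothetical helper, not an Azure API)."""
    operations_per_event = 1 + subscribers
    events = events_per_second * SECONDS_PER_DAY * days
    return events * operations_per_event * PRICE_PER_MILLION_OPS / 1_000_000


if __name__ == "__main__":
    for subscribers in (1, 2):
        cost = event_grid_monthly_cost(1_000_000, subscribers)
        print(f"{subscribers} subscriber(s): ${cost:,.0f} per month")
```

Every extra subscriber adds another full share of the publish cost, which is why fan-out gets expensive quickly at this volume.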
Let's compare that to Event Hubs. A single standard Event Hub can't handle a million events per second, but 50 of them, each with 20 throughput units, can. The sending logic would need to distribute the messages across the Hubs, but that is a solvable problem. Event Hubs events cost $0.028 per million messages. Note the extra 0. On top of that, we'll need 1,000 throughput units. All told, Event Hubs would cost you $97,315 per month. This is a sixteenth of the cost of the Event Grid, with only a bit more load balancing logic!
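To give an idea of how little logic that distribution needs, here is a minimal round-robin sketch. It deliberately does not use the Event Hubs SDK: each Hub is represented by a plain callable, the class name is hypothetical, and the limit of 1,000 events per second per throughput unit is an assumption that matches the numbers above.

```python
from itertools import cycle

HUB_COUNT = 50
THROUGHPUT_UNITS_PER_HUB = 20
EVENTS_PER_SECOND_PER_TU = 1_000  # assumption: ingress limit of one throughput unit


def total_capacity():
    """Combined ingress capacity of all Hubs, in events per second."""
    return HUB_COUNT * THROUGHPUT_UNITS_PER_HUB * EVENTS_PER_SECOND_PER_TU


class RoundRobinSender:
    """Spreads events evenly over a list of per-Hub send callables.

    Hypothetical helper: in a real pipeline each callable would wrap
    the send method of a client connected to one Hub.
    """

    def __init__(self, senders):
        self._senders = cycle(senders)

    def send(self, event):
        # Pick the next Hub in the rotation and hand the event to it.
        next(self._senders)(event)


if __name__ == "__main__":
    # Stand-ins for the 50 Hubs: each one just collects events in a list.
    hubs = [[] for _ in range(HUB_COUNT)]
    sender = RoundRobinSender([hub.append for hub in hubs])
    for i in range(1_000):
        sender.send({"id": i})
    print(total_capacity(), [len(hub) for hub in hubs[:3]])
```

Round-robin keeps the load even, but it gives up ordering across Hubs. If related events must stay together, hashing a key to pick the Hub would be the usual alternative.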
For a data pipeline that big, it would also make sense to inquire with Microsoft about a Dedicated Event Hub. A dedicated capacity unit can handle about 250,000 messages per second, so a single Hub with 4 capacity units would be enough for our data. 4 CUs at the advertised price of $733 per day comes out at about $90,908 per month. That is more than an order of magnitude less than Event Grid at preview pricing, with all the other features of Event Hubs to boot.
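Both Event Hubs figures can be checked with a short calculation. Treat it as a sketch under stated assumptions: the helper names are invented, the month is taken as 31 days, and the hourly throughput-unit price of $0.03 is not quoted above but is the value that reproduces the $97,315 total. With the rounded $733 daily price, the dedicated result comes out slightly below the figure quoted above, which presumably used an unrounded price.

```python
SECONDS_PER_DAY = 24 * 60 * 60
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 31  # assumption: a 31-day month

INGRESS_PRICE_PER_MILLION = 0.028  # standard tier, quoted above
TU_PRICE_PER_HOUR = 0.03           # assumption: not quoted above
CU_PRICE_PER_DAY = 733             # dedicated tier, rounded price quoted above


def standard_monthly_cost(events_per_second, throughput_units, days=DAYS_PER_MONTH):
    """Ingress charge plus throughput-unit charge for standard Event Hubs."""
    millions_of_events = events_per_second * SECONDS_PER_DAY * days / 1_000_000
    ingress = millions_of_events * INGRESS_PRICE_PER_MILLION
    capacity = throughput_units * TU_PRICE_PER_HOUR * HOURS_PER_DAY * days
    return ingress + capacity


def dedicated_monthly_cost(capacity_units, days=DAYS_PER_MONTH):
    """Flat daily charge per capacity unit for a Dedicated Event Hub."""
    return capacity_units * CU_PRICE_PER_DAY * days


if __name__ == "__main__":
    print(f"standard:  ${standard_monthly_cost(1_000_000, 1_000):,.0f}")
    print(f"dedicated: ${dedicated_monthly_cost(4):,.0f}")
```

Note that the dedicated price does not depend on the number of events at all, so the gap with Event Grid only widens as volume or the number of subscribers grows.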
The most important of those features is durability. Event Grid is meant as a distribution system, not as a queue. Pushing an event into it means it gets pushed out immediately, and if it doesn't get handled in time, it's gone forever. There is no durable storage, and an event handler that can't keep up with the rate at which events are sent will start missing data. This is fine for some application logic, but potentially disastrous if the events are important and must not be missed, or must be processed in order.
In Event Hubs, publishers and subscribers move along at their own pace, reading from and writing to durable storage. A publisher can send data faster than subscribers can process it, and the data is simply queued up until the subscriber gets to it. On top of that, data can be kept in the Event Hub for up to 7 days and then replayed. This allows a subscriber that crashed to resume from where it left off, or even restart from an older point in time and reprocess events. Stream Analytics uses this exact behaviour to allow stopping and starting without any data loss, and so can any custom application. It makes for a much more robust data pipeline and application architecture.
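The difference is easiest to see in a toy model. The sketch below is not the Event Hubs API: it is a hypothetical in-memory log with made-up class names, and it leaves out partitions and the retention window. It only illustrates why keeping events in storage and tracking an offset per subscriber makes resuming and replaying possible.

```python
class DurableLog:
    """In-memory stand-in for a durable, append-only event stream."""

    def __init__(self):
        self._events = []

    def append(self, event):
        """Store an event and return its offset."""
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, offset):
        """Return (offset, event) pairs starting at the given offset."""
        return list(enumerate(self._events[offset:], start=offset))


class Subscriber:
    """Reads at its own pace and remembers how far it got."""

    def __init__(self, log, checkpoint=0):
        self._log = log
        self.checkpoint = checkpoint  # offset of the next event to process
        self.processed = []

    def poll(self, max_events=None):
        batch = self._log.read_from(self.checkpoint)
        if max_events is not None:
            batch = batch[:max_events]
        for offset, event in batch:
            self.processed.append(event)
            self.checkpoint = offset + 1  # advance only after handling the event

    def rewind(self, offset):
        """Move the checkpoint back to reprocess older events."""
        self.checkpoint = offset


if __name__ == "__main__":
    log = DurableLog()
    for i in range(5):
        log.append(f"event-{i}")  # the publisher runs ahead of the subscriber

    slow = Subscriber(log)
    slow.poll(max_events=2)       # handles two events, then "crashes"

    restarted = Subscriber(log, checkpoint=slow.checkpoint)
    restarted.poll()              # resumes at the saved checkpoint
    print(restarted.processed)

    restarted.rewind(0)           # replay everything from the start
    restarted.poll()
    print(len(restarted.processed))
```

In a push-only system there is no log to read from, so the restarted subscriber would have nothing to resume from.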
Does this mean Event Grid is useless when we already have the power of Hubs at our disposal? No! Event Grid doesn't require any setup, and its advanced matching and routing capabilities make it a great candidate for triggering application logic, sending out logging and reacting to application metrics. Its native integrations with Functions, Logic Apps, Automation and general webhooks will make it an invaluable tool for governing a pipeline and enabling better self-healing systems. Just don't use it as a data pipeline :)
There are other features and differences between the two systems. I'm sure more of them will be highlighted in the next weeks and months as more people get their hands on Event Grid and see what it's capable of. It's an exciting announcement, and we've only scratched the surface.