If your cloud-based application can live with eventual consistency, Azure gives you three out-of-the-box tools for building reliable, extendable and scalable asynchronous applications.
Up until now, in this Coding Azure series, I’ve focused on creating a secure, cloud-native, three-tier synchronous application consisting of a client-side or server-side frontend and a RESTful Web Service backend.
But there are problems with that traditional architecture, and they’re usually related to the backend Web Service: Can the Web Service scale as demand increases? How reliable/robust is the Web Service (and what happens when it isn’t available)? And how do you extend the application to cleanly add new functionality?
For example, imagine that, at its peak periods, this three-tier application is getting one request per second through its frontend while the backend Web Service is taking two seconds to process each request. At that rate, a single backend instance can handle only half the incoming traffic, so the backlog grows without limit; you need at least two instances just to keep pace.
You’d probably address that by scaling out the backend’s App Service to multiple instances and distributing the incoming requests across all of those instances. You’d have to deal with the latency in starting up those new instances (even containers take some time to warm up) and, of course, absorb the cost of those extra instances, but it’s not a bad solution.
But scaling out creates some architectural problems. While we’d prefer all the requests to our backend to be stateless, the reality is that many business transactions are dependent on knowing what went on in the user’s previous activities. That means that your backend needs to deal with maintaining state across requests.
Handling those “semi-stateless” requests will mean either enabling some kind of session-oriented server-side cache or returning state information to the client and requiring the client to send it back on every request. Alternatively, you could turn on server affinity (also known as “sticky sessions”) which, unfortunately, has the side effect of preventing requests from being distributed evenly across all your backend instances, taking some of the fun out of scaling out. Sometimes you end up having to combine these approaches.
And then there’s reliability: Scaling out is also a common way to address reliability. If one instance fails, subsequent requests are routed to another running instance so the application can keep running (though, of course, that surviving instance will now be, at least temporarily, overloaded). In the worst-case scenario, though, if only one instance of the backend is running when it fails, some requests may simply be lost while a new instance is spun up.
Another issue is that the communication between the frontend and the backend Web Service isn’t transactional. It’s possible for the frontend to send a request to the Web Service and then fail before receiving the backend’s response, for example. In that scenario, the backend Web Service will finish its updates, but any follow-on activities that the frontend was supposed to perform won’t happen.
Extending an existing synchronous application can also be challenging. You must either add functionality to the frontend (to call an additional Web Service, for example) or add functionality to the backend (to perform additional processing). Either of those options, by piling additional work onto the frontend or backend, also impacts their ability to scale to handle increased demand. And even ignoring all of that, both of those solutions require you to modify existing applications that were, up until your new “enhancements,” working just fine. Consider the odds that they’ll keep working just fine after your changes.
All these issues can often be addressed by moving to asynchronous systems: Instead of having your frontend access a Web Service, your frontend adds a message to a queue or raises an event that a backend processor (eventually) responds to and processes.
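To make that concrete, here’s a minimal sketch of what the frontend’s “send” side might look like using the @azure/service-bus package. The queue name (“orders”), payload shape and environment variable are placeholders for illustration, not names from this series:

```typescript
import { ServiceBusClient } from "@azure/service-bus";

// Placeholder connection string and queue name for this sketch
const connectionString = process.env.SERVICE_BUS_CONNECTION_STRING!;
const queueName = "orders";

async function submitOrder(order: { id: string; items: string[] }) {
  const client = new ServiceBusClient(connectionString);
  const sender = client.createSender(queueName);
  try {
    // The frontend's only job now: enqueue the work and return immediately
    await sender.sendMessages({ body: order, contentType: "application/json" });
  } finally {
    await sender.close();
    await client.close();
  }
}
```

The frontend can respond to the user as soon as the message is accepted by the queue; everything else happens later.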
Moving to an asynchronous application does require dividing a transaction into the part of the transaction that needs to be dealt with immediately by the frontend and the part that can be deferred until the backend processor gets around to it.
A good model for this kind of division is your experience with placing an order with Amazon.com. On Amazon, when you click the submit button on your order, you get an immediate response that essentially says, “Thanks for your order.” The actual order processing, however, is performed by some backend processor … eventually. When that backend processing does complete, you’re notified by email.
The benefit here is that the “immediate processing” portion of the transaction (including writing to a queue or raising an event) is usually much faster and less resource-intensive than the “deferred processing.” This reduces the demand on the frontend so that it may not need to scale out at all. The “deferred processing” is typically more resource-intensive and time-consuming … but that doesn’t matter.
It doesn’t matter because, in an asynchronous system, if the frontend is adding/sending messages faster than the backend processor can handle them, the “deferred processing” requests will just wait until the backend gets around to them. When demand falls off, the backend processor will catch up. If demand doesn’t ever fall off (and the deferred processing is being deferred “too long”), you can start a second backend processor to reduce the length of the queue.
It gets better: Because a queue lets a backend processor retrieve messages in batches, the processor can handle groups of related messages that share state together. It can even, potentially, implement more efficient batch processing for those requests than a Web Service that’s always responding to individual requests.
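As a sketch of what that batch retrieval can look like with a Storage Queue, using the @azure/storage-queue package (the queue name and connection string are placeholders, and this assumes the sender Base64-encoded a JSON payload, as in the send-side sketch later in this post):

```typescript
import { QueueClient } from "@azure/storage-queue";

const queueClient = new QueueClient(
  process.env.STORAGE_CONNECTION_STRING!,
  "orders"
);

async function processBatch() {
  // Retrieve up to 32 messages in one round trip (the Storage Queue maximum)
  const batch = await queueClient.receiveMessages({ numberOfMessages: 32 });

  for (const msg of batch.receivedMessageItems) {
    // Assumes the sender Base64-encoded a JSON payload
    const order = JSON.parse(
      Buffer.from(msg.messageText, "base64").toString("utf8")
    );
    // ... process related orders together here ...

    // Delete the message only after it's been handled successfully
    await queueClient.deleteMessage(msg.messageId, msg.popReceipt);
  }
}
```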
Asynchronous systems also improve reliability: If the backend processor in a queue-based or event-based system is down for any reason (actual failure, upgrades, scheduled maintenance), messages just sit on the queue until the backend processor restarts. Events work in a similar way when an event can’t be delivered to a consumer.
And, on top of all that, one of the services—the Azure Service Bus—is transactional. This means that, if a transaction fails, messages that were created in the transaction aren’t added to the queue and any messages processed as part of the transaction are automatically returned to the queue. Essentially, the Service Bus cleans up after itself in the event of a failure.
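Transaction support varies by SDK (it’s exposed in the .NET client, for example), so treat the following as a related sketch rather than the transactional API itself: in the JavaScript @azure/service-bus package, the peek-lock receive mode gives you a similar safety net, because a message that’s abandoned (or never settled) goes back on the queue.

```typescript
import { ServiceBusClient } from "@azure/service-bus";

const client = new ServiceBusClient(process.env.SERVICE_BUS_CONNECTION_STRING!);

// peekLock (the default) leaves the message on the queue until it's settled
const receiver = client.createReceiver("orders", { receiveMode: "peekLock" });

async function drainOnce() {
  const messages = await receiver.receiveMessages(10, { maxWaitTimeInMs: 5000 });
  for (const msg of messages) {
    try {
      // ... process msg.body ...
      await receiver.completeMessage(msg); // success: remove from the queue
    } catch {
      await receiver.abandonMessage(msg); // failure: return it to the queue
    }
  }
}
```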
Even extensibility is easier in asynchronous systems. With either event-based or queue-based systems, new backend processors can be added to perform new operations on existing messages without impacting the existing processors.
So, which service is best for your application?
At a very high level, picking between queues and events is pretty easy. If your asynchronous application is going to have multiple backend processors (or, potentially, an unknown number of backend processors), you probably want to use the Azure Event Grid. If you have a single backend processor, then you probably want to use a queue-based system. If you need lots of “queue-based” features, you could use a Service Bus; otherwise a Storage Queue will probably meet your needs.
While that’s a good general rule, it’s not true for every case. If you want to leverage transactions, for example, then you’ll want to use the Service Bus. And, like events, the Service Bus also supports having multiple backend processors (though not as flexibly as events do).
If you’re trying to decide between using a Storage Queue or a Service Bus, there are lots of good comparisons available. Fundamentally, if you think of the Storage Queue as a pickup truck (big capacity, basic features, lower price) and the Service Bus as the tricked-out sports car (less capacity, lots of features, higher price) then you have a good mental model of the differences.
But, to make matters more difficult, you can probably write enough code to give your Storage Queue-based application pretty much all the features that a Service Bus–based application has.
For example, the Service Bus works as a “message broker” and doesn’t care about the format of your message. A Storage Queue, on the other hand, expects all your messages to be strings. As a result, when working with a Storage Queue, you’ll need to serialize any objects you want to add to the queue into JSON and, typically, Base64-encode the result so the text can safely be embedded in the XML body of the REST request that carries it. Fortunately, that’s pretty easy to do, as you’ll see in my post on adding messages to a Storage Queue.
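As a preview, here’s roughly what that conversion looks like on the send side (with placeholder names again; the @azure/storage-queue package’s sendMessage sends whatever string you give it, so the encoding happens in your code):

```typescript
import { QueueClient } from "@azure/storage-queue";

const queueClient = new QueueClient(
  process.env.STORAGE_CONNECTION_STRING!,
  "orders"
);

async function enqueueOrder(order: { id: string; items: string[] }) {
  // Serialize the object to JSON, then Base64-encode it so the text is
  // safe inside the XML body of the underlying REST request
  const encoded = Buffer.from(JSON.stringify(order), "utf8").toString("base64");
  await queueClient.sendMessage(encoded);
}
```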
Not surprisingly, there are some design patterns that crop up when working with asynchronous applications.
Not all of those patterns are supported by the Azure Services, however. Ideally, for example, you want “exactly once” delivery—the guarantee that no message is lost, duplicated or orphaned, and the backend processor gets each message once. Unfortunately, none of the three Azure services guarantee that, so your backend processor will have to deal with potential duplicate messages.
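The standard answer is to make the backend processor idempotent: remember which messages have already been handled and skip any repeats. A minimal sketch of the idea (in production, the set of handled IDs would live in durable storage such as a database table, not in process memory):

```typescript
// Track which message IDs have already been handled
const processedIds = new Set<string>();

function handleOnce(messageId: string, body: unknown): void {
  if (processedIds.has(messageId)) {
    return; // duplicate delivery: already handled, safe to ignore
  }
  // ... do the real work with body ...
  processedIds.add(messageId);
}
```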
Other patterns are, however, well supported, and those are the ones I’ll be covering in my upcoming posts: dead letter queues (which allow the review and processing of messages removed from a queue) and the Valet Key pattern, which is important enough to be worth discussing in more detail here.
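Before getting to the Valet Key pattern: a Service Bus queue comes with a built-in dead-letter sub-queue that you can read like any other queue. A sketch with the @azure/service-bus package (the queue name is a placeholder):

```typescript
import { ServiceBusClient } from "@azure/service-bus";

const client = new ServiceBusClient(process.env.SERVICE_BUS_CONNECTION_STRING!);

// subQueueType points the receiver at the queue's dead-letter sub-queue
const dlqReceiver = client.createReceiver("orders", {
  subQueueType: "deadLetter",
});

async function reviewDeadLetters() {
  const messages = await dlqReceiver.receiveMessages(10, {
    maxWaitTimeInMs: 5000,
  });
  for (const msg of messages) {
    // Service Bus records why each message was dead-lettered
    console.log(msg.deadLetterReason, msg.deadLetterErrorDescription, msg.body);
    await dlqReceiver.completeMessage(msg); // remove once reviewed
  }
}
```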
The Valet Key pattern separates authorizing access to a resource from reading and writing the resource. This pattern can be valuable when processing a message might be time-consuming and resource-intensive (reading a large message from a queue, for example).
With the Valet Key pattern, a trusted part of the application acquires the permissions to access the queue and then passes a key that grants those permissions to a client (the key is usually both time-limited and permission-limited).
Alternatively, a client-side application might request a key from a server-side resource by calling a Web Service. Either way, the client (which has no permission to access the resource without the key) uses the key to do the required processing, avoiding embedding any secrets in the client-side code.
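With Azure Storage, the “key” is a shared access signature (SAS). Here’s a sketch of a server-side function issuing a short-lived, add-only SAS for a Storage Queue, and a client using it (the account name, key and queue name are placeholders):

```typescript
import {
  QueueClient,
  QueueSASPermissions,
  StorageSharedKeyCredential,
  generateQueueSASQueryParameters,
} from "@azure/storage-queue";

// Server side: sign a short-lived, add-only token for one queue
const credential = new StorageSharedKeyCredential(
  process.env.STORAGE_ACCOUNT_NAME!,
  process.env.STORAGE_ACCOUNT_KEY!
);

function issueValetKey(queueName: string): string {
  return generateQueueSASQueryParameters(
    {
      queueName,
      permissions: QueueSASPermissions.parse("a"), // add messages only
      expiresOn: new Date(Date.now() + 15 * 60 * 1000), // 15-minute lifetime
    },
    credential
  ).toString();
}

// Client side: the SAS in the URL is the only credential the client holds
const sas = issueValetKey("orders");
const clientQueue = new QueueClient(
  `https://${process.env.STORAGE_ACCOUNT_NAME}.queue.core.windows.net/orders?${sas}`
);
```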
In my upcoming posts, I’m going to look at the services that Azure provides for creating asynchronous applications: Storage Queues, Service Buses and the Event Grid (I’m going to ignore the Event Hub because it’s targeted more at continuous streams of telemetry than at discrete business transactions).
I’ll be breaking up each of those topics over several posts that will cover configuring and securing the service, creating frontends (both client-side and server-side) to send messages to the service, and creating a backend processor to read the messages (server-side only).
I’m going to start this series with Storage Queues in my next post. After that, I’ll move on to the Service Bus and finish with the Event Grid.
Peter Vogel is both the author of the Coding Azure series and the instructor for Coding Azure in the Classroom. Peter’s company provides full-stack development from UX design through object modeling to database design. Peter holds multiple certifications in Azure administration, architecture, development and security and is a Microsoft Certified Trainer.