
Learn how the Polly Project, an open source .NET framework that provides patterns and building blocks for fault tolerance and resilience in applications, can be used with .NET Core.

Error handling and reliably resuming after an error are the Achilles’ heel of many software projects. Applications that were running smoothly all along suddenly turn into chaotic nightmares as soon as network connectivity stutters or disk space runs out. Professional software stands out by expecting exactly those edge cases and handling them gracefully (at a certain adoption rate, those “edge cases” even become the norm). Being able to rely on an existing, battle-hardened framework for such scenarios makes this a lot easier.

Enter Polly

This is where the Polly Project comes into play. Polly is an open source .NET framework that provides patterns and building blocks for fault tolerance and resilience in applications.

[Image: The Polly Project website]

Polly is fully open source, available for different flavors of .NET starting with .NET 4.0 and .NET Standard 1.1, and can easily be added to any project via the Polly NuGet package. For .NET Core applications, this can be done from the command line with the dotnet CLI:

dotnet add package Polly

Alternatively, add a PackageReference to the .csproj file (at the time of writing, the latest version was 6.1.2):

<PackageReference Include="Polly" Version="6.1.2" />

When using Visual Studio, “Manage NuGet Packages…” is the quickest way to go.

[Image: Adding the Polly NuGet reference in Visual Studio]

Now that Polly has been added to the project, the question arises: How and in which scenarios can it be used? As is so often the case, this is best explained using actual code and a practical example.

To keep things relatively simple, let’s assume we have an application that persists data or settings continuously in the background by writing them to disk. This happens in a method called PersistApplicationData.

private void PersistApplicationData()
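The article does not show what PersistApplicationData actually does. Purely for illustration, here is one possible body, assuming the data is a set of key/value settings written to a text file. SettingsStore, Set and the "key=value" format are hypothetical; the point is that a single, ordinary file system call is enough to expose the method to all the failure modes discussed next.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical implementation of PersistApplicationData, for illustration only.
// SettingsStore, Set and the "key=value" file format are assumptions.
public class SettingsStore
{
    private readonly string path;
    private readonly Dictionary<string, string> settings = new Dictionary<string, string>();

    public SettingsStore(string path)
    {
        this.path = path;
    }

    public void Set(string key, string value) => settings[key] = value;

    // Overwrites the settings file with the current in-memory settings.
    // File.WriteAllLines can throw IOException (e.g. disk full, file locked
    // by another process) or UnauthorizedAccessException (e.g. access rights
    // revoked), which is exactly what the rest of this article guards against.
    public void PersistApplicationData()
    {
        File.WriteAllLines(path, settings.Select(kv => $"{kv.Key}={kv.Value}"));
    }
}
```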

Since this method accesses the file system, it’s more or less bound to fail from time to time. The disk could be full, files could be locked unexpectedly by indexing services or anti-virus software, access rights might have been revoked... basically anything could happen here. File system access should always be treated as an external dependency that’s out of an application’s control. Therefore, at the very minimum, a try/catch block is required.

The next obvious question is which exceptions should be caught in the catch block. Catching the Exception base class covers all possible cases, but it might also be too generic. Exceptions like NullReferenceException or AccessViolationException usually indicate severe problems in the application and should probably not be handled gracefully. So catching specific exceptions like IOException or UnauthorizedAccessException might be the better option here. Hence, we end up with two catch blocks for this example.

Since we don’t want to completely ignore those exceptions, at least some logging needs to be in place, which means duplicating a call to a logging method in both catch blocks.

As a next step, we have to think about whether and how the application should continue once an exception has actually occurred. If we want to implement a retry pattern, an additional loop around the try/catch block is required to be able to repeat the call to PersistApplicationData. This can either be an infinite loop or a loop that terminates after a specific number of retries. In either case, we need to make sure manually that the loop is exited after a successful call.

Last but not least, we should consider that a subsequent call to PersistApplicationData is very likely to fail again if it happens immediately. Some kind of throttling mechanism is probably required. The most basic way to do that would be a call to Thread.Sleep with a hard-coded number of milliseconds. Alternatively, we could use an incremental approach by factoring in the current loop count.

With all these considerations in place, a simple method call quickly turns into a 20+ line construct like this:

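using System;
using System.IO;
using System.Threading;

// Inside the class that defines PersistApplicationData and Log:
private void GuardPersistApplicationData()
{
  const int RETRY_ATTEMPTS = 5;
  for (var i = 0; i < RETRY_ATTEMPTS; i++)
  {
    try
    {
      // Wait a little longer before each attempt (0 ms, 100 ms, 200 ms, ...).
      Thread.Sleep(i * 100);
      // Here comes the call we *actually* care about.
      PersistApplicationData();
      // Successful call => exit loop.
      break;
    }
    catch (IOException e)
    {
      Log(e);
    }
    catch (UnauthorizedAccessException e)
    {
      Log(e);
    }
  }
}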

This simple example illustrates the core problem when it comes to fault-tolerant and resilient code: It’s often not pretty to look at and even hard to read because it obscures the actual application logic.


The obvious solution to that problem is generically reusable blocks of code that handle the concerns identified above. Instead of reinventing the wheel and writing these blocks of code again and again, a library like Polly should be our natural weapon of choice.

Polly provides building blocks for exactly the use cases identified above (and many more) in the form of policies. So let’s take a closer look at these policies and how they can be applied to the example above.

Retry Forever

The most basic policy that Polly provides is RetryForever, which does exactly what its name suggests: a specific piece of code (here: PersistApplicationData) is executed over and over again until it succeeds (i.e., until it no longer throws an exception). The policy is created and applied by first defining the expected exceptions via a call to Policy.Handle. Then RetryForever specifies the actual policy, and Execute takes the code that will be guarded by the policy.

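using System; // Exception
using Polly;  // Policy

// ...inside the class that defines PersistApplicationData:
Policy.Handle<Exception>()
  .RetryForever()
  .Execute(PersistApplicationData);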

Again, we don’t want to handle all possible exceptions generically, but rather specific types. This can be done by providing the corresponding type arguments and combining them using the Or method.

Policy.Handle<IOException>().Or<UnauthorizedAccessException>()
  .RetryForever()
  .Execute(PersistApplicationData);

However, swallowing those exceptions silently is really bad practice, so we can use an overload of RetryForever that takes a delegate which is called whenever an exception occurs.

Policy.Handle<IOException>().Or<UnauthorizedAccessException>()
  .RetryForever(e => Log(e.Message))
  .Execute(PersistApplicationData);

Retry n Times

The RetryForever policy already covers a part of the requirements we identified initially, but a potentially infinite number of calls to PersistApplicationData is not what we had in mind. So we could opt for the Retry policy instead. Retry behaves very similarly to RetryForever, with the key difference that it expects a numeric argument specifying the number of retry attempts before it gives up.

Policy.Handle<Exception>()
  .Retry(10)
  .Execute(PersistApplicationData);

Similarly, there is also an overload of Retry that lets the caller handle a possible exception and additionally receives an int argument with the number of the current retry attempt (starting at 1).

Policy.Handle<Exception>()
  .Retry(10, (e, i) => Log($"Error '{e.Message}' at retry #{i}"))
  .Execute(PersistApplicationData);

Wait and Retry

The last requirement from the initial example that is still unfulfilled is throttling the retries, in the hope that the flaky resource which originally caused the failure has recovered by then.

Again, Polly provides a specific policy for that use case called WaitAndRetry. The simplest overload of WaitAndRetry expects a collection of TimeSpan instances, and the size of this collection implicitly dictates the number of retries. The individual TimeSpan instances specify the waiting time before each retry.

Policy.Handle<Exception>()
  .WaitAndRetry(new [] { TimeSpan.FromMilliseconds(100), TimeSpan.FromMilliseconds(200) })
  .Execute(PersistApplicationData);
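Spelling out every TimeSpan by hand gets tedious for longer schedules. A small helper can generate the collection for this overload instead. The sketch below is an assumption of how that might look: Backoff, Linear and Exponential are hypothetical names, and doubling the delay on each retry is a common choice, not something Polly requires.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helpers that build the TimeSpan collection expected by the
// simplest WaitAndRetry overload. The names and formulas are illustrative.
public static class Backoff
{
    // step, 2 * step, 3 * step, ... (one entry per retry)
    public static IReadOnlyList<TimeSpan> Linear(int retries, TimeSpan step) =>
        Enumerable.Range(1, retries)
            .Select(attempt => TimeSpan.FromTicks(step.Ticks * attempt))
            .ToList();

    // baseDelay, 2 * baseDelay, 4 * baseDelay, ... (doubles on every retry)
    public static IReadOnlyList<TimeSpan> Exponential(int retries, TimeSpan baseDelay) =>
        Enumerable.Range(1, retries)
            .Select(attempt => TimeSpan.FromTicks(baseDelay.Ticks * (1L << (attempt - 1))))
            .ToList();
}
```

For example, passing Backoff.Exponential(4, TimeSpan.FromMilliseconds(100)) to WaitAndRetry results in waits of 100, 200, 400 and 800 ms before the four retries.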

To calculate those wait times dynamically, another overload of WaitAndRetry takes the number of retries and a function that maps the current retry attempt to a wait time.

Policy.Handle<Exception>()
  .WaitAndRetry(5, count => TimeSpan.FromSeconds(count))
  .Execute(PersistApplicationData);
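Polly numbers retry attempts starting at 1, so count => TimeSpan.FromSeconds(count) waits 1, 2, 3, 4 and 5 seconds (15 seconds in total) across the five retries. Combining this overload with exception filtering and logging, the hand-written GuardPersistApplicationData from the beginning can be approximated by a single policy. The sketch below assumes the Polly package from above; PersistenceGuard and the injected persist and log delegates are hypothetical stand-ins for PersistApplicationData and Log. One behavioral difference matters: once the retries are exhausted, Polly rethrows the last exception instead of swallowing it, so the sketch catches and logs it to mirror the original loop.

```csharp
using System;
using System.IO;
using Polly;

// Hypothetical Polly-based version of GuardPersistApplicationData.
public class PersistenceGuard
{
    private readonly Action persist;        // stands in for PersistApplicationData
    private readonly Action<Exception> log; // stands in for Log

    public PersistenceGuard(Action persist, Action<Exception> log)
    {
        this.persist = persist;
        this.log = log;
    }

    public void GuardPersistApplicationData()
    {
        try
        {
            Policy
                .Handle<IOException>().Or<UnauthorizedAccessException>()
                // 1 initial call + 4 retries = 5 attempts, waiting 100, 200,
                // 300 and 400 ms before the retries, like the original loop.
                .WaitAndRetry(4,
                    attempt => TimeSpan.FromMilliseconds(attempt * 100),
                    (e, wait) => log(e))
                .Execute(persist);
        }
        catch (Exception e) when (e is IOException || e is UnauthorizedAccessException)
        {
            // Polly rethrows the last failure once all retries are used up;
            // logging it here mirrors the loop, which simply gives up.
            log(e);
        }
    }
}
```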

An infinite number of retries using a dynamic wait time is also possible with WaitAndRetryForever.

Policy.Handle<Exception>()
  .WaitAndRetryForever(count => TimeSpan.FromSeconds(count))
  .Execute(PersistApplicationData);

Circuit Breaker

The last policy we want to take a look at is slightly different from the ones covered so far. CircuitBreaker acts like its real-world counterpart, which interrupts the flow of electricity. In software, the counterparts of fault currents or short circuits are exceptions, and this policy can be configured so that a certain number of consecutive exceptions “breaks” the application’s flow. As a result, the “protected” code (PersistApplicationData) simply won’t get called any more as soon as a given threshold of exceptions has been reached. Additionally, an interval can be specified after which the CircuitBreaker recovers and the application flow is restored.

Because of this behavior, the policy is usually set up once and the actual Policy instance is stored in a variable. This instance keeps track of failed calls and the recovery interval and is used to perform the Execute call in a different place.

var policy = Policy
  .Handle<IOException>().Or<UnauthorizedAccessException>()
  .CircuitBreaker(5, TimeSpan.FromMinutes(2));
// ...
policy.Execute(PersistApplicationData);
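In Polly, “will not get called” concretely means that Execute throws a BrokenCircuitException while the circuit is open, without invoking the delegate at all. Calling code therefore typically handles that exception separately from the original failures. The following sketch assumes the Polly package from above; GuardedPersistence, TryPersist and the constructor parameters are hypothetical names chosen for illustration.

```csharp
using System;
using System.IO;
using Polly;
using Polly.CircuitBreaker;

// Hypothetical wrapper around the circuit breaker example. The class name,
// TryPersist and the constructor parameters are illustrative assumptions.
public class GuardedPersistence
{
    // Created once and reused, so the policy can count consecutive failures
    // across calls and remember when the circuit was opened.
    private readonly CircuitBreakerPolicy policy;
    private readonly Action persist;

    public GuardedPersistence(Action persist, int threshold, TimeSpan breakDuration)
    {
        this.persist = persist;
        policy = Policy
            .Handle<IOException>().Or<UnauthorizedAccessException>()
            .CircuitBreaker(threshold, breakDuration);
    }

    public CircuitState State => policy.CircuitState;

    // Returns true if the data was written, false otherwise.
    public bool TryPersist()
    {
        try
        {
            policy.Execute(persist);
            return true;
        }
        catch (BrokenCircuitException)
        {
            // Circuit is open: Polly rejected the call without invoking persist.
            return false;
        }
        catch (Exception e) when (e is IOException || e is UnauthorizedAccessException)
        {
            // persist ran and failed; the policy has counted this failure.
            return false;
        }
    }
}
```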

But Wait, There’s More!

The policies demonstrated above offer only a small peek into the versatile functionality that Polly provides. For example, each of these policies is also available in an async flavor (RetryForeverAsync, RetryAsync, WaitAndRetryAsync, CircuitBreakerAsync) that is just as easy to use as its synchronous counterpart.
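As a sketch of the async flavor, assuming the Polly package from above: WaitAndRetryAsync is paired with ExecuteAsync, which awaits both the guarded operation and the delays between attempts instead of blocking a thread with Thread.Sleep. AsyncPersistence, PersistWithRetryAsync and the persistAsync delegate are hypothetical names; a real caller would pass something like an async version of PersistApplicationData.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Polly;

// Hypothetical async counterpart of the WaitAndRetry examples. The class and
// method names are illustrative; persistAsync stands in for an async version
// of PersistApplicationData.
public static class AsyncPersistence
{
    public static Task PersistWithRetryAsync(Func<Task> persistAsync) =>
        Policy
            .Handle<IOException>().Or<UnauthorizedAccessException>()
            // Up to 3 retries, waiting 100, 200 and 300 ms; the delays are
            // awaited rather than blocking the calling thread.
            .WaitAndRetryAsync(3, attempt => TimeSpan.FromMilliseconds(100 * attempt))
            // ExecuteAsync awaits the guarded operation on every attempt.
            .ExecuteAsync(persistAsync);
}
```

A caller could, for example, pass () => File.WriteAllTextAsync(path, content), where path and content are placeholders for the actual file and data.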

The CircuitBreaker policy alone offers several additional configuration options, which are documented in detail in the GitHub repository. In general, this repository, its documentation and the samples are a great starting point for learning about the policies provided by Polly and the core concepts of resilience and transient-fault handling. Hopefully, it serves as an inspiration to get rid of custom, hand-crafted error handling code or, even better, to avoid writing that kind of code in future projects and harness the power of Polly instead.


About the Author

Wolfgang Ziegler

Wolfgang is a professional software developer, techie, maker and geek dad located in Linz, Austria. By day he develops software at Dynatrace, and by night he's usually busy with all kinds of side projects involving electronics, 3D printing and, of course, code.
Follow him on Twitter at @z1c0 or visit his website and blog.
