Opnieuw: A simple and intuitive retrying library for Python

February 5, 2020

In this post, we want to introduce our new, easy to use, intuitive and simple retry package: Opnieuw, which is responsible for all the network retries in Channable’s Python code.

Opnieuw is a general-purpose retrying library that simplifies the task of adding retry behavior to both synchronous and asynchronous tasks.

Channable is a feed processing tool that imports products from webshops and exports those to marketplaces, price comparison sites, and advertisement platforms. At Channable we consume a lot of external APIs for our daily operations. If you make enough API calls, even the most robust API will occasionally fail. By retrying our request we can usually solve this problem. Therefore, having a good retry mechanism is important for making our operations run smoothly.

Why we wrote Opnieuw

Before writing our own retry package, we tried several other retry packages. Unfortunately we fell into multiple pitfalls:

  • Due to too many knobs, it is easy to configure a retry in a way that does not make sense. For example, we had multiple places in our codebase where we used a combination of exponential backoff with a minimum and maximum delay, such that the effective delay would always be the minimum or the maximum; there was no exponential backoff going on. We solve this by providing just two parameters, where every combination of parameters is valid.

  • Due to a modular approach, doing the right thing is hard. Most places in our codebase were not using jitter, simply because adding jitter is a separate step that was often forgotten. We solve this by providing only a single retry strategy: exponential backoff with jitter.

  • Time units were unclear. When you see a line that reads:

    @retry(wait_exponential_multiplier=10)

    does that mean that the base delay is 10 milliseconds, or 10 seconds? Due to the nature of the APIs we interact with, reasonable wait times span several orders of magnitude, from milliseconds to minutes. We solve the unit problem by putting the unit in the parameter name. When we upgraded our codebase to our own retry decorator, we fixed multiple bugs where the original code confused seconds with milliseconds.

  • Parameters were unintuitive. All retry libraries that we used required configuring the base delay for exponential backoff. To determine the base delay, we always found ourselves working backwards from a total wait time and a maximum number of calls. In Opnieuw, we accept those more intuitive parameters directly, and we solve for the base delay.

  • The meaning of parameters was unclear. When we checked the documentation of the libraries we used, we would find explanations like “Wait 2^n * base_delay between each retry.” It is unclear here whether n starts counting at 0 or 1. With exponential growth, an off-by-one can make a difference of minutes after a few attempts. We solve this by providing more intuitive parameters, and by giving them verbose but unambiguous names.
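To make the last two points concrete, here is a small sketch of the arithmetic. It assumes that each wait is double the previous one and it ignores jitter. The helper names delays_for and solve_base_delay are made up for this illustration and are not Opnieuw's internals.

```python
from typing import List


def delays_for(base_delay_seconds: float, waits: int, first_exponent: int = 0) -> List[float]:
    """Delays of the form 2**n * base_delay, with n starting at first_exponent."""
    return [base_delay_seconds * 2 ** (first_exponent + i) for i in range(waits)]


def solve_base_delay(max_calls_total: int, retry_window_seconds: float) -> float:
    """Base delay such that all waits between calls add up to the window.

    With max_calls_total calls there are max_calls_total - 1 waits, and
    base * (1 + 2 + ... + 2**(waits - 1)) == base * (2**waits - 1).
    """
    waits = max_calls_total - 1
    if waits < 1:
        return 0.0  # a single call never waits
    return retry_window_seconds / (2 ** waits - 1)


# Off-by-one: the same formula, counting n from 0 versus counting from 1.
from_zero = delays_for(10, waits=6, first_exponent=0)  # 10, 20, ..., 320
from_one = delays_for(10, waits=6, first_exponent=1)   # 20, 40, ..., 640
extra_seconds = sum(from_one) - sum(from_zero)          # 630 seconds of extra waiting

# Working forwards from the intuitive parameters instead of guessing a base delay.
base = solve_base_delay(max_calls_total=4, retry_window_seconds=60)
schedule = delays_for(base, waits=3)  # three waits that together span 60 seconds
```

With a base delay of 10 seconds and six waits, starting the exponent at 1 instead of 0 adds more than ten minutes of waiting in total, which is why we prefer to state the number of calls and the window directly.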

Opnieuw is our attempt at avoiding those pitfalls, to make a retry library that is easy to use. Let’s look at an example to see how it works in practice. Suppose we want to fetch https://tech.channable.com/atom.xml, and we want to retry on ConnectionError and HTTPError. We can add Opnieuw to our network handler as follows:

import requests
from requests.exceptions import ConnectionError, HTTPError

from opnieuw import retry


@retry(
    retry_on_exceptions=(ConnectionError, HTTPError),
    max_calls_total=4,
    retry_window_after_first_call_in_seconds=60,
)
def get_page() -> str:
    response = requests.get('https://tech.channable.com/atom.xml')
    # requests only raises HTTPError for 4xx/5xx responses when asked to,
    # so without this line the retry on HTTPError would never trigger.
    response.raise_for_status()
    return response.text

As is clear from the above example, by decorating the function we add retry behavior. You should not need to read any documentation to understand the snippet:

  • retry_on_exceptions specifies which exceptions we want to retry on.

  • max_calls_total is the maximum number of times that the decorated function gets called. By avoiding terms like “number of retries” and referring to calls instead, it is clear that this includes the initial call.

  • retry_window_after_first_call_in_seconds is the maximum number of seconds after the first call during which we would still start a new attempt. By avoiding terms like “stop after delay”, it is clear that this is not about the total time spent waiting, or the duration of the final delay, but the time elapsed since the first call was initiated.

In the above example, we would call get_page() at least once and at most 4 times, and the last call would start at most 60 seconds after the first call. On a ConnectionError or HTTPError we would retry, and the wait time after a failed attempt grows exponentially. Other exceptions would bubble up and not be retried.
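For readers who want to see the mechanics, the following is a minimal sketch of a decorator with these semantics. It is not Opnieuw's implementation: the name retry_sketch, the sleep hook, and the choice of "full jitter" (waiting a random fraction of each exponential step) are assumptions made for illustration, and the sketch does not account for the time spent inside the decorated function itself.

```python
import functools
import random
import time
from typing import Callable, Tuple, Type


def retry_sketch(
    retry_on_exceptions: Tuple[Type[BaseException], ...],
    max_calls_total: int,
    retry_window_after_first_call_in_seconds: float,
    sleep: Callable[[float], None] = time.sleep,  # hypothetical hook, handy in tests
):
    """Illustrative exponential backoff with jitter; not Opnieuw's implementation."""
    if max_calls_total < 1:
        raise ValueError("max_calls_total must be at least 1")
    waits = max_calls_total - 1
    # Assumption: the un-jittered waits double each time and add up to the window.
    base = retry_window_after_first_call_in_seconds / (2 ** waits - 1) if waits else 0.0

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for call_index in range(max_calls_total):
                try:
                    return func(*args, **kwargs)
                except retry_on_exceptions:
                    if call_index == max_calls_total - 1:
                        raise  # out of calls: let the last error bubble up
                    # Full jitter: wait a random fraction of the current step,
                    # so the waits together never exceed the window.
                    sleep(random.uniform(0, base * 2 ** call_index))
        return wrapper
    return decorator


slept = []
attempts = []


@retry_sketch(
    retry_on_exceptions=(ConnectionError,),  # the built-in ConnectionError
    max_calls_total=4,
    retry_window_after_first_call_in_seconds=60,
    sleep=slept.append,  # record the waits instead of actually sleeping
)
def flaky() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = flaky()  # fails twice, succeeds on the third call
```

Any exception that is not listed in retry_on_exceptions passes straight through the except clause, which matches the behavior described above.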

Opnieuw is a simple and easy way to add exponential backoff retries with jitter to a call. In addition to retry(), Opnieuw features retry_async() for retrying asynchronous tasks. All the parameters remain the same; the only difference is that it can be used with async functions.
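To illustrate what changes in the asynchronous case, here is a standard-library sketch of an async variant. Again, this is not Opnieuw's code: retry_async_sketch is a hypothetical name, and the backoff schedule is the same assumed doubling with full jitter. The point is only that the wrapper awaits both the decorated coroutine and the sleep, so other tasks can make progress while we back off.

```python
import asyncio
import functools
import random
from typing import Tuple, Type


def retry_async_sketch(
    retry_on_exceptions: Tuple[Type[BaseException], ...],
    max_calls_total: int,
    retry_window_after_first_call_in_seconds: float,
):
    """Illustrative async counterpart; not Opnieuw's implementation."""
    if max_calls_total < 1:
        raise ValueError("max_calls_total must be at least 1")
    waits = max_calls_total - 1
    # Assumption: the un-jittered waits double each time and add up to the window.
    base = retry_window_after_first_call_in_seconds / (2 ** waits - 1) if waits else 0.0

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            for call_index in range(max_calls_total):
                try:
                    return await func(*args, **kwargs)
                except retry_on_exceptions:
                    if call_index == max_calls_total - 1:
                        raise  # out of calls: let the last error bubble up
                    # Awaiting the sleep yields control to the event loop.
                    await asyncio.sleep(random.uniform(0, base * 2 ** call_index))
        return wrapper
    return decorator


calls = []


@retry_async_sketch(
    retry_on_exceptions=(ConnectionError,),
    max_calls_total=3,
    retry_window_after_first_call_in_seconds=0.01,  # tiny window keeps the demo fast
)
async def fetch() -> str:
    calls.append(1)
    if len(calls) < 2:
        raise ConnectionError("temporary failure")
    return "feed contents"


result = asyncio.run(fetch())  # fails once, then succeeds on the second call
```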

About the name

“Opnieuw” means “again” in Dutch. We had an internal naming competition and Opnieuw won out over Perseverance and channable-retry.

Conclusion

At Channable, we saw an opportunity to improve upon existing retry packages. Unclear APIs were a source of bugs and counter-intuitive behavior. By prioritizing the parameters that users care about, Opnieuw makes it easy to reason about retry behavior of network operations.

Today we are releasing Opnieuw as open source.

Wesley Bowman, Development Manager Product Team
Sayyed Naqi, Developer
Ruud van Asseldonk, Developer