Web APIs can sometimes fail for no particular reason; therefore, packages accessing them often add some robustness to their code by retrying the API call a few times when there is an error.
The two high-level R HTTP clients, `httr` and `crul`, offer ready-made sub-routines for such cases, but some developers like me have rolled their own out of ignorance. 😅
In this post I shall present the retry sub-routines of `httr` and `crul`, and more generally reflect on (not) reinventing the wheel in your R package. 🎡
The few figures of this post come from the funny HTTP Cats website and are hyperlinked.
Retry in httr and crul
Relying on internet resources might make a package fragile, since the connection or interfaced web API can fail. Therefore, in packages wrapping APIs, one can find some variation of the following pseudo-code that retries a few times:
```r
maxtry <- 5
try <- 1
resp <- do_an_internet_thing()
while (try < maxtry && resp$status >= 400) {
  # wait before retrying, longer and longer
  Sys.sleep(some_waiting_time_increasing_with_try(try))
  resp <- do_an_internet_thing()
  try <- try + 1
}
```
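The pseudo-code above can be turned into something runnable. The sketch below is a minimal illustration, assuming a hypothetical `make_flaky_request()` that stands in for `do_an_internet_thing()` and returns a list with a `status` field; it makes no real HTTP call. It performs at most `max_tries` calls in total, sleeps before each new attempt, and doubles the pause every time. The tiny `pause_base` only keeps the demo fast.

```r
# Hypothetical stand-in for do_an_internet_thing(): returns a list whose
# `status` is 503 for the first `n_failures` calls and 200 afterwards.
make_flaky_request <- function(n_failures) {
  calls <- 0
  function() {
    calls <<- calls + 1
    list(status = if (calls <= n_failures) 503L else 200L, calls = calls)
  }
}

# Call `request()` until its status is below 400 or `max_tries` calls have
# been made, sleeping pause_base * 2^(attempt - 1) seconds before each retry.
retry_request <- function(request, max_tries = 5, pause_base = 1) {
  resp <- request()
  attempt <- 1
  while (attempt < max_tries && resp$status >= 400) {
    Sys.sleep(pause_base * 2^(attempt - 1)) # waits 1, 2, 4, ... times pause_base
    resp <- request()
    attempt <- attempt + 1
  }
  resp
}

# Two failures, then success.
flaky <- make_flaky_request(n_failures = 2)
resp <- retry_request(flaky, max_tries = 5, pause_base = 0.01)
resp$status # 200
resp$calls  # 3
```

Like the pseudo-code, this retries on any status of 400 or above; retrying a 404 rarely helps, so a real implementation would likely stop early on such statuses.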
A search of R-hub’s CRAN source code mirror, for instance, surfaces such a function in a package.
As emphasized in `httr`’s excellent “Best practices for API packages” vignette, “it’s extremely important to make sure to do this with some form of exponential backoff: if something’s wrong on the server-side, hammering the server with retries may make things worse, and may lead to you exhausting quota (or hitting other sorts of rate limits).”
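To make “some form of exponential backoff” concrete, here is a minimal sketch of a waiting-time function. It assumes the “full jitter” variant described by AWS: before retry number `i`, wait a random time between 0 and `pause_base * 2^i` seconds, capped at `pause_cap`, plus a floor `pause_min`. `backoff_wait()` is a hypothetical name, and although its arguments are named after those of `httr::RETRY()`, this is not `httr`’s code.

```r
# Waiting time (seconds) before retry number `i`, using exponential backoff
# with "full jitter": draw uniformly between 0 and pause_base * 2^i (capped
# at pause_cap), then never wait less than pause_min.
backoff_wait <- function(i, pause_base = 1, pause_cap = 60, pause_min = 1) {
  upper <- min(pause_cap, pause_base * 2^i)
  max(pause_min, stats::runif(1, min = 0, max = upper))
}

# Upper bounds grow exponentially; the actual draws are random.
set.seed(42)
waits <- vapply(1:6, backoff_wait, numeric(1), pause_base = 2, pause_min = 0.5)
round(waits, 1)
```

The jitter spreads the retries of many clients over time, so they do not all hit a struggling server at the same moment, while the cap keeps any single wait bounded.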
Now, if you need such a pattern in your API package, rather than patiently ingesting examples and best practices, you could take a shortcut by using the ready-made features of either `httr` or `crul`.
Retry in httr
The `httr` package contains a handy `RETRY()` function that, well, safely retries a request until it succeeds or until the maximum number of tries is reached.
It uses the best practice written up by AWS, exponential backoff with full jitter, to define the increasing waiting time.
If there’s no error, it simply behaves like the corresponding verb would.
```r
httr::RETRY("GET", "http://httpbin.org/status/200")
## Response [http://httpbin.org/status/200]
##   Date: 2020-03-28 12:29
##   Status: 200
##   Content-Type: text/html; charset=utf-8
## <EMPTY BODY>
```
```r
httr::GET("http://httpbin.org/status/200")
## Response [http://httpbin.org/status/200]
##   Date: 2020-03-28 12:29
##   Status: 200
##   Content-Type: text/html; charset=utf-8
## <EMPTY BODY>
```
Now, what happens if the API keeps failing, which the example URL below ensures?
```r
httr::RETRY(
  "GET",
  "http://httpbin.org/status/500",
  times = 5, # the function has other params to tweak its behavior
  pause_min = 5,
  pause_base = 2
)
## Request failed [500]. Retrying in 5 seconds...
## Request failed [500]. Retrying in 5 seconds...
## Request failed [500]. Retrying in 5 seconds...
## Request failed [500]. Retrying in 29.2 seconds...
## Response [http://httpbin.org/status/500]
##   Date: 2020-03-28 12:37
##   Status: 500
##   Content-Type: text/html; charset=utf-8
## <EMPTY BODY>
```
The function also makes use of the `Retry-After` HTTP header so, in short, if the API says “hey please wait 33 seconds”, that’s what the waiting time will be.¹
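The idea of honoring `Retry-After` can be sketched as follows. This is an illustrative helper, not `httr`’s code: `wait_time()` is a hypothetical name, it assumes the headers come as a named list, and it only handles the delay-in-seconds form of the header (`Retry-After` may also carry an HTTP date, which here simply falls back to the computed backoff).

```r
# Hypothetical helper: how long to wait before the next retry.
# `headers` is assumed to be a named list of response headers. A numeric
# Retry-After value (delay in seconds) wins; anything else, including the
# HTTP-date form of Retry-After, falls back to `fallback` seconds.
wait_time <- function(headers, fallback) {
  names(headers) <- tolower(names(headers)) # header names are case-insensitive
  retry_after <- headers[["retry-after"]]
  if (!is.null(retry_after)) {
    seconds <- suppressWarnings(as.numeric(retry_after[[1]]))
    if (!is.na(seconds) && seconds >= 0) {
      return(seconds)
    }
  }
  fallback
}

wait_time(list(`Retry-After` = "33"), fallback = 4)         # 33
wait_time(list(`Content-Type` = "text/html"), fallback = 4) # 4
```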
To learn more about `httr::RETRY()`, head over to its docs and source code.
A wild-caught example of a CRAN package using `httr::RETRY()` is the `antanym` package, whose use of `RETRY()` can be traced back to Lorenzo Busetto’s peer review of the package for rOpenSci.
Retry in crul
What is `crul`? `crul` is an R HTTP client organized around R6 classes.
The `retry` method of `crul`’s `HttpClient` class was modeled after `httr`’s `RETRY()`. I replaced my homegrown retrying code with it in a pull request.
`crul`’s retrying has two interesting differences from `httr`’s retrying:
- It does not wrap the HTTP calls in `tryCatch`, so the only errors it handles gracefully are HTTP errors.
- It offers the possibility to use a callback function “if the request will be retried and a wait time is being applied. The function will be passed two parameters, the response object from the failed request, and the wait time in seconds.” For instance, before retrying, you could query an API status endpoint if such a thing exists.
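The two differences above can be illustrated with a toy retry helper in base R. This is not `crul`’s implementation: `retry_with_callback()` and its `catch_errors` and `on_wait` arguments are hypothetical, and the request function returns a plain list rather than a real response object. With `catch_errors = TRUE`, each call is wrapped in `tryCatch()`, so a connection error is retried like an HTTP error; with `FALSE`, such errors propagate immediately, as they would with `crul`. `on_wait` mirrors the callback described in the second bullet: it receives the failed response (or error condition) and the wait time in seconds.

```r
# Toy retry helper (hypothetical, for illustration only).
retry_with_callback <- function(request, times = 3, pause = 1,
                                catch_errors = TRUE, on_wait = NULL) {
  # One attempt; with catch_errors = TRUE an error becomes a condition object
  attempt_once <- function() {
    if (catch_errors) tryCatch(request(), error = function(e) e) else request()
  }
  # Failed if the attempt errored or returned an HTTP error status
  failed <- function(resp) inherits(resp, "error") || resp$status >= 400
  resp <- attempt_once()
  attempt <- 1
  while (attempt < times && failed(resp)) {
    wait <- pause * 2^(attempt - 1) # simple exponential backoff, no jitter
    if (is.function(on_wait)) on_wait(resp, wait)
    Sys.sleep(wait)
    resp <- attempt_once()
    attempt <- attempt + 1
  }
  if (inherits(resp, "error")) stop(resp) # out of tries: re-raise last error
  resp
}

# Hypothetical request: a connection error, then a 500, then a 200.
outcomes <- list(
  function() stop("connection reset"),
  function() list(status = 500L),
  function() list(status = 200L)
)
n_calls <- 0
request <- function() {
  n_calls <<- n_calls + 1
  outcomes[[n_calls]]()
}

events <- character()
resp <- retry_with_callback(
  request, times = 5, pause = 0.01,
  on_wait = function(resp, wait) {
    what <- if (inherits(resp, "error")) conditionMessage(resp) else resp$status
    events <<- c(events, paste("failed with", what, "- waiting", wait, "s"))
  }
)
resp$status # 200
```

Keeping the callback simple matters, since any time spent in it effectively adds to the wait.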
To learn more about `crul`’s `retry` method, head over to its docs and source code.
On not reinventing the wheel
Once I heard about `httr::RETRY()` and `crul`’s `retry` method, I was a bit disappointed at having reinvented the wheel.
Could one avoid doing that too often?
How to not reinvent the wheel in your code
As an R package developer, how do you learn about functions and methods that already exist in packages your package depends on, could depend on, or could draw inspiration from? Sometimes you might guess that others have encountered your problem, but not even know the right words to describe it (mocking, for instance!).
In a blog post, Jeff Atwood states: “If anything, ‘Don’t Reinvent The Wheel’ should be used as a call to arms for deeply educating yourself about all the existing solutions.” General strategies for learning more and more about the R ecosystem include:
- reading the whole reference of packages your package depends on, and even their changelogs once in a while, because you might as well use all the gems of a package once you’ve decided to trust it;
- reading the R source of packages similar to yours;
- trying to keep up-to-date using one or several communication channels;
- spreading the word about cool features, which is more or less what this post does, and what Sharla Gelfand does in her great “Sharing two #rstats functions, most days.” tweets, whose content is gathered in a GitHub repository.
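Skimming a dependency’s exports is one quick way to apply the first strategy. Below is a minimal base R sketch; `find_exports()` is a hypothetical helper, and the `httr` lookup only runs if that package is installed.

```r
# Hypothetical helper: exported objects of an installed package whose names
# match `pattern` (case-insensitive), sorted alphabetically.
find_exports <- function(pkg, pattern) {
  grep(pattern, sort(getNamespaceExports(pkg)), ignore.case = TRUE, value = TRUE)
}

# Only runs if httr is installed; the result should include "RETRY".
if (requireNamespace("httr", quietly = TRUE)) {
  print(find_exports("httr", "retry"))
}
```

In an interactive session, `help(package = "httr")` lists the package’s documented topics and `news(package = "httr")` shows its changelog.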
Of course, “deeply educating yourself” takes time that one doesn’t necessarily have, and no one should feel guilty about that. Sometimes you’ll re-implement something that already exists elsewhere, and it’s fine!
Lastly, you might even want to create your own (better) version, which is obviously neat. 😎
How to help users of your package not reinvent the wheel
As the developer of a package, you might help users find useful features by… working on its docs.
A good time investment could be to create a `pkgdown` website with a well-organized reference index.
Furthermore, some features could be added to your package if they’re often implemented downstream.
Conclusion
In this post we’ve presented useful functions implementing retries for API packages in `httr` and `crul`.
We’ve also discussed ways not to miss such useful shortcuts for one’s code, mostly by learning more about existing R packages, while acknowledging that such exploration takes time. What’s your favorite lesser-known package gem or R “joygret” moment²?
1. If your only worry is rate limiting and there are no requests happening at the same time, you might find the `ratelimitr` package handy to avoid getting 429 status codes.
2. “Joygret” was defined by Hilary Parker, in a blog post about writing R packages, as “that familiar feeling of the joy of optimization combined with the regret of past inefficiencies”.