Checking the inputs of your R functions

Are you, like we were, tired of filling your functions with argument-checking code that sometimes ends up being longer than the core of the function itself? Are you looking for the most efficient way to check inputs easily, without forgetting any edge cases? In this blog post, read about our exploration of the various ways to check your function inputs in R. And please share your own tips and discoveries in the comment section!

Introduction: the dangers of not checking function inputs

R functions and R packages are a convenient way to share code with the rest of the world, but it is generally not possible to know how, or with what precise aim in mind, others will use your code. For example, they might try to use it on objects that your function was not designed for. Let’s imagine we have written a short function to compute the geometric mean:

geometric_mean <- function(...) {
  
  return(prod(...)^(1/...length()))
  
}

When you tested the function yourself, everything seemed fine:

geometric_mean(2, 8)
[1] 4

geometric_mean(4, 1, 1/32)
[1] 0.5

But a different person using your function might expose it to situations it was not prepared to handle, resulting in cryptic errors or undefined behaviour:

# Input with factors instead of numerics
geometric_mean(factor(2), 8)
Error in Summary.factor(structure(1L, .Label = "2", class = "factor"), : 'prod' not meaningful for factors

# Input with negative values
geometric_mean(-1, 5)
[1] NaN

# Input with NAs
geometric_mean(2, 8, NA)
[1] NA

Or worse, it could give an incorrect output:

geometric_mean(c(2, 8))
[1] 16
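The wrong result comes from how the function counts its inputs: ...length() returns the number of arguments passed through the dots, not the number of values they contain. A minimal sketch, using a hypothetical helper count_dots() that is not part of the original function, makes this visible:

```r
# Hypothetical helper, written only to expose what geometric_mean() sees
count_dots <- function(...) ...length()

count_dots(2, 8)     # 2: two separate arguments
count_dots(c(2, 8))  # 1: a single argument holding two values

# With one argument, the exponent is 1/1, so the "mean" is just the product
prod(c(2, 8))^(1 / 1)  # 16
```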

Because of this, you need to make sure you return clear errors whenever your function receives input it was not designed for. In this blog post, we review a range of approaches to help you check your function inputs and discuss some potential future developments.

Pre-requisite: thoroughly document your argument types

You may have noticed from the simple example above that it’s easy to pass invalid inputs to the geometric_mean() function because we didn’t provide any documentation on what is or isn’t a valid input. We won’t go into details here, but the roxygen2 package provides a convenient way to generate documentation for R functions. Try to be as precise as possible when describing the required format for your inputs ¹.

#' @param name A character of length one with the name of the person to greet
say_hello <- function(name) {
  stopifnot(is.character(name))
  paste("Hello", name)
}

Adding any kind of argument checking in the absence of good documentation would be futile and very frustrating for your users, as they would have to figure out what is or isn’t valid by trial and error.
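As an illustration, the geometric_mean() function from the introduction could be documented along these lines. The constraints listed here (numeric, strictly positive, not missing, one value per argument) are an assumption about what the function was designed for, not something stated earlier:

```r
#' Compute the geometric mean
#'
#' @param ... Numeric values of length one, strictly positive and not
#'   missing. Pass each value as a separate argument, not as a single vector.
#'
#' @return A numeric of length one.
#'
#' @examples
#' geometric_mean(2, 8)
geometric_mean <- function(...) {
  prod(...)^(1 / ...length())
}
```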

Checking function inputs using base R

`match.arg()`

If the input can only take specific values, the base function match.arg() can prove useful:

match.arg(arg = "R", choices = c("R", "python"))
[1] "R"

match.arg(arg = "javascript", choices = c("R", "python"))
Error in match.arg(arg = "javascript", choices = c("R", "python")): 'arg' should be one of "R", "python"

But the real power of the match.arg() function comes from the fact that choices can be automatically obtained in the context of a function. The default choice is then always the first element:

choose_language <- function(language = c("R", "python")) {
  
  # Equivalent to `match.arg(language, c("R", "python"))`
  language <- match.arg(language)
  
  paste("I love", language)
  
}

choose_language("R")
[1] "I love R"

choose_language()
[1] "I love R"

choose_language("julia")
Error in match.arg(language): 'arg' should be one of "R", "python"

We are getting out of the realm of base R but it is worth mentioning that match.arg() has an equivalent in the tidyverse with a more consistent design and coloured output: rlang::arg_match().
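One design difference worth knowing about: match.arg() accepts partial matches, whereas rlang::arg_match() requires the value to match exactly. The sketch below reuses choose_language() from above and adds a hypothetical choose_language_strict() variant; it assumes the rlang package is installed:

```r
choose_language <- function(language = c("R", "python")) {
  language <- match.arg(language)
  paste("I love", language)
}

# Hypothetical variant relying on rlang instead of base R
choose_language_strict <- function(language = c("R", "python")) {
  language <- rlang::arg_match(language)
  paste("I love", language)
}

# match.arg() completes the abbreviation "py" to "python"
choose_language("py")

# arg_match() also defaults to the first choice...
choose_language_strict()

# ...but rejects the abbreviation with an error
try(choose_language_strict("py"))
```

If abbreviations such as "py" should be rejected, the stricter behaviour is probably the safer default.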

`stopifnot()`

There is another, more general, built-in mechanism to check input values in base R: stopifnot(). You can see it used throughout R source code. As its name suggests, it will stop the function execution if an object does not pass some tests.

say_hello <- function(name) {
  stopifnot(is.character(name))
  paste("Hello", name)
}

say_hello("Bob")
[1] "Hello Bob"
say_hello(404)
Error in say_hello(404): is.character(name) is not TRUE

However, as you can see in this example, the error message is not in plain English but contains some code instructions. This can hinder understanding of the issue.

Because of this, stopifnot() was improved in R 4.0.0:

stopifnot() now allows customizing error messages via argument names, thanks to a patch proposal by Neal Fultz in PR#17688.

This means we can now provide a clearer error message directly in stopifnot() ²:

say_hello <- function(name) {
  stopifnot("`name` must be a character." = is.character(name))
  paste("Hello", name)
}

say_hello(404)
Error in say_hello(404): `name` must be a character.
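The same mechanism can be applied to the geometric_mean() function from the introduction. The following is only a sketch: which inputs count as valid (numeric, strictly positive, no missing values) and the wording of the messages are assumptions, and it requires R 4.0.0 or later. Collecting the inputs in a list and flattening them also means that a single vector such as c(2, 8) is handled correctly:

```r
geometric_mean <- function(...) {
  inputs <- list(...)

  # Conditions are evaluated in order; the first failure stops the function
  stopifnot(
    "At least one value must be provided." = length(inputs) > 0,
    "Inputs must not contain missing values." =
      !anyNA(inputs, recursive = TRUE),
    "All inputs must be numeric." =
      all(vapply(inputs, is.numeric, logical(1))),
    "All inputs must be strictly positive." = all(unlist(inputs) > 0)
  )

  # Flatten so that geometric_mean(c(2, 8)) and geometric_mean(2, 8) agree
  values <- unlist(inputs)
  prod(values)^(1 / length(values))
}

geometric_mean(2, 8)
geometric_mean(c(2, 8))

# Each of the problematic calls now fails with a plain-English message
try(geometric_mean(factor(2), 8))
try(geometric_mean(-1, 5))
try(geometric_mean(2, 8, NA))
```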

This is clearly a great improvement to the functionality of base R. However, this example also shows that the error message could be created programmatically from the contents of the test. Each time we test whether an object is of class_X and the test fails, we could throw an error saying something like “x must be of class_X”. This way, you don’t have to repeat yourself, which is generally a good aim ³. This becomes necessary when you start having many input checks in your function or in your package.
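As a rough sketch of that idea, a small internal helper can build the message from the name of the argument and the expected class. The helper assert_class() below is hypothetical (it exists neither in base R nor in any package mentioned here), and the message wording is only one possibility:

```r
# Hypothetical helper: errors if `x` does not inherit from `class`.
# `arg` defaults to the expression passed as `x` in the calling code.
assert_class <- function(x, class, arg = deparse(substitute(x))) {
  if (!inherits(x, class)) {
    stop(
      sprintf(
        "`%s` must be of class <%s>, not <%s>.",
        arg, class, class(x)[1]
      ),
      call. = FALSE
    )
  }
  invisible(x)
}

say_hello <- function(name) {
  assert_class(name, "character")
  paste("Hello", name)
}

say_hello("Bob")
try(say_hello(404))
```

The message is written once, inside the helper, and every function that calls it reports failures in the same format.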

Checking function inputs using R packages

The example of the checkmate package

Although some developers create their own functions to solve this problem ⁴, you can also rely on existing packages to make your life easier. One package designed to help with input checking is checkmate. checkmate provides a large number of functions that check that inputs respect a given set of properties, and that return clear error messages when that is not the case:

say_hello <- function(name) {
  # Among other things, assert_string() checks that we provide a
  # character object of length one
  checkmate::assert_string(name)
  paste("Hello", name)
}

say_hello(404)
Error in say_hello(404): Assertion on 'name' failed: Must be of type 'string', not 'double'.

say_hello(c("Bob", "Alice"))
Error in say_hello(c("Bob", "Alice")): Assertion on 'name' failed: Must have length 1.
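checkmate exposes most of its checks in several flavours that share the same suffix. As a brief sketch, assuming checkmate is installed, here are the three most common ones for the string check used above:

```r
# test_*() returns TRUE or FALSE
checkmate::test_string("Bob")
checkmate::test_string(404)

# check_*() returns TRUE on success, or the error message as a string
checkmate::check_string(404)

# assert_*() throws an error on failure and returns its input invisibly
# on success
checkmate::assert_string("Bob")
```

The assert_*() flavour is usually what you want at the top of a function, while test_*() is handy inside if conditions.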

Other packages to check function inputs

Because input checking is such an important task and because it is so difficult to get right, it is not surprising that there are many packages other than checkmate to solve this issue. We will not get into the details of all of the available options here, but below is a list of some of them, ordered by decreasing number of reverse dependencies. If you’re interested in understanding the various approaches to input checking, the documentation for these packages is a great place to start. For a more in-depth comparison of the different packages, vetr itself has a nice overview on this topic.

assertthat

assertthat::assert_that(is.character(1))
Error: 1 is not a character vector

vetr

template <- numeric(1L)

vetr::vet(template, 42)
[1] TRUE

vetr::vet(template, 1:3)
[1] "`length(1:3)` should be 1 (is 3)"

vetr::vet(template, "hello")
[1] "`\"hello\"` should be type \"numeric\" (is \"character\")"

assertr

library(magrittr)

mtcars %>%
  assertr::verify(nrow(.) < 10)
verification [nrow(.) < 10] failed! (1 failure)

    verb redux_fn    predicate column index value
1 verify       NA nrow(.) < 10     NA     1    NA
Error: assertr stopped execution

assertive

assertive::assert_is_a_string(1)
Error in eval(expr, envir, enclos): is_a_string : 1 is not of class 'character'; it has class 'numeric'.

ensurer

ensure_square <- ensurer::ensures_that(NCOL(.) == NROW(.))

ensure_square(matrix(1:20, 4, 5))
Error: conditions failed for call 'rmarkdown::render(" .. ecking/index.Rmd", ':
     * NCOL(.) == NROW(.)

vctrs::vec_assert()

vctrs::vec_assert(c(1, 2), "character")
Error in `vctrs::vec_assert()`:
! `c(1, 2)` must be a vector with type <character>.
Instead, it has type <double>.

vctrs::vec_assert(c(1, 2), size = 3)
Error in `stop_vctrs()`:
! `c(1, 2)` must have size 3, not size 2.

check is slightly different because it doesn’t provide utilities that work out of the box, but rather tools to assist you in writing your own checking functions:

library(check)

check::setup() 

set_check_fun(
  "`{var}` must be a {type} vector of length {length}." = {
      val <- get(var, env)
      is.atomic(val) && is(val, type) && length(val) == length
  }
)

say_hello <- function(name) {
  check(
    "`name` must be a character vector of length 1."
    )
  paste("hello", name)
}

say_hello("Maria")
[1] "hello Maria"

say_hello(c("Maria", "Noelia"))
Error: `name` must be a character vector of length 1.

There is no ‘one-size-fits-all’

We have presented different approaches here, but it is up to you, the developer, to decide which approach suits your needs best. We do not believe that one choice is intrinsically better than the others. All the workflows presented here can achieve the same result. Your choice may be influenced by several factors we cannot take into consideration here: who is your target audience? Will they be okay with somewhat technical terminology in the error messages? Do you have reasons to try and limit the number of dependencies ⁵? Which framework are you most comfortable with, and which will facilitate maintenance in the future? And ultimately, what is your personal preference?

If you would like to hear various points of view and a more in-depth discussion about this, please refer to the pull request related to this post.

What about the future?

In this post, we have discussed some methods to check function inputs, and to generate more informative error messages when doing so. However, this always comes with a performance cost, even though it’s often relatively limited. Zero-cost assertions, as found in some other languages, would require some kind of typing system, which R does not currently support. Interestingly, several other languages have acquired typing systems as they have developed. TypeScript developed as an extension of JavaScript, and type annotations are now possible in Python. Will R one day follow suit?

¹ Some package developers have even developed their own standardized way to document argument types and lengths. But there is currently no standard shared across the R community.
² Read the tidyverse style guide for more guidance on how to write good error messages.
³ The Don’t Repeat Yourself (DRY) principle of software development, also mentioned in this post on caching.
⁴ See this earlier blog post for more information about why and how you would go about writing internal functions.
⁵ This is a complex discussion that is often caricatured, but it has already been addressed on some occasions, such as in this blog post from Jim Hester.