Internal functions in R packages

An R package can be viewed as a set of functions, of which only some are exposed to the user. In this blog post we shall concentrate on the functions that are not exposed to the user, so-called internal functions: what are they, how does one handle them in one’s own package, and how can one explore them?

Internal functions 101

What is an internal function?

It’s a function that lives in your package, but that isn’t surfaced to the user. You could also call it an unexported function or a helper function, as opposed to exported functions and user-facing functions.

For instance, in the usethis package there’s a base_and_recommended() function that is not exported.

# doesn't work
library("usethis")
base_and_recommended()

## Error in base_and_recommended(): could not find function "base_and_recommended"

usethis::base_and_recommended()

## Error: 'base_and_recommended' is not an exported object from 'namespace:usethis'

# works
usethis:::base_and_recommended()

##  [1] "base"       "boot"       "class"      "cluster"    "codetools" 
##  [6] "compiler"   "datasets"   "foreign"    "graphics"   "grDevices" 
## [11] "grid"       "KernSmooth" "lattice"    "MASS"       "Matrix"    
## [16] "methods"    "mgcv"       "nlme"       "nnet"       "parallel"  
## [21] "rpart"      "spatial"    "splines"    "stats"      "stats4"    
## [26] "survival"   "tcltk"      "tools"      "utils"

As a user, you shouldn’t use unexported functions of another package in your own code.
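To get a feeling for how many functions of a package are internal, one can compare what a namespace defines with what it exports. The sketch below uses the tools package only because it ships with R (the same calls should work for usethis if it is installed); the variable names are ours.

```r
# The namespace environment holds every object of the package
ns <- asNamespace("tools")

# Names the package exports (what library() and :: give access to)
exported <- getNamespaceExports(ns)

# Everything defined in the namespace, exported or not
all_objects <- ls(ns, all.names = TRUE)

# Candidates for "internal": defined in the namespace but not exported
internal <- setdiff(all_objects, exported)

# Keep only functions (the namespace also stores non-function objects)
is_fun <- vapply(
  internal,
  function(nm) is.function(get(nm, envir = ns)),
  logical(1)
)
internal_functions <- internal[is_fun]

length(exported)
length(internal_functions)
head(sort(internal_functions))
```

This is meant for exploration only; it doesn’t make it a good idea to call these functions from your own code.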

Why not export all functions?

There are at least these two reasons:

In a package you want to provide your user with an API that is useful and stable. You can vouch for a few functions that serve the package’s main goals, are documented well enough, and that you’d only change with great care if need be. If your package users rely on an internal function that you decide to ditch when re-factoring code, they won’t be happy, so only export what you want to maintain.
If all packages exposed all their internal functions, the user’s environment would be flooded and namespace conflicts would get out of control.

Why write internal functions?

Why write internal functions instead of having everything in one block of code inside each exported function?

When writing R code in general there are several reasons to write functions and it is the same within R packages: you can re-use a bit of code in several places (e.g. an epoch converter used for the output of several endpoints from a web API), and you can give it a self-explaining name (e.g. convert_epoch()). Any function defined in your package is usable by other functions of your package (unless it is defined inside a function of your package, in which case only that parent function can use it).
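As a minimal sketch of that idea, here is what a convert_epoch() helper re-used by two functions could look like. Everything in it is hypothetical: the helper’s body, the two calling functions parse_user() and parse_post(), and the shape of the API responses are assumptions made for the example.

```r
# Hypothetical internal helper: seconds since 1970-01-01 (UTC) to POSIXct
convert_epoch <- function(x) {
  as.POSIXct(x, origin = "1970-01-01", tz = "UTC")
}

# Two hypothetical exported functions, each handling the output of a
# different endpoint, both re-using the same helper
parse_user <- function(resp) {
  resp$created_at <- convert_epoch(resp$created_at)
  resp
}

parse_post <- function(resp) {
  resp$updated_at <- convert_epoch(resp$updated_at)
  resp
}

user <- parse_user(list(name = "jane", created_at = 0))
post <- parse_post(list(title = "hello", updated_at = 86400))
format(user$created_at, "%Y-%m-%d")
format(post$updated_at, "%Y-%m-%d")
```

If the API later changes its time format, only convert_epoch() needs to be updated.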

Having internal functions also means you can test these bits of code on their own. That said, if you test internals too much, re-factoring your code will mean breaking tests.
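A unit test for such a helper could look like the sketch below, assuming the testthat package is installed. The convert_epoch() body is a hypothetical implementation defined here only so that the snippet runs on its own; in a real package the test would live under tests/testthat/ and the helper would be available there without any :::.

```r
library("testthat")

# Hypothetical internal helper, defined inline to keep the example standalone
convert_epoch <- function(x) {
  as.POSIXct(x, origin = "1970-01-01", tz = "UTC")
}

test_that("convert_epoch() returns a UTC date-time", {
  out <- convert_epoch(0)
  expect_s3_class(out, "POSIXct")
  expect_equal(format(out, "%Y-%m-%d"), "1970-01-01")
})
```

The test checks the helper’s observable behaviour (class and value) rather than how it is implemented, which limits the number of tests to rewrite when re-factoring.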

To find blocks of code that could be replaced with a function used several times, you could use the dupree package whose planned enhancements include highlighting or printing the similar blocks.

When not to write internal functions?

There is a balance to be found between writing your own helpers for everything and only depending on external code. You can watch this excellent talk on the topic.

Where to put internal functions?

You could save internal functions used in one function only in the R file defining that function, and internal functions used in several other functions in a single utils.R file or in specialized utils-dates.R, utils-encoding.R files. Choose a system that helps you and your collaborators find the internal functions easily; R will never have trouble finding them as long as they’re somewhere in the R/ directory. 😉

Another possible approach to helper functions when used in several packages is to pack them up in a package such as Yihui Xie’s xfun. So then they’re no longer internal functions. 😵

How to document internal functions?

You should at least add a few comments in their code as usual. The best practice recommended in the tidyverse style guide and the rOpenSci dev guide is to document them with roxygen2 tags like other functions, but to use #' @noRd to prevent manual pages from being created.

#' Compare x to 1
#' @param x an integer
#' @noRd
is_one <- function(x) {
  x == 1
}

The tag @keywords internal means a manual page is created but is not listed in the function index. A confusing aspect is that you could also use it for an exported (not internal) function that you don’t want to be too visible, e.g. a function returning the default app for OAuth in a package wrapping a web API.

#' A function rather aimed at developers
#' @description A function that does blabla, blabla.
#' @keywords internal
#' @export
does_thing <- function() {
  message("I am an exported function")
}

Explore internal functions

You might need to have a look at the guts of a package when wanting to contribute to it, or at the guts of several packages to get some inspiration for your code.

Explore internal functions within a package

Say you’ve started working on a new-to-you package (or resumed work on a long forgotten package of yours 😉). How do you find out how it all hangs together? You can use the same methods as for debugging code: exploring code is like debugging it and vice versa!

A first way to understand what a given helper does is to look at its code; from within RStudio there are some useful tools for navigating functions. You can then search for occurrences of its name across R scripts. These first two tasks are static code analysis (well, unless your brain really executes R code by reading it!). A non-static way to explore a function is to use browser() inside it or inside functions calling it.
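Searching for occurrences of a helper’s name can also be sketched in base R. The snippet below is only an illustration, not a replacement for an IDE’s search: the toy package directory, its two scripts and the find_calls() function are all hypothetical and exist only for this example.

```r
# Create a throwaway R/ directory with two hypothetical scripts
r_dir <- file.path(tempdir(), "toypkg", "R")
dir.create(r_dir, recursive = TRUE, showWarnings = FALSE)
writeLines(
  c("check_date <- function(x) {", "  as.Date(x)", "}"),
  file.path(r_dir, "utils.R")
)
writeLines(
  c("downloads <- function(from) {", "  from <- check_date(from)", "  from", "}"),
  file.path(r_dir, "downloads.R")
)

# Hypothetical helper: find the lines where `name(` appears in the R scripts.
# It is a plain text search, so a call to e.g. recheck_date() would match too.
find_calls <- function(name, dir) {
  files <- list.files(dir, pattern = "\\.[Rr]$", full.names = TRUE)
  hits <- lapply(files, function(f) {
    lines <- readLines(f)
    idx <- grep(paste0(name, "("), lines, fixed = TRUE)
    data.frame(
      file = rep(basename(f), length(idx)),
      line = idx,
      code = trimws(lines[idx]),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, hits)
}

calls <- find_calls("check_date", r_dir)
calls
```

Here the definition of check_date() in utils.R is not reported since the name is not directly followed by an opening parenthesis there; only the call in downloads.R is.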

Another useful tool is the in-development pkgapi package. Let’s look at the cranlogs source code.

map <- pkgapi::map_package("/home/maelle/Documents/R-hub/cranlogs")

We can see all defined functions, exported or not.

str(map$defs)

## 'data.frame':	8 obs. of  7 variables:
##  $ name    : chr  "check_date" "cran_downloads" "cran_top_downloads" "cranlogs_badge" ...
##  $ file    : chr  "R/utils.R" "R/cranlogs.R" "R/cranlogs.R" "R/badge.R" ...
##  $ line1   : int  1 61 184 16 137 105 117 126
##  $ col1    : int  1 1 1 1 1 1 1 1
##  $ line2   : int  6 103 208 33 153 115 124 135
##  $ col2    : int  1 1 1 1 1 1 1 1
##  $ exported: logi  FALSE TRUE TRUE TRUE FALSE FALSE ...

We can see all calls inside the package code, to functions from the package and other packages.

str(map$calls)

## 'data.frame':	84 obs. of  9 variables:
##  $ file : chr  "R/badge.R" "R/badge.R" "R/badge.R" "R/badge.R" ...
##  $ from : chr  "cranlogs_badge" "cranlogs_badge" "cranlogs_badge" "cranlogs_badge" ...
##  $ to   : chr  "base::c" "base::match.arg" "base::paste0" "base::paste0" ...
##  $ type : chr  "call" "call" "call" "call" ...
##  $ line1: int  17 21 23 25 30 7 8 62 65 66 ...
##  $ line2: int  17 21 23 25 30 7 8 62 65 66 ...
##  $ col1 : int  38 14 14 16 3 14 14 35 8 17 ...
##  $ col2 : int  38 22 19 21 8 19 19 35 14 25 ...
##  $ str  : chr  "c" "match.arg" "paste0" "paste0" ...

We can filter that data.frame to only keep calls between functions defined in the package.

library("magrittr")
internal_calls <- map$calls[map$calls$to %in% glue::glue("{map$name}::{map$defs$name}"),]

internal_calls %>%
  dplyr::arrange(to)

##           file           from                      to type line1 line2 col1
## 1 R/cranlogs.R cran_downloads    cranlogs::check_date call    69    69    7
## 2 R/cranlogs.R cran_downloads    cranlogs::check_date call    73    73    7
## 3 R/cranlogs.R        to_df_1 cranlogs::fill_in_dates call   123   123    3
## 4 R/cranlogs.R cran_downloads         cranlogs::to_df call   101   101    3
## 5 R/cranlogs.R          to_df       cranlogs::to_df_1 call   109   109    5
## 6 R/cranlogs.R          to_df       cranlogs::to_df_r call   107   107    5
##   col2           str
## 1   16    check_date
## 2   16    check_date
## 3   15 fill_in_dates
## 4    7         to_df
## 5   11       to_df_1
## 6   11       to_df_r

That table can help understand how a package works. One could combine that with a network visualization.

library("visNetwork")
internal_calls <- internal_calls %>%
  dplyr::mutate(to = gsub("cranlogs\\:\\:", "", to))

nodes <- tibble::tibble(id = map$defs$name,
                        title = map$defs$file,
                        label = map$defs$name,
                        shape = dplyr::if_else(map$defs$exported,
                                               "triangle",
                                               "square"))

edges <- internal_calls[, c("from", "to")]


visNetwork(nodes, edges, height = "500px") %>%
  visLayout(randomSeed = 42) %>%
  visNodes(size = 10)

In this interactive visualization one sees three exported functions (triangles), only one of which calls internal functions. Such a network visualization might not be that useful for bigger packages, and in our workflow it is limited to pkgapi’s capabilities (e.g. not memoised functions)… but it’s at least quite pretty.
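When a network plot is too much, the same information can be summarized as text. The sketch below re-creates the from and to columns of the internal calls by hand, copied from the table printed earlier, so that it runs without pkgapi; the object names edges and callers are ours.

```r
# Internal calls of cranlogs, copied from the table printed above
edges <- data.frame(
  from = c("cran_downloads", "cran_downloads", "to_df_1",
           "cran_downloads", "to_df", "to_df"),
  to = c("check_date", "check_date", "fill_in_dates",
         "to_df", "to_df_1", "to_df_r"),
  stringsAsFactors = FALSE
)

# Drop repeated calls between the same pair of functions
edges <- unique(edges)

# For each called function, list the functions calling it
callers <- split(edges$from, edges$to)
callers
```

Such a list quickly shows which helpers are specific to one function and which are shared.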

Explore internal functions across packages

Looking at helpers in other packages can help you write your own, e.g. looking at a package elegantly wrapping a web API could help you wrap another one elegantly too.

Bob Rudis wrote a very interesting blog post about his exploration of R packages’ “utility belts”, i.e. the utils.R files. We also recommend our own blog post about reading the R source.

Conclusion

In this post we explained what internal functions are, and gave a few tips as to how to explore them within a package and across packages. We hope the post can help clear up a few doubts. Feel free to comment about further ideas or questions you may have.