
A NOTE on URL checks of your R package

2020/12/01


Have you ever tried submitting your R package to CRAN and gotten the NOTE "Found the following (possibly) invalid URLs:"? R devel recently got more URL checks.[1] In this post, we explain where and how CRAN checks URL validity, and how best to prepare your package for this check. We start with a short overview of links, including cross-references, in the documentation of R packages.

Adding URLs in your documentation (DESCRIPTION, manual pages, README, vignettes) is a good way to provide more information for users of the package.

We’ve already made the case for storing URLs to your development repository and package documentation website in DESCRIPTION. Now, the Description field of DESCRIPTION, which gives a short summary of what your package does, can also contain URLs between < and >.

Please write references in the description of the DESCRIPTION file in the form
authors (year) <doi:...>
authors (year) <arXiv:...>
authors (year, ISBN:...)
or if those are not available: authors (year) <https:...>
with no space after 'doi:', 'arXiv:', 'https:' and angle brackets for auto-linking.

The auto-linking (e.g. from <doi:10.21105/joss.01857> to <a href="https://doi.org/10.21105/joss.01857">doi:10.21105/joss.01857</a>) happens when the package is built, via regular expressions. So you don’t type a URL yourself; one is constructed for you.
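To make the idea of regex-based auto-linking concrete, here is a minimal sketch. The function name `autolink_doi()` and its pattern are hypothetical and much simpler than whatever R uses internally; they only mimic the transformation shown above.

```r
# Illustrative only: a simplified stand-in for the regular-expression
# rewriting applied to the Description field. Not R's actual code.
autolink_doi <- function(text) {
  gsub(
    # "<doi:" immediately followed by one or more non-space, non-">" characters
    "<doi:([^>[:space:]]+)>",
    "<a href=\"https://doi.org/\\1\">doi:\\1</a>",
    text
  )
}

description <- "See authors (year) <doi:10.21105/joss.01857>."
autolink_doi(description)

# A space after "doi:" prevents the match, so the text is left untouched.
autolink_doi("See authors (year) <doi: 10.21105/joss.01857>.")
```

The second call hints at why the guidance insists on no space after 'doi:': with a pattern like this one, the reference would simply not be recognized.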

For adding links to manual pages, it is best to keep the roxygen2 documentation about linking in mind, or open in a browser tab. Also refer to the Writing R Extensions section about cross-references.[2]

There are links you add yourself (i.e. actual URLs), but also generated links when you want to refer to another topic or function, typeset as code or not. The documentation features useful tables summarizing how to use the syntax [thing] and [text][thing] to produce the links and look you expect.

And see also… the @seealso roxygen2 tag/\seealso section! It is meant especially for storing cross-references and external links, following the link syntax mentioned above.

The links to documentation topics are not URLs, but they will be checked by roxygen2::roxygenize() (devtools::document()) and R CMD check. roxygen2 will warn "Link to unknown topic", and R CMD check will also warn: "Missing link or links in documentation object 'foo.Rd'".
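As an illustration of the link syntax discussed above, here is a small roxygen2 sketch. The function `add_two()` is hypothetical, and the block assumes Markdown support is enabled for roxygen2 in your package (Roxygen: list(markdown = TRUE) in DESCRIPTION).

```r
#' Add two numbers
#'
#' A toy function, shown only to illustrate link syntax. It relies on the
#' [arithmetic operators][base::Arithmetic] (a link with custom text) and is
#' comparable to [base::sum()] (a link typeset as code).
#' More information on the [R project website](https://r-project.org).
#'
#' @param x,y Numeric vectors.
#' @return A numeric vector, the element-wise sum of `x` and `y`.
#' @seealso [base::sum()] for summing all values of a vector;
#'   <https://r-project.org> for an external link.
#' @export
add_two <- function(x, y) {
  x + y
}
```

The two URLs would end up in the manual page and thus be subject to URL checks, whereas the topic links are checked as cross-references.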

When adding links in a vignette, use the syntax dictated by the vignette engine and format you are using. Note that in R Markdown vignettes, even plain URLs (e.g. https://r-project.org) will be “autolinked” by Pandoc (to <a href="https://r-project.org">https://r-project.org</a>), so their validity will be checked. To prevent Pandoc from autolinking plain URLs, use

output: 
  rmarkdown::html_vignette:
    md_extensions: [ 
      "-autolink_bare_uris" 
    ]

as output format.

In the pkgdown website of your package, you will notice links in inline and block code, for which you can thank downlit. These links won’t be checked by R CMD check.

URL checks by CRAN

At this point we have seen that there might be URLs in your package DESCRIPTION, manual pages and vignettes, coming from

  • Actual links ([The R project](https://r-project.org), <https://r-project.org>),
  • Plain URLs in vignettes,
  • Special formatting for DOIs and arXiv links.

For these URLs to be of any use to users, they need to be “valid”. Therefore, CRAN submission checks include a check of URLs. There is a whole official page dedicated to CRAN URL checks, which is quite short. It states “The checks done are equivalent to using curl -I -L” and lists potential sources of headaches (like websites behaving differently when called via curl vs. via a browser).

Note that checks of DOIs differ a bit from checks of URLs: a redirect is expected for a DOI, whereas for a URL, CRAN does not tolerate permanent redirections.
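The distinction between DOIs and plain URLs can be sketched as follows. Both helpers are hypothetical and are not CRAN's implementation; treating 301 and 308 as the “permanent” redirect codes is an assumption based on HTTP semantics. `first_status()` needs the curl package and network access, so it is only defined here, not called.

```r
# Hypothetical helper: interpret the status code of the *first* response
# (before following any redirect) along the lines described in the text.
classify_first_response <- function(status, is_doi = FALSE) {
  # Assumption: 301 and 308 are the permanent redirect codes
  permanent_redirect <- status %in% c(301L, 308L)
  if (permanent_redirect && !is_doi) {
    "permanent redirect: update the URL"
  } else if (status >= 400L) {
    "error: fix or remove the URL"
  } else {
    "ok"
  }
}

# Hypothetical helper: send a HEAD request (like curl -I) without following
# redirects, and return the HTTP status code of the first response.
first_status <- function(url) {
  handle <- curl::new_handle(nobody = TRUE, followlocation = FALSE)
  curl::curl_fetch_memory(url, handle = handle)$status_code
}

classify_first_response(301L)                # plain URL, moved permanently
classify_first_response(301L, is_doi = TRUE) # redirects are expected for DOIs
```

In practice you would rather rely on the tools described in the next section than on a hand-rolled check; the sketch only spells out the logic.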

Even before an actual submission, you can obtain CRAN checks of the URLs in your package by using WinBuilder.

URL checks locally or on R-hub

How can you reproduce CRAN URL checks locally? For this you’d need to use the development version of R, so using the urlchecker package or R-hub instead might be easier. 😸

You can use devtools::check() with a recent R version (and with libcurl enabled) and with the correct values for the manual, incoming and remote arguments.

devtools::check(
  manual = TRUE,
  remote = TRUE,
  incoming = TRUE
  )

Or, for something faster and not requiring R-devel, you can use the urlchecker package. It is especially handy because it can also help you fix URLs that are redirected, by replacing them with the URL they are redirected to.
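A minimal sketch of such a workflow, assuming the urlchecker package is installed and that `path` points to the root of your package. The wrapper `check_and_fix_urls()` is a hypothetical name; only `url_check()` and `url_update()` come from urlchecker, and both need network access.

```r
# Hypothetical wrapper around the two main urlchecker functions.
check_and_fix_urls <- function(path = ".") {
  # Check all URLs found in the package and show the problematic ones
  results <- urlchecker::url_check(path)
  print(results)
  # Rewrite redirected URLs in the package sources to their destinations
  urlchecker::url_update(path, results = results)
  invisible(results)
}

# Usage, from the package root:
# check_and_fix_urls()
```

Since the update step edits files in place, it is probably wise to run it in a version-controlled directory and review the diff afterwards.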

On R-hub package builder, the equivalent of

devtools::check(
  manual = TRUE,
  remote = TRUE,
  incoming = TRUE
  )

is

rhub::check(
  env_vars = c(
    "_R_CHECK_CRAN_INCOMING_REMOTE_" = "true", 
    "_R_CHECK_CRAN_INCOMING_" = "true"
    )
)

You’ll need to choose a platform that uses R-devel; if you are unsure which one to pick, Windows is the fastest.

rhub::check(
  env_vars = c(
    "_R_CHECK_CRAN_INCOMING_REMOTE_" = "true", 
    "_R_CHECK_CRAN_INCOMING_" = "true"
    ),
  platform = "windows-x86_64-devel"
)

URL fixes or escaping?

What if you can’t fix a URL? What if there’s a false positive?

  • You could try and have the provider of the resource fix the URL (ok, not often a solution);
  • You could add a comment in cran-comments.md (but this will slow a release);
  • You could escape the URL by writing it as plain text; in vignettes you will furthermore need to switch the output format to
output: 
  rmarkdown::html_vignette:
    md_extensions: [ 
      "-autolink_bare_uris" 
    ]

if you were using rmarkdown::html_vignette().

Conclusion

In this post we have summarized why, where and how URLs are stored in the documentation of R packages; how CRAN checks them and how you can reproduce such checks to fix URLs in time. We have also provided resources for dealing with another type of links in package docs: cross-references.

To avoid having your submission unexpectedly slowed down by an invalid URL, it is crucial to run CRAN URL checks on your package before submission, either locally with the urlchecker package (or R CMD check with R-devel), or via an R-hub R-devel platform, or WinBuilder.


  1. And parallel, faster URL checks. ↩︎

  2. Furthermore, the guidance (and therefore the roxygen2 implementation) sometimes changes, so it’s good to know this could happen to you; hopefully this won’t scare you away from adding cross-references! https://www.mail-archive.com/r-package-devel@r-project.org/msg05504.html ↩︎