Which version of R to use in production?


Should I use the most recent version of R? An older one? And when should I update it? Do I even need to?

This article will answer all your questions about which version of R to use.

A new version of R, 4.3.0, has just been released. You should not install it in production!

This new version is buggy.

Oops.

Understanding R versions

As you probably know, to download R you have to go to the CRAN website. You will arrive on the following page: https://cran.r-project.org/bin/windows/base/. And there you just have to click on Download R-4.3.0 for Windows to download the executable and install it on your computer.

What does the following number mean: 4.3.0?

Most programs use a versioning system. They have a version number that will evolve (usually incrementally) over time.

Not all software uses the same scheme to define its version number.

In the case of R, it is relatively simple:

The first number corresponds to the major version. It is incremented when there are particularly important changes in the language. The last time this happened was in 2020, with version 4.0.0. And the time before that was in 2013, with version 3.0.0. That doesn’t happen very often.

The second number corresponds to the minor version. It is incremented when new features or evolutions are added to the language. For R, and this point matters for everything that follows, the minor version is incremented exactly once a year, in spring.

For example:

  • Version 4.3.0 in April 2023
  • Version 4.2.0 in April 2022
  • Version 4.1.0 in May 2021
  • Version 4.0.0 in April 2020
  • Version 3.6.0 in April 2019
  • Etc.

You can find the history of R releases on this page: Previous Releases of R for Windows.

Finally, the third number corresponds to the patch version. These versions are released throughout the year to fix bugs. Usually, a last patch is released just before the next minor release, so around the end of winter.

There is no fixed number of patches per year; it depends mostly on the number of bugs and their severity. Here are the dates of the last patch of each recent series:

  • Version 4.2.3 in March 2023 (1 month before version 4.3.0)
  • Version 4.1.3 in March 2022 (1 month before version 4.2.0)
  • Version 4.0.5 in March 2021 (2 months before version 4.1.0)
  • Version 3.6.3 in February 2020 (2 months before version 4.0.0)
  • Version 3.5.3 in March 2019 (1 month before version 3.6.0)
  • Etc.
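To make the three components concrete, here is a minimal sketch in base R. The helper name `split_version()` is made up for this illustration; `package_version()` and `getRversion()` are base R functions.

```r
# Split a version string such as "4.3.0" into its three components.
# split_version() is a hypothetical helper, not part of R itself.
split_version <- function(x) {
  parts <- unclass(package_version(x))[[1]]  # integer vector, e.g. 4 3 0
  stats::setNames(parts[1:3], c("major", "minor", "patch"))
}

split_version("4.3.0")

# The same helper applied to the version of R you are running right now:
split_version(as.character(getRversion()))
```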

Which version of R to use in production?

Now that you know how the R release cycle works, you can answer the question: which version of R should you use in production?

Should you use the most recent version of R? No.

The most recent version is version 4.3.0, released just recently at the time of writing this article. Before spring 2024, we will have a version 4.3.1, then 4.3.2, etc. These additional versions will not bring new features, but mainly fix multiple bugs present in version 4.3.0.

Work on version 4.3.1 will start very soon, with the creation of a 4.3.0-patched version that will evolve over time (until the release of 4.3.1). To see the list of bugs fixed in the upcoming patch, go to this page: R-Patched, then click on the link “New features in this version”. At the time of writing this article, 4.3.0-patched contains nothing more than 4.3.0, since the latter was released only 2 days ago.

By the way, there’s also another temporary version called R-Devel, which contains the changes for the next minor version, i.e. 4.4.0. If you want to know more, you can find the list of changes at this link.

So, which version do we use?

We want a version that is recent, stable, and as bug-free as possible.

So the best choice is: The last patch of the penultimate minor version.

As we are currently in 4.3.0, this would be version 4.2.3.

That’s the recommended one for a production system at the moment.

And as soon as the R team releases the next minor version – 4.4.0 – then, the recommended version for a production system will be 4.3.3 (or 4.3.4, or 4.3.5, I don’t know yet: It will be the last patch in the 4.3.x series).
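The rule can be written down in a few lines of base R. This is only a sketch: `recommended_version()` is a name invented here, and the list of released versions has to be supplied by hand (it is not fetched from CRAN).

```r
# Given the versions released so far, return the last patch of the
# series that precedes the newest one.
# recommended_version() is a hypothetical helper written for this article.
recommended_version <- function(released) {
  v <- package_version(released)
  # Series of each release: "4.2" for "4.2.3", "4.3" for "4.3.0", ...
  series <- package_version(vapply(
    unclass(v),
    function(p) paste(p[1:2], collapse = "."),
    character(1)
  ))
  older <- v[series < max(series)]  # everything before the newest series
  if (length(older) == 0L) stop("No earlier series in the list.")
  as.character(max(older))          # its highest version is the last patch
}

released <- c("4.1.3", "4.2.0", "4.2.1", "4.2.2", "4.2.3", "4.3.0")
recommended_version(released)  # "4.2.3"
```

Once 4.3.1 is added to the list, the answer stays 4.2.3; it only changes when a 4.4.x version appears.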

When will the next minor version be released? In spring 2024. There’s no exact date. As you could see with the examples above, it sometimes drops in April, sometimes in May… they roll it out when it’s ready.

And when will the last patch be released? Same story: you don’t know until the next minor version is out. If a patch is released in February, there is a good chance that it will be the last one, but 2021 proved us wrong: 4.0.4 was released in February 2021, and a few weeks later 4.0.5 was released, with some final fixes.

When do I update my version of R?

If you’ve been paying attention, you now know that each year has its own recommended version of R:

  • R 4.1.3 in 2022
  • R 4.0.5 in 2021
  • R 3.6.3 in 2020
  • And so on…

And when I say “in 2022”, I mean “from spring 2022 to spring 2023”.

But by now you are probably thinking: Do you have to update R?

In 2020, I worked with a team on an old version of R. While we should have been on 3.6.3 (according to the recommendations above), we were on 3.2.1: a version released 5 years earlier (in June 2015) that is not even the last patch of its series. In other words, a buggy version.

When I mentioned this problem, I was told that it would be too expensive and too dangerous to upgrade.

Even worse, in this project we didn’t use a package version management tool either (like renv), and we loaded 40 packages at initialization. Were all 40 needed and used? No. After refactoring, I only loaded 6.

So, what’s the problem in this situation?

Problem 1: Installing the packages will be increasingly difficult

First of all, you should definitely use a version management tool like renv, otherwise you are condemning all the new developers who arrive on the project to many sleepless nights. The same goes for the veterans who deploy on a new environment. And the same goes for the IT people, who are going to pull their hair out.

Some will have the 2015 version of dplyr, others the 2023 version. How do you know which one to use? Good luck.
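For reference, here is a minimal sketch of the usual renv workflow, assuming the renv package is installed. The two wrapper functions are made-up names; `renv::init()`, `renv::snapshot()` and `renv::restore()` are the real renv entry points.

```r
# Hypothetical wrappers around the three renv calls most projects need.
# Nothing runs until you call them from the project directory.

# On the machine where the project currently works:
record_environment <- function(project = ".") {
  renv::init(project = project)                      # project library + renv.lock
  renv::snapshot(project = project, prompt = FALSE)  # refresh renv.lock after changes
}

# On a new developer's machine or a new server:
restore_environment <- function(project = ".") {
  renv::restore(project = project, prompt = FALSE)   # install what renv.lock lists
}
```

The renv.lock file is committed with the code, so everyone installs the same package versions instead of whatever CRAN serves that day.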

But even if we use renv, it’s still not smooth sailing.

Since we are using 5-year-old packages, they are not necessarily compatible with today’s system dependencies. Packages also depend on external tools, such as C++ libraries, and nothing guarantees that the version installed on the machine is the right one. So you also have to document which versions of the system libraries to use.

Does this madness make sense to you?

Attempt to install car on R 3.5.1

Let’s take an example. At the moment I am using version 3.5.1 with a client. Let’s try to install the car package. It’s a rather classic package that offers a lot of useful functions around linear regression. It has been available on CRAN since 2001. So, it should be compatible with R 3.5.1, right? It is, since its latest version states that it needs at least R 3.5.0. So, we’re good. Let’s try to install it.

 
install.packages("car") 

The installation fails very quickly with the message: Error: package 'pbkrtest' is not available

Well. Indeed, the pbkrtest package requires at least R 4.1.0 to be installed. So instead I’ll install an older version.

For that, I will use devtools::install_version(), so I have to install the devtools package. Let’s hope this works, otherwise I’ll have to install a dependency package by hand.

Which version of pbkrtest do you need to install? Either you download the archives one by one to find the most recent version that supports R 3.5.1, or you pick one at random. I quickly find that the second-to-last version is suitable, so I install this one.
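The guessing can be reduced a little: every package declares its R requirement in the Depends field of its DESCRIPTION file (also visible in the `Depends` column of `available.packages()`). Here is a sketch of a check against that field. The helper `r_requirement_met()` is a made-up name, and the Depends strings in the example are illustrative, not copied from a real DESCRIPTION.

```r
# Does a Depends field allow a given version of R?
# r_requirement_met() is a hypothetical helper written for this article.
r_requirement_met <- function(depends, r_version) {
  pattern <- "(?:^|,)\\s*R\\s*\\(\\s*(>=|>)\\s*([0-9.-]+)\\s*\\)"
  m <- regmatches(depends, regexec(pattern, depends, perl = TRUE))[[1]]
  if (length(m) == 0L) return(TRUE)  # no constraint on R declared
  required <- numeric_version(m[3])
  current <- numeric_version(r_version)
  if (m[2] == ">=") current >= required else current > required
}

# Illustrative Depends strings (assumed, for the example only):
r_requirement_met("R (>= 4.1.0), lme4 (>= 1.1-31)", "3.5.1")  # FALSE
r_requirement_met("R (>= 3.5.0), lme4 (>= 1.1-10)", "3.5.1")  # TRUE
```

It only looks at the package itself; as the rest of this section shows, the dependencies of the dependencies can still bite.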

 
devtools::install_version("pbkrtest", version = "0.5.1") 

Ouch: ERROR: dependency ‘lme4’ is not available for package ‘pbkrtest’. We are also told: package ‘RcppEigen’ is not available (for R version 3.5.1).

Welcome to dependency hell. As a matter of fact, RcppEigen wants at least R 3.6.0. So, back to square one. In the archives, I find an old version compatible with R 3.5.1 and I try to install it:

 
devtools::install_version("RcppEigen", version = "0.3.3.9.2") 

Ouch: ERROR: compilation failed for package ‘RcppEigen’ with lots of obscure C++ error messages that are hard to decipher. So I’ll try the previous version, but it’s a bit of a guess.

 
devtools::install_version("RcppEigen", version = "0.3.3.9.1") 

My patience is rewarded. Now, we can try installing pbkrtest again:

 
devtools::install_version("pbkrtest", version = "0.5.1") 

This time, I can successfully install lme4, and pbkrtest as well. I can then install car:

 
install.packages("car") 

Oh no… Error: package 'MatrixModels' is not available. Back to square one again… I find a version compatible with R 3.5.1 and try to install it (are you having a good time yet?).

 
devtools::install_version("MatrixModels", version = "0.5-0") 

Alright! Let’s try car again:

 
install.packages("car") 

YES! It’s working!

About time. And that was just to install car.
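Once the trial and error is over, the combination that worked is worth writing down so nobody has to rediscover it. A sketch, using the versions found above; `pinned` and `install_pinned()` are names invented for this example, and `devtools::install_version()` is the same function used in the steps above.

```r
# Versions that ended up working on R 3.5.1, in installation order.
pinned <- c(
  RcppEigen    = "0.3.3.9.1",
  pbkrtest     = "0.5.1",
  MatrixModels = "0.5-0"
)

# Hypothetical helper: install each pinned version in order.
# The installer is a parameter so the loop can be tried without devtools.
install_pinned <- function(pkgs, installer = devtools::install_version) {
  for (name in names(pkgs)) {
    installer(name, version = pkgs[[name]])
  }
  invisible(names(pkgs))
}

# On the R 3.5.1 machine you would then run:
# install_pinned(pinned)
# install.packages("car")
```

This is a poor man’s lockfile; renv does the same job more thoroughly.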

What’s more, there are even more problems coming…

Problem 2: Some functionalities are unavailable

You want to use a recent version of dplyr?

Too bad, you need to have at least R 3.4.0 (released in April 2017).

Version of dplyr   Released on          Requires R >=   That R version was released in
0.1.1              January 29, 2014     3.0.2           September 2013
0.4.2              June 16, 2015        3.1.2           October 2014
0.8.1              May 14, 2019         3.2.0           April 2015
1.0.3              January 15, 2021     3.3.0           May 2016
1.0.8              February 8, 2022     3.4.0           April 2017
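The same table can be queried from R. In this sketch the data frame contains only the five dplyr versions listed above, so the answer is the newest of those five, not the newest version that exists on CRAN. `newest_dplyr_for()` is a made-up helper name.

```r
# The table above, as data. Only these five versions are known to the lookup.
dplyr_requirements <- data.frame(
  dplyr = c("0.1.1", "0.4.2", "0.8.1", "1.0.3", "1.0.8"),
  min_r = c("3.0.2", "3.1.2", "3.2.0", "3.3.0", "3.4.0"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: newest listed dplyr version usable with a given R.
newest_dplyr_for <- function(r_version, table = dplyr_requirements) {
  ok <- numeric_version(table$min_r) <= numeric_version(r_version)
  if (!any(ok)) return(NA_character_)
  as.character(max(numeric_version(table$dplyr[ok])))
}

newest_dplyr_for("3.5.1")  # "1.0.8"
newest_dplyr_for("3.2.1")  # "0.8.1"
```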

So yes, you still have some wiggle room if you are using R 4.1.3 today. But if the last few years are anything to go by, you will not be able to keep up with the evolution of dplyr in a few years.

And remember we are talking about dplyr, a very popular package, in active development, which is now in stable version (since 1.0.0).

For the packages that belong to the tidyverse, you can find the supported versions of R in this article: Which versions of R do tidyverse packages support?. According to their table, with my version of R 3.5.1 at my customer’s, I don’t have access to the latest version of the tidyverse packages.

Now say that you want to use the Seurat package. Too bad, you need at least R 4.0.0.

Or gtExtras, perhaps a more likely example, which requires at least R 3.6.0.

Anyway, you get the idea.

If you don’t upgrade your version of R, you probably won’t have a problem next year, maybe not the year after either, but the longer you wait… the more risks you take.

And that’s not all, folks.

Problem 3: The Internet moves forward without you

Let’s go back to the dplyr example.

With my other client’s project that was stuck in R 3.2.1, I could have used dplyr version 1.0.2 at best.

This means that when, in 2023, I search DuckDuckGo for help with some data manipulation, I’ll come across Stack Overflow answers with code that isn’t compatible with my version.

I will be offered:

 
starwars %>%
    summarise(
        mean_height = mean(height), 
        .by = c(species, homeworld)
    )

Which won’t work with my version. My version still uses the older syntax:

 
starwars %>%
    group_by(species, homeworld) %>%
    summarise(mean_height = mean(height))
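If the same code has to run on machines with different dplyr versions, one option is to branch on the installed version. This is only a sketch: `mean_height_by_group()` is a made-up name, and it assumes that per-operation grouping with `.by` is available from dplyr 1.1.0 onwards. It also adds `na.rm = TRUE`, which the snippets above do not use.

```r
# Hypothetical helper: same result with old and new dplyr.
# dplyr is only needed when the function is called, not when it is defined.
mean_height_by_group <- function(data) {
  if (utils::packageVersion("dplyr") >= "1.1.0") {
    # Newer syntax: grouping declared inside summarise()
    dplyr::summarise(
      data,
      mean_height = mean(height, na.rm = TRUE),
      .by = c(species, homeworld)
    )
  } else {
    # Older syntax: group first, summarise, then drop the remaining grouping
    grouped <- dplyr::group_by(data, species, homeworld)
    dplyr::ungroup(
      dplyr::summarise(grouped, mean_height = mean(height, na.rm = TRUE))
    )
  }
}

# Usage, with dplyr installed:
# mean_height_by_group(dplyr::starwars)
```

It works, but every such branch is maintenance you pay for staying on an old version.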

The more time goes by, the less the code I find on the internet will be compatible with the versions of the packages I use. And while we can control the versions of all the software on a system, we can’t control the versions we see on the internet.

Problem 4: You’ll have to upgrade one day

Let’s say you don’t upgrade.

Time flies.

Constraints pile up. At first, they are light, only a few packages are affected.

But after a while… you have no other choice.

You realize that working with these constraints becomes more expensive on a daily basis than launching the upgrade project.

And that project turns into an ordeal. You don’t have a process in place, since you’ve never done it before. You suddenly update the version of R and all the packages you use, and errors start popping up everywhere. Half of your tests don’t pass anymore.

Pandemonium.

My recommendation: Plan an upgrade once a year

We’ve discussed this before: There’s one recommended R release per year (the last patch of the penultimate minor release).

In the spring of each year, shortly after the release of the new minor version, schedule a period during which you will upgrade:

  • The version of R
  • All your used packages

Some parts of the code will also probably need some work.

For example, when you moved to R 4.0.0, the stringsAsFactors option changed its default value. This change required special attention.

With the switch to R 4.2.0, an if statement now raises an error when its condition is a vector of length greater than 1, whereas before it only produced a warning.
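Both changes can be neutralized ahead of time by being explicit, so the code behaves the same before and after the upgrade. A small sketch with made-up data:

```r
# 1. stringsAsFactors: state it explicitly instead of relying on the default,
#    which differs between R 3.x and R 4.x.
people <- data.frame(
  name = c("Ann", "Bob"),
  stringsAsFactors = FALSE
)
class(people$name)  # "character" on R 3.x and on R 4.x

# 2. if(): reduce a vector condition to a single TRUE/FALSE yourself.
x <- c(1, 5, 10)
has_large_value <- any(x > 3)  # use all() if every element must match
if (has_large_value) {
  message("At least one value is above 3.")
}
```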

You will quickly notice that some packages, often the same ones, cause problems every year. This was the case with dplyr before the switch to 1.0.0, and with some spatial data packages such as rgdal, rgeos and maptools (which are scheduled to be retired at the end of 2023 anyway). But it’s better to do it once a year and solve small problems often than to do it once after 5 years.
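One way to keep the yearly upgrade visible is to have the project state which version of R it expects and complain when it runs on something else. A minimal sketch; `check_r_version()` is a made-up helper, and where you call it (a startup script, a test) is up to you.

```r
# Hypothetical helper: warn when the running R is not the expected one.
check_r_version <- function(expected, actual = getRversion()) {
  expected <- numeric_version(expected)
  if (actual == expected) {
    return(TRUE)
  }
  warning(
    sprintf(
      "This project expects R %s but is running on R %s.",
      as.character(expected), as.character(actual)
    ),
    call. = FALSE
  )
  FALSE
}

# Example: at the top of a startup script
# check_r_version("4.2.3")
```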

R 4.3.0 just came out. It is now time to install R 4.2.3

You will have:

  • The most advanced stable version of R
  • The most advanced versions of R packages
  • Access to the latest evolutions
  • A documented process for your annual R upgrade
  • Peace of mind
