10 Functions
If we only had one data set to analyze, it would probably be faster to load the file into a spreadsheet and use that to plot simple statistics. However, the gapminder data is updated periodically, and we may want to pull in that new information later and re-run our analysis. We may also obtain similar data from a different source in the future.
In this lesson, we’ll learn how to write a function so that we can repeat several operations with a single command.
10.1 Defining a function
If you haven’t already, create a functions/ directory in the same folder as your working Quarto file.
Open a new R script file, name it functions-lesson.R, and save it in the functions/ directory.
The general structure of a function is:
my_function <- function(parameters) {
  # perform action
  # return value
}
Let’s define a function fahr_to_kelvin() that converts temperatures from Fahrenheit to Kelvin:
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}
The list of arguments that the function takes is contained within the parentheses in function(args).
Next, the body of the function (the statements that are executed when it runs) is contained within curly braces ({}). The statements in the body are indented by two spaces. This makes the code easier to read but does not affect how the code operates.
It is useful to think of creating functions like writing a recipe in a cookbook. First, you define the “ingredients” that your function needs. In this case, we need only one ingredient: temp. After listing the ingredients, we say what we will do with them; here, we take our ingredient and apply a set of mathematical operators to it.
When we call the function, the values we pass to it as arguments are assigned to those variables so that we can use them inside the function. Inside the function, we use a return() statement to output a result when the function is used.
Let’s try running our function. Calling our own function is no different from calling any other function:
# freezing point of water
fahr_to_kelvin(32)
[1] 273.15
# boiling point of water
fahr_to_kelvin(212)
[1] 373.15
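Because R’s arithmetic operators work element by element on vectors, the same function can convert several temperatures in one call. A minimal sketch, repeating the definition from above so that the snippet runs on its own:

```r
# Same definition as in the lesson, repeated so this snippet is self-contained
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}

# Pass a numeric vector: each element is converted independently,
# and the result has the same length as the input
fahr_to_kelvin(c(32, 212))
```

The result holds the freezing and boiling points of water in Kelvin, in the same order as the inputs.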
10.2 Combining functions
The real power of functions comes from mixing, matching, and combining them into ever-larger chunks to get the effect we want.
Let’s define two functions that will convert temperature from Fahrenheit to Kelvin, and Kelvin to Celsius:
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}
kelvin_to_celsius <- function(temp) {
  celsius <- temp - 273.15
  return(celsius)
}
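One way to combine the two functions is to feed the output of the first into the second. The sketch below assumes a new helper name, fahr_to_celsius(), which is introduced here only for illustration; both building blocks are repeated so that the snippet is self-contained.

```r
# Building blocks, as defined in the lesson
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}

kelvin_to_celsius <- function(temp) {
  celsius <- temp - 273.15
  return(celsius)
}

# Hypothetical helper: convert Fahrenheit to Celsius by chaining
# the two existing conversions instead of writing a new formula
fahr_to_celsius <- function(temp) {
  temp_k <- fahr_to_kelvin(temp)
  result <- kelvin_to_celsius(temp_k)
  return(result)
}

# freezing point of water
fahr_to_celsius(32)
```

Reusing the existing functions means that a fix to either conversion is picked up automatically by the combined function.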
10.3 Interlude: Defensive Programming
Now that we’ve begun to appreciate how writing functions provides an efficient way to make R code reusable and modular, we should note that it is important to ensure that functions are only used for their intended purpose and fail clearly otherwise. Checking function parameters is related to the concept of defensive programming. Defensive programming encourages us to frequently check conditions and throw an error if something is wrong. These checks are referred to as assertion statements because we want to assert that some condition is TRUE before proceeding. They make debugging easier because they give us a better idea of where errors originate.
10.3.1 Checking conditions with stop()
Let’s start by re-examining fahr_to_kelvin(), our function for converting temperatures from Fahrenheit to Kelvin. It was defined like so:
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}
For this function to work as intended, the argument temp must be a numeric value; otherwise, the mathematical procedure for converting between the two temperature scales will not work.
To create an error, we can use the function stop(). For example, since the argument temp must be a numeric vector, we could check for this condition with an if statement and throw an error if the condition is violated. We could augment our function above like so:
fahr_to_kelvin <- function(temp) {
  if (!is.numeric(temp)) {
    stop("temp must be a numeric vector.")
  }
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}
fahr_to_kelvin("one")
Error in fahr_to_kelvin("one"): temp must be a numeric vector.
If we had multiple conditions or arguments to check, it would take many lines of code to check all of them. Luckily, R provides the convenience function stopifnot(). We can list as many requirements as we like, each of which should evaluate to TRUE; stopifnot() throws an error if it finds one that is FALSE. Listing these conditions also serves a secondary purpose as extra documentation for the function.
Let’s try out defensive programming with stopifnot() by adding assertions to check the input to our function fahr_to_kelvin().
We want to assert the following: temp is a numeric vector. We may do that like so:
fahr_to_kelvin <- function(temp) {
  stopifnot(is.numeric(temp))
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}
It still works when given proper input.
# freezing point of water
fahr_to_kelvin(temp = 32)
[1] 273.15
But fails instantly if given improper input.
# temp is a character string instead of numeric
fahr_to_kelvin(temp = "a")
Error in fahr_to_kelvin(temp = "a"): is.numeric(temp) is not TRUE
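When several conditions need checking, they can all go into a single stopifnot() call. The sketch below is an illustration rather than part of the lesson’s function: the name fahr_to_kelvin_checked() and the absolute-zero check are assumptions added here, and naming the conditions to get custom error messages relies on R 4.0.0 or later.

```r
# Hypothetical variant with two assertions. The string on the left of each
# condition is used as the error message when that condition is not TRUE
# (named conditions require R >= 4.0.0).
fahr_to_kelvin_checked <- function(temp) {
  stopifnot(
    "temp must be a numeric vector" = is.numeric(temp),
    "temp must not be below absolute zero (-459.67 F)" =
      all(temp >= -459.67, na.rm = TRUE)
  )
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}

# Valid input passes both checks
fahr_to_kelvin_checked(32)

# Each of these calls would stop with the corresponding message:
# fahr_to_kelvin_checked("a")
# fahr_to_kelvin_checked(-500)
```

The conditions are evaluated in order, so the range check is only reached once temp is known to be numeric.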
10.4 Default arguments
Our functions above had only a single argument. But some functions have many arguments, some of which are required and others of which are optional.
The following function has three arguments:
# define a function add() with three arguments, which computes a + b + 2 * c
add <- function(a, b, c) {
  return(a + b + (2 * c))
}
# run add() with all three named arguments a = 1, b = 3, c = 5
add(a = 1, b = 3, c = 5)
[1] 14
If all arguments are provided, you don’t need to provide a name for the arguments:
# run add() without naming the arguments
add(1, 3, 5)
[1] 14
If you don’t provide all three arguments, you will get an error:
# try to run add() with just two arguments
add(1, 3)
Error in add(1, 3): argument "c" is missing, with no default
If you want to allow the user to leave some arguments out, you need to provide a default value for the arguments.
Default values can be defined by setting the argument equal to some value:
# redefine add() with defaults of 0 for all arguments
add <- function(a = 0, b = 0, c = 0) {
  return(a + b + (2 * c))
}
# try add() with just two arguments
add(1, 3)
[1] 4
If you don’t name the arguments, the values you provide are assigned to the arguments in the order in which they are defined.
If you want to specify which argument you are providing, you must name it:
# try add() with arguments for a and c
add(a = 1, c = 3)
[1] 7
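Named and positional arguments can also be mixed. A short sketch using the same add() definition, repeated here so that the snippet is self-contained:

```r
# add() with defaults of 0 for all arguments, as defined in the lesson
add <- function(a = 0, b = 0, c = 0) {
  return(a + b + (2 * c))
}

# Named arguments may be given in any order
add(c = 3, a = 1)

# Unnamed values fill the remaining arguments by position:
# here 1 is assigned to a, and b keeps its default
add(1, c = 3)

# With no arguments at all, every default is used
add()
```

The first two calls are equivalent to add(a = 1, c = 3) above, and the last call returns 0 because every term is zero.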
10.5 Shorthand functions
There are a few ways to write simple functions on a single line.
For example, the following two functions are equivalent:
add_v1 <- function(a, b, c) {
  linear_combination <- a + b + (2 * c)
  return(linear_combination)
}
add_v1(1, 5, 4)
[1] 14
and
add_v2 <- function(a, b, c) return(a + b + (2 * c))
add_v2(1, 5, 4)
[1] 14
Note that the return() above is technically not required, since R returns the value of the last expression evaluated in the body of the function, so the following will also work:
add_v3 <- function(a, b, c) a + b + (2 * c)
add_v3(1, 5, 4)
[1] 14
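R 4.1.0 and later also accept a backslash as shorthand for the function keyword. The name add_v4 below is hypothetical and only continues the numbering of the examples above:

```r
# \(a, b, c) is shorthand for function(a, b, c); requires R >= 4.1.0
add_v4 <- \(a, b, c) a + b + (2 * c)

add_v4(1, 5, 4)
```

This form is most useful for short, throwaway functions passed to other functions; for anything longer, the full function keyword with braces is usually easier to read.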
10.6 An advanced example
The following function takes the gapminder data frame and computes the GDP (in billions), optionally filtering to a given year and country.
# Takes a dataset and multiplies the population column
# with the GDP per capita column.
# Uses filter() and transmute() from dplyr, which must already be loaded.
calcGDP <- function(dat, .year = NULL, .country = NULL) {
  if (!is.null(.year)) {
    dat <- dat |> filter(year %in% .year)
  }
  if (!is.null(.country)) {
    dat <- dat |> filter(country %in% .country)
  }
  dat <- dat |>
    transmute(country, year, gdp = pop * gdpPercap / 1e9)
  return(dat)
}
If you’ve been writing these functions down in a separate R script (a good idea!), you can load them into your R session by using the source() function:
source("functions/functions-lesson.R")
If we don’t specify a .year or .country argument, our function returns all rows of the gapminder data.
calcGDP(gapminder) |>
head()
country year gdp
1 Afghanistan 1952 6.567086
2 Afghanistan 1957 7.585449
3 Afghanistan 1962 8.758856
4 Afghanistan 1967 9.648014
5 Afghanistan 1972 9.678553
6 Afghanistan 1977 11.697659
Let’s take a look at what happens when we specify the year:
head(calcGDP(gapminder, .year = 2007))
country year gdp
1 Afghanistan 2007 31.07929
2 Albania 2007 21.37641
3 Algeria 2007 207.44485
4 Angola 2007 59.58390
5 Argentina 2007 515.03363
6 Australia 2007 703.65836
Or for a specific country:
calcGDP(gapminder, .country = "Australia")
country year gdp
1 Australia 1952 87.25625
2 Australia 1957 106.34923
3 Australia 1962 131.88457
4 Australia 1967 172.45799
5 Australia 1972 221.22377
6 Australia 1977 258.03733
7 Australia 1982 295.74280
8 Australia 1987 355.85312
9 Australia 1992 409.51123
10 Australia 1997 501.22325
11 Australia 2002 599.84716
12 Australia 2007 703.65836
Or both:
calcGDP(gapminder, .year = 2007, .country = "Australia")
country year gdp
1 Australia 2007 703.6584
Let’s walk through the body of the function:
calcGDP <- function(dat, .year = NULL, .country = NULL) {
Here we’ve added two arguments, .year and .country. We’ve set the default value of both to NULL using the = operator in the function definition, so the arguments take on that value unless the user specifies otherwise. We prefix the argument names .year and .country with a period to help visually differentiate them from the column names in dat/gapminder.
  if (!is.null(.year)) {
    dat <- dat |> filter(year %in% .year)
  }
  if (!is.null(.country)) {
    dat <- dat |> filter(country %in% .country)
  }
Here, we check whether each additional argument is NULL and, whenever it is not NULL, overwrite the dataset stored in dat with the subset computed in the body of the if statement.
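The same pattern of an optional argument that defaults to NULL can be tried out on a plain vector, without any data frame. The helper keep_values() and its .keep argument are hypothetical names used only for this sketch:

```r
# Hypothetical helper: keep only the values listed in .keep,
# or return x unchanged when .keep is left at its default of NULL
keep_values <- function(x, .keep = NULL) {
  if (!is.null(.keep)) {
    x <- x[x %in% .keep]
  }
  return(x)
}

years <- c(1952, 1957, 1962)

# No filter requested: the input comes back untouched
keep_values(years)

# Filter requested: only the matching value is kept
keep_values(years, .keep = 1957)
```

Using NULL as the default gives the function a clear way to tell "no filter requested" apart from any real value the user might pass.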
Building these conditionals into the function makes it more flexible for later. Now, we can use it to calculate the GDP for:
- The whole dataset;
- A single year;
- A single country;
- A single combination of year and country.
By using %in% instead of ==, we can also give multiple years or countries to those arguments.
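The difference between the two operators shows up as soon as more than one value is requested. A small base R sketch, using a made-up vector of years in place of the year column:

```r
# Illustrative vector standing in for a year column
years <- c(1997, 2002, 2007, 1997, 2002, 2007)
wanted <- c(2002, 2007)

# %in% asks, for each element of years, whether it appears anywhere in wanted
years %in% wanted

# == compares element by element, recycling wanted along years
# (2002, 2007, 2002, 2007, ...), so a match depends on position
years == wanted
```

With %in%, all four entries for 2002 and 2007 are flagged; with ==, only the last two happen to line up with the recycled values, so rows would be dropped silently.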
  dat <- dat |>
    transmute(country, year, gdp = pop * gdpPercap / 1e9)
  return(dat)
}
Finally, we used transmute() to compute the gdp variable and to keep only the country, year, and gdp variables.