10 Functions

If we only had one data set to analyze, it would probably be faster to load the file into a spreadsheet and use that to plot simple statistics. However, the gapminder data is updated periodically, and we may want to pull in that new information later and re-run our analysis. We may also obtain similar data from a different source in the future.

In this lesson, we’ll learn how to write a function so that we can repeat several operations with a single command.

What is a function?

Functions gather a sequence of operations into a whole, preserving it for ongoing use. Functions provide:

a name we can remember and use to invoke it
relief from the need to remember the individual operations
a defined set of inputs and expected outputs

As the basic building block of most programming languages, user-defined functions constitute “programming” as much as any single abstraction can. If you have written a function, you are a computer programmer.

10.1 Defining a function

If you haven’t already, create a functions/ directory in the same folder as your working Quarto file.

Open a new R script file and call it functions-lesson.R and save it in the functions/ directory.

The general structure of a function is:

my_function <- function(parameters) {
  # perform action
  # return value
}

Let’s define a function fahr_to_kelvin() that converts temperatures from Fahrenheit to Kelvin:

fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}

The list of arguments that the function takes is contained within the parentheses in function(args).

Next, the body of the function–the statements that are executed when it runs–is contained within curly braces ({}). The statements in the body are indented by two spaces. This makes the code easier to read but does not affect how the code operates.

It is useful to think of creating functions like writing a recipe in a cookbook. First, you define the “ingredients” that your function needs. In this case, we only need one ingredient to use our function: “temp”. After we list our ingredients, we say what we will do with them. Here, we take our ingredient and apply a set of mathematical operators to it.

When we call the function, the values we pass to it as arguments are assigned to those variables so that we can use them inside the function. Inside the function, we use a return statement to output a result when the function is used.

Return statements

One feature of R that differs from many other languages is that the return statement is not required.

R automatically returns the value of the last expression evaluated in the body of the function. But for clarity, we will explicitly define the return statement.

Let’s try running our function. Calling our own function is no different from calling any other function:

# freezing point of water
fahr_to_kelvin(32)

[1] 273.15

# boiling point of water
fahr_to_kelvin(212)

[1] 373.15
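
Because the arithmetic operators in R are vectorized, the same function should also accept a whole vector of temperatures without any changes. A brief sketch, redefining fahr_to_kelvin() from above so the example is self-contained (the variable name temps_f is only illustrative):

```r
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}

# convert several Fahrenheit temperatures in one call
temps_f <- c(32, 212)
fahr_to_kelvin(temps_f)
```

The result is a numeric vector with one Kelvin value per input temperature.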

Challenge 1

Write a function called kelvin_to_celsius() that takes a temperature in Kelvin and returns that temperature in Celsius.

Hint: To convert from Kelvin to Celsius you subtract 273.15.

Solution to challenge 1

Write a function called kelvin_to_celsius that takes a temperature in Kelvin and returns that temperature in Celsius

kelvin_to_celsius <- function(temp) {
  celsius <- temp - 273.15
  return(celsius)
}

10.2 Combining functions

The real power of functions comes from mixing, matching, and combining them into ever-larger chunks to get the effect we want.

Let’s define two functions that will convert temperature from Fahrenheit to Kelvin, and Kelvin to Celsius:

fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}

kelvin_to_celsius <- function(temp) {
  celsius <- temp - 273.15
  return(celsius)
}

Challenge 2

Define the function to convert directly from Fahrenheit to Celsius, using the two functions above.

Solution to challenge 2

Define the function to convert directly from Fahrenheit to Celsius, by reusing these two functions above

fahr_to_celsius <- function(temp) {
  temp_k <- fahr_to_kelvin(temp)
  result <- kelvin_to_celsius(temp_k)
  return(result)
}
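
To see the composition at work, here is a short sketch that repeats the three definitions so it can run on its own, then calls the combined function and the equivalent nested call:

```r
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}

kelvin_to_celsius <- function(temp) {
  celsius <- temp - 273.15
  return(celsius)
}

fahr_to_celsius <- function(temp) {
  temp_k <- fahr_to_kelvin(temp)
  result <- kelvin_to_celsius(temp_k)
  return(result)
}

# freezing point of water
fahr_to_celsius(32)

# the same conversion written as a nested call
kelvin_to_celsius(fahr_to_kelvin(32))
```

Both calls should return 0, the freezing point of water in Celsius.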

10.3 Interlude: Defensive Programming

Now that we’ve begun to appreciate how writing functions provides an efficient way to make R code reusable and modular, we should note that it is important to ensure that functions only work in their intended use cases. Checking function parameters is related to the concept of defensive programming. Defensive programming encourages us to frequently check conditions and throw an error if something is wrong. These checks are referred to as assertion statements because we want to assert some condition is TRUE before proceeding. They make it easier to debug because they give us a better idea of where the errors originate.

10.3.1 Checking conditions with `stop()`

Let’s start by re-examining fahr_to_kelvin(), our function for converting temperatures from Fahrenheit to Kelvin. It was defined like so:

fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}

For this function to work as intended, the argument temp must be a numeric value; otherwise, the mathematical procedure for converting between the two temperature scales will not work.

To create an error, we can use the function stop(). For example, since the argument temp must be a numeric vector, we could check for this condition with an if statement and throw an error if the condition was violated. We could augment our function above like so:

fahr_to_kelvin <- function(temp) {
  if (!is.numeric(temp)) {
    stop("temp must be a numeric vector.")
  }
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}

fahr_to_kelvin("one")

Error in fahr_to_kelvin("one"): temp must be a numeric vector.

“If” statements

There are several ways you can control flow in R. For conditional statements, the most commonly used approaches are the constructs:

# if
if (condition is true) {
  perform action
}

# if ... else
if (condition is true) {
  perform action
} else {  # that is, if the condition is false,
  perform alternative action
}

Say, for example, that we want R to print a message if a variable x has a particular value:

x <- 8

if (x >= 10) {
  print("x is greater than or equal to 10")
}

The print statement does not appear in the console because x (8) is not greater than or equal to 10. To print a different message for numbers less than 10, we can add an else statement.

x <- 8

if (x >= 10) {
  print("x is greater than or equal to 10")
} else {
  print("x is less than 10")
}

[1] "x is less than 10"

You can also test multiple conditions by using else if.

x <- 8

if (x >= 10) {
  print("x is greater than or equal to 10")
} else if (x > 5) {
  print("x is greater than 5, but less than 10")
} else {
  print("x is less than or equal to 5")
}

[1] "x is greater than 5, but less than 10"

Important: when R evaluates the condition inside if() statements, it is looking for a logical value (TRUE or FALSE).
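
Because if() expects a single logical value, a condition built from a vector needs to be collapsed first. A small sketch using any() and all() (the vector x here is only an illustration; in R 4.2.0 and later, passing a condition of length greater than one to if() is an error):

```r
x <- c(4, 12)

# x >= 10 is a logical vector of length 2, which if() cannot use directly
x >= 10

# any() and all() collapse it to a single TRUE or FALSE
if (any(x >= 10)) {
  print("at least one value is greater than or equal to 10")
}

if (!all(x >= 10)) {
  print("not every value is greater than or equal to 10")
}
```

Both messages are printed here, since one of the two values meets the condition and the other does not.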

If we had multiple conditions or arguments to check, it would take many lines of code to check all of them. Luckily R provides the convenience function stopifnot(). We can list as many requirements as we like, each of which should evaluate to TRUE; stopifnot() throws an error if it finds one that is not. Listing these conditions also serves a secondary purpose as extra documentation for the function.

Let’s try out defensive programming with stopifnot() by adding assertions to check the input to our function fahr_to_kelvin().

We want to assert the following: temp is a numeric vector. We may do that like so:

fahr_to_kelvin <- function(temp) {
  stopifnot(is.numeric(temp))
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}

It still works when given proper input.

# freezing point of water
fahr_to_kelvin(temp = 32)

[1] 273.15

But fails instantly if given improper input.

# temp is a character string instead of numeric
fahr_to_kelvin(temp = "a")

Error in fahr_to_kelvin(temp = "a"): is.numeric(temp) is not TRUE
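
The default message from stopifnot() simply repeats the failed expression. As a sketch of how several conditions might be checked with friendlier messages, the example below names each condition, which stopifnot() uses as the error message (this assumes R 4.0.0 or later; the function name fahr_to_kelvin_checked() and the missing-value check are illustrative additions, not part of the lesson's function):

```r
# hypothetical variant of fahr_to_kelvin() with several named assertions
fahr_to_kelvin_checked <- function(temp) {
  stopifnot(
    "temp must be a numeric vector" = is.numeric(temp),
    "temp must not contain missing values" = !anyNA(temp)
  )
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}

# proper input still works
fahr_to_kelvin_checked(32)

# capture the error message instead of stopping the script
tryCatch(
  fahr_to_kelvin_checked("a"),
  error = function(e) conditionMessage(e)
)
```

The conditions are checked in order, so the message reported is the one attached to the first condition that fails.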

Challenge 3

Edit our fahr_to_celsius() function so that it throws an error immediately if the argument temp is non-numeric. Test that your error message works as expected.

Solution to challenge 3

Extend our previous definition of the function by adding in an explicit call to stopifnot(). Since fahr_to_celsius() is a composition of two other functions, checking inside here makes adding checks to the two component functions redundant.

fahr_to_celsius <- function(temp) {
  stopifnot(is.numeric(temp))
  temp_k <- fahr_to_kelvin(temp)
  result <- kelvin_to_celsius(temp_k)
  return(result)
}

10.4 Default arguments

Our functions above each had only a single argument. But some functions have many arguments, some of which are required and others which are not.

The following function has three arguments:

# define a function add() with three arguments, which returns a + b + (2 * c)
add <- function(a, b, c) {
  return(a + b + (2 * c))
}

# run add() with all three named arguments a = 1, b = 3, c = 5
add(a = 1, b = 3, c = 5)

[1] 14

If you provide the arguments in the order in which they are defined, you don’t need to name them:

# run add() without naming the arguments
add(1, 3, 5)

[1] 14

If you don’t provide all three arguments, you will get an error:

# try to run add() with just two arguments
add(1, 3)

Error in add(1, 3): argument "c" is missing, with no default

If you want to allow the user to leave some arguments out, you need to provide a default value for the arguments.

Default values can be defined by setting the argument equal to some value:

# redefine add() with defaults of 0 for all arguments
add <- function(a = 0, b = 0, c = 0) {
  return(a + b + (2 * c))
}

# try add() with just two arguments
add(1, 3)

[1] 4

If you don’t name the arguments, the values you provide are assigned to the arguments in the order in which they are defined.

If you want to specify which arguments you are providing, you must name them:

# try add() with arguments for a and c
add(a = 1, c = 3)

[1] 7
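
Named and unnamed arguments can also be mixed in a single call. The sketch below repeats the definition of add() with defaults so that it runs on its own:

```r
add <- function(a = 0, b = 0, c = 0) {
  return(a + b + (2 * c))
}

# named arguments are matched first; the remaining unnamed value
# fills the first unmatched argument, so here a = 1 and b = 0
add(1, c = 3)

# the order of named arguments does not matter
add(c = 3, a = 1)
```

Both calls should return 7, the same result as add(a = 1, c = 3).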

Challenge 4

Add a check to add() that throws an error unless at least two of a, b, and c are non-zero, with the message “You must provide at least two non-zero values to add”. Test your function.

Solution to challenge 4

add <- function(a = 0, b = 0, c = 0) {
  if (sum(c(a, b, c) == 0) > 1) {
    stop("You must provide at least two non-zero values to add")
  }
  return(a + b + (2 * c))
}
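
One way to test the function, sketched here with tryCatch() so that the error message is captured rather than halting the script (the definition of add() is repeated so the example is self-contained):

```r
add <- function(a = 0, b = 0, c = 0) {
  if (sum(c(a, b, c) == 0) > 1) {
    stop("You must provide at least two non-zero values to add")
  }
  return(a + b + (2 * c))
}

# two non-zero values: the check passes
add(1, 3)

# only one non-zero value: capture the error message
tryCatch(add(1), error = function(e) conditionMessage(e))
```

The first call should return 4, and the second should return the error message as a character string.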

10.5 Shorthand functions

There are a few ways to write simple functions on a single line.

For example, the following two functions are equivalent:

add_v1 <- function(a, b, c) {
  linear_combination <- a + b + (2 * c)
  return(linear_combination)
}

add_v1(1, 5, 4)

[1] 14

and

add_v2 <- function(a, b, c) return(a + b + (2 * c))

add_v2(1, 5, 4)

[1] 14

Note that the return() above is technically not required, since R returns the value of the last expression evaluated in the body of the function, so the following will also work:

add_v3 <- function(a, b, c) a + b + (2 * c)

add_v3(1, 5, 4)

[1] 14
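
Another shorthand, available in R 4.1.0 and later (the same release that introduced the |> pipe), is the backslash notation \(x), which stands in for function(x). A brief sketch; the name add_v4 simply continues the numbering above and is not part of the lesson:

```r
# backslash shorthand for function(), assuming R 4.1.0 or later
add_v4 <- \(a, b, c) a + b + (2 * c)

add_v4(1, 5, 4)

# an anonymous function passed directly to sapply()
sapply(c(32, 212), \(temp) (temp - 32) * (5 / 9))
```

This form is mostly useful for short, throwaway functions passed to other functions; named functions with braces are usually easier to read and document.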

10.6 An advanced example

The following function takes the gapminder data frame, and computes the GDP (in billions) while filtering to a specified year and country if specified.

# Takes a dataset and multiplies the population column
# with the GDP per capita column.
# Assumes dplyr is loaded, since filter() and transmute() come from it.
calcGDP <- function(dat, .year = NULL, .country = NULL) {
  
  if (!is.null(.year)) {
    dat <- dat |> filter(year %in% .year)
  }
  if (!is.null(.country)) {
    dat <- dat |> filter(country %in% .country)
  }
  
  dat <- dat |>
    transmute(country, year, gdp = pop * gdpPercap / 1e9)
  
  return(dat)
}

If you’ve been writing these functions down into a separate R script (a good idea!), you can load the functions into your R session by using the source() function:

source("functions/functions-lesson.R")

If we don’t specify a .year or .country argument, our function returns all rows of the gapminder data.

calcGDP(gapminder) |>
  head()

      country year       gdp
1 Afghanistan 1952  6.567086
2 Afghanistan 1957  7.585449
3 Afghanistan 1962  8.758856
4 Afghanistan 1967  9.648014
5 Afghanistan 1972  9.678553
6 Afghanistan 1977 11.697659

Let’s take a look at what happens when we specify the year:

head(calcGDP(gapminder, .year = 2007))

      country year       gdp
1 Afghanistan 2007  31.07929
2     Albania 2007  21.37641
3     Algeria 2007 207.44485
4      Angola 2007  59.58390
5   Argentina 2007 515.03363
6   Australia 2007 703.65836

Or for a specific country:

calcGDP(gapminder, .country = "Australia")

     country year       gdp
1  Australia 1952  87.25625
2  Australia 1957 106.34923
3  Australia 1962 131.88457
4  Australia 1967 172.45799
5  Australia 1972 221.22377
6  Australia 1977 258.03733
7  Australia 1982 295.74280
8  Australia 1987 355.85312
9  Australia 1992 409.51123
10 Australia 1997 501.22325
11 Australia 2002 599.84716
12 Australia 2007 703.65836

Or both:

calcGDP(gapminder, .year = 2007, .country = "Australia")

    country year      gdp
1 Australia 2007 703.6584

Let’s walk through the body of the function:

calcGDP <- function(dat, .year = NULL, .country = NULL) {

Here we’ve added two arguments, .year, and .country. We’ve set default arguments for both as NULL using the = operator in the function definition. We are using a period as a prefix to these arguments .year and .country to help visually differentiate between the argument and the column name in dat/gapminder.

These arguments will take on those values unless the user specifies otherwise.

  if (!is.null(.year)) {
    dat <- dat |> filter(year %in% .year)
  }
  if (!is.null(.country)) {
    dat <- dat |> filter(country %in% .country)
  }

Here, we check whether each additional argument is NULL, and whenever it is not NULL, we overwrite the dataset stored in dat with the subset computed in the body of the if statement.

Building these conditionals into the function makes it more flexible for later. Now, we can use it to calculate the GDP for:

The whole dataset;
A single year;
A single country;
A single combination of year and country.

By using %in% instead of ==, we can also give multiple years or countries to those arguments.
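
To illustrate how %in% handles one or several values, here is a small base R sketch (the vector years is made up for the example and is not taken from gapminder):

```r
years <- c(1952, 1987, 2007)

# %in% tests each element of years for membership in the set on the right
years %in% c(1952, 1987)

# a single value works the same way
years %in% 2007
```

The first expression gives TRUE TRUE FALSE and the second gives FALSE FALSE TRUE. By contrast, == compares element by element and recycles the shorter vector, which is rarely what we want when filtering on several values.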

  dat <- dat |>
    transmute(country, year, gdp = pop * gdpPercap / 1e9)
  
  return(dat)
}

Finally, we used transmute to select the country, year, and gdp variables and to compute the gdp variable itself.

Tip: Pass by value

Functions in R behave as though they make a copy of the data to operate on inside the function body. When we modify dat inside the function we are modifying the copy of the gapminder dataset stored in dat, not the original gapminder variable.

This is called “pass-by-value” and it makes writing code much safer: you can always be sure that whatever changes you make within the body of the function stay inside the body of the function.
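
A minimal sketch of this behaviour in base R, using a tiny made-up data frame in place of gapminder (the names add_gdp_column() and toy are hypothetical and used only for illustration):

```r
# hypothetical helper that adds a column to its input
add_gdp_column <- function(dat) {
  dat$gdp <- dat$pop * dat$gdpPercap
  return(dat)
}

# a tiny made-up data frame standing in for gapminder
toy <- data.frame(pop = c(10, 20), gdpPercap = c(3, 4))
result <- add_gdp_column(toy)

names(toy)     # the original still has only pop and gdpPercap
names(result)  # the returned data frame also has gdp
```

To keep the change, we have to assign the function's return value to a variable, as done with result here.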

Tip: Function scope

Another important concept is scoping: any variables (or functions!) you create or modify inside the body of a function only exist for the lifetime of the function’s execution.

When we call calcGDP(), the variables dat and gdp only exist inside the body of the function. Even if we have variables of the same name in our interactive R session, they are not modified in any way when executing a function.
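
The same idea can be sketched with the temperature function from earlier. Here a variable named kelvin exists in the session before the call, and the assignment inside the function does not touch it:

```r
kelvin <- "defined in the session"

fahr_to_kelvin <- function(temp) {
  # this kelvin is local to the function call
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}

fahr_to_kelvin(32)

# the session-level variable is untouched
kelvin
```

After the call, kelvin still holds the character string assigned in the first line.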

Challenge 5

Test out your GDP function by calculating the GDP for New Zealand in 1987 and 1952.

Solution to challenge 5

calcGDP(gapminder, .year = c(1952, 1987), .country = "New Zealand")

      country year      gdp
1 New Zealand 1952 21.05819
2 New Zealand 1987 63.05001

Challenge 6

The paste() function can be used to combine text together, e.g.:

best_practice <- c("Write", "programs", "for", "people", "not", "computers")
paste(best_practice, collapse=" ")

[1] "Write programs for people not computers"

Write a function called fence() that takes two vectors as arguments, called text and wrapper, and returns the text wrapped with the wrapper.

The output of the following code should be:

fence(text = best_practice, wrapper="***")

[1] "*** Write programs for people not computers ***"

Note: the paste() function has an argument called sep, which specifies the separator between text. The default is a space: “ ”. The default for paste0() is no space: “”.

Solution to challenge 6

The following function will achieve our goal:

fence <- function(text, wrapper) {
  text <- c(wrapper, text, wrapper)
  result <- paste(text, collapse = " ")
  return(result)
}
best_practice <- c("Write", "programs", "for", "people", "not", "computers")
fence(text=best_practice, wrapper="***")

[1] "*** Write programs for people not computers ***"

Tip: Testing and documenting

It’s important to both test functions and document them: documentation helps you, and others, understand what the purpose of your function is and how to use it, and it’s important to make sure that your function actually does what you think it does.

Formal documentation for functions, written in separate .Rd files, gets turned into the documentation you see in help files. The roxygen2 package allows R coders to write documentation alongside the function code and then process it into the appropriate .Rd files. You will want to switch to this more formal method of writing documentation when you start writing more complicated R projects. In fact, packages are, in essence, bundles of functions with this formal documentation. Loading your own functions through source("functions.R") is similar to loading someone else’s functions (or your own one day!) through library("package").

Formal automated tests can be written using the testthat package.
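
As a rough sketch of what this might look like for fahr_to_kelvin(), the example below uses roxygen2-style comments (lines starting with #') and a pair of testthat tests. The wording of the documentation and the choice of tests are illustrative assumptions, and the tests only run if testthat is installed:

```r
#' Convert Fahrenheit to Kelvin
#'
#' @param temp A numeric vector of temperatures in degrees Fahrenheit.
#' @return A numeric vector of temperatures in Kelvin.
#' @examples
#' fahr_to_kelvin(32)
fahr_to_kelvin <- function(temp) {
  stopifnot(is.numeric(temp))
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}

# run the tests only if testthat is installed
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("fahr_to_kelvin() converts known temperatures", {
    testthat::expect_equal(fahr_to_kelvin(32), 273.15)
    testthat::expect_equal(fahr_to_kelvin(212), 373.15)
  })

  testthat::test_that("fahr_to_kelvin() rejects non-numeric input", {
    testthat::expect_error(fahr_to_kelvin("a"))
  })
}
```

In a script, the #' lines are ordinary comments; they only become help pages when the function lives in a package and roxygen2 processes them.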