── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.0 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.4.1 ✔ tibble 3.2.1
✔ lubridate 1.9.2 ✔ tidyr 1.3.0
✔ purrr 1.0.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
gapminder <- read.csv("data/gapminder_data.csv")
12.1 “For” loops for repeating operations
If you want to iterate over a set of values and perform the same operation on each, particularly when the order of iteration matters, one way to do this is with a for() loop.
In general, the advice of many R users would be to learn about for() loops, but to avoid using for() loops unless the order of iteration is important: i.e. the calculation at each iteration depends on the results of previous iterations.
To save the output of a computation in a vector, you need to first create an empty vector (e.g., x <- c()) and sequentially fill the values of this vector in each iteration of the for loop.
result <- c()
for (i in 1:10) {
  result[i] <- (1 + i)^10
}
result
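For contrast, here is a minimal sketch of a loop where the order of iteration does matter, because each value is computed from the previous one. The `balance` vector, the starting value of 100, and the 5% growth rate are hypothetical, chosen only for illustration.

```r
# Hypothetical example: a balance that grows by 5% at each step.
# The starting value and rate are illustrative assumptions.
balance <- numeric(5)   # pre-allocate a numeric vector of length 5
balance[1] <- 100

for (i in 2:5) {
  # Each entry depends on the entry computed in the previous iteration
  balance[i] <- balance[i - 1] * 1.05
}

balance
```

Because iteration i needs the result of iteration i - 1, this calculation cannot be done by applying a function independently to each element.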
For loops are common in programming in general, but they are used less often in idiomatic R, largely because growing a result vector inside a loop, as above, is computationally inefficient.
Instead, a more concise and typically more efficient way to iterate in R is to use the map() functions from the purrr R package. To load the purrr R package, run the following code (if the purrr package is not installed, you will first need to run the commented install.packages() line):
# install.packages("purrr")
library(purrr)
The first argument of the map() function is the object whose elements we want to iterate over. The second argument is the function that we want to apply at each iteration.
The output of the map() function is always a list.
For example, the following code will apply the exp() function to each element in the vector 1:10 and return the results in a list:
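A minimal sketch of that call, assuming purrr is installed; the name `exp_list` is introduced here only so the result can be inspected:

```r
library(purrr)

# Apply exp() to each element of 1:10; map() returns a list
exp_list <- map(1:10, exp)

# Show the structure of the first three elements
str(exp_list[1:3])
```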
While the list output format offers maximal flexibility, we typically want to create a vector or a data frame. This can be done using alternative versions of the map() function, such as map_dbl(), which specifies the type of your output in its name.
For instance, if you want your output to be a numeric “double” vector, you can use map_dbl():
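A minimal sketch, reusing the exp() example from above and assuming purrr is installed (`exp_vec` is a name chosen for illustration):

```r
library(purrr)

# Same iteration as before, but the result is a numeric (double) vector
exp_vec <- map_dbl(1:10, exp)
exp_vec
```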
and if you want it to be a character vector, you can use map_chr():
map_chr(gapminder, class)
country year pop continent lifeExp gdpPercap
"character" "integer" "numeric" "character" "numeric" "numeric"
Here, recall that the gapminder data frame is a list, and the map_ function is iterating over the elements of that list, which in this case are the columns.
Note that the output of the function you are applying must match the map_ function that you use, else you will get an error:
map_dbl(1:10, class)
Error in `map_dbl()`:
ℹ In index: 1.
Caused by error:
! Can't coerce from a string to a double vector.
The true power of the map functions really comes once you learn how to write your own functions.
For example, we could conduct the following transformation to each entry in 1:10:
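The exact transformation is not shown here, so the sketch below assumes the same calculation as the earlier for loop, (1 + i)^10. The function name `add_one_power_ten` is hypothetical.

```r
library(purrr)

# Hypothetical helper: the name and the formula are assumptions,
# reusing the transformation from the earlier for loop
add_one_power_ten <- function(x) {
  (1 + x)^10
}

# Apply the custom function to each entry in 1:10
powers <- map_dbl(1:10, add_one_power_ten)
powers
```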
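The function can also be written inline using purrr's formula shorthand: start the expression with ~ and replace the argument in the body of the function with . (a period).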
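As a sketch of this shorthand (the same style used in the solution to Challenge 1), the lines below assume the (1 + i)^10 transformation from the earlier for loop. In a purrr formula, both `.` and `.x` refer to the current element.

```r
library(purrr)

# Formula shorthand: ~ starts the function body, . is the current element
powers_dot <- map_dbl(1:10, ~ (1 + .)^10)

# .x is an equivalent placeholder for the current element
powers_x <- map_dbl(1:10, ~ (1 + .x)^10)

# Check that both spellings give the same result
identical(powers_dot, powers_x)
```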
The map_df() function will return a data frame (but requires that the function being applied outputs a data frame).
As an example, the following code takes each entry in the vector c(1, 4, 7), and adds 10 to it, and returns a two-column data frame containing the old number and new number:
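A minimal sketch of such a call, assuming purrr and dplyr are installed (map_df() relies on dplyr to bind rows). The column names `old_number` and `new_number` are assumptions, since the original names are not shown.

```r
library(purrr)

# For each entry, build a one-row data frame; map_df() stacks the rows.
# Column names are illustrative assumptions.
add_ten_df <- map_df(c(1, 4, 7), function(.x) {
  data.frame(old_number = .x, new_number = .x + 10)
})

add_ten_df
```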
As another example, the following code takes the gapminder dataset, selects the pop, gdpPercap, and lifeExp columns, and computes a data frame containing the mean and sd for each column/variable.
map_df() then binds the results for each variable together into a single data frame.
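The pattern can be sketched as follows. To keep the example self-contained, it uses `gap_toy`, a hypothetical three-row stand-in with made-up values and the same column names; with the real data you would start the pipeline from gapminder instead. The `.id = "variable"` argument is an assumption about how the variable names are kept.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for gapminder; the values are made up
gap_toy <- data.frame(
  pop = c(1e6, 2e6, 3e6),
  gdpPercap = c(500, 1500, 2500),
  lifeExp = c(40, 50, 60)
)

summary_df <- gap_toy %>%
  select(pop, gdpPercap, lifeExp) %>%
  # One-row data frame per column; map_df() binds them into one data frame
  map_df(~ data.frame(mean = mean(.x), sd = sd(.x)), .id = "variable")

summary_df
```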
Challenge 1
For each column in the gapminder dataset, compute the number of unique entries using the n_distinct() function. Make sure the output of your code is a numeric vector.
Do this in two different ways: using a for loop and a map_dbl() function.
Hint: n_distinct() is a dplyr function that counts the number of unique/distinct values in a vector. Try n_distinct(c(1, 1, 4, 4, 4, 4, 1, 3)) as an example of its usage.
Solution to Challenge 1
unique_gapminder <- c()
for (i in 1:ncol(gapminder)) {
  unique_gapminder[i] <- n_distinct(gapminder[, i])
}
unique_gapminder
[1] 142 12 1704 5 1626 1704
map_dbl(gapminder, ~n_distinct(.))
country year pop continent lifeExp gdpPercap
142 12 1704 5 1626 1704
Challenge 2
Use map_df() to compute the number of distinct values and the class of each variable in the gapminder dataset and store them in a data frame.
The output of your code should look like this:
variable n_distinct class
1 country 142 character
2 year 12 integer
3 pop 1704 numeric
4 continent 5 character
5 lifeExp 1626 numeric
6 gdpPercap 1704 numeric
Hint: the .id = "variable" argument of map_df() can be used to add the variable column automatically based on the gapminder column names.
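One possible approach is sketched below. The wrapper `describe_columns()` and the small `toy` data frame are hypothetical, introduced only so the example runs without the data file; calling `describe_columns(gapminder)` on the data loaded earlier would be expected to give a table of the form requested in this challenge.

```r
library(dplyr)
library(purrr)

# Hypothetical helper: one row per column, with its number of distinct
# values and its class; .id = "variable" adds the column names
describe_columns <- function(df) {
  map_df(
    df,
    ~ data.frame(n_distinct = n_distinct(.x), class = class(.x)),
    .id = "variable"
  )
}

# Made-up stand-in data with a few gapminder-style columns
toy <- data.frame(
  country = c("A", "A", "B"),
  year = c(1952L, 1957L, 1952L),
  lifeExp = c(40.1, 41.2, 50.3),
  stringsAsFactors = FALSE
)

describe_columns(toy)
```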
variable n_distinct class
1 country 142 character
2 year 12 integer
3 pop 1704 numeric
4 continent 5 character
5 lifeExp 1626 numeric
6 gdpPercap 1704 numeric