6  Vectorization

6.1 Vectorization

Most of R’s functions are vectorized, meaning that the function will operate on all elements of a vector without needing to loop through and act on each element one at a time. This makes code more concise, easier to read, and less error-prone.

For example, applying multiplication to a vector will conduct the operation element-wise:

x <- 1:4
x * 2
[1] 2 4 6 8
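To see what vectorization saves, here is a minimal sketch that computes the same result with an explicit loop and with the vectorized form. The names doubled_loop and doubled_vec are illustrative and not part of the lesson.

```r
x <- 1:4

# Explicit loop: allocate the result, then fill it one element at a time
doubled_loop <- numeric(length(x))
for (i in seq_along(x)) {
  doubled_loop[i] <- x[i] * 2
}

# Vectorized form: one expression, no index bookkeeping
doubled_vec <- x * 2

# Both approaches give the same values
all(doubled_loop == doubled_vec)
```

The loop needs an index and a pre-allocated result, and each of those is a place where a mistake can slip in. The vectorized version has neither.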

We can also add two vectors together:

y <- 6:9
x + y
[1]  7  9 11 13

Each element of x was added to its corresponding element of y:

x:  1  2  3  4
    +  +  +  +
y:  6  7  8  9
    7  9 11 13
Challenge 1

Let’s try this on the weight column of the cats dataset. Make a new column in the cats data frame that contains a “calibrated” weight measurement, equal to the original weight measurement (weight) minus 0.5.

Check the head or tail of the data frame to make sure it worked.

To define the cats data frame, run:

cats <- read.csv(file = "data/feline-data.csv")

cats$weight_calibrated <- cats$weight - 0.5
head(cats)
    coat weight likes_string weight_calibrated
1 calico    2.1            1               1.6
2  black    5.0            0               4.5
3  tabby    3.2            1               2.7

Comparison operators, logical operators, and many functions are also vectorized:

6.1.1 Comparison operators

Comparison operators applied to a vector will produce a logical vector:

x
[1] 1 2 3 4
x > 2
[1] FALSE FALSE  TRUE  TRUE
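Because TRUE and FALSE behave as 1 and 0 in arithmetic, the result of a comparison can be summarised directly. A small sketch, with above_two as an illustrative variable name:

```r
x <- 1:4
above_two <- x > 2   # logical vector, one value per element of x

sum(above_two)       # number of elements greater than 2
mean(above_two)      # proportion of elements greater than 2
```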

6.1.2 Functions

Most functions also operate element-wise on vectors:

x <- 1:4
x
[1] 1 2 3 4
log(x)
[1] 0.0000000 0.6931472 1.0986123 1.3862944
Challenge 2

We’re interested in looking at the sum of the following sequence of fractions:

x = 1/(1^2) + 1/(2^2) + 1/(3^2) + ... + 1/(n^2)

This would be tedious to type out, and impossible for high values of n. Use vectorization to compute x when n = 100. What is the sum when n = 10,000?

sum(1/(1:100)^2)
[1] 1.634984
sum(1/(1:1e04)^2)
[1] 1.644834

We could do this for a general n using:

n <- 10000
sum(1/(1:n)^2)
[1] 1.644834
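One way to reuse this calculation for different values of n is to wrap it in a function. The sketch below assumes a helper named sum_inverse_squares, which is hypothetical and not defined anywhere in the lesson.

```r
# Hypothetical helper: sum of 1/k^2 for k = 1, ..., n
sum_inverse_squares <- function(n) {
  k <- seq_len(n)   # 1, 2, ..., n (empty when n is 0)
  sum(1 / k^2)      # vectorized division, then a single sum
}

sum_inverse_squares(100)
sum_inverse_squares(10000)
```

Using seq_len(n) rather than 1:n is a defensive choice: 1:0 would count down and give c(1, 0), whereas seq_len(0) is empty.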
Tip: Operations on vectors of unequal length

Operations can also be performed on vectors of unequal length, through a process known as recycling. This process automatically repeats the shorter vector until it matches the length of the longer vector. R will issue a warning if the length of the longer vector is not a multiple of the length of the shorter vector.

x <- c(1, 2, 3)
y <- c(1, 2, 3, 4, 5, 6, 7)
x + y
Warning in x + y: longer object length is not a multiple of shorter object
length
[1] 2 4 6 5 7 9 8

Vector x was recycled to match the length of vector y:

x:  1  2  3  1  2  3  1
    +  +  +  +  +  +  +
y:  1  2  3  4  5  6  7
    2  4  6  5  7  9  8
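For contrast, here is a small sketch of recycling that produces no warning, because the longer length is an exact multiple of the shorter one. The vector names short and long are illustrative.

```r
short <- c(10, 20)
long <- 1:6

# length(long) is a multiple of length(short), so recycling is silent
short + long

# A length-one vector is recycled too, which is why x * 2 works
long * 2
```

Silent recycling is convenient but can hide mistakes, so it is worth checking vector lengths when a result looks surprising.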

6.2 Subsetting vectors using logical operations

Let’s consider the following vector:

x <- c(5.4, 6.2, 7.1, 4.8, 7.5)
names(x) <- c('a', 'b', 'c', 'd', 'e')

We can use any logical vector to subset:

x[c(FALSE, FALSE, TRUE, FALSE, TRUE)]
  c   e 
7.1 7.5 

Since comparison operators (e.g. >, <, ==) evaluate to logical vectors, we can also use them to succinctly subset vectors: the following statement gives the same result as the previous one.

x[x > 7]
  c   e 
7.1 7.5 

Breaking it down, this statement first evaluates x > 7, generating a logical vector c(FALSE, FALSE, TRUE, FALSE, TRUE), and then selects the elements of x corresponding to the TRUE values.
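The two steps can be made explicit by storing the intermediate logical vector. This is only a sketch; is_large is an illustrative name.

```r
x <- c(5.4, 6.2, 7.1, 4.8, 7.5)
names(x) <- c('a', 'b', 'c', 'd', 'e')

is_large <- x > 7    # step 1: logical vector (it keeps the names of x)
is_large
x[is_large]          # step 2: keep the elements where is_large is TRUE
which(is_large)      # positions (and names) of the TRUE values
```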

We can use == to mimic the previous method of indexing by name (remember you have to use == rather than = for comparisons):

names(x) == "a"
[1]  TRUE FALSE FALSE FALSE FALSE
x[names(x) == "a"]
  a 
5.4 

6.3 Combining logical conditions

We often want to combine multiple logical criteria. Several operations for combining logical vectors exist in R:

  • a & b: the “logical AND” operator: returns TRUE if both a and b are TRUE.

  • a | b: the “logical OR” operator: returns TRUE if either a or b (or both) are TRUE.

  • !a: the “logical NOT” operator: converts TRUE to FALSE and FALSE to TRUE. It can negate a single logical condition (e.g. !TRUE becomes FALSE), or a whole vector of conditions (e.g. !c(TRUE, FALSE) becomes c(FALSE, TRUE)).

You may sometimes see && and || instead of & and |. These two-character operators are meant for single TRUE/FALSE values, such as the condition of an if statement: older versions of R silently used only the first element of each vector and ignored the rest, and R 4.3.0 and later raise an error when given a longer vector. In general, you should not use the two-character operators in data analysis.

Additionally, you can compare the elements within a single vector using the all function (which returns TRUE if every element of the vector is TRUE) and the any function (which returns TRUE if one or more elements of the vector are TRUE).
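A short sketch of all and any, using the same values as the vector x defined earlier in this section:

```r
x <- c(5.4, 6.2, 7.1, 4.8, 7.5)

all(x > 4)   # TRUE: every value is greater than 4
any(x > 7)   # TRUE: 7.1 and 7.5 are greater than 7
all(x > 7)   # FALSE: most values are not greater than 7
any(x > 8)   # FALSE: no value is greater than 8
```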

For example, the following code will only return the values of x that are equal to either 5.4 or 7 (since there is no value equal to 7, only the value equal to 5.4 will be returned):

(x == 5.4) | (x == 7)
    a     b     c     d     e 
 TRUE FALSE FALSE FALSE FALSE 
x[(x == 5.4) | (x == 7)]
  a 
5.4 

What would the following code return?

x[(x == 5.4) & (x == 7)]
named numeric(0)

Nothing, because no value is equal to both 5.4 and 7 at the same time!

We could extract the values that are exactly equal to 7.5:

x[x == 7.5]
  e 
7.5 

or the values that are not equal to 5.4 using !=

x[x != 5.4]
  b   c   d   e 
6.2 7.1 4.8 7.5 

Or by negating the entire expression:

x[!(x == 5.4)]
  b   c   d   e 
6.2 7.1 4.8 7.5 

Note that == is not the right tool for asking whether each value of x matches any of several values: it compares the two vectors element by element, recycling the shorter one if needed.

x == c(5.4, 4.8, 7.5)
Warning in x == c(5.4, 4.8, 7.5): longer object length is not a multiple of
shorter object length
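    a     b     c     d     e 
 TRUE FALSE FALSE FALSE FALSE 

What is happening here? Recycling! The shorter vector is repeated to c(5.4, 4.8, 7.5, 5.4, 4.8) and compared with x position by position, so the 4.8 and 7.5 in x are missed.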
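The recycling can be reproduced by hand with rep_len, which repeats a vector up to a requested length. In this sketch, targets and recycled are illustrative names.

```r
x <- c(5.4, 6.2, 7.1, 4.8, 7.5)
targets <- c(5.4, 4.8, 7.5)

# What R effectively compares x against after recycling
recycled <- rep_len(targets, length(x))
recycled

# Position-by-position comparison: 4.8 and 7.5 never line up
x == recycled
```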

To ask whether the values of one vector are included in another vector, you should instead use the %in% operator.

6.4 The in operator %in%

Another way to do this is to ask R to subset the values of x that are “in” a vector of values.

x %in% c(5.4, 4.8, 7)
[1]  TRUE FALSE FALSE  TRUE FALSE
x[x %in% c(5.4, 4.8, 7)]
  a   d 
5.4 4.8 

We could extract the entries that are not in this vector by preceding the logical expression with a !:

!(x %in% c(5.4, 4.8, 7))
[1] FALSE  TRUE  TRUE FALSE  TRUE
x[!(x %in% c(5.4, 4.8, 7))]
  b   c   e 
6.2 7.1 7.5 
Tip: Getting help for operators

Remember you can search for help on operators by wrapping them in quotes: help("%in%") or ?"%in%".

Challenge 3

Given the following code:

x <- c(5.4, 6.2, 7.1, 4.8, 7.5)
names(x) <- c('a', 'b', 'c', 'd', 'e')
x
  a   b   c   d   e 
5.4 6.2 7.1 4.8 7.5 

Write a subsetting command to return the values in x that are greater than 4 or less than 7.

Write another subsetting command to return the values in x that are greater than 4 and less than 7.

greater than 4 or less than 7:

x[(x < 7) | (x > 4)]
  a   b   c   d   e 
5.4 6.2 7.1 4.8 7.5 

greater than 4 and less than 7:

x[(x < 7) & (x > 4)]
  a   b   d 
5.4 6.2 4.8 
Tip: Non-unique names

Multiple elements in a vector can have the same name. (For a data frame, columns can have the same name too.) Consider these examples:

x <- 1:3
x
[1] 1 2 3
names(x) <- c('a', 'a', 'a')
x
a a a 
1 2 3 
x['a']  # only returns first value
a 
1 
x[names(x) == 'a']  # returns all three values
a a a 
1 2 3 
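If repeated names might be present, it can help to check before indexing by name. A minimal sketch using base R’s anyDuplicated and which:

```r
x <- 1:3
names(x) <- c('a', 'a', 'a')

anyDuplicated(names(x)) > 0   # TRUE if any name is repeated
which(names(x) == 'a')        # positions of every element named 'a'
```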