Most of R’s functions are vectorized, meaning that a function operates on all elements of a vector at once, without needing to loop through and act on each element one at a time. This makes code more concise, easier to read, and less error-prone.
For example, applying multiplication to a vector will conduct the operation element-wise:
x <- 1:4
x * 2
[1] 2 4 6 8
We can also add two vectors together:
y <- 6:9
x + y
[1]  7  9 11 13
Each element of x was added to its corresponding element of y:
x:  1  2  3  4
    +  +  +  +
y:  6  7  8  9
---------------
    7  9 11 13
Comparison operators, logical operators, and many functions are also vectorized:
Comparison operators applied to a vector produce a logical vector:
x
[1] 1 2 3 4
x > 2
[1] FALSE FALSE TRUE TRUE
Most functions also operate element-wise on vectors:
x <- 1:4
x
[1] 1 2 3 4
log(x)
[1] 0.0000000 0.6931472 1.0986123 1.3862944
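To make the contrast with looping concrete, here is a minimal sketch that doubles the same vector twice: once with an explicit for loop and once with the vectorized expression. The names doubled_loop and doubled_vec are made up for this illustration.

```r
x <- 1:4

# Loop version: preallocate the result, then fill it one element at a time.
doubled_loop <- numeric(length(x))
for (i in seq_along(x)) {
  doubled_loop[i] <- x[i] * 2
}

# Vectorized version: a single expression handles every element.
doubled_vec <- x * 2

# Both approaches produce the same values.
identical(doubled_loop, doubled_vec)
```

The vectorized form is shorter and leaves no index bookkeeping to get wrong.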
Let’s consider the following vector:
x <- c(5.4, 6.2, 7.1, 4.8, 7.5)
names(x) <- c('a', 'b', 'c', 'd', 'e')
We can also use any logical vector to subset:
x[c(FALSE, FALSE, TRUE, FALSE, TRUE)]
c e
7.1 7.5
Since comparison operators (e.g. >, <, ==) evaluate to logical vectors, we can also use them to succinctly subset vectors: the following statement gives the same result as the previous one.
x[x > 7]
c e
7.1 7.5
Breaking it down, this statement first evaluates x > 7, generating the logical vector c(FALSE, FALSE, TRUE, FALSE, TRUE), and then selects the elements of x corresponding to the TRUE values.
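The two steps can also be written out separately, which is sometimes easier to debug. The sketch below stores the logical vector in an intermediate variable (keep is a name chosen for this illustration) and then uses it to subset, count, and locate the matching elements.

```r
x <- c(a = 5.4, b = 6.2, c = 7.1, d = 4.8, e = 7.5)

# Step 1: evaluate the comparison to get a logical vector.
keep <- x > 7

# Step 2: use the logical vector to select elements of x.
x[keep]

# TRUE counts as 1, so sum() gives the number of matches.
sum(keep)

# which() returns the positions of the TRUE values.
which(keep)
```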
We can use == to mimic the previous method of indexing by name (remember you have to use == rather than = for comparisons):
names(x) == "a"
[1] TRUE FALSE FALSE FALSE FALSE
x[names(x) == "a"]
a
5.4
We often want to combine multiple logical criteria. Several operations for combining logical vectors exist in R:
a & b: the “logical AND” operator. Returns TRUE if both a and b are TRUE.
a | b: the “logical OR” operator. Returns TRUE if either a or b (or both) is TRUE.
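You may sometimes see && and || instead of & and |. These two-character operators are not vectorized: they work on single values only. Older versions of R silently used just the first element of each vector and ignored the remaining elements; since R 4.3.0, passing a vector longer than one element is an error. In general, you should not use the two-character operators to combine logical vectors in data analysis.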
!: the “logical NOT” operator. Converts TRUE to FALSE and FALSE to TRUE. It can negate a single logical condition (e.g. !TRUE becomes FALSE) or a whole vector of conditions (e.g. !c(TRUE, FALSE) becomes c(FALSE, TRUE)).
Additionally, you can summarize the elements within a single logical vector using the all function (which returns TRUE if every element of the vector is TRUE) and the any function (which returns TRUE if one or more elements of the vector are TRUE).
For example, the following code will only return the values of x that are equal to either 5.4 or 7 (since there is no value equal to 7, only the value equal to 5.4 will be returned):
(x == 5.4) | (x == 7)
a b c d e
TRUE FALSE FALSE FALSE FALSE
x[(x == 5.4) | (x == 7)]
a
5.4
What would the following code return?
x[(x == 5.4) & (x == 7)]
named numeric(0)
Nothing, because no value is equal to both 5.4 and 7 at the same time!
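The operators described above also combine naturally with range conditions. The following sketch, using the same vector, shows &, |, any, and all together; in_range and extreme are names chosen for this illustration.

```r
x <- c(a = 5.4, b = 6.2, c = 7.1, d = 4.8, e = 7.5)

# AND: keep values strictly between 5 and 7.
in_range <- x > 5 & x < 7
x[in_range]

# OR: keep values below 5 or above 7.
extreme <- x < 5 | x > 7
x[extreme]

# any(): is at least one value greater than 7?
any(x > 7)

# all(): is every value greater than 5?
all(x > 5)
```

Here in_range selects a and b, extreme selects c, d, and e, any(x > 7) is TRUE, and all(x > 5) is FALSE because d is 4.8.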
We could extract only the values that are equal to 7.5:
x[x == 7.5]
e
7.5
or the values that are not equal to 5.4 using !=:
x[x != 5.4]
b c d e
6.2 7.1 4.8 7.5
Or by negating the entire expression:
x[!(x == 5.4)]
b c d e
6.2 7.1 4.8 7.5
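Note that we don’t want to use == to check whether the values of x appear in another vector of a different length, since == compares the two vectors element by element rather than checking each value against the whole set.
x == c(5.4, 4.8, 7.5)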
Warning in x == c(5.4, 4.8, 7.5): longer object length is not a multiple of
shorter object length
a b c d e
TRUE FALSE FALSE FALSE FALSE
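What is happening here? Recycling! R repeats the shorter vector until it matches the length of x, so x is actually compared element by element with c(5.4, 4.8, 7.5, 5.4, 4.8), and the warning appears because 5 is not a multiple of 3.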
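To see the recycling explicitly, the sketch below builds the recycled vector by hand with rep() and checks that comparing against it gives the same result as the original expression. The names candidates and recycled are made up for this illustration, and suppressWarnings() is used only to silence the length warning.

```r
x <- c(a = 5.4, b = 6.2, c = 7.1, d = 4.8, e = 7.5)
candidates <- c(5.4, 4.8, 7.5)

# Repeat the shorter vector until it is as long as x.
recycled <- rep(candidates, length.out = length(x))
recycled

# Comparing against the hand-recycled vector matches what == does implicitly.
implicit <- suppressWarnings(x == candidates)
explicit <- x == recycled
identical(unname(implicit), unname(explicit))
```

Only the first position matches, because 4.8 and 7.5 in x do not line up with the same values in the recycled vector.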
To ask whether the values of one vector are included in another vector, you should instead use the %in% operator.
%in%
Another way to do this is to ask R to subset the values of x that are “in” a vector of values.
x %in% c(5.4, 4.8, 7)
[1] TRUE FALSE FALSE TRUE FALSE
x[x %in% c(5.4, 4.8, 7)]
a d
5.4 4.8
We could extract the entries that are not in this vector by preceding the logical expression with a !:
!(x %in% c(5.4, 4.8, 7))
[1] FALSE TRUE TRUE FALSE TRUE
x[!(x %in% c(5.4, 4.8, 7))]
b c e
6.2 7.1 7.5
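If you negate %in% often, one option is to wrap the negation in a small helper operator. The sketch below defines a hypothetical %notin% operator with base R's Negate(); %notin% is not part of base R, and the name is an assumption made for this example.

```r
x <- c(a = 5.4, b = 6.2, c = 7.1, d = 4.8, e = 7.5)

# Hypothetical helper: the logical negation of %in%.
`%notin%` <- Negate(`%in%`)

# Equivalent to x[!(x %in% c(5.4, 4.8, 7))].
not_listed <- x[x %notin% c(5.4, 4.8, 7)]
not_listed
```

Whether the helper reads better than the explicit ! is a matter of taste; both select the same elements.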