5  Subsetting vectors

5.1 Subsetting vectors

We can extract individual elements of a vector by using the square bracket notation:

sequence_example <- 20:25
first_element <- sequence_example[1]
first_element
[1] 20

To change a single element, use the square brackets on the left-hand side of the assignment arrow:

sequence_example[1] <- 30
sequence_example
[1] 30 21 22 23 24 25

Let’s define a new vector, x:

x <- c(5.4, 6.2, 7.1, 4.8, 7.5)
x
[1] 5.4 6.2 7.1 4.8 7.5

So now that we’ve created a toy vector to play with, how do we get at its contents?

5.1.1 Accessing elements using their indices

To extract elements of a vector we can give their corresponding index, starting from one:

x[1]
[1] 5.4
x[4]
[1] 4.8

The square brackets operator is a function. For vectors, it means “get me the nth element”.
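
To make the "operator is a function" point concrete, here is a small sketch that calls the bracket operator by its backtick-quoted name. The variable names via_brackets and via_function are illustrative only and not part of the lesson.

```r
# Toy vector from the text
x <- c(5.4, 6.2, 7.1, 4.8, 7.5)

# Ordinary bracket syntax
via_brackets <- x[4]

# The same request, calling `[` like any other function:
# first argument is the vector, second is the index
via_function <- `[`(x, 4)

via_function
# [1] 4.8

identical(via_brackets, via_function)
# [1] TRUE
```

In everyday code the bracket form is the one to use; the function form is shown only to illustrate what happens behind the syntax.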

We can ask for multiple elements at once by providing a vector of indices:

x[c(1, 3)]
[1] 5.4 7.1

Or “slices” of the vector using a sequential integer vector index:

x[1:4]
[1] 5.4 6.2 7.1 4.8

Recall that the : operator creates a sequence of numbers from the left element to the right.

1:4
[1] 1 2 3 4
c(1, 2, 3, 4)
[1] 1 2 3 4

We can ask for the same element multiple times:

x[c(1, 1, 3)]
[1] 5.4 5.4 7.1

If we ask for an index beyond the length of the vector, R will return a missing value:

x[6]
[1] NA

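This is a vector of length one containing an NA. Because x has no names, the result has no name either; had x been a named vector, the name of this element would also be NA.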
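
For comparison, the following sketch requests an out-of-range index from a named vector. The vector named_x is a hypothetical example made up for this illustration.

```r
# A small named vector (hypothetical, for illustration only)
named_x <- c(a = 5.4, b = 6.2, c = 7.1)

# Index 4 is beyond the end of the vector
out_of_range <- named_x[4]

length(out_of_range)
# [1] 1

is.na(names(out_of_range))   # the name of the missing element is NA too
# [1] TRUE

# An unnamed vector yields an NA without any names at all
unnamed_x <- c(5.4, 6.2, 7.1)
is.null(names(unnamed_x[4]))
# [1] TRUE
```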

If we ask for the 0th element, we get an empty vector:

x[0]
numeric(0)
Vector numbering in R starts at 1

In many programming languages (C and Python, for example), the first element of a vector has an index of 0. In R, the first element has an index of 1.
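
One consequence of counting from 1 is that the index of the last element equals the length of the vector. A minimal sketch, using the toy vector from above (first_value and last_value are illustrative names):

```r
x <- c(5.4, 6.2, 7.1, 4.8, 7.5)

first_value <- x[1]          # first element sits at index 1, not 0
last_value  <- x[length(x)]  # last element sits at index length(x)

first_value
# [1] 5.4
last_value
# [1] 7.5
```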

5.1.2 Skipping and removing elements

If we use a negative number as the index of a vector, R will return every element except for the one specified:

x[-2]
[1] 5.4 7.1 4.8 7.5

We can skip multiple elements:

x[c(-1, -5)]  # or x[-c(1,5)]
[1] 6.2 7.1 4.8
Order of operations

A common trip-up for novices occurs when trying to skip slices of a vector. It’s natural to try to negate a sequence like so:

x[-1:3]

This gives a somewhat cryptic error:

Error in x[-1:3]: only 0's may be mixed with negative subscripts

But remember the order of operations. : is really a function, and the unary minus is applied before it. So : takes -1 as its first argument and 3 as its second, and generates the sequence c(-1, 0, 1, 2, 3). The correct solution is to wrap the sequence in parentheses, so that the - operator applies to the result:

x[-(1:3)]
[1] 4.8 7.5
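
The difference is easiest to see by evaluating the two index expressions on their own, outside the square brackets. In this sketch, wrong_index and right_index are illustrative names:

```r
# Parsed as (-1):3, so the sequence mixes negative and positive numbers
wrong_index <- -1:3
wrong_index
# [1] -1  0  1  2  3

# The parentheses build 1:3 first; the minus then negates every element
right_index <- -(1:3)
right_index
# [1] -1 -2 -3
```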

To remove elements from a vector, we need to re-assign the variable to our result:

x <- x[-4]
x
[1] 5.4 6.2 7.1 7.5
Challenge 1

Given the following vector:

x <- c(5.4, 6.2, 7.1, 4.8, 7.5)
x
[1] 5.4 6.2 7.1 4.8 7.5

Come up with at least 2 different commands that will produce the following output:

[1] 6.2 7.1 4.8

After you find 2 different commands, compare notes with your neighbor. Did you have different strategies?

x[2:4]
[1] 6.2 7.1 4.8
x[-c(1, 5)]
[1] 6.2 7.1 4.8
x[c(2, 3, 4)]
[1] 6.2 7.1 4.8
Challenge 2

Start by making a vector with the numbers 5 through 26. Then:

  • Print out the first three entries of the vector

  • Print out the fourth entry of the vector

  • Multiply the vector by 2.

x <- 5:26
head(x, 3)
[1] 5 6 7
x[4]
[1] 8
x * 2
 [1] 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52

5.2 Names

Names let us give meaning to the elements of a vector. For the first time, the object carries not only the data but also information that describes it: metadata that is attached to the object like a label. In R, such metadata is called an attribute. Some attributes let us do more with an object; names, for example, let us access an element by a name we define ourselves.

5.2.1 Accessing vectors by name

Each element of a vector can be given a name:

pizza_price <- c(pizzasubito = 5.64, pizzafresh = 6.60, callapizza = 4.50)

To retrieve a specific named entry from a vector, we can use the square bracket notation:

pizza_price["pizzasubito"]
pizzasubito 
       5.64 

which is equivalent to extracting the first entry of the vector:

pizza_price[1]
pizzasubito 
       5.64 

If you want to extract just the names of an object, use the names() function:

names(pizza_price)
[1] "pizzasubito" "pizzafresh"  "callapizza" 
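
Since names are stored as an attribute, they can also be inspected with the base R functions attributes() and attr(). The sketch below repeats the pizza_price definition so that it runs on its own:

```r
pizza_price <- c(pizzasubito = 5.64, pizzafresh = 6.60, callapizza = 4.50)

# All attributes of the object, returned as a list;
# for this vector the only attribute is "names"
names(attributes(pizza_price))
# [1] "names"

# names() is a convenient shortcut for reading the "names" attribute
identical(attr(pizza_price, "names"), names(pizza_price))
# [1] TRUE
```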

We have seen how to access and change single elements of a vector. The same is possible for names:

names(pizza_price)[3]
[1] "callapizza"
names(pizza_price)[3] <- "call-a-pizza"
pizza_price
 pizzasubito   pizzafresh call-a-pizza 
        5.64         6.60         4.50 
Challenge 3

Define the following vector, y, and extract the “a” and “c” entries:

y
  a   b   c   d   e 
5.4 6.2 7.1 4.8 7.5 
y <- c(a = 5.4, b = 6.2, c = 7.1, d = 4.8, e = 7.5) # we can name a vector 'on the fly'
y[c("a", "c")]
  a   c 
5.4 7.1 

This is usually a much more reliable way to subset objects: the position of various elements can often change when chaining together subsetting operations, but the names will always remain the same!
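
A short sketch of why positions are fragile: once an element has been dropped, the same numeric index points at a different value, while the name still finds the intended one. The names y_trimmed, by_position and by_name are illustrative only.

```r
y <- c(a = 5.4, b = 6.2, c = 7.1, d = 4.8, e = 7.5)

# Drop the first element; everything after it shifts one position left
y_trimmed <- y[-1]

by_position <- y_trimmed[3]    # position 3 now holds "d", not "c"
by_name     <- y_trimmed["c"]  # the name still selects "c"

names(by_position)
# [1] "d"
names(by_name)
# [1] "c"
```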

Challenge 4

What is the data type of the names of pizza_price? You can find out using the str() or class() functions.

You get the names of an object by wrapping the object name inside names(...). Similarly, you get the data type of the names by again wrapping the whole code in class(...):

class(names(pizza_price))
[1] "character"

Alternatively, use a new variable if this is easier for you to read (a variable name other than names avoids confusion with the names() function):

pizza_names <- names(pizza_price)
class(pizza_names)
[1] "character"
Challenge 5

Instead of just changing the names of each element of a vector individually, you can also set all names of an object by writing code like (replace ALL CAPS text):

names( OBJECT ) <-  CHARACTER_VECTOR

Create a vector that gives the number for each letter in the alphabet!

  1. Generate a vector called letter_no with the sequence of numbers from 1 to 26
  2. R has a built-in object called LETTERS (type LETTERS in the console). It is a character vector of length 26 containing the uppercase letters from A to Z. Set the names of letter_no to these 26 letters
  3. Test yourself by calling letter_no["B"], which should give you the number 2!
letter_no <- 1:26   # or seq(1,26)
names(letter_no) <- LETTERS
letter_no["B"]
B 
2 

5.2.2 Removing named elements

Removing named elements is a little harder. If we try to remove one named element by negating the string, R complains (slightly obscurely) that it doesn’t know how to take the negative of a string:

x <- c(a = 5.4, b = 6.2, c = 7.1, d = 4.8, e = 7.5) # we start again by naming a vector 'on the fly'
x[-"a"]
Error in -"a": invalid argument to unary operator
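
One possible workaround that uses only the tools from this section is to build the vector of names we want to keep and subset by name. This is a sketch rather than the only approach; it relies on the base R function setdiff(), and keep_names and without_a are illustrative names.

```r
x <- c(a = 5.4, b = 6.2, c = 7.1, d = 4.8, e = 7.5)

# All names except "a"
keep_names <- setdiff(names(x), "a")
keep_names
# [1] "b" "c" "d" "e"

# Subsetting by the remaining names drops the element called "a"
without_a <- x[keep_names]
names(without_a)
# [1] "b" "c" "d" "e"
```

Note that this assumes the names are unique; with duplicated names, name-based subsetting returns only the first match for each name.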

We will discuss subsetting further in the next lesson.