Academic Resource Center

Data Structures in R


Before exploring the different types of data structures in the R programming language, refer to the Data Types tutorial to refresh your knowledge or get familiar with the concept.

A data structure is a particular way of organizing data in any programming language. Observe your room, and you will see many different objects. You might have several books arranged on a bookshelf - the computer would treat them as objects of type “books.” Your closet is full of clothes - the computer would treat them as objects of type “clothes.” If you place some of the clothes among the books on the bookshelf and drop a couple of books into the closet, you might find this disarray confusing and look for a better way to organize things in your room. Similarly, a programming language needs a structure to understand, collect, and store data.

R has five basic data structures:

  • Vector
  • List
  • Matrix
  • Array
  • Data Frame

Vector

In R, a vector is the most basic data structure, and all of its elements have the same data type. If the elements supplied are of different data types, R automatically converts (coerces) them to a single common type that can represent all of them (see the examples with the y1 and y2 collections where we used function c() in the Data Types tutorial).

s <- 5:9       # creates a variable s that holds values from 5 to 9

s        # prints elements of the vector s

Output: [1] 5 6 7 8 9

is.vector(s)       # returns TRUE if s is a vector and FALSE if s is not a vector

Output: [1] TRUE

length(s)        # returns the length of vector s, that is, the number of its elements

Output: [1] 5

str(s)       # this function returns structure of the argument

Output: int [1:5] 5 6 7 8 9

Another way to create a vector is to use function c().

v1 <- c(1, 2, 3, 4)       # creates vector v1 that holds numeric values 1, 2, 3, and 4

v1       # prints elements of v1

Output: [1] 1 2 3 4

is.vector(v1)        # returns TRUE if v1 is a vector and FALSE if v1 is not a vector

Output: [1] TRUE

v2 <- c("A picture is", "worth a", "1000", "words.")       # creates vector v2 with 4 character strings

v2        # prints elements of v2

Output: [1] "A picture is" "worth a" "1000" "words."

is.vector(v2)        # checks if v2 is a vector

Output: [1] TRUE

is.numeric(v2)       # checks if v2 is numeric

Output: [1] FALSE

Let’s create another vector that holds logical (Boolean) values, TRUE and FALSE, and check the data type of this vector too.

v3 <- c(FALSE, TRUE, FALSE, TRUE)        # creates vector v3 that holds Boolean values

v3       # prints elements of v3

Output: [1] FALSE TRUE FALSE TRUE

is.vector(v3)        # returns TRUE if v3 is a vector

Output: [1] TRUE

typeof(v3)        # returns the type of the vector v3

Output: [1] "logical"

v3[4]       # prints the value of the 4th element of vector v3

Output: [1] TRUE

Observe how R converts the whole vector v4 to type double when we replace the last logical value of v3 with a number: FALSE becomes 0 and TRUE becomes 1.

v4 <- c(FALSE, TRUE, FALSE, 5)       # creates vector v4 with different data types

v4       # prints elements of the vector v4

Output: [1] 0 1 0 5

is.vector(v4)        # checks if v4 is a vector

Output: [1] TRUE

typeof(v4)       # prints the type of vector v4

Output: [1] "double"

v4[4]        # prints value of the 4th element of vector v4

Output: [1] 5
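The conversion we just saw with v4 follows a fixed order: logical values can become integers, integers can become doubles, and any of these can become character strings. The short sketch below illustrates that order. The names t1, t2, and t3 are used only for this illustration and do not appear elsewhere in the tutorial.

```r
# When c() receives mixed types, R coerces all elements to one common type.
t1 <- typeof(c(TRUE, 2L))     # logical combined with integer
t2 <- typeof(c(2L, 3.5))      # integer combined with double
t3 <- typeof(c(3.5, "a"))     # double combined with character

t1       # "integer"
t2       # "double"
t3       # "character"
```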

List

A list is also a collection of elements, but unlike the vectors above, its elements can be of different data types. An element of a list can itself be a whole vector or even another list.

Pay attention to the double square brackets in the output, such as [[1]]: they mark the position of each element within the list.

To create a list in R, we use function list(). Recall from the example above that s is a vector with numerical values from 5 to 9 inclusive. Examine the output of the code below, paying attention to the length of the list s_list. Since s_list has only one element, vector s, its length should be equal to 1. We will add more elements to the list later and examine the changes in the outputs.

s_list <- list(s)       # creates a list s_list

s_list       # prints elements of list s_list

Output:

## [[1]]

## [1] 5 6 7 8 9

length(s_list)       # returns the length of list s_list

Output: [1] 1

typeof(s)        # returns the type of vector s

Output: [1] "integer"

typeof(s_list)        # returns the type of list s_list

Output: [1] "list"

Let’s create a list that holds the integer vector s and the character vector v2 that we produced earlier in this tutorial. Again, pay attention to the list’s length, which equals 2 in this example.

s_list1 <- list(s, v2)        # creates list s_list1 with two elements

s_list1       # prints out elements of list s_list1

Output:

## [[1]]

## [1] 5 6 7 8 9

##

## [[2]]

## [1] "A picture is" "worth a" "1000" "words."

length(s_list1)       # returns the length of list s_list1

Output: [1] 2

typeof(s_list1)        # returns the type of list s_list1

Output: [1] "list"
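The double square brackets shown in the printed list can also be used to pull elements out of it. The sketch below rebuilds s_list1 and contrasts single and double brackets. The names first_as_list, first_as_vector, and third_string are introduced only for this illustration.

```r
# Rebuild the objects from the examples above so the block runs on its own.
s <- 5:9
v2 <- c("A picture is", "worth a", "1000", "words.")
s_list1 <- list(s, v2)

first_as_list <- s_list1[1]        # single brackets return a list of length 1
first_as_vector <- s_list1[[1]]    # double brackets return the element itself

typeof(first_as_list)       # "list"
typeof(first_as_vector)     # "integer"

third_string <- s_list1[[2]][3]    # 3rd value of the 2nd element
third_string                # "1000"
```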

Matrix

In R, a matrix is a two-dimensional vector with rows and columns, like a table. The elements of a matrix must all be of the same type. Examine the global variables in the top right panel of RStudio. You should see a character vector v2 that we can use to create a matrix m. If the v2 vector is not defined, run the code above again and examine the change in the global variables of your current session.

m <- matrix(v2, nrow = 2, byrow = TRUE)       # creates matrix m

m        # prints 2x2 matrix m

Output:

## [,1] [,2]

## [1,] "A picture is" "worth a"

## [2,] "1000" "words."

typeof(m)       # returns the type of matrix m

Output: [1] "character"

str(m)       # returns structure of matrix m

Output: chr [1:2, 1:2] "A picture is" "1000" "worth a" "words."

While you are in the RStudio environment, go to Help and type matrix() into the search box. Scroll down to find base::matrix to explore and learn more about this function.
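One argument worth trying from that help page is byrow. The sketch below builds the same matrix twice, once filled row by row (as above) and once with the default column-by-column fill. The names m_by_row and m_by_col are used only for this illustration.

```r
v2 <- c("A picture is", "worth a", "1000", "words.")

m_by_row <- matrix(v2, nrow = 2, byrow = TRUE)   # fills row 1 first, then row 2
m_by_col <- matrix(v2, nrow = 2)                 # default: fills column 1 first

m_by_row[1, 2]       # "worth a"
m_by_col[1, 2]       # "1000"
dim(m_by_row)        # 2 2
```

Both matrices hold the same four strings; only their arrangement differs.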

Let’s create a numeric matrix with 10 rows and 10 columns that holds all the integers from 1 to 100.

m1 <- matrix(1:100, nrow=10)        # creates matrix m1

m1       # prints 10x10 matrix m1

Output:

## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]

## [1,] 1 11 21 31 41 51 61 71 81 91

## [2,] 2 12 22 32 42 52 62 72 82 92

## [3,] 3 13 23 33 43 53 63 73 83 93

## [4,] 4 14 24 34 44 54 64 74 84 94

## [5,] 5 15 25 35 45 55 65 75 85 95

## [6,] 6 16 26 36 46 56 66 76 86 96

## [7,] 7 17 27 37 47 57 67 77 87 97

## [8,] 8 18 28 38 48 58 68 78 88 98

## [9,] 9 19 29 39 49 59 69 79 89 99

## [10,] 10 20 30 40 50 60 70 80 90 100

typeof(m1)       # returns the type of matrix m1

Output: [1] "integer"
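To read values back out of a matrix, we give a row and a column inside single square brackets. The sketch below rebuilds m1 and looks up a few values that can be checked against the printed table above. The names first_col and last_row are used only for this illustration.

```r
m1 <- matrix(1:100, nrow = 10)

dim(m1)                  # 10 10 (rows, columns)
m1[2, 3]                 # 22, the value in row 2, column 3

first_col <- m1[, 1]     # leaving the row blank selects a whole column
last_row <- m1[10, ]     # leaving the column blank selects a whole row

first_col                # 1 2 3 4 5 6 7 8 9 10
last_row                 # 10 20 30 40 50 60 70 80 90 100
```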

Array

An array is another object in R that can store data in several tables with the same number of rows and columns. Think of it as a collection of matrices or tables of equal size. To create an array in R, we use the array() function, the details of which you should explore in the Help tab in the lower right panel of RStudio.

vector1 <- 1:10       # creates vector vector1 with numerical values from 1 to 10

vector2 <- c("A", "B", "C", "D")        # creates vector vector2 with characters A, B, C, and D

a <- array(c(vector1, vector2),

dim = c(4, 3, 2),

dimnames = list(c("row1", "row2", "row3", "row4"),

c("column1", "column2", "column3"),

c("array1", "array2"))

)

We created array a from the elements of the two vectors, arranged in 2 tables (or matrices), each with 4 rows and 3 columns. We specified these sizes with the dim argument when creating array a above. Because the combined vector mixes numbers and characters, every element is stored as a character string.

Let’s print out the array a and see its structure.

a       # prints elements of the array a

Output:

## , , array1

##

##    column1 column2 column3

## row1 "1" "5" "9"

## row2 "2" "6" "10"

## row3 "3" "7" "A"

## row4 "4" "8" "B"

##

## , , array2

##

##    column1 column2 column3

## row1 "C" "3" "7"

## row2 "D" "4" "8"

## row3 "1" "5" "9"

## row4 "2" "6" "10"

Notice how R fills the tables in order, column by column, and reuses the 14 supplied values from the beginning once they run out, until all 24 positions (4 x 3 x 2) of the array are populated.
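We can check this filling behavior by counting. The sketch below compares the number of values supplied with the number of positions in the array, then looks at the positions where the second table begins. The names elements and a_plain are used only for this illustration, and dimnames are left out to keep it short.

```r
vector1 <- 1:10
vector2 <- c("A", "B", "C", "D")
elements <- c(vector1, vector2)      # coerced to character by c()

length(elements)                     # 14 values supplied
prod(c(4, 3, 2))                     # 24 positions to fill

a_plain <- array(elements, dim = c(4, 3, 2))

typeof(a_plain)                      # "character"
a_plain[1, 1, 2]                     # "C", the 13th value supplied
a_plain[3, 1, 2]                     # "1", position 15 reuses the 1st value
```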

We can directly access an element in the array by specifying its row, column, and table, here by name, as illustrated in the code below.

a["row2", "column1", "array1"]

Output: [1] "2"

a["row1", "column1", "array2"]

Output: [1] "C"
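The same elements can also be reached by position instead of by name. The sketch below rebuilds array a and compares the two styles. The names by_name, by_position, and second_table are used only for this illustration.

```r
a <- array(c(1:10, c("A", "B", "C", "D")),
           dim = c(4, 3, 2),
           dimnames = list(c("row1", "row2", "row3", "row4"),
                           c("column1", "column2", "column3"),
                           c("array1", "array2")))

by_name <- a["row2", "column1", "array1"]
by_position <- a[2, 1, 1]            # row 2, column 1, table 1

identical(by_name, by_position)      # TRUE

second_table <- a[, , "array2"]      # blank row and column select a whole table
dim(second_table)                    # 4 3
```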

Data Frame

A data frame is a particular type of list where every vector is the same length. Think of a data frame as a table with columns of the same size. You might recall from the example above that a matrix is also a table and wonder what the difference is between matrices and data frames.

As mentioned above, a matrix is a table that must hold data of the same data type. A data frame is a table that can have data of different types.

Consider working with data that contains information about employees, including their names and annual pay for the last three years.

Presenting this data as a data frame allows us to collect the employees’ names and the annual pay for each year in one table with four columns. A matrix, in contrast, could hold the pay amounts as numbers only in a table with three columns, or just the names in a table with one column.
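A minimal sketch of this employee example is shown below. The names, the pay amounts, and the column names (name, pay_2021, pay_2022, pay_2023) are made up for illustration and are not taken from any real data set.

```r
# Hypothetical employee data: one character column and three numeric columns.
employees <- data.frame(
  name     = c("Ana", "Ben", "Cy"),
  pay_2021 = c(50000, 61000, 48000),
  pay_2022 = c(52000, 63000, 49000),
  pay_2023 = c(54000, 65000, 50000),
  stringsAsFactors = FALSE       # keep names as character strings
)

ncol(employees)                  # 4 columns of different types in one table

# A matrix can keep the pay numeric only if the names are left out.
pay_matrix <- as.matrix(employees[, c("pay_2021", "pay_2022", "pay_2023")])
typeof(pay_matrix)               # "double"

# Forcing all four columns into one matrix turns every value into text.
typeof(as.matrix(employees))     # "character"
```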

Let’s jump in and create a data frame df with elements of different types, as illustrated below, using the function data.frame().

numeric_vector <- c(1, 2, 3)

character_vector <- c("a", "b", "c")

logical_vector <- c(TRUE, TRUE, FALSE)

mixed_vector <- c(numeric_vector[1], character_vector[1], logical_vector[1])

df <- data.frame(numeric_vector, character_vector, logical_vector, mixed_vector)

df

Output:

## numeric_vector character_vector logical_vector mixed_vector

## 1        1        a        TRUE        1

## 2       2        b        TRUE        a

## 3       3        c        FALSE        TRUE

Examine the data frame using useful R functions:

  • nrow()
  • ncol()
  • summary()
  • str()

nrow(df)       # returns the number of rows in the data frame df

Output: [1] 3

ncol(df)        # returns the number of columns in the data frame df

Output: [1] 4

summary(df)        # returns the summary of each variable in the data frame df

Output:

## numeric_vector character_vector logical_vector mixed_vector

## Min. :1.0 Length:3 Mode :logical Length:3

## 1st Qu.:1.5 Class :character FALSE:1 Class :character

## Median :2.0 Mode :character TRUE :2 Mode :character

## Mean :2.0

## 3rd Qu.:2.5

## Max. :3.0

str(df)        # returns the structure of the data frame df

Output:

## 'data.frame': 3 obs. of 4 variables:

## $ numeric_vector : num 1 2 3

## $ character_vector: chr "a" "b" "c"

## $ logical_vector : logi TRUE TRUE FALSE

## $ mixed_vector : chr "1" "a" "TRUE"
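Because a data frame is a list of equal-length columns, its columns can be pulled out with $ or with double square brackets, and single cells can be reached by row and column. The sketch below rebuilds df to show this. It sets stringsAsFactors = FALSE explicitly, which is already the default since R 4.0, so that the character columns behave the same in older versions. The name avg is used only for this illustration.

```r
numeric_vector <- c(1, 2, 3)
character_vector <- c("a", "b", "c")
logical_vector <- c(TRUE, TRUE, FALSE)
mixed_vector <- c(numeric_vector[1], character_vector[1], logical_vector[1])

df <- data.frame(numeric_vector, character_vector, logical_vector, mixed_vector,
                 stringsAsFactors = FALSE)

is.list(df)                      # TRUE, a data frame is a special kind of list

df$numeric_vector                # 1 2 3, one column as a plain vector
avg <- mean(df$numeric_vector)
avg                              # 2

df[["mixed_vector"]]             # "1" "a" "TRUE"
df[2, "character_vector"]        # "b", row 2 of the named column
```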

Next Steps

Thanks for reviewing Academic Support’s first R guide series! We hope you will save these guides to help you as you move through your schoolwork. Want more R? Click below:

Getting Started in RStudio, Part One

Getting Started in RStudio, Part Two

Need More Help?

Click here to schedule a 1:1 with a tutor or coach, or to sign up for a workshop. *If this link does not bring you directly to our platform, please use our direct link to "Academic Support" from any Brightspace course at the top of the navigation bar.
