Before exploring the different types of data structures in the R programming language, refer to the Data Types tutorial to refresh your knowledge or become familiar with the concept.
A data structure is a particular way of organizing data in any programming language. Observe your room, and you will see many different objects. You might have several books arranged on a bookshelf - the computer would treat them as objects of type “books.” Your closet is full of clothes - the computer would treat them as objects of type “clothes.” If you place some of the clothes among the books on the bookshelf and drop a couple of books into the closet, you might find this disarray confusing and look for a better way to organize things in your room. Similarly, a programming language needs a structure to understand, collect, and store data.
R has five basic data structures:
- Vector
- List
- Matrix
- Array
- Data Frame
Vector
In R, a vector is the most basic data structure, and all of its elements have the same data type. If the elements supplied are of different data types, R automatically coerces them to the most general type among them (see the examples with the y1 and y2 collections, where we used the function c(), in the Data Types tutorial).
s <- 5:9 # creates a variable s that holds values from 5 to 9
s # prints elements of the vector s
Output: [1] 5 6 7 8 9
is.vector(s) # returns TRUE if s is a vector and FALSE if s is not a vector
Output: [1] TRUE
length(s) # returns the length of the vector s, that is, the number of its elements
Output: [1] 5
str(s) # this function returns structure of the argument
Output: int [1:5] 5 6 7 8 9
Another way to create a vector is to use the function c().
v1 <- c(1, 2, 3, 4) # creates vector v1 that holds numeric values 1, 2, 3, and 4
v1 # prints elements of v1
Output: [1] 1 2 3 4
is.vector(v1) # returns TRUE if v1 is a vector and FALSE if v1 is not a vector
Output: [1] TRUE
v2 <- c("A picture is", "worth a", "1000", "words.") # creates vector v2 with 4 character strings
v2 # prints elements of v2
Output: [1] "A picture is" "worth a" "1000" "words."
is.vector(v2) # checks if v2 is a vector
Output: [1] TRUE
is.numeric(v2) # checks if v2 is numeric
Output: [1] FALSE
Let’s create another vector that holds the Boolean values TRUE and FALSE, and check the data type of this vector too.
v3 <- c(FALSE, TRUE, FALSE, TRUE) # creates vector v3 that holds Boolean values
v3 # prints elements of v3
Output: [1] FALSE TRUE FALSE TRUE
is.vector(v3) # returns TRUE if v3 is a vector
Output: [1] TRUE
typeof(v3) # returns the type of the vector v3
Output: [1] "logical"
v3[4] # prints the value of the 4th element of vector v3
Output: [1] TRUE
Observe how R coerces the whole vector to double when we replace the last logical value in v3 with a number: FALSE becomes 0 and TRUE becomes 1.
v4 <- c(FALSE, TRUE, FALSE, 5) # creates vector v4 with different data types
v4 # prints elements of the vector v4
Output: [1] 0 1 0 5
is.vector(v4) # checks if v4 is a vector
Output: [1] TRUE
typeof(v4) # prints the type of vector v4
Output: [1] "double"
v4[4] # prints value of the 4th element of vector v4
Output: [1] 5
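The coercion seen in v4 follows a fixed order: logical, then integer, then double, then character. The short sketch below illustrates it with throwaway vectors; the names mix_int, mix_dbl, mix_chr, and flags are ours and do not appear elsewhere in this tutorial.

```r
# When c() receives mixed types, R keeps the most general one:
# logical -> integer -> double -> character
mix_int <- c(TRUE, 2L)                 # logical + integer
mix_dbl <- c(TRUE, 2L, 3.5)            # ... + double
mix_chr <- c(TRUE, 2L, 3.5, "four")    # ... + character

typeof(mix_int)  # "integer"
typeof(mix_dbl)  # "double"
typeof(mix_chr)  # "character"

# Because TRUE is coerced to 1 and FALSE to 0,
# sum() on a logical vector counts the TRUE values
flags <- c(FALSE, TRUE, FALSE, TRUE)
sum(flags)       # 2
```

A practical consequence is that a single stray character value in a column of numbers turns the whole vector into text, so it is worth checking typeof() after building a vector from mixed input.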
List
A list is also a collection of elements, but unlike a vector, its elements can be of different data types, and each element can itself be a vector, another list, or any other R object.
Pay attention to the double square brackets in the output below: [[1]] marks the first element of the list, while the [1] on the next line indexes the vector stored inside that element.
To create a list in R, we use the function list(). Recall from the example above that s is a vector with integer values from 5 to 9 inclusive. Examine the output of the code below, paying attention to the length of the list s_list. Since s_list has only one element, the vector s, its length should be equal to 1. We will add more elements to the list later and examine the changes in the outputs.
s_list <- list(s) # creates a list s_list
s_list # prints elements of list s_list
Output:
## [[1]]
## [1] 5 6 7 8 9
length(s_list) # returns the length of list s_list
Output: [1] 1
typeof(s) # returns the type of vector s
Output: [1] "integer"
typeof(s_list) # returns the type of list s_list
Output: [1] "list"
Let’s create a list that holds the integer vector s and the character vector v2 that we produced earlier in this tutorial. Again, pay attention to the list’s length, which is equal to 2 in this example.
s_list1 <- list(s, v2) # creates list s_list1 with two elements
s_list1 # prints out elements of list s_list1
Output:
## [[1]]
## [1] 5 6 7 8 9
##
## [[2]]
## [1] "A picture is" "worth a" "1000" "words."
length(s_list1) # returns the length of list s_list1
Output: [1] 2
typeof(s_list1) # returns the type of list s_list1
Output: [1] "list"
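The difference between single and double square brackets matters when we read elements back out of a list. The sketch below rebuilds s_list1 so that it runs on its own; the helper names first_as_list, first_as_vector, and named (with its element names numbers and words) are illustrative and not part of the tutorial.

```r
s  <- 5:9
v2 <- c("A picture is", "worth a", "1000", "words.")
s_list1 <- list(s, v2)

first_as_list   <- s_list1[1]    # single brackets return a list of length 1
first_as_vector <- s_list1[[1]]  # double brackets return the element itself

typeof(first_as_list)    # "list"
typeof(first_as_vector)  # "integer"

# Chain the brackets to reach a value inside an element
s_list1[[2]][3]          # "1000"

# Elements can also be named and then reached with $
named <- list(numbers = s, words = v2)
named$words[1]           # "A picture is"
```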
Matrix
In R, a matrix is a two-dimensional vector with rows and columns, like a table. The elements of a matrix must all be of the same type. Examine the global variables in the top right panel of RStudio. You should see a character vector v2 that we can use to create a matrix m. If the v2 vector is not declared, run the code above again and examine the change in the global variables of your current session.
m <- matrix(v2, nrow = 2, byrow = TRUE) # creates matrix m
m # prints 2x2 matrix m
Output:
## [,1] [,2]
## [1,] "A picture is" "worth a"
## [2,] "1000" "words."
typeof(m) # returns the type of matrix m
Output: [1] "character"
str(m) # returns structure of matrix m
Output: chr [1:2, 1:2] "A picture is" "1000" "worth a" "words."
While you are in the RStudio environment, go to Help and type matrix() into the search box. Scroll down to find base::matrix to explore and learn more about this function.
Let’s create a numeric matrix with 10 rows and 10 columns that holds all the integers from 1 to 100.
m1 <- matrix(1:100, nrow = 10) # creates matrix m1
m1 # prints 10x10 matrix m1
Output:
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 1 11 21 31 41 51 61 71 81 91
## [2,] 2 12 22 32 42 52 62 72 82 92
## [3,] 3 13 23 33 43 53 63 73 83 93
## [4,] 4 14 24 34 44 54 64 74 84 94
## [5,] 5 15 25 35 45 55 65 75 85 95
## [6,] 6 16 26 36 46 56 66 76 86 96
## [7,] 7 17 27 37 47 57 67 77 87 97
## [8,] 8 18 28 38 48 58 68 78 88 98
## [9,] 9 19 29 39 49 59 69 79 89 99
## [10,] 10 20 30 40 50 60 70 80 90 100
typeof(m1) # returns the type of matrix m1
Output: [1] "integer"
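The two matrices above were filled in different directions: m used byrow = TRUE, while m1 relied on the default column-by-column fill. The sketch below contrasts the two on a small example and shows how to read a single cell. The names by_col and by_row are ours, chosen only for this illustration.

```r
by_col <- matrix(1:6, nrow = 2)                # default: fill column by column
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)  # fill row by row

by_col[1, ]  # first row: 1 3 5
by_row[1, ]  # first row: 1 2 3

m1 <- matrix(1:100, nrow = 10)
dim(m1)      # 10 10 (rows, columns)
m1[2, 3]     # row 2, column 3: 22
```

Leaving one index empty, as in by_col[1, ], selects a whole row (or a whole column when the row index is left empty).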
Array
An array is another object in R that can store data in several tables with the same number of rows and columns. Think of it as a collection of matrices or tables of equal size. To create an array in R, we use the array() function, the details of which you should explore in the Help tab on the lower right panel of RStudio.
vector1 <- 1:10 # creates vector vector1 with numerical values from 1 to 10
vector2 <- c("A", "B", "C", "D") # creates vector vector2 with characters A, B, C, and D
a <- array(c(vector1, vector2),
           dim = c(4, 3, 2),
           dimnames = list(c("row1", "row2", "row3", "row4"),
                           c("column1", "column2", "column3"),
                           c("array1", "array2"))
)
We created array a from the elements of the two vectors and arranged them in 2 tables (or matrices). Each table in the array has 4 rows and 3 columns. We specified the size of the tables and of the array with the dim argument when creating a. Because the combined data mixes integers and characters, R coerces every element to character.
Let’s print out the array a and see its structure.
a # prints elements of the array a
Output:
## , , array1
##
## column1 column2 column3
## row1 "1" "5" "9"
## row2 "2" "6" "10"
## row3 "3" "7" "A"
## row4 "4" "8" "B"
##
## , , array2
##
## column1 column2 column3
## row1 "C" "3" "7"
## row2 "D" "4" "8"
## row3 "1" "5" "9"
## row4 "2" "6" "10"
Notice how R recycles the supplied values: the two vectors provide only 14 elements, so once they run out R starts again from the first one until all 24 cells (4 x 3 x 2) are filled.
We can directly access an element in the array by specifying the row, the column, and the table, as illustrated in the code below.
a["row2", "column1", "array1"]
Output: [1] "2"
a["row1", "column1", "array2"]
Output: [1] "C"
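The recycling behaviour can be checked by comparing the number of supplied values with the number of cells. The sketch below rebuilds the array without dimnames to keep it short, so elements are addressed by position; the name values is ours.

```r
vector1 <- 1:10
vector2 <- c("A", "B", "C", "D")
values  <- c(vector1, vector2)   # combined vector, coerced to character

a <- array(values, dim = c(4, 3, 2))

length(values)  # 14 values supplied
length(a)       # 24 cells to fill, so the values are recycled
typeof(a)       # "character"

a[3, 1, 2]      # row 3, column 1, table 2: "1" (the first recycled value)
a[, , 2]        # the whole second table as a 4 x 3 matrix
```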
Data Frame
A data frame is a particular type of list where every vector is the same length. Think of a data frame as a table with columns of the same size. You might recall from the example above that a matrix is also a table and wonder what the difference is between matrices and data frames.
As mentioned above, a matrix is a table that must hold data of a single data type. A data frame is a table whose columns can hold data of different types.
Consider working with data that contains information about employees, including their names and annual pay for the last three years.
Presenting this data as a data frame lets us collect the employees’ names and the annual pay for each year in a single table with four columns. A matrix, in contrast, could hold only the pay amounts as a table with three columns, or only the names as a table with one column.
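The employee scenario can be sketched as follows. The names, pay figures, and column names are invented purely for illustration.

```r
# Hypothetical data: three employees and their pay for three years
employees <- data.frame(
  name     = c("Ana", "Ben", "Cy"),
  pay_2021 = c(52000, 61000, 58000),
  pay_2022 = c(54000, 63000, 60000),
  pay_2023 = c(56500, 66000, 61500)
)

ncol(employees)       # 4 columns: one character, three numeric

# A matrix of the pay columns alone stays numeric ...
pay_matrix <- as.matrix(employees[, -1])
typeof(pay_matrix)    # "double"

# ... but forcing names and pay into one matrix turns everything into text
all_matrix <- as.matrix(employees)
typeof(all_matrix)    # "character"
```

The last two lines show the trade-off in practice: once the names are included, a matrix can no longer treat the pay values as numbers, while the data frame keeps each column in its own type.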
Let’s jump in and create a data frame df with columns of different types, as illustrated below, using the function data.frame().
numeric_vector <- c(1, 2, 3) # numeric column
character_vector <- c("a", "b", "c") # character column
logical_vector <- c(TRUE, TRUE, FALSE) # logical column
mixed_vector <- c(numeric_vector[1], character_vector[1], logical_vector[1]) # mixed types are coerced to character
df <- data.frame(numeric_vector, character_vector, logical_vector, mixed_vector) # each vector becomes a column
df
Output:
## numeric_vector character_vector logical_vector mixed_vector
## 1 1 a TRUE 1
## 2 2 b TRUE a
## 3 3 c FALSE TRUE
Examine the data frame using useful R functions:
- nrow()
- ncol()
- summary()
- str()
nrow(df) # returns the number of rows in the data frame df
Output: [1] 3
ncol(df) # returns the number of columns in the data frame df
Output: [1] 4
summary(df) # returns the summary of each variable in the data frame df
Output:
## numeric_vector character_vector logical_vector mixed_vector
## Min. :1.0 Length:3 Mode :logical Length:3
## 1st Qu.:1.5 Class :character FALSE:1 Class :character
## Median :2.0 Mode :character TRUE :2 Mode :character
## Mean :2.0
## 3rd Qu.:2.5
## Max. :3.0
str(df) # returns the structure of the data frame df
Output:
## 'data.frame': 3 obs. of 4 variables:
## $ numeric_vector : num 1 2 3
## $ character_vector: chr "a" "b" "c"
## $ logical_vector : logi TRUE TRUE FALSE
## $ mixed_vector : chr "1" "a" "TRUE"
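Once a data frame exists, individual columns, cells, and rows can be pulled out of it. The sketch below rebuilds a three-column version of df so that it runs on its own; the name true_rows is ours.

```r
numeric_vector   <- c(1, 2, 3)
character_vector <- c("a", "b", "c")
logical_vector   <- c(TRUE, TRUE, FALSE)
df <- data.frame(numeric_vector, character_vector, logical_vector)

df$numeric_vector          # a column as a vector: 1 2 3
df[["character_vector"]]   # the same idea using double brackets
df[2, "logical_vector"]    # a single cell (row 2): TRUE

# Keep only the rows where logical_vector is TRUE
true_rows <- df[df$logical_vector, ]
nrow(true_rows)            # 2

mean(df$numeric_vector)    # 2
```

Because a data frame is a list of equal-length vectors, the $ and [[ ]] operators work on it exactly as they do on a list.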
Next Steps
Thanks for reviewing Academic Support's first R guide series! We hope you will save these guides to help you as you move through your schoolwork.