Data structures

Learning Objective

  • to become familiar with four different data structures:
    • vectors
    • matrices
    • lists
    • dictionaries



Table of Contents



Introduction

Now that we have our data types - numbers, logicals and character strings - what sort of data structures can we create? In spoken language we make sentences, paragraphs and essays. In programming, we have vectors, matrices, lists and dictionaries.



Vectors or tuples

Vectors or tuples (the terminology changes depending on the programming language that you use) are ordered groups of variables of a single data type. Ordered doesn't necessarily mean in ascending or descending order, just that the order matters. The vector of numbers 1, 2, 3 is different to the vector of numbers 2, 3, 1. You can think of a vector as a queue of data.

The elements of a vector can be referred to by their position in the vector. For the vector of numbers 2, 3, 1: the first element of the vector is the number 2, the second element is the number 3 and the third element is the number 1.


Challenge 1

For the vector containing the character strings "I", "love", "programming", "in", "R", what is the value of the:

A. first element?

B. second element?

C. fifth element?


Tip: Don't refer to positions in a vector that don't exist!

When referring to elements of a vector, it is important to make sure that the element actually exists. If you have a vector of 5 elements and try to refer to the sixth element, your computer is likely to give you some unexpected behaviour.


Here is an example a vector written in R. The vector is named my_first_vector and contains the character strings: "learning", "to", "code", "is", "fun".

my_first_vector <- c("learning", "to", "code", "is", "fun")

my_first_vector
[1] "learning" "to"       "code"     "is"       "fun"



Matrices or arrays

You might have noticed that vectors are one-dimensional. What if we have a table of values with rows and columns? We can store two-dimensional data using a matrix, data frame or array. Again the terminology changes depending on the programming language that you are using.

Like vectors, a matrix is an ordered collection of a single data type (number, logical or character string). You can think of matrices like grids. Elements of a matrix can be referred to by their row and column number.


Challenge 2

     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12

For the above 3 x 4 matrix (3 rows x 4 columns), name the element of the matrix in the following positions:

A. Row 1, Column 1

B. Row 3, Column 2

C. Row 2, Column 4

D. Row 4, Column 3


Here is an example of a matrix written in R. The matrix is named PlatoonLeads. In ResBaz we are broken up into three different platoons, each with its own lead. The PlatoonLeads matrix contains 2 columns (platoon and name) and 4 rows, each representing a different platoon.

PlatoonLeads <- matrix(
  c("Data Wranglers", 
    "Data Miners",  
    "Cadventurers", 
    "Kerry Halupka", 
    "Kim Doyle",  
    "Louise van der Werff"), 
  ncol=2,
  dimnames=list(NULL,c("platoon","name"))) 

PlatoonLeads
     platoon          name                  
[1,] "Data Wranglers" "Kerry Halupka"       
[2,] "Data Miners"    "Kim Doyle"           
[3,] "Cadventurers"   "Louise van der Werff"



Lists, structures or data frames

So far, the data structures that we have seen have only been able to store data of a single data type (number, logical or character string). What if we want to store multiple data types in the one data structure? In fact, what if we want to store different numbers of the different data types, e.g. 2 numbers, 1 logical and 5 character strings. This is where lists or structures come into the picture. Each element of a list can contain either a single variable, a vector, a matrix or another list.


Tip: Data frames are a special type of list

Data frames are a special type of list that is unique to R. Data frames act like matrices in that they are broken up into rows and columns, but each column can hold a different data type.


Challenge 3

[[1]]
[1] 1

[[2]]
 [1]  1  2  3  4  5  6  7  8  9 10

[[3]]
[1] "third"   "element"

[[4]]
      [,1]  [,2]
[1,]  TRUE  TRUE
[2,] FALSE FALSE

What data type and data structure is each element of the above list?


Here is an example of a shopping list written in R. The list is broken down according to meal and also includes how much money is currently in my wallet. The variable is named shoppingList.

shoppingList <- list(
  breakfast     = c("cereal", "milk", "orange juice", "banana"),
  lunch         = c("bread", "cheese", "tomato"),
  dinner        = c("frozen pizza", "chocolate mousse"),
  moneyInWallet = 40.50
)

shoppingList
$breakfast
[1] "cereal"       "milk"         "orange juice" "banana"      

$lunch
[1] "bread"  "cheese" "tomato"

$dinner
[1] "frozen pizza"     "chocolate mousse"

$moneyInWallet
[1] 40.5



Dictionaries

The final data structure we will look at is dictionaries. Dictionaries are unique to the Python programming language and contain pairs of variables: values and keys. When we created vectors, we referred to each element by its position number in the vector. In contrast, dictionaries are unordered and you refer to elements by their key. Much like opening a dictionary, where the word is the key and the definition is the value.


Tip: Keys must be unique values

It’s important that each key is unique. It would be difficult to look up the meaning of a word in the dictionary if the word were repeated multiple times with completely different meanings each time!


The keys must all be the same data types and the values must be the same data type, but the data type of the keys can be different to the data type of the values. E.g. The keys may be numbers and the values vectors or tuples.


Challenge 4

Think of an example of a dictionary. Remember that the keys must be all the same data type and the values must be of the same data type, but the data type of the keys and values can be different.


Here is an example of a dictionary written in Python. The dictionary is named IMDBRatings. The keys are character strings: "Game of Thrones", "Sherlock", "Firefly" and "Friends" and the values are numeric values that represent the average rating of those tv shows on IMDB

IMDBRatings = {"Game of Thrones": 9.4, 
    "Sherlock": 9.2, 
    "Firefly": 9.1, 
    "Friends": 8.9}

for movie, rating in IMDBRatings.items():
    print movie + ': ' + str(rating)
Firefly: 9.1
Friends: 8.9
Sherlock: 9.2
Game of Thrones: 9.4



Challenge solutions

Solution to Challenge 1

A. "I"

B. "love"

C. "R"

Solution to Challenge 2

A. 1

B. 10

C. 8

D. The element at Row number 4 and Column number 3 doesn't exist, since there are only 3 rows in the matrix!

Solution to Challenge 3

Element 1: a single numeric variable

Element 2: a numeric vector

Element 3: a character string vector

Element 4: a logical matrix

results matching ""

    No results matching ""