Data structures
Table of Contents
- Introduction
- Vectors or tuples
- Matrices or arrays
- Lists, structures or data frames
- Dictionaries
- Challenge solutions
Introduction
Now that we have our data types (numbers, logicals and character strings), what sort of data structures can we build from them? In spoken language, we combine words into sentences, paragraphs and essays. In programming, we combine values into vectors, matrices, lists and dictionaries.
Vectors or tuples
Vectors or tuples (the terminology changes depending on the programming language that you use) are ordered collections of values. In R, every element of a vector must have the same data type. Python tuples are more relaxed and can mix types, but in this lesson we will keep to one type at a time. Ordered doesn't necessarily mean in ascending or descending order, just that the order matters. The vector of numbers 1, 2, 3 is different to the vector of numbers 2, 3, 1. You can think of a vector as a queue of data.
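To see that order matters, here is a small Python sketch that compares two tuples holding the same numbers in a different order. The names `first` and `second` are chosen only for illustration.

```python
# Two tuples holding the same numbers in a different order
first = (1, 2, 3)
second = (2, 3, 1)

# Tuples are compared element by element, so the order matters
print(first == second)  # False

# Sorting both shows that they contain the same values
print(sorted(first) == sorted(second))  # True
```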
The elements of a vector can be referred to by their position in the vector. For the vector of numbers 2, 3, 1, the first element is the number 2, the second element is the number 3 and the third element is the number 1.
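Here is a minimal Python sketch of referring to elements by position, using the same vector 2, 3, 1. R counts positions from 1, so `numbers[1]` in R would give 2. Python counts from 0, so the first element is at index 0.

```python
numbers = (2, 3, 1)

# Python counts positions from 0, so the first element is at index 0
print(numbers[0])  # 2
print(numbers[1])  # 3
print(numbers[2])  # 1

# Negative positions count back from the end of the tuple
print(numbers[-1])  # 1

# The number of elements in the tuple
print(len(numbers))  # 3
```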
Here is an example of a vector written in R. The vector is named my_first_vector and contains the character strings "learning", "to", "code", "is" and "fun".
my_first_vector <- c("learning", "to", "code", "is", "fun")
my_first_vector
[1] "learning" "to" "code" "is" "fun"
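For comparison, here is a sketch of the same five strings stored as a Python tuple. Joining them in order rebuilds the sentence, and every element has the same data type (`str`).

```python
my_first_vector = ("learning", "to", "code", "is", "fun")

# Join the elements, in order, into one sentence
sentence = " ".join(my_first_vector)
print(sentence)  # learning to code is fun

# Every element shares a single data type
print(all(isinstance(word, str) for word in my_first_vector))  # True
```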
Matrices or arrays
You might have noticed that vectors are one-dimensional. What if we have a table of values with rows and columns? We can store two-dimensional data using a matrix, data frame or array. Again, the terminology changes depending on the programming language that you are using.
Like vectors, a matrix is an ordered collection of a single data type (number, logical or character string). You can think of matrices like grids. Elements of a matrix can be referred to by their row and column number.
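Python has no built-in matrix type. A common stand-in, used here purely for illustration, is a list of rows. Libraries such as NumPy provide true arrays, but this sketch sticks to built-ins. An element is picked out by its row number and then its column number.

```python
# A 2 x 3 grid of numbers stored as a list of rows
grid = [
    [1, 2, 3],
    [4, 5, 6],
]

# Element in the second row, third column (Python counts from 0)
print(grid[1][2])  # 6

# Every row has the same length, so the grid is rectangular
print(all(len(row) == len(grid[0]) for row in grid))  # True
```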
Here is an example of a matrix written in R. The matrix is named PlatoonLeads. In ResBaz we are broken up into three different platoons, each with its own lead. The PlatoonLeads matrix contains 2 columns (platoon and name) and 3 rows, one for each platoon.
PlatoonLeads <- matrix(
  c("Data Wranglers",        # platoon column
    "Data Miners",
    "Cadventurers",
    "Kerry Halupka",         # name column
    "Kim Doyle",
    "Louise van der Werff"),
  ncol = 2,                  # values fill column by column, giving 3 rows
  dimnames = list(NULL, c("platoon", "name")))
PlatoonLeads
platoon name
[1,] "Data Wranglers" "Kerry Halupka"
[2,] "Data Miners" "Kim Doyle"
[3,] "Cadventurers" "Louise van der Werff"
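R's `matrix()` fills its values column by column unless you pass `byrow = TRUE`. That is why the first three strings ended up in the platoon column. The sketch below reproduces that column-wise fill in Python. `platoon_leads` is a hypothetical list of rows, not an R object. The final lookup is the Python counterpart of `PlatoonLeads[2, "name"]` in R.

```python
# The six values passed to matrix(), in the order they were written
values = [
    "Data Wranglers", "Data Miners", "Cadventurers",
    "Kerry Halupka", "Kim Doyle", "Louise van der Werff",
]
ncol = 2
nrow = len(values) // ncol  # 6 values / 2 columns = 3 rows

# Fill column by column: element (r, c) comes from position c * nrow + r
platoon_leads = [[values[c * nrow + r] for c in range(ncol)] for r in range(nrow)]

for row in platoon_leads:
    print(row)
# ['Data Wranglers', 'Kerry Halupka']
# ['Data Miners', 'Kim Doyle']
# ['Cadventurers', 'Louise van der Werff']

# Second row, "name" column (both index 1 in Python's 0-based counting)
print(platoon_leads[1][1])  # Kim Doyle
```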
Lists, structures or data frames
So far, the data structures that we have seen can only store data of a single data type (number, logical or character string). What if we want to store multiple data types in the one data structure? In fact, what if we want to store different numbers of each data type, e.g. 2 numbers, 1 logical and 5 character strings? This is where lists or structures come into the picture. Each element of a list can contain a single value, a vector, a matrix or even another list. In R, a data frame is a special kind of list whose elements are all vectors of the same length. This lets it be displayed as a table in which each column can have its own data type.
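Python's built-in list can play the same role. Here is a small sketch of one list whose elements differ in both type and size: a single number, a single logical, a vector of strings and another list. The name `record` and its contents are made up for illustration.

```python
# A single list whose elements have different types and sizes
record = [
    40.50,                        # a single number
    True,                         # a single logical
    ["cereal", "milk", "bread"],  # a vector of character strings
    [1, ["nested", "list"]],      # another list, itself containing a list
]

# Each element keeps its own type
for item in record:
    print(type(item).__name__, item)
# float 40.5
# bool True
# list ['cereal', 'milk', 'bread']
# list [1, ['nested', 'list']]

# Reach inside: 4th element, then its 2nd element, then that element's 1st
print(record[3][1][0])  # nested
```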
Here is an example of a shopping list written in R. The list is broken down by meal and also records how much money is currently in my wallet. The variable is named shoppingList.
shoppingList <- list(
breakfast = c("cereal", "milk", "orange juice", "banana"),
lunch = c("bread", "cheese", "tomato"),
dinner = c("frozen pizza", "chocolate mousse"),
moneyInWallet = 40.50
)
shoppingList
$breakfast
[1] "cereal" "milk" "orange juice" "banana"
$lunch
[1] "bread" "cheese" "tomato"
$dinner
[1] "frozen pizza" "chocolate mousse"
$moneyInWallet
[1] 40.5
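In R you would pull out one meal with `shoppingList$lunch`, and the second lunch item with `shoppingList$lunch[2]`. Python has no direct equivalent of R's named list. The closest match is a dictionary, covered in the next section. This sketch mirrors shoppingList that way, with keys renamed to Python's snake_case style purely as a convention.

```python
shopping_list = {
    "breakfast": ["cereal", "milk", "orange juice", "banana"],
    "lunch": ["bread", "cheese", "tomato"],
    "dinner": ["frozen pizza", "chocolate mousse"],
    "money_in_wallet": 40.50,
}

# Pull out one element by name, then one item by position (from 0)
print(shopping_list["lunch"])     # ['bread', 'cheese', 'tomato']
print(shopping_list["lunch"][1])  # cheese

# Count the items across the three meals
n_items = sum(len(shopping_list[meal]) for meal in ("breakfast", "lunch", "dinner"))
print(n_items)  # 9
```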
Dictionaries
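The final data structure we will look at is the dictionary. "Dictionary" is the Python name for this structure; other languages offer the same idea under names such as maps, hash maps or associative arrays. A dictionary stores key-value pairs: each key is paired with a value. When we created vectors, we referred to each element by its position number in the vector. In contrast, you refer to the elements of a dictionary by their key. This is much like looking up a word in a printed dictionary, where the word is the key and its definition is the value. Older versions of Python kept dictionary entries in no particular order. Since Python 3.7, dictionaries remember the order in which keys were added, but you still look values up by key rather than by position.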
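Here is a minimal Python sketch of looking values up by key. It uses a made-up `glossary` dictionary in which each word is a key and its definition is the value. The last step shows that asking for a position such as 0 does not work, because 0 is not one of the keys.

```python
# Hypothetical mini glossary: each word (key) maps to its definition (value)
glossary = {
    "vector": "an ordered collection of values",
    "matrix": "a two-dimensional grid of values",
}

# Look a value up by its key, like finding a word in a printed dictionary
print(glossary["matrix"])  # a two-dimensional grid of values

# Check whether a key exists before using it
print("tuple" in glossary)  # False

# .get() returns a fallback instead of raising an error for a missing key
print(glossary.get("tuple", "not defined yet"))  # not defined yet

# Positions are not keys: asking for element 0 raises a KeyError
try:
    glossary[0]
except KeyError:
    print("KeyError: dictionaries are looked up by key, not by position")
```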
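In Python, the keys do not all have to be the same data type, and neither do the values, although in practice they usually are. The one firm rule is that keys must be hashable, which in practice means immutable values such as numbers, character strings or tuples. Values can be anything, including lists or other dictionaries. For example, the keys may be numbers and the values tuples.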
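The sketch below illustrates these rules with made-up dictionaries. `orderings` pairs number keys with tuple values, as in the paragraph above. `mixed` combines keys and values of different types. The last step shows that a list, which can be changed after it is created, is rejected as a key with a `TypeError`.

```python
# Number keys paired with tuple values
orderings = {1: (1, 2, 3), 2: (2, 3, 1)}
print(orderings[2])  # (2, 3, 1)

# Keys and values of different types can share one dictionary
mixed = {1: "one", "two": 2, (3, 4): [3, 4]}
print(mixed[(3, 4)])  # [3, 4]

# A list can change after it is created, so it cannot be a key
try:
    {[3, 4]: "list as key"}
except TypeError as err:
    print("TypeError:", err)
```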
Here is an example of a dictionary written in Python. The dictionary is named IMDBRatings. The keys are the character strings "Game of Thrones", "Sherlock", "Firefly" and "Friends", and the values are numbers representing the average rating of those TV shows on IMDb.
IMDBRatings = {"Game of Thrones": 9.4,
               "Sherlock": 9.2,
               "Firefly": 9.1,
               "Friends": 8.9}
# Loop over the key-value pairs and print each show with its rating
for show, rating in IMDBRatings.items():
    print(show + ': ' + str(rating))
Game of Thrones: 9.4
Sherlock: 9.2
Firefly: 9.1
Friends: 8.9
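A few other everyday operations on the same IMDBRatings dictionary are sketched below, using only built-in functions. The code finds the key with the highest value, checks whether a key is present, and lists the shows from lowest to highest rating. Sorting is needed here because a dictionary keeps insertion order, not value order.

```python
IMDBRatings = {"Game of Thrones": 9.4, "Sherlock": 9.2, "Firefly": 9.1, "Friends": 8.9}

# The key whose value is largest: max() compares keys using their ratings
best = max(IMDBRatings, key=IMDBRatings.get)
print(best)  # Game of Thrones

# Check membership by key
print("Firefly" in IMDBRatings)  # True

# Sort the (show, rating) pairs from lowest to highest rating
for show, rating in sorted(IMDBRatings.items(), key=lambda pair: pair[1]):
    print(f"{show}: {rating}")
# Friends: 8.9
# Firefly: 9.1
# Sherlock: 9.2
# Game of Thrones: 9.4
```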