Before reading about dictionaries, let's just see them in action in Python:
Assign {}
to an object name to create a new, empty dictionary:
last_names = {} # {} is an empty dictionary
type(last_names) # class name dict
dict
Add items to the dictionary by specifying two things, the key and the value. Here, last_names
is meant to be a data structure that tells you someone's last name given their first name. The syntax is
dict_object[key] = value # assign value to the spot identified by key
last_names['Grace'] = 'Jones'
print(last_names)
{'Grace': 'Jones'}
If we add more key:value pairs, the dictionary grows.
last_names['LeBron'] = 'James'
last_names['George'] = 'Orwell'
print(last_names)
{'Grace': 'Jones', 'LeBron': 'James', 'George': 'Orwell'}
A key must be unique. If we re-use a key, then the old value for that key gets overwritten by the new value.
last_names['George'] = 'Takei'
print(last_names)
{'Grace': 'Jones', 'LeBron': 'James', 'George': 'Takei'}
That's pretty much it for dictionaries. Since the key is unique (only 1 per table), we can think of the key exactly like the primary key in a database table. It's a key! The example above showed strings as keys, but most simple types of data (int
, float
, str
) can be used as keys. Any type can be used a value, and they don't all have to be the same. Values can be numbers, lists, another dictionary. They sky's the limit.
The format we see the dictionary printed out above is a syntax we can use to initialize a non-empty dictionary. We can specify the key:value
pairs as a comma-separate list within curly brackets {}
heterogeneous_dictionary = {1:'elephants', 'tiger':2, 'list': [1,2,3], 'dict': {1:2,3:4,5:6}}
Dictionaries are a built-in data type in Python that provide some functionality of a hash table (the Java HashMap
class) but are more properly called an associative array. Such a data structure is one where one piece of data (the key) uniquely maps to another (the value); they are key-value pair data structures. A list is a collection of single values; the index in the list allows the programmer to access any one value. If you know the length of the list (which you can always find out with len()
, you know the possible indices). A dictionary is a collection of pairs. In order to access the values, it doesn't do you any good to know the length of the dictionary. You have to instead know what keys are in the dictionary; or at least which key is associated with the value you're trying to get. One possible conceptual model of a dictionary is that it is a list that is indexed by an arbitrary key (if it's an integer, it doesn't have to be consecutive starting at 0; it doesn't even have to be an integer) rather than an integer number from the sequence 0, 1, … length of data structure minus one.
So the basic access pattern for a dictionary is object_name[key]
which looks just like accessing a list item. The indication that object_name
is a dictionary and not a list comes when the object is instantiated or constructed. A list object is initialized with []
or list()
while for a dictionary you use { }
or dict()
, i.e. object_name = { }.
With a list, we insert items at a desired index. For instance append()
builds up a list item by item with the next index depending on the number of items in the list so far. Other methods we haven’t covered let you insert items at other places in a list, but there is a performance penalty to do this. (Think back to linked lists in I210/I211 where one must traverse a list to find a given position.) Dictionaries do not incur this performance penalty, so they provide fast lookup and insertion of arbitrary keys. So with a dictionary, we can add items in arbitrary order, and adding each item will be similarly efficient.
Here's a new toy example
dict_object = {} # empty dictionary
dict_object[50] = 1 # assign the value 1 to the key 50
dict_object[0] = 2 # assign the value 2 to the key 0
dict_object["i427"] = "Search informatics" # dictionaries keys don't have to be an integer
print(dict_object[0]) # prints 2
print(dict_object[1]) # Throws KeyError
2
--------------------------------------------------------------------------- KeyError Traceback (most recent call last) <ipython-input-5-5360fd9fe3aa> in <module> 4 dict_object["i427"] = "Search informatics" # dictionaries keys don't have to be an integer 5 print(dict_object[0]) # prints 2 ----> 6 print(dict_object[1]) # Throws KeyError KeyError: 1
dict_object
{50: 1, 0: 2, 'i427': 'Search informatics'}
Note the error message we receive if we try to access a non-existent key. When we evaluate dict_object
, remember that we see it presented as a comma-separated list of items with each item being of the form key:value. The key:value pairs will generally be presented in the order they were added to the dictionary. All the collection types we have seen (list, tuple, set, and dict) hold collections of items, but in all of them except dict an item is a single value. In a dict the item is a pairing of two things—the key and the value.
If we use the same for-loop approach we’ve used so far with lists, tuples, and sets, we’ll end up with the keys of the dictionary stored in the dummy variable during each iteration. Note that the teams have changed for 2 out of the 3 players since this example code was written. NBA churn, baby!
dict_of_players = { "LeBron" : "Lakers", "KD" : "Warriors", "CP3" : "Rockets" }
for key in dict_of_players:
print(key)
LeBron KD CP3
key
is an ok name for a dummy variable when traversing a dictionary, but it's generic. It's better to be specific.
for player in dict_of_players:
print(player)
LeBron KD CP3
Of course, if we have the keys, we can use them to access the values:
for player in dict_of_players:
print(player)
print(dict_of_players[player])
LeBron Lakers KD Warriors CP3 Rockets
But such code is not very Pythonic. The root of the problem is that dictionaries are more complicated since they involve not one value per item in the collection but two–the key and the value. The first example is good. If we only need the keys, that is how we traverse a dictionary. If we need the key and the value, then, Python provides an items() function that returns both in dummy variables for every iteration.
for player,team in dict_of_players.items():
print(player)
print(team)
LeBron Lakers KD Warriors CP3 Rockets
To achieve this form of iteration, the .items() member function of the dict class yields a 2-item tuple just like enumerate() did with a list. On the other hand the member functions .keys() and .values() return either desired part of each dictionary entry. Referring to the first dictionary example, we can see that it is not necessary to invoke .keys() :
Finally, it is worth mentioning that if we don’t need the keys, we shouldn’t bother to access them. Python also provides us with a values() function in this case, but it’s much less commonly-used than items(). Incidentally, there’s also a keys() function that is equivalent to the first example. It’s an antipattern, so it’s not illustrated here.
for team in dict_of_players.values():
print(team)
Lakers Warriors Rockets
This section shouldn't be read until you're comfortable using a dictionary in Python and able to complete some basic programming exercises with one. Once you can do that, we can use the concept that the value in the key:value pair can be any kind of data structure. The examples above used values that were basic types like int
, float
, or str
. The examples below will show cases where the values are collections like list
or another dict
. Having one data structure be a value in another gives rise to the idea of a nested data structure. Once collection or data structure is part of another as a value within it.
It's not uncommon to use a 2D array or a list of lists when programming. Imagine storing a mathematical matrix (a rectangular table of numbers with a given number of rows and columns) or just a numerical table. We would often need a way to access the value at the ith row and jth column in order to set it, read it, change it, etc. Python doesn't have a built-in way to create such a data structure, but we don't need one because we can build it out of lists.
To be specific, let's imagine we're storing a table of class grades for 5 students where we'll have hw grades and project grades. We can store this in the way most gradebooks are presented with 1 row per student and 1 column per grade type. So we need to store a 5 x 2 matrix, 5 rows x 2 columns. To do this we can make a list
that holds the rows, and then the entry in each row can hold the grades for that student in a row. Let's build such a list from scratch, initializing each grade to 0
.
num_students = 5 # number of rows
student_grades = [] # this will be our list of lists
for i in range(num_students):
student_grades.append([0, 0]) # make a new gradebook entry for each student
print(student_grades)
[[0, 0], [0, 0], [0, 0], [0, 0], [0, 0]]
The above example used our previous pattern to accumulate a list item-by-item beginning with an empty list. The only thing we did that made it a nested data structure was appending a list
of values instead of an int
, str
, or float
. Once we've done that, we can access items in the student_grades
list, and as always, we should pay attention what type our data are at all times.
First, let's get the fourth student's grades. Then we'll get the first grade of the second student.
print(student_grades[3]) # fourth student's grades are in row 3 since student 0 is first one
type(student_grades[3]) # the fourth student's grades a list. We know how to work with lists
[0, 0]
list
The type of student_grades
is a list
. The type of any item or value in student_grades
is a list
. It's a list of lists. When we access any item in student_grades
by its index, we get a list
back. All of that is illusrated above. We also know how to work with a list. If we want an item from a list we use [index]
after the list's name to access that item by index. So if student_grades[1]
is a list of the second students grade, then [index]
after it will let us access things by index:
print(student_grades[1][0]) # get the first item in the second student's list of grades
0
This access is written a single statement in Python, but it takes some getting used to for many INFO students. Here's the breakdown of how the data works.
student_grades[1]
evaluates to its value. That value is a list
.[0]
is bound to the value of student_grades[1]
, a list
, so we get the value of the first item in that list.It could be programmed as two statements
second_students_grades = student_grades[1]
print(second_students_grades[0])
but it's more typical to program it as the single, more concise statement
print(student_grades[1][0])
When we built the list above, we used a hardcoded list of two 0's to add each item, but that's very rigid. More typically, the program would have the number of columns (number of grades) stored in a variable, and we would have to build the grade list value by value just like we built the student grade entries (the rows) value by value. That code would look like:
num_students = 5 # number of rows
num_grades = 2 # number of columns
student_grades = [] # this will be our list of lists
for i in range(num_students):
student_grades.append([]) # make a new gradebook entry for each student
for j in range(num_grades):
student_grades[i].append(0) # initialize new grade for each category
print(student_grades)
[[0, 0], [0, 0], [0, 0], [0, 0], [0, 0]]
With this code, the example is complete, for we have used nested for
loops to accumulate a nested list of lists. The entire data structure begins as an empty list, and we add an empty list for each row. We append to each row's empty column list the new value for each column.
The lists of lists works great when we need access by row and column index, but even for the simple gradebook example it's not the best choice. To use that gradebook, we need to know which student is at row 0, which at row 1, and so on. Likewise, we need to know which grade is at column 0, at column 1, and so on. If we decide we need to use student names as the index instead of row number, the list is not going to match that idea. We need to use a dictionary instead. Each student name can be a key, and the student's list of grades can be the value. We can set it up like this:
student_names = ['Zeke', 'Zutroid', 'Zombie', 'Zooology']
student_grades = {}
num_grades = 2
for student_name in student_names:
student_grades[student_name] = [] # make a new gradebook entry for each student
for j in range(num_grades):
student_grades[student_name].append(0) # initialize new grade for each category
print(student_grades)
{'Zeke': [0, 0], 'Zutroid': [0, 0], 'Zombie': [0, 0], 'Zooology': [0, 0]}
If we need to access Zutroid's grades, we access them by name, which is easy to remember, easy to read, and not as arbitrary as having to know which row Zutroid's grade occupy.
print(student_grades['Zutroid']) # the type of this data is a list
print(student_grades['Zutroid'][0]) # adding `[0]` to a list gives us the value of the first item in the list
[0, 0] 0
Finally, we can extend the example completely to a dictionary of dictionaries. If we recognize it makes more sense (easier to read, easier to write, not as arbitrary) to refer to the grade columns by name, too, we should use a dictionary to hold those as well.
student_names = ['Zeke', 'Zutroid', 'Zombie', 'Zooology']
student_grades = {}
grade_categories = ['HW','Projects']
for student_name in student_names:
student_grades[student_name] = {} # make a new gradebook entry for each student
for category in grade_categories:
student_grades[student_name][category] = 0 # initialize new grade for each category
print(student_grades)
{'Zeke': {'HW': 0, 'Projects': 0}, 'Zutroid': {'HW': 0, 'Projects': 0}, 'Zombie': {'HW': 0, 'Projects': 0}, 'Zooology': {'HW': 0, 'Projects': 0}}
The details to note in these examples that haven't already been mentioned are:
[]
. If it's a dictionary, {}
is always used. This applies to setting up the initial data structure to hold the rows and the nested data structre to hold the columns.i
and j
for indices (lists) and actual, meaningful, descriptive names for dictionariesdata_structure_name[row_label][col_label]
because the []
syntax works for access items in lists and dictionaries. The innermost []
is evaluate first to yield a value data_structure_name[row_label]
and then the outermost []
one is evaluated to yield the final value of the expression.Once again, the advantage of dictionaries is that we don’t have to remember the order of the data or which data goes with which index number. We can access values by name which makes our programs considerably easier to write and to read. The readability of a program doesn’t seem that important to many students at first, but once you realize large programs are written in pieces at many different times, the ability to come back to a partially completed program after a big gap of time and quickly get back up to speed with it is invaluable. That's the big advantage of dictionaries. The big advantage of lists is that sometime consecutive integer indices are the natural way to describe the data being stored, and in that case it's best to use the data structure that maps most easily to that natural ordering.