Python Data Structures

Among the basic data types and structures in Python are the following:
- Logical: bool
- Numeric: int, float, complex
- Sequence: list, tuple, range
- Text Sequence: str
- Binary Sequence: bytes, bytearray, memoryview
- Map: dict
- Set: set, frozenset
All of the above are classes from which object instances can be created. In addition to the above, more data types/structures are available in modules that come as part of any default Python installation: collections, heapq, array, enum, etc. Extra numeric types are available from the modules numbers, decimal and fractions. The built-in function type() allows us to obtain the type of any object.
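As a minimal sketch of the points above, the following uses type() on built-in objects and tries two of the extra numeric types. The variable names are illustrative only.

```python
from decimal import Decimal
from fractions import Fraction

# type() returns the class of any object
kind_of_int = type(42)          # int
kind_of_list = type([1, 2])     # list

# Fraction keeps exact rational values
third_plus_sixth = Fraction(1, 3) + Fraction(1, 6)

# Decimal avoids binary floating-point rounding for decimal literals
exact_sum = Decimal('0.1') + Decimal('0.2')
float_sum = 0.1 + 0.2           # binary float, not exactly 0.3

print(kind_of_int, kind_of_list, third_plus_sixth, exact_sum, float_sum)
```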
Discussion
With respect to data types, what are the differences between Python2 and Python3?
The following are important differences:
- A division such as 5 / 2 returns the integer value 2 in Python2 because integer division floors the result. In Python3, this evaluates to the float value 2.5 even when both operands are integers.
- In Python2, strings were ASCII by default. To use Unicode, one had to use the unicode type by creating strings with a prefix: name = u'Saṃsāra'. In Python3, the str type is Unicode by default.
- Python2 has int and long types but both are integrated in Python3 as int. Integers can be as large as system memory allows.
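The Python3 side of these differences can be checked with a short sketch. It assumes a Python3 interpreter; the variable names are illustrative.

```python
# True division always yields a float in Python3
true_quotient = 5 / 2        # 2.5

# Floor division gives the Python2-style integer result
floor_quotient = 5 // 2      # 2
negative_floor = -5 // 2     # -3, since the result is floored, not truncated

# str is Unicode by default, so no u'' prefix is needed
name = 'Saṃsāra'
encoded = name.encode('utf-8')   # bytes object holding the UTF-8 encoding

# int is unbounded: no separate long type
big = 2 ** 100

print(true_quotient, floor_quotient, negative_floor, type(encoded), big)
```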
What data structures in Python are immutable and mutable?
Mutable objects are those that can be changed after they are created, such as by updating, adding or removing an element in a list. It can be said that mutable objects are changed in place.
Immutable objects can't be changed in place after they are created. Among the immutable basic data types/structures are bool, int, float, complex, str, tuple, range, frozenset, and bytes.
The mutable counterparts of frozenset and bytes are set and bytearray respectively. Among the other mutable data structures are list and dict.
With immutable objects it may seem like we can modify their values by assignment. What actually happens is that a new immutable object is created and then assigned to the existing variable. This can be verified by checking the ID (using the id() function) of the variable before and after assignment.
What data structures in Python are suited to handle binary data?
The core built-in types for manipulating binary data are bytes and bytearray. They are supported by memoryview, which uses the buffer protocol to access the memory of other binary objects without needing to make a copy.
The array module supports efficient storage of basic data types like 32-bit integers and IEEE754 double-precision floating values. Characters, integers and floats can be stored in array types, which gives low-level access to the bytes that store the data.
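A small sketch of these types working together is given below. It assumes only the standard library; the names data, view and doubles are illustrative.

```python
from array import array

# bytearray is the mutable counterpart of bytes
data = bytearray(b'hello')

# memoryview exposes the same memory without copying it
view = memoryview(data)
view[0] = ord('J')          # writes through to the underlying bytearray
part = view[1:3]            # slicing a memoryview does not copy either
part_bytes = bytes(part)    # an explicit copy, made only when requested

# array stores homogeneous C-level values; 'd' means C double
doubles = array('d', [1.0, 2.5])
raw = doubles.tobytes()     # the raw bytes that store the values

print(data, part_bytes, doubles.itemsize, len(raw))
```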
What containers and sequences are available in Python?
Containers are data structures that contain one or more objects. In Python, a container object can contain objects of different types. For that matter, a container can contain other containers at any depth. Containers may also be called collections.
Sequences are containers that have inherent ordering among their items. For example, a string such as s = "hello world" is a sequence of Unicode characters h, e, l, etc. Note that there is no character data type in Python, and the expression "h" is actually a 1-character string.
Sequences support two main operations (e.g. for a sequence variable seq):
- Indexing: Access a particular element: seq[0] (first element), seq[-1] (last element).
- Slicing: Access a subset of elements with syntax seq[start:stop:step]: seq[0::2] (alternate elements), seq[0:3] (first three elements), seq[-3:] (last three elements). Note that the stop point is not included in the result.
Among the basic sequence types are list, tuple, range, str, bytes, bytearray and memoryview. Conversely, dict, set and frozenset are containers that are not sequences: their elements cannot be accessed by position. More containers are part of the collections module.
How can I construct some common containers?
The following examples are self-explanatory:
- str: a = '' (empty), a = "" (empty), a = 'Hello'
- bytes: a = b'' (empty), a = b"" (empty), a = b'Hello'
- list: a = list() (empty), a = [] (empty), a = [1, 2, 3]
- tuple: a = tuple() (empty), a = (1,) (single item), a = (1, 2, 3), a = 1, 2, 3
- set: a = set() (empty), a = {1, 2, 3}
- dict: a = dict() (empty), a = {} (empty), a = {1:2, 2:4, 3:9}
We can construct bytearray from bytes and frozenset from set using their respective built-in functions.
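The last point, along with two easy-to-miss details of the literal syntax, can be sketched as follows. The variable names are illustrative.

```python
# bytearray from bytes: the result is mutable
raw = b'Hello'
buffer = bytearray(raw)
buffer[0] = ord('J')

# frozenset from set: the result is hashable, so it can be a dict key
members = {1, 2, 3}
frozen = frozenset(members)
lookup = {frozen: 'group A'}

# Empty braces create a dict, not a set
empty_braces = {}

# The trailing comma makes the tuple; parentheses alone do not
single = (1,)
not_a_tuple = (1)

print(buffer, lookup[frozen], type(empty_braces), type(single), type(not_a_tuple))
```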
What are iterables and iterators?
An iterable is a container that can be processed element by element. For sequences, elements are processed in the order they are stored. For non-sequences such as sets, elements are processed in some arbitrary order.
Formally, an iterable is any object that can return an iterator, typically by implementing __iter__(). The iterator protocol is defined by two special methods, __iter__() and __next__(), which every iterator implements. Calling iter() on an iterable returns what is called an iterator. Calling next() on an iterator gives us the next element of the iterable. Thus, iterators help us process the iterable element by element.
When we use loops or comprehensions in Python, iterators are used under the hood. Programmers don't need to call iter() or next() explicitly.
Can I convert from one data type to another?
Yes, provided they are compatible. Here are some examples:
- int('3') will convert from string to integer
- int(3.4) will truncate float to integer
- bool(0) and bool([]) will both return False
- ord('A') will return the equivalent Unicode code point as an integer value
- chr(65) will return the equivalent Unicode string of one character
- bin(100), oct(100) and hex(100) will return string representations in their respective bases
- int('45', 16) and int('0x45', 16) will convert from a hexadecimal string to an integer
- tuple([1, 2, 3]) will convert from list to tuple
- list('hello') will split the string into a list of 1-character strings
- set([1, 1, 2, 3]) will remove duplicates in the list to give a set
- dict([(1,2), (2,4), (3,9)]) will construct a dictionary from the given list of tuples
- list({1:2, 2:4, 3:9}) will return a list based on the dictionary keys.
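The conversions listed above can be run as a quick sketch. The variable names are illustrative, and the last part shows what happens when the input is not compatible.

```python
as_int = int('3')                     # 3
truncated = int(3.4)                  # 3
falsy = (bool(0), bool([]))           # (False, False)
code_point = ord('A')                 # 65
character = chr(65)                   # 'A'
bases = (bin(100), oct(100), hex(100))
from_hex = (int('45', 16), int('0x45', 16))
as_tuple = tuple([1, 2, 3])
chars = list('hello')
unique = set([1, 1, 2, 3])
squares = dict([(1, 2), (2, 4), (3, 9)])
keys = list({1: 2, 2: 4, 3: 9})

# An incompatible conversion raises an exception
try:
    int('abc')
    conversion_failed = False
except ValueError:
    conversion_failed = True

print(as_int, truncated, falsy, code_point, character, bases, from_hex)
print(as_tuple, chars, unique, squares, keys, conversion_failed)
```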
Should I use a list or a tuple?
Tuples are used to pass arguments and return results from functions. This is because they can contain multiple elements and are immutable. Tuples are also good for storing closely related data. For example, (x, y, z) coordinates or (r, g, b) colour components can be stored as tuples. Use lists instead if values can change during the lifetime of the object.
Although lists can contain heterogeneous items, tuples are more commonly used for this purpose. Tuples group together items that belong together even if their types are different. A database record containing a student's details can be stored in a tuple.
If a sequence is to be sorted, use a list for in-place sorting. A tuple cannot be sorted in place; sorting a tuple, for example with the built-in sorted(), returns a new sorted list instead.
For better code readability, elements of a tuple can be named. For this purpose, use collections.namedtuple. This allows us to access the elements via their names rather than tuple indices.
It's possible to convert between lists and tuples using the functions list() and tuple().
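The sketch below illustrates named tuple fields, sorting, and conversion between the two types. The Point type and the sample values are hypothetical, chosen only for illustration.

```python
from collections import namedtuple

# A named tuple: fields are accessible by name or by index
Point = namedtuple('Point', ['x', 'y', 'z'])
p = Point(1.0, 2.0, 3.0)
same_value = p.x == p[0]

# A function returning several results as a tuple
def min_max(values):
    return min(values), max(values)

lowest, highest = min_max([72, 95, 88])

# sorted() on a tuple returns a new list; the tuple is unchanged
scores = (72, 95, 88)
ordered = sorted(scores)

# For in-place sorting, convert to a list first, then back if needed
marks = list(scores)
marks.sort()
back_to_tuple = tuple(marks)

print(p, same_value, lowest, highest, ordered, back_to_tuple)
```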
When should I use set and dict?
Sets have no order. Dictionaries were traditionally unordered as well; however, from Python 3.7, the order in which items are inserted into a dict is preserved. From Python 3.8, we can also iterate through a dict in reverse order using reversed().
Sets store unique items. Duplicates are discarded. Dictionaries can contain duplicate values but keys must be unique. Since dict keys are unique, often dict is used for counting. For example, to count the number of occurrences of each word in a document, words can be keys and counts can be values.
Sets are suited for finding the intersection or union of two groups, such as those who live in a neighbourhood (set 1) and also own a car (set 2), or those who do either. Other set operations, such as difference, are also possible.
Strings, lists and tuples can take only integers as indices due to their ordered nature but dictionaries can be indexed by strings as well. In general, dictionaries can be indexed by any of the built-in immutable types, which are considered hashable. Thus, dictionaries are suited for key-value pairs such as mapping country names (keys) to their capitals (values). But if capitals are the more common input to your algorithm, use them as keys instead.
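The use cases above can be sketched in a few lines. All names and sample data are made up for illustration; the reversed() call on a dict assumes Python 3.8 or later.

```python
# Counting word occurrences with a dict
words = 'the quick brown fox jumps over the lazy dog the end'.split()
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# Set operations on two groups
residents = {'Asha', 'Ben', 'Chen'}
car_owners = {'Ben', 'Chen', 'Dara'}
both = residents & car_owners       # intersection
either = residents | car_owners     # union
no_car = residents - car_owners     # difference

# Key-value lookup, and swapping keys and values when the
# other direction is the more common query
capitals = {'France': 'Paris', 'Japan': 'Tokyo'}
countries = {capital: country for country, capital in capitals.items()}

# Insertion order is preserved; reversed() needs Python 3.8+
reverse_keys = list(reversed(capitals))

print(counts['the'], both, either, no_car, countries, reverse_keys)
```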
How can I implement a linked list in Python?
A linked list is a collection of nodes connected by links or pointers. A node is a single data point in the linked list. It not only holds the data, but also has a pointer to the next node in a single-linked list. Thus, the definition of a node is recursive. For a double-linked list, the node holds two pointers, one to the previous node and one to the next node. A linked list can be designed to be ordered or unordered.
The head of the linked list must be accessible. This allows us to traverse the entire list and perform all possible operations. A double-linked list might also expose the tail for traversal from the end.
While a Node class may be enough to implement a linked list, it's common to encapsulate the head pointer and all operations within a LinkedList class. Operations on the linked list are methods of the class. One possible implementation is given by Downey. A DoubleLinkedList can be a derived class from LinkedList with the addition of a tail pointer and associated methods.
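A minimal sketch of an unordered single-linked list along these lines is shown below. It is one possible design written for illustration, not Downey's implementation; the method names push_front and append are assumptions of this sketch.

```python
class Node:
    """A single data point holding a reference to the next node."""

    def __init__(self, data, next=None):
        self.data = data
        self.next = next


class LinkedList:
    """Encapsulates the head pointer and the list operations."""

    def __init__(self):
        self.head = None

    def push_front(self, data):
        # The new node points at the old head and becomes the head
        self.head = Node(data, self.head)

    def append(self, data):
        node = Node(data)
        if self.head is None:
            self.head = node
            return
        # Walk from the head to the last node, then link the new node
        current = self.head
        while current.next is not None:
            current = current.next
        current.next = node

    def __iter__(self):
        # Traversal starts at the head and follows the next pointers
        current = self.head
        while current is not None:
            yield current.data
            current = current.next


items = LinkedList()
items.append(2)
items.append(3)
items.push_front(1)
contents = list(items)
print(contents)
```

Defining __iter__() lets the list be used in for loops and passed to list(), so traversal does not need a separate method.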
Milestones
Python's creator Guido van Rossum relates the early history of Python with respect to numbers. The implementation used machine integers and machine binary floating point. He explains that int values were normally treated as signed but were unsigned in bitwise operations, whereas long values were always signed, and this caused problems. The int type could also overflow and raise an exception. Today integers are unbounded.
References
- Atabekov, Farrukh. 2019. "Programming in Python (Udacity)." Medium, October 13. Accessed 2020-07-25.
- Au, Eden. 2019. "6 New Features in Python 3.8 for Python Newbies." Towards Data Science, on Medium, January 02. Accessed 2020-01-02.
- Batchelder, Ned. 2009. "List vs tuple, when to use each?" StackOverflow, answered on November 10. Accessed 2020-07-25.
- Downey, Allen B. 2017. "Think Python: How to Think Like a Computer Scientist." Green Tea Press. Version 2.0.17. Accessed 2017-12-11.
- Hettinger, Raymond (ed). 2019. "What’s New In Python 3.8." Python Docs, October 14. Accessed 2020-01-02.
- Hunner, Trey. 2016. "The Iterator Protocol: How for Loops Work in Python." December 28. Accessed 2017-12-11.
- Mohan, Megha. 2017. "Mutable vs Immutable Objects in Python." Medium. May 25. Accessed 2017-12-11.
- PCA. 2018. "Performance of Python Data Structures." December 17. Accessed 2020-07-25.
- Pranskevichus, Elvis (ed). 2018. "What’s New In Python 3.7." Python Docs. Accessed 2019-03-19.
- Programiz. 2017. "Python Iterators." Accessed 2017-12-11.
- Python Docs. 2017a. "The Python Standard Library." V3.5.4. Python Software Foundation. October 4. Accessed 2017-12-11.
- Python Docs. 2017b. "Glossary." V3.5.4. Python Software Foundation. October 4. Accessed 2017-12-11.
- Rossum, Guido van. 2009. "Early Language Design and Development: From ABC to Python." The History of Python. February 3. Accessed 2017-12-11.
- Tagliaferri, Lisa. 2016. "Python 2 vs Python 3: Practical Considerations." Digital Ocean. August 17. Accessed 2017-12-11.
- Vickery, James. 2016. "Immutable vs Mutable types." Stack Overflow. November 30. Accessed 2017-04-22.
- Wentworth, Peter, Jeffrey Elkner, Allen B. Downey, and Chris Meyers. 2012. "Tuples." Chapter 9 in: Learning with Python 3, How to Think Like a Computer Scientist, October. Accessed 2020-07-25.
- WikiBooks. 2017. "Python Programming/Data Types." December 11. Accessed 2017-12-11.
- Yee, Ka-Ping and Guido van Rossum. 2001. "PEP 234 -- Iterators." Python Developer's Guide. January 30. Accessed 2017-12-11.
- Zadka, Moshe and Guido van Rossum. 2001a. "PEP 237 -- Unifying Long Integers and Integers." Python Developer's Guide. March 11. Accessed 2017-12-11.
- Zadka, Moshe and Guido van Rossum. 2001b. "PEP 238 -- Changing the Division Operator." Python Developer's Guide. March 11. Accessed 2017-12-11.
- Пе, Максим. 2018. "File:Python 3. The standard type hierarchy.png." Wikimedia Commons, November 2. Accessed 2020-07-25.
Further Reading
- Pilgrim, Mark. 2011. "Native Datatypes." Dive Into Python 3. Accessed 2017-12-11.
- Cokelaer, Thomas. 2014. "Python Notes (0.14.0)." August 14. Accessed 2017-12-11.
- Driessen, Vincent. 2014. "Iterables vs. Iterators vs. Generators." nvie.com. September 25. Accessed 2017-12-11.
- Simon. 2009. "Understanding slice notation." StackOverflow, asked on February 03. Accessed 2019-03-19.
See Also
- Python
- Data Structures
- Abstract Data Type
- NumPy Data Types
- Pandas Data Types
- Python Iterators