Python Data Structures

Python's type hierarchy. Source: Пе 2018.
Python's type hierarchy. Source: Пе 2018.

Among the basic data types and structures in Python are the following:

  • Logical: bool
  • Numeric: int, float, complex
  • Sequence: list, tuple, range
  • Text Sequence: str
  • Binary Sequence: bytes, bytearray, memoryview
  • Map: dict
  • Set: set, frozenset

All of the above are classes from which object instances can be created. In addition to the above, more data types/structures are available in modules that come as part of any default Python installation: collections, heapq, array, enum, etc. Extra numeric types are available from modules numbers, decimals and fractions. The built-in function type() allows us to obtain the type of any object.

Discussion

  • With respect to data types, what are the differences between Python2 and Python3?

    The following are important differences:

    • A division such as 5 / 2 returns integer value 2 in Python2 due to truncation. In Python3, this will evaluate to float value 2.5 even when the input values are only integers.
    • In Python2, strings were ASCII. To use Unicode, one had to use the unicode type by creating them with a prefix: name = u'Saṃsāra'. In Python3, str type is Unicode by default.
    • Python2 has int and long types but both these are integrated in Python3 as int. Integers can be as large as system memory allows.
  • What data structures in Python are immutable and mutable?

    Mutable objects are those that can be changed after they are created, such as updating/adding/removing an element in a list. It can be said that mutable objects are changed in place.

    Immutable objects can't be changed in place after they are created. Among the immutable basic data types/structures are bool, int, float, complex, str, tuple, range, frozenset, and bytes.

    The mutable counterparts of frozenset and bytes are set and bytearray respectively. Among the other mutable data structures are list and dict.

    With immutable objects it may seem like we can modify their values by assignment. What actually happens is that a new immutable object is created and then assigned to the existing variable. This can be verified by checking the ID (using id() function) of the variable before and after assignment.

  • What data structures in Python are suited to handle binary data?

    The core built-in types for manipulating binary data are bytes and bytearray. They are supported by memoryview, which uses the buffer protocol to access the memory of other binary objects without needing to make a copy.

    The array module supports efficient storage of basic data types like 32-bit integers and IEEE754 double-precision floating values. Characters, integers and floats can be stored array types, which gives low-level access to the bytes that store the data.

  • What containers and sequences are available in Python?
    List data type and its relation to other types. Source: Mohan, 2017.
    List data type and its relation to other types. Source: Mohan, 2017.

    Containers are data structures that contain one or more objects. In Python, a container object can contain objects of different types. For that matter, a container can contain other containers at any depth. Containers may also be called collections.

    Sequences are containers that have inherent ordering among their items. For example, a string such as str = "hello world" is a sequence of Unicode characters h, e, l, etc. Note that there is no character data type in Python, and the expression "h" is actually a 1-character string.

    Sequences support two main operations (eg. sequence variable seq):

    • Indexing: Access a particular element: seq[0] (first element), seq[-1] (last element).
    • Slicing: Access a subset of elements with syntax seq[start:stop:step]: seq[0::2] (alternate elements), seq[0:3] (first three elements), seq[-3:] (last three elements). Note that the stop point is not included in the result.

    Among the basic sequence types are list, tuple, range, str, bytes bytearray and memoryview. Conversely, dict, set and frozenset are simply containers in which elements don't have any particular order. More containers are part of collections module.

  • How can I construct some common containers?
    Typical ways to initialize some Python containers. Source: Atabekov 2019.
    Typical ways to initialize some Python containers. Source: Atabekov 2019.

    The following examples are self-explanatory:

    • str: a = '' (empty), a = "" (empty), a = 'Hello'
    • bytes: a = b'' (empty), a = b"" (empty), a = b'Hello'
    • list: a = list() (empty), a = [] (empty), a = [1, 2, 3]
    • tuple: a = tuple() (empty), a = (1,) (single item), a = (1, 2, 3), a = 1, 2, 3
    • set: a = set() (empty), a = {1, 2, 3}
    • dict: a = dict() (empty), a = {} (empty), a = {1:2, 2:4, 3:9}

    We can construct bytearray from bytes and frozenset from set using their respective built-in functions.

  • What are iterables and iterators?

    An iterable is a container that can be processed element by element. For sequences, elements are processed in the order they are stored. For non-sequences, elements are processed in some arbitrary order.

    Formally, any object that implements the iterator protocol is an iterable. The iterator protocol is defined by two special methods, __iter__() and __next__(). Calling iter() on an iterable returns what is called an iterator. Calling next() on an iterator gives us the next element of the iterable. Thus, iterators help us process the iterable element by element.

    When we use loops or comprehensions in Python, iterators are used under the hood. Programmers don't need to call iter() or next() explicitly.

  • Can I convert from one data type to another?

    Yes, provided they are compatible. Here are some examples:

    • int('3') will convert from string to integer
    • int(3.4) will truncate float to integer
    • bool(0) and bool([]) will both return False
    • ord('A') will return the equivalent Unicode code point as an integer value
    • chr(65) will return the equivalent Unicode string of one character
    • bin(100), oct(100) and hex(100) will return string representations in their respective bases
    • int('45', 16) and int('0x45', 16) will convert from hexadecimal to decimal
    • tuple([1, 2, 3]) will convert from list to tuple
    • list('hello') will split the string into a list of 1-character strings
    • set([1, 1, 2, 3]) will remove duplicates in the list to give a set
    • dict([(1,2), (2,4), (3,9)]) will construct a dictionary from the given list of tuples
    • list({1:2, 2:4, 3:9}) will return a list based on the dictionary keys.
  • Should I use a list or a tuple?
    Complexity of list operations. Source: PCA 2018.
    Complexity of list operations. Source: PCA 2018.

    Tuples are used to pass arguments and return results from functions. This is because they can contain multiple elements and are immutable. Tuples are also good for storing closely related data. For example, (x, y, z) coordinates or (r, g, b) colour components can be stored as tuples. Use lists instead if values can change during the lifetime of the object.

    Although lists can contain heterogeneous items, tuples are more commonly used for this purpose. Tuples group together items that belong together even if their types are different. A database record containing a student's details can be stored in a tuple.

    If a sequence is to be sorted, use a list for in-place sorting. A tuple can be used but it should return a new sorted object. A tuple cannot be sorted in-place.

    For better code readability, elements of a tuple can be named. For this purpose, use collections.namedtuple class. This allows us to access the elements via their names rather than tuple indices.

    It's possible to convert between lists and tuples using functions list() and tuple().

  • When should I use set and dict?

    Sets and dictionaries have no order. However, from Python 3.7, the order in which items are inserted into a dict is preserved. From Python 3.8, we can iterate through a dict in reverse order.

    Sets store unique items. Duplicates are discarded. Dictionaries can contain duplicate values but keys must be unique. Since dict keys are unique, often dict is used for counting. For example, to count the number of occurrences of each word in a document, words can be keys and counts can be values.

    Sets are suited for finding the intersection/union of two groups, such as finding those who live in a neighbourhood (set 1) and/or also own a car (set 2). Other set operations are also possible.

    Strings, lists and tuples can take only integers as indices due to their ordered nature but dictionaries can be indexed by strings as well. In general, dictionaries can be indexed by any of the built-in immutable types, which are considered hashable. Thus, dictionaries are suited for key-value pairs such as mapping country names (keys) to their capitals (values). But if capitals are the more common input to your algorithm, use them as keys instead.

  • How can I implement a linked list in Python?

    Linked list is a collection of nodes connected by links or pointers. A node is single data point in the linked list. It not only holds the data, but also has a pointer to the next node in a single-linked list. Thus, the definition of a node is recursive. For a double-linked list, the node holds two pointers, one to the previous node and one to the next node. A linked list can be designed to be ordered or unordered.

    The head of the linked list must be accessible. This allows us to traverse the entire list and perform all possible operations. A double-linked list might also expose the tail for traversal from the end.

    While a Node class may be enough to implement a linked list, it's common to encapsulate the head pointer and all operations within LinkedList class. Operations on the linked lists are methods of the class. One possible implementation is given by Downey. A DoubleLinkedList can be a derived class from LinkedList with the addition of a tail pointer and associated methods.

Milestones

2001

Iterators are introduced into the language for looping through containers.

2001

Types int and long are unified into a single type.

2001

Division operator is changed to have three variants: classic division (truncation), true division, floor division (using // operator).

2009

Python's creator Guido van Rossum relates the early history of Python with respect to numbers. Implementation used machine integers and machine binary floating point. He explains that int were treated as signed normally but were unsigned in bitwise operations; but long were always signed and this caused problems. The int type could also overflow and raise an exception. Today integers are unbounded.

Jun
2018

Python data model is changed such that the insertion order of items into a dict is preserved. In October 2019, with the release of Python 3.8, dict can be iterated in reverse using built-in function reversed().

Sample Code

  • #------------------------------------------------------------------------------
    # Initializing basic data types
    #------------------------------------------------------------------------------
    # int
    a = 42
     
    # float
    b = 4.2
     
    # complex
    c = 3 + 9j
     
    # bool
    d = False
    d = 2 > 1
     
    # str can be defined within a pair of single or double quotes
    hw  = "Hello world"
    hw = 'Hello world'
     
    # list: inner elements can be of different types
    ll = [1, 4, 6, 'p']
     
    # tuple
    tt = (1, 4, 6, 'p')
     
    # set
    ss = {1, 2, 5, 8, 'l'}
     
    # dict
    rgb = {'red': 45, 'green': 32, 'blue': 22}
     
    # list of tuples
    l_of_t = [(1,2), (5,6), (3,)]
     
     
    #------------------------------------------------------------------------------
    # Strings are immutable. 
    # Assignment creates a new object as seen by calling id() function.
    #------------------------------------------------------------------------------
    s = 'foo'
    print(id(s))
     
    s += 'bar'
    print(id(s))
     
     

References

  1. Atabekov, Farrukh. 2019. "Programming in Python (Udacity)." Medium, October 13. Accessed 2020-07-25.
  2. Au, Eden. 2019. "6 New Features in Python 3.8 for Python Newbies." Towards Data Science, on Medium, January 02. Accessed 2020-01-02.
  3. Batchelder, Ned. 2009. "List vs tuple, when to use each?" StackOverflow, answered on November 10. Accessed 2020-07-25.
  4. Downey, Allen B. 2017. "Think Python: How to Think Like a Computer Scientist." Green Tea Press. Version 2.0.17. Accessed 2017-12-11.
  5. Hettinger, Raymond (ed). 2019. "What’s New In Python 3.8." Python Docs, October 14. Accessed 2020-01-02.
  6. Hunner, Trey. 2016. "The Iterator Protocol: How for Loops Work in Python." December 28. Accessed 2017-12-11.
  7. Mohan, Megha. 2017. "Mutable vs Immutable Objects in Python." Medium. May 25. Accessed 2017-12-11.
  8. PCA. 2018. "Performance of Python Data Structures." December 17. Accessed 2020-07-25.
  9. Pranskevichus, Elvis (ed). 2018. "What’s New In Python 3.7." Python Docs. Accessed 2019-03-19.
  10. Programiz. 2017. "Python Iterators." Accessed 2017-12-11.
  11. Python Docs. 2017a. "The Python Standard Library." V3.5.4. Python Software Foundation. October 4. Accessed 2017-12-11.
  12. Python Docs. 2017b. "Glossary." V3.5.4. Python Software Foundation. October 4. Accessed 2017-12-11.
  13. Rossum, Guido van. 2009. "Early Language Design and Development: From ABC to Python." The History of Python. February 3. Accessed 2017-12-11.
  14. Tagliaferri, Lisa. 2016. "Python 2 vs Python 3: Practical Considerations." Digital Ocean. August 17. Accessed 2017-12-11.
  15. Vickery, James. 2016. "Immutable vs Mutable types." Stack Overflow. November 30. Accessed 2017-04-22.
  16. Wentworth, Peter, Jeffrey Elkner, Allen B. Downey, and Chris Meyers. 2012. "Tuples." Chapter 9 in: Learning with Python 3, How to Think Like a Computer Scientist, October. Accessed 2020-07-25.
  17. WikiBooks. 2017. "Python Programming/Data Types." December 11. Accessed 2017-12-11.
  18. Yee, Ka-Ping and Guido van Rossum. 2001. "PEP 234 -- Iterators." Python Developer's Guide. January 30. Accessed 2017-12-11.
  19. Zadka, Moshe and Guido van Rossum. 2001a. "PEP 237 -- Unifying Long Integers and Integers." Python Developer's Guide. March 11. Accessed 2017-12-11.
  20. Zadka, Moshe and Guido van Rossum. 2001b. "PEP 238 -- Changing the Division Operator." Python Developer's Guide. March 11. Accessed 2017-12-11.
  21. Пе, Максим. 2018. "File:Python 3. The standard type hierarchy.png." Wikimedia Commons, November 2. Accessed 2020-07-25.

Further Reading

  1. Pilgrim, Mark. 2011. "Native Datatypes." Dive Into Python 3. Accessed 2017-12-11.
  2. Cokelaer, Thomas. 2014. "Python Notes (0.14.0)." August 14. Accessed 2017-12-11.
  3. Driessen, Vincent. 2014. "Iterables vs. Iterators vs. Generators." nvie.com. September 25. Accessed 2017-12-11.
  4. Simon. 2009. "Understanding slice notation." StackOverflow, asked on February 03. Accessed 2019-03-19.

Article Stats

Author-wise Stats for Article Edits

Author
No. of Edits
No. of Chats
DevCoins
3
0
2995
2
0
817
7
6
418
1729
Words
27
Likes
41K
Hits

Cite As

Devopedia. 2022. "Python Data Structures." Version 12, February 15. Accessed 2023-11-12. https://devopedia.org/python-data-structures
Contributed by
3 authors


Last updated on
2022-02-15 12:13:26