Data Structures
- Summary
-
Discussion
- What are the types of data structures?
- What are the essential characteristics of data structures?
- What is the time and space complexity of operations on data structures?
- What are the implementation aspects of data structures?
- Could you explain arrays and linked lists?
- What are the differences between stacks and queues?
- Could you compare tree and graph data structures?
- What are ADT and CDT in the context of data structures?
- When to use which data structure?
- What are the applications of data structures?
- What data structures are provided by programming languages out of the box?
- Milestones
- References
- Further Reading
- Article Stats
- Cite As
In today's data-driven world, we need to learn to manage data and perform operations on them to yield solutions to everyday problems. Data structures provide an efficient way to store and organize the data. They enable traversal, sorting, and searching through the data in the minimum time possible.
Data structures are the basic building blocks of a program. Choosing the right ones is important. Many languages provide built-in data structures that have their own properties. Programmers need to pick them carefully.
Historically, data structures were created when it became difficult to process and search through large amounts of data, and when it was hard to concurrently process data.
Discussion
-
What are the types of data structures? Data structures can be broadly classified into linear and non-linear.
In linear data structures data is accessed in a linear order. Examples include array, linked list, stack and queue. Array is a collection of values that are stored sequentially. Linked list is a collection of data stored in non-contiguous locations connected by pointers. Stack is similar to a stack of plates. Insertion and deletion happen only at one end. Queue is similar to people waiting for their turn in a line. It has different ends for insertion and deletion.
Non-linear data structures store data in a non-linear order. Examples include tree and graph. Trees exhibit a hierarchical structure starting from the root node to the leaf node. Graph is a collection of nodes called vertices connected by links called edges.
-
What are the essential characteristics of data structures? - Structure: Linear or non-linear. Arrays and linked lists are linear since every item has a unique predecessor and successor (except first and last items). Trees, graphs and sets are non-linear.
- Access: Direct (aka random) or sequential. With arrays, items can be accessed directly via an index or key. With linked lists or graphs, an item is accessed by traversing preceding items.
- Data Type: Homogeneous or heterogeneous. Arrays often contain items of the same data type. Heterogeneous data structures contain items of different data types/structures. Classes in OOP are heterogeneous.
- Abstraction: Concrete or abstract. Concrete ones expose the implementation. Abstract ones define the interface but hide the implementation.
- Uniqueness: In sets, each item is unique while other data structures usually allow duplicates.
- Operations: Items can typically be added, removed, replaced or sorted in mutable data structures. Some operations may be constrained in some data structures. For instance, in static arrays items are not added or removed. Items are only appended at the end to queues. Sets have no order and hence items can't be sorted.
-
What is the time and space complexity of operations on data structures? Common operations on data structure include:
- Access: Read or write of an item requires access. If access is sequential via other items, we call it traversing the data structure.
- Insert: Add another item to the data structure.
- Delete: Remove a specified item from the data structure.
- Search: Attempt to find a particular item in the data structure. Basic searching algorithms include linear search, binary search, etc.
- Sort: Some data can store items in a particular order. Sorting algorithms include selection sort and insertion sort. Sorting a data structure in advance may speed up searches.
The choice of a data structure usually depends on the algorithm and most common operations on that data structure. If fast direct access is needed, arrays are preferred but not linked lists. If search is common, hash tables are preferred. If middle insertions/deletions are common, single/double linked lists are preferred but not arrays. Stacks and queues have same performance as linked lists since they're often implemented from the latter.
-
What are the implementation aspects of data structures? Memory allocation for data structures can be static or dynamic. Static memory allocation is usually done at compile time and stack memory is used. Memory is freed when variable goes out of scope. With dynamic allocation, heap memory is used. Its size can grow or shrink during the lifetime of the program.
With arrays, items are stored contiguously in memory. Hence access is direct. Cache locality improves performance. With linked lists, items are not stored contiguously in memory. Access is sequential via pointers that take up extra space. In C++,
std::vector
is stored contiguously but notstd::list
.Mutability is another aspect. A data structure variable that can't be modified after initialization is immutable. Python's
tuple
and C++'sconst struct
are examples but any variable can be used in an immutable context. In functional programming, it's customary to accept an input data structure and return a modified one without changing the input. -
Could you explain arrays and linked lists? Array is defined as a list of values stored at contiguous locations. Its elements are accessed using index, i.e. position of elements. The base address is the address of the first element. There are two types of arrays, one-dimensional array and multi-dimensional array. Two-dimensional arrays are similar to matrices in mathematics. Arrays are used to implement stacks, queues, etc.
Linked list is a collection of nodes connected by links. Each node contains data and a pointer to the next node. They occupy non-sequential locations in memory, which is an advantage over arrays that need a chunk of contiguous space. The starting node is mostly called the head node, and the last node points to NULL.
There are many types of linked lists, some of which include,
- Single linked list: Starts from one end and ends at NULL, with next pointer at each node pointing to the next node.
- Doubly linked list: In addition to the next pointer, each node has a previous pointer that points to the previous node.
- Circular linked list: The last node connects to the head node instead of NULL.
-
What are the differences between stacks and queues? Stack works on Last-In, First-Out (LIFO) principle where the element last inserted is deleted first. It has only one end called top where both insertion and deletion happen. Inserting an element in the stack is termed as a push operation where the top value is incremented. Deleting the element is termed as a pop operation where the top value is decremented.
Queue works on First-In, First-Out (FIFO) principle where the element first inserted is deleted first. It has two ends called front where deletion happens and rear where deletion is performed. Inserting an element into a queue is termed as an enqueue operation, which increments rear value. Deletion of an element is called dequeue operation, which increments the front value.
If there's no more space left and an element is inserted into stack or queue, overflow occurs. If there's no element in the stack or queue and deletion is performed, underflow condition occurs.
Both stacks and queues can be implemented using arrays or linked lists. Stacks find application in recursion problems, whereas queues are used in scheduling algorithms.
-
Could you compare tree and graph data structures? Both trees and graphs contain nodes and edges. Node is an entity that contains data. Nodes are also called vertices in graphs. Edges connect adjacent nodes, called adjacency in graphs.
A tree with \(n\) nodes has \(n-1\) edges. Path is the sequence of nodes and edges from one node to another. Height of the tree is the path length from the root node to the deepest leaf node. In a graph, the degree of a vertex is the number of edges at that vertex.
Tree nodes are ordered but graph nodes are not. Tree has a root node and therefore a hierarchy is implied. Graph has no root node or hierarchy. In a tree, there's only one path between any two nodes and no loops. Graphs can contain loops and any pair of nodes may have multiple paths.
Folders and files in computer file system are tree nodes. Facebook users and tags are graph nodes and their connections to other friends are the edges. Basically, tree is a minimally connected graph to represent a hierarchical model whereas graph represents a network model.
-
What are ADT and CDT in the context of data structures? An Abstract Data Type (ADT) defines the data and the operations that can be performed on that data without specifying implementation details. Stacks and queues are examples of ADT. Developers use ADTs without bothering about the internal implementation of the operations being performed with the data. We call
push()
to insert new data andpop()
to delete data, but don't care how these functions work. The idea of hiding the complexities and exposing only the essentials is the essence of abstraction.A Concrete Data Type (CDT) implements a simple way to organize data. The implementation is not hidden. Developers need to be aware of the implementation to operate on the data structure. CDTs include arrays, records, linked lists, trees and graphs. They can be combined or specialized to form other data structures. CDTs can be used to implement ADTs. For example, a linked list or a linear array may be used to implement stacks and queues.
It's possible to abstract CDTs into ADTs. For instance, arrays and records can be defined as ADTs so that implementation details are hidden.
-
When to use which data structure? Among the known data structures, the correct data structure is the one that is efficient and is well suited to the problem. For that, one needs to know both strengths and limitations of a particular data structure.
Array should be used if the amount of data is known beforehand, as it uses less memory and also offers the advantage of random access. It shouldn't be used if insertion and deletion of data is frequently involved. Linked list is used when constant insertion/deletion time is required and random access of elements is not needed.
Stack is used mainly for the recursion and backtracking problems, whereas queue is used when data is to be added and deleted from both ends.
Trees are more memory efficient and can be used when more than one value has to be searched but require already sorted data. Graphs are mainly used in problems that involve finding the shortest paths, solving the maze game, networking, etc.
-
What are the applications of data structures? Two-dimensional arrays (aka matrices) are used in image processing, mobile contact lists and online ticket booking systems.
Linked lists are used for traversing images in image viewers. A user's webpage history is saved using a linked list. Music players uses a circular list to play all selected songs in a loop.
Stacks are used for converting infix expression to postfix expression and vice versa. Undo and Redo functions in any editor uses the stack. While writing a program, parentheses are matched using a stack.
Queues are used by call centers to schedule calls. Requests to shared resources such as printers are managed with queues. CPU interrupts are handled using queues.
Trees are used in database indexing, game development, and many decision-making tasks in machine learning. Pathfinding algorithms also utilize trees.
Graphs are used in most social media platforms where every user is a node, and to find the shortest path between two users. Shopping websites use graphs to represent user preferences.
-
What data structures are provided by programming languages out of the box? The basic concepts of data structures are the same across all languages, though with different implementations.
Array exists in most languages. Some index it starting from zero and some from one. In C++,
std::vector
is the implementation of the dynamic array. Java usesArrayList
whereas Python useslist
.Linked list is implemented using a
std::list
container in C++. In Java, theLinkedList
class implements theList
interface.Stack is implemented as
Stack
class in Java. C++ hasStack
in its template library.Queue exists as a module in Python.
LinkedList
class in Java offers aQueue
interface. C++ hasstd::queue
for queues.Trees and graphs don't exist explicitly in any language but can be implemented using fundamental types. For example, a tree is a kind of graph and a linked list is also a type of graph.
Milestones
John von Neumann uses arrays when implementing the merge sort algorithm. The first digital computers during this period use arrays for data tables and vector or matrix operations. Subsequently, the early languages of the 1950s and 1960s such as Fortran, Algol, COBOL and BASIC use one-indexed arrays with support for multidimensional arrays. A later version of BASIC in 1980 uses zero-indexed arrays.
Klaus Samelson and Friedrich L. Bauer introduce the stack data structure. They subsequently patent it in 1957. This invention comes about while creating a translator for ALGOL programs. A program is essentially a sequence of symbols but processing them is complex due to operator precedence and the use of parentheses. There's a need to evaluate some symbols later even when they appear earlier in the sequence. It's in this context that the stack data structure is invented.
Allen Newell, Herbert A. Simon and J.C Shaw invent and implement linked lists and list processing for the development of artificial intelligence programs. List processing shows that programs need not have a fixed memory structure, that data structures can be chosen to suit the problem independent of hardware, and that symbol manipulation is the essence of computing. In 1959-60, this inspires McCarthy to create LISP to realize the full power of list processing. By early 1960s, some languages use linked lists as their primary data structure.
In an attempt to improve time and space performance for processing data on magnetic tapes, Windley proposes the tree data structure. He defines tree as "a network of points arranged in levels" with the root point at the bottom and branches at higher levels. He shows that rearranging data as trees rather than by merging subsequently improves performance on data processing tasks. In 1961, Iverson at the IBM Research Center publishes a notation for trees.
Tarjan analyzes the performance of find and union operations on disjoint sets. Find operations are interspersed with union operations. Sets are represented as trees. He shows that when designing data structures, focusing on worst case running time is less important than amortized running time, that is, performance averaged over operations on a long sequence of input. He later invents the data structures splay tree (1980) and Fibonacci heap (1985), both of which achieve good amortized performance.
References
- Adams, Barry G. 2008. "Introduction to Computer Science: an object-oriented approach using Java 5." BlueJ and BeanShell Edition, September. Accessed 2022-02-18.
- Bhargava, Aditya Y. 2016. "Grokking Algorithms." Manning Publications. Accessed 2022-02-18.
- BigOCheatSheet. 2016. "Know Thy Complexities!" BigOCheatSheet, August 23. Accessed 2022-02-18.
- Bullinaria, John. 2019. "Data Structures and Algorithms." Lecture notes, University of Birmingham, March 27. Accessed 2022-02-18.
- Careerdrill. 2015. "Choosing the Right Data Structure to solve problems." Blog, Careerdrill, September 11. Updated 2020-05-16. Accessed 2022-01-29.
- Carey, John, Shreyans Doshi, and Payas Rajan. 2019. "Contiguous Versus Linked Data Structures." In Chapter 1 of: C++ Data Structures and Algorithm Design Principles, Packt, October. Accessed 2022-02-18.
- DTET. 2014. "Data Structure." EDUSAT Learning Resource Material, Directorate of Technical Education & Training, Odisha. Accessed 2022-02-03.
- Dix, Alan. 2021. "A brief history of array indices — making programs that fit people." Blog, July 27. Accessed 2022-02-18.
- Elgabry, Omar. 2016. "Data Structures — Language Support (Part 3)." On Medium, November 15. Accessed 2022-01-29.
- FACE Prep. 2020. "Difference between Linked List and Arrays." FACE Prep, Focus 4D Career Education Pvt. Ltd., March 7. Accessed 2022-01-29.
- Gunawardena, Ananda (ed). 2007. "Tree Data Structure." Section 4.1 in: 15-111 Intermediate and Advanced Programming, Carnegie Mellon University, October 22. Accessed 2022-01-29.
- Heyck, Hunter. 2022. "Herbert ("Herb") Alexander Simon." Citation, Turing Award 1975, Association for Computing Machinery. Accessed 2022-02-19.
- Karumanchi, Narasimha. 2017. "Data Structures and Algorithms Made Easy." 5th edition, CareerMonk Publications. Accessed 2022-02-18.
- King, V. 2022. "Robert (Bob) Endre Tarjan." Citation, Turing Award 1986, Association for Computing Machinery. Accessed 2022-02-19.
- Mount, Dave. 2001. "CMSC 420: Data Structures." Lecture notes, The University of Maryland. Accessed 2022-02-03.
- Newell, Allen and Herbert A. Simon. 2007. "Computer Science as Empirical Inquiry: Symbols and Search." ACM Turing Award Lectures, Association for Computing Machinery. Accessed 2022-02-18.
- Pandav, Rohan. 2021. "Applications of data structure." Geek Culture, on Medium, April 14. Updated 2021-04-14. Accessed 2022-01-29.
- Rey, Marta. 2021. "When to use Which Data Structure – Top 6 Data Structures." Hello Code Club, March 27. Accessed 2022-02-03.
- Rhodes, Loren K. 2021. "Introduction to Data Structures and Arrays." Juniata College, October 2. Accessed 2022-02-18.
- Salton, Gerard. 1962. "Manipulation of trees in information retrieval." Commun. ACM, vol. 5, no. 2, pp. 103-114, February. doi: 10.1145/366792.366828. Accessed 2022-02-19.
- Samelson, K., and F.L. Bauer. 1960. "Sequential formula translation." Communications of the ACM, vol. 3, no. 2, pp. 76-83. doi: 10.1145/366959.366968. Accessed 2022-02-19.
- Software Testing Help. 2019. "Introduction To Data Structures In C++." August 8, Software Testing Help. Updated 2022-02-03. Accessed 2022-01-28.
- Tarjan, R. E. 1975. "Efficiency of a Good But Not Linear Set Union Algorithm." Journal of the ACM, vol. 22, no. 2, pp. 215–225, April. doi: 10.1145/321879.321884. Accessed 2022-02-19.
- Tech Differences. 2018. "Difference Between Tree and Graph." Tech Differences, March 19. Updated 2019-12-17. Accessed 2022-02-04.
- Thareja, Reema. 2014. "Data Structures Using C." Second edition, Oxford University Press. Accessed 2022-02-18.
- Thulin, Frederick. 2003. "Stacks, Recursion and Backtracking." Chapter 5 in: Chang, Shi-Kuo (ed), Data Structures and Algorithms, Series on Software Engineering and Knowledge Engineering, vo. 13, World Scientific, pp. 71-101. Accessed 2022-02-18.
- Trironnarith, Thanirin. 2018. " Data Structure: Stack and Queue." DEV, September 13. Updated 2020-02-13. Accessed 2022-01-29.
- Tsiknis, George K. 2007. "Unit 3: Concrete Data Types." CICS 216, University of British Columbia. Accessed 2022-02-03.
- Weddell, Grant. 2010. "Data Types and Structures." Slides for CS234, Univ. of Waterloo, May 17. Accessed 2022-02-19.
- Wicht, Baptiste. 2012. "C++ benchmark – std::vector VS std::list VS std::deque." Blog, December 3. Accessed 2022-02-19.
- Wikipedia. 2022a. "Stack (abstract data type)." Wikipedia, February 6. Accessed 2022-02-18.
- Wikipedia. 2022b. "Linked list." Wikipedia, February 10. Accessed 2022-02-18.
- Wikipedia. 2022c. "Array data structure." Wikipedia, February 10. Accessed 2022-02-18.
- Windley, P. F. 1960. "Trees, Forests and Rearranging." The Computer Journal, vol. 3, no. 2, pp. 84-88. doi: 10.1093/comjnl/3.2.84. Accessed 2022-02-19.
- Yang, Xiaoyun. 2018. "When To Use Immutable Data Structures." Blog, December 7. Accessed 2022-02-18.
Further Reading
- Cormen, Thomas. 2017. "Introduction to Algorithms." The MIT Press.
- Skiena, Steven. 2017. "The Algorithm Design Manual." Springer.
- Bhargava, Aditya Y. 2016. "Grokking Algorithms." Manning Publications. Accessed 2022-02-18.
- Padmaja, B. 2017. "Data Structures." Lecture notes, Institute of Aeronautical Engineering. Accessed 2022-01-29.
- Bullinaria, John. 2019. "Data Structures and Algorithms." Lecture notes, University of Birmingham, March 27. Accessed 2022-02-18.
Article Stats
Cite As
See Also
- Stack (Data Structure)
- Queue (Data Structure)
- Tree (Data Structure)
- Graph (Data Structure)
- Algorithm
- Algorithmic Complexity