Concurrency vs Parallelism

Concurrency vs Parallelism. Source: Defog Tech 2018.

Concurrency and parallelism are two prominent techniques an OS uses to execute multiple pending processes.

Concurrency implies that multiple tasks can be executed in an overlapping time period. One task may start before the previous one finishes, but they will not be running at the same instant of time. Instead, the processor allots time slices to each task and switches context suitably (multitasking on a single-core machine). Concurrent code is hard to implement and debug.

Parallelism describes the ability of independent tasks of a program to be physically executed at the same instant of time. These tasks may run simultaneously on another processor core, another processor, or an entirely separate computer (distributed systems). With ever-increasing demand for computing speed from real-world applications, parallelism has become ubiquitous and increasingly affordable.
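    The distinction can be sketched in Python: a thread pool interleaves tasks in overlapping time (concurrency), while a process pool can run them on separate cores at the same instant (parallelism). The `square` function and pool sizes below are illustrative, not from the source.

```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def square(n):
    return n * n

def run_concurrent(nums):
    # Threads share one interpreter: tasks overlap in time, but under
    # CPython's GIL they do not execute bytecode at the same instant.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(square, nums))

def run_parallel(nums):
    # Separate processes can run on different cores simultaneously:
    # true parallelism.
    with ProcessPoolExecutor(max_workers=4) as pool:
        return list(pool.map(square, nums))

if __name__ == "__main__":
    print(run_concurrent([1, 2, 3]))  # [1, 4, 9]
    print(run_parallel([1, 2, 3]))    # [1, 4, 9]
```

    Both produce the same results; the difference is only in how the work is scheduled onto the hardware.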


  • Can you introduce the common terms used in concurrency and parallelism?

    Here are some essential terms to know:

    • Multiprocessing: Two or more CPUs within a single computer system sharing memory and IO resources.
    • Distributed Computing: A computing system whose components (memory, CPU, IO) are located on different machines. Resources are shared and information is exchanged internally, so the system appears as a single coherent whole to the end user.
    • Multicore processor: Combines two or more independent CPUs into one package (a single IC). Examples: Intel Core 2 Duo, Qualcomm Snapdragon octa-core processor chips.
    • Multi-threading: A process runs multiple sub-tasks (threads) to perform various operations at the same time. Individual threads are controlled by a main thread of execution and exchange data through shared memory.
    • Pipelining: Sequential execution of related tasks in which the output of one task becomes the input of the next task in the pipeline. This ordering helps maximise processor utilization.
    • Clock cycle: The time a CPU takes to perform one basic operation; its inverse is the clock rate. Example: a 4 GHz processor has a cycle time of 0.25 ns.
    • Time Sharing: Allocation of computer resources in time slots to several programs. Uses CPU scheduling and multi-programming to give each user a slice of processor time.
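    Pipelining, one of the terms above, can be illustrated with Python generators: each stage consumes the previous stage's output as it arrives. The stage names here are invented for illustration.

```python
def read_lines(lines):
    # Stage 1: produce raw items one at a time.
    for line in lines:
        yield line

def parse(items):
    # Stage 2: transform each item as it flows through.
    for item in items:
        yield int(item)

def total(numbers):
    # Stage 3: consume the stream and summarise it.
    return sum(numbers)

result = total(parse(read_lines(["1", "2", "3"])))
print(result)  # 6
```

    In hardware pipelining the stages run on different functional units at once; the generator chain shows only the data-flow ordering.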
  • How does concurrent processing work?
    Concurrent threads. Source: Stafman 2015.

    The goal of concurrency is to minimise CPU idle time. Another process/thread gets the CPU while the present one waits for an input/output operation, a database transaction, or a call to an external application. An interrupt pauses the active task, and its task-related data in RAM is context switched out in favour of the new task.

    If two or more tasks are running on a single-core CPU (or on the same core of a multicore processor), they can concurrently access the same resources. Parallel data reads are safe; programmers must manage data integrity during write access.
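    The write-access hazard can be sketched with a shared counter: interleaved read-modify-write steps can lose updates, so a mutex serialises the critical section. This is a minimal sketch; the variable names are illustrative.

```python
import threading

counter = 0
lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        with lock:        # serialise the read-modify-write sequence
            counter += 1

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000 -- guaranteed only because of the lock
```

    Without the lock, two threads may both read the same old value and one increment is lost; the lock makes each update atomic with respect to the others.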

    Efficient scheduling of tasks is vital in a concurrent system. Popular task scheduling algorithms include FIFO, Shortest Remaining Time, Strict Priority, and Round-Robin.

    To prevent task starvation when one task hogs the CPU for too long, interrupts are designed to pause it and give the CPU to other tasks. This is called preemptive scheduling. To manage the scheduling and dispatching of concurrent tasks, the OS itself needs CPU time, just like any application program.

    Executing tasks in batches may also minimise idle time. Costly operations such as IO/Memory fetch are done just once per batch.
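    The idle-time saving from concurrency can be demonstrated by overlapping simulated IO waits: four 0.1-second "IO operations" on four threads finish in roughly 0.1 seconds rather than 0.4. The sleep below stands in for a real blocking IO call.

```python
import threading
import time

def fake_io():
    time.sleep(0.1)   # stands in for a blocking IO operation

start = time.perf_counter()
threads = [threading.Thread(target=fake_io) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
print(f"{elapsed:.2f}s")  # close to 0.1s, not 0.4s
```

    While one thread waits, the CPU runs another, which is exactly the idle time the scheduler is trying to reclaim.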

  • How is parallelism implemented in various computer systems?
    Distributed systems. Source: Pattamsetti 2017.
    Distributed systems. Source: Pattamsetti 2017.

    A computer's performance depends directly on the time required to perform a basic operation and on the number of basic operations that can be performed in parallel. The clock cycle of a processor is the minimum time taken to perform a basic operation, and in modern computers it's already close to the physical limit set by the speed at which signals propagate. Parallelism is therefore necessary for further performance gains:

    • Increase internal concurrency in operations at the chip level (very expensive VLSI design)
    • Process pipelining
    • Multiple functional units (several multipliers and adders managed by one instruction set)
    • Multiple cores (CPUs) in the same computer sharing memory and IO. This makes parallelism possible even in smaller devices such as mobile phones, tablets or smart car systems.
    • Distributed systems of multiple independent computers, each with its own memory and IO. This has become a prevalent and feasible option with greatly improved networking speeds (100 Tbps achieved in research in 2020).

    With real-world applications becoming more data intensive and new devices getting added to the digital network every day, parallel computing is the way forward. Computer architectures used to sustain this growth are changing radically, from sequential to parallel.

  • Can you explain concurrency and parallelism with an example?
    Illustrating concurrency and parallelism on a 2-core CPU. Source: OpenClassrooms 2020.

    A simple example of concurrent processing is any user-interactive application, such as a document editor. Whenever the application waits on user action, processor cycles would be wasted; instead, control is ceded to background operations the application requires. While a file is being saved or printed in the background, the user can concurrently type.

    The main thread of the word processor spawns several threads as needed: for character typing, saving to disk, dispatching to the printer, and so on. They may execute within a common time frame but they are not really executing in parallel.
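    A minimal sketch of this word-processor pattern: the main thread stays responsive while a background thread performs the "save". The function names and document text are hypothetical.

```python
import threading
import time

saved = []

def save_document(text):
    # Background 'save to disk', simulated with a short delay.
    time.sleep(0.05)
    saved.append(text)

# The main thread spawns the save and keeps handling 'typing'.
worker = threading.Thread(target=save_document, args=("draft v1",))
worker.start()
typed = "user keeps typing here"   # main thread is not blocked
worker.join()                      # wait for the save to finish
print(saved)  # ['draft v1']
```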

    In contrast, a Hadoop-based distributed data processing system is a good example of parallel processing. It performs large-scale data processing on clusters using massively parallel processors.

    Parallelization and command-and-control distribution are automatic. Programmers see a clean abstraction: the whole system appears as one database. Typical cluster specifications are 40 nodes/rack, 1000-4000 nodes per cluster, 1 Gbps bandwidth within a rack and 8 Gbps out of a rack. Typical node specifications are 8-16 cores, 32 GB RAM and 8×1.5 TB disks.

  • How is information shared between processes in both these techniques?

    Every OS process has memory allocated for program code and data, plus a heap for dynamic memory allocations. Threads spawned within the process have their own call stacks, virtual CPU and thread-local storage, but they share the application's heap, data, code and resources (such as file handles) with other threads in the same process.

    Threads share data and code; processes don't. In neither case is the stack shared.

    Static data is shared between threads from common shared memory. Other data is shared through arguments/parameters and references, as long as the data is allocated. All allocated storage, unless freed explicitly, is not freed until program termination. Data serialization is the user's responsibility.

    All files are shared between threads. If a thread (other than main) opens a file, it must be closed before that thread terminates.

    CPU parallelism in multi-core systems is different from distributed systems because of shared memory availability. In distributed systems, Message Passing Interface (MPI) allows communication of information between various nodes and clusters. It's commonly used in High Performance Computing (HPC) systems. It may be used in multi-core systems as well, thus presenting a unified abstraction.

  • How are the concurrent and parallel processes scheduled for execution?
    Concurrency Runtime Architecture. Source: Microsoft Docs 2018.

    The basic sequence of concurrent execution is more or less similar for OS tasks and application program tasks. Initially, a bootstrap program (such as init or the main thread) starts execution. Then a software or hardware interrupt is raised to request a new thread of execution. Once interrupted, the CPU stops what it's doing and immediately transfers execution to the starting address of the interrupt service routine. Return addresses of interrupted threads are stored on a system stack so execution can resume later.

    The Task Scheduler schedules and coordinates concurrent tasks at runtime. A combination of pre-emptive and cooperative scheduling algorithms is used for task sequencing, with the goal of maximising use of processing resources. Higher-priority tasks are allowed to 'steal' CPU resources from other threads. Sometimes threads simply time out and cede CPU control. Threads may also keep polling to check for a free CPU.

    In modern parallel processing systems, the NIC supports multiple queues, so thread states need not be transferred between cores/CPUs. One thread per core receives, while multiple threads per core process. Event-based programming and asynchronous IO access are used.
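    Event-based programming with asynchronous IO, mentioned above, can be sketched with Python's asyncio: one thread interleaves many waiting tasks. The sleep stands in for an asynchronous network read; the handler name is invented.

```python
import asyncio

async def handle(request_id):
    await asyncio.sleep(0.05)   # stands in for an async network read
    return f"done-{request_id}"

async def main():
    # One thread, many overlapping waits: event-based concurrency.
    return await asyncio.gather(*(handle(i) for i in range(3)))

results = asyncio.run(main())
print(results)  # ['done-0', 'done-1', 'done-2']
```

    The event loop resumes each handler when its IO completes, so no thread ever blocks idly.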

  • What are the prevalent parallel processing techniques in today's data intensive real-world systems?

    The key to implementing parallel computing systems is designing sub-tasks and operations to be independent of each other. Except at the start and end of each task, there should be no data dependency. Tasks that require a lot of data transfer or communication between them are difficult to parallelize.

    Parallel computing may be done using three different kinds of hardware platforms - multicore systems, clusters and graphics processing units (GPUs). Parallelisation at the software deployment level happens using containerised applications and virtualization techniques.

    For instance, Kubernetes supports allocating different amounts of memory and CPU to each container. So complex multi-threaded algorithms (like XGBoost) can get more resources than single-threaded algorithms (such as LogisticRegression from sklearn). This lets us make the most of our resources, minimise cost and run more pipelines simultaneously.

    A popular parallel programming model for processing big datasets is MapReduce. Blocks of data are processed in parallel via these steps: Read data in blocks → Map them as key-value pairs → Reduce the data by performing summary operations → Write the reduced data into storage systems.
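    The map and reduce steps can be sketched in plain Python as a word count, the canonical MapReduce example. This is a single-machine sketch; a real framework distributes the same steps across a cluster.

```python
from collections import defaultdict

def map_phase(blocks):
    # Map: emit (word, 1) key-value pairs from each block of text.
    for block in blocks:
        for word in block.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Reduce: sum the counts for each key.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

blocks = ["to be or", "not to be"]
print(reduce_phase(map_phase(blocks)))
# {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

    Because each block is mapped independently, the map phase parallelizes trivially; only the reduce phase needs pairs with the same key brought together.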

  • What languages provide for concurrency and parallelism?

    Concurrent programming languages use simultaneously executing processes or threads as a means of structuring a program. A parallel language provides programming constructs executable on more than one processor. Concurrent programming techniques may be used in parallel programs too, but this isn't a must. Flynn's taxonomy classifies parallelism by instruction streams and data streams.

    These languages address three main concerns:

    • Performance computing (concurrent execution for CPU resource optimization)
    • Distributed computing (parallel CPUs to be controlled and utilized)
    • Systems programming (basic OS/hardware management)

    The different languages can be categorised as follows:

    • Object-Oriented Parallelism: Presto, Orca, Nexus, Java, High Performance C++
    • Shared Memory Languages: Linda, Orca, SDL, PThreads, Java
    • Distributed Memory Paradigms: Parallel Virtual Machine (PVM), MPI, Concurrent C, Ada, Fortran M, C*
    • Message Passing: Rust, SmallTalk, Go
    • Parallel Logic Systems: Concurrent Prolog, GHC, Strand
    • Parallel Functional Languages: LISP, MultiLISP, SISAL
    • Frameworks and APIs: Apache Hadoop, Spark, Beam, OpenCL

    In some cases, concurrency or parallelism features are provided as a library extension rather than as part of the main language (e.g. the POSIX thread library).



The academic study of concurrent algorithms started as early as the 1960s, with Dijkstra (1965) credited with publishing the first paper in the field, identifying and solving the mutual exclusion problem.


A new kind of parallel computing is launched by the Caltech Concurrent Computation project, which builds a supercomputer for scientific applications from 64 Intel 8086/8087 processors. Such massively parallel processors (MPPs) come to dominate the top end of computing.


Mathematical models for analysing the performance and complexity of concurrent systems, in the form of Petri nets, are published in an influential research paper by Esparza and Nielsen.


NVIDIA invents the Graphics Processing Unit (GPU) as a symmetric multicore parallel processor separate from the CPU.


IBM introduces the first multicore processor: a VLSI chip with two 64-bit microprocessors comprising more than 170 million transistors.


Intel adds support for simultaneous multithreading to the Pentium 4 processor, under the name hyper-threading. Until then most desktop computers had only one single-core CPU, with no support for hardware threads.


Parallel data processing in large clusters becomes possible using the MapReduce parallel programming model.


Container management platforms such as Kubernetes enter the market specialising in handling clusters for scalable cloud-based solutions.

Fault tolerance requires knowledge of distributed/parallel/concurrent computing. Source: Armstrong 2014, 6:12.

At a Stanford seminar, Joe Armstrong, the creator of the Erlang programming language, notes that engineering fault-tolerant and scalable systems requires an understanding of distributed, parallel and concurrent computing.


  1. Alwaaly, Mazin. 2017. "Computer architecture multi core processor." On SlideShare, presented at a seminar at Mustansiriya University, September 27. Accessed 2021-05-09.
  2. Armstrong, Joe. 2014. "Faults, Scaling, and Erlang Concurrency." Stanford Online, on YouTube, September 26. Accessed 2021-06-27.
  3. Aspen Systems. 2020. "HPC MPI Libraries." Aspen Systems. Accessed 2021-05-09.
  4. Chim, Nick. 2021. "Concurrency vs Parallelism: What's the Difference?" Blog, LoginRadius Inc., February 19. Accessed 2021-05-09.
  5. Chuan, Chua Hock. 2012. "Java Programming Tutorial: Multithreading and Concurrent Programming." Nanyang Technological University, April. Accessed 2021-05-09.
  6. Defog Tech. 2018. "Concurrency vs Parallelism." Defog Tech, on YouTube, November 10. Accessed 2021-05-09.
  7. Foster, Ian. 1995. "1.1 Parallelism and Computing." In: Designing and Building Parallel Programs, v1.3, Addison-Wesley Inc. Accessed 2021-05-09.
  8. Geebu, Ruchitha. 2021. "History and Advantages of Hadoop MapReduce Programming." MindMajix, Appmajix Technologies Private Limited, January 29. Accessed 2021-05-09.
  9. Gibb, Robert. 2019. "What is a Distributed System?" Blog, StackPath, July 26. Accessed 2021-05-09.
  10. IBM. 2012. "Power 4: The First Multi-Core, 1GHz Processor." Icons of Progress, IBM Corporation, March 7. Accessed 2021-05-09.
  11. IBM. 2021. "Sharing files between threads." In: Enterprise PL/I for z/OS, v5.3, IBM Corporation. Accessed 2021-05-09.
  12. Intel. 2008. "Parallel Computing: Background." Universal Parallel Computing Research Centers, Intel, March 18. Accessed 2021-05-09.
  13. JavaTpoint. 2021. "Data Flow In MapReduce." JavaTpoint. Accessed 2021-05-09.
  14. Khushalani, Prateek and Victor Robin. 2018. "Parallel Processing of Machine Learning Algorithms." dunnhumby Data Science & Engineering, on Medium, September 25. Accessed 2021-05-09.
  15. MIT. 2014. "Reading 17: Concurrency." In: 6.005 — Software Construction, MIT EECS. Accessed 2021-05-09.
  16. Matloff, Norman. 2016. "Parallel Computing for Data Science: With Examples in R, C++ and CUDA." The R Series, Chapman and Hall/CRC, December 18. Accessed 2021-05-09.
  17. Microsoft Docs. 2016. "Task Scheduler (Concurrency Runtime)." Microsoft C++, C, and Assembler documentation, Microsoft, November 4. Accessed 2021-05-09.
  18. Microsoft Docs. 2018. "Overview of the Concurrency Runtime." Microsoft C++, C, and Assembler documentation, Microsoft, November 19. Accessed 2021-05-09.
  19. Neha, T. 2019. "Pipelining in Computer Architecture." Binary Terms, September 06. Accessed 2021-05-09.
  20. Nilesh. 2020. "A Data Scientist's Intro to Parallel Computing With Dask." Towards Data Science, on Medium, January 25. Accessed 2021-05-09.
  21. OpenClassrooms. 2020. "Identify the Advantages of Concurrency and Parallelism." In: Scale Up Your Code With Java Concurrency, OpenClassrooms, November 12. Accessed 2021-06-17.
  22. Pattamsetti, Raja Malleswara Rao. 2017. "Distributed computing." In: Distributed Computing in Java 9, Packt Publishing Limited, June. Accessed 2021-05-09.
  23. Pinck, Alan. 2003. "Operating Systems : Overview of Concurrent Processing." In: DAT2343 - Computer Systems Architecture, Algonquin College, May 16. Accessed 2021-05-09.
  24. Shute, Gary. 2021. "The Performance Equation." University of Minnesota. Accessed 2021-05-09.
  25. Silberschatz, Abraham, Peter Baer Galvin, and Greg Gagne. 2005. "Operating System Concepts." Seventh Edition, John Wiley and Sons. Accessed 2021-05-09.
  26. Stafman, Logan. 2015. "Designing for Performance: Concurrency and Parallelism." In: COS 518: Computer Systems, Princeton University. Accessed 2021-05-09.
  27. Stojanović, Saša, Dragan Bojić, and Miroslav Bojović. 2015. "Chapter One - An Overview of Selected Heterogeneous and Reconfigurable Architectures." In: Advances in Computers, vol. 96, pp. 1-45, Elsevier B V. doi 10.1016/bs.adcom.2014.11.003. Accessed 2021-05-09.
  28. Talia, Domenico. 1998. "Recent Advances in Parallel Programming Tools and Languages." ISI-CNR Italy, December. Accessed 2021-05-09.
  29. Thakur, Dinesh. 2013a. "Time Sharing Operating System." Computer Notes, April 3. Updated 2020-08-10. Accessed 2021-05-09.
  30. Thakur, Dinesh. 2013b. "Definition Multiprocessor Operating System." Computer Notes, April 4. Updated 2020-08-10. Accessed 2021-05-09.
  31. Triangles. 2019. "A gentle introduction to multithreading." Internal Pointers, March 06. Accessed 2021-05-09.
  32. Wikipedia. 2021a. "List of concurrent and parallel programming languages." Wikipedia, April 16. Accessed 2021-05-09.
  33. Wikipedia. 2021b. "Thread (computing)." Wikipedia, April 26. Accessed 2021-05-09.
  34. Wikipedia. 2021c. "Kubernetes." Wikipedia, May 28. Accessed 2021-05-09.
  35. Wikipedia. 2021d. "Concurrent computing." Wikipedia, June 4. Accessed 2021-05-09.
  36. Wikipedia. 2021e. "Petri nets." Wikipedia, June 13. Accessed 2021-05-09.
  37. Yang, Tao. 2017. "Parallel Data Processing with Hadoop/MapReduce." In: CS 240A: Applied Parallel Computing, UC Santa Barbara. Accessed 2021-05-09.

Further Reading

  1. Stafman, Logan. 2015. "Designing for Performance: Concurrency and Parallelism." In: COS 518: Computer Systems, Princeton University. Accessed 2021-05-09.
  2. Silberschatz, Abraham, Peter Baer Galvin, and Greg Gagne. 2005. "Operating System Concepts." Seventh Edition, John Wiley and Sons. Accessed 2021-05-09.
  3. MIT. 2014. "Reading 17: Concurrency." In: 6.005 — Software Construction, MIT EECS. Accessed 2021-05-09.
  4. BrightDigit. 2019. "Asynchronous Multi-Threaded Parallel World of Swift." Learning Swift, BrightDigit, September 25. Updated 2020-04-30. Accessed 2021-06-17.
  5. OpenClassrooms. 2020. "Identify the Advantages of Concurrency and Parallelism." In: Scale Up Your Code With Java Concurrency, OpenClassrooms, November 12. Accessed 2021-06-17.

Cite As

Devopedia. 2021. "Concurrency vs Parallelism." Version 6, June 27. Accessed 2024-06-25.
Contributed by
2 authors

Last updated on
2021-06-27 06:24:42
