• Comparing data race vs general race. Source: Joshi 2017, 2:31.

Race condition (software)

Improve this article. Show messages.

Summary

Race condition in software is an undesirable event that can happen when multiple entities access or modify shared resources in a system. The system behaves correctly when these entities use the shared resources as expected. But sometimes due to uncontrollable delays, the sequence of operations may change due to relative timing of events. When this happens, the system may enter a state not designed for and hence fail. The "race" happens because this type of failure is dependent on which entity gains access to the shared resource first.

One definition of race condition describes it as "anomalous behavior due to unexpected critical dependence on the relative timing of events." This definition is broad in the sense that it can apply to both hardware and software race conditions. In software, race condition happens because of some shared resource whose access is not properly controlled by design.

Discussion

  • Can you give one example of race condition?

    Suppose a process calls a function that's supposed to increment a value (v) stored in a database. This operation is not atomic, meaning that it can be broken down into smaller steps. The function will read the current value from database (A), store it in memory as a function variable (B), increment it (C) and finally write the new value to database (D). This function's execution is therefore a sequence of steps A-D.

    Race condition will happen if process P1 calls this function and proceeds to step C. Meanwhile, it gets preempted by the operating system and another process P2 gets its chance to run. P2 calls the same function, completes all the steps A-D and returns. When P1 resumes, it continues from step C using an old value (v), not P2's result (v+1).

    Race condition would not have happened if P1 had completed all steps without preemption; or P2 had been prevented from executing the function until P1 had completed all steps.

  • How is it possible that my one-line code is causing a race condition?

    What appears as a single line in a high-level programming language, will actually translate to multiple assembly instructions that involve multiple clock cycles. The executing process or thread can therefore get interrupted anywhere along this sequence of assembly instructions. Thus, race condition is still possible. Microsoft Support shows on its site how a single line of Visual Basic code such as Total = Total + val1 translates to six assembly instructions for a x86-based processor.

  • What systems are prone to race conditions?

    Because race condition is related to timing, it can happen in any system where multiple processes "race" to access a shared resource. Hence, it's not limited to just real-time systems. Hardware containing multicore CPUs, common in parallel computing, can suffer from race condition since multiple processes are running at the same time on multiple cores or CPUs. In networked systems, the channel can be considered as a shared resource and if two users attempt to access the shared channel race condition will lead to a packet collision. Network links that have large delays, such as satellite links, can cause race condition. Multithreaded applications are prone to race condition.

    Web applications can suffer from race condition since multiple clients are sending requests in parallel to the web server, which may spawn multiple threads or processes to serve the requests. Even for a single user session, race condition is possible. For example, a web chat application may periodically send an AJAX request to check for the latest updates. At the same time, the user may create a new chat message.

  • Are there any best practices for avoiding race condition?

    The usual solution to avoid race condition is to serialize access to the shared resource. If one process gains access first, the resource is "locked" so that other processes have to wait for the resource to become available. Even if the operating system allows other processes to execute, they will get blocked on the resource. This allows the first process to access and update the resource safely. Once done, it will "release" the resource. One of the processes waiting on the resource will now get its chance to access the resource. Code protected this way using locks is called critical section. The idea of granting access to only one process is called mutual exclusion.

    Having large critical sections will affect performance. Such code may be refactored into smaller critical sections. In real-time systems or kernel code, another solution is to turn off interrupts when entering (and turn on when leaving) a critical section.

  • How to detect race condition in a program?

    The first symptom is that results are unpredictable. In fact, the random nature of race condition poses a problem for debugging since under debug mode the race condition may not appear. For this reason, code reviews are recommended to catch race condition. This really implies that best way to avoid race condition is by careful design and not by testing and debugging.

    There are tools that perform dynamic analysis and flag possible race conditions that may occur. Examples include Coverity's Thread Analyzer for Java and Intel Inspector. Clang Thread Safety Analysis is a static analysis tool to detect race condition. ThreadSanitizer is a data race detector written in C++ used by Clang and Go languages.

  • Do real-time operating systems (RTOS) have provisions to mitigate race conditions?

    The common mechanism used to allow safe access to shared resource is called mutex, which is short for mutual exclusion semaphore. Any process entering a critical section will first acquire the mutex. Other processes wanting access to the critical section will get blocked. The owning process will give up the mutex when leaving the critical section.

  • Are there different types of race conditions?
    Comparing data race vs general race. Source: Joshi 2017, 2:31.

    Netzer and Miller make this distinction:

    • General races: Programs designed to be deterministic behaving in nondeterministic manner.
    • Data races: Failure with accessing nonatomic critical sections in nondeterministic programs.

    They also claim that debugging data races is easier since it's local to the critical section. General races require analysis of entire execution to understand exactly how the expected deterministic behaviour deviated.

    A data race need not result in a race condition in the sense that program correctness is not compromised. A race condition that's not a data race implies a general race, for which the accompanying figure is an example.

  • Can you give examples of products that failed due to race conditions?

    Therac-25 was a software-controlled radiation therapy machine used in the 1980s. Six patients were overdosed and some died. Race condition is one of many reasons for the system's failure. Shared variables were accessed by both data entry subroutine and magnet control subroutine. When a race condition occurred, the magnet control subroutine failed to read new settings by the operator.

    In August 2003, Northeastern parts of North America suffered a massive blackout. "There was a couple of processes that were in contention for a common data structure, and through a software coding error in one of the application processes, they were both able to get write access to a data structure at the same time. And that corruption led to the alarm event application getting into an infinite loop and spinning."

    NASA's Mars Spirit Rover suffered a race condition whereby an initialization process could not obtain write access to a variable. An exception occurred. In another case, one an imaging module was attempting to read from memory while a deactivation process was also triggered. Other problems, possibly involving race condition, have been reported.

  • Can race conditions be useful?

    Race conditions are something software designers wish to avoid. But some researchers have attempted to show that they can be used to generate random numbers. This is dependent on the operating system's scheduler, which by itself is not random. Randomness is obtained by triggering context switches based on "execution environment, cache misses, the instruction execution pipeline and the imprecision of the hardware clock used to generate timer interrupts."

References

  1. Clang. 2017a. "Thread Safety Analysis." Clang 5 Documentation. Accessed 2017-03-09.
  2. Clang. 2017b. "ThreadSanitizer." Clang 5 Documentation. Accessed 2017-03-12.
  3. Colesa, A., R. Tudoran, and S. Banescu. 2008. "Software Random Number Generation Based on Race Conditions." 10th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, Timisoara. pp. 439-444. Accessed 2017-03-08.
  4. Daya. 2012. Principles of Operating Systems. Accessed 2017-03-09.
  5. Golang. 2017. "Data Race Detector." The Go Programming Language. Accessed 2017-03-12.
  6. Javarevisited. 2012. "What is Race Condition in Multithreading – 2 Examples in Java." Javarevisited. February. Accessed 2017-03-09.
  7. Jenkov, Jakob. 2016. "Race Conditions and Critical Sections." Jenkov.com. September 11. Accessed 2017-03-09.
  8. Joshi, Kavya. 2017. "Looking Inside a Race Detector." InfoQ. March 10. Accessed 2017-03-12.
  9. Leveson, Nancy. 1995. "Appendix A: Medical Devices: The Therac-25." Safeware: System Safety and Computers. Addison-Wesley. Accessed 2017-03-10.
  10. Microsoft. 2012. Description of race conditions and deadlocks. Microsoft Support. June 18. Accessed 2017-03-09.
  11. Netzer, Robert H. B., and Barton P. Miller. 1992. "What are race conditions?: Some issues and formalizations." ACM Letters on Programming Languages and Systems 1, no. 1: 74-88. March.
  12. Poulsen, Kevin. 2004. "Tracking the blackout bug." SecurityFocus. April 7. Accessed 2017-03-10.
  13. Regehr, John. 2011. "Race Condition vs. Data Race." Embedded in Academia Blog. March 13. Accessed 2017-03-12.
  14. Somayaji, Anil. 2010. "COMP 3000 Essay 1 2010 Question 6." November 8. Accessed 2017-03-10.
  15. Vaughan, Jack. 2008. Dynamic analysis tool from Coverity looks at concurrency defects. Tech Target. May 07. Accessed 2017-03-09.
  16. Wheeler, David A. 2015. "Avoid Race Conditions." Secure Programming HOWTO. September 19. Accessed 2017-03-08.
  17. Wikipedia. 2017a. Race condition. March 1. Accessed 2017-03-09.
  18. Wikipedia. 2017b. Intel Inspector. January 6. Accessed 2017-03-09.
  19. Wilson, Ron. 2004. "The trouble with Rover is revealed." EE Times. February 20. Accessed 2017-03-10.

Tags

See Also

  • Race condition (hardware)
  • Concurrency
  • Multithreading
  • Critical section
  • Mutual exclusion

Further Reading

  1. Joshi, Kavya. 2017. "Looking Inside a Race Detector." InfoQ. March 10. Accessed 2017-03-12.
  2. OsInfoBlog. 2013. "Critical Regions." June 27. Accessed 2017-03-09.
  3. Regehr, John. 2011. "Race Condition vs. Data Race." Embedded in Academia Blog. March 13. Accessed 2017-03-12.

Top Contributors

Last update: 2017-03-12 07:42:18 by devguru
Creation: 2017-03-08 12:28:39 by devguru

Article Stats

1326
Words
0
Chats
2
Authors
5
Edits
2
Likes
457
Hits

Cite As

Devopedia. 2017. "Race condition (software)." Version 5, March 12. Accessed 2017-11-19. https://devopedia.org/race-condition-software
BETA V0.11.2