# Binary search

## Summary

Binary search is an algorithm for efficiently searching for an item within a sorted list of items. The search is performed in steps, with each step reducing the search space by half (with linear or sequential search the search space reduces by only one item). At each step, the midpoint is selected and compared against the search value. If the midpoint is lesser than the search value, only the upper half is retained for the next step.

When formulated in a more abstract way, binary search can be used to solve optimization problems with the use of binary predicates.^{} Whenever the predicate is evaluated, the algorithm will retain only half the search space where the predicate is true. In this case, binary search can be seen as "searching" for an optimum solution rather than searching for a specific item.

## Discussion

How can we derive the complexity of binary search? Given a list of \(n\) items, linear search is of complexity \(O(n)\). Binary search has a complexity of only \(O(log_{2}n)\).

^{}For comparison, linear search on a list of 1000 items may take on average 500 steps and 1000 steps in the worst case. The same with binary search will only take about \(log_{2}1000\approx10\) steps.A video tutorial by GeeksQuiz explains how to derive this complexity.

^{}Briefly, since each step reduces the search space by half, and given that \(k\) steps are required, this implies \(n=2^k\), which translates to \(k=log_{2}n\). Binary search is said to take the divide-and-conquer approach.^{}Computational complexity described above is in terms of number of steps to complete the search. An optimized implementation would use only one comparison per step.

^{}Algorithms also take up memory to manage operations. The space complexity of binary search is \(O(1)\), which means that it is constant and independent of \(n\).^{}What are the prerequisites for applying binary search? Binary search can be applied only under the following conditions:

- List is sorted: Since half the search space is discarded based on a midpoint value comparison, binary search works only on a sorted list.
- Random access is possible: Binary search maintains the boundaries of the search space in terms of left and right indices. Midpoint is calculated based on these indices. To access the value of the current midpoint item, the item must be accessible by its index. This works nicely with data structures that support indexing such as arrays (C, Java) and lists (Python). Using binary search on a data structures such as a linked list is wasteful since we need to traverse the list item by item to get to the midpoint.
- Size: Size of the search space must be known.

If my array is unsorted, what search should I use? There are two approaches: sort the array first and then apply binary search; or simply apply linear search. The question is which of these two is faster. Let's use a fast sorting algorithm such as mergesort. This has a complexity of \(O(n\,log_{2}n)\). Linear search has a complexity of \(O(n)\). Since \(O(n\,log_{2}n) > O(n)\), to use linear search is better.

This also implies that if your application does frequent updates to the list, avoid binary search since keeping the list in sorted order can reduce performance.

^{}Can we do a 3-way or even n-ary search for faster results? Every comparison is between two alternatives, hence binary in nature. To decide among three alternatives we would need two comparisons anyway. A binary search is therefore superior since with two comparisons it can handle \(2^{2}=4\) alternatives. The same argument works for n-ary searches. An alternative explanation is available from Geeks For Geeks.

^{}When the distribution of data is known and the lists are large, then interpolation search can do better than binary search.

^{}Hash tables are faster because they maintain a mapping of key-to-value. However, binary search is more versatile since they can be used to solve optimization problems as well.^{}Can you give an example of binary search for solving optimization problems? TopCoder gives an example called FairWorkload.

^{}There are \(n\) filing cabinets each with a variable number of folders and \(k\) available workers to process the folders. Each worker must be assigned one or more contiguous filing cabinets to process. Under the constraint of worker availability, how many workers do we need so that workload is most evenly distributed while minimizing the number of folders each worker processes? This can be solved using binary search. The search space is the number of folders per worker.The predicate can be stated as, "Can the workload be spread so that each worker has to examine no more than x folders, with the limited number of workers available?"

^{}With each step, we can calculate the number of workers required for a certain maximum number of files per worker. If this number is less than or equal to \(k\), it implies that we can move the upper limit to the current midpoint.Another example is to find the minimum of a mathematical function.

^{}Another example is to find the square root of a real number.^{}How is binary search different from binary search tree? Binary search is an algorithm. Binary search tree is a data structure. Other than this fundamental difference, both share the same principle in terms of searching for an item. Search complexity is logarithmic in time. With binary search tree, each node in the tree has a left and a right branch for items of lesser and greater value respectively. This means that when the tree is searched, half the search space is discarded either to the left or right. However, a tree that's not balanced can result in worse performance. Even a balanced binary search tree is not necessarily optimal for searching compared to binary search.

^{}Unlike binary search, binary search trees can't be used to solve optimization problems. Binary search trees also take up more space compared to sorted arrays. Binary search trees have the advantage that insertion and deletion of items are also logarithmic in time, unlike sorted arrays where such operations are of complexity \(O(n)\).

Should we implement binary search iteratively or recursively? For algorithms in general, there's no general rule that one is better than the other.

^{}Sandeep Gupta shows an example where recursive implementation is easier to read.^{}With respect to binary search, for large arrays, recursive code can lead to stack overflows. Likewise, recursive code can be slower due to the extra function calls. However, if the code can be refactored to tail recursive, speed and stack problems can be overcome with tail-call optimization.

^{}Which one to adopt might ultimately be a matter of personal choice for developers. Adopt one that's easier for you to understand. There's nothing worse that a recursive code that doesn't have a proper terminating condition and hence goes into infinite recursion.

^{}What are the variations of binary search? Variations include the following:

^{}- Uniform binary search
^{} - Boundary search
- Fibonacci search
- Exponential search
- Interpolation search
^{} - Fractional cascading

- Uniform binary search
What are the usual challenges for implementing binary search? Typical binary search maintains the left and right indices of the search space and calculates the midpoint. For the next step, these indices and midpoints are adjusted. If one is not careful, the search item may be on the boundary and may get missed out due to "off-by-one" error. Don't overconstrain the bounds. Relax them while adhering to the predicate.

^{}Also, when calculating the midpoint, take care that overflow and truncation errors are avoided or handled correctly.Always test for corner cases. Will the code work for odd and even number of list items? Will it work for 1-item or 2-item lists? How will it behave if there are duplicate entries of the search term? Will it exit cleanly if the search term is not found? Does it work correctly when searching for real numbers? With real numbers, match of the search term should be based on either the number of steps or the relative difference between steps. Avoid using absolute difference since precision is limited when numbers are large.

^{}If a recursive implementation is used, be aware how many recursive calls will be possible due to stack space limits.

Can you give examples of overflow and truncation errors? When calculating the midpoint on a large array, indices may be large values. Thus,

`mid = (lo + hi)/2`

may result in an overflow. Change this to`mid = lo + (hi-lo)/2`

. On a 2-element array, with lo=0 and hi=1, midpoint will remain at 0 due to truncation. If the match is at index 1, this will result in an infinite loop. The correct code in such a case would be`mid = lo + (hi-lo+1)/2`

.^{}What support do we have from languages to use or implement binary search? While it's possible to do one's own implementation, many languages provide libraries that simplify implementation:

^{}- C++ Standard Template Library: lower_bound, upper_bound, binary_search, equal_range
- Java: Arrays.binary_search
- .NET Framework: Array.BinarySearch
- Python: bisect
^{}

## Sample Code

## References

- Anmol. 2017. "Why is Binary Search preferred over Ternary Search?" GeeksforGeeks. Accessed 2017-03-24.
- GeeksQuiz. 2016. "Binary Search." YouTube. March 28. Accessed 2017-03-22.
- Gupta, Sandeep. 2011. "Recursion v/s Iteration." sangupta.com. February 6. Accessed 2017-03-22.
- Herman, Jaken. 2015. "What benefit is there to using recursive binary search over iterative binary search or vice versa?" StackExchange: Software Engineering. February 19. Accessed 2017-03-22.
- Johnson, Max. 2015a. "5 Gifs to Understand Binary Search Trees." Penjee.com Blog. November 22. Accessed 2017-03-26.
- Johnson, Max. 2015b. "Binary Vs Linear Search Through Animated Gifs." Penjee.com Blog. December 15. Accessed 2017-03-26.
- Knuth, Donald E. 1997. The Art of Computer Programming. Second Edition. Vol 3: Sorting and Searching.
- Lovro. 2017. "Binary Search." TopCoder. Accessed 2017-03-22.
- Manna, Zohar, and Richard Waldinger. 1986. "The Origin of a Binary-Search Paradigm." Technical Note 351R. SRI International. October. Accessed 2017-03-24.
- Pfaff, Ben. 2004. "What are the advantages of iteration over recursion, and vice versa?" January 2. Accessed 2017-03-24.
- Popov, Stoimen. 2015. "Binary Search." Algorithm's Wiki on GitHub. October 16. Accessed 2017-03-26.
- Rosetta Code. 2017. "Binary search." March 7. Accessed 2017-03-24.
- Seguin, Karl. 2012. "Binary Search." May 15. Accessed 2017-03-24.
- TheCodeAddict. 2011a. "Searching Algorithms – Part 2." TheCodeAddict. November 17. Accessed 2017-03-24.
- TheCodeAddict. 2011b. "Searching Algorithms – Part 3." TheCodeAddict. November 18. Accessed 2017-03-24.
- Tillema, Erik. 2016. "The Joy of Algorithms & Data Structures - part 2." Infi. April 5. Accessed 2017-03-22.
- Venki. 2013. "The Ubiquitous Binary Search: Set 1." GeeksforGeeks. February 4. Accessed 2017-03-22.
- Wikipedia. 2016. "Interpolation search." November 2. Accessed 2017-03-22.
- Wikipedia. 2017. "Binary search algorithm." March 18. Accessed 2017-03-22.

## Milestones

## Tags

## See Also

- Uniform binary search
- Interpolation search
- Divide and conquer
- Big O notation
- Algorithmic complexity
- Recursion

## Further Reading

- Cormen, Thomas and Devin Balkcom. 2017. "Binary search." Khan Academy. Accessed 2017-03-24.
- Lovro. 2017. "Binary Search." TopCoder. Accessed 2017-03-22.
- Wikipedia. 2017. "Binary search algorithm." March 18. Accessed 2017-03-22.