Talk:Binary search algorithm

This is the talk page for discussing improvements to the Binary search algorithm article.
This is not a forum for general discussion of the article's subject.

Put new text under old text. Click here to start a new topic.
New to Wikipedia? Welcome! Learn to edit; get help.

Article policies

Find sources: Google (books · news · scholar · free images · WP refs) · FENS · JSTOR · TWL

Archives: 1, 2

This article is written in American English, which has its own spelling conventions (color, defense, traveled) and some terms that are used in it may be different or absent from other varieties of English. According to the relevant style guide, this should not be changed without broad consensus.

Binary search algorithm is a featured article; it (or a previous version of it) has been identified as one of the best articles produced by the Wikipedia community. Even so, if you can update or improve it, please do so.

This article appeared on Wikipedia's Main Page as Today's featured article on October 29, 2018.

Article milestones
Date	Process	Result
May 11, 2016	Good article nominee	Listed
February 25, 2017	Peer review	Reviewed
August 14, 2018	Featured article candidate	Promoted
A fact from this article appeared on Wikipedia's Main Page in the "Did you know?" column on May 23, 2016. The text of the entry was: Did you know ... that when the binary search algorithm* was assigned in a course for professional programmers, 90 percent of the programmers failed to provide a correct solution?*
Current status: Featured article

Computer science High‑importance

This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.Computer scienceWikipedia:WikiProject Computer scienceTemplate:WikiProject Computer scienceComputer science articles

High

This article has been rated as High-importance on the project's importance scale.

Things you can help WikiProject Computer science with:

Here are some tasks awaiting attention:

Article requests :
- Requested articles/Applied arts and sciences/Computer science, computing, and Internet
Cleanup :
- Computer science articles needing attention
- Computer science articles needing expert attention
Copyedit :
- Computing
Expand :
- Computer science
Infobox :
- Computer science articles without infoboxes
Maintain :
- Timeline of computing 2020–present
Photo :
- Find pictures for the biographies of computer scientists (see List of computer scientists)
- Computing articles needing images
Stubs :
- Computer science stubs
Unreferenced :
- WikiProject Computer science/Unreferenced BLPs
Project-related :
- Tag all relevant articles in Category:Computer science and sub-categories with {{WikiProject Computer science}}

Other talk page banners

This article was submitted to WikiJournal of Science for external peer review in 29 October 2018 (reviewer reports). It was published as Anthony Lin; et al. (2 July 2019). "Binary search algorithm" (PDF). WikiJournal of Science. 2 (1): 5. doi:10.15347/WJS/2019.005. ISSN 2470-6345. Wikidata Q81434400.{{cite journal}}: CS1 maint: unflagged free DOI (link) and the updated content was reintegrated into the Wikipedia page under a CC BY-SA-3.0 license (2019).

Primary Procedure can Yield Position in "unsuccessful" case[edit]

I observe that if the search terminates in Step 2 as "unsuccessful", the target value 'T' would be located between A_R and A_L. That is, R is the index (possibly -1) of the array element less than T and L is the index of the array element (possibly n) of the array element greater than T.

I find this property of the procedure useful when searching ranges. It is, I believe, one step faster than using the right-most or left-most variants on average.

I'm not sure if a note on the Article page is justified, but having this information would be helpful to those doing "range" searches. It may be that a note on the Article page violates some policy WRT cited work vs. original work, other Talk topics make me think it might.

This is not a comment on the utility of the left-most or right-most variants. They are useful when the array can contain duplicates. Usually no duplicates are present when the problem is to determine which range a value is in.

Glennp17321 (talk) 18:52, 17 May 2020 (UTC)[reply]

Ranges should have exclusive upper bounds[edit]

I've taught Binary Search multiple times at the university level. I've also asked it as an interview question about 50 times from people with Masters in CS. I've determined that a major reason that people fail my interview question is because binary search is usually taught with inclusive upper bounds, rather than exclusive ones. This Wikipedia article continues that habit. I think the article would be substantially improved by switching to exclusive upper bounds.

By "exclusive upper bounds", I mean that the initial state for an $n$ element array would become $L=0$ and $R=n$ instead of having $R=n-1$ . Almost all good array code I've seen uses an exclusive upper bound. All of Java's libraries use an exclusive upper bound. One nice property of the exclusive upper bound is that $R-L$ is the number of items. A corollary is that $L=R$ is a test for an empty range and $L<R$ is a test for a non-empty one. Another good property of exclusive upper bounds is that for any $M$ where $L$ ≤ $M$ ≤ $R$ , it is easy to split the range into two subranges $[L,M)$ and $[M,R)$ . These properties make reasoning about the code much easier.

When you use the exclusive upper bound, the code for binary search becomes pretty natural:

function binary_search_exclusive_upper_bound(A, n, T) is
    L := 0
    R := n
    while R-L ≥ 2 do
        m := L + floor((R-L)/ 2)
        if T < A[m] then
            R := m
        else:
            L := m
    if R-L=1 and A[L] = T then
        return L
    return unsuccessful

Nota Bene: I am assuming the "and" operator uses short-circuit evaluation.

Please compare the code above to the code on the main page. I hope you'll see that this code is much easier to reason about. You can see that (1) the loop runs only when it can split the range, (2) that $m$ is clearly not equal to $R$ and not equal to $L$ , (3) that it doesn't matter if I used $floor()$ or $ceil()$ to compute $m$ , (4) that range variables are correctly handled without off-by-one errors and (5) the code handles all cases, such as when $n$ is zero.

BTW, I believe that handling $n$ is zero, an empty range, is an important case. The paper by Pattis that is cited on the main page also considers it important. The main page's version of "binary_search_alternative" goes into an infinite loop if $n$ is zero. I see that as a consequence of using non-exclusive upper bounds, because the code on the main page is harder to reason about.

I believe binary search is usually taught with an inclusive upper bound because it is such an old algorithm. When it was created, we didn't have the habit (which I learned with C in the early 80's) of using an exclusive upper bound. But almost all array code now uses the exclusive upper bound. Using the exclusive upper bound is easier to reason with. I believe strongly that this article would be substantially improved by using exclusive upper bounds.

Michael Nahas, former Adjunct Faculty of University of Virginia, former Instructor at Johns Hopkins Center for Talented Youth

P.S. If you need an external reference, I wrote up my opinions on my website.

Mdnahas (talk) 23:07, 17 June 2021 (UTC)[reply]

Thank you very much for sharing your input! I definitely agree that the exclusive lower bound is a better and more logical way of implementing and reasoning about this algorithm. However, Wikipedia is not a place to "right great wrongs"; there at least needs to be a reference that confirms that this is actually the proper way to implement the algorithm. We can't use the thing you put on your website as a source, because it's self-published. Also, as you also mentioned, the inclusive upper bound is also still commonly used. I quickly looked up the .NET source code and it seems that they also use the inclusive upper bound in their binary search implementation (source). So I think that, if we were to add your version, the current one at least also needs to be mentioned, maybe under the "alternative procedure" section. ―Jochem van Hees (talk) 00:07, 19 June 2021 (UTC)[reply]

My quick impression is that having a half-open search range is likely to confuse more readers, when they start reading the article, than it helps later on with pseudocode simplifications. Also, my impression from other similar articles is that such a change is likely to lead to a nightmare of maintenance where we keep having to revert well-meaning freshmen who think they see an inconsistency in the inequalities and change it to look more consistent, without changing the rest of the code to match. So the situation is a little different from in a classroom, where you can control better the whole presentation. —David Eppstein (talk) 01:04, 19 June 2021 (UTC)[reply]

Using an exclusive upperbound is just good programming practice. I don't think I am trying to "right great wrongs" --- we're talking about adding 1 to a variable. I am just applying good practice to an algorithm. And, as I said, good practice is important to apply here. Many experienced programmers get this program wrong, because it is hard to reason about code that uses an inclusive upper bounds.

In fact, the exclusive upper bound is such good programming practice that C# version 8.0 has added a language feature where array ranges are specified with an exclusive upper bound.

Since Jochem looked up an implementation for the #5 programming language, let's look at what the other 4 do.

C's bsearch interface uses start + length. I couldn't find LLVM's implementation. The GCC implementation uses start + length.

The C++ STL function has an interface using exclusive upper bounds. The algorithm is implemented in std::lower_bound. The reference code works with start + length.

Python's bisect has an interface using an exclusive upper bound. It's implementation works using an exclusive upper bound.

Java's Arrays.binarySearch interface uses exclusive upper bounds. The OpenJDK implementation and Android implementation use an inclusive upper bound. The GNU implementation uses exclusive upper bounds.

I don't deny that the inclusive upper bound is used. It is how the literature first wrote about the algorithm. It is how most textbooks present it. It seems to be a leftover from before we knew that it was bad programming practice to use it. I'd like you to notice that none of these languages use the inclusive bounds for their interface. None. If there "needs to be a reference that confirms that this is actually the proper way to implement the algorithm", I would say that that is it.

I don't care if the Wikipedia page uses start+length or exclusive upper bounds. Either is fine programming practice. If David is worried about well-meaning freshman modifying code with the exclusive upper bounds, can we agree on start+length?

Mdnahas (talk) 19:09, 5 July 2021 (UTC)[reply]

If the inclusive upper bound is "how most textbooks present it" then that is a strong argument for how we should present it here. In this sort of thing, Wikipedia should be a follower, not a leader. —David Eppstein (talk) 19:39, 5 July 2021 (UTC)[reply]

Wikipedia "should be a follower" is an argument for Wikipedia not being a place for research or new scholarship. Changing a inclusive upper bound to start+length is not a new idea, nor is it significant to the algorithm itself. Moreover, when saying "Wikipedia should be a follower", I think our reference should be the code being run, and not what's in textbooks. That the code departs from the textbooks is an important detail, in my opinion.

I think the prime goal of Wikipedia is to clearly communicating an idea. I am confident that using an inclusive upper bound does not clearly communicate this algorithm. Mdnahas (talk) 17:07, 28 July 2021 (UTC)[reply]

Wikipedia is not the "place for research or new scholarship." Wikipedia:No original research covers that. Wikipedia reports what reliable sources report. ~Anachronist (talk) 17:22, 28 July 2021 (UTC)[reply]

Apart from agreeing with David about following sources, I disagree with a lot that is written here. At the moment, what I think is being called the "inclusive" method, the upper and lower positions of where the target may be are treated the same. The algorithm in concept is symmetrical with respect to upper and lower. The proposed "exclusive" method proposes that the upper limit and the lower limit be treated differently, and this is claimed to be better. Well, BS. Adding extra concepts to an algorithm makes it harder to understand, not easier. I'm not convinced by the experience argument either, as I taught binary search to undergraduates more than 30 times without a similar experience. McKay (talk) 04:04, 29 July 2021 (UTC)[reply]

Approximate matches doesn't belong in this article[edit]

First of all, someone needs to define which regex means "Approximate matches"

I believe that they are exclusively talking about "begins with..." / "starts with..."

but it has almost nothing to do with binary searching. Even though it's interesting, it's not related to the Binary Search Algorithm.

AlwaysAngry (talk) 18:46, 25 November 2022 (UTC)[reply]

Examples of binary search described in ancient texts[edit]

Here are two instances of ancient texts that describe finding an item (person) in a list (group) by halving the size of the list in each iteration and examining a predicate over each. These instances may be earlier than the currently listed reference to sorted lists in a Babylonian tablet.

1. The Losaka story in the Jataka tales:

So in time it came to pass that the people fell into a wretched plight. Reflecting that such had not been their lot in former days, but that now they were going to rack and ruin, they concluded that there must be some breeder of misfortune among them, and resolved to divide into two bands. This they did; and there were then two bands of five hundred families each. Thence-forward, ruin dogged the band which included the parents of the future Losaka, whilst the other five hundred families throve apace. So the former resolved to go on halving their numbers, and did so, until this one family was parted from all the rest.

src: https://www.sacred-texts.com/bud/j1/j1044.htm

2. 1 Samuel 14:41 in the Old Testament, as described in Urim and Thummim#Form_and_function.

src: https://en.wikisource.org/wiki/Bible_(King_James)/1_Samuel#14:41

- 152.44.156.175 (talk) 07:56, 4 July 2023 (UTC)[reply]

Section on galloping binary search?[edit]

Should we consider adding a section on galloping binary search, which operates differently, but it fundamentally the same algorithm? See for example Cormack's book on search systems. DMH43 (talk) 17:05, 29 December 2023 (UTC)[reply]