Talk:Random binary tree
Random binary tree has been listed as one of the Mathematics good articles under the good article criteria. If you can improve it further, please do so. If it no longer meets these criteria, you can reassess it. Review: March 29, 2024. (Reviewed version). |
This article is rated GA-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects: | ||||||||||||||||||||||||||||
|
GA Review
[edit]GA toolbox |
---|
Reviewing |
- This review is transcluded from Talk:Random binary tree/GA1. The edit link for this section can be used to add comments to the review.
Nominator: David Eppstein (talk · contribs)
Reviewer: Czarking0 (talk · contribs) 02:28, 28 March 2024 (UTC)
Hello and thanks for your contribution. At a glance this seems like a cool article. I will attempt to provide a though and well paced review. I am placing this table below too keep track of the criteria but feel free to suggest another format.
Rate | Attribute | Review Comment |
---|---|---|
1. Well-written: | ||
1a. the prose is clear, concise, and understandable to an appropriately broad audience; spelling and grammar are correct. |
| |
1b. it complies with the Manual of Style guidelines for lead sections, layout, words to watch, fiction, and list incorporation. |
| |
2. Verifiable with no original research: | ||
2a. it contains a list of all references (sources of information), presented in accordance with the layout style guideline. |
| |
2b. reliable sources are cited inline. All content that could reasonably be challenged, except for plot summaries and that which summarizes cited content elsewhere in the article, must be cited no later than the end of the paragraph (or line if the content is not in prose). |
Looks good, read through several of them. Byline checked all. | |
2c. it contains no original research. |
Looks good | |
2d. it contains no copyright violations or plagiarism. |
Earwig looks good | |
3. Broad in its coverage: | ||
3a. it addresses the main aspects of the topic. |
| |
3b. it stays focused on the topic without going into unnecessary detail (see summary style). |
| |
4. Neutral: it represents viewpoints fairly and without editorial bias, giving due weight to each. |
No issues | |
5. Stable: it does not change significantly from day to day because of an ongoing edit war or content dispute. |
No issues | |
6. Illustrated, if possible, by media such as images, video, or audio: | ||
6a. media are tagged with their copyright statuses, and valid non-free use rationales are provided for non-free content. |
The author has taken the time to generate these great images and dedicate them to the public domain! | |
6b. media are relevant to the topic, and have suitable captions. |
Super great | |
7. Overall assessment. |
Article looks good I left a couple notes to the author on what might make it better but I do not think they make a difference for GA. I appreciate this deeply knowledgeable author writing a very clear article on what is an important subject. I especially appreciate the images. |
Some replies:
Re 1a: "Derivation": you mean a formal proof? Really? Or am I misunderstanding this point? "Almost surely": should have been with high probability (that is, probability tending to 1 in the limit of large rather than probability 1 even for fixed ; fixed. Re "(or extended...)": removed.
- After reading it again I changed my mind and I think it is fine the way it is. Done Czarking0 (talk) 01:51, 29 March 2024 (UTC)
Re 1b: There is only one sentence about treaps in the lead. The paragraph it is in is mostly not about treaps. It is intended as a brief summary of the long "from random permutations" section, most of which is motivated by the average-case analysis of insertion-only binary search trees without any rebalancing. Treaps are a trick to get that same analysis to work for worst-case inputs with both insertions and deletions; I think they are worth mentioning in the lead, but not for more than a sentence. Re critical Galton-Watson trees: it's difficult to split out the p=1/2 cases from the p<1/2 and p>1/2 cases in the analysis, but I added a paragraph break before the algorithmic application as suggested, and added subsection headers to the whole section.
Re 2a and "only the weaker upper bound": removed "only" and "actually" to avoid the implication that this is the last word on the subject (although I think it is).
- I like this change and I think it is fine the way it is. The problem I am trying to avoid is when an article claims something that was true at the time of writing but a new proof is found later then years go by with no change. That is probably outside the context of a GAN.
Re 3a: "Is it important that the distribution is uniform?" (for treap priorities) No. It is important that they be independent and (likely) distinct, but uniformity doesn't matter. That's why I omitted it from this part. Some of the later works cited in the last paragraph of this section address the issue that getting computer representations of uniform reals to enough precision to make sure they're all distinct needs a lot of bits of randomness per number (and then all these bits need to be stored in the treap), leading to unnecessary inefficiencies. Instead they use more complicated schemes to save bits, but then they don't get trees whose distribution perfectly matches the uniform insertion order distribution.
- Just to be clear, and because I find this surprising, I would still be correct if I replaced random with exponentially distributed or beta distributed (to maintain the unit interval) such that it read: "By choosing the priorities to be independent exponentially distributed real numbers in the unit interval, and by maintaining the Cartesian tree structure using tree rotations after any insertion or deletion of a node, it is possible to maintain a data structure that behaves like a random binary search tree." I wanted to make sure I am on the same page with you in that it is not just that it does not need to be uniform or even approximately uniform. Czarking0 (talk) 01:51, 29 March 2024 (UTC)
- Ok after reading more I get it. Since "behaves like" is talking about the structure of the tree and not making claims about the priorities themselves it does not really matter how the priorities are assigned. The treap article covers this well. I think we are Done here. Czarking0 (talk) 03:13, 29 March 2024 (UTC)
Re: "Why does the reader want to know about radix trees?": If the reader cares about probability distributions over random trees, the radix tree gives you a predetermined number of internal nodes, unlike the trie. If the reader cares about data structures, they need to know about both tries and radix trees as both are standard ideas in this area. If the reader cares about algorithms, both tries and radix trees model the behavior of certain binary-representation-based divide-and-conquer sorting algorithms on random inputs, but different algorithms. The trie models a sorting algorithm that divide on the bits of the binary numbers given, in strict succession from most significant to least significant. The radix tree models a sorting algorithm that finds and then divides on the next bit that separates some inputs. My personal interests are in data structures and algorithms, but actually my own motivation for including the section on tries and radix trees in this article was less about algorithms and more from the point of view of probability distributions over random trees: they give a quite different distribution than the other ones described (more strongly balanced, especially in the radix tree case) and I thought that made an interesting contrast to the other distributions.
- Great reply, I think the article would be served a little bit by saying some of this. There is already radix tree but an uninformed reader may not know that they want to click on that page just from reading this. Something like, "Radix trees provide a baseline for comparing data structures. In divide-and-conquer algorithms, radix trees model the behavior on random inputs" I am not sure if those statements are 100% accurate I am just compressing your reply. Citations will likely be needed unless your sources, which I will review shortly, have some info on that. Czarking0 (talk) 01:51, 29 March 2024 (UTC)
- Yes, I suspect citations motivating this as a subtopic of random trees more generally (rather than just talking about the data structures and their analysis) are likely to be sparse. —David Eppstein (talk) 06:04, 29 March 2024 (UTC)