User:Esquivalience/sandbox: Difference between revisions

From Wikipedia, the free encyclopedia
==== Successful searches ====
<!-- E stands for "expected" -->
In the binary tree representation, a successful search can be represented by a path from the root to the target node, called an ''internal path''. The length of a path is the number of edges (connections between nodes) that the path passes through. The number of iterations performed by a search, given that the corresponding path has length <math>l</math>, is <math>l + 1</math> counting the initial iteration. The ''internal path length'' is the sum of the lengths of all unique internal paths. Since there is only one path from the root to any single node, each internal path represents a search for a specific element. If there are <math>n</math> elements, where <math>n</math> is a positive integer, and the internal path length is <math>I(n)</math>, then the average number of iterations for a successful search is <math>T(n) = 1 + \frac{I(n)}{n}</math>, with the one iteration added to count the initial iteration.

Since binary search is the optimal algorithm for searching with comparisons, this problem is reduced to calculating the minimum internal path length of all binary trees with <math>n</math> nodes, which is equal to:

<math>
I(n) = \sum_{k=1}^n \left \lfloor \log_2(k) \right \rfloor
</math>

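For example, in a 7-element array, the root requires one iteration, the two elements below the root require two iterations, and the four elements below require three iterations. In this case, the internal path length is:

<math>
I(7) = \sum_{k=1}^{7} \left \lfloor \log_2(k) \right \rfloor = 0 + 2(1) + 4(2) = 10
</math>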

The average number of iterations would be <math>1 + \frac{10}{7} = 2 \frac{3}{7}</math> based on the equation for the average case. The sum for <math>I(n)</math> can be simplified to:

<math>
I(n) = \sum_{k=1}^n \left \lfloor \log_2(k) \right \rfloor = (n + 1)\left \lfloor \log_2(n + 1) \right \rfloor - 2^{\left \lfloor \log_2(n+1) \right \rfloor + 1} + 2
</math>

Substituting <math>I(n)</math> into the equation for <math>T(n)</math>:

<math>
T(n) = 1 + \frac{(n + 1)\left \lfloor \log_2(n + 1) \right \rfloor - 2^{\left \lfloor \log_2(n+1) \right \rfloor + 1} + 2}{n}
</math>

For integer <math>n</math>, this is equivalent to the equation for the average case on a successful search specified above.

<math>
T(n) = 1 + \frac{(n + 1)\left \lfloor \log_2(n + 1) \right \rfloor - 2^{\left \lfloor \log_2(n+1) \right \rfloor + 1} + 2}{n} = \lfloor \log_2 (n) \rfloor + 1 - (2^{\lfloor \log_2 (n) \rfloor + 1} - \lfloor \log_2 (n) \rfloor - 2)/n
</math>

==== Unsuccessful searches ====
Unsuccessful searches can be represented by augmenting the tree with ''external nodes'', which forms an ''extended binary tree''. If an internal node, or a node present in the tree, has fewer than two child nodes, then additional child nodes, called external nodes, are added so that each internal node has two children. By doing so, an unsuccessful search can be represented as a path to an external node, whose parent is the single element that remains during the last iteration. An ''external path'' is a path from the root to an external node. The ''external path length'' is the sum of the lengths of all unique external paths. An extended binary tree with <math>n</math> internal nodes has <math>n + 1</math> external nodes, one for each interval between and outside the elements. A search whose external path has length <math>l</math> compares the target against the <math>l</math> internal nodes on that path, so it performs <math>l</math> iterations. If there are <math>n</math> elements, where <math>n</math> is a positive integer, and the external path length is <math>E(n)</math>, then the average number of iterations for an unsuccessful search is <math>T'(n) = \frac{E(n)}{n + 1}</math>.

This problem can similarly be reduced to determining the minimum external path length of all binary trees with <math>n</math> nodes. In a tree with minimum external path length, all external nodes lie within the two deepest levels of the extended binary tree, so an unsuccessful search ends at either the second-deepest or the deepest level, depending on where the target value falls.

In particular, let <math>l(n)</math> be the number of iterations in the worst case, which is also the number of levels of internal nodes: <math>l(n) = \lfloor \log_2 (n) \rfloor + 1</math>. Numbering the levels from 0 at the root, the internal nodes lie on levels 0 through <math>l(n)-1</math>, and every level above level <math>l(n)-1</math> is full. The external nodes therefore lie on the second-deepest level of the extended binary tree (level <math>l(n)-1</math>), which holds <math>2^{l(n)} - n - 1</math> of them, and on the deepest level (level <math>l(n)</math>), which holds <math>2n + 2 - 2^{l(n)}</math> of them, for a total of <math>n + 1</math>. Therefore,

<math>
E(n) = (l(n)-1)(2^{l(n)} - n - 1) + l(n)(2n + 2 - 2^{l(n)}) = (n + 1)(l(n)+1)-2^{l(n)}
</math>

Revision as of 21:25, 29 September 2018

Derivation of average case

The average number of iterations performed by binary search depends on the probability of each element being searched. The average case is different for successful searches and unsuccessful searches. It will be assumed that each element is equally likely to be searched for successful searches. For unsuccessful searches, it will be assumed that the intervals between and outside elements are equally likely to be searched. The average case for successful searches is the number of iterations required to search every element exactly once, divided by <math>n</math>, the number of elements. The average case for unsuccessful searches is the number of iterations required to search an element within every interval exactly once, divided by <math>n + 1</math>, the number of intervals. (Knuth 1998, §6.2.1 ("Searching an ordered table"), subsection "Further analysis of binary search".)

Successful searches

In the binary tree representation, a successful search can be represented by a path from the root to the target node, called an internal path. The length of a path is the number of edges (connections between nodes) that the path passes through. The number of iterations performed by a search, given that the corresponding path has length <math>l</math>, is <math>l + 1</math> counting the initial iteration. The internal path length is the sum of the lengths of all unique internal paths. Since there is only one path from the root to any single node, each internal path represents a search for a specific element. If there are <math>n</math> elements, where <math>n</math> is a positive integer, and the internal path length is <math>I(n)</math>, then the average number of iterations for a successful search is <math>T(n) = 1 + \frac{I(n)}{n}</math>, with the one iteration added to count the initial iteration.

Since binary search is the optimal algorithm for searching with comparisons, this problem is reduced to calculating the minimum internal path length of all binary trees with <math>n</math> nodes, which is equal to:

<math>I(n) = \sum_{k=1}^n \left \lfloor \log_2(k) \right \rfloor</math>
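The claim can be checked numerically. The short Python sketch below (the function names are hypothetical, chosen for illustration) computes <math>I(n)</math> directly from the sum, using the fact that <math>\lfloor \log_2(k) \rfloor</math> equals `k.bit_length() - 1` for a positive integer <math>k</math>, and compares it with the internal path length of a tree that fills each level before starting the next. Because level <math>d</math> holds at most <math>2^d</math> nodes, the <math>k</math>-th node of any binary tree, taken in order of depth, sits at depth at least <math>\lfloor \log_2(k) \rfloor</math>, so the level-filled shape is the one that attains the minimum.

```python
def internal_path_length(n: int) -> int:
    """Hypothetical helper: I(n) as the sum of floor(log2(k)) for k = 1..n."""
    # For a positive integer k, k.bit_length() - 1 equals floor(log2(k)) exactly,
    # which avoids floating-point rounding in math.log2 for large k.
    return sum(k.bit_length() - 1 for k in range(1, n + 1))


def level_filled_path_length(n: int) -> int:
    """Internal path length of a tree that fills each level before starting the next."""
    total, depth, remaining = 0, 0, n
    while remaining > 0:
        on_level = min(remaining, 2 ** depth)  # level `depth` holds at most 2**depth nodes
        total += on_level * depth              # each of them has a path of length `depth`
        remaining -= on_level
        depth += 1
    return total


print(internal_path_length(7), level_filled_path_length(7))  # 10 10
```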

For example, in a 7-element array, the root requires one iteration, the two elements below the root require two iterations, and the four elements below require three iterations. In this case, the internal path length is:

<math>I(7) = \sum_{k=1}^{7} \left \lfloor \log_2(k) \right \rfloor = 0 + 2(1) + 4(2) = 10</math>
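The 7-element example can be reproduced by counting iterations directly. This Python sketch assumes a common three-way-comparison binary search that probes the midpoint <math>\lfloor (lo + hi)/2 \rfloor</math>; the helper name `count_iterations` is hypothetical. Since a path of length <math>l</math> costs <math>l + 1</math> iterations, subtracting one iteration per element from the total recovers <math>I(7)</math>.

```python
from fractions import Fraction


def count_iterations(arr, target):
    """Hypothetical helper: iterations a three-way binary search takes on sorted arr."""
    lo, hi = 0, len(arr) - 1
    iterations = 0
    while lo <= hi:
        iterations += 1
        mid = (lo + hi) // 2  # floor of the midpoint
        if arr[mid] < target:
            lo = mid + 1
        elif arr[mid] > target:
            hi = mid - 1
        else:
            return iterations  # target found
    return iterations  # target absent


arr = list(range(7))
counts = [count_iterations(arr, x) for x in arr]
print(counts)                           # [3, 2, 3, 1, 3, 2, 3]
print(sum(counts) - len(arr))           # internal path length: 10
print(Fraction(sum(counts), len(arr)))  # average iterations: 17/7
```

The element at index 3 is the root, indices 1 and 5 sit one level below it, and the remaining four elements are leaves, matching the description above.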

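The average number of iterations would be <math>1 + \frac{10}{7} = 2 \frac{3}{7}</math> based on the equation for the average case. The sum for <math>I(n)</math> can be simplified to:

<math>I(n) = \sum_{k=1}^n \left \lfloor \log_2(k) \right \rfloor = (n + 1)\left \lfloor \log_2(n + 1) \right \rfloor - 2^{\left \lfloor \log_2(n+1) \right \rfloor + 1} + 2</math>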
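A quick way to gain confidence in the closed form is to compare it with the term-by-term sum over a range of sizes. In this sketch (function names hypothetical), <math>\lfloor \log_2(n + 1) \rfloor</math> is computed exactly as `(n + 1).bit_length() - 1` rather than through floating-point logarithms.

```python
def internal_path_length_sum(n: int) -> int:
    """I(n) computed term by term: sum of floor(log2(k)) for k = 1..n."""
    return sum(k.bit_length() - 1 for k in range(1, n + 1))


def internal_path_length_closed(n: int) -> int:
    """I(n) from the closed form (n + 1) * f - 2**(f + 1) + 2, f = floor(log2(n + 1))."""
    f = (n + 1).bit_length() - 1  # exact floor(log2(n + 1)) for positive integers
    return (n + 1) * f - 2 ** (f + 1) + 2


# Compare the two expressions for every size up to 2048.
for n in range(1, 2049):
    assert internal_path_length_sum(n) == internal_path_length_closed(n), n

print(internal_path_length_closed(7))  # 10
```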

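Substituting <math>I(n)</math> into the equation for <math>T(n)</math>:

<math>T(n) = 1 + \frac{(n + 1)\left \lfloor \log_2(n + 1) \right \rfloor - 2^{\left \lfloor \log_2(n+1) \right \rfloor + 1} + 2}{n}</math>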

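For integer <math>n</math>, this is equivalent to the equation for the average case on a successful search specified above.

<math>T(n) = 1 + \frac{(n + 1)\left \lfloor \log_2(n + 1) \right \rfloor - 2^{\left \lfloor \log_2(n+1) \right \rfloor + 1} + 2}{n} = \lfloor \log_2 (n) \rfloor + 1 - (2^{\lfloor \log_2 (n) \rfloor + 1} - \lfloor \log_2 (n) \rfloor - 2)/n</math>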
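The equivalence can be spot-checked with exact rational arithmetic. The sketch below (function names hypothetical) evaluates <math>T(n)</math> once by substituting the closed form of <math>I(n)</math> into <math>1 + I(n)/n</math>, and once as <math>\lfloor \log_2 (n) \rfloor + 1 - (2^{\lfloor \log_2 (n) \rfloor + 1} - \lfloor \log_2 (n) \rfloor - 2)/n</math>, the simplified form given in the wikitext of this revision, using Python's `fractions.Fraction` so no rounding is involved.

```python
from fractions import Fraction


def t_substituted(n: int) -> Fraction:
    """T(n) = 1 + I(n)/n with I(n) replaced by its closed form."""
    f = (n + 1).bit_length() - 1  # floor(log2(n + 1))
    i_n = (n + 1) * f - 2 ** (f + 1) + 2
    return 1 + Fraction(i_n, n)


def t_simplified(n: int) -> Fraction:
    """floor(log2 n) + 1 - (2**(floor(log2 n) + 1) - floor(log2 n) - 2) / n."""
    k = n.bit_length() - 1  # floor(log2(n))
    return k + 1 - Fraction(2 ** (k + 1) - k - 2, n)


for n in range(1, 2049):
    assert t_substituted(n) == t_simplified(n), n

print(t_simplified(7))  # 17/7, i.e. 2 3/7
```

The two forms differ only when <math>n + 1</math> is a power of two, where <math>\lfloor \log_2(n + 1) \rfloor = \lfloor \log_2(n) \rfloor + 1</math>; the extra terms cancel in that case.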

Unsuccessful searches

Unsuccessful searches can be represented by augmenting the tree with external nodes, which forms an extended binary tree. If an internal node, or a node present in the tree, has fewer than two child nodes, then additional child nodes, called external nodes, are added so that each internal node has two children. By doing so, an unsuccessful search can be represented as a path to an external node, whose parent is the single element that remains during the last iteration. An external path is a path from the root to an external node. The external path length is the sum of the lengths of all unique external paths. An extended binary tree with <math>n</math> internal nodes has <math>n + 1</math> external nodes, one for each interval between and outside the elements. A search whose external path has length <math>l</math> compares the target against the <math>l</math> internal nodes on that path, so it performs <math>l</math> iterations. If there are <math>n</math> elements, where <math>n</math> is a positive integer, and the external path length is <math>E(n)</math>, then the average number of iterations for an unsuccessful search is <math>T'(n) = \frac{E(n)}{n + 1}</math>.
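One way to see the external path length concretely is to run one unsuccessful search per interval and add up the iterations. This sketch assumes the same three-way binary search with a floor midpoint as before; the helper names and the choice of even elements with odd probe values (one probe per interval) are illustrative assumptions, not part of the derivation.

```python
from fractions import Fraction


def count_iterations(arr, target):
    """Hypothetical helper: iterations a three-way binary search takes on sorted arr."""
    lo, hi = 0, len(arr) - 1
    iterations = 0
    while lo <= hi:
        iterations += 1
        mid = (lo + hi) // 2
        if arr[mid] < target:
            lo = mid + 1
        elif arr[mid] > target:
            hi = mid - 1
        else:
            return iterations
    return iterations


def unsuccessful_stats(n: int):
    """Return (E(n), T'(n)) by probing one absent value in each of the n + 1 intervals."""
    arr = [2 * i for i in range(n)]             # elements 0, 2, ..., 2n - 2
    probes = [2 * i - 1 for i in range(n + 1)]  # -1, 1, ..., 2n - 1: one per interval
    e_n = sum(count_iterations(arr, p) for p in probes)
    return e_n, Fraction(e_n, n + 1)


print(unsuccessful_stats(7))  # (24, Fraction(3, 1))
print(unsuccessful_stats(4))  # (12, Fraction(12, 5))
```

For <math>n = 7</math> every external node is on the same level, so each unsuccessful search takes exactly three iterations.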

This problem can similarly be reduced to determining the minimum external path length of all binary trees with <math>n</math> nodes. In a tree with minimum external path length, all external nodes lie within the two deepest levels of the extended binary tree, so an unsuccessful search ends at either the second-deepest or the deepest level, depending on where the target value falls.

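In particular, let <math>l(n)</math> be the number of iterations in the worst case, which is also the number of levels of internal nodes: <math>l(n) = \lfloor \log_2 (n) \rfloor + 1</math>. Numbering the levels from 0 at the root, the internal nodes lie on levels 0 through <math>l(n)-1</math>, and every level above level <math>l(n)-1</math> is full. The external nodes therefore lie on the second-deepest level of the extended binary tree (level <math>l(n)-1</math>), which holds <math>2^{l(n)} - n - 1</math> of them, and on the deepest level (level <math>l(n)</math>), which holds <math>2n + 2 - 2^{l(n)}</math> of them, for a total of <math>n + 1</math>. Therefore,

<math>E(n) = (l(n)-1)(2^{l(n)} - n - 1) + l(n)(2n + 2 - 2^{l(n)}) = (n + 1)(l(n)+1)-2^{l(n)}</math>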
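The level counts and the closed form for <math>E(n)</math> can be checked without running any searches. This sketch (function names hypothetical) builds the counts from a level-filled tree: the empty slots on level <math>l(n)-1</math> become external nodes, and each internal node on that level gets two external children on level <math>l(n)</math>.

```python
def external_level_counts(n: int):
    """For a level-filled tree with n >= 1 internal nodes, return
    (l, externals on level l - 1, externals on level l), levels counted from 0 at the root."""
    l = n.bit_length()                      # l(n) = floor(log2(n)) + 1
    last_internal = n - (2 ** (l - 1) - 1)  # internal nodes on level l - 1 (levels above are full)
    upper = 2 ** (l - 1) - last_internal    # empty slots on level l - 1 become external nodes
    lower = 2 * last_internal               # two external children per internal node on level l - 1
    return l, upper, lower


def external_path_length(n: int) -> int:
    """Closed form E(n) = (n + 1)(l(n) + 1) - 2**l(n)."""
    l = n.bit_length()
    return (n + 1) * (l + 1) - 2 ** l


for n in range(1, 2049):
    l, upper, lower = external_level_counts(n)
    assert upper == 2 ** l - n - 1 and lower == 2 * n + 2 - 2 ** l
    assert upper + lower == n + 1
    assert (l - 1) * upper + l * lower == external_path_length(n)

print(external_level_counts(4), external_path_length(4))  # (3, 3, 2) 12
```

Dividing by the <math>n + 1</math> intervals then gives the average, for example <math>T'(4) = 12/5</math>.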