List-labeling problem

From Wikipedia, the free encyclopedia
Revision by Farach (talk | contribs)
In [[computer science]], the '''list-labeling problem''' involves maintaining a [[totally ordered set]] S supporting the following operations:

* <code>insert(X)</code>, which inserts X into set S;
* <code>delete(X)</code>, which removes X from set S;
* <code>label(X)</code>, which returns a label assigned to X subject to:
** <code>label(X)</code> <math>\in \{0, 1, \ldots, m-1\}</math>
** <math>\forall</math> X,Y <math>\in</math> S, X < Y implies <code>label(X) < label(Y)</code>

The cost of a list labeling algorithm is the number of label (re-)assignments per insertion or deletion. List labeling algorithms have applications in many areas, including the [[order-maintenance problem]], [[cache-oblivious data structures]],<ref name="BenderCacheObl">{{citation
| last1 = Bender | first1 = Michael A. | author1-link = Michael A. Bender
| last2 = Demaine | first2 = Erik D. | author2-link = Erik Demaine
| last3 = Farach-Colton | first3 = Martin | author3-link = Martin Farach-Colton
| doi = 10.1137/S0097539701389956
| issue = 2
| journal = [[SIAM Journal on Computing]]
| mr = 2191447
| pages = 341–358
| title = Cache-oblivious B-trees
| url = http://erikdemaine.org/papers/CacheObliviousBTrees_SICOMP/paper.pdf
| volume = 35
| year = 2005}}.</ref> [[persistent data structure|data structure persistence]],<ref
name="Driscoll">{{citation
| last1 = Driscoll | first1 = James R.
| last2 = Sarnak | first2 = Neil
| last3 = Sleator | first3 = Daniel D. | author3-link = Daniel Sleator
| last4 = Tarjan | first4 = Robert E. | author4-link = Robert Tarjan
| doi = 10.1016/0022-0000(89)90034-2
| issue = 1
| journal = [[Journal of Computer and System Sciences]]
| mr = 990051
| pages = 86–124
| title = Making data structures persistent
| volume = 38
| year = 1989| doi-access = free
}}.</ref> graph algorithms<ref
name="Eppstein">{{citation
| last1 = Eppstein | first1 = David | author1-link = David Eppstein
| last2 = Galil | first2 = Zvi | author2-link = Zvi Galil
| last3 = Italiano | first3 = Giuseppe F. | author3-link = Giuseppe F. Italiano
| last4 = Nissenzweig | first4 = Amnon
| doi = 10.1145/265910.265914
| issue = 5
| journal = [[Journal of the ACM]]
| mr = 1492341
| pages = 669–696
| title = Sparsification—a technique for speeding up dynamic graph algorithms
| volume = 44
| year = 1997}}.</ref><ref name="Katriel">{{citation
| last1 = Katriel | first1 = Irit
| last2 = Bodlaender | first2 = Hans L. | author2-link = Hans L. Bodlaender
| doi = 10.1145/1159892.1159896
| issue = 3
| journal = ACM Transactions on Algorithms
| mr = 2253786
| pages = 364–379
| title = Online topological ordering
| volume = 2
| year = 2006| citeseerx = 10.1.1.78.7933
}}.</ref> and fault-tolerant data structures.<ref
name="Aumann">{{citation
| last1 = Aumann | first1 = Yonatan
| last2 = Bender | first2 = Michael A. | author2-link = Michael A. Bender
| contribution = Fault tolerant data structures
| doi = 10.1109/SFCS.1996.548517
| pages = 580–589
| title = Proceedings of the 37th Annual Symposium on Foundations of Computer Science (FOCS 1996)
| year = 1996| title-link = Symposium on Foundations of Computer Science
| isbn = 978-0-8186-7594-2
}}.</ref>


==Upper bounds==

The cost of list labeling is related to <math>m</math>, the range of the labels assigned. Suppose that no more than <math>n</math> items are stored in the list-labeling structure at any time. Four cases have been studied:
* <math> m = 2^{\Omega(n)}</math>
* <math> m = n^{\Omega(1)}</math>
* <math> m = O(n) </math>
* <math> m = (1+\varepsilon) n </math>

===Exponential labels===
In the exponential label case, each item that is inserted can be given a label that is the average of its neighboring labels. It takes <math>\Omega(n)</math> insertions before two items are at adjacent labels and there are no labels available for items in between them. When this happens, all items are relabelled evenly from the space of all labels. This incurs <math>O(n)</math> relabeling cost. Thus, the amortized relabeling cost in this case is <math>O(1)</math>.<ref name="BulanekKoSa15">{{citation
| last1 = Bulánek | first1 = Jan
| last2 = Koucký | first2 = Michal
| last3 = Saks | first3 = Michael E. | author3-link = Michael Saks (mathematician)
 | title = Tight Lower Bounds for the Online Labeling Problem
 | pages = 1765–1797
 | journal = [[SIAM Journal on Computing]]
| volume = 44
| year = 2015
}}.</ref>
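
The averaging-with-global-rebuild strategy above can be sketched as follows (the class and method names are illustrative, not from the sources cited):

```python
import bisect

class ExponentialLabeler:
    """Maintains labels in {0, ..., m-1} with m = 2^n for up to n items."""

    def __init__(self, n):
        self.m = 1 << n     # exponential label universe
        self.items = []     # items kept in sorted order
        self.labels = {}    # item -> current label

    def _relabel_all(self):
        # Spread all items evenly over the label space: O(n) relabelings,
        # amortized O(1) because Omega(n) cheap insertions precede it.
        gap = self.m // (len(self.items) + 1)
        for i, x in enumerate(self.items, start=1):
            self.labels[x] = i * gap

    def insert(self, x):
        pos = bisect.bisect_left(self.items, x)
        lo = self.labels[self.items[pos - 1]] if pos > 0 else 0
        hi = self.labels[self.items[pos]] if pos < len(self.items) else self.m
        self.items.insert(pos, x)
        if hi - lo >= 2:
            self.labels[x] = (lo + hi) // 2   # average of neighboring labels
        else:
            self._relabel_all()               # no gap left: global relabel

    def label(self, x):
        return self.labels[x]
```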

===Polynomial labels===
The other cases of list labeling can be solved via balanced [[binary search trees]]. Consider <math>T</math>, a binary search tree on S of height <math>h</math>. We can label every node in the tree via a ''path label'' as follows:
Let <math>\sigma(X)</math> be the sequence of left and right edges on the root-to-<math>X</math> path, encoded as bits. So if <math>X</math> is in the left subtree of the root, the high-order bit of <math>\sigma(X)</math> is <math>0</math>, and if it is in the right subtree of the root, the high-order bit of <math>\sigma(X)</math> is <math>1</math>. Once we reach <math>X</math>, we complete <math>\sigma(X)</math> to a length of <math>h+1</math> as follows. If <math>X</math> is a leaf, we append <math>0</math>s as the low order bits until <math>\sigma(X)</math> has <math>h+1</math> bits. If
<math>X</math> is an internal node, we append one <math>0</math> and then <math>1</math>s as the low order bits until <math>\sigma(X)</math> has <math>h+1</math> bits.

<!-- XXX need a figure illustrating path labels! -->
The important properties of <math>\sigma()</math> are that: these labels are in the range <math>\{0, 1, \ldots, 2^{h+1}-1\}</math>; and for two nodes with keys <math>X</math> and <math>Y</math> in <math>T,</math> if <math>X<Y,</math> then <math>\sigma(X) < \sigma(Y)</math>. To see this latter property, notice that the property is true if the least common ancestor of <math>X</math> and <math>Y</math> is neither <math>X</math> nor <math>Y</math>, because <math>\sigma(X)</math> and <math>\sigma(Y)</math> will share bits until their least common ancestor. If <math>X<Y</math>, then because <math>T</math> is a search tree, <math>X</math> will be in the left subtree and will
have a next bit of <math>0</math>, whereas <math>Y</math> will be in the right subtree and will have a next bit of <math>1</math>.

Suppose instead that, without loss of generality, the least common ancestor of <math>X</math> and <math>Y</math> is <math>X</math>, and that <math>X</math> has depth <math>d</math>. If <math>Y</math> is in the left subtree of <math>X</math>, then <math>\sigma(X)</math> and <math>\sigma(Y)</math> share the first <math>d+1</math> bits. The remaining bits of <math>\sigma(X)</math> are all 1s, whereas the remaining bits of <math>\sigma(Y)</math> must have a <math>0</math>, so <math>\sigma(Y)<\sigma(X)</math>. If instead <math>Y</math> is in the right subtree of <math>X</math>, then
<math>\sigma(X)</math> and <math>\sigma(Y)</math> share the first <math>d</math> bits and the <math>d+1</math>st bit of <math>\sigma(X)</math> is <math>0</math>, whereas
the <math>d+1</math>st bit of <math>\sigma(Y)</math> is <math>1</math>. Hence <math>\sigma(X)<\sigma(Y)</math>.

We conclude that the <math>\sigma()</math> function fulfills the monotonicity property of the <code>label()</code> function. Thus if we can balance the binary tree to a depth of <math>(\log m) -1</math>, we will have a solution to the list-labeling problem for labels in the range <math>\{0,\ldots,m-1\}</math>.
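
As a concrete illustration, the following sketch computes <math>\sigma()</math> for every node of a given binary search tree of height <math>h</math> (the <code>Node</code> class and function names are illustrative):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def path_labels(root, h):
    """Return {key: sigma(key)} for a BST of height h."""
    labels = {}

    def visit(node, bits, depth):
        if node is None:
            return
        pad = h + 1 - depth                # bits still needed beyond the path
        if node.left is None and node.right is None:
            sigma = bits << pad            # leaf: append 0s
        else:
            # internal node: append one 0, then 1s
            sigma = ((bits << 1) << (pad - 1)) | ((1 << (pad - 1)) - 1)
        labels[node.key] = sigma
        visit(node.left, bits << 1, depth + 1)         # left edge: bit 0
        visit(node.right, (bits << 1) | 1, depth + 1)  # right edge: bit 1

    visit(root, 0, 0)
    return labels
```

For the three-node tree with root 2 and leaves 1 and 3 (height 1), the labels are 0, 1, and 2, which are monotone in key order and lie in <math>\{0,\ldots,2^{h+1}-1\}</math>.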

===Weight-balanced trees===

In order to use a [[self-balancing binary search tree]] to solve the list labeling problem, we first define the cost of a balancing operation on insertion or deletion to equal the number of labels that are changed, since every rebalancing operation
of the tree would have to also update all path labels in the subtree rooted at the site of the rebalance. So,
for example, rotating a node with a subtree of size
<math>k</math>, which can be done in constant time under usual
circumstances, requires <math>\Omega(k)</math> path label updates. In
particular, if the node being rotated is the root then the rotation
would take time linear in the size of the whole tree. With that much
time the entire tree could be rebuilt. We will see below that there
are self-balancing binary search tree data structures that cause an
appropriate number of label updates during rebalancing.

A [[weight-balanced tree]] BB[<math>\alpha</math>] is defined as follows. For every <math>X</math> in a rooted tree <math>T</math>, define <math>size(X)</math> to be the number of nodes in the subtree rooted at <math>X</math>. Let the left and right children of <math>X</math> be <math>X.left</math> and <math>X.right</math>, respectively. A tree <math>T</math> is <math>\alpha</math>-weight balanced if for every internal node <math>X</math> in <math>T</math>, <math>size(X.left) \ge \lfloor \alpha \cdot size(X.right)\rfloor</math> and <math>size(X.right) \ge \lfloor \alpha \cdot size(X.left)\rfloor.</math>
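
The balance condition can be stated directly as a predicate (a sketch; the node type and names are illustrative):

```python
import math
from collections import namedtuple

# Minimal node type for illustration; left/right are child nodes or None.
Node = namedtuple("Node", ["key", "left", "right"], defaults=[None, None])

def size(x):
    """Number of nodes in the subtree rooted at x."""
    return 0 if x is None else 1 + size(x.left) + size(x.right)

def is_weight_balanced(x, alpha):
    """Check the BB[alpha] condition at every internal node."""
    if x is None or (x.left is None and x.right is None):
        return True
    ls, rs = size(x.left), size(x.right)
    return (ls >= math.floor(alpha * rs) and
            rs >= math.floor(alpha * ls) and
            is_weight_balanced(x.left, alpha) and
            is_weight_balanced(x.right, alpha))
```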

The height of a BB[<math>\alpha</math>] tree with <math>n</math> nodes is at most <math>\log_{1/(1-\alpha)} n=-\log(n)/\log(1-\alpha).</math> Therefore, in order to solve the list-labeling problem, we need <math>\alpha = 1 - n^{-1/(\log(m)-1)}</math> to achieve a depth of <math>\log(m) - 1.</math>

A [[scapegoat tree]] is a weight-balanced tree where, whenever a node no longer satisfies the weight-balance condition, the entire subtree rooted at that node is rebuilt. This rebalancing scheme is ideal for list labeling, since the cost of rebalancing now equals the cost of relabeling. The amortized cost of an insertion or deletion is <math>(1+ 1/(1-2\alpha))\log_{1/(1-\alpha)} n + O(1).</math> For the list labeling problem, the cost becomes:

* <math> m = n^{\Omega(1)}</math>: <math>\alpha = O(1)</math>, the cost of list labeling is amortized <math>O(\log n).</math> (Folklore, modification of Itai, Konheim and Rodeh<ref name="ItaiKoRo">{{citation
| last1 = Itai | first1 = Alon
| last2 = Konheim | first2 = Alan G.
| last3 = Rodeh | first3 = Michael
| contribution = A Sparse Table Implementation of Priority Queues
 | pages = 417–431
| title = ICALP
| year = 1981
}}</ref>.)
* <math> m = O(n) </math>: <math>\alpha = 1- \Theta(1/\log n)</math>, the cost of list labeling is amortized <math>O(\log^2 n).</math> This bound was first achieved by Itai, Konheim, and Rodeh<ref name="ItaiKoRo" /> and deamortized by Willard<ref name="Willard">{{citation
| last1 = Willard | first1 = Dan E.
 | title = A Density Control Algorithm for Doing Insertions and Deletions in a Sequentially Ordered File in Good Worst-Case Time
 | pages = 150–204
 | journal = Information and Computation
| volume = 97
| number = 2
| year = 1992
}}.</ref>.
* <math> m = (1+\varepsilon) n </math>: If <math>m</math> is a power of two, then we can set <math>\alpha = 1- \Theta(\varepsilon/\log n)</math>, and the cost of list labeling is <math>O(\varepsilon^{-1}\log^2 n)</math>. A more careful algorithm can achieve this bound even in the case where <math>m</math> is not a power of two.



==Lower bounds and open problems==

In the case where <math> m = n^{1+\Theta(1)}</math>, a lower bound of <math>\Omega(\log
n)</math><ref name="DietzPaul">{{citation
| last1 = Dietz | first1 = Paul F.
| last2 = Seiferas | first2 = Joel I.
| last3 = Zhang | first3 = Ju
| contribution = A tight lower bound for on-line monotonic list labeling
| doi = 10.1007/3-540-58218-5_12
| location = Berlin
| mr = 1315312
| pages = 131–142
| publisher = Springer
| series = Lecture Notes in Computer Science
| title = Algorithm theory—SWAT '94 (Aarhus, 1994)
| volume = 824
| year = 1994| isbn = 978-3-540-58218-2
}}.</ref> has been established for list labeling. This lower bound applies to randomized algorithms, and so the known bounds for this case are tight.

In the case where <math> m = {(1+\Theta(1))}n</math>, there is a lower bound of <math>\Omega(\log^2 n)</math> on the list labeling cost for deterministic algorithms.<ref name="BulanekKoSa15" /> Furthermore, the same lower bound holds for ''smooth'' algorithms, which are those whose only relabeling operation assigns labels evenly in a range of items.<ref name="DietzZh">{{citation
| last1 = Dietz | first1 = Paul F.
| last2 = Zhang | first2 = Ju
| contribution = Lower bounds for monotonic list labeling
 | pages = 173–180
| title = Algorithm theory—SWAT '90
| year = 1990
}}.</ref> This lower bound is surprisingly strong in that it applies in the offline cases where all insertions and deletions are known ahead of time.

However, the best lower bound known for the linear case of algorithms that are allowed to be non-smooth and randomized is <math>\Omega(\log n)</math>. Indeed, it has been an open problem since 1981 to close the gap between the <math>O(\log^2 n)</math> upper bound and the <math>\Omega(\log n)</math> lower bound in the linear case.<ref name="ItaiKoRo" /><ref name="SaksOpen">{{citation
| last1 = Saks | first1 = Michael
| contribution = Online Labeling: Algorithms, Lower Bounds and Open Questions
 | pages = 23–28
| title = International Computer Science Symposium in Russia
| year = 2018
}}.</ref>



==Applications==
The best known applications of list labeling are the [[order-maintenance problem]] and '''packed-memory arrays''' for [[cache-oblivious algorithm|cache-oblivious data structures]]. The order-maintenance problem is that of maintaining a data structure on a linked list to answer order queries: given two items in the linked list, which is closer to the front of the list? This problem can be solved directly by polynomial list labeling in <math>O(\log n)</math> per insertion and deletion and <math>O(1)</math> time per query, by assigning labels that are monotone with the rank in the list. The time for insertions and deletions can be improved to constant time by combining polynomial list labeling with exponential list labeling on small lists.
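
The reduction above amounts to a single comparison of labels (a sketch; the dictionary stands in for any list-labeling structure whose labels are monotone in list position):

```python
def precedes(label_of, x, y):
    """True if x is closer to the front of the list than y."""
    return label_of[x] < label_of[y]

# Labels assigned monotonically in list order (front to back).
labels = {"a": 16, "b": 32, "c": 48}
```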

The packed-memory array is an array of size <math>(1+\varepsilon)n</math> to hold <math>n</math> items so that any subarray of size <math>k</math> holds <math>\Theta(k)</math> items. This can be solved directly by the <math>m=(1+\varepsilon)n</math> case of list labeling, by using the labels as addresses in the array, as long as the solution guarantees that the space between items is <math>O(1)</math>. Packed-memory arrays are used in cache-oblivious data structures to store data that must be indexed and scanned. The density bounds guarantee that a scan through the data is asymptotically optimal in the external-memory model for any block transfer size.
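
Using labels as array addresses can be sketched as follows (function names are illustrative; the density claim assumes a labeling that keeps the gaps between consecutive items <math>O(1)</math>):

```python
def pack(labels, m):
    """Place each item at its label position in an array of size m."""
    arr = [None] * m
    for item, lab in labels.items():
        arr[lab] = item
    return arr

def density(arr, start, k):
    """Fraction of occupied slots in the window arr[start:start+k]."""
    return sum(slot is not None for slot in arr[start:start + k]) / k
```

With evenly spread labels, every window of the array has constant density, which is what makes sequential scans cache-efficient.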


==References==
{{reflist}}



[[Category:Data structures]]
[[Category:Amortized data structures]]







Revision as of 22:33, 16 February 2022
