Rope (data structure): Difference between revisions

Content deleted Content added

Inline

Revision as of 22:19, 6 May 2011

In computer programming, a rope (also known as cord) is a Data structure for storing a heavy weight string, in other word, a single very long string. It is used for a common problem that occurs when developing, for example, a text editor is how to represent a very long string (i.e. the text file) so that operations on the string (insertions, deletions, jumping to a particular point, i.e. random access) can be done efficiently. ^[1]

Description

Rope is a binary tree in which each node v has a weight (v). All leaves and some one-child internode contains an array storing a light weight string (a short string) and each of those nodes has a weight(v) equals to the length of the short string. For the other nodes, the weight is the sum of weights in its left subtree. So, the two-child nodes in Rope can be considered as dividing the whole string into two parts: the left subtree represent the first part of string and the right subtree represent the second part of string. And the weight can be considered as the length of the substring in left part. We can set the weight for each node by doing an in-order traversal and set the weightFile:Formula for rope.jpg

From another view, the tree of Rope can be seen as several levels of node. The bottom level contains all the nodes which contain an extra field string (v) and when the level become higher, the number of nodes become less, and finally, there is just one node in the highest level. So, we can build the Rope by putting all the nodes with light-weight string in the bottom level, then create the second least level by randomly pick half nodes from bottom level to create new node as the parents. for the left half nodes in bottom level who has no parents, they become the right child of the nodes located in its left. In this way, we build the higher level, finally there would be only one node in some level, which means this node is the root.

Operations

Search

Definition : Search(i): finding the character at position i

To execute the query for i-th character which must be in a one-child internode or a leaf, we begin our search at the root. When the search reaches some node u, there are four cases:

1) If weight (u) <i, go to the right child of u and change i to value of i-weight(u), which means, search for the character in the position i-weight(u) in right part of string which is divided by node u;

2) If weight (u)>=i and u is a two-child node, go to the left child.

3) If weight (u)>=i and u has no left child, the position i in the array of node u is what we are searching for.

For example, we are looking for the tenth character in following rope, we start at root with the weight of 22,because 22>10, go to leftchild , because 9<10, then go to rightchild and looking for 10-9=1, then because 7>1 so go to leftchild, then because 6>1 then go leftchild, then compare 2 >1 , now it may go to left child but this node has no leftchild which mean we reached bottom level, so go to the array of that node and get the first character which is exactly what we are looking for.

Split

Defination: Split (i, S): split the string S into two new strings S1 and S2, S1 = C1,…Ci and S2 = Ci+1, …, Cm.

There are two cases, firstly, i-th character is the end of an array like the following picture; secondly, if the character is in the middle of an array, first we split the node into two node with each part of array and make the second node as the right child of first node, and then this case become the same situation as first case. we can explain this operation by the following example.

For example, we want to split the following rope into two parts. First we query the i-th character to locate the node v at the bottom level. Then we cut down the link between v and right(v) – v’. Then go up to the parent u and minus weight(u) by weight of v’. Since the parent has right(u) – u’ , then modify u’ to link to v’ and add the weight of u’ by weight of its new left child right v’. And the former left child of u’ become the right child of v’. The result is shown in the following second picture. In this way, we continue go up and reach the parent(u) – w. First minus the weight(w) by weight of v’. Then modify right(w) - w’ to link to u’ and the former child of w’ become the right child of u’ . Then add weight w’ by weight of v’. Then go up and reach parent(w) – x, because w is already the right child of x, so no need to do other modification. Then go up and reach paren(x) – y, minus x(weight) by weight(w’).Clearly, the cost is O(logN) in expectation.

Concrate

Definetion: Concrete(S1, S2): concrete two rope S1, S2 into one single rope.

This operation can be considered as the reversion of merge. The time complexity is also O(log N).

Insert

Definetion: Insert(i, S’): insert the string S’ beginning at position i in the string s, to form a new string C1,….,Ci, S’, Ci+1,…, Cm.

This operation can be done by a split() and a concrete(). First split the rope at the i-th character, the add a new node v with string(v) = S’ to the right child of the rightmost node of the first rope. Then update the weight of nodes in the path from the new node to root. Finally concrete the two ropes. Because the split() and concrete() both cost O(logN) time, the time complexity of this operation is also O(logN).

Delete

Defination: Delete(i, j): delete the substring Ci, …, Ci+j-1, from s to form a new string C1, …, Ci-1, Ci+j, …, Cm.

This operation can be done by two split() and a concrete(). First, split the rope into three ropes divided by i-th and j-th character respectively,that is, first split S into S1 and S2 at i-th charactor, then split S1 into S3 and S4 at (j-i)-th charactor, then concrete the S1 and S2. Because the split() and concrete() both cost O(logN) time, the time complexity of this operation is also O(logN).

Report

Defination: Report(i, j): output the string Ci, …, Ci+j-1.

To report the string Ci, …, Ci+j-1, we first search for the node u that contains ci and weight(u) >=j, and then traverse T starting at node u. we can then output Ci, …, Ci+j-1 in O(j+logN) expected time by doing an in-order traversal of T starting at node u.

Advantages and Disadvantages

The main advantages of ropes as compared to storing strings as character arrays is that they enable much faster insertion and deletion than ordinary strings and don't use extra memory while array need to use extra O(N)memory to do copy, and don't require a large contiguous memory space to store a string. The main disadvantages are a little greater overall space usage to store the nodes without string and the split and concrete cost a little more time, although there are all the cost for a much faster insertion and deletion.

Comparison to array-based strings

This table compares the algorithmic characteristics of string and rope implementations, not their "raw speed". Array-based strings have smaller overhead, so (for example) concatenation and split operations are faster. But Array-based strings are used on stable and small data, the insertion and deletion is unacceptable not only because the time complexity is too large but it cost extra memory to rebuilt new strings for each change about the data. However, rope does good on each operations and support dynamic data very well. Moreover, the space complexity for rope and array are all O(n). In summary, rope is a better algorithm for large and dynamic data.

Operation	Rope performance	String performance
search	O(log n)	O(1)
split	O(log n)	O(1)
concrete	O(log n)	O(1)
insert	O(log n)	O(n)
delete	O(log n)	O(n)
report	O(log n)	O(1)
build	O(n)	O(n)

^{[citation needed]}

References

^ ^a ^b Boehm, Hans-J (December 1995). "Ropes: an Alternative to Strings" (PDF). Software—Practice & Experience. 25 (12). New York, NY, USA: John Wiley & Sons, Inc.: 1315–1330. doi:10.1002/spe.4380251203. {{cite journal}}: Unknown parameter |coauthors= ignored (|author= suggested) (help)

External links

This algorithms or data structures-related article is a stub. You can help Wikipedia by expanding it.

[Boehm-1] Boehm, Hans-J (December 1995). "Ropes: an Alternative to Strings" (PDF). Software—Practice & Experience. 25 (12). New York, NY, USA: John Wiley & Sons, Inc.: 1315–1330. doi:10.1002/spe.4380251203. {{cite journal}}: Unknown parameter |coauthors= ignored (|author= suggested) (help)

[1]

@@ Line 41: / Line 41: @@
 </br>
-For example, we are looking for the tenth character in following rope, we start at root with the weight of 22,because 22>10, go to leftchild , because 9<10, then go to rightchild and looking for 10-9=1, then because 7>1 so go to leftchild, then becauese 6>1 then go leftchild,  then compare 2 >1 , now it may go to left child but this node has no leftchild which mean we reached bottom level, so go to the array of that node and get the first character which is exactly what we are looking for.
+For example, we are looking for the tenth character in following rope, we start at root with the weight of 22,because 22>10, go to leftchild , because 9<10, then go to rightchild and looking for 10-9=1, then because 7>1 so go to leftchild, then because 6>1 then go leftchild,  then compare 2 >1 , now it may go to left child but this node has no leftchild which mean we reached bottom level, so go to the array of that node and get the first character which is exactly what we are looking for.
 <center>[[Image:Search_rope.jpg|400px|A small complete binary tree stored in an array]]</center>