Rope (computer science)

From Wikipedia, the free encyclopedia
Jump to: navigation, search
A simple rope built on the string of "Hello_my_name_is_Simon".

In computer programming a rope, or cord, is a data structure for efficiently storing and manipulating a very long string. For example, a text editing program may use a rope to represent the text being edited, so that operations such as insertion, deletion, and random access can be done efficiently.[1]

Contents

[edit] Description

A rope is a binary tree in which each node has a weight. Leaf nodes (as well as some single-child internal nodes) also contain a short string. The weight of a node is equal to the length of its string plus the sum of all the weights in its left subtree. Thus a node with two children divides the whole string into two parts: the left subtree stores the first part of the string, the right subtree stores the second part, and the weight is the length of the first part.

Seen from another perspective, the binary tree of a rope can be seen as several levels of nodes. The bottom level contains all the nodes that have a short string. Higher levels have fewer and fewer nodes, until finally there is just one root node in the top level. We can build the rope by putting all the nodes with short strings in the bottom level, then randomly picking half those nodes in order to form parent nodes in the next level. Nodes with no parent (for example, the two nodes storing the strings "my_" and "me_i" in the diagram above) become the right child of the node located immediately to their left, thus forming a chain. In this way, we can build higher levels one level at a time. We stop when there is only one node remaining.

[edit] Operations

[edit] Search

Definition: Search(i): return the character at position i
Time complexity: O(log N) where N is the length of the rope

To retrieve the i-th character, we begin a recursive search from the root node:

// Note: Assumes 1-based indexing.
function search(RopeNode node, integer i)
    if node.weight < i then
        return search(node.right, i - node.weight)
    else
        if exists(node.left) then
            return search(node.left, i)
        else
            return node.string[i]
        endif
    endif
end

For example, suppose we are looking for the character at i=10 in the following rope. We start at the root node, find that 22 is greater than 10 and there's a left child, so we go to the left child. Next we find that 9 is less than 10, so we subtract 9 from 10 (leaving i=1) and go to the right child. Then because 7 is greater than 1 and there's a left child, we go to the left child. There we find that 6 is greater than 1 and there's a left child, so we go to the left child again. Finally we see that 2 is greater than 1 but there is no left child, so we take the character at index 1 of the short string "na", returning "n" as the final answer.

Search rope.jpg

[edit] Split

Definition: Split (i, S): split the string S into two new strings S1 and S2, S1 = C1,…Ci and S2 = Ci+1, …, Cm.

There are two cases: in the first, the i-th character is the end of an array like the following picture; in the second, the character is in the middle of an array. We can reduce the second case to the first case as follows: first we split the node at the character into two nodes each with part of the array and make the second node as the right child of first node. We handle the first case as explained in the following example.

For example, we want to split the following rope into two parts. First we query the i-th character to locate the node v at the bottom level. Then we cut down the link between v and the right child of v, which we'll call v’. Then go up to the parent u and subtract the weight of v’ from the weight of u. Since the parent has the right child of u, which we'll call u’, modify u’ to link to v’ and increase the weight of u’ by the weight of its new left child v’. The former left child of u’ become the right child of v’. The result is shown in the second picture below. In this way, we continue up and reach the parent of u, which we'll call w. First subtract the weight of v’ from the weight of w. Then modify the right child of w, which we'll call w’, to link to u’ and the former child of w’ becomes the right child of u’. Then increase the weight of w’ by the weight of v’. Then go up and reach the parent of w, which we'll call x. Since w is already the right child of x, there is no need for further modification. Then go up and reach the parent of x, which we'll call y, and reduce the weight of x by the weight of w’.Clearly, the expected cost is O(log N).

original
step1
step2

[edit] Concat

Definition: Concat(S1, S2): concatenate two rope S1, S2 into one single rope.

This operation can be considered as the reversion of split. The time complexity is also O(log N). Alternatively, create a new root node with left=S1 and right=S2. This is constant time, but can lead to an unbalanced tree.

[edit] Insert

Definition: Insert(i, S’): insert the string S’ beginning at position i in the string s, to form a new string C1,….,Ci, S’, Ci+1,…, Cm.

This operation can be done by a split() and a concat(). First split the rope at the i-th character, the add a new node v with string(v) = S’ to the right child of the rightmost node of the first rope. Then update the weight of nodes in the path from the new node to root. Finally concatenate the two ropes. Because the split() and concat() both cost O(logN) time, the time complexity of this operation is also O(logN).

[edit] Delete

Definition: Delete(i, j): delete the substring Ci, …, Ci+j-1, from s to form a new string C1, …, Ci-1, Ci+j, …, Cm.

This operation can be done by two split() and a concat(). First, split the rope into three ropes divided by i-th and j-th character respectively,that is, first split S into S1 and S2 at i-th character, then split S1 into S3 and S4 at (j-i)-th character, then concatenate the S1 and S2. Because the split() and concat() both cost O(logN) time, the time complexity of this operation is also O(logN).

[edit] Report

Definition: Report(i, j): output the string Ci, …, Ci+j-1.

To report the string Ci, …, Ci+j-1, we first search for the node u that contains ci and weight(u) >=j, and then traverse T starting at node u. we can then output Ci, …, Ci+j-1 in O(j+logN) expected time by doing an in-order traversal of T starting at node u.

[edit] Advantages and Disadvantages

Advantages:

  • Ropes enable much faster insertion and deletion of text than ordinary strings.
  • Ropes don't need the extra O(n) memory that arrays need for copying operations, and ropes don't require a large contiguous memory space to store a string.

The main disadvantages are a little greater overall space usage (mainly to store the nodes that don't have strings) and the consequent increase in time that such extra storage requires.

[edit] Comparison to array-based strings

This table compares the algorithmic characteristics of string and rope implementations, not their "raw speed". Array-based strings have smaller overhead, so (for example) concatenation and split operations are faster on small datasets. However, when array-based strings are used for large sets of data, time complexity and memory usage for insertion and deletion of characters becomes unacceptably large. A rope data structure, on the other hand, has a stable performance regardless of data size. Moreover, the space complexity for ropes and arrays are both O(n). In summary, ropes are better suited when the data is large and frequently modified.

Performance
Operation Rope String
search O(log n) O(1)
split O(log n) O(1)
concatenate O(log n) O(n)
insert O(log n) O(n)
delete O(log n) O(n)
report O(log n) O(1)
build O(n) O(n)

[citation needed]

[edit] See also

  • The Cedar programming environment, which used ropes "almost since its inception"[1]
  • The Model T enfilade, a similar data structure from the early 1970s.
  • Gap buffer, a data structure commonly used in text editors that allows efficient insertion and deletion operations clustered near the same location

[edit] References

  1. ^ a b Boehm, Hans-J; Atkinson, Russ; and Plass, Michael (December 1995). "Ropes: an Alternative to Strings" (PDF). Software—Practice & Experience (New York, NY, USA: John Wiley & Sons, Inc.) 25 (12): 1315–1330. doi:10.1002/spe.4380251203. http://citeseer.ist.psu.edu/viewdoc/download?doi=10.1.1.14.9450&rep=rep1&type=pdf. 

[edit] External links

Personal tools
Namespaces
Variants
Actions
Navigation
Interaction
Toolbox
Print/export
Languages