Jump to content

LCP array: Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
m testing doi
m add citations
Line 89: Line 89:
*{{cite journal|last1=Kärkkäinen|first1=Juha|last2=Sanders|first2=Peter|last3=Burkhardt|first3=Stefan|title=Linear work suffix array construction|journal=Journal of the ACM|volume=53|issue=6|year=2006|pages=918–936|issn=00045411|doi=10.1145/1217856.1217858}}
*{{cite journal|last1=Kärkkäinen|first1=Juha|last2=Sanders|first2=Peter|last3=Burkhardt|first3=Stefan|title=Linear work suffix array construction|journal=Journal of the ACM|volume=53|issue=6|year=2006|pages=918–936|issn=00045411|doi=10.1145/1217856.1217858}}
* {{cite arXiv | eprint=1101.3448v1 }}
* {{cite arXiv | eprint=1101.3448v1 }}
*{{Citation
*{{cite doi | 10.1.1.6.3811}}
| title = Two space saving tricks for linear time LCP computation
| year = 2004
| author = Manzini, Giovanni
| journal = Proc. SWAT. Volume 3111 of Lecture Notes in Computer Science
| pages = 372–383
| volume = 3111
}}
*{{cite doi | 10.1007/978-3-642-02441-2_17}}
*{{cite doi | 10.1007/978-3-540-92182-0_14}}
*{{Citation
| title = Fast and Lightweight LCP-Array Construction Algorithms
| url = http://www.siam.org/proceedings/alenex/2011/alx11_03_gogs.pdf
| year = 2011
| journal = Proceedings of the Workshop on Algorithm Engineering and Experiments, ALENEX 2011
| pages = 25–34
| last1 = Gog | first1 = Simon
| last2 = Ohlebusch | first2 = Enno
| accessdate = 2012-08-28 }}
*{{cite doi | 10.1109/DCC.2009.42}}

Revision as of 11:19, 28 August 2012

In computer science, the longest common prefix array (LCP array) is an auxiliary data structure to the suffix array, which stores the length of the longest common prefix between every pair of lexicographically consecutive suffixes in the suffix array.

Augmenting the suffix array with the LCP array allows to efficiently simulate top-down and bottom-up traversals of the suffix tree (Kasai et al. 2001)[1], speeds up pattern matching on the suffix array[2] and is a prerequisite for compressed suffix trees(Ohlebusch, Fischer, and Gog 2010).

History

The LCP array was introduced by Manber and Gene Myers alongside the suffix array in order to improve the running time of their string search algorithm.[2]

Definition

Let SA be the suffix array of the text T= and let lcp(v,w) denote the length of the longest common prefix between two strings v and w. Let further denote T[i,j] the substring of T ranging from i to j.

Then the LCP array H[1,n] is an integer array of size n such that H[1] is undefined and H[i]=lcp(T[SA[i-1],n],T[SA[i],n]) for every . Thus H[i] stores the length of longest common prefix of the lexicographic (i-1)'st and ith smallest suffix of T.

Example

Efficient Construction Algorithms

LCP array construction algorithms can be divided into two different categories:

  • algorithms that compute the LCP array as a byproduct to the suffix array (Manber and Myers 1993)(Kärkkäinen and Sanders 2003)(Fischer 2011)
  • algorithms that compute the LCP array from the already constructed suffix array (Kasai et al. 2001)(Manzini 2004)(Kärkkäinen, Manzini, and Puglisi 2009)(Puglisi and Turpin 2008)(Gog and Ohlebusch 2011)

Currently, the fastest linear-time LCP array construction algorithm is due to (Fischer 2011), which in turn is based on one of the fastest suffix array construction algorithms (Nong, Zhang, and Chan 2009)

Applications

Suffix Tree Construction

Simulating top-down traversal

Simulating top-down traversal

Succinct/Compact representation

Notes

  1. ^ Abouelhoda & Ohlebusch (2004)
  2. ^ a b Manber & Myers (1993)

References

  • Abouelhoda, Mohamed Ibrahim; Ohlebusch, Enno; Ohlebusch, Enno (2004), "Replacing suffix trees with enhanced suffix arrays", J. of Discrete Algorithms, 2 (1): 53–86, doi:10.1016/S1570-8667(03)00065-0, retrieved 2012-08-28
  • Manber, U.; Myers, G. (1993), </div "Suffix Arrays: A New Method for On-Line String Searches", SIAM Journal on Computing, 22 (5): 935–948, doi:10.1137/0222058, retrieved 2012-08-28
  • Kasai, Toru; Lee, Gunho; Arimura, Hiroki; Arikawa, Setsuo; Park, Kunsoo} (2001), "Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications", Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching: 181–192, ISBN 3-540-42271-4, retrieved 2012-08-28
  • Ohlebusch, Enno; Fischer, Johannes; Gog, Simon (2010), "CST++", String Processing and Information Retrieval - 17th International Symposium, SPIRE 2010, Los Cabos, Mexico, October 11-13, 2010. Proceedings, 6393: 322–333, doi:http://dx.doi.org/10.1007/978-3-642-16321-0_34 {{citation}}: Check |doi= value (help); External link in |doi= (help)
  • Kärkkäinen, Juha; Sanders, Peter; Burkhardt, Stefan (2006). "Linear work suffix array construction". Journal of the ACM. 53 (6): 918–936. doi:10.1145/1217856.1217858. ISSN 0004-5411.
  • A bot will complete this citation soon. Click here to jump the queue arXiv:1101.3448v1.
  • Manzini, Giovanni (2004), "Two space saving tricks for linear time LCP computation", Proc. SWAT. Volume 3111 of Lecture Notes in Computer Science, 3111: 372–383
  • Attention: This template ({{cite doi}}) is deprecated. To cite the publication identified by doi: 10.1007/978-3-642-02441-2_17, please use {{cite journal}} (if it was published in a bona fide academic journal, otherwise {{cite report}} with |doi= 10.1007/978-3-642-02441-2_17 instead.
  • Attention: This template ({{cite doi}}) is deprecated. To cite the publication identified by doi: 10.1007/978-3-540-92182-0_14, please use {{cite journal}} (if it was published in a bona fide academic journal, otherwise {{cite report}} with |doi= 10.1007/978-3-540-92182-0_14 instead.
  • Gog, Simon; Ohlebusch, Enno (2011), "Fast and Lightweight LCP-Array Construction Algorithms" (PDF), Proceedings of the Workshop on Algorithm Engineering and Experiments, ALENEX 2011: 25–34, retrieved 2012-08-28
  • Attention: This template ({{cite doi}}) is deprecated. To cite the publication identified by doi: 10.1109/DCC.2009.42, please use {{cite journal}} (if it was published in a bona fide academic journal, otherwise {{cite report}} with |doi= 10.1109/DCC.2009.42 instead.