Jump to content

Talk:Lyndon word

Page contents not supported in other languages.
From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 193.227.108.10 (talk) at 13:15, 13 November 2014 (→‎Algorithm dubious?). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

This was requested! however all it really entails is a definition...perhaps a redirect? Wikipedia appears to be lacking in general on the broader subject however. Robbjedi 04:45, 18 October 2005 (UTC)[reply]

Necklace

I am not merging this into necklace (combinatorics), as Lyndon word is a standalone concept, deserving its own article. Hope nobody minds. Kompas 09:49, 3 August 2006 (UTC)[reply]

"A theorem of Radford states...."

needs a reference 173.206.238.69 (talk) 00:26, 21 May 2012 (UTC)[reply]

Why periodic strings are omitted?

What's wrong to include the strings "00", "11" etc. in the sequence? --RokerHRO (talk) 15:17, 7 July 2014 (UTC)[reply]

They are equal to their rotations, and therefore are not the unique smallest string among their rotations. Also because then the standard factorization wouldn't work. —David Eppstein (talk) 16:02, 7 July 2014 (UTC)[reply]

Algorithm dubious?

The algorithm given for computing the standard factorisation looks to be quadratic rather than linear as stated. If the input is 'bbbb...bbba' (n 'b's, 1 'a') then the algorithm takes O(n) time to find the first Lyndon word in the factorisation is 'b', and then it proceeds to factorise 'bbbb...bbba' (with n-1 'b's).

Also, I think the algorithm's finishing condition is faulty, unless it assumes the string ends with a terminating character that's smaller than every other character. As written, 'aa' will factorise as ['aa'] rather than ['a', 'a']. 125.30.89.127 (talk) 08:59, 3 November 2014 (UTC)[reply]

Plus one to faulty finishing condition - from what I gathered one should iterate until the string is empty, and treat m reaching the end of the string as also going to step 3.3.

Also (but that could just be me) I could only understand the algorithm as described after I understood the algorithm already, it isn't clear that it starts with the assumption that a single letter initial Lyndon word is the best it can do and tries proving (3.3) or disproving said assumption by looking at further letters, each time learning a new length assumption for the initial Lyndon word when the previous assumption is proven false (3.2), where equality contributes "nothing".

I also agree with the quadratic runtime assessment, which is most likely due to case 3.3 discarding all accumulated knowledge, though I wouldn't yet know how to do better. 193.227.108.10 (talk) 13:11, 12 November 2014 (UTC)[reply]

Maybe something closer to pseudocode would be clearer and less bug-prone than this textual description? I have an implementation in http://www.ics.uci.edu/~eppstein/PADS/Lyndon.py (see in particular the ChenFoxLyndonBreakpoints function) that is probably too specific to Python to use directly but might make an appropriate starting point. —David Eppstein (talk) 16:10, 12 November 2014 (UTC)[reply]
The implementation looks fine (for obvious reasons quite similar to what I came up with, see below, it returns the words directly instead of breakpoints), though I believe it deserves some (textual) explanation as to why one is doing that: In essence it starts by assuming a single symbol Lyndon word is the longest possible in the beginning and tries to prove or disprove that by looking further, and each time it disproves the hypothesis it did in fact find a longer Lyndon word, which becomes the new assumption. If I've some more time I'll try to make this a tad more readable...
Making the runtime linear instead of quadratic for a full factorization then results from additional observations in the case the hypothesis is proven (namely that there may be repetitions of what we just reported), this might make it easier to grasp, also a more space-efficient version (avoiding appending the string to itself, and reporting the minimum rotation needed to make it minimal) for finding the minimum cyclic permutation does become apparent then.
 def lyndon_factorize(S):
     start, left, right = 0, 0, 1
     while start < len(S):
         if right == len(S) or S[left] > S[right]:
             length = right - left
             while start + length <= right:
                 yield S[start:start+length]
                 start += length
             left, right = start, start + 1
         elif S[left] == S[right]:
             left, right = left + 1, right + 1
         else:
             left, right = start, right + 1

193.227.108.10 (talk) 13:15, 13 November 2014 (UTC)[reply]

The numbers of binary Lyndon words of each length

(For starters, English is not my primary language..) My first understanding of this was:

Every binary Lyndon word of each length has its own number.

Probably just my fault, but wouldnt the word amounts be a tiny bit more exact? Although it doesnt sound that good. Or could someone think of another way to say it?

Anyway, if anyone else had trouble understanding, the sequence 1, 2, 1, ... means:

Actually, "amounts" would be a word you'd use for a continuous quantity (a real number like an amount of water) while number makes more sense for discrete integer values (a number of glasses of water). —David Eppstein (talk) 18:32, 5 November 2014 (UTC)[reply]