Talk:Aho–Corasick algorithm

From Wikipedia, the free encyclopedia
Jump to: navigation, search
WikiProject Computer science (Rated Start-class, Mid-importance)
WikiProject icon This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Start-Class article Start  This article has been rated as Start-Class on the project's quality scale.
 Mid  This article has been rated as Mid-importance on the project's importance scale.

Something is wrong with the first sentence in the "Informally, the algorithm constructs a trie ..." paragraph -- some kind of copy-paste replacement error; haven't figured out exactly how to fix it yet. Looks like there isn't a clean version to go back to; the mismatched parenthetical expression hasn't changed since it was added (17:36, 27 July 2006). Anyone? dvgrn 18:46, 19 September 2006 (UTC)

  • Got rid of the obvious error, but may have introduced some subtle inaccuracy -- will have to study the algorithm some more to be sure. dvgrn 20:40, 19 September 2006 (UTC)
  • the Link that should lead to the php/javascript implementation of the algorithm is broken. anyone knows where to find it? —Preceding unsigned comment added by (talk) 20:25, 23 May 2008 (UTC)

Bad example dictionary[edit]

The example is bad from a didactical perspactive, because it doesn't contain any shared nodes/paths exept for proper prefixes, like the root/empty prefix. Shinobu (talk) 02:20, 6 October 2008 (UTC)


could somebody add details regarding when this algorithm was invented and when made public etc. Also, it would be better to add some background about the algorithm such as what kind of practical problems does it address? what are its applications etc.

Lokesh 2000 (talk) 04:42, 20 January 2010 (UTC)

"The Aho–Corasick string matching algorithm formed the basis of the original Unix command fgrep."

Not a great example. There were better algorithms for finding a single fixed string in text which fgrep could have used (eg, Knuth-Morris-Pratt). Remember that the strength of Aho-Corasick lies in finding a match for any of a large set of fixed strings in text; and this is not what fgrep does. Aho-Corasick was easily extended to implement Kleene's regular expressions and this extension is used in grep and egrep. So Aho-Corasick's use in fgrep was simply a case of ease of implementation. A virus scanner is by far the best example of the use of Aho-Corasick: in one sequential read of a file there is a set of many thousands of virus "signature" strings compared. Gdt (talk) 06:16, 10 August 2012 (UTC)

I suppose, but an important part of fgrep is that it allows for many fixed strings, likely with the -f flag. As I understand, regular grep was then extended with -f. GNU grep then, as well as I know, added the Boyer-Moore search on the longest fixed string before doing the regex match, and so also allowing for very fast fixed string matching. That removes the need for a separate fgrep. But yes, the optimal case for Aho-Corasick is a large number of pattern strings. Gah4 (talk) 22:27, 4 July 2013 (UTC)

The text should also mention the Rabin-Karp algorithm with Bloom filters, as that is the major alternative to Aho-Corasick. The two algorithms have differing strengths and weaknesses, so one can't be recommended over the other. Gdt (talk) 06:24, 10 August 2012 (UTC)

The diagram doesn't show the dictionary suffix links. I'll upload a new image that does. (talk) 12:56, 16 August 2012 (UTC)


I have made an interactive visualization of Aho-Corasick algorithm in flash. If you find it appropriate for this article, please add it to External links. I don't want to be accused of COI. The link is

Just regex?[edit]

Isn't Aho-Corasick really just a specific case that regular expression engines that generate compiled state machines already support? — Preceding unsigned comment added by (talk) 18:17, 16 June 2013 (UTC)