Talk:Edit distance

From Wikipedia, the free encyclopedia
Jump to: navigation, search
WikiProject Computer science (Rated Mid-importance)
WikiProject icon This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
 ???  This article has not yet received a rating on the project's quality scale.
 Mid  This article has been rated as Mid-importance on the project's importance scale.
 
WikiProject Computing (Rated Stub-class)
WikiProject icon This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Stub-Class article Stub  This article has been rated as Stub-Class on the project's quality scale.
 ???  This article has not yet received a rating on the project's importance scale.
 
Note icon
This article has been automatically rated by a bot or other tool as Stub-Class because it uses a stub template. Please ensure the assessment is correct before removing the |auto= parameter.

Name of article[edit]

As far as I know, the name 'Edit distance traditionally means Levenshtein distance in Computer Science (see [1]). The list presented here also has overlap with string metrics. Maybe that page could be organised into categories instead of starting this page? Klem fra Nils Grimsmo 20:04, 19 March 2007 (UTC)

You're rignt. I fixed it. Touchatou (talk) 16:19, 31 October 2011 (UTC)
I think it would be even better to merge some of the many articles here that describe Levenshtein distance or varieties. There are many! Rp (talk) 09:57, 1 November 2011 (UTC)

Proposed merge with Levenshtein distance[edit]

... as well as while moving the algorithmic parts to Wagner-Fischer algorithm. This page duplicates the information on those two pages.

In the literature about this problem, edit distance is often synonymous with Levenshtein distance, although sometimes it is extended, though seldom very dramatically. See e.g. Gonzalo Navarro's Guided tour to approximate string matching, references on edit distance.

Furthermore, this page spends a lot of time discussing the W-F algorithm, even though many applications of edit distance do not use that algorithm at all, circumventing the explicit computation of pairwise distances as it is simply too expensive. Therefore, this article does not reflect the state of the art in computer science and the discussion of the algorithm should really be separated from the algorithm to compute it. (A short discussion of the W-F algorithm is featured on edit distance, of course, as it's historically important.) QVVERTYVS (hm?) 08:55, 17 January 2014 (UTC)

Please merge the three! I'm all for it! Rp (talk) 09:10, 17 January 2014 (UTC)
To clarify: I did not mean to merge all three into a single article. I want the material on Levenshtein distance to be split into edit distance and Wagner-Fischer algorithm, so that we have one overview of the concept and its applications, and one on the basic algorithm that computes it. QVVERTYVS (hm?) 09:24, 17 January 2014 (UTC)
Maybe a hatnote would be appropriate, along the lines of "This article is about the distance metric; for details of its computation, see Wagner-Fischer algorithm" (?) QVVERTYVS (hm?) 09:27, 17 January 2014 (UTC)
I think prior to any restructuring it would pay off to hunt for all articles explaining the concept of edit distance or an algorithm to compute it (I'm sure Levenshtein_distance#See_also isn't complete), then design a new structure, and most of all, a way to make that structure immediately clear to readers, so they won't keep adding the same information in different places, or after removal, over and over and over. I stopped editing these articles after going through the Levenshtein distance page history; there is a clear lack of direction. Rp (talk) 09:38, 17 January 2014 (UTC)
One problem is that there is a whole space of closely related algorithms and distances, so it's not very clear how to chop it up into articles. Rp (talk) 09:38, 17 January 2014 (UTC)
I've decided to use Navarro as my guide here, since his is one of the best cited overviews of the field. (And because I don't have a copy of Sankoff and Kruskal's book, unfortunately, but that's also quite old.) Following that, rather than random web resources, should give us a common vocabulary. QVVERTYVS (hm?) 10:54, 17 January 2014 (UTC)

merge - Putting it all in edit distance seems sensible. You could argue that the section on Levenshtein distance would be so large as to warrant its own article, but, for me, the benefits in terms of reduced repetition outweigh this argument in this case. Yaris678 (talk) 10:09, 17 January 2014 (UTC)

No merge - Levenshtein distance is one of many edit distances. It happens to occupy an overly large percentage of peoples' mind-share, but that is because they don't know much about the other kinds of edit distances. What is really needed is to beef up the edit distance article to cover a lot more of the material. At the moment the edit distance article reads like it was written by somebody who mostly knows about Levenshtein distance, hence the desire to merge. Derek farn (talk) 13:31, 17 January 2014 (UTC)

I welcome any suggestions as to how to "beef up the article". I've tried to generalize beyond Levenshtein distance + transposition, but I certainly did not read up on all the literature that discusses substring swaps, learning edit weights and q-gram distances, so go ahead. QVVERTYVS (hm?) 22:42, 17 January 2014 (UTC)
  • Oppose merge. Edit distance is the general idea; it need not go into detail about the specific definitions and algorithms to compute it. The LD article could also have more detail about optimizations -- material that would be out of place in the general SD article. Glrx (talk) 00:56, 19 January 2014 (UTC)
I want to separate the algorithmic content from the distance metric as such, and generalize the notion of Levenshtein distance in a single article edit distance. Optimizations and variants for specific distance metrics can be discussed in the page Wagner-Fischer algorithm, where IMHO they belong more properly than on a page about a distance metric that is embodied in many more algorithms than just its pairwise DP computation. QVVERTYVS (hm?) 12:18, 19 January 2014 (UTC)

No merge Just remove/move the stuff thats specifically about Levenshtein distance from the edit distance page. There will be enough left for a general article with the properties and discussion of different approaches. --Ray andrew (talk) 00:23, 26 February 2014 (UTC)

Merge, but even more urgently: merge with string metric, which covers exactly the same topic. The discussion could be widened to include block edit distances. 131.155.69.109 (talk) 10:14, 23 May 2014 (UTC)

String metric is even more problematic because it lumps together actual string metrics and similarity functions on sets and histograms. QVVERTYVS (hm?) 10:32, 23 May 2014 (UTC)