Levenshtein Distance
 Summary

Discussion
 Could you explain Levenshtein Distance with an example?
 What are "weights" in the context of edit distance?
 Which are the algorithms that can compute the Levenshtein Distance?
 What do you mean by Normalized Levenshtein Distance?
 How is Levenshtein Distance related to the triangle inequality?
 Are there any variants of Levenshtein Distance?
 What are some alternatives to Levenshtein Distance?
 What are some applications of Levenshtein Distance?
 What software tools are available for calculating Levenshtein Distance?
 Milestones
 Sample Code
 References
 Further Reading
 Article Stats
 Cite As
Given two words, we can ask how similar are the two words. We can also ask this question of two sentences or string sequences. To quantify the similarity, we need a measure. Levenshtein Distance is such a measure.
By means of simple operations (such as insertion, deletion and substitution), we can determine how to transform one word or sequence into the other word or sequence. There could be many ways to achieve this. Levenshtein Distance is defined as the minimum number of operations required to make the two inputs equal. Lower the number, the more similar are the two inputs that are being compared.^{}
There are a few algorithms to solve this distance problem. Levenshtein Distance is useful in many application domains including signal processing, natural language processing and computational biology.^{}
Levenshtein Distance is also called edit distance.^{}
Discussion

Could you explain Levenshtein Distance with an example? Consider the two words "rain" and "shine". To transform "rain" into "shine", we can replace 'r' with 's', replace 'a' with 'h' and insert 'e'. Thus, the edit distance between these two words is 3. This is assuming that all operations have the same cost of 1.^{} If we assign a higher cost to substitutions, say 2, then the edit distance becomes 2*2 + 1 = 5. To transform "shine" to "rain", the operations are reversed (insertions become deletions) but the edit distance is the same when costs are symmetric.
Consider the words "train" and "shine". If each letter is replaced, we need 5 operations but this not the minimum. We can do better by aligning "in", doing one insertion, two substitutions and one deletion. Edit distance is therefore 4.
A lower distance implies greater similarity between the two words. In NLP, we generally wish to minimise the distance. In computational biology, we wish to maximize similarity.^{} In error correcting codes, we wish to maximize the distance so that one codeword is not easily confused with another.^{}

What are "weights" in the context of edit distance? What we call "weights" in NLP are called "scores" in computational biology.^{} In the simplest case, we can give the same weights to insertions, deletions and substitutions, regardless of the symbol.
In reality, some errors are more likely that others. Consider how keys are laid out on a standard computer keyboard. It's easy to mistype an adjacent key. Some spelling errors are more likely than others. Likewise, in optical character recognition, it's possible to easily misinterpret between 'o' and 'e', or 'm' and 'n'. Likewise, in a DNA sequence some deletions or insertions are more likely than others. Weights help us account for these varying probabilities.^{}
Mathematically, given two words of length N and M, D(N,M) is the distance. Words are 1indexed but D(i,j) is 0indexed. Accounting for the weights, edit distance can be computed this way:^{}
Initialization
$$D(0,0) = 0\\D(i,0) = D(i1,0) + del[x(i)]; 1 < i ≤ N\\D(0,j) = D(0,j1) + ins[y(j)]; 1 < j ≤ M$$
Recurrence Relation
$$D(i,j) = min\begin{cases}{D(i1,j) + del[x(i)]\\D(i,j1) + ins[y(j)]\\D(i1,j1) + sub[x(i),y(j)]}\end{cases}$$

Which are the algorithms that can compute the Levenshtein Distance? There are many algorithms to compute the edit distance. Many of them are classified as dynamic programming algorithms.^{} One of them is the WagnerFischer algorithm that we describe here.
The idea is to make a matrix of edit distances between all prefixes of one string and all prefixes of the other string. Levenshtein Distance is calculated by flood filling, that is, a path connecting cells of least edit distances.^{} The approach is to start from upper left corner and move to the lower right corner. Moving horizontally implies insertion, vertically implies deletion, and diagonally implies substitution. Each cell minimizes the cost locally.^{}
Consider the example of transforming "levenshtein" to "meilenstein" with equal weights of 1. There are actually two solutions, both having edit distance of 4. In one solution, we insert 'i' and replace 'v' with 'l': horizontal, then diagonal. In the other solution, we replace 'v' with 'i' and the insert 'l': diagonal, then horizonal.^{}
If we wish to know the sequence of operations, we need to keep track of path, thus requiring more memory.^{}

What do you mean by Normalized Levenshtein Distance? Consider two strings of same length 3 with edit distance of 2. Consider another two strings of same length 9 with edit distance of 3. We may say that the latter pair is more similar. To quantify the similarity, we normalize the edit distance.^{}
One approach is to calculate edit distance as usual and then divide it by the number of operations, usually called the length of the edit path. This is called edit distance with postnormalization. The problem with this approach is that it may not produce the minimum normalized edit distance using WagnerFischer algorithm. Suppose weights are unequal for insertions/deletions and substitutions. It's possible to have a longer edit path that results in a lower normalized distance. Normalized Edit Distance (NED) algorithm finds this minimum.^{}
For example, assume additions/deletions have a weight of 2, substitutions have a weight of 3. Consider the pair of words (abbb, aaab). When we normalize after WagnerFischer algorithm, we get 1.5 but with NED we get 1.33. It's also been observed that NED in most cases obeys the triangle inequality.^{}

How is Levenshtein Distance related to the triangle inequality? In mathematics, edit distance can be seen as a metric in a metric space. In other words, the problem can be interpreted geometrically. The similarity between two words can be seen as the geometric distance between two points in the metric space. Such a metric obeys the triangle inequality. Given distance d, \(d(x,y) + d(y,z) \geq d(x,z)\). A composition of two edits can't be smaller than a direct edit.^{}
Let's consider the pair (rcik, rick) whose edit distance is 2. Another pair (rick, irkc) has edit distance 3. Now consider a direct edit involving (rcik, irkc), which is 4. This direct edit is less than the sum of the other two edits. Levenshtein Distance is indeed a metric and it obeys the triangle inequality.^{}
Wagner and Fischer formally proved this. In fact, their algorithm to compute the edit distance relies on this.^{} The triangle inequality is satisfied due to the constraint that no position in a string is transformed more than once. This implies that all operations can be applied in parallel.^{}

Are there any variants of Levenshtein Distance? Based on the work of Fred Damerau (1964) and V.I. Levenshtein (1965) Damerau–Levenshtein Distance is sometimes used instead of the classical edit distance. With Damerau–Levenshtein Distance, transpositions are also allowed where two adjacent symbols can be swapped.^{} Consider the pair (rcik, irkc). This has an edit distance of 4, due to 4 substitutions. But if transpositions are allowed then the Damerau–Levenshtein distance is 3:
rcik > rick > irck > irkc
.^{}For automatic speech recognition, Levenshtein Distance is calculated on words rather than characters. It's also been extended to three or more dimensions.^{}
Levenshtein Distance has been adapted to Chinese language to extract personaffiliation relations. Words "爱(love)" and "喜欢(like)" are synonyms and should be seen as similar. Likewise, "the apples" and "the sweet apples" are not too different. We can use weights to account for these.^{}

What are some alternatives to Levenshtein Distance? The choice of a suitable similarity metric depends on the application. If we're comparing two sets where order is irrelevant, then Jaccard Distance can be used.^{} A similar measure to Jaccard Distance is Sørensen–Dice coefficient.^{}
Longest Common Subsequence (LCS) allows only insertion and deletion. Jaro Distance allows only transposition. Hamming Distance allows only substitution and thus can be used to compare only strings of same length.^{} Episode distance is a measure that allows only insertions.^{}
If strings are treated as vectors, Cosine Similarity can be used. This is based on the angle between the two vectors.^{}

What are some applications of Levenshtein Distance? Computing the Levenshtein Distance has also been called the stringtostring correction problem. It's relevant to string matching problems occurring in various domains such as information retrieval, pattern recognition, error correction, and molecular genetics.^{}
In NLP, Levenshtein Distance is useful for suggesting spelling corrections, detecting plagiarism, and aiding translators in translation memory systems. For example, if a word is misspelt as "ligting", Levenshtein Distance can suggest that "lighting" is most similar, which helps the writer correct the spelling.^{} It's also been used to cluster words that share a root word.^{}
Levenshtein Distance in fact started in signal processing, in particular, to see how errors in communications systems can be corrected. Another application is speech recognition. A perfect match of an audio signal is impossible and edit distance can find the most suitable match.^{}
DNA is a sequence of letters such as A, C, G, T. Searching for specific sequences is often difficult due to measurement errors, mutations or evolutionary alterations. Thus, similarity of two sequences using Levenshtein Distance is more useful than exact matches.^{}

What software tools are available for calculating Levenshtein Distance? In Python's NLTK package, we can compute Levenshtein Distance between two strings using
nltk.edit_distance()
. We can optionally set a higher cost to substitutions. Another optional argument if set to true permits transpositions and thus helps us calculate the Damerau–Levenshtein Distance.^{}In TensorFlow,
tf.edit_distance()
gives us Levenshtein Distance and optionally normalize it.^{} In R language, package stringdist can calculate many string edit distances.^{}In some applications, when comparing sentences or paragraphs, we may want to exclude some words, or invoke some preprocessing tasks such as lemmatization or stemming. In such cases, we could first tokenize the input using
nltk.word_tokenize(s1)
(in Python) before calculating the edit distance.^{}If we wish to compute Levenshtein Distance by implementing known algorithms, Rosetta Code has a page sharing implementations in many languages.^{} A similar list of implementations is at Wikibooks. Some implementations are more memory efficient than others. It's said that most applications will use heap memory sparingly.^{} In general, a naive recursive implementation will be inefficient compared to a dynamic programming approach.^{}
Milestones
Fred Damerau at IBM looks the problem of matching an erroneous word to words in a dictionary. The error could be a missing character, an extra character, or a wrong character. By accounting for these error types, he was able to find a match in 95% of all error cases.^{}
In communication systems, information can get corrupted. Typically, 0s may becomes 1s, and vice versa. V.I. Levenshtein also considers the problem of extra bits getting introduced into the information stream or bits that go missing. He states that "the function r(x, y) defined on pairs of binary words as equal to the smallest number of deletions, insertions, and reversals that will transform x into y is a metric, and that a code K can correct s deletions, insertions, and reversals if and only if r(x, y) > 2s for any two different words x and y in K."^{}
The WagnerFischer Algorithm is invented to compute the edit distance between two strings. Algorithm has a time complexity of \(O(m\,n)\) when comparing two strings of length m and n. The term edit distance is also coined by Wagner and Fischer. P.H. Sellers coins evolutionary distance as an alternative term.^{}
Bahl and Jelinek provide a stochastic interpretation of edit distance.^{} Called similarity learning, the algorithm will learn pairwise similarity of words by analysing the corpus. Each edit operation has a cost and the idea is to learn these costs, such as by maximizing the likelihood of data.^{}
Masek and Paterson design an algorithm that brings down the worst case time complexity to \(O(m\,n/log_{\sigma}^{2}n)\) but requires \(O(n)\) more space. In practice, it can't beat the classical algorithm for text below 40GB.^{}
Oommen considers constrained edit distance where constraints are placed on edit operations. Example constraints could be "no more than k insertions" or "exactly k substitutions". They also give an algorithm that has \(O(m\,n\,min(m,n))\) time and space complexity.^{}
Marzal and Vidal define Normalized Edit Distance (NED) and an algorithm for computing it.^{} In 2000, two researchers improve the time complexity of finding NED from \(O(m\,n^{2})\) to \(O(m\,n\,log\,n)\).^{}
Cormode and Muthukrishnan consider substring moves and approximation of edit distance. With these modifications, they show that time complexity can be subquadratic.^{}
Indyk and Bačkurs prove that the problem of finding the edit distance can't be solved in less than quadratictime complexity. Thus, we can't do better than the WagnerFischer algorithm in terms of time complexity.^{}
Sample Code
References
 Arslan, Abdullah N. and Omer Egecioglu. 2000. "Efficient Algorithms For Normalized Edit Distance." J. of Discrete Algorithms, vol. 0, no. 0, pp. 118, Hermes Science Publications. Accessed 20190902.
 Bellet, Aurélien, Amaury Habrard, and Marc Sebban. 2012. "Good edit similarity learning by loss minimization." Machine Learning, vol. 89, no. 12, pp. 535, October. Accessed 20190904.
 Che, Wanxiang, Jianmin Jiang, Zhong Su, Yue Pan, and Ting Liu. 2005. "ImprovedEditDistance Kernel for Chinese Relation Extraction." Companion Volume to the Proceedings of Conference including Posters/Demos and tutorial abstracts, ACL Anthology. Accessed 20190902.
 Cormode, Graham and S. Muthukrishnan. 2007. "The String Edit Distance Matching Problem with Moves." ACM Transactions on Algorithms, vol. 3, no. 1, February. Accessed 20190902.
 Damerau, Fred J. 1964. "A technique for computer detection and correction of spelling errors." Communications of the ACM, vol. 7, no. 3, pp. 171176, March. Accessed 20190902.
 Fiscus, Jonathan G., Jerome Ajot, Nicolas Radde, and Christophe Laprun. 2016. "Multiple Dimension Levenshtein Edit Distance Calculations for Evaluating Automatic Speech Recognition Systems During Simultaneous Speech." NIST, August 26. Accessed 20190902.
 Hardesty, Larry. 2015. "Longstanding problem put to rest." MIT News, June 10. Accessed 20190902.
 Jain, Deep. 2019. "How to Calculate Levenshtein Distance in Java?" Baeldung, June 23. Accessed 20190902.
 Jurafsky, Dan. 2018. "Minimum Edit Distance." CS 124: From Languages to Information, Stanford University, October 24. Accessed 20190902.
 Kun, Jeremy. 2011. "Metrics on Words." Math ∩ Programming, December 20. Accessed 20190902.
 Levenshtein, V.I. 1966. "Binary Codes Capable of Correcting Deletions, Insertions, and Reversals." Soviet Physics Doklady, vol. 10, no. 8, pp. 707710, February. Accessed 20190902.
 Levenshtein.net. 2019. "The LevenshteinAlgorithm." Accessed 20190902.
 Marzal, Andres and Enrique Vidal. 1993. "Computation of Normalized Edit Distance and Applications." IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 15, no. 9, pp. 926932, September. Accessed 20190902.
 Mathias. 2013. "KMeans clustering in F#." Blog, Clear Lines Consulting, February 10. Accessed 20190902.
 Minerich, Richard. 2012. "Levenshtein Distance and the Triangle Inequality." Inviting Epiphany, September 04. Accessed 20190902.
 Moslem, Yasmin. 2018. "Edit Distance and Jaccard Distance Calculation with NLTK." Gotrained Python Tutorials, July 01. Accessed 20190902.
 NLTK Docs. 2019. "nltk.metrics package." NLTK 3.4.5 documentation, August 20. Accessed 20190902.
 Oommen, B.J. 1986. "Constrained string editing." Information Sciences, Elsevier, vol. 40, no. 3, pp. 267184, December. Accessed 20190904.
 Ristad, Eric Sven and Peter N. Yianilos. 1996. "Learning String Edit Distance." Research Report CSTR53296, Department of Computer Science, Princeton University, October. Updated 19971001. Accessed 20190904.
 RosettaCode. 2019. "Levenshtein distance." RosettaCode, August 25. Accessed 20190902.
 TensorFlow Docs. 2019. "tf.edit_distance." TensorFlow Core r1.14, June 19. Accessed 20190902.
 Ukkonen, Esko. 1985. "Algorithms for approximate string matching." Information and Control, Elsevier, vol. 64, no. 13, pp. 100118, JanuaryMarch. Accessed 20190902.
 Wagner, Robert A. and Michael J. Fischer. 1974. "The StringtoString Correction Problem." Journal of the ACM, vol. 21, no. 1, pp. 168173, January. Accessed 20190903.
 Wikibooks. 2019. "Algorithm Implementation/Strings/Levenshtein distance." Wikibooks, April 21. Accessed 20190902.
 Wikipedia. 2019a. "Levenshtein distance." Wikipedia, August 29. Accessed 20190902.
 Wikipedia. 2019b. "Sørensen–Dice coefficient." Wikipedia, April 09. Accessed 20190902.
 Wikipedia. 2019c. "Wagner–Fischer algorithm." Wikipedia, June 30. Accessed 20190902.
 van der Loo, Mark. 2019. "Package stringdist." CRAN, June 06. Accessed 20190902.
Further Reading
 Nam, Ethan. 2019. "Understanding the Levenshtein Distance Equation for Beginners." Medium, February 27. Accessed 20190902.
 Jurafsky, Dan. 2018. "Minimum Edit Distance." CS 124: From Languages to Information, Stanford University, October 24. Accessed 20190902.
 Ukkonen, Esko. 1985. "Algorithms for approximate string matching." Information and Control, Elsevier, vol. 64, no. 13, pp. 100118, JanuaryMarch. Accessed 20190902.
 Navarro, Gonzalo. 2001. "A guided tour to approximate string matching." ACM Computing Surveys, vol 33, no. 1, pp. 31–88. Accessed 20190902.
Article Stats
Cite As
See Also
 Dynamic Programming
 Sequence Alignment
 Natural Language Processing
 Hamming Distance