next up previous contents index
Next: Alignment Functions ( alignment Up: Alignment Data Types Previous: Score Matrix ( score_matrix   Contents   Index


String Distance Function ( distance )

Definition

An instance D of the data type distance is an object maintaining two strings and providing several string distance functions.
The Hamming distance between two strings of equal length is the number of positions for which the corresponding symbols are different.
The Levenshtein distance or edit distance between two strings is given by the minimum number of operations needed to transform one string into the other, where an operation is an insertion, deletion, or substitution of a single character.
The General Levenshtein distance is given by the minimum sum of the costs over a sequence of operations needed to transform one string into the other, where operation is an insertion, deletion, or substitution and a cost is assigned to each of the operations.
The Damerau-Levenshtein distance is an extension of Levenshtein distance that counts transposition as a single edit operation. Strictly speaking, the Damerau-Levenshtein distance is equal to the minimal number of insertions, deletions, substitutions and transpositions needed to transform one string into the other.
The algorithms can be called explicitly by their name or implicitly by making an algorithm the "current" algorithm (method setAlgorithm(.) and using the run()-method.

Here is the list of the available parameters to specify the algorithm in the setAlgorithm(alg)-method in order to call it by the run()-method. For each algorithm, the enumeration index gives the corresponding int that can be used in the setAlgorithm(int)-method in order to specify the algorithm.

  1. HAMMING

  2. LEVENSHTEIN

  3. DAMERAU$\_$LEVENSHTEIN

  4. GENERAL$\_$LEVENSHTEIN

Remark: This data type is experimental. It can significantly change in future versions.

#include < LEDA/string/distance.h >

Creation

distance D creates an instance D of type distance.

distance D(const string& s1, const string& s2)
    creates an instance D of type distance and initializes the two strings to s1 resp. s2.

Operations

int D.hamming() returns the Hamming distance of two strings. If the strings have different lengths, -1 is returned.

int D.levenshtein() returns the Levenshtein distance of two strings.

int D.damerau_levenshtein() returns the Damerau Levenshtein distance of two strings.

unsigned int D.general_levenshtein() returns the General Levenshtein distance of two strings.

void D.setAlgorithm(alg A) sets the current algorithm to A (see table above).

void D.setAlgorithm(int A) sets the current algorithm to A by index (see table above).

int D.run() executes the current algorithm.

void D.setFirstString(const string& str)
    assigns str to the first string.

void D.setSecondString(const string& str)
    assigns str to the second string.

string D.getFirstString() returns first string.

string D.getSecondString() returns second string.


next up previous contents index
Next: Alignment Functions ( alignment Up: Alignment Data Types Previous: Score Matrix ( score_matrix   Contents   Index