Distance module

A module providing a variety of distance metrics.

This module includes implementations of various distance metrics, including both common and less common measures. It allows for the calculation of distances between data points in a vectorized manner using numpy arrays. Part of this code is based on the work of Andrzej Zielezinski, originally retrieved on 20 November 2022 from aziele/statistical-distances, which was released under the GNU General Public License v3.0.

It was originally modified by Siddharth Chaini on 27 November 2022.

Notes

Modifications by Siddharth Chaini include the addition of the following distance measures:

  1. Meehl distance

  2. Sorensen distance

  3. Ruzicka distance

  4. Inner product distance

  5. Harmonic mean distance

  6. Fidelity

  7. Minimum Symmetric Chi Squared

  8. Probabilistic Symmetric Chi Squared

In addition, the following code was added to all functions for array conversion:

u,v = np.asarray(u), np.asarray(v)

class distclassipy.distances.Distance(epsilon=None)

Bases: object

A class to calculate various distance metrics between vectors.

This class provides methods to compute different types of distances between two vectors, such as Euclidean, Manhattan, Canberra, and other statistical distances. Each method takes two vectors as input and returns the calculated distance. The class can handle both numpy arrays and lists, converting them internally to numpy arrays for computation.

epsilon

A small value to avoid division by zero errors in certain distance calculations. Default is the machine precision for float data type.

Type:

float, optional

acc(u, v)

Returns the average of Cityblock/Manhattan and Chebyshev distances.

add_chisq(u, v)

Returns the Additive Symmetric Chi-square distance.

(Other methods are not listed here for brevity)

Examples

>>> dist = Distance()
>>> u = [1, 2, 3]
>>> v = [4, 5, 6]
>>> print(dist.acc(u, v))
6.0

acc(u, v)

Calculate the average of the Cityblock and Chebyshev distances.

This function computes the ACC distance, also known as the Average distance, between two vectors u and v. It is the average of the Cityblock (or Manhattan) and Chebyshev distances.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The ACC distance between the two vectors.

References

  1. Krause EF (2012) Taxicab Geometry: An Adventure in Non-Euclidean Geometry. Dover Publications.

  2. Sung-Hyuk C (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.
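
As a minimal sketch of the definition above, assuming plain NumPy and a hypothetical helper name (acc_sketch is illustrative and not part of the library), the ACC distance can be computed directly from the absolute coordinate differences:

```python
import numpy as np


def acc_sketch(u, v):
    """Average of the Cityblock and Chebyshev distances (illustrative only)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    diff = np.abs(u - v)
    cityblock = diff.sum()  # sum of absolute differences
    chebyshev = diff.max()  # largest absolute difference
    return (cityblock + chebyshev) / 2


print(acc_sketch([1, 2, 3], [4, 5, 6]))  # (9 + 3) / 2 = 6.0
```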

add_chisq(u, v)

Compute the Additive Symmetric Chi-square distance between two vectors.

The Additive Symmetric Chi-square distance is a measure that can be used to compare two vectors. This function calculates it based on the input vectors u and v.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Additive Symmetric Chi-square distance between the two vectors.

References

  1. Sung-Hyuk C (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

braycurtis(u, v, w=None)

Calculate the Bray-Curtis distance between two vectors.

The Bray-Curtis distance is a measure of dissimilarity between two non-negative vectors, often used in ecology to measure the compositional dissimilarity between two sites based on counts of species at both sites. It is closely related to the Sørensen distance and is also known as Bray-Curtis dissimilarity.

Notes

When used for comparing two probability density functions (pdfs), the Bray-Curtis distance equals the Cityblock distance divided by 2.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Bray-Curtis distance between the two vectors.

References

  1. Bray JR, Curtis JT (1957) An ordination of the upland forest of southern Wisconsin. Ecological Monographs, 27, 325-349.

  2. Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

  3. https://en.wikipedia.org/wiki/Bray–Curtis_dissimilarity
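
The note about probability density functions can be checked numerically. The sketch below assumes the common definition sum|u - v| / sum|u + v| and uses a hypothetical helper, braycurtis_sketch; it is not taken from the library's source. Because each pdf sums to 1, the denominator is 2 and the result is half the Cityblock distance:

```python
import numpy as np


def braycurtis_sketch(u, v):
    """Bray-Curtis dissimilarity under the assumed definition (illustrative only)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return np.abs(u - v).sum() / np.abs(u + v).sum()


# Two example pdfs (each sums to 1).
p = np.array([0.2, 0.3, 0.5])
q = np.array([0.1, 0.6, 0.3])

cityblock = np.abs(p - q).sum()
print(np.isclose(braycurtis_sketch(p, q), cityblock / 2))  # True
```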

canberra(u, v, w=None)

Calculate the Canberra distance between two vectors.

The Canberra distance is a weighted version of the Manhattan distance, used in numerical analysis.

Notes

When u[i] and v[i] are both 0 for a given i, the fraction 0/0 is treated as 0 in the calculation.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Canberra distance between the two vectors.

References

  1. Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.
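
One way to realise the 0/0 = 0 convention from the notes is sketched below. It assumes the usual definition sum(|u - v| / (|u| + |v|)) and a hypothetical helper name, canberra_sketch; the library may handle the zero case differently:

```python
import numpy as np


def canberra_sketch(u, v):
    """Canberra distance with 0/0 treated as 0 (illustrative only)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    num = np.abs(u - v)
    den = np.abs(u) + np.abs(v)
    # Divide only where the denominator is non-zero; other terms stay at 0.
    terms = np.divide(num, den, out=np.zeros_like(num), where=den != 0)
    return terms.sum()


print(canberra_sketch([0, 1, 2], [0, 3, 2]))  # 0 + 2/4 + 0 = 0.5
```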

chebyshev(u, v, w=None)

Calculate the Chebyshev distance between two vectors.

The Chebyshev distance is a metric defined on a vector space where the distance between two vectors is the greatest of their differences along any coordinate dimension.

Synonyms:

Chessboard distance, King-move metric, Maximum value distance, Minimax approximation

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Chebyshev distance between the two vectors.

References

  1. Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

chebyshev_min(u, v)

Calculate the minimum value distance between two vectors.

This measure represents a custom approach by Zielezinski to distance measurement, focusing on the minimum absolute difference.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The minimum value distance between the two vectors.
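
For comparison with the Chebyshev distance, here is a minimal sketch using hypothetical helper names. It assumes the minimum is taken over the absolute coordinate-wise differences, as the description above suggests:

```python
import numpy as np


def chebyshev_sketch(u, v):
    """Largest absolute coordinate difference (illustrative only)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return np.max(np.abs(u - v))


def chebyshev_min_sketch(u, v):
    """Smallest absolute coordinate difference (illustrative only)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return np.min(np.abs(u - v))


print(chebyshev_sketch([1, 2, 3], [4, 6, 3]))      # 4.0
print(chebyshev_min_sketch([1, 2, 3], [4, 6, 3]))  # 0.0
```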

cityblock(u, v, w=None)

Calculate the Cityblock (Manhattan) distance between two vectors.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Cityblock distance between the two vectors.

References

  1. Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4):300-307.

Synonyms:

City block distance, Manhattan distance, Rectilinear distance, Taxicab norm

Notes

Cityblock distance between two probability density functions (pdfs) equals:

  1. Non-intersection distance multiplied by 2.

  2. Gower distance multiplied by vector length.

  3. Bray-Curtis distance multiplied by 2.

  4. Google distance multiplied by 2.

clark(u, v)

Calculate the Clark distance between two vectors.

The Clark distance equals the square root of half of the divergence.

Notes

When u[i] and v[i] are both 0 for a given i, the fraction 0/0 is treated as 0 in the calculation.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Clark distance between the two vectors.

References

  1. Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.
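
The relationship between the Clark distance and the divergence can be illustrated as follows. This sketch assumes the definitions given in Sung-Hyuk (2007) and strictly positive inputs (so the 0/0 case does not arise); the helper names are hypothetical:

```python
import numpy as np


def divergence_sketch(u, v):
    """Divergence: 2 * sum((u - v)^2 / (u + v)^2) (illustrative only)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return 2 * np.sum((u - v) ** 2 / (u + v) ** 2)


def clark_sketch(u, v):
    """Clark distance: sqrt(sum(((u - v) / (u + v))^2)) (illustrative only)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return np.sqrt(np.sum(((u - v) / (u + v)) ** 2))


u, v = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
# Clark distance equals the square root of half of the divergence.
print(np.isclose(clark_sketch(u, v), np.sqrt(divergence_sketch(u, v) / 2)))  # True
```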

correlation(u, v, w=None, centered=True)

Calculate the Pearson correlation distance between two vectors.

Returns a distance value between 0 and 2.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Pearson correlation distance between the two vectors.
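
The range of 0 to 2 follows if the distance is taken as 1 - r, where r is the Pearson correlation coefficient. The sketch below assumes that definition with mean-centered inputs and no weights; correlation_sketch is a hypothetical helper, not the library's method:

```python
import numpy as np


def correlation_sketch(u, v):
    """Pearson correlation distance, assumed to be 1 - r (illustrative only)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    uc, vc = u - u.mean(), v - v.mean()  # center both vectors
    r = np.dot(uc, vc) / (np.linalg.norm(uc) * np.linalg.norm(vc))
    return 1.0 - r


same = correlation_sketch([1, 2, 3], [2, 4, 6])      # r = 1, distance near 0
opposite = correlation_sketch([1, 2, 3], [3, 2, 1])  # r = -1, distance near 2
print(same, opposite)
```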

cosine(u, v, w=None)

Calculate the cosine distance between two vectors.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The cosine distance between the two vectors.

References

  1. SciPy.

czekanowski(u, v)

Calculate the Czekanowski distance between two vectors.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Czekanowski distance between the two vectors.

References

  1. Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

dice(u, v)

Calculate the Dice dissimilarity between two vectors.

Synonyms:

Sorensen distance

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Dice dissimilarity between the two vectors.

References

  1. Dice LR (1945) Measures of the amount of ecologic association between species. Ecology. 26, 297-302.

  2. Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

divergence(u, v)

Calculate the divergence between two vectors.

Divergence equals squared Clark distance multiplied by 2.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The divergence between the two vectors.

References

  1. Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

euclidean(u, v, w=None)

Calculate the Euclidean distance between two vectors.

The Euclidean distance is the “ordinary” straight-line distance between two points in Euclidean space.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Euclidean distance between the two vectors.

References

  1. Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

google(u, v)

Calculate the Normalized Google Distance (NGD) between two vectors.

NGD is a measure of similarity derived from the number of hits returned by the Google search engine for a given set of keywords.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Normalized Google Distance between the two vectors.

Notes

When used for comparing two probability density functions (pdfs), Google distance equals half of Cityblock distance.

References

  1. Lee & Rashid (2008) Information Technology, ITSim 2008. doi:10.1109/ITSIM.2008.4631601.

gower(u, v)

Calculate the Gower distance between two vectors.

The Gower distance equals the Cityblock distance divided by the vector length.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Gower distance between the two vectors.

References

  1. Gower JC. (1971) General Coefficient of Similarity and Some of Its Properties, Biometrics 27, 857-874.

  2. Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.
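
A minimal sketch of the relationship stated above, with a hypothetical helper name (gower_sketch) and "vector length" read as the number of elements:

```python
import numpy as np


def gower_sketch(u, v):
    """Cityblock distance divided by the number of elements (illustrative only)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return np.abs(u - v).sum() / u.size


print(gower_sketch([1, 2, 3], [4, 5, 6]))  # 9 / 3 = 3.0
```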

hellinger(u, v)

Calculate the Hellinger distance between two vectors.

The Hellinger distance is a measure of similarity between two probability distributions.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Hellinger distance between the two vectors.

Notes

This implementation produces values twice as large as the Hellinger distance described on Wikipedia and in https://gist.github.com/larsmans/3116927.

References

  1. Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.
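
The factor of two mentioned in the notes can be reproduced with the sketch below. It assumes the form sqrt(2 * sum((sqrt(u) - sqrt(v))^2)) from Sung-Hyuk (2007) and the Wikipedia form with a 1/sqrt(2) prefactor; both helper names are hypothetical:

```python
import numpy as np


def hellinger_cha_sketch(u, v):
    """Hellinger distance in the Sung-Hyuk (2007) form (illustrative only)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return np.sqrt(2 * np.sum((np.sqrt(u) - np.sqrt(v)) ** 2))


def hellinger_wikipedia_sketch(u, v):
    """Hellinger distance in the Wikipedia form (illustrative only)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return np.sqrt(np.sum((np.sqrt(u) - np.sqrt(v)) ** 2)) / np.sqrt(2)


p = np.array([0.2, 0.3, 0.5])
q = np.array([0.1, 0.6, 0.3])
ratio = hellinger_cha_sketch(p, q) / hellinger_wikipedia_sketch(p, q)
print(np.isclose(ratio, 2.0))  # True
```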

jaccard(u, v)

Calculate the Jaccard distance between two vectors.

The Jaccard distance measures dissimilarity between sample sets.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Jaccard distance between the two vectors.

References

  1. Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

jeffreys(u, v)

Calculate the Jeffreys divergence between two vectors.

The Jeffreys divergence is a symmetric version of the Kullback-Leibler divergence.

Parameters:
  • u, v – Input vectors between which the divergence is to be calculated.

Returns:

  • The Jeffreys divergence between the two vectors.

References

  1. Jeffreys H (1946) An Invariant Form for the Prior Probability in Estimation Problems. Proc.Roy.Soc.Lon., Ser. A 186, 453-461.

  2. Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

jensen_difference(u, v)

Calculate the Jensen difference between two vectors.

The Jensen difference is considered similar to the Jensen-Shannon divergence.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Jensen difference between the two vectors.

Notes

  1. Equals half of Topsøe distance.

  2. Equals squared jensenshannon_distance.

References

  1. Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.
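
The first note can be verified numerically. The sketch below assumes the Jensen difference and Topsøe formulas from Sung-Hyuk (2007) and strictly positive inputs (no epsilon substitution for zeros); the helper names are hypothetical:

```python
import numpy as np


def jensen_difference_sketch(u, v):
    """Jensen difference for strictly positive inputs (illustrative only)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    m = (u + v) / 2
    return np.sum((u * np.log(u) + v * np.log(v)) / 2 - m * np.log(m))


def topsoe_sketch(u, v):
    """Topsøe distance for strictly positive inputs (illustrative only)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    s = u + v
    return np.sum(u * np.log(2 * u / s) + v * np.log(2 * v / s))


p = np.array([0.2, 0.3, 0.5])
q = np.array([0.1, 0.6, 0.3])
# The Jensen difference is half of the Topsøe distance.
print(np.isclose(jensen_difference_sketch(p, q), topsoe_sketch(p, q) / 2))  # True
```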

jensenshannon_divergence(u, v)

Calculate the Jensen-Shannon divergence between two vectors.

The Jensen-Shannon divergence is a symmetric and finite measure of similarity between two probability distributions.

Parameters:
  • u, v – Input vectors between which the divergence is to be calculated.

Returns:

  • The Jensen-Shannon divergence between the two vectors.

References

  1. Lin J. (1991) Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1):145–151.

  2. Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

Comments:

Equals Jensen difference in Sung-Hyuk (2007):

    u = np.where(u == 0, self.epsilon, u)
    v = np.where(v == 0, self.epsilon, v)
    el1 = (u * np.log(u) + v * np.log(v)) / 2
    el2 = (u + v) / 2
    el3 = np.log(el2)
    return np.sum(el1 - el2 * el3)

kulczynski(u, v)

Calculate the Kulczynski distance between two vectors.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Kulczynski distance between the two vectors.

References

  1. Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4):300-307.

kumarjohnson(u, v)

Calculate the Kumar-Johnson distance between two vectors.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Kumar-Johnson distance between the two vectors.

References

  1. Kumar P, Johnson A. (2005) On a symmetric divergence measure and information inequalities, Journal of Inequalities in pure and applied Mathematics. 6(3).

  2. Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4):300-307.

lorentzian(u, v)

Calculate the Lorentzian distance between two vectors.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Lorentzian distance between the two vectors.

References

  1. Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4):300-307.

Notes

One (1) is added to guarantee the non-negativity property and to avoid taking the log of zero.
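
A minimal sketch of why adding one matters, assuming the definition sum(ln(1 + |u - v|)) from Sung-Hyuk (2007) and a hypothetical helper name. Each term is ln of a value that is at least 1, so it is never negative and never ln(0):

```python
import numpy as np


def lorentzian_sketch(u, v):
    """Lorentzian distance: sum of ln(1 + |u - v|) (illustrative only)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    # log1p(x) computes ln(1 + x); it is 0 when the coordinates coincide.
    return np.sum(np.log1p(np.abs(u - v)))


print(lorentzian_sketch([1, 2, 3], [1, 2, 3]))  # 0.0
```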

marylandbridge(u, v)

Calculate the Maryland Bridge distance between two vectors.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Maryland Bridge distance between the two vectors.

References

  1. Deza M, Deza E (2009) Encyclopedia of Distances. Springer-Verlag Berlin Heidelberg. 1-590.

matusita(u, v)

Calculate the Matusita distance between two vectors.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Matusita distance between the two vectors.

References

  1. Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4):300-307.

Notes

Equals square root of Squared-chord distance.
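
The relationship in the note can be sketched as follows, assuming the Squared-chord definition sum((sqrt(u) - sqrt(v))^2) from Sung-Hyuk (2007) and non-negative inputs; both helper names are hypothetical:

```python
import numpy as np


def squaredchord_sketch(u, v):
    """Squared-chord distance (illustrative only)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return np.sum((np.sqrt(u) - np.sqrt(v)) ** 2)


def matusita_sketch(u, v):
    """Matusita distance as the square root of the Squared-chord distance."""
    return np.sqrt(squaredchord_sketch(u, v))


p = np.array([0.2, 0.3, 0.5])
q = np.array([0.1, 0.6, 0.3])
print(np.isclose(matusita_sketch(p, q) ** 2, squaredchord_sketch(p, q)))  # True
```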

meehl(u, v)

Calculate the Meehl distance between two vectors.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Meehl distance between the two vectors.

Notes

Added by SC.

References

  1. Deza M. and Deza E. (2013) Encyclopedia of Distances. Berlin, Heidelberg: Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-30958-8.

minkowski(u, v, p=2)

Calculate the Minkowski distance between two vectors.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

  • p – Order of the Minkowski distance (default 2).

Returns:

  • The Minkowski distance between the two vectors.

Notes

As p goes to infinity, the Minkowski distance approaches the Chebyshev distance.

References

  1. Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4):300-307.
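
The limiting behaviour described in the notes can be observed with a short sketch. It assumes the standard definition (sum(|u - v|^p))^(1/p); minkowski_sketch is a hypothetical helper, not the library's method:

```python
import numpy as np


def minkowski_sketch(u, v, p=2):
    """Minkowski distance of order p (illustrative only)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return np.sum(np.abs(u - v) ** p) ** (1.0 / p)


u, v = [1, 2, 3], [4, 6, 3]  # absolute differences: 3, 4, 0
chebyshev = 4.0              # largest absolute difference

# p = 1 gives the Cityblock distance (7), p = 2 the Euclidean distance (5);
# larger p moves the value toward the Chebyshev distance (4).
for p in (1, 2, 10, 50):
    print(p, minkowski_sketch(u, v, p))
```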

motyka(u, v)

Calculate the Motyka distance between two vectors.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Motyka distance between the two vectors.

Notes

The distance between identical vectors is not equal to 0 but 0.5.

References

  1. Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.
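
The value of 0.5 for identical vectors follows from the definition. The sketch below assumes the form sum(max(u, v)) / sum(u + v) from Sung-Hyuk (2007) and uses a hypothetical helper name:

```python
import numpy as np


def motyka_sketch(u, v):
    """Motyka distance: sum(max(u, v)) / sum(u + v) (illustrative only)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return np.maximum(u, v).sum() / (u + v).sum()


# For identical vectors, max(u, v) = u and u + v = 2u, so the ratio is 1/2.
print(motyka_sketch([1, 2, 3], [1, 2, 3]))  # 0.5
```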

penroseshape(u, v)

Calculate the Penrose shape distance between two vectors.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Penrose shape distance between the two vectors.

References

  1. Deza M, Deza E (2009) Encyclopedia of Distances. Springer-Verlag Berlin Heidelberg. 1-590.

prob_chisq(u, v)

Calculate the Probabilistic chi-square distance between two vectors.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Probabilistic chi-square distance between the two vectors.

Notes

Added by SC.

ruzicka(u, v)

Calculate the Ruzicka distance between two vectors.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Ruzicka distance between the two vectors.

Notes

Added by SC.

soergel(u, v)

Calculate the Soergel distance between two vectors.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Soergel distance between the two vectors.

Notes

Equals Tanimoto distance.

References

  1. Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.
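
The equivalence with the Tanimoto distance can be checked with the sketch below. It assumes the Soergel and Tanimoto formulas from Sung-Hyuk (2007) and non-negative inputs; the helper names are hypothetical:

```python
import numpy as np


def soergel_sketch(u, v):
    """Soergel distance: sum|u - v| / sum(max(u, v)) (illustrative only)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return np.abs(u - v).sum() / np.maximum(u, v).sum()


def tanimoto_sketch(u, v):
    """Tanimoto distance written in terms of sums and minima (illustrative only)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    total = u.sum() + v.sum()
    s_min = np.minimum(u, v).sum()
    return (total - 2 * s_min) / (total - s_min)


u, v = [1, 2, 3], [4, 6, 3]
# Both expressions reduce to 7 / 13 for this pair.
print(np.isclose(soergel_sketch(u, v), tanimoto_sketch(u, v)))  # True
```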

sorensen(u, v)

Calculate the Sorensen distance between two vectors.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Sorensen distance between the two vectors.

Notes

The Sorensen distance equals the Manhattan distance divided by the sum of the two vectors.

Added by SC.

squared_chisq(u, v)

Calculate the Squared chi-square distance between two vectors.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Squared chi-square distance between the two vectors.

References

  1. Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

squared_euclidean(u, v)

Calculate the Squared Euclidean distance between two vectors.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Squared Euclidean distance between the two vectors.

References

  1. Gavin DG et al. (2003) A statistical approach to evaluating distance metrics and analog assignments for pollen records. Quaternary Research 60:356–367.

  2. Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

Notes

Equals the square of the Euclidean distance.

squaredchord(u, v)

Calculate the Squared-chord distance between two vectors.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Squared-chord distance between the two vectors.

References

  1. Gavin DG et al. (2003) A statistical approach to evaluating distance metrics and analog assignments for pollen records. Quaternary Research 60:356–367.

  2. Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

Notes

Equals the squared Matusita distance.

taneja(u, v)

Calculate the Taneja distance between two vectors.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Taneja distance between the two vectors.

References

  1. Taneja IJ. (1995), New Developments in Generalized Information Measures, Chapter in: Advances in Imaging and Electron Physics, Ed. P.W. Hawkes, 91, 37-135.

  2. Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

tanimoto(u, v)

Calculate the Tanimoto distance between two vectors.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Tanimoto distance between the two vectors.

References

  1. Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

Notes

Equals Soergel distance.

topsoe(u, v)

Calculate the Topsøe distance between two vectors.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Topsøe distance between the two vectors.

References

  1. Sung-Hyuk C (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

Notes

Equals two times Jensen-Shannon divergence.

vicis_symmetric_chisq(u, v)

Calculate the Vicis Symmetric chi-square distance between two vectors.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Vicis Symmetric chi-square distance between the two vectors.

References

  1. Sung-Hyuk C (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

vicis_wave_hedges(u, v)

Calculate the Vicis-Wave Hedges distance between two vectors.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Vicis-Wave Hedges distance between the two vectors.

References

  1. Sung-Hyuk C (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

wave_hedges(u, v)

Calculate the Wave Hedges distance between two vectors.

Parameters:
  • u, v – Input vectors between which the distance is to be calculated.

Returns:

  • The Wave Hedges distance between the two vectors.

References

  1. Sung-Hyuk C (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.