Distance module

A module providing a variety of distance metrics.

This module includes implementations of various distance metrics, including both common and less common measures. It computes distances between data points in a vectorized manner using NumPy arrays. Part of this code is based on the work of Andrzej Zielezinski, originally retrieved on 20 November 2022 from aziele/statistical-distances, which was released under the GNU General Public License v3.0.

The code was first modified by Siddharth Chaini on 27 November 2022.

Notes

Modifications by Siddharth Chaini include the addition of the following distance measures:

Meehl distance

Sorensen distance

Ruzicka distance

Inner product distance

Harmonic mean distance

Fidelity

Minimum Symmetric Chi Squared

Probabilistic Symmetric Chi Squared

In addition, the following line was added to all functions to convert the inputs to NumPy arrays:

    u, v = np.asarray(u), np.asarray(v)

Copyright (C) 2024 Siddharth Chaini

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <https://www.gnu.org/licenses/>.

distclassipy.distances.acc(u, v)

Calculate the average of the Cityblock and Chebyshev distances.

This function computes the ACC distance, also known as the Average distance, between two vectors u and v. It is the average of the Cityblock (or Manhattan) and Chebyshev distances.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The ACC distance between the two vectors.

References

Krause EF (2012) Taxicab Geometry: An Adventure in Non-Euclidean Geometry. Dover Publications.
Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.
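The definition above can be sketched in a few lines of NumPy. This is a minimal illustration, not the library's implementation; `acc_sketch` is a hypothetical name, and the inputs are assumed to be 1-D numeric vectors of equal length.

```python
import numpy as np


def acc_sketch(u, v):
    """Hypothetical sketch: average of the Cityblock and Chebyshev distances."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    diff = np.abs(u - v)
    cityblock = np.sum(diff)  # sum of absolute coordinate differences
    chebyshev = np.max(diff)  # largest absolute coordinate difference
    return float((cityblock + chebyshev) / 2)


print(acc_sketch([1, 2, 3], [2, 4, 3]))  # 2.5
```

Here the absolute differences are 1, 2 and 0, so the Cityblock distance is 3, the Chebyshev distance is 2, and their average is 2.5.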

distclassipy.distances.add_chisq(u, v)

Compute the Additive Symmetric Chi-square distance between two vectors.

The Additive Symmetric Chi-square distance compares two vectors u and v by summing (u[i] - v[i])^2 (u[i] + v[i]) / (u[i] v[i]) over all coordinates i.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Additive Symmetric Chi-square distance between the two vectors.

References

Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.
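A minimal NumPy sketch of the summation described above, for illustration only. `add_chisq_sketch` is a hypothetical name, and setting the term to 0 whenever u[i] * v[i] == 0 is an assumption made here to avoid division by zero; this page does not say how the library handles that case.

```python
import numpy as np


def add_chisq_sketch(u, v):
    """Hypothetical sketch of the Additive Symmetric Chi-square distance."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    prod = u * v
    num = (u - v) ** 2 * (u + v)
    # Assumption: coordinates with u[i] * v[i] == 0 contribute 0 to the sum.
    safe = np.where(prod != 0, prod, 1.0)  # placeholder denominator, never used
    terms = np.where(prod != 0, num / safe, 0.0)
    return float(np.sum(terms))


print(add_chisq_sketch([0.5, 0.5], [0.25, 0.75]))  # approximately 0.5833 (7/12)
```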

distclassipy.distances.braycurtis(u, v, w=None)

Calculate the Bray-Curtis distance between two vectors.

The Bray-Curtis distance is a measure of dissimilarity between two non-negative vectors, often used in ecology to measure the compositional dissimilarity between two sites based on counts of species at both sites. It is closely related to the Sørensen distance and is also known as Bray-Curtis dissimilarity.

Notes

When used for comparing two probability density functions (pdfs), the Bray-Curtis distance equals the Cityblock distance divided by 2.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Bray-Curtis distance between the two vectors.

References

Bray JR, Curtis JT (1957) An ordination of the upland forest of southern Wisconsin. Ecological Monographs, 27, 325-349.
Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.
https://en.wikipedia.org/wiki/Bray–Curtis_dissimilarity
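The pdf relationship in the Notes can be checked with a short sketch. It assumes the common form sum |u - v| / sum |u + v|; `braycurtis_sketch` is a hypothetical name, and the `w` weighting parameter is not modeled.

```python
import numpy as np


def braycurtis_sketch(u, v):
    """Hypothetical sketch: sum |u - v| divided by sum |u + v|."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.sum(np.abs(u - v)) / np.sum(np.abs(u + v)))


# Two probability vectors (each sums to 1), so sum(u + v) == 2.
p = np.array([0.2, 0.3, 0.5])
q = np.array([0.1, 0.6, 0.3])
cityblock = np.sum(np.abs(p - q))

print(braycurtis_sketch(p, q))                             # approximately 0.3
print(np.isclose(braycurtis_sketch(p, q), cityblock / 2))  # True
```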

distclassipy.distances.canberra(u, v, w=None)

Calculate the Canberra distance between two vectors.

The Canberra distance is a weighted version of the Manhattan distance, used in numerical analysis.

Notes

When both u[i] and v[i] are 0 for some i, the term 0/0 is treated as 0 in the calculation.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Canberra distance between the two vectors.

References

Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.
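A minimal sketch of the Canberra sum |u[i] - v[i]| / (|u[i]| + |v[i]|), including the 0/0 = 0 convention from the Notes. `canberra_sketch` is a hypothetical name and the `w` parameter is not modeled.

```python
import numpy as np


def canberra_sketch(u, v):
    """Hypothetical sketch: sum of |u - v| / (|u| + |v|), with 0/0 treated as 0."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    num = np.abs(u - v)
    den = np.abs(u) + np.abs(v)
    # den is 0 only when u[i] == v[i] == 0; those terms are set to 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(den != 0, num / den, 0.0)
    return float(np.sum(terms))


print(canberra_sketch([0, 1, 2], [0, 3, 2]))  # 0.5
```

The first coordinate is the 0/0 case and contributes nothing; the second contributes 2/4.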

distclassipy.distances.chebyshev(u, v, w=None)

Calculate the Chebyshev distance between two vectors.

The Chebyshev distance is a metric defined on a vector space where the distance between two vectors is the greatest of their differences along any coordinate dimension.

Synonyms: Chessboard distance, King-move metric, Maximum value distance, Minimax approximation.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Chebyshev distance between the two vectors.

References

Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

distclassipy.distances.chebyshev_min(u, v)

Calculate the minimum value distance between two vectors.

This measure is a custom approach by Zielezinski that returns the minimum absolute difference between corresponding coordinates.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The minimum value distance between the two vectors.
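The Chebyshev distance and the minimum value distance differ only in whether the largest or the smallest absolute difference is kept. A minimal sketch with hypothetical names (`chebyshev_sketch`, `chebyshev_min_sketch`), not the library's code:

```python
import numpy as np


def _abs_diff(u, v):
    """Absolute coordinate-wise differences of two equal-length vectors."""
    return np.abs(np.asarray(u, dtype=float) - np.asarray(v, dtype=float))


def chebyshev_sketch(u, v):
    """Hypothetical sketch: the largest absolute difference."""
    return float(np.max(_abs_diff(u, v)))


def chebyshev_min_sketch(u, v):
    """Hypothetical sketch: the smallest absolute difference."""
    return float(np.min(_abs_diff(u, v)))


u, v = [1, 5, 2], [4, 5, 0]  # absolute differences: 3, 0, 2
print(chebyshev_sketch(u, v))      # 3.0
print(chebyshev_min_sketch(u, v))  # 0.0
```

As the example shows, the minimum value distance is 0 whenever two vectors agree in any single coordinate, so it does not separate distinct vectors.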

distclassipy.distances.cityblock(u, v, w=None)

Calculate the Cityblock (Manhattan) distance between two vectors.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Cityblock distance between the two vectors.

References

Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

Synonyms: City block distance, Manhattan distance, Rectilinear distance, Taxicab norm.

Notes

The Cityblock distance between two probability density functions (pdfs) equals:

1. The Non-intersection distance multiplied by 2.
2. The Gower distance multiplied by the vector length.
3. The Bray-Curtis distance multiplied by 2.
4. The Google distance multiplied by 2.
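The first three relationships in the Notes can be checked numerically. This sketch assumes the non-intersection distance is 1 minus the sum of element-wise minima (as in Cha, 2007), Gower is Cityblock divided by the vector length, and Bray-Curtis is sum |u - v| / sum (u + v); it is an illustration, not the library's code.

```python
import numpy as np

# Two probability vectors: non-negative entries that each sum to 1.
p = np.array([0.1, 0.4, 0.5])
q = np.array([0.3, 0.3, 0.4])

cityblock = np.sum(np.abs(p - q))               # L1 distance
nonintersection = 1 - np.sum(np.minimum(p, q))  # 1 minus the intersection
gower = cityblock / p.size                      # Cityblock / vector length
braycurtis = cityblock / np.sum(p + q)          # sum(p + q) == 2 for pdfs

print(np.isclose(cityblock, 2 * nonintersection))  # True
print(np.isclose(cityblock, p.size * gower))       # True (by definition)
print(np.isclose(cityblock, 2 * braycurtis))       # True
```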

distclassipy.distances.clark(u, v)

Calculate the Clark distance between two vectors.

The Clark distance equals the square root of half of the divergence.

Notes

When both u[i] and v[i] are 0 for some i, the term 0/0 is treated as 0 in the calculation.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Clark distance between the two vectors.

References

Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

distclassipy.distances.correlation(u, v, w=None, centered=True)

Calculate the Pearson correlation distance between two vectors.

Returns a distance value between 0 and 2.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Pearson correlation distance between the two vectors.
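The 0 to 2 range follows from defining the distance as 1 - r, where the Pearson coefficient r lies in [-1, 1]. A minimal sketch under that assumption; `correlation_sketch` is a hypothetical name, and it ignores the `w` and `centered` parameters of the documented signature.

```python
import numpy as np


def correlation_sketch(u, v):
    """Hypothetical sketch: 1 minus the Pearson correlation coefficient."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    uc, vc = u - u.mean(), v - v.mean()  # center both vectors
    r = np.dot(uc, vc) / (np.linalg.norm(uc) * np.linalg.norm(vc))
    return float(1.0 - r)


print(correlation_sketch([1, 2, 3], [2, 4, 6]))  # close to 0.0 (r = 1)
print(correlation_sketch([1, 2, 3], [3, 2, 1]))  # close to 2.0 (r = -1)
```

A constant input vector has zero norm after centering, so r (and this sketch) is undefined in that case.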

distclassipy.distances.cosine(u, v, w=None)

Calculate the cosine distance between two vectors.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The cosine distance between the two vectors.

References

SciPy.
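A minimal sketch assuming the standard definition used by SciPy, 1 - u.v / (||u|| ||v||). `cosine_sketch` is a hypothetical name and the `w` parameter is not modeled.

```python
import numpy as np


def cosine_sketch(u, v):
    """Hypothetical sketch: 1 minus the cosine of the angle between u and v."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


print(cosine_sketch([1, 0], [0, 1]))  # 1.0 (orthogonal vectors)
print(cosine_sketch([1, 2], [2, 4]))  # close to 0.0 (same direction)
```

Because only the angle matters, scaling either vector by a positive constant leaves the distance unchanged.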

distclassipy.distances.czekanowski(u, v)

Calculate the Czekanowski distance between two vectors.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Czekanowski distance between the two vectors.

References

Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

distclassipy.distances.dice(u, v)

Calculate the Dice dissimilarity between two vectors.

Synonyms: Sorensen distance.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Dice dissimilarity between the two vectors.

References

Dice LR (1945) Measures of the amount of ecologic association between species. Ecology. 26, 297-302.
Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

distclassipy.distances.divergence(u, v)

Calculate the divergence between two vectors.

The divergence equals twice the squared Clark distance.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The divergence between the two vectors.

References

Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.
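The relationship between the divergence and the Clark distance can be illustrated with a short sketch. It assumes non-negative inputs, the forms sqrt(sum ((u - v) / (u + v))^2) and 2 * sum (u - v)^2 / (u + v)^2, and the 0/0 = 0 convention noted for the Clark distance; the function names are hypothetical.

```python
import numpy as np


def clark_sketch(u, v):
    """Hypothetical sketch: sqrt of sum ((u - v) / (u + v))**2, with 0/0 as 0."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    s = u + v
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(s != 0, (u - v) / s, 0.0)
    return float(np.sqrt(np.sum(ratio ** 2)))


def divergence_sketch(u, v):
    """Hypothetical sketch: 2 * sum (u - v)**2 / (u + v)**2, with 0/0 as 0."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    s = u + v
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(s != 0, (u - v) ** 2 / s ** 2, 0.0)
    return float(2 * np.sum(terms))


u, v = [1, 2, 0], [3, 2, 0]
print(clark_sketch(u, v))       # 0.5
print(divergence_sketch(u, v))  # 0.5, i.e. 2 * 0.5**2
```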

distclassipy.distances.euclidean(u, v, w=None)

Calculate the Euclidean distance between two vectors.

The Euclidean distance is the “ordinary” straight-line distance between two points in Euclidean space.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Euclidean distance between the two vectors.

References

Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

distclassipy.distances.google(u, v)

Calculate the Normalized Google Distance (NGD) between two vectors.

NGD is a measure of similarity derived from the number of hits returned by the Google search engine for a given set of keywords.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Normalized Google Distance between the two vectors.

Notes

When used for comparing two probability density functions (pdfs), the Google distance equals half of the Cityblock distance.

References

Lee & Rashid (2008) Information Technology, ITSim 2008. doi:10.1109/ITSIM.2008.4631601.
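The pdf property in the Notes can be checked with a sketch of a vector form of NGD. The form used here, (max(sum u, sum v) - sum min(u, v)) / (sum u + sum v - min(sum u, sum v)), is an assumption for illustration and is not confirmed by this page; `google_sketch` is a hypothetical name.

```python
import numpy as np


def google_sketch(u, v):
    """Hypothetical vector form of the Normalized Google Distance."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    x, y = np.sum(u), np.sum(v)
    summin = np.sum(np.minimum(u, v))  # shared "mass" of the two vectors
    return float((max(x, y) - summin) / (x + y - min(x, y)))


p = np.array([0.1, 0.4, 0.5])  # probability vectors: each sums to 1
q = np.array([0.3, 0.3, 0.4])
print(google_sketch(p, q))                                         # approximately 0.2
print(np.isclose(google_sketch(p, q), np.sum(np.abs(p - q)) / 2))  # True
```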

distclassipy.distances.gower(u, v)

Calculate the Gower distance between two vectors.

The Gower distance equals the Cityblock distance divided by the vector length.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Gower distance between the two vectors.

References

Gower JC. (1971) General Coefficient of Similarity and Some of Its Properties, Biometrics 27, 857-874.
Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

distclassipy.distances.hellinger(u, v)

Calculate the Hellinger distance between two vectors.

The Hellinger distance is a measure of similarity between two probability distributions.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Hellinger distance between the two vectors.

Notes

This implementation produces values twice as large as the Hellinger distance described on Wikipedia and in https://gist.github.com/larsmans/3116927.

References

Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.
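The factor of two in the Notes can be made concrete. The textbook form is (1 / sqrt(2)) * sqrt(sum (sqrt(u) - sqrt(v))^2); a value twice as large corresponds to sqrt(2 * sum (sqrt(u) - sqrt(v))^2). Both functions below are hypothetical sketches, and the second form is inferred from the note rather than taken from the library source.

```python
import numpy as np


def hellinger_sketch(u, v):
    """Hypothetical form matching the note: sqrt(2 * sum (sqrt(u) - sqrt(v))**2)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.sqrt(2 * np.sum((np.sqrt(u) - np.sqrt(v)) ** 2)))


def hellinger_wikipedia(u, v):
    """Textbook form: (1 / sqrt(2)) * sqrt(sum (sqrt(u) - sqrt(v))**2)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.sqrt(np.sum((np.sqrt(u) - np.sqrt(v)) ** 2)) / np.sqrt(2))


p = [0.36, 0.64]
q = [0.64, 0.36]
print(hellinger_sketch(p, q) / hellinger_wikipedia(p, q))  # close to 2.0
```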

distclassipy.distances.jaccard(u, v)

Calculate the Jaccard distance between two vectors.

The Jaccard distance measures dissimilarity between sample sets.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Jaccard distance between the two vectors.

References

Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.
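For real-valued vectors, the set-based idea is commonly generalized with inner products, 1 - u.v / (u.u + v.v - u.v), which is the form assumed in this sketch (`jaccard_sketch` is a hypothetical name). On 0/1 indicator vectors it reduces to 1 - |A ∩ B| / |A ∪ B|.

```python
import numpy as np


def jaccard_sketch(u, v):
    """Hypothetical sketch: 1 - u.v / (u.u + v.v - u.v)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    uv = np.dot(u, v)
    return float(1.0 - uv / (np.dot(u, u) + np.dot(v, v) - uv))


# Indicator vectors for the sets {a, b} and {a}: intersection 1, union 2.
print(jaccard_sketch([1, 1], [1, 0]))  # 0.5
```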

distclassipy.distances.jeffreys(u, v)

Calculate the Jeffreys divergence between two vectors.

The Jeffreys divergence is a symmetric version of the Kullback-Leibler divergence.

Parameters:

u, v: Input vectors between which the divergence is to be calculated.

Returns:

The Jeffreys divergence between the two vectors.

References

Jeffreys H (1946) An Invariant Form for the Prior Probability in Estimation Problems. Proc. Roy. Soc. Lon., Ser. A 186, 453-461.
Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

distclassipy.distances.jensen_difference(u, v)

Calculate the Jensen difference between two vectors.

The Jensen difference is considered similar to the Jensen-Shannon divergence.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Jensen difference between the two vectors.

Notes

Equals half of the Topsøe distance.
Equals the squared jensenshannon_distance.

References

Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

distclassipy.distances.jensenshannon_divergence(u, v)

Calculate the Jensen-Shannon divergence between two vectors.

The Jensen-Shannon divergence is a symmetric and finite measure of similarity between two probability distributions.

Parameters:

u, v: Input vectors between which the divergence is to be calculated.

Returns:

The Jensen-Shannon divergence between the two vectors.

References

Lin J. (1991) Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1), 145-151.
Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

Comments: Equals the Jensen difference in Sung-Hyuk (2007):

    u = np.where(u == 0, EPSILON, u)
    v = np.where(v == 0, EPSILON, v)
    el1 = (u * np.log(u) + v * np.log(v)) / 2
    el2 = (u + v) / 2
    el3 = np.log(el2)
    return np.sum(el1 - el2 * el3)
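The equalities stated on this page (Jensen-Shannon divergence equals the Jensen difference, and the Topsøe distance equals twice the Jensen-Shannon divergence) can be checked numerically. This sketch uses natural logarithms and strictly positive inputs, so it skips the EPSILON substitution shown above; all function names are hypothetical.

```python
import numpy as np


def js_divergence_sketch(u, v):
    """0.5 * sum[u log(2u / (u + v)) + v log(2v / (u + v))]."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    s = u + v
    return float(0.5 * np.sum(u * np.log(2 * u / s) + v * np.log(2 * v / s)))


def jensen_difference_sketch(u, v):
    """sum[(u log u + v log v) / 2 - m log m] with m = (u + v) / 2."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    m = (u + v) / 2
    return float(np.sum((u * np.log(u) + v * np.log(v)) / 2 - m * np.log(m)))


def topsoe_sketch(u, v):
    """sum[u log(2u / (u + v)) + v log(2v / (u + v))]."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    s = u + v
    return float(np.sum(u * np.log(2 * u / s) + v * np.log(2 * v / s)))


p = np.array([0.2, 0.3, 0.5])  # strictly positive, so log(0) never occurs
q = np.array([0.4, 0.4, 0.2])
print(np.isclose(js_divergence_sketch(p, q), jensen_difference_sketch(p, q)))  # True
print(np.isclose(topsoe_sketch(p, q), 2 * js_divergence_sketch(p, q)))         # True
```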

distclassipy.distances.kulczynski(u, v)

Calculate the Kulczynski distance between two vectors.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Kulczynski distance between the two vectors.

References

Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

distclassipy.distances.kumarjohnson(u, v)

Calculate the Kumar-Johnson distance between two vectors.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Kumar-Johnson distance between the two vectors.

References

Kumar P, Johnson A. (2005) On a symmetric divergence measure and information inequalities, Journal of Inequalities in Pure and Applied Mathematics. 6(3).
Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

distclassipy.distances.lorentzian(u, v)

Calculate the Lorentzian distance between two vectors.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Lorentzian distance between the two vectors.

References

Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

Notes

One (1) is added to each absolute difference before taking the logarithm, which guarantees non-negativity and avoids the logarithm of zero.
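A minimal sketch of the Lorentzian sum of log(1 + |u[i] - v[i]|), using the natural logarithm (an assumption). `np.log1p` computes log(1 + x) directly; `lorentzian_sketch` is a hypothetical name.

```python
import numpy as np


def lorentzian_sketch(u, v):
    """Hypothetical sketch: sum of log(1 + |u - v|)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    # The +1 keeps every term >= 0 and avoids log(0) for equal coordinates.
    return float(np.sum(np.log1p(np.abs(u - v))))


print(lorentzian_sketch([1, 2], [1, 2]))         # 0.0 for identical vectors
print(lorentzian_sketch([0, 0], [np.e - 1, 0]))  # close to 1.0, i.e. log(e)
```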

distclassipy.distances.marylandbridge(u, v)

Calculate the Maryland Bridge distance between two vectors.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Maryland Bridge distance between the two vectors.

References

Deza M, Deza E (2009) Encyclopedia of Distances. Springer-Verlag Berlin Heidelberg. 1-590.

distclassipy.distances.matusita(u, v)

Calculate the Matusita distance between two vectors.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Matusita distance between the two vectors.

References

Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

Notes

Equals the square root of the Squared-chord distance.
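A minimal sketch of that relationship, assuming the Squared-chord distance is sum (sqrt(u[i]) - sqrt(v[i]))^2 for non-negative inputs; both function names are hypothetical.

```python
import numpy as np


def squaredchord_sketch(u, v):
    """Hypothetical sketch: sum (sqrt(u) - sqrt(v))**2 for non-negative inputs."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.sum((np.sqrt(u) - np.sqrt(v)) ** 2))


def matusita_sketch(u, v):
    """Hypothetical sketch: square root of the Squared-chord distance."""
    return float(np.sqrt(squaredchord_sketch(u, v)))


p, q = [0.36, 0.64], [0.64, 0.36]
print(squaredchord_sketch(p, q))  # approximately 0.08
print(matusita_sketch(p, q))      # approximately 0.2828, i.e. sqrt(0.08)
```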

distclassipy.distances.meehl(u, v)

Calculate the Meehl distance between two vectors.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Meehl distance between the two vectors.

Notes

Added by SC.

References

Deza M. and Deza E. (2013) Encyclopedia of Distances. Berlin, Heidelberg: Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-30958-8.

distclassipy.distances.minkowski(u, v, p=2)

Calculate the Minkowski distance between two vectors.

Parameters:

u, v: Input vectors between which the distance is to be calculated.
p: Order of the norm of the difference u - v (default 2).

Returns:

The Minkowski distance between the two vectors.

Notes

As p approaches infinity, the Minkowski distance converges to the Chebyshev distance.

References

Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.
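The limit in the Notes is easy to see numerically. This sketch writes out (sum |u[i] - v[i]|^p)^(1/p) explicitly; `minkowski_sketch` is a hypothetical name. For 1-D arrays, `np.linalg.norm(u - v, ord=p)` computes the same quantity.

```python
import numpy as np


def minkowski_sketch(u, v, p=2):
    """Hypothetical sketch: (sum |u - v|**p) ** (1 / p)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.sum(np.abs(u - v) ** p) ** (1.0 / p))


u, v = [0, 0, 0], [1, 2, 3]  # absolute differences: 1, 2, 3
print(minkowski_sketch(u, v, p=1))   # 6.0 (Cityblock)
print(minkowski_sketch(u, v, p=2))   # sqrt(14), about 3.742 (Euclidean)
print(minkowski_sketch(u, v, p=50))  # very close to 3.0, the Chebyshev distance
```

For very large p, |u - v|**p can overflow; dividing the differences by their maximum before raising to p, then multiplying back, avoids this.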

distclassipy.distances.motyka(u, v)

Calculate the Motyka distance between two vectors.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Motyka distance between the two vectors.

Notes

The distance between identical vectors is not equal to 0 but 0.5.

References

Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.
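The 0.5 value for identical vectors follows from the form sum max(u, v) / sum (u + v), which is assumed in this sketch (`motyka_sketch` is a hypothetical name): when u equals v, the numerator is sum u and the denominator is 2 * sum u.

```python
import numpy as np


def motyka_sketch(u, v):
    """Hypothetical sketch: sum max(u, v) / sum (u + v)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.sum(np.maximum(u, v)) / np.sum(u + v))


p = [0.2, 0.3, 0.5]
print(motyka_sketch(p, p))            # 0.5 for identical vectors
print(motyka_sketch([1, 0], [0, 1]))  # 1.0 for non-overlapping vectors
```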

distclassipy.distances.penroseshape(u, v)

Calculate the Penrose shape distance between two vectors.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Penrose shape distance between the two vectors.

References

Deza M, Deza E (2009) Encyclopedia of Distances. Springer-Verlag Berlin Heidelberg. 1-590.

distclassipy.distances.prob_chisq(u, v)

Calculate the Probabilistic chi-square distance between two vectors.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Probabilistic chi-square distance between the two vectors.

Notes

Added by SC.

distclassipy.distances.ruzicka(u, v)

Calculate the Ruzicka distance between two vectors.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Ruzicka distance between the two vectors.

Notes

Added by SC.

distclassipy.distances.soergel(u, v)

Calculate the Soergel distance between two vectors.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Soergel distance between the two vectors.

Notes

Equals Tanimoto distance.

References

Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

distclassipy.distances.sorensen(u, v)

Calculate the Sorensen distance between two vectors.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Sorensen distance between the two vectors.

Notes

The Sorensen distance equals the Manhattan distance divided by the sum of all elements of both vectors.

Added by SC.

distclassipy.distances.squared_chisq(u, v)

Calculate the Squared chi-square distance between two vectors.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Squared chi-square distance between the two vectors.

References

Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

distclassipy.distances.squared_euclidean(u, v)

Calculate the Squared Euclidean distance between two vectors.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Squared Euclidean distance between the two vectors.

References

Gavin DG et al. (2003) A statistical approach to evaluating distance metrics and analog assignments for pollen records. Quaternary Research 60:356–367.
Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

Notes

Equals the square of the Euclidean distance.

distclassipy.distances.squaredchord(u, v)

Calculate the Squared-chord distance between two vectors.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Squared-chord distance between the two vectors.

References

Gavin DG et al. (2003) A statistical approach to evaluating distance metrics and analog assignments for pollen records. Quaternary Research 60:356–367.
Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

Notes

Equals the squared Matusita distance.

distclassipy.distances.taneja(u, v)

Calculate the Taneja distance between two vectors.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Taneja distance between the two vectors.

References

Taneja IJ. (1995) New Developments in Generalized Information Measures, Chapter in: Advances in Imaging and Electron Physics, Ed. P.W. Hawkes, 91, 37-135.
Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

distclassipy.distances.tanimoto(u, v)

Calculate the Tanimoto distance between two vectors.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Tanimoto distance between the two vectors.

References

Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

Notes

Equals Soergel distance.
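Why the Soergel and Tanimoto distances coincide can be shown with a sketch, assuming the forms sum |u - v| / sum max(u, v) and (sum u + sum v - 2 sum min) / (sum u + sum v - sum min). Because max(a, b) = a + b - min(a, b) and |a - b| = a + b - 2 min(a, b), the numerators and denominators match term by term. Function names are hypothetical.

```python
import numpy as np


def soergel_sketch(u, v):
    """Hypothetical sketch: sum |u - v| / sum max(u, v)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.sum(np.abs(u - v)) / np.sum(np.maximum(u, v)))


def tanimoto_sketch(u, v):
    """Hypothetical sketch: (sum u + sum v - 2 sum min) / (sum u + sum v - sum min)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    su, sv, smin = np.sum(u), np.sum(v), np.sum(np.minimum(u, v))
    return float((su + sv - 2 * smin) / (su + sv - smin))


p = [1.0, 4.0, 2.0]
q = [3.0, 1.0, 2.0]
print(soergel_sketch(p, q))                                     # approximately 0.5556 (5/9)
print(np.isclose(soergel_sketch(p, q), tanimoto_sketch(p, q)))  # True
```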

distclassipy.distances.topsoe(u, v)

Calculate the Topsøe distance between two vectors.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Topsøe distance between the two vectors.

References

Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

Notes

Equals twice the Jensen-Shannon divergence.

distclassipy.distances.vicis_symmetric_chisq(u, v)

Calculate the Vicis Symmetric chi-square distance between two vectors.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Vicis Symmetric chi-square distance between the two vectors.

References

Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

distclassipy.distances.vicis_wave_hedges(u, v)

Calculate the Vicis-Wave Hedges distance between two vectors.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Vicis-Wave Hedges distance between the two vectors.

References

Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.

distclassipy.distances.wave_hedges(u, v)

Calculate the Wave Hedges distance between two vectors.

Parameters:

u, v: Input vectors between which the distance is to be calculated.

Returns:

The Wave Hedges distance between the two vectors.

References

Sung-Hyuk C. (2007) Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions. International Journal of Mathematical Models and Methods in Applied Sciences. 1(4), 300-307.
