Pairwise distance
- allel.stats.distance.pairwise_distance(x, metric)
Compute pairwise distance between individuals (e.g., samples or haplotypes).
Parameters: x : array_like, shape (n, m, ...)
Array of m observations (e.g., samples or haplotypes) in a space with n dimensions (e.g., variants). Note that the order of the first two dimensions is swapped relative to scipy.spatial.distance.pdist, which expects observations along the first dimension, i.e., shape (m, n).
metric : string or function
Distance metric. See documentation for the function scipy.spatial.distance.pdist() for a list of built-in distance metrics.
Returns: dist : ndarray, shape (n_individuals * (n_individuals - 1) / 2,)
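Distance matrix in condensed form, i.e., a flat array holding one distance per pair of individuals. Use scipy.spatial.distance.squareform to convert it to a square matrix.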
Notes
If x is a bcolz carray, a chunk-wise implementation is used to avoid loading the entire input array into memory: a distance matrix is calculated for each chunk of the input array, and the per-chunk results are summed to produce the final output. Metrics that are additive across dimensions (e.g., cityblock) are unaffected, but for other metrics (e.g., euclidean) this returns a different result from the standard implementation, although the relative distances may be equivalent.
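The effect of summing per-chunk distance matrices can be illustrated with plain NumPy and SciPy. The sketch below does not call scikit-allel or bcolz; it assumes the chunk-wise path behaves as described above (one pdist call per block of rows, results summed), and the two-row chunking is an arbitrary choice for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Toy input in the (n dimensions, m observations) layout used by
# pairwise_distance: 4 variants (rows) by 3 samples (columns).
x = np.array([[0, 1, 2],
              [1, 2, 2],
              [1, 2, 0],
              [2, 0, 1]], dtype=float)

# Illustrative chunking along the first (variant) dimension.
chunks = (x[:2], x[2:])

# pdist expects observations as rows, hence the transpose.
full_city = pdist(x.T, metric='cityblock')
chunked_city = sum(pdist(c.T, metric='cityblock') for c in chunks)

full_euc = pdist(x.T, metric='euclidean')
chunked_euc = sum(pdist(c.T, metric='euclidean') for c in chunks)

# cityblock is a sum over dimensions, so chunking does not change it.
print(np.allclose(full_city, chunked_city))
# euclidean takes a square root over all dimensions, so summing
# per-chunk distances gives different values.
print(np.allclose(full_euc, chunked_euc))
```

If exact values for a non-additive metric matter, one option is to pass an in-memory array so that the standard implementation is used.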
Examples
>>> import allel
>>> g = allel.model.GenotypeArray([[[0, 0], [0, 1], [1, 1]],
...                                [[0, 1], [1, 1], [1, 2]],
...                                [[0, 2], [2, 2], [-1, -1]]])
>>> d = allel.stats.pairwise_distance(g.to_n_alt(), metric='cityblock')
>>> d
array([ 3.,  4.,  3.])
>>> import scipy.spatial
>>> scipy.spatial.distance.squareform(d)
array([[ 0.,  3.,  4.],
       [ 3.,  0.,  3.],
       [ 4.,  3.,  0.]])
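To make the note on dimension order concrete, the following sketch uses only NumPy and SciPy. It assumes that, for a two-dimensional in-memory array, the result corresponds to calling pdist on the transposed input; the array values are made up for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# 4 variants (n, rows) by 3 samples (m, columns): the layout that
# pairwise_distance expects.
x = np.array([[0, 1, 2],
              [1, 2, 2],
              [1, 2, 0],
              [2, 0, 1]])

# pdist wants observations as rows, so transpose to shape (m, n).
d = pdist(x.T, metric='cityblock')
print(d.shape)            # one entry per pair of samples: 3 * 2 / 2 = 3

# Without the transpose, pdist compares variants instead of samples.
d_variants = pdist(x, metric='cityblock')
print(d_variants.shape)   # 4 * 3 / 2 = 6 entries

# Expand the condensed form into a symmetric m x m matrix.
square = squareform(d)
print(square.shape)
```

Passing the untransposed array to pdist does not raise an error, so a mismatch in the length of the condensed result is the easiest sign that the dimensions are the wrong way round.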
- allel.stats.distance.pairwise_dxy(pos, gac, start=None, stop=None, is_accessible=None)
Convenience function to calculate a pairwise distance matrix using nucleotide divergence (a.k.a. Dxy) as the distance metric.
Parameters: pos : array_like, int, shape (n_variants,)
Variant positions.
gac : array_like, int, shape (n_variants, n_samples, n_alleles)
Per-genotype allele counts.
start : int, optional
Start position of region to use.
stop : int, optional
Stop position of region to use.
is_accessible : array_like, bool, shape (len(contig),), optional
Boolean array indicating accessibility status for all positions in the chromosome/contig.
Returns: dist : ndarray
Distance matrix in condensed form.
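As a rough illustration of what a Dxy-based distance matrix contains, the sketch below computes nucleotide divergence between every pair of samples with NumPy alone. It is not scikit-allel's implementation. The function name dxy_sketch is hypothetical, and the details are assumptions: positions are treated as 1-based, start and stop default to the first and last position, variants where either sample has no called alleles contribute zero, and the total is divided by the number of (accessible) bases in the region. The gac array here is written out by hand; in practice it would typically come from a genotype array's per-genotype allele counts.

```python
from itertools import combinations

import numpy as np


def dxy_sketch(pos, gac, start=None, stop=None, is_accessible=None):
    """Illustrative condensed Dxy matrix (hypothetical helper)."""
    pos = np.asarray(pos)
    gac = np.asarray(gac)
    # Assumed defaults: span the first to the last variant position.
    start = int(pos.min()) if start is None else start
    stop = int(pos.max()) if stop is None else stop
    gac = gac[(pos >= start) & (pos <= stop)]

    # Denominator: bases in the region, restricted to accessible ones
    # when a mask is given (1-based positions, 0-based array index).
    if is_accessible is None:
        n_bases = stop - start + 1
    else:
        n_bases = np.count_nonzero(np.asarray(is_accessible)[start - 1:stop])

    dist = []
    for i, j in combinations(range(gac.shape[1]), 2):
        ac1, ac2 = gac[:, i, :], gac[:, j, :]
        # All allele pairs drawn one from each sample, per variant.
        n_pairs = ac1.sum(axis=1) * ac2.sum(axis=1)
        # Pairs in which both alleles are identical.
        n_same = (ac1 * ac2).sum(axis=1)
        called = n_pairs > 0  # skip variants with a missing call
        mpd = (n_pairs[called] - n_same[called]) / n_pairs[called]
        dist.append(mpd.sum() / n_bases)
    return np.array(dist)


pos = [2, 5, 9]
# Shape (n_variants=3, n_samples=3, n_alleles=3); the last sample has
# a missing call at the third variant (all counts zero).
gac = [[[2, 0, 0], [1, 1, 0], [0, 2, 0]],
       [[1, 1, 0], [0, 2, 0], [0, 1, 1]],
       [[1, 0, 1], [0, 0, 2], [0, 0, 0]]]
d = dxy_sketch(pos, gac, start=1, stop=10)
print(d)
```

The result is ordered like the output of pairwise_distance, so scipy.spatial.distance.squareform can expand it into a square matrix in the same way.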