Linkage disequilibrium

allel.rogers_huff_r(gn, fill=nan)[source]

Estimate the linkage disequilibrium parameter r for each pair of variants using the method of Rogers and Huff (2008).

Parameters:

gn : array_like, int8, shape (n_variants, n_samples)

Diploid genotypes at biallelic variants, coded as the number of alternate alleles per call (i.e., 0 = hom ref, 1 = het, 2 = hom alt).

fill : float, optional

Value to use where r cannot be calculated.

Returns:

r : ndarray, float, shape (n_variants * (n_variants - 1) // 2,)

Matrix in condensed form.

Examples

>>> import allel
>>> g = allel.GenotypeArray([[[0, 0], [1, 1], [0, 0]],
...                          [[0, 0], [1, 1], [0, 0]],
...                          [[1, 1], [0, 0], [1, 1]],
...                          [[0, 0], [0, 1], [-1, -1]]], dtype='i1')
>>> gn = g.to_n_alt(fill=-1)
>>> gn
array([[ 0,  2,  0],
       [ 0,  2,  0],
       [ 2,  0,  2],
       [ 0,  1, -1]], dtype=int8)
>>> r = allel.stats.rogers_huff_r(gn)
>>> r  
array([ 1.        , -1.00000012,  1.        , -1.00000012,  1.        , -1.        ], ...
>>> r ** 2  
array([ 1.        ,  1.00000024,  1.        ,  1.00000024,  1.        ,  1.        ], ...
>>> from scipy.spatial.distance import squareform
>>> squareform(r ** 2)
array([[ 0.        ,  1.        ,  1.00000024,  1.        ],
       [ 1.        ,  0.        ,  1.00000024,  1.        ],
       [ 1.00000024,  1.00000024,  0.        ,  1.        ],
       [ 1.        ,  1.        ,  1.        ,  0.        ]], dtype=float32)
allel.rogers_huff_r_between(gna, gnb, fill=nan)[source]

Estimate the linkage disequilibrium parameter r for each pair of variants between the two input arrays, using the method of Rogers and Huff (2008).

Parameters:

gna, gnb : array_like, int8, shape (n_variants, n_samples)

Diploid genotypes at biallelic variants, coded as the number of alternate alleles per call (i.e., 0 = hom ref, 1 = het, 2 = hom alt).

fill : float, optional

Value to use where r cannot be calculated.

Returns:

r : ndarray, float, shape (m_variants, n_variants )

Matrix in rectangular form.

allel.windowed_r_squared(pos, gn, size=None, start=None, stop=None, step=None, windows=None, fill=nan, percentile=50)[source]

Summarise linkage disequilibrium in windows over a single chromosome/contig.

Parameters:

pos : array_like, int, shape (n_items,)

The item positions in ascending order, using 1-based coordinates..

gn : array_like, int8, shape (n_variants, n_samples)

Diploid genotypes at biallelic variants, coded as the number of alternate alleles per call (i.e., 0 = hom ref, 1 = het, 2 = hom alt).

size : int, optional

The window size (number of bases).

start : int, optional

The position at which to start (1-based).

stop : int, optional

The position at which to stop (1-based).

step : int, optional

The distance between start positions of windows. If not given, defaults to the window size, i.e., non-overlapping windows.

windows : array_like, int, shape (n_windows, 2), optional

Manually specify the windows to use as a sequence of (window_start, window_stop) positions, using 1-based coordinates. Overrides the size/start/stop/step parameters.

fill : object, optional

The value to use where a window is empty, i.e., contains no items.

percentile : int or sequence of ints, optional

The percentile or percentiles to calculate within each window.

Returns:

out : ndarray, shape (n_windows,)

The value of the statistic for each window.

windows : ndarray, int, shape (n_windows, 2)

The windows used, as an array of (window_start, window_stop) positions, using 1-based coordinates.

counts : ndarray, int, shape (n_windows,)

The number of items in each window.

See also

allel.stats.window.windowed_statistic

Notes

Linkage disequilibrium (r**2) is calculated using the method of Rogers and Huff (2008).

allel.locate_unlinked(gn, size=100, step=20, threshold=0.1, blen=None)[source]

Locate variants in approximate linkage equilibrium, where r**2 is below the given threshold.

Parameters:

gn : array_like, int8, shape (n_variants, n_samples)

Diploid genotypes at biallelic variants, coded as the number of alternate alleles per call (i.e., 0 = hom ref, 1 = het, 2 = hom alt).

size : int

Window size (number of variants).

step : int

Number of variants to advance to the next window.

threshold : float

Maximum value of r**2 to include variants.

blen : int, optional

Block length to use for chunked computation.

Returns:

loc : ndarray, bool, shape (n_variants)

Boolean array where True items locate variants in approximate linkage equilibrium.

Notes

The value of r**2 between each pair of variants is calculated using the method of Rogers and Huff (2008).

allel.plot_pairwise_ld(m, colorbar=True, ax=None, imshow_kwargs=None)[source]

Plot a matrix of genotype linkage disequilibrium values between all pairs of variants.

Parameters:

m : array_like

Array of linkage disequilibrium values in condensed form.

colorbar : bool, optional

If True, add a colorbar to the current figure.

ax : axes, optional

The axes on which to draw. If not provided, a new figure will be created.

imshow_kwargs : dict-like, optional

Additional keyword arguments passed through to matplotlib.pyplot.imshow().

Returns:

ax : axes

The axes on which the plot was drawn.