Linkage disequilibrium

allel.stats.ld.rogers_huff_r(gn, fill=nan)[source]

Estimate the linkage disequilibrium parameter r for each pair of variants using the method of Rogers and Huff (2008).

Parameters:

gn : array_like, int8, shape (n_variants, n_samples)

Diploid genotypes at biallelic variants, coded as the number of alternate alleles per call (i.e., 0 = hom ref, 1 = het, 2 = hom alt).

Returns:

r : ndarray, float, shape (n_variants * (n_variants - 1) // 2,)

Matrix in condensed form.

Examples

>>> import allel
>>> g = allel.model.GenotypeArray([[[0, 0], [1, 1], [0, 0]],
...                                [[0, 0], [1, 1], [0, 0]],
...                                [[1, 1], [0, 0], [1, 1]],
...                                [[0, 0], [0, 1], [-1, -1]]], dtype='i1')
>>> gn = g.to_n_alt(fill=-1)
>>> gn
array([[ 0,  2,  0],
       [ 0,  2,  0],
       [ 2,  0,  2],
       [ 0,  1, -1]], dtype=int8)
>>> r = allel.stats.rogers_huff_r(gn)
>>> r
array([ 1.        , -1.00000012,  1.        , -1.00000012,  1.        , -1.        ], dtype=float32)
>>> r ** 2
array([ 1.        ,  1.00000024,  1.        ,  1.00000024,  1.        ,  1.        ], dtype=float32)
>>> from scipy.spatial.distance import squareform
>>> squareform(r ** 2)
array([[ 0.        ,  1.        ,  1.00000024,  1.        ],
       [ 1.        ,  0.        ,  1.00000024,  1.        ],
       [ 1.00000024,  1.00000024,  0.        ,  1.        ],
       [ 1.        ,  1.        ,  1.        ,  0.        ]])
allel.stats.ld.locate_unlinked(gn, size=100, step=20, threshold=0.1)[source]

Locate variants in approximate linkage equilibrium, where r**2 is below the given threshold.

Parameters:

gn : array_like, int8, shape (n_variants, n_samples)

Diploid genotypes at biallelic variants, coded as the number of alternate alleles per call (i.e., 0 = hom ref, 1 = het, 2 = hom alt).

size : int

Window size (number of variants).

step : int

Number of variants to advance to the next window.

threshold : float

Maximum value of r**2 to include variants.

Returns:

loc : ndarray, bool, shape (n_variants)

Boolean array where True items locate variants in approximate linkage equilibrium.

Notes

The value of r**2 between each pair of variants is calculated using the method of Rogers and Huff (2008).