Miscellaneous utilities

Windowed counting functions

allel.util.windowed_count(pos, b, window, start=None, stop=None)[source]

Count nonzero (i.e., True) elements in non-overlapping windows over a single chromosome or contig.

Parameters:

pos : array_like, int, shape (n_variants,)

Positions array.

b : array_like, bool, shape (n_variants,) or (n_variants, n_samples)

Boolean array.

window : int

Window size.

start : int, optional

Start position.

stop : int, optional

Stop position.

Returns:

counts : ndarray, int, shape (n_bins,) or (n_bins, n_samples)

Counts array.

edges : ndarray, int, shape (n_bins + 1,)

Bin edges used for counting.

Examples

>>> import allel
>>> g = allel.GenotypeArray([[[0, 0], [0, 0]],
...                          [[0, 1], [0, 1]],
...                          [[1, 1], [1, 2]],
...                          [[2, 2], [-1, -1]]], dtype='i1')
>>> pos = allel.PositionIndex([2, 14, 15, 27])
>>> b = g.is_variant()
>>> counts, edges = allel.windowed_count(pos, b, window=10)
>>> edges
array([ 2, 12, 22, 32])
>>> counts
array([0, 2, 1])
>>> counts, edges = allel.windowed_count(pos, b, window=10, start=1,
...                                      stop=27)
>>> edges
array([ 1, 11, 21, 27])
>>> counts
array([0, 2, 1])
>>> b = g.is_het()
>>> counts, edges = allel.windowed_count(pos, b, window=10)
>>> edges
array([ 2, 12, 22, 32])
>>> counts
array([[0, 0],
       [1, 2],
       [0, 0]])
allel.util.windowed_density(pos, b, window, start=None, stop=None, is_accessible=None, fill=0)[source]

Calculate the per-base-pair density of nonzero (i.e., True) elements in non-overlapping windows over a single chromosome or contig.

Parameters:

pos : array_like, int, shape (n_variants,)

Positions array.

b : array_like, bool, shape (n_variants,) or (n_variants, n_samples)

Boolean array.

window : int

Window size.

start : int, optional

Start position.

stop : int, optional

Stop position.

is_accessible : array_like, bool, shape (len(contig),), optional

Accessibility mask. If provided, the size of each window will be calculated as the number of accessible positions, rather than the absolute bin width.

Returns:

densities : ndarray, int, shape (n_bins,) or (n_bins, n_samples)

Densities array.

counts : ndarray, int, shape (n_bins,) or (n_bins, n_samples)

Counts array.

widths : ndarray, int, shape (n_bins,)

Size of each bin, taking accessibility into account.

edges : ndarray, int, shape (n_bins + 1,)

Bin edges used for counting.

Examples

Assuming all positions are accessible:

>>> import allel
>>> g = allel.GenotypeArray([[[0, 0], [0, 0]],
...                          [[0, 1], [0, 1]],
...                          [[1, 1], [1, 2]],
...                          [[2, 2], [-1, -1]]], dtype='i1')
>>> pos = allel.PositionIndex([2, 14, 15, 27])
>>> b = g.is_variant()
>>> densities, counts, widths, edges = allel.windowed_density(
...     pos, b, window=10
... )
>>> edges
array([ 2, 12, 22, 32])
>>> widths
array([10, 10, 11])
>>> counts
array([0, 2, 1])
>>> densities
array([ 0.        ,  0.2       ,  0.09090909])

Density calculations can take into account the number of accessible positions within each window, e.g.:

>>> is_accessible = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
...                           0, 1, 1, 1, 0, 0, 1, 1, 0, 0,
...                           1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
...                           dtype=bool)
>>> densities, counts, widths, edges = allel.windowed_density(
...     pos, b, start=1, stop=31, window=10,
...     is_accessible=is_accessible, fill=np.nan
... )
>>> edges
array([ 1, 11, 21, 31])
>>> widths
array([ 0,  5, 11])
>>> counts
array([0, 2, 1])
>>> densities
array([        nan,  0.4     ,  0.09090909])