Miscellaneous utilities¶
Windowed counting functions¶
- allel.util.windowed_count(pos, b, window, start=None, stop=None)[source]¶
Count nonzero (i.e., True) elements in non-overlapping windows over a single chromosome or contig.
Parameters: pos : array_like, int, shape (n_variants,)
Positions array.
b : array_like, bool, shape (n_variants,) or (n_variants, n_samples)
Boolean array.
window : int
Window size.
start : int, optional
Start position.
stop : int, optional
Stop position.
Returns: counts : ndarray, int, shape (n_bins,) or (n_bins, n_samples)
Counts array.
edges : ndarray, int, shape (n_bins + 1,)
Bin edges used for counting.
Examples
>>> import allel >>> g = allel.GenotypeArray([[[0, 0], [0, 0]], ... [[0, 1], [0, 1]], ... [[1, 1], [1, 2]], ... [[2, 2], [-1, -1]]], dtype='i1') >>> pos = allel.PositionIndex([2, 14, 15, 27]) >>> b = g.is_variant() >>> counts, edges = allel.windowed_count(pos, b, window=10) >>> edges array([ 2, 12, 22, 32]) >>> counts array([0, 2, 1]) >>> counts, edges = allel.windowed_count(pos, b, window=10, start=1, ... stop=27) >>> edges array([ 1, 11, 21, 27]) >>> counts array([0, 2, 1]) >>> b = g.is_het() >>> counts, edges = allel.windowed_count(pos, b, window=10) >>> edges array([ 2, 12, 22, 32]) >>> counts array([[0, 0], [1, 2], [0, 0]])
- allel.util.windowed_density(pos, b, window, start=None, stop=None, is_accessible=None, fill=0)[source]¶
Calculate the per-base-pair density of nonzero (i.e., True) elements in non-overlapping windows over a single chromosome or contig.
Parameters: pos : array_like, int, shape (n_variants,)
Positions array.
b : array_like, bool, shape (n_variants,) or (n_variants, n_samples)
Boolean array.
window : int
Window size.
start : int, optional
Start position.
stop : int, optional
Stop position.
is_accessible : array_like, bool, shape (len(contig),), optional
Accessibility mask. If provided, the size of each window will be calculated as the number of accessible positions, rather than the absolute bin width.
Returns: densities : ndarray, int, shape (n_bins,) or (n_bins, n_samples)
Densities array.
counts : ndarray, int, shape (n_bins,) or (n_bins, n_samples)
Counts array.
widths : ndarray, int, shape (n_bins,)
Size of each bin, taking accessibility into account.
edges : ndarray, int, shape (n_bins + 1,)
Bin edges used for counting.
Examples
Assuming all positions are accessible:
>>> import allel >>> g = allel.GenotypeArray([[[0, 0], [0, 0]], ... [[0, 1], [0, 1]], ... [[1, 1], [1, 2]], ... [[2, 2], [-1, -1]]], dtype='i1') >>> pos = allel.PositionIndex([2, 14, 15, 27]) >>> b = g.is_variant() >>> densities, counts, widths, edges = allel.windowed_density( ... pos, b, window=10 ... ) >>> edges array([ 2, 12, 22, 32]) >>> widths array([10, 10, 11]) >>> counts array([0, 2, 1]) >>> densities array([ 0. , 0.2 , 0.09090909])
Density calculations can take into account the number of accessible positions within each window, e.g.:
>>> is_accessible = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, ... 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], ... dtype=bool) >>> densities, counts, widths, edges = allel.windowed_density( ... pos, b, start=1, stop=31, window=10, ... is_accessible=is_accessible, fill=np.nan ... ) >>> edges array([ 1, 11, 21, 31]) >>> widths array([ 0, 5, 11]) >>> counts array([0, 2, 1]) >>> densities array([ nan, 0.4 , 0.09090909])