Window utilities

allel.moving_statistic(values, statistic, size, start=0, stop=None, step=None, **kwargs)

Calculate a statistic in a moving window over values.

Parameters:
values : array_like

The data to summarise.

statistic : function

The statistic to compute within each window.

size : int

The window size (number of values).

start : int, optional

The index at which to start.

stop : int, optional

The index at which to stop.

step : int, optional

The distance between start positions of windows. If not given, defaults to the window size, i.e., non-overlapping windows.

kwargs

Additional keyword arguments are passed through to the statistic function.

Returns:
out : ndarray, shape (n_windows,)

The value of the statistic for each window.

Examples

>>> import allel
>>> import numpy as np
>>> values = [2, 5, 8, 16]
>>> allel.moving_statistic(values, np.sum, size=2)
array([ 7, 24])
>>> allel.moving_statistic(values, np.sum, size=2, step=1)
array([ 7, 13, 24])
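
The index-based windowing described above can be illustrated with a minimal plain-Python sketch. The helper `moving_statistic_sketch` is hypothetical and is not the library's implementation; the built-in `sum` stands in for `np.sum`. It assumes that a trailing window which would extend past `stop` is dropped rather than truncated, which is consistent with the `step=1` example above returning three windows for four values.

```python
def moving_statistic_sketch(values, statistic, size, start=0, stop=None, step=None, **kwargs):
    """Hypothetical illustration of index-based moving windows (not allel's code)."""
    values = list(values)
    if stop is None:
        stop = len(values)
    if step is None:
        # Default to non-overlapping windows, as documented above.
        step = size
    out = []
    for window_start in range(start, stop, step):
        window_stop = window_start + size  # half-open slice [window_start, window_stop)
        if window_stop > stop:
            # Assumption: an incomplete trailing window is dropped.
            break
        # Extra keyword arguments are forwarded to the statistic.
        out.append(statistic(values[window_start:window_stop], **kwargs))
    return out


values = [2, 5, 8, 16]
print(moving_statistic_sketch(values, sum, size=2))          # [7, 24]
print(moving_statistic_sketch(values, sum, size=2, step=1))  # [7, 13, 24]
```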

allel.windowed_statistic(pos, values, statistic, size=None, start=None, stop=None, step=None, windows=None, fill=nan)

Calculate a statistic from items in windows over a single chromosome/contig.

Parameters:
pos : array_like, int, shape (n_items,)

The item positions in ascending order, using 1-based coordinates.

values : array_like, int, shape (n_items,)

The values to summarise. May also be a tuple of values arrays, in which case each array will be sliced and passed through to the statistic function as separate arguments.

statistic : function

The statistic to compute.

size : int, optional

The window size (number of bases).

start : int, optional

The position at which to start (1-based).

stop : int, optional

The position at which to stop (1-based).

step : int, optional

The distance between start positions of windows. If not given, defaults to the window size, i.e., non-overlapping windows.

windows : array_like, int, shape (n_windows, 2), optional

Manually specify the windows to use as a sequence of (window_start, window_stop) positions, using 1-based coordinates. Overrides the size/start/stop/step parameters.

fill : object, optional

The value to use where a window is empty, i.e., contains no items.

Returns:
out : ndarray, shape (n_windows,)

The value of the statistic for each window.

windows : ndarray, int, shape (n_windows, 2)

The windows used, as an array of (window_start, window_stop) positions, using 1-based coordinates.

counts : ndarray, int, shape (n_windows,)

The number of items in each window.

Notes

The window stop positions are included within a window.

The final window will be truncated to the specified stop position, and so may be smaller than the other windows.
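
The window layout described in these notes can be sketched in plain Python as follows. `position_windows_sketch` is a hypothetical helper, not part of the library. It assumes that `start` and `stop` default to the first and last item positions, which matches the examples below, where windows begin at 1 and end at 28.

```python
def position_windows_sketch(pos, size, start=None, stop=None, step=None):
    """Hypothetical illustration of 1-based, stop-inclusive window layout."""
    if start is None:
        start = pos[0]  # assumption: default to the first item position
    if stop is None:
        stop = pos[-1]  # assumption: default to the last item position
    if step is None:
        step = size  # non-overlapping windows
    windows = []
    for window_start in range(start, stop + 1, step):
        window_stop = window_start + size - 1  # the stop position is inside the window
        if window_stop >= stop:
            # Truncate the final window to the stop position and finish.
            windows.append((window_start, stop))
            break
        windows.append((window_start, window_stop))
    return windows


pos = [1, 7, 12, 15, 28]
print(position_windows_sketch(pos, 10))
# [(1, 10), (11, 20), (21, 28)]
print(position_windows_sketch(pos, 10, step=5))
# [(1, 10), (6, 15), (11, 20), (16, 25), (21, 28)]
```

With these positions the sketch reproduces both window arrays shown in the examples.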

Examples

Count non-zero (i.e., True) items in non-overlapping windows:

>>> import allel
>>> import numpy as np
>>> pos = [1, 7, 12, 15, 28]
>>> values = [True, False, True, False, False]
>>> nnz, windows, counts = allel.windowed_statistic(
...     pos, values, statistic=np.count_nonzero, size=10
... )
>>> nnz
array([1, 1, 0])
>>> windows
array([[ 1, 10],
       [11, 20],
       [21, 28]])
>>> counts
array([2, 2, 1])

Compute a sum over items in half-overlapping windows:

>>> values = [3, 4, 2, 6, 9]
>>> x, windows, counts = allel.windowed_statistic(
...     pos, values, statistic=np.sum, size=10, step=5, fill=0
... )
>>> x
array([ 7, 12,  8,  0,  9])
>>> windows
array([[ 1, 10],
       [ 6, 15],
       [11, 20],
       [16, 25],
       [21, 28]])
>>> counts
array([2, 3, 2, 0, 1])
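
Because window stops are inclusive and positions are sorted, the items falling in a window can be located with two binary searches. The sketch below uses the standard-library `bisect` module; `windowed_statistic_sketch` is a hypothetical helper, not the library's implementation. It takes explicit windows (as with the `windows` parameter), uses `fill` for empty windows, and passes each array of a tuple of values to the statistic as a separate argument.

```python
from bisect import bisect_left, bisect_right


def windowed_statistic_sketch(pos, values, statistic, windows, fill=float("nan")):
    """Hypothetical illustration: apply `statistic` to the items in each window."""
    # A tuple of arrays is sliced array-by-array; a single array is wrapped.
    arrays = values if isinstance(values, tuple) else (values,)
    out = []
    for window_start, window_stop in windows:
        # Items with window_start <= pos <= window_stop (stop is inclusive).
        lo = bisect_left(pos, window_start)
        hi = bisect_right(pos, window_stop)
        if lo == hi:
            out.append(fill)  # empty window
        else:
            out.append(statistic(*(a[lo:hi] for a in arrays)))
    return out


pos = [1, 7, 12, 15, 28]
windows = [(1, 10), (6, 15), (11, 20), (16, 25), (21, 28)]
print(windowed_statistic_sketch(pos, [3, 4, 2, 6, 9], sum, windows, fill=0))
# [7, 12, 8, 0, 9]
```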

allel.windowed_count(pos, size=None, start=None, stop=None, step=None, windows=None)

Count the number of items in windows over a single chromosome/contig.

Parameters:
pos : array_like, int, shape (n_items,)

The item positions in ascending order, using 1-based coordinates.

size : int, optional

The window size (number of bases).

start : int, optional

The position at which to start (1-based).

stop : int, optional

The position at which to stop (1-based).

step : int, optional

The distance between start positions of windows. If not given, defaults to the window size, i.e., non-overlapping windows.

windows : array_like, int, shape (n_windows, 2), optional

Manually specify the windows to use as a sequence of (window_start, window_stop) positions, using 1-based coordinates. Overrides the size/start/stop/step parameters.

Returns:
counts : ndarray, int, shape (n_windows,)

The number of items in each window.

windows : ndarray, int, shape (n_windows, 2)

The windows used, as an array of (window_start, window_stop) positions, using 1-based coordinates.

Notes

The window stop positions are included within a window.

The final window will be truncated to the specified stop position, and so may be smaller than the other windows.

Examples

Non-overlapping windows:

>>> import allel
>>> pos = [1, 7, 12, 15, 28]
>>> counts, windows = allel.windowed_count(pos, size=10)
>>> counts
array([2, 2, 1])
>>> windows
array([[ 1, 10],
       [11, 20],
       [21, 28]])

Half-overlapping windows:

>>> counts, windows = allel.windowed_count(pos, size=10, step=5)
>>> counts
array([2, 3, 2, 0, 1])
>>> windows
array([[ 1, 10],
       [ 6, 15],
       [11, 20],
       [16, 25],
       [21, 28]])
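
A brute-force count makes the inclusive-stop rule concrete: an item whose position equals a window's stop is counted in that window. `windowed_count_sketch` is a hypothetical helper for illustration only, not the library's implementation. Note also that in half-overlapping windows an item is counted once for every window containing it, so the counts above sum to 8 although there are only 5 items.

```python
def windowed_count_sketch(pos, windows):
    """Hypothetical illustration: count items per window, stops inclusive."""
    # O(n_items * n_windows): fine for illustration, not for whole chromosomes.
    return [sum(1 for p in pos if start <= p <= stop) for start, stop in windows]


pos = [1, 7, 12, 15, 28]
print(windowed_count_sketch(pos, [(1, 10), (6, 15), (11, 20), (16, 25), (21, 28)]))
# [2, 3, 2, 0, 1]
print(windowed_count_sketch(pos, [(1, 7), (8, 12)]))
# [2, 1]  (positions 7 and 12 sit exactly on a window stop)
```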

allel.per_base(x, windows, is_accessible=None, fill=nan)

Calculate the per-base value of a windowed statistic.

Parameters:
x : array_like, shape (n_windows,)

The statistic to average per-base.

windows : array_like, int, shape (n_windows, 2)

The windows used, as an array of (window_start, window_stop) positions using 1-based coordinates.

is_accessible : array_like, bool, shape (len(contig),), optional

Boolean array indicating accessibility status for all positions in the chromosome/contig.

fill : object, optional

Use this value where there are no accessible bases in a window.

Returns:
y : ndarray, float, shape (n_windows,)

The input array divided by the number of (accessible) bases in each window.

n_bases : ndarray, int, shape (n_windows,)

The number of (accessible) bases in each window.
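
The per-base calculation can be sketched as dividing each window's value by its number of bases, where an inclusive window from `start` to `stop` spans `stop - start + 1` bases. `per_base_sketch` is a hypothetical helper, not the library's implementation. When `is_accessible` is given, it assumes that 1-based position `p` corresponds to `is_accessible[p - 1]`, and it uses `fill` for windows with no accessible bases.

```python
import math


def per_base_sketch(x, windows, is_accessible=None, fill=math.nan):
    """Hypothetical illustration: divide a windowed statistic by window length."""
    y, n_bases = [], []
    for value, (start, stop) in zip(x, windows):
        if is_accessible is None:
            n = stop - start + 1  # inclusive 1-based window
        else:
            # Assumption: position p maps to is_accessible[p - 1].
            n = sum(1 for flag in is_accessible[start - 1:stop] if flag)
        n_bases.append(n)
        y.append(value / n if n > 0 else fill)
    return y, n_bases


counts = [2, 2, 1]
windows = [(1, 10), (11, 20), (21, 28)]
y, n = per_base_sketch(counts, windows)
print(n)  # [10, 10, 8]
print(y)  # [0.2, 0.2, 0.125]
```

Note that the truncated final window (21 to 28) has only 8 bases, so its per-base value is not directly comparable to a raw count.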

allel.equally_accessible_windows(is_accessible, size, start=0, stop=None, step=None)

Create windows each containing the same number of accessible bases.

Parameters:
is_accessible : array_like, bool, shape (n_bases,)

Array defining the accessibility status of all bases on a contig/chromosome.

size : int

Window size (number of accessible bases).

start : int, optional

The genome position at which to start.

stop : int, optional

The genome position at which to stop.

step : int, optional

The number of accessible sites between start positions of windows. If not given, defaults to the window size, i.e., non-overlapping windows. Use half the window size to get half-overlapping windows.

Returns:
windows : ndarray, int, shape (n_windows, 2)

Window start/stop positions (1-based).
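
One way to construct such windows is to list the 1-based positions of accessible bases and step through that list `size` entries at a time, taking the first and last position of each group as the window bounds. `equally_accessible_windows_sketch` is a hypothetical helper, not the library's implementation. It assumes that index `i` of `is_accessible` is position `i + 1` and that a trailing group with fewer than `size` accessible bases is dropped.

```python
def equally_accessible_windows_sketch(is_accessible, size, start=0, stop=None, step=None):
    """Hypothetical illustration: windows spanning `size` accessible bases each."""
    if step is None:
        step = size  # non-overlapping windows
    # 1-based positions of accessible bases (assumption: index i is position i + 1).
    accessible = [i + 1 for i, flag in enumerate(is_accessible) if flag]
    # Restrict to the requested genome interval.
    if start:
        accessible = [p for p in accessible if p >= start]
    if stop is not None:
        accessible = [p for p in accessible if p <= stop]
    windows = []
    # Assumption: a trailing group with fewer than `size` accessible bases is dropped.
    for i in range(0, len(accessible) - size + 1, step):
        windows.append((accessible[i], accessible[i + size - 1]))
    return windows


is_accessible = [True, True, False, False, True, True, True, False, True, True]
print(equally_accessible_windows_sketch(is_accessible, 3))
# [(1, 5), (6, 9)]
```

Both windows contain three accessible bases, although the first spans five genome positions and the second only four.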