Selection

Integrated haplotype score (IHS)

allel.ihs(h, pos, map_pos=None, min_ehh=0.05, min_maf=0.05, include_edges=False, gap_scale=20000, max_gap=200000, is_accessible=None, use_threads=True)

Compute the unstandardized integrated haplotype score (IHS) for each variant, comparing integrated haplotype homozygosity between the reference (0) and alternate (1) alleles.

Parameters:

h : array_like, int, shape (n_variants, n_haplotypes): Haplotype array.
pos : array_like, int, shape (n_variants,): Variant positions (physical distance).
map_pos : array_like, float, shape (n_variants,), optional: Variant positions (genetic map distance).
min_ehh : float, optional: Minimum EHH beyond which to truncate integrated haplotype homozygosity calculation.
min_maf : float, optional: Do not compute integrated haplotype homozygosity for variants with minor allele frequency below this value.
include_edges : bool, optional: If True, report scores even if EHH does not decay below min_ehh before reaching the edge of the data.
gap_scale : int, optional: Rescale distance between variants if gap is larger than this value.
max_gap : int, optional: Do not report scores if EHH spans a gap larger than this number of base pairs.
is_accessible : array_like, bool, optional: Genome accessibility array. If provided, distance between variants will be computed as the number of accessible bases between them.
use_threads : bool, optional: If True use multiple threads to compute.

Returns:

score : ndarray, float, shape (n_variants,): Unstandardized IHS scores.

See also

standardize_by_allele_count

Notes

Apart from skipping variants whose minor allele frequency is below min_maf, this function calculates IHS for all variants. To exclude variants by any other criterion, filter the input haplotype array before passing it to this function.

This function computes IHS by comparing the reference and alternate alleles. The scores can be polarised by switching the sign for any variant where the reference allele is derived.
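A minimal sketch of that polarisation step, assuming you already have one boolean per variant saying whether the reference allele is derived (for example, inferred from an outgroup). Both `polarise_scores` and its `ref_is_derived` input are hypothetical illustrations, not part of allel.

```python
import math


def polarise_scores(scores, ref_is_derived):
    """Flip score signs so every variant has the same ancestral/derived orientation.

    scores: unstandardized scores comparing reference (0) vs alternate (1).
    ref_is_derived: hypothetical per-variant flags, True where the reference
    allele is the derived allele.
    """
    if len(scores) != len(ref_is_derived):
        raise ValueError("scores and ref_is_derived must have the same length")
    # Switch the sign wherever the reference allele is derived; NaN stays NaN.
    return [-s if flip else s for s, flip in zip(scores, ref_is_derived)]


polarised = polarise_scores([1.5, -0.2, math.nan], [False, True, True])
```

The same sign switch applies to nSl scores, which also compare the reference and alternate alleles.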

This function returns NaN for any IHS calculations where haplotype homozygosity does not decay below min_ehh before reaching the first or last variant. To disable this behaviour, set include_edges to True.

Note that the unstandardized score is returned. These scores are usually then standardized within allele frequency bins.
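To make the computation concrete, here is a minimal pure-Python sketch of an unstandardized IHS at a single core variant. It assumes: EHH is computed separately among carriers of allele 0 and of allele 1, it is integrated over physical distance on both flanks with the trapezoid rule, integration stops once EHH falls below min_ehh, and the score is oriented as ln(iHH0 / iHH1). It deliberately ignores gap_scale, max_gap, is_accessible, min_maf and map_pos, and the helper names are hypothetical; it is not allel's implementation.

```python
import math
from itertools import combinations


def ehh_away_from_core(h, core, carriers, step):
    """Yield (variant index, EHH) moving away from `core` by `step` (+1 or -1).

    EHH is the fraction of pairs of `carriers` that are identical at every
    variant from the core up to and including the current variant.
    """
    pairs = list(combinations(carriers, 2))
    n_pairs = len(pairs)
    k = core + step
    while 0 <= k < len(h):
        # Keep only the pairs that still match at variant k.
        pairs = [(a, b) for a, b in pairs if h[k][a] == h[k][b]]
        yield k, len(pairs) / n_pairs
        k += step


def ihh_one_flank(h, pos, core, carriers, step, min_ehh, include_edges):
    """Integrate EHH over physical distance on one flank (trapezoid rule)."""
    total, prev_pos, prev_ehh = 0.0, pos[core], 1.0
    for k, ehh in ehh_away_from_core(h, core, carriers, step):
        total += abs(pos[k] - prev_pos) * (prev_ehh + ehh) / 2
        if ehh < min_ehh:
            return total  # EHH has decayed: stop integrating
        prev_pos, prev_ehh = pos[k], ehh
    # Reached the edge of the data without EHH decaying below min_ehh.
    return total if include_edges else math.nan


def ihs_at_core(h, pos, core, min_ehh=0.05, include_edges=False):
    """Hypothetical helper: unstandardized IHS at one core variant, ln(iHH0 / iHH1)."""
    ihh = []
    for allele in (0, 1):
        carriers = [j for j, x in enumerate(h[core]) if x == allele]
        if len(carriers) < 2:
            return math.nan  # need at least one pair of carriers
        ihh.append(sum(ihh_one_flank(h, pos, core, carriers, step, min_ehh, include_edges)
                       for step in (-1, 1)))
    ihh0, ihh1 = ihh
    if not (ihh0 > 0 and ihh1 > 0):  # also catches NaN
        return math.nan
    return math.log(ihh0 / ihh1)


# Rows are variants, columns are haplotypes; variant 2 is the core.
h = [
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 0],
]
pos = [100, 200, 300, 400, 500]
with_edges = ihs_at_core(h, pos, core=2, include_edges=True)  # positive: allele 0 has longer haplotypes
without_edges = ihs_at_core(h, pos, core=2)  # NaN: EHH for allele 0 never decays
```

In this toy data the three allele-0 haplotypes are identical across the whole region, so their EHH reaches the edge without decaying. That is exactly the case where the include_edges=False behaviour described above returns NaN.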

Cross-population extended haplotype homozygosity (XPEHH)

allel.xpehh(h1, h2, pos, map_pos=None, min_ehh=0.05, include_edges=False, gap_scale=20000, max_gap=200000, is_accessible=None, use_threads=True)

Compute the unstandardized cross-population extended haplotype homozygosity score (XPEHH) for each variant.

Parameters:

h1 : array_like, int, shape (n_variants, n_haplotypes): Haplotype array for the first population.
h2 : array_like, int, shape (n_variants, n_haplotypes): Haplotype array for the second population.
pos : array_like, int, shape (n_variants,): Variant positions (physical distance).
map_pos : array_like, float, shape (n_variants,), optional: Variant positions (genetic map distance).
min_ehh : float, optional: Minimum EHH beyond which to truncate integrated haplotype homozygosity calculation.
include_edges : bool, optional: If True, report scores even if EHH does not decay below min_ehh before reaching the edge of the data.
gap_scale : int, optional: Rescale distance between variants if gap is larger than this value.
max_gap : int, optional: Do not report scores if EHH spans a gap larger than this number of base pairs.
is_accessible : array_like, bool, optional: Genome accessibility array. If provided, distance between variants will be computed as the number of accessible bases between them.
use_threads : bool, optional: If True use multiple threads to compute.

Returns:

score : ndarray, float, shape (n_variants,): Unstandardized XPEHH scores.

See also

standardize

Notes

This function will calculate XPEHH for all variants. To exclude variants below a given minor allele frequency, filter the input haplotype arrays before passing to this function.

This function returns NaN for any EHH calculations where haplotype homozygosity does not decay below min_ehh before reaching the first or last variant. To disable this behaviour, set include_edges to True.

Note that the unstandardized score is returned. Usually these scores are then standardized genome-wide.

Haplotype arrays from the two populations may have different numbers of haplotypes.
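A minimal sketch of how an unstandardized XPEHH score can be formed once an EHH decay curve is available for each population at the same variants. It assumes the score is ln(iHH1 / iHH2), so positive values mean longer shared haplotypes in the first population, and it integrates a single flank for brevity; a full calculation would typically add both flanks. `integrate_ehh` and `xpehh_from_ehh` are hypothetical helpers, not allel functions.

```python
import math


def integrate_ehh(pos, ehh, min_ehh=0.05):
    """Trapezoid-rule integral of one EHH curve along one flank.

    pos[0], ehh[0] describe the core variant. Integration stops at the first
    variant where EHH drops below min_ehh; if that never happens, NaN is
    returned (mirroring include_edges=False).
    """
    total = 0.0
    for i in range(1, len(pos)):
        total += abs(pos[i] - pos[i - 1]) * (ehh[i - 1] + ehh[i]) / 2
        if ehh[i] < min_ehh:
            return total
    return math.nan


def xpehh_from_ehh(pos, ehh_pop1, ehh_pop2, min_ehh=0.05):
    """Unstandardized XPEHH as ln(iHH_pop1 / iHH_pop2) (assumed orientation)."""
    ihh1 = integrate_ehh(pos, ehh_pop1, min_ehh)
    ihh2 = integrate_ehh(pos, ehh_pop2, min_ehh)
    if not (ihh1 > 0 and ihh2 > 0):  # also catches NaN
        return math.nan
    return math.log(ihh1 / ihh2)


pos = [0, 100, 200, 300]
ehh_pop1 = [1.0, 0.5, 0.0, 0.0]  # decays slowly: longer haplotypes
ehh_pop2 = [1.0, 0.0, 0.0, 0.0]  # decays immediately
score = xpehh_from_ehh(pos, ehh_pop1, ehh_pop2)  # ln(100 / 50)
```

Because each population's EHH is integrated separately, the two haplotype arrays only need to share variants, not haplotype counts.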

Number of segregating sites by length (NSL)

allel.nsl(h, use_threads=True)

Compute the unstandardized number of segregating sites by length (nSl) for each variant, comparing the reference and alternate alleles, after Ferrer-Admetlla et al. (2014).

Parameters:

h : array_like, int, shape (n_variants, n_haplotypes): Haplotype array.
use_threads : bool, optional: If True use multiple threads to compute.

Returns:

score : ndarray, float, shape (n_variants,): Unstandardized nSl scores.

See also

standardize_by_allele_count

Notes

This function will calculate nSl for all variants. To exclude variants below a given minor allele frequency, filter the input haplotype array before passing to this function.

This function computes nSl by comparing the reference and alternate alleles. These can be polarised by switching the sign for any variant where the reference allele is derived.

This function does not correct nSl calculations where haplotype homozygosity extends up to the first or last variant, so there may be edge effects.

Note that the unstandardized score is returned. These scores are usually then standardized within allele frequency bins.
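A minimal sketch of nSl at one core variant, measuring shared-haplotype length in number of variants rather than physical distance. It assumes one particular counting convention (core variant included, identical runs counted outward on both sides until the first mismatch or the data edge), averages over all pairs of carriers of each allele, and orients the score as ln(SL0 / SL1). `nsl_at_core` and `run_length` are hypothetical names. Like the function described above, the sketch simply stops at the data edge, so edge effects remain.

```python
import math
from itertools import combinations


def run_length(h, core, a, b, step):
    """Consecutive variants, moving away from `core` by `step`,
    at which haplotypes a and b carry the same allele."""
    n, k = 0, core + step
    while 0 <= k < len(h) and h[k][a] == h[k][b]:
        n += 1
        k += step
    return n


def nsl_at_core(h, core):
    """Hypothetical helper: unstandardized nSl at one core variant, ln(SL0 / SL1)."""
    sl = []
    for allele in (0, 1):
        carriers = [j for j, x in enumerate(h[core]) if x == allele]
        pairs = list(combinations(carriers, 2))
        if not pairs:
            return math.nan  # need at least one pair of carriers
        # Shared-haplotype length in variants (core included) for each pair.
        lengths = [1 + run_length(h, core, a, b, -1) + run_length(h, core, a, b, 1)
                   for a, b in pairs]
        sl.append(sum(lengths) / len(lengths))
    return math.log(sl[0] / sl[1])


# Rows are variants, columns are haplotypes; variant 2 is the core.
h = [
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 0],
]
score = nsl_at_core(h, core=2)  # allele-0 pairs share 5 variants, allele-1 pairs 5/3 on average
```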

allel.xpnsl(h1, h2, use_threads=True)

Cross-population version of the NSL statistic.

Parameters:

h1 : array_like, int, shape (n_variants, n_haplotypes): Haplotype array for the first population.
h2 : array_like, int, shape (n_variants, n_haplotypes): Haplotype array for the second population.
use_threads : bool, optional: If True use multiple threads to compute.

Returns:

score : ndarray, float, shape (n_variants,): Unstandardized XPNSL scores.

Haplotype diversity, Garud’s H statistics

allel.garud_h(h)

Compute the H1, H12, H123 and H2/H1 statistics for detecting signatures of soft sweeps, as defined in Garud et al. (2015).

Parameters:

h : array_like, int, shape (n_variants, n_haplotypes): Haplotype array.

Returns:

h1 : float: H1 statistic (sum of squares of haplotype frequencies).
h12 : float: H12 statistic (sum of squares of haplotype frequencies, combining the two most common haplotypes into a single frequency).
h123 : float: H123 statistic (sum of squares of haplotype frequencies, combining the three most common haplotypes into a single frequency).
h2_h1 : float: H2/H1 statistic, indicating the “softness” of a sweep.
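The four statistics follow directly from the haplotype frequencies p1 ≥ p2 ≥ p3 ≥ …: H1 = Σ pi², H12 = (p1 + p2)² + Σ(i>2) pi², H123 = (p1 + p2 + p3)² + Σ(i>3) pi², and H2 = H1 - p1². A minimal pure-Python sketch, assuming the haplotype array is a list of rows (variants) whose columns are haplotypes; `garud_h_stats` is a hypothetical name, not allel's code.

```python
from collections import Counter


def garud_h_stats(h):
    """Return (H1, H12, H123, H2/H1); h is a list of variant rows, columns are haplotypes."""
    n = len(h[0])
    # Each distinct haplotype is the tuple of alleles down one column.
    counts = Counter(tuple(row[j] for row in h) for j in range(n))
    # Haplotype frequencies, most common first.
    f = sorted((c / n for c in counts.values()), reverse=True)
    h1 = sum(p * p for p in f)
    h12 = sum(f[:2]) ** 2 + sum(p * p for p in f[2:])    # merge the top two haplotypes
    h123 = sum(f[:3]) ** 2 + sum(p * p for p in f[3:])   # merge the top three haplotypes
    h2 = h1 - f[0] ** 2                                   # H1 without the most common haplotype
    return h1, h12, h123, h2 / h1


# Four haplotypes over two variants: (0,0), (0,0), (1,0), (1,1),
# i.e. three distinct haplotypes at frequencies 0.5, 0.25 and 0.25.
h = [[0, 0, 1, 1],
     [0, 0, 0, 1]]
stats = garud_h_stats(h)
```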

allel.moving_garud_h(h, size, start=0, stop=None, step=None)

Compute the H1, H12, H123 and H2/H1 statistics for detecting signatures of soft sweeps, as defined in Garud et al. (2015), in moving windows.

Parameters:

h : array_like, int, shape (n_variants, n_haplotypes): Haplotype array.
size : int: The window size (number of variants).
start : int, optional: The index at which to start.
stop : int, optional: The index at which to stop.
step : int, optional: The number of variants between start positions of windows. If not given, defaults to the window size, i.e., non-overlapping windows.

Returns:

h1 : ndarray, float, shape (n_windows,): H1 statistics (sum of squares of haplotype frequencies).
h12 : ndarray, float, shape (n_windows,): H12 statistics (sum of squares of haplotype frequencies, combining the two most common haplotypes into a single frequency).
h123 : ndarray, float, shape (n_windows,): H123 statistics (sum of squares of haplotype frequencies, combining the three most common haplotypes into a single frequency).
h2_h1 : ndarray, float, shape (n_windows,): H2/H1 statistics, indicating the “softness” of a sweep.
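The window parameters (size, start, stop, step) select variant-index windows, and the statistic is then computed on each slice of the haplotype array. A minimal sketch of that windowing, applied to H1 only for brevity. It assumes stop is exclusive and that a trailing window shorter than size is dropped; check allel's own behaviour before relying on those two details. `index_windows`, `h1_stat` and `moving_h1` are hypothetical helpers.

```python
from collections import Counter


def index_windows(n_variants, size, start=0, stop=None, step=None):
    """Yield (window_start, window_stop) variant-index pairs; stop is exclusive."""
    if stop is None:
        stop = n_variants
    if step is None:
        step = size  # non-overlapping windows by default
    for window_start in range(start, stop, step):
        window_stop = window_start + size
        if window_stop > stop:
            break  # assumed: drop a trailing window shorter than `size`
        yield window_start, window_stop


def h1_stat(h):
    """H1: sum of squared haplotype frequencies (haplotypes are the columns of h)."""
    n = len(h[0])
    counts = Counter(tuple(row[j] for row in h) for j in range(n))
    return sum((c / n) ** 2 for c in counts.values())


def moving_h1(h, size, start=0, stop=None, step=None):
    """H1 in each window of `size` consecutive variants."""
    return [h1_stat(h[a:b]) for a, b in index_windows(len(h), size, start, stop, step)]


h = [[0, 0],
     [0, 1],
     [1, 1],
     [0, 0]]
values = moving_h1(h, size=2)  # window 1: two distinct haplotypes; window 2: identical
```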

allel.haplotype_diversity(h)

Estimate haplotype diversity.

Parameters:

h : array_like, int, shape (n_variants, n_haplotypes): Haplotype array.

Returns:

hd : float: Haplotype diversity.
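A minimal sketch of haplotype diversity, assuming the usual unbiased estimator Hd = n / (n - 1) × (1 - Σ pi²), where n is the number of haplotypes and pi are the distinct-haplotype frequencies. The estimator choice is an assumption here, and `haplotype_diversity_nei` is a hypothetical name.

```python
from collections import Counter


def haplotype_diversity_nei(h):
    """Unbiased haplotype diversity; h is a list of variant rows, columns are haplotypes."""
    n = len(h[0])
    if n < 2:
        raise ValueError("need at least two haplotypes")
    counts = Counter(tuple(row[j] for row in h) for j in range(n))
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    # Probability that two haplotypes drawn without replacement differ.
    return (1 - sum_sq) * n / (n - 1)


h = [[0, 0, 1, 1],
     [0, 0, 0, 1]]
hd = haplotype_diversity_nei(h)  # frequencies 0.5, 0.25, 0.25
```

The n / (n - 1) factor turns the with-replacement homozygosity into the probability that two distinct sampled haplotypes differ, which is why Hd is 1 when every haplotype is unique.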

allel.moving_haplotype_diversity(h, size, start=0, stop=None, step=None)

Estimate haplotype diversity in moving windows.

Parameters:

h : array_like, int, shape (n_variants, n_haplotypes): Haplotype array.
size : int: The window size (number of variants).
start : int, optional: The index at which to start.
stop : int, optional: The index at which to stop.
step : int, optional: The number of variants between start positions of windows. If not given, defaults to the window size, i.e., non-overlapping windows.

Returns:

hd : ndarray, float, shape (n_windows,): Haplotype diversity.

Population branching statistic (PBS)

allel.pbs(ac1, ac2, ac3, window_size, window_start=0, window_stop=None, window_step=None, normed=True)

Compute the population branching statistic (PBS), which compares allele frequencies between three populations to detect genome regions that are unusually differentiated in one population relative to the other two.

Parameters:

ac1 : array_like, int: Allele counts from the first population.
ac2 : array_like, int: Allele counts from the second population.
ac3 : array_like, int: Allele counts from the third population.
window_size : int: The window size (number of variants) within which to compute PBS values.
window_start : int, optional: The variant index at which to start windowed calculations.
window_stop : int, optional: The variant index at which to stop windowed calculations.
window_step : int, optional: The number of variants between start positions of windows. If not given, defaults to the window size, i.e., non-overlapping windows.
normed : bool, optional: If True (default), use the normalised version of PBS, also known as PBSn1 [2]. Otherwise, use the PBS statistic as originally defined in [1].

Returns:

pbs : ndarray, float: Windowed PBS values.

Notes

The FST calculations use Hudson’s estimator.
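Once the three pairwise FST values for a window are known, PBS for the first population follows from the transformed distances T = -ln(1 - FST): PBS1 = (T12 + T13 - T23) / 2 [1]. A minimal sketch, taking windowed Hudson FST values as given inputs. The normalisation assumes PBSn1 = PBS1 / (1 + PBS1 + PBS2 + PBS3) [2], which simplifies to dividing by 1 + (T12 + T13 + T23) / 2. `pbs_from_fst` is a hypothetical helper.

```python
import math


def pbs_from_fst(fst12, fst13, fst23, normed=True):
    """PBS for population 1 from pairwise FST values (e.g. one window each)."""
    # Convert FST into divergence-time-like branch lengths.
    t12, t13, t23 = (-math.log(1 - f) for f in (fst12, fst13, fst23))
    pbs1 = (t12 + t13 - t23) / 2
    if normed:
        # 1 + PBS1 + PBS2 + PBS3 == 1 + (T12 + T13 + T23) / 2
        pbs1 /= 1 + (t12 + t13 + t23) / 2
    return pbs1


# Population 1 differs from both others, which are identical to each other.
raw = pbs_from_fst(0.5, 0.5, 0.0, normed=False)
normed = pbs_from_fst(0.5, 0.5, 0.0)
```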

References

[1]	Yi et al., “Sequencing of Fifty Human Exomes Reveals Adaptation to High Altitude”, Science, 329(5987): 75–78, 2 July 2010.

[2]	Malaspinas et al., “A genomic history of Aboriginal Australia”, Nature, 538: 207–214, 13 October 2016.

Delta Tajima’s D

allel.moving_delta_tajima_d(ac1, ac2, size, start=0, stop=None, step=None)

Compute the difference in Tajima’s D between two populations in moving windows.

Parameters:

ac1 : array_like, int, shape (n_variants, n_alleles): Allele counts array for the first population.
ac2 : array_like, int, shape (n_variants, n_alleles): Allele counts array for the second population.
size : int: The window size (number of variants).
start : int, optional: The index at which to start.
stop : int, optional: The index at which to stop.
step : int, optional: The number of variants between start positions of windows. If not given, defaults to the window size, i.e., non-overlapping windows.

Returns:

delta_d : ndarray, float, shape (n_windows,): Standardized delta Tajima’s D.

See also

allel.stats.diversity.moving_tajima_d
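A minimal sketch of the idea: compute Tajima’s D per window in each population with the standard formula, take the difference, then centre and scale the differences. Several details are assumptions: sites are biallelic counts [n_ref, n_alt], the sample size n is taken as the largest total count in the window, windows are non-overlapping by default, and scaling uses the population standard deviation. `tajima_d_window` and `moving_delta_tajima_d_sketch` are hypothetical names.

```python
import math
import statistics


def tajima_d_window(ac):
    """Tajima's D for one window of biallelic allele counts [[n_ref, n_alt], ...]."""
    n = max(sum(site) for site in ac)  # assumed: sample size = largest total count
    seg = [site for site in ac if 0 < site[1] < sum(site)]
    s = len(seg)
    if s == 0 or n < 3:
        return math.nan
    # Sum over segregating sites of the mean number of pairwise differences.
    pi = sum(2 * k * (m - k) / (m * (m - 1)) for m, k in ((sum(x), x[1]) for x in seg))
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


def moving_delta_tajima_d_sketch(ac1, ac2, size, step=None):
    """Standardized D1 - D2 in windows of `size` variants (NaN handling omitted)."""
    step = step or size
    starts = range(0, len(ac1) - size + 1, step)
    delta = [tajima_d_window(ac1[i:i + size]) - tajima_d_window(ac2[i:i + size])
             for i in starts]
    mean, sd = statistics.mean(delta), statistics.pstdev(delta)
    return [(d - mean) / sd for d in delta]


ac1 = [[3, 1], [2, 2]] * 3
ac2 = [[3, 1], [3, 1], [2, 2], [2, 2], [3, 1], [2, 2]]
z = moving_delta_tajima_d_sketch(ac1, ac2, size=2)
```

Standardizing at the end puts delta values from different genomic regions or population pairs on a common scale, which is what makes "unusually large" differences easy to flag.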

Utilities and plotting functions

allel.standardize(score)

Centre and scale to unit variance.

allel.standardize_by_allele_count(score, aac, bins=None, n_bins=None, diagnostics=True)

Standardize score within allele frequency bins.

Parameters:

score : array_like, float: The score to be standardized, e.g., IHS or NSL.
aac : array_like, int: An array of alternate allele counts.
bins : array_like, int, optional: Allele count bins; overrides n_bins.
n_bins : int, optional: Number of allele count bins to use.
diagnostics : bool, optional: If True, plot some diagnostic information about the standardization.

Returns:

score_standardized : ndarray, float: Standardized scores.
bins : ndarray, int: Allele count bins used for standardization.
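A minimal sketch of standardizing within allele count bins: variants are grouped by alternate allele count, and each group is centred on its own mean and scaled by its own standard deviation. The bin-edge convention (bins[i] < aac ≤ bins[i + 1]), the use of the population standard deviation, and leaving unbinned variants as NaN are assumptions of this sketch, and `standardize_within_bins` is a hypothetical name.

```python
import math
import statistics


def standardize_within_bins(score, aac, bins):
    """Standardize scores separately within each alternate-allele-count bin.

    bins: ascending edges; variant i is in bin k when bins[k] < aac[i] <= bins[k + 1].
    NaN scores are ignored when estimating each bin's mean and SD and stay NaN.
    """
    out = [math.nan] * len(score)  # variants outside every bin stay NaN
    for lo, hi in zip(bins[:-1], bins[1:]):
        idx = [i for i, a in enumerate(aac) if lo < a <= hi]
        vals = [score[i] for i in idx if not math.isnan(score[i])]
        if len(vals) < 2:
            continue  # too few scores to estimate a spread
        mean, sd = statistics.mean(vals), statistics.pstdev(vals)
        if sd == 0:
            continue
        for i in idx:
            out[i] = (score[i] - mean) / sd
    return out


z = standardize_within_bins([1.0, 3.0, 10.0, 20.0], aac=[1, 1, 5, 5], bins=[0, 2, 6])
```

Binning matters because the spread of raw IHS and nSl scores depends on allele frequency, so a single genome-wide mean and SD would over-flag some frequency classes.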

allel.ehh_decay(h, truncate=False)

Compute the decay of extended haplotype homozygosity (EHH) moving away from the first variant.

Parameters:

h : array_like, int, shape (n_variants, n_haplotypes): Haplotype array.
truncate : bool, optional: If True, the return array will exclude trailing zeros.

Returns:

ehh : ndarray, float, shape (n_variants,): EHH at successive variants from the first variant.
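A minimal sketch of EHH decay from the first variant: for every pair of haplotypes, find how many leading variants they share; EHH at variant i is then the fraction of pairs whose shared prefix extends past i. This mirrors the untruncated output (truncate=False). `shared_prefix_length` and `ehh_decay_sketch` are hypothetical names.

```python
from itertools import combinations


def shared_prefix_length(h, a, b):
    """Number of leading variants at which haplotypes a and b are identical."""
    n = 0
    for row in h:
        if row[a] != row[b]:
            break
        n += 1
    return n


def ehh_decay_sketch(h):
    """EHH at each variant i: fraction of haplotype pairs identical over variants 0..i."""
    n_hap = len(h[0])
    spl = [shared_prefix_length(h, a, b) for a, b in combinations(range(n_hap), 2)]
    return [sum(1 for length in spl if length > i) / len(spl) for i in range(len(h))]


# Columns are haplotypes: (0,0,0), (0,0,1), (0,1,1), (1,1,1).
h = [[0, 0, 0, 1],
     [0, 0, 1, 1],
     [0, 1, 1, 1]]
ehh = ehh_decay_sketch(h)  # 3 of 6 pairs match at variant 0, 1 of 6 through variant 1
```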

allel.voight_painting(h)

Paint haplotypes, assigning a unique integer to each shared haplotype prefix.

Parameters:

h : array_like, int, shape (n_variants, n_haplotypes): Haplotype array.

Returns:

painting : ndarray, int, shape (n_variants, n_haplotypes): Painting array.
indices : ndarray, int, shape (n_haplotypes,): Haplotype indices after sorting by prefix.
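A minimal sketch of the painting idea: sort haplotypes lexicographically so those sharing a prefix sit next to each other, then, at each variant, give every prefix shared by two or more haplotypes its own integer. The labelling details are assumptions, not necessarily what allel does: labels count up from 1 in order of first appearance, and prefixes carried by only one haplotype get 0. `voight_painting_sketch` is a hypothetical name.

```python
from collections import Counter


def voight_painting_sketch(h):
    """Return (painting, indices); painting rows are variants, columns are sorted haplotypes."""
    n_hap = len(h[0])
    # Each haplotype as the tuple of alleles down its column.
    haps = [tuple(row[j] for row in h) for j in range(n_hap)]
    # Sort haplotypes lexicographically so shared prefixes are adjacent.
    indices = sorted(range(n_hap), key=lambda j: haps[j])
    colors = {}  # shared prefix -> integer label (1, 2, 3, ...)
    painting = []
    for i in range(len(h)):
        prefixes = [haps[j][: i + 1] for j in indices]
        counts = Counter(prefixes)
        painting.append([
            colors.setdefault(p, len(colors) + 1) if counts[p] > 1 else 0  # 0: unshared (assumed)
            for p in prefixes
        ])
    return painting, indices


h = [[1, 0, 0],
     [1, 1, 0]]
painting, indices = voight_painting_sketch(h)
```

Here haplotypes 2 and 1 share the prefix (0,) at the first variant and diverge at the second, so they are painted together only in the first row.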

allel.plot_voight_painting(painting, palette='colorblind', flank='right', ax=None, height_factor=0.01)

Plot a painting of shared haplotype prefixes.

Parameters:

painting : array_like, int, shape (n_variants, n_haplotypes): Painting array.
ax : axes, optional: The axes on which to draw. If not provided, a new figure will be created.
palette : string, optional: A Seaborn palette name.
flank : {‘right’, ‘left’}, optional: If ‘left’, the painting will be reversed along the first axis.
height_factor : float, optional: If no axes provided, determine height of figure by multiplying height of painting array by this number.

Returns:

ax : axes

allel.fig_voight_painting(h, index=None, palette='colorblind', height_factor=0.01, fig=None)

Make a figure of shared haplotype prefixes for both left and right flanks, centred on some variant of choice.

Parameters:

h : array_like, int, shape (n_variants, n_haplotypes): Haplotype array.
index : int, optional: Index of the variant within the haplotype array to centre on. If not provided, the middle variant will be used.
palette : string, optional: A Seaborn palette name.
height_factor : float, optional: If no figure is provided, determine the height of the figure by multiplying the height of the painting array by this number.
fig : figure, optional: The figure on which to draw. If not provided, a new figure will be created.

Returns:

fig : figure

Notes

The ordering of haplotypes differs between the left and right flanks, so a haplotype on the right flank does not correspond to the haplotype at the same vertical position on the left flank.

allel.plot_haplotype_frequencies(h, palette='Paired', singleton_color='w', ax=None)

Plot haplotype frequencies.

Parameters:

h : array_like, int, shape (n_variants, n_haplotypes): Haplotype array.
palette : string, optional: A Seaborn palette name.
singleton_color : string, optional: Color to paint singleton haplotypes.
ax : axes, optional: The axes on which to draw. If not provided, a new figure will be created.

Returns:

ax : axes

allel.plot_moving_haplotype_frequencies(pos, h, size, start=0, stop=None, n=None, palette='Paired', singleton_color='w', ax=None)

Plot haplotype frequencies in moving windows over the genome.

Parameters:

pos : array_like, int, shape (n_items,): Variant positions, using 1-based coordinates, in ascending order.
h : array_like, int, shape (n_variants, n_haplotypes): Haplotype array.
size : int: The window size (number of variants).
start : int, optional: The index at which to start.
stop : int, optional: The index at which to stop.
n : int, optional: Color only the n most frequent haplotypes (by default, all non-singleton haplotypes are colored).
palette : string, optional: A Seaborn palette name.
singleton_color : string, optional: Color to paint singleton haplotypes.
ax : axes, optional: The axes on which to draw. If not provided, a new figure will be created.

Returns:

ax : axes