# F-statistics¶

`allel.``weir_cockerham_fst`(g, subpops, max_allele=None, blen=None)[source]

Compute the variance components from the analyses of variance of allele frequencies according to Weir and Cockerham (1984).

Parameters: g : array_like, int, shape (n_variants, n_samples, ploidy) Genotype array. subpops : sequence of sequences of ints Sample indices for each subpopulation. max_allele : int, optional The highest allele index to consider. blen : int, optional Block length to use for chunked computation. a : ndarray, float, shape (n_variants, n_alleles) Component of variance between populations. b : ndarray, float, shape (n_variants, n_alleles) Component of variance between individuals within populations. c : ndarray, float, shape (n_variants, n_alleles) Component of variance between gametes within individuals.

Examples

Calculate variance components from some genotype data:

```>>> import allel
>>> g = [[[0, 0], [0, 0], [1, 1], [1, 1]],
...      [[0, 1], [0, 1], [0, 1], [0, 1]],
...      [[0, 0], [0, 0], [0, 0], [0, 0]],
...      [[0, 1], [1, 2], [1, 1], [2, 2]],
...      [[0, 0], [1, 1], [0, 1], [-1, -1]]]
>>> subpops = [[0, 1], [2, 3]]
>>> a, b, c = allel.stats.weir_cockerham_fst(g, subpops)
>>> a
array([[ 0.5  ,  0.5  ,  0.   ],
[ 0.   ,  0.   ,  0.   ],
[ 0.   ,  0.   ,  0.   ],
[ 0.   , -0.125, -0.125],
[-0.375, -0.375,  0.   ]])
>>> b
array([[ 0.        ,  0.        ,  0.        ],
[-0.25      , -0.25      ,  0.        ],
[ 0.        ,  0.        ,  0.        ],
[ 0.        ,  0.125     ,  0.25      ],
[ 0.41666667,  0.41666667,  0.        ]])
>>> c
array([[ 0.        ,  0.        ,  0.        ],
[ 0.5       ,  0.5       ,  0.        ],
[ 0.        ,  0.        ,  0.        ],
[ 0.125     ,  0.25      ,  0.125     ],
[ 0.16666667,  0.16666667,  0.        ]])
```

Estimate the parameter theta (a.k.a., Fst) for each variant and each allele individually:

```>>> fst = a / (a + b + c)
>>> fst
array([[ 1. ,  1. ,  nan],
[ 0. ,  0. ,  nan],
[ nan,  nan,  nan],
[ 0. , -0.5, -0.5],
[-1.8, -1.8,  nan]])
```

Estimate Fst for each variant individually (averaging over alleles):

```>>> fst = (np.sum(a, axis=1) /
...        (np.sum(a, axis=1) + np.sum(b, axis=1) + np.sum(c, axis=1)))
>>> fst
array([ 1. ,  0. ,  nan, -0.4, -1.8])
```

Estimate Fst averaging over all variants and alleles:

```>>> fst = np.sum(a) / (np.sum(a) + np.sum(b) + np.sum(c))
>>> fst
-4.3680905886891398e-17
```

Note that estimated Fst values may be negative.

`allel.``hudson_fst`(ac1, ac2, fill=nan)[source]

Calculate the numerator and denominator for Fst estimation using the method of Hudson (1992) elaborated by Bhatia et al. (2013).

Parameters: ac1 : array_like, int, shape (n_variants, n_alleles) Allele counts array from the first population. ac2 : array_like, int, shape (n_variants, n_alleles) Allele counts array from the second population. fill : float Use this value where there are no pairs to compare (e.g., all allele calls are missing). num : ndarray, float, shape (n_variants,) Divergence between the two populations minus average of diversity within each population. den : ndarray, float, shape (n_variants,) Divergence between the two populations.

Examples

Calculate numerator and denominator for Fst estimation:

```>>> import allel
>>> g = allel.GenotypeArray([[[0, 0], [0, 0], [1, 1], [1, 1]],
...                          [[0, 1], [0, 1], [0, 1], [0, 1]],
...                          [[0, 0], [0, 0], [0, 0], [0, 0]],
...                          [[0, 1], [1, 2], [1, 1], [2, 2]],
...                          [[0, 0], [1, 1], [0, 1], [-1, -1]]])
>>> subpops = [[0, 1], [2, 3]]
>>> ac1 = g.count_alleles(subpop=subpops[0])
>>> ac2 = g.count_alleles(subpop=subpops[1])
>>> num, den = allel.stats.hudson_fst(ac1, ac2)
>>> num
array([ 1.        , -0.16666667,  0.        , -0.125     , -0.33333333])
>>> den
array([ 1.   ,  0.5  ,  0.   ,  0.625,  0.5  ])
```

Estimate Fst for each variant individually:

```>>> fst = num / den
>>> fst
array([ 1.        , -0.33333333,         nan, -0.2       , -0.66666667])
```

Estimate Fst averaging over variants:

```>>> fst = np.sum(num) / np.sum(den)
>>> fst
0.1428571428571429
```
`allel.``patterson_fst`(aca, acb)[source]

Estimator of differentiation between populations A and B based on the F2 parameter.

Parameters: aca : array_like, int, shape (n_variants, 2) Allele counts for population A. acb : array_like, int, shape (n_variants, 2) Allele counts for population B. num : ndarray, shape (n_variants,), float Numerator. den : ndarray, shape (n_variants,), float Denominator.

Notes

See Patterson (2012), Appendix A.

TODO check if this is numerically equivalent to Hudson’s estimator.

`allel.``average_weir_cockerham_fst`(g, subpops, blen, max_allele=None)[source]

Estimate average Fst and standard error using the block-jackknife.

Parameters: g : array_like, int, shape (n_variants, n_samples, ploidy) Genotype array. subpops : sequence of sequences of ints Sample indices for each subpopulation. blen : int Block size (number of variants). max_allele : int, optional The highest allele index to consider. fst : float Estimated value of the statistic using all data. se : float Estimated standard error. vb : ndarray, float, shape (n_blocks,) Value of the statistic in each block. vj : ndarray, float, shape (n_blocks,) Values of the statistic from block-jackknife resampling.
`allel.``average_hudson_fst`(ac1, ac2, blen)[source]

Estimate average Fst between two populations and standard error using the block-jackknife.

Parameters: ac1 : array_like, int, shape (n_variants, n_alleles) Allele counts array from the first population. ac2 : array_like, int, shape (n_variants, n_alleles) Allele counts array from the second population. blen : int Block size (number of variants). fst : float Estimated value of the statistic using all data. se : float Estimated standard error. vb : ndarray, float, shape (n_blocks,) Value of the statistic in each block. vj : ndarray, float, shape (n_blocks,) Values of the statistic from block-jackknife resampling.
`allel.``average_patterson_fst`(ac1, ac2, blen)[source]

Estimate average Fst between two populations and standard error using the block-jackknife.

Parameters: ac1 : array_like, int, shape (n_variants, n_alleles) Allele counts array from the first population. ac2 : array_like, int, shape (n_variants, n_alleles) Allele counts array from the second population. blen : int Block size (number of variants). fst : float Estimated value of the statistic using all data. se : float Estimated standard error. vb : ndarray, float, shape (n_blocks,) Value of the statistic in each block. vj : ndarray, float, shape (n_blocks,) Values of the statistic from block-jackknife resampling.
`allel.``moving_weir_cockerham_fst`(g, subpops, size, start=0, stop=None, step=None, max_allele=None)[source]

Estimate average Fst in moving windows over a single chromosome/contig, following the method of Weir and Cockerham (1984).

Parameters: g : array_like, int, shape (n_variants, n_samples, ploidy) Genotype array. subpops : sequence of sequences of ints Sample indices for each subpopulation. size : int The window size (number of variants). start : int, optional The index at which to start. stop : int, optional The index at which to stop. step : int, optional The number of variants between start positions of windows. If not given, defaults to the window size, i.e., non-overlapping windows. max_allele : int, optional The highest allele index to consider. fst : ndarray, float, shape (n_windows,) Average Fst in each window.
`allel.``moving_hudson_fst`(ac1, ac2, size, start=0, stop=None, step=None)[source]

Estimate average Fst in moving windows over a single chromosome/contig, following the method of Hudson (1992) elaborated by Bhatia et al. (2013).

Parameters: ac1 : array_like, int, shape (n_variants, n_alleles) Allele counts array from the first population. ac2 : array_like, int, shape (n_variants, n_alleles) Allele counts array from the second population. size : int The window size (number of variants). start : int, optional The index at which to start. stop : int, optional The index at which to stop. step : int, optional The number of variants between start positions of windows. If not given, defaults to the window size, i.e., non-overlapping windows. fst : ndarray, float, shape (n_windows,) Average Fst in each window.
`allel.``moving_patterson_fst`(ac1, ac2, size, start=0, stop=None, step=None)[source]

Estimate average Fst in moving windows over a single chromosome/contig, following the method of Patterson (2012).

Parameters: ac1 : array_like, int, shape (n_variants, n_alleles) Allele counts array from the first population. ac2 : array_like, int, shape (n_variants, n_alleles) Allele counts array from the second population. size : int The window size (number of variants). start : int, optional The index at which to start. stop : int, optional The index at which to stop. step : int, optional The number of variants between start positions of windows. If not given, defaults to the window size, i.e., non-overlapping windows. fst : ndarray, float, shape (n_windows,) Average Fst in each window.
`allel.``windowed_weir_cockerham_fst`(pos, g, subpops, size=None, start=None, stop=None, step=None, windows=None, fill=nan, max_allele=None)[source]

Estimate average Fst in windows over a single chromosome/contig, following the method of Weir and Cockerham (1984).

Parameters: pos : array_like, int, shape (n_items,) Variant positions, using 1-based coordinates, in ascending order. g : array_like, int, shape (n_variants, n_samples, ploidy) Genotype array. subpops : sequence of sequences of ints Sample indices for each subpopulation. size : int The window size (number of bases). start : int, optional The position at which to start (1-based). stop : int, optional The position at which to stop (1-based). step : int, optional The distance between start positions of windows. If not given, defaults to the window size, i.e., non-overlapping windows. windows : array_like, int, shape (n_windows, 2), optional Manually specify the windows to use as a sequence of (window_start, window_stop) positions, using 1-based coordinates. Overrides the size/start/stop/step parameters. fill : object, optional The value to use where there are no variants within a window. max_allele : int, optional The highest allele index to consider. fst : ndarray, float, shape (n_windows,) Average Fst in each window. windows : ndarray, int, shape (n_windows, 2) The windows used, as an array of (window_start, window_stop) positions, using 1-based coordinates. counts : ndarray, int, shape (n_windows,) Number of variants in each window.
`allel.``windowed_hudson_fst`(pos, ac1, ac2, size=None, start=None, stop=None, step=None, windows=None, fill=nan)[source]

Estimate average Fst in windows over a single chromosome/contig, following the method of Hudson (1992) elaborated by Bhatia et al. (2013).

Parameters: pos : array_like, int, shape (n_items,) Variant positions, using 1-based coordinates, in ascending order. ac1 : array_like, int, shape (n_variants, n_alleles) Allele counts array from the first population. ac2 : array_like, int, shape (n_variants, n_alleles) Allele counts array from the second population. size : int, optional The window size (number of bases). start : int, optional The position at which to start (1-based). stop : int, optional The position at which to stop (1-based). step : int, optional The distance between start positions of windows. If not given, defaults to the window size, i.e., non-overlapping windows. windows : array_like, int, shape (n_windows, 2), optional Manually specify the windows to use as a sequence of (window_start, window_stop) positions, using 1-based coordinates. Overrides the size/start/stop/step parameters. fill : object, optional The value to use where there are no variants within a window. fst : ndarray, float, shape (n_windows,) Average Fst in each window. windows : ndarray, int, shape (n_windows, 2) The windows used, as an array of (window_start, window_stop) positions, using 1-based coordinates. counts : ndarray, int, shape (n_windows,) Number of variants in each window.
`allel.``windowed_patterson_fst`(pos, ac1, ac2, size=None, start=None, stop=None, step=None, windows=None, fill=nan)[source]

Estimate average Fst in windows over a single chromosome/contig, following the method of Patterson (2012).

Parameters: pos : array_like, int, shape (n_items,) Variant positions, using 1-based coordinates, in ascending order. ac1 : array_like, int, shape (n_variants, n_alleles) Allele counts array from the first population. ac2 : array_like, int, shape (n_variants, n_alleles) Allele counts array from the second population. size : int, optional The window size (number of bases). start : int, optional The position at which to start (1-based). stop : int, optional The position at which to stop (1-based). step : int, optional The distance between start positions of windows. If not given, defaults to the window size, i.e., non-overlapping windows. windows : array_like, int, shape (n_windows, 2), optional Manually specify the windows to use as a sequence of (window_start, window_stop) positions, using 1-based coordinates. Overrides the size/start/stop/step parameters. fill : object, optional The value to use where there are no variants within a window. fst : ndarray, float, shape (n_windows,) Average Fst in each window. windows : ndarray, int, shape (n_windows, 2) The windows used, as an array of (window_start, window_stop) positions, using 1-based coordinates. counts : ndarray, int, shape (n_windows,) Number of variants in each window.