Release notes
v0.20.0
- Added new allel.model.dask module, providing implementations of the genotype, haplotype and allele counts classes backed by dask.array (#32).
- Released the GIL where possible in Cython optimised functions (#43).
- Changed functions in allel.stats.selection that accept a min_ehh argument, such that min_ehh = None should now be used to indicate that no minimum EHH threshold should be applied.
v0.19.0
The major change in v0.19.0 is the addition of the new allel.model.chunked module, which provides classes for variant call data backed by chunked array storage (#31). This is a generalisation of the previously available allel.model.bcolz module, enabling the use of both bcolz and HDF5 (via h5py) as backing storage. The allel.model.bcolz module is now deprecated but will be retained for backwards compatibility until the next major release.
Other changes:
- Added function for computing the number of segregating sites by length (nSl), a summary statistic comparing haplotype homozygosity between different alleles (similar to iHS), see allel.stats.selection.nsl() (#40).
- Added functions for computing haplotype diversity, see allel.stats.selection.haplotype_diversity() and allel.stats.selection.moving_haplotype_diversity() (#29).
- Added function allel.stats.selection.plot_moving_haplotype_frequencies() for visualising haplotype frequency spectra in moving windows over the genome (#30).
- Added vstack() and hstack() methods to genotype and haplotype arrays to enable combining data from multiple arrays (#21).
- Added convenience function allel.stats.window.equally_accessible_windows() (#16).
- Added methods from_hdf5_group() and to_hdf5_group() to allel.model.ndarray.VariantTable (#26).
- Added allel.util.hdf5_cache() utility function.
- Modified functions in the allel.stats.selection module that depend on calculation of integrated haplotype homozygosity to return NaN when haplotypes do not decay below a specified threshold (#39).
- Fixed missing return value in allel.stats.selection.plot_voight_painting() (#23).
- Fixed return type from array reshape() (#34).
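Haplotype diversity is commonly estimated with Nei's unbiased formula h = n / (n - 1) * (1 - sum(p_i^2)), where p_i is the frequency of the i-th distinct haplotype among n haplotypes. The following is a minimal pure-Python sketch of that formula, assuming haplotypes are supplied as hashable tuples of allele indices; `haplotype_diversity_sketch` is a hypothetical name, not the signature of allel.stats.selection.haplotype_diversity(), whose input layout and edge-case handling may differ.

```python
from collections import Counter


def haplotype_diversity_sketch(haplotypes):
    """Unbiased haplotype diversity h = n/(n-1) * (1 - sum(p_i**2)).

    `haplotypes` is a sequence of hashable haplotypes, e.g. tuples of
    allele indices over a window of variants (hypothetical input layout).
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    # Frequency of each distinct haplotype.
    freqs = [count / n for count in Counter(haplotypes).values()]
    # Probability that two haplotypes drawn with replacement are identical.
    homozygosity = sum(p * p for p in freqs)
    return n / (n - 1) * (1 - homozygosity)


haps = [(0, 0, 1), (0, 0, 1), (0, 1, 1), (1, 1, 0)]
print(haplotype_diversity_sketch(haps))
```

The n / (n - 1) factor corrects the small-sample bias of 1 - sum(p_i^2), so two distinct haplotypes out of two give h = 1.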
v0.18.1
- Minor change to the Garud H statistics to avoid raising an exception when the number of distinct haplotypes is very low (#20).
v0.18.0
- Added functions for computing H statistics for detecting signatures of soft sweeps, see allel.stats.selection.garud_h(), allel.stats.selection.moving_garud_h() and allel.stats.selection.plot_haplotype_frequencies() (#19).
- Added function allel.stats.selection.fig_voight_painting() to paint both flanks either side of some variant under selection in a single figure (#17).
- Changed return values from allel.stats.selection.voight_painting() to also return the indices used for sorting haplotypes by prefix (#18).
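The H statistics of Garud et al. (2015) summarise the haplotype frequency spectrum: H1 is haplotype homozygosity, H12 pools the two most frequent haplotypes into one, H123 pools the three most frequent, and H2/H1 is the fraction of homozygosity remaining after the most frequent haplotype is removed. A small pure-Python sketch of these definitions follows, assuming hashable haplotypes as input; `garud_h_sketch` is a hypothetical helper and is not the allel.stats.selection.garud_h() implementation.

```python
from collections import Counter


def garud_h_sketch(haplotypes):
    """Return (H1, H12, H123, H2/H1) for a sequence of hashable haplotypes."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes")
    # Haplotype frequencies sorted from most to least common.
    p = sorted((c / n for c in Counter(haplotypes).values()), reverse=True)
    h1 = sum(x * x for x in p)
    # H12: the two most common haplotypes are treated as one.
    h12 = sum(p[:2]) ** 2 + sum(x * x for x in p[2:])
    # H123: the three most common haplotypes are treated as one.
    h123 = sum(p[:3]) ** 2 + sum(x * x for x in p[3:])
    # H2: homozygosity excluding the most common haplotype.
    h2 = h1 - p[0] ** 2
    return h1, h12, h123, h2 / h1


# Frequencies 0.5, 0.25, 0.125, 0.125.
print(garud_h_sketch(list("AAAABBCD")))
```

A high H12 together with a relatively high H2/H1 is the pattern Garud et al. associate with soft sweeps, where several haplotypes rise in frequency together.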
v0.17.0
- Added new module for computing and plotting site frequency spectra, see allel.stats.sf (#12).
- All plotting functions have been moved into the stats module that they naturally correspond to. The allel.plot module is deprecated (#13).
- Improved performance of carray and ctable loading from HDF5 with a condition (#11).
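A site frequency spectrum counts how many variants carry each possible number of derived alleles; the folded spectrum counts minor alleles instead, which is useful when the ancestral state is unknown. The sketch below assumes the same number n of chromosomes is sampled at every variant and takes per-variant derived allele counts; the function names are hypothetical and are not the allel.stats.sf API.

```python
def sfs_sketch(derived_counts, n):
    """Unfolded SFS: s[k] is the number of variants with k derived alleles."""
    s = [0] * (n + 1)
    for k in derived_counts:
        if not 0 <= k <= n:
            raise ValueError("count outside 0..n")
        s[k] += 1
    return s


def folded_sfs_sketch(derived_counts, n):
    """Folded SFS: s[k] is the number of variants with minor allele count k."""
    s = [0] * (n // 2 + 1)
    for k in derived_counts:
        if not 0 <= k <= n:
            raise ValueError("count outside 0..n")
        s[min(k, n - k)] += 1
    return s


counts = [1, 1, 2, 3, 4, 0]  # derived allele counts, n = 4 chromosomes
print(sfs_sketch(counts, 4))
print(folded_sfs_sketch(counts, 4))
```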
v0.16.2
- Fixed behaviour of take() method on compressed arrays when indices are not in increasing order (#6).
- Minor change to the scaler argument to PCA functions in allel.stats.decomposition to avoid confusion about when to fall back to the default scaler (#7).
v0.16.1
- Added block-wise implementation to allel.stats.ld.locate_unlinked() so it can be used with compressed arrays as input.
v0.16.0
- Added new selection module with functions for haplotype-based analyses of recent selection, see allel.stats.selection.
v0.15.2
- Improved performance of allel.model.bcolz.carray_block_compress(), allel.model.bcolz.ctable_block_compress() and allel.model.bcolz.carray_block_subset() for very sparse selections.
- Fixed bug in IPython HTML table captions.
- Fixed bug in addcol() method on bcolz ctable wrappers.
v0.15.1
- Fixed missing package in setup.py.
v0.15
- Added functions to estimate Fst with standard error via a block-jackknife: allel.stats.fst.blockwise_weir_cockerham_fst(), allel.stats.fst.blockwise_hudson_fst() and allel.stats.fst.blockwise_patterson_fst().
- Fixed a serious bug in allel.stats.fst.weir_cockerham_fst() related to incorrect estimation of heterozygosity, which manifested if the subpopulations being compared were not a partition of the total population (i.e., there were one or more samples in the genotype array that were not included in the subpopulations to compare).
- Added method allel.model.AlleleCountsArray.max_allele() to determine the highest allele index for each variant.
- Changed the first return value from admixture functions allel.stats.admixture.blockwise_patterson_f3() and allel.stats.admixture.blockwise_patterson_d() to be the estimator from the whole dataset.
- Added utility functions to the allel.stats.distance module for transforming coordinates between condensed and uncondensed forms of a distance matrix.
- Classes previously available from the allel.model and allel.bcolz modules are now aliased from the root allel module for convenience. These modules have been reorganised into an allel.model package with sub-modules allel.model.ndarray and allel.model.bcolz.
- All functions in the allel.model.bcolz module use cparams from the input carray as the default for the output carray (convenient if you, e.g., want to use zlib level 1 throughout).
- All classes in the allel.model.ndarray and allel.model.bcolz modules have changed the default value for the copy keyword argument to False. This means that not copying the input data, just wrapping it, is now the default behaviour.
- Fixed bug in GenotypeArray.to_gt() where maximum allele index is zero.
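A condensed distance matrix stores only the upper triangle (i < j) of a symmetric n x n matrix as a flat vector, row by row, which is the layout returned by scipy.spatial.distance.pdist(). The sketch below shows one way to convert between a pair (i, j) and its position in that vector; the function names are hypothetical and are not necessarily those added to allel.stats.distance.

```python
def condensed_index(i, j, n):
    """Position of pair (i, j), i != j, in a condensed vector for n items."""
    if i == j:
        raise ValueError("diagonal entries are not stored")
    if i > j:
        i, j = j, i  # the matrix is symmetric, so (j, i) maps to the same slot
    if i < 0 or j >= n:
        raise IndexError("coordinates out of range")
    # Rows 0 .. i-1 hold (n-1) + (n-2) + ... + (n-i) = n*i - i*(i+1)/2 entries.
    return n * i - i * (i + 1) // 2 + (j - i - 1)


def uncondensed_coords(k, n):
    """Inverse of condensed_index: return (i, j) with i < j for position k."""
    if not 0 <= k < n * (n - 1) // 2:
        raise IndexError("condensed index out of range")
    i, row_len = 0, n - 1
    # Skip whole rows of the upper triangle until k falls inside row i.
    while k >= row_len:
        k -= row_len
        i += 1
        row_len -= 1
    return i, i + 1 + k


print(condensed_index(1, 3, 4), uncondensed_coords(4, 4))
```

The inverse walks row by row for clarity; a closed-form inverse via a square root is possible but more error-prone with floating point.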
v0.14
- Added a new module allel.stats.admixture with statistical tests for admixture between populations, implementing the f2, f3 and D statistics from Patterson (2012). Functions include allel.stats.admixture.blockwise_patterson_f3() and allel.stats.admixture.blockwise_patterson_d(), which compute the f3 and D statistics respectively in blocks of a given number of variants and perform a block-jackknife to estimate the standard error.
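Ratio statistics such as f3 and D are built from per-variant numerators and denominators. One common way to obtain a standard error is a delete-one-block jackknife: sum the numerator and denominator within blocks of `blen` variants, recompute the ratio leaving out each block b in turn to get theta_b, and take SE = sqrt((B - 1) / B * sum((theta_b - mean)^2)) over the B leave-one-out estimates. This pure-Python sketch implements that textbook jackknife under an assumed input layout (per-variant numerator and denominator lists); it is not claimed to match allel's implementation in every detail.

```python
import math


def blockwise_ratio_jackknife(num, den, blen):
    """Ratio-of-sums estimate with a delete-one-block jackknife standard error.

    `num` and `den` are per-variant numerator and denominator values
    (hypothetical layout); `blen` is the number of variants per block.
    Trailing variants that do not fill a complete block are dropped.
    """
    n_blocks = len(num) // blen
    if n_blocks < 2:
        raise ValueError("need at least two complete blocks")
    # Sum numerator and denominator within each block.
    bnum = [sum(num[b * blen:(b + 1) * blen]) for b in range(n_blocks)]
    bden = [sum(den[b * blen:(b + 1) * blen]) for b in range(n_blocks)]
    tot_num, tot_den = sum(bnum), sum(bden)
    estimate = tot_num / tot_den
    # Estimate recomputed with each block left out in turn.
    loo = [(tot_num - bn) / (tot_den - bd) for bn, bd in zip(bnum, bden)]
    mean = sum(loo) / n_blocks
    se = math.sqrt((n_blocks - 1) / n_blocks * sum((x - mean) ** 2 for x in loo))
    return estimate, se


print(blockwise_ratio_jackknife([1, 0, 0, 0], [1, 1, 1, 1], 1))
```

Working with block sums rather than per-block ratios keeps blocks with small denominators from dominating the estimate.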
v0.12
- Added functions for principal components analysis of genotype data. Functions in the new module allel.stats.decomposition include allel.stats.decomposition.pca() to perform a PCA via full singular value decomposition, and allel.stats.decomposition.randomized_pca() which uses an approximate truncated singular value decomposition to speed up computation. In tests with real data the randomized PCA is around 5 times faster and uses half as much memory as the conventional PCA, producing highly similar results.
- Added function allel.stats.distance.pcoa() for principal coordinate analysis (a.k.a. classical multi-dimensional scaling) of a distance matrix.
- Added new utility module allel.stats.preprocessing with classes for scaling genotype data prior to use as input for PCA or PCoA. By default the scaling (i.e., normalization) of Patterson (2006) is used with principal components analysis functions in the allel.stats.decomposition module. Scaling functions can improve the ability to resolve population structure via PCA or PCoA.
- Added method allel.model.GenotypeArray.to_n_ref(). Also added a dtype argument to the allel.model.GenotypeArray.to_n_ref() and allel.model.GenotypeArray.to_n_alt() methods to enable direct output as float arrays, which can be convenient if these arrays are then going to be scaled for use in PCA or PCoA.
- Added allel.model.GenotypeArray.mask property which can be set with a Boolean mask to filter genotype calls from genotype and allele counting operations. A similar property is available on the allel.bcolz.GenotypeCArray class. Also added method allel.model.GenotypeArray.fill_masked() and a similar method on the allel.bcolz.GenotypeCArray class to fill masked genotype calls with a value (e.g., -1).
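The Patterson (2006) scaling centres each variant's alternate allele counts on their mean and divides by sqrt(p * (1 - p)), where p is the alternate allele frequency, so that rare and common variants contribute comparably to a PCA. The sketch below assumes input shaped like the output of allel.model.GenotypeArray.to_n_alt() (one row per variant, one alternate allele count per sample) and, as an assumption of this sketch, estimates p simply as mean / ploidy. `patterson_scale` is a hypothetical name, not the allel.stats.preprocessing API.

```python
import math


def patterson_scale(gn, ploidy=2):
    """Centre and scale alternate allele counts per variant.

    `gn` is a list of rows (one per variant), each a list of per-sample
    alternate allele counts such as 0, 1 or 2 for diploids. Each value
    becomes (x - mean) / sqrt(p * (1 - p)) with p = mean / ploidy.
    Monomorphic variants (p == 0 or p == 1) are set to zero to avoid
    dividing by zero.
    """
    scaled = []
    for row in gn:
        mean = sum(row) / len(row)
        p = mean / ploidy
        sd = math.sqrt(p * (1 - p))
        if sd == 0:
            scaled.append([0.0] * len(row))
        else:
            scaled.append([(x - mean) / sd for x in row])
    return scaled


print(patterson_scale([[0, 1, 2, 1], [2, 2, 2]]))
```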
v0.11
- Added functions for calculating Watterson’s theta (proportional to the number of segregating variants): allel.stats.diversity.watterson_theta() for calculating over a given region, and allel.stats.diversity.windowed_watterson_theta() for calculating in windows over a chromosome/contig.
- Added functions for calculating Tajima’s D statistic (balance between nucleotide diversity and number of segregating sites): allel.stats.diversity.tajima_d() for calculating over a given region, and allel.stats.diversity.windowed_tajima_d() for calculating in windows over a chromosome/contig.
- Added allel.stats.diversity.windowed_df() for calculating the rate of fixed differences between two populations.
- Added function allel.model.locate_fixed_differences() for locating variants that are fixed for different alleles in two different populations.
- Added function allel.model.locate_private_alleles() for locating alleles and variants that are private to a single population.
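For n sampled chromosomes and S segregating sites, Watterson's estimator is theta_W = S / a1 with a1 = sum(1 / i) for i = 1 .. n - 1, and Tajima's D compares the mean number of pairwise differences with theta_W, normalised by the variance constants from Tajima (1989). The pure-Python sketch below assumes biallelic sites, no missing calls and the same n at every site, taking per-variant alternate allele counts as input; the function names are hypothetical and do not reproduce the allel signatures.

```python
import math


def _harmonic_sums(n):
    """a1 = sum(1/i) and a2 = sum(1/i**2) for i = 1 .. n-1."""
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    return a1, a2


def watterson_theta_sketch(alt_counts, n, n_bases=None):
    """theta_W = S / a1, optionally divided by the number of bases."""
    s = sum(1 for k in alt_counts if 0 < k < n)
    a1, _ = _harmonic_sums(n)
    theta = s / a1
    return theta if n_bases is None else theta / n_bases


def tajima_d_sketch(alt_counts, n):
    """Tajima's D from per-variant alternate allele counts at biallelic sites."""
    if n < 4:
        raise ValueError("the variance terms vanish for n < 4")
    seg = [k for k in alt_counts if 0 < k < n]
    s = len(seg)
    if s == 0:
        return float("nan")
    # Mean number of pairwise differences, summed over segregating sites.
    pi = sum(2 * k * (n - k) / (n * (n - 1)) for k in seg)
    a1, a2 = _harmonic_sums(n)
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


print(watterson_theta_sketch([1, 2, 3, 0, 4], 4))
print(tajima_d_sketch([1] * 5, 10), tajima_d_sketch([5] * 5, 10))
```

An excess of rare variants (all singletons above) drives D negative, while an excess of intermediate-frequency variants drives it positive.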
v0.10
- Added functions implementing the Weir and Cockerham (1984) estimators for F-statistics: allel.stats.fst.weir_cockerham_fst() and allel.stats.fst.windowed_weir_cockerham_fst().
- Added functions implementing the Hudson (1992) estimator for Fst: allel.stats.fst.hudson_fst() and allel.stats.fst.windowed_hudson_fst().
- Added new module allel.stats.ld with functions for calculating linkage disequilibrium estimators, including allel.stats.ld.rogers_huff_r() for pairwise variant LD calculation, allel.stats.ld.windowed_r_squared() for windowed LD calculations, and allel.stats.ld.locate_unlinked() for locating variants in approximate linkage equilibrium.
- Added function allel.plot.pairwise_ld() for visualising a matrix of linkage disequilibrium values between pairs of variants.
- Added function allel.model.create_allele_mapping() for creating a mapping of alleles into a different index system, i.e., if you want 0 and 1 to represent something other than REF and ALT, e.g., ancestral and derived. Also added methods allel.model.GenotypeArray.map_alleles(), allel.model.HaplotypeArray.map_alleles() and allel.model.AlleleCountsArray.map_alleles() which will perform an allele transformation given an allele mapping.
- Added function allel.plot.variant_locator() ported across from anhima.
- Refactored the allel.stats module into a package with sub-modules for easier maintenance.
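Hudson's Fst is often computed, following Bhatia et al. (2013), as a ratio of averages over variants: for each variant the numerator is the mean pairwise difference between populations minus the average of the two within-population mean pairwise differences, and the denominator is the between-population mean pairwise difference. A minimal pure-Python sketch of that formulation follows, assuming one allele-count list per variant per population; the helper names are hypothetical, and this is not necessarily how allel.stats.fst.hudson_fst() is implemented.

```python
def mpd_within(ac):
    """Mean pairwise difference within one population from allele counts."""
    n = sum(ac)
    # Fraction of distinct pairs of chromosomes carrying different alleles.
    return 1 - sum(a * (a - 1) for a in ac) / (n * (n - 1))


def mpd_between(ac1, ac2):
    """Mean pairwise difference between two populations from allele counts."""
    n1, n2 = sum(ac1), sum(ac2)
    return 1 - sum(a * b for a, b in zip(ac1, ac2)) / (n1 * n2)


def hudson_fst_sketch(acs1, acs2):
    """Ratio-of-averages Hudson Fst over variants.

    `acs1` and `acs2` hold one allele-count list per variant for each
    population (hypothetical layout).
    """
    num = den = 0.0
    for ac1, ac2 in zip(acs1, acs2):
        within = (mpd_within(ac1) + mpd_within(ac2)) / 2
        between = mpd_between(ac1, ac2)
        num += between - within
        den += between
    return num / den


print(hudson_fst_sketch([[4, 0], [3, 1]], [[0, 4], [1, 3]]))
```

Summing numerators and denominators before dividing avoids the instability of averaging per-variant ratios at sites with small denominators; note the estimate can be slightly negative when populations are undifferentiated.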
v0.9
- Added documentation for the functions allel.bcolz.carray_from_hdf5(), allel.bcolz.carray_to_hdf5(), allel.bcolz.ctable_from_hdf5_group() and allel.bcolz.ctable_to_hdf5_group().
- Refactoring of internals within the allel.bcolz module.
v0.8
- Added subpop argument to allel.model.GenotypeArray.count_alleles() and allel.model.HaplotypeArray.count_alleles() to enable counting alleles within a sub-population without subsetting the array.
- Added methods allel.model.GenotypeArray.count_alleles_subpops() and allel.model.HaplotypeArray.count_alleles_subpops() to enable counting alleles in multiple sub-populations in a single pass over the array, without subsetting.
- Added classes allel.model.FeatureTable and allel.bcolz.FeatureCTable for storing and querying data on genomic features (genes, etc.), with functions for parsing from a GFF3 file.
- Added convenience function allel.stats.distance.pairwise_dxy() for computing a distance matrix using Dxy as the metric.
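The subpop idea can be illustrated by counting alleles over a chosen set of sample indices instead of over a subsetted copy of the data, and by counting several sub-populations within the same loop over variants. This pure-Python sketch assumes genotypes nested as variants, then samples, then a tuple of allele indices, with negative values marking missing calls; all function names are hypothetical stand-ins for the allel methods.

```python
def _count_calls(calls, samples, max_allele):
    """Tally allele indices 0..max_allele over the given sample indices."""
    ac = [0] * (max_allele + 1)
    for s in samples:
        for allele in calls[s]:
            if allele >= 0:  # negative values mark missing calls
                ac[allele] += 1
    return ac


def count_alleles_sketch(genotypes, max_allele, subpop=None):
    """Allele counts per variant, optionally restricted to the samples in `subpop`."""
    counts = []
    for calls in genotypes:
        samples = range(len(calls)) if subpop is None else subpop
        counts.append(_count_calls(calls, samples, max_allele))
    return counts


def count_alleles_subpops_sketch(genotypes, max_allele, subpops):
    """Allele counts for several named sub-populations in a single pass over variants."""
    result = {name: [] for name in subpops}
    for calls in genotypes:
        for name, samples in subpops.items():
            result[name].append(_count_calls(calls, samples, max_allele))
    return result


# Two variants, three diploid samples; (-1, -1) is a missing call.
genotypes = [
    [(0, 0), (0, 1), (1, 1)],
    [(0, 1), (-1, -1), (0, 0)],
]
print(count_alleles_sketch(genotypes, max_allele=1, subpop=[0, 1]))
print(count_alleles_subpops_sketch(genotypes, 1, {"a": [0], "b": [1, 2]}))
```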
v0.7
- Added function allel.io.write_fasta() for writing a nucleotide sequence stored as a NumPy array out to a FASTA format file.
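A FASTA record is a header line starting with '>' followed by the sequence, usually wrapped at a fixed line width. The sketch below writes one record to a text file handle; `write_fasta_sketch` and its default width of 60 are assumptions for illustration, not the signature of allel.io.write_fasta().

```python
import io


def write_fasta_sketch(handle, seq_id, sequence, width=60):
    """Write one FASTA record to a text file handle.

    `sequence` is a string of nucleotide characters; the line width
    is an arbitrary choice for this sketch.
    """
    handle.write(">" + seq_id + "\n")
    # Emit the sequence in chunks of at most `width` characters.
    for start in range(0, len(sequence), width):
        handle.write(sequence[start:start + width] + "\n")


buf = io.StringIO()
write_fasta_sketch(buf, "chr1", "ACGTACGTAC", width=4)
print(buf.getvalue())
```

A NumPy array of single-byte characters (dtype 'S1') could be converted to a string first with seq.tobytes().decode('ascii').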
v0.6
- Added method allel.model.VariantTable.to_vcf() for writing a variant table to a VCF format file.