Release notes


  • Fixed a bug in the count_alleles() methods on genotype and haplotype array classes that manifested if the max_allele argument was provided (#59).
  • Fixed a bug in the Jupyter notebook display method for chunked tables (#57).
  • Fixed a bug in site frequency spectrum scaling functions (#54).
  • Changed behaviour of subset method on genotype and haplotype arrays to better infer argument types and handle None argument values (#55).
  • Changed table eval and query methods to make python the default for expression evaluation, because it is more expressive than numexpr (#58).




  • Added new allel.model.dask module, providing implementations of the genotype, haplotype and allele counts classes backed by dask.array (#32).
  • Released the GIL where possible in Cython optimised functions (#43).
  • Changed functions in allel.stats.selection that accept min_ehh argument, such that min_ehh = None should now be used to indicate that no minimum EHH threshold should be applied.


The major change in v0.19.0 is the addition of the new allel.model.chunked module, which provides classes for variant call data backed by chunked array storage (#31). This is a generalisation of the previously available allel.model.bcolz module, and enables the use of both bcolz and HDF5 (via h5py) as backing storage. The allel.model.bcolz module is now deprecated but will be retained for backwards compatibility until the next major release.

Other changes:

Contributors: alimanfoo, hardingnj


  • Minor change to the Garud H statistics to avoid raising an exception when the number of distinct haplotypes is very low (#20).



  • Added new module for computing and plotting site frequency spectra, see allel.stats.sf (#12).
  • All plotting functions have been moved into the appropriate stats module that they naturally correspond to. The allel.plot module is deprecated (#13).
  • Improved performance of carray and ctable loading from HDF5 with a condition (#11).
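
For readers unfamiliar with site frequency spectra, the following is a minimal plain-Python sketch of what such a spectrum contains. The function names sfs_sketch and scale_sfs_sketch are hypothetical and are not the allel.stats.sf API. The scaling shown (multiplying the k-th entry by k) is one common convention and is assumed here.

```python
def sfs_sketch(derived_counts, n_chromosomes):
    """Unfolded spectrum: entry k is the number of variants with k derived alleles."""
    spectrum = [0] * (n_chromosomes + 1)
    for k in derived_counts:
        spectrum[k] += 1
    return spectrum


def scale_sfs_sketch(spectrum):
    """Scale entry k by k (an assumed convention for this sketch)."""
    return [k * s for k, s in enumerate(spectrum)]


# Derived allele counts at six variants, sampled from 4 chromosomes.
counts = [1, 1, 2, 1, 3, 2]
spectrum = sfs_sketch(counts, n_chromosomes=4)  # [0, 3, 2, 1, 0]
scaled = scale_sfs_sketch(spectrum)             # [0, 3, 4, 3, 0]
```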


  • Fixed behaviour of take() method on compressed arrays when indices are not in increasing order (#6).
  • Minor change to scaler argument to PCA functions in allel.stats.decomposition to avoid confusion about when to fall back to default scaler (#7).
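
To illustrate why index order matters when taking items from chunked or compressed storage, here is a small sketch of one possible approach: visit the indices in sorted order so that each chunk is read once, then place each value at its originally requested position. This is not necessarily how the library implements take(). The function take_chunked_sketch and the list-of-lists stand-in for compressed chunks are hypothetical.

```python
def take_chunked_sketch(chunks, indices):
    """Take items from equal-length chunks (the last may be shorter),
    returning them in the order the indices were given."""
    chunk_len = len(chunks[0])
    # Positions of the requested indices, ordered by index value.
    order = sorted(range(len(indices)), key=lambda i: indices[i])
    out = [None] * len(indices)
    current_id, data = None, None
    for i in order:
        chunk_id, offset = divmod(indices[i], chunk_len)
        if chunk_id != current_id:
            # Stand-in for decompressing a chunk; done once per chunk visited.
            data = list(chunks[chunk_id])
            current_id = chunk_id
        out[i] = data[offset]
    return out


chunks = [[10, 11, 12], [13, 14, 15], [16]]
taken = take_chunked_sketch(chunks, [5, 0, 6, 1])  # [15, 10, 16, 11]
```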



  • Added new selection module with functions for haplotype-based analyses of recent selection, see allel.stats.selection.



  • Fix missing package in




  • Added functions for principal components analysis of genotype data. Functions in the new module allel.stats.decomposition include allel.stats.decomposition.pca() to perform a PCA via full singular value decomposition, and allel.stats.decomposition.randomized_pca() which uses an approximate truncated singular value decomposition to speed up computation. In tests with real data the randomized PCA is around 5 times faster and uses half as much memory as the conventional PCA, producing highly similar results.
  • Added function allel.stats.distance.pcoa() for principal coordinate analysis (a.k.a. classical multi-dimensional scaling) of a distance matrix.
  • Added new utility module allel.stats.preprocessing with classes for scaling genotype data prior to use as input for PCA or PCoA. By default the scaling (i.e., normalization) of Patterson (2006) is used with principal components analysis functions in the allel.stats.decomposition module. Scaling functions can improve the ability to resolve population structure via PCA or PCoA.
  • Added method allel.model.GenotypeArray.to_n_ref(). Also added dtype argument to allel.model.GenotypeArray.to_n_ref() and allel.model.GenotypeArray.to_n_alt() methods to enable direct output as float arrays, which can be convenient if these arrays are then going to be scaled for use in PCA or PCoA.
  • Added allel.model.GenotypeArray.mask property which can be set with a Boolean mask to filter genotype calls from genotype and allele counting operations. A similar property is available on the allel.bcolz.GenotypeCArray class. Also added method allel.model.GenotypeArray.fill_masked() and similar method on the allel.bcolz.GenotypeCArray class to fill masked genotype calls with a value (e.g., -1).
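
As a rough illustration of the scaling step described above, the sketch below centres and scales alternate allele counts for a single variant in plain Python. It assumes the Patterson-style scaling divides by sqrt(p * (1 - p)), where p is the mean allele count divided by the ploidy. The function name patterson_scale_sketch is hypothetical and is not part of allel.stats.preprocessing.

```python
import math


def patterson_scale_sketch(n_alt, ploidy=2):
    """Centre and scale per-sample alt allele counts for one variant.

    Assumes the variant is polymorphic (0 < p < 1); otherwise the
    divisor would be zero.
    """
    mean = sum(n_alt) / len(n_alt)
    p = mean / ploidy  # estimated alt allele frequency
    std = math.sqrt(p * (1 - p))
    return [(g - mean) / std for g in n_alt]


# Alt allele counts (0, 1 or 2) for four diploid samples at one variant.
scaled = patterson_scale_sketch([0, 1, 2, 1])  # [-2.0, 0.0, 2.0, 0.0]
```

Counts of this kind are what a method such as to_n_alt() produces, which is why float output is convenient before scaling.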



  • Added functions implementing the Weir and Cockerham (1984) estimators for F-statistics: allel.stats.fst.weir_cockerham_fst() and allel.stats.fst.windowed_weir_cockerham_fst().
  • Added functions implementing the Hudson (1992) estimator for Fst: allel.stats.fst.hudson_fst() and allel.stats.fst.windowed_hudson_fst().
  • Added new module allel.stats.ld with functions for calculating linkage disequilibrium estimators, including allel.stats.ld.rogers_huff_r() for pairwise variant LD calculation, allel.stats.ld.windowed_r_squared() for windowed LD calculations, and allel.stats.ld.locate_unlinked() for locating variants in approximate linkage equilibrium.
  • Added function allel.plot.pairwise_ld() for visualising a matrix of linkage disequilibrium values between pairs of variants.
  • Added function allel.model.create_allele_mapping() for creating a mapping of alleles into a different index system, i.e., if you want 0 and 1 to represent something other than REF and ALT, e.g., ancestral and derived. Also added methods allel.model.GenotypeArray.map_alleles(), allel.model.HaplotypeArray.map_alleles() and allel.model.AlleleCountsArray.map_alleles() which will perform an allele transformation given an allele mapping.
  • Added function allel.plot.variant_locator() ported across from anhima.
  • Refactored the allel.stats module into a package with sub-modules for easier maintenance.
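
To make the Hudson-style Fst estimator mentioned above more concrete, here is a minimal plain-Python sketch working from per-variant allele counts in two populations. It assumes the formulation Fst = (between - within) / between, combined across variants as a ratio of sums. All function names are hypothetical; this is not the allel.stats.fst implementation.

```python
def mpd_within(ac):
    """Mean pairwise difference within one population at one variant."""
    n = sum(ac)
    n_pairs = n * (n - 1) / 2
    n_same = sum(c * (c - 1) / 2 for c in ac)
    return (n_pairs - n_same) / n_pairs


def mpd_between(ac1, ac2):
    """Mean pairwise difference between two populations at one variant."""
    n_pairs = sum(ac1) * sum(ac2)
    n_same = sum(a * b for a, b in zip(ac1, ac2))
    return (n_pairs - n_same) / n_pairs


def hudson_fst_sketch(ac1_list, ac2_list):
    """Ratio of summed numerators to summed denominators across variants."""
    num = den = 0.0
    for ac1, ac2 in zip(ac1_list, ac2_list):
        within = (mpd_within(ac1) + mpd_within(ac2)) / 2
        between = mpd_between(ac1, ac2)
        num += between - within
        den += between
    return num / den


# One variant, fixed for different alleles in the two populations.
fst_fixed = hudson_fst_sketch([[4, 0]], [[0, 4]])  # 1.0
```

Note that the estimate can be negative when the two populations have identical allele counts, because the within-population term is computed without replacement.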


  • Added documentation for the functions allel.bcolz.carray_from_hdf5(), allel.bcolz.carray_to_hdf5(), allel.bcolz.ctable_from_hdf5_group(), allel.bcolz.ctable_to_hdf5_group().
  • Refactoring of internals within the allel.bcolz module.


  • Added subpop argument to allel.model.GenotypeArray.count_alleles() and allel.model.HaplotypeArray.count_alleles() to enable counting alleles within a sub-population without subsetting the array.
  • Added functions allel.model.GenotypeArray.count_alleles_subpops() and allel.model.HaplotypeArray.count_alleles_subpops() to enable counting alleles in multiple sub-populations in a single pass over the array, without sub-setting.
  • Added classes allel.model.FeatureTable and allel.bcolz.FeatureCTable for storing and querying data on genomic features (genes, etc.), with functions for parsing from a GFF3 file.
  • Added convenience function allel.stats.distance.pairwise_dxy() for computing a distance matrix using Dxy as the metric.
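
The idea behind counting alleles in several sub-populations in a single pass can be sketched in plain Python as follows. The nested-list genotype layout (variants, then samples, then alleles), the use of negative values for missing calls, and the function name count_alleles_subpops_sketch are all assumptions made for illustration, not the library's implementation.

```python
def count_alleles_subpops_sketch(genotypes, subpops, n_alleles=2):
    """Count alleles per sub-population while iterating over variants once.

    genotypes: list of variants, each a list of per-sample allele tuples.
    subpops: dict mapping a sub-population name to a list of sample indices.
    """
    counts = {name: [] for name in subpops}
    for variant in genotypes:  # single pass over the variants
        for name, samples in subpops.items():
            ac = [0] * n_alleles
            for s in samples:
                for allele in variant[s]:
                    if 0 <= allele < n_alleles:  # skip missing calls (-1)
                        ac[allele] += 1
            counts[name].append(ac)
    return counts


genotypes = [
    [(0, 0), (0, 1), (1, 1), (-1, -1)],
    [(0, 1), (1, 1), (0, 0), (0, 1)],
]
subpops = {"a": [0, 1], "b": [2, 3]}
counts = count_alleles_subpops_sketch(genotypes, subpops)
# counts["a"] is [[3, 1], [1, 3]] and counts["b"] is [[0, 2], [3, 1]]
```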


  • Added function for writing a nucleotide sequence stored as a NumPy array out to a FASTA format file.
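
For orientation, writing a sequence in FASTA format amounts to a header line starting with ">" followed by the sequence wrapped to a fixed line width. The sketch below uses a plain string rather than a NumPy array, and the function name write_fasta_sketch and its parameters are hypothetical, not the library's function.

```python
import io


def write_fasta_sketch(handle, name, sequence, width=80):
    """Write one sequence to an open text handle in FASTA format."""
    handle.write(">" + name + "\n")
    for start in range(0, len(sequence), width):
        handle.write(sequence[start:start + width] + "\n")


# Write to an in-memory buffer; a file opened in text mode works the same way.
buffer = io.StringIO()
write_fasta_sketch(buffer, "chr1", "ACGTACGTAC", width=4)
fasta_text = buffer.getvalue()  # ">chr1\nACGT\nACGT\nAC\n"
```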


  • Added method allel.model.VariantTable.to_vcf() for writing a variant table to a VCF format file.