scikit-allel - Explore and analyse genetic variation

This package provides utilities for exploratory analysis of large-scale genetic variation data. It is based on numpy, scipy and other established Python scientific libraries.

Please feel free to ask questions via cggh/pygenomics on Gitter. Release announcements are posted to the cggh/pygenomics Gitter channel and the biovalidation mailing list. If you find a bug or would like to suggest a feature, please raise an issue on GitHub.

This site provides reference documentation for scikit-allel. For worked examples with real data, see the following articles:

If you would like to cite scikit-allel please use the DOI below.

Why “scikit-allel”?

“SciKits” (short for SciPy Toolkits) are add-on packages for SciPy, hosted and developed separately and independently from the main SciPy distribution.

“Allel” (Greek ἀλλήλ) is the root of the word “allele”, short for “allelomorph”, a word coined by William Bateson to mean variant forms of a gene. Today we use “allele” to mean any of the variant forms found at a site of genetic variation, such as the different nucleotides observed at a single nucleotide polymorphism (SNP).


Install pre-built binaries via conda:

$ conda install -c conda-forge scikit-allel

Install and compile source code via pip:

$ pip install scikit-allel

N.B., this package requires numpy, scipy, matplotlib, seaborn, pandas, scikit-learn, h5py, numexpr, bcolz, zarr and dask. If installing via conda, these should be installed automatically. If installing via pip, please install these dependencies first, then use pip to install scikit-allel.
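
Before installing via pip, it can help to check which of the dependencies listed above are already importable in the current environment. The following is a minimal sketch using only the Python standard library. The helper name `missing_modules` and the constant `REQUIRED_MODULES` are illustrative and not part of scikit-allel. The import names are assumed to match the package names, except scikit-learn, which is imported as `sklearn`.

```python
from importlib.util import find_spec

# Import names for the dependencies listed above. These are assumed to match
# the package names, except scikit-learn, which is imported as "sklearn".
REQUIRED_MODULES = [
    "numpy", "scipy", "matplotlib", "seaborn", "pandas", "sklearn",
    "h5py", "numexpr", "bcolz", "zarr", "dask",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

This only checks that each module can be located. It does not import the modules or verify their versions.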


Development of this package is supported by the MRC Centre for Genomics and Global Health.
