scikit-allel - Explore and analyse genetic variation

This package provides utilities for exploratory analysis of large-scale genetic variation data. It is based on numpy, scipy and other general-purpose Python scientific libraries.

Please feel free to ask questions via cggh/pygenomics on Gitter. Release announcements are posted to the cggh/pygenomics Gitter channel and the biovalidation mailing list. If you find a bug or would like to suggest a feature, please raise an issue on GitHub.

This site provides reference documentation for scikit-allel. For worked examples with real data, see the following articles:

If you would like to cite scikit-allel please use the DOI below.

Why “scikit-allel”?

“SciKits” (short for SciPy Toolkits) are add-on packages for SciPy, hosted and developed separately and independently from the main SciPy distribution.

“Allel” (Greek ἀλλήλ) is the root of the word “allele”, short for “allelomorph”, a word coined by William Bateson to mean variant forms of a gene. Today we use “allele” to mean any of the variant forms found at a site of genetic variation, such as the different nucleotides observed at a single nucleotide polymorphism (SNP).
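
To make the definition concrete, the plain-Python sketch below (standard library only, not scikit-allel's own API) tallies the alleles observed at one SNP from diploid genotype calls. The genotype calls and the `tally_alleles` helper are hypothetical, chosen for illustration; a site where more than one distinct allele is observed is polymorphic.

```python
from collections import Counter

# Hypothetical diploid genotype calls at a single SNP: one (allele, allele)
# pair per individual. The nucleotides are made up for illustration.
calls = [("A", "A"), ("A", "G"), ("A", "A")]


def tally_alleles(genotypes):
    """Hypothetical helper: count each allele observed across all individuals."""
    counts = Counter()
    for genotype in genotypes:
        # Counter.update adds one for every element of the (allele, allele) pair.
        counts.update(genotype)
    return counts


counts = tally_alleles(calls)
print(dict(counts))                       # {'A': 5, 'G': 1}
print("polymorphic:", len(counts) > 1)    # polymorphic: True
```

Here “A” and “G” are the two alleles found at the site. scikit-allel's genotype arrays encode alleles as small integers instead (0 for the reference allele, 1 for the first alternate allele, and so on), but the idea of counting variant forms per site is the same.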

Installation

Pre-built binaries are available for Windows, Mac and Linux, and can be installed via conda:

$ conda install -c conda-forge scikit-allel

Alternatively, if you have a C compiler on your system, scikit-allel can be installed via pip:

$ pip install scikit-allel

N.B., scikit-allel requires numpy, scipy, matplotlib, seaborn, pandas, scikit-learn, h5py, numexpr, bcolz, zarr and dask. If installing via conda, these should be installed automatically. If installing via pip, please install these dependencies first, then use pip to install scikit-allel.
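
Before running pip, a quick standard-library check like the one below can show which of the dependencies listed above are not yet installed. It is a sketch, assuming each dependency is importable under the top-level module name given in `DEPENDENCIES`; the mapping and the `missing_dependencies` helper are illustrative and not part of scikit-allel. Note that scikit-learn is imported as `sklearn`.

```python
from importlib.util import find_spec

# Distribution name (as listed above) -> top-level import name.
# Only scikit-learn differs: it is imported as "sklearn".
DEPENDENCIES = {
    "numpy": "numpy",
    "scipy": "scipy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "h5py": "h5py",
    "numexpr": "numexpr",
    "bcolz": "bcolz",
    "zarr": "zarr",
    "dask": "dask",
}


def missing_dependencies(modules=DEPENDENCIES):
    """Return the distribution names whose top-level module cannot be found."""
    # find_spec locates a module without importing it; None means "not installed".
    return [dist for dist, mod in modules.items() if find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Install these before running pip install scikit-allel:")
        print("  pip install " + " ".join(missing))
    else:
        print("All listed dependencies were found.")
```

Because `find_spec` only locates modules without importing them, the check is fast, but it will not catch a package that is installed yet fails on import.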

If you have never installed Python before, you might find the following article useful: Installing Python for data analysis

Contributing

This is academic software, written in the cracks of free time between other commitments, by people who are often learning as we code. We greatly appreciate bug reports, pull requests, and any other feedback or advice. If you do find a bug, we’ll do our best to fix it, but apologies in advance if we are not able to respond quickly. If you are doing any serious work with this package, please do not expect everything to work perfectly first time or be 100% correct. Treat everything with a healthy dose of suspicion, and don’t be afraid to dive into the source code if you have to. Pull requests are always welcome.

Acknowledgments

Development of this package is supported by the MRC Centre for Genomics and Global Health.