scikit-allel - Explore and analyse genetic variation

This package provides utilities for exploratory analysis of large-scale genetic variation data. It is based on numpy, scipy and other established scientific Python libraries.

Please feel free to ask questions on the cggh/pygenomics Gitter channel. Release announcements are posted to the cggh/pygenomics Gitter channel and the biovalidation mailing list. If you find a bug or would like to suggest a feature, please raise an issue on GitHub.

This site provides reference documentation for scikit-allel. For worked examples with real data, see the following articles:

If you would like to cite scikit-allel, please use the DOI below.

Why “scikit-allel”?

“SciKits” (short for SciPy Toolkits) are add-on packages for SciPy, hosted and developed separately and independently from the main SciPy distribution.

“Allel” (Greek ἀλλήλ) is the root of the word “allele”, short for “allelomorph”, a word coined by William Bateson to mean variant forms of a gene. Today we use “allele” to mean any of the variant forms found at a site of genetic variation, such as the different nucleotides observed at a single nucleotide polymorphism (SNP).
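To make the notion of an allele concrete, here is a minimal numpy-only sketch (not scikit-allel's own implementation) that counts the alleles observed at each SNP. It assumes genotype calls are stored as a 3-D integer array of shape (variants, samples, ploidy), with 0 for the reference allele, 1, 2, ... for alternate alleles and -1 for missing calls; the toy data and the `count_alleles` helper are hypothetical.

```python
import numpy as np

# Hypothetical toy data: 3 SNPs x 2 diploid samples.
# Each genotype call is a pair of allele indices: 0 = reference allele,
# 1, 2, ... = alternate alleles, -1 = missing call (an assumed encoding).
genotypes = np.array([
    [[0, 0], [0, 1]],    # SNP 1: sample A homozygous reference, sample B heterozygous
    [[0, 1], [1, 1]],    # SNP 2: sample A heterozygous, sample B homozygous alternate
    [[0, 2], [-1, -1]],  # SNP 3: a second alternate allele; sample B is missing
], dtype=np.int8)


def count_alleles(g, max_allele=None):
    """Count how many times each allele is observed at each variant site.

    g has shape (n_variants, n_samples, ploidy); negative values are
    treated as missing calls and are not counted.
    """
    n_variants = g.shape[0]
    # Pool all allele calls (across samples and ploidy) for each variant.
    flat = g.reshape(n_variants, -1)
    if max_allele is None:
        max_allele = int(flat.max())
    counts = np.zeros((n_variants, max_allele + 1), dtype=np.int32)
    for allele in range(max_allele + 1):
        # Number of calls equal to this allele index at each variant.
        counts[:, allele] = (flat == allele).sum(axis=1)
    return counts


ac = count_alleles(genotypes)
print(ac)
```

Each row of the result describes one SNP and each column one allele: the first two sites carry the reference and one alternate allele, while the third carries the reference and a second alternate allele. In scikit-allel itself, comparable functionality is provided by the `count_alleles()` method of `allel.GenotypeArray`; see the reference documentation for its exact behaviour.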


This package requires numpy, scipy, matplotlib, seaborn, pandas, scikit-learn, h5py, numexpr, bcolz, zarr, dask and petl. Please install these dependencies first, then use pip to install scikit-allel:

$ pip install -U scikit-allel


Development of this package is supported by the MRC Centre for Genomics and Global Health.
