Chunked storage utilities

This module provides an abstraction layer over generic chunked array storage libraries. Currently HDF5 (via h5py) and zarr are supported.

Different storage configurations can be used with the functions and classes defined below. Wherever a function or method takes a storage keyword argument, its value determines the storage used for the output.

If storage is a string, it is used to look up one of several predefined storage configurations in the storage registry, a dictionary located at allel.chunked.storage_registry. The default storage can be changed globally by setting the value of the 'default' key in the storage registry.

Alternatively, storage may be an instance of one of the storage classes defined below, e.g., allel.chunked.storage_zarr.ZarrMemStorage or allel.chunked.storage_hdf5.HDF5TmpStorage, which allows custom configuration of storage parameters such as compression type and level.

For example:

>>> from allel import chunked
>>> import numpy as np
>>> a = np.arange(10000000)
>>> chunked.copy(a)
<zarr.core.Array (10000000,) int64>
>>> chunked.copy(a, storage='zarrmem')
<zarr.core.Array (10000000,) int64>
>>> chunked.copy(a, storage='zarrtmp')
<zarr.core.Array (10000000,) int64>
>>> chunked.copy(a, storage='hdf5mem_zlib1')
<HDF5 dataset "data": shape (10000000,), type "<i8">
>>> chunked.copy(a, storage='hdf5tmp_zlib1')
<HDF5 dataset "data": shape (10000000,), type "<i8">
>>> import h5py
>>> h5f = h5py.File('example.h5', mode='w')
>>> h5g = h5f.create_group('test')
>>> chunked.copy(a, storage='hdf5', group=h5g, name='data')
<HDF5 dataset "data": shape (10000000,), type "<i8">
>>> h5f['test/data']
<HDF5 dataset "data": shape (10000000,), type "<i8">
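
The registry lookup described above can be pictured with a short, library-independent sketch. It is not the library's implementation: DictStorage, registry and resolve_storage are hypothetical names invented for illustration. Only the idea of a dictionary keyed by name, with a 'default' entry, comes from the text.

```python
# Illustrative sketch only: a name-keyed registry with a 'default' entry.
# DictStorage, registry and resolve_storage are hypothetical, not allel names.

class DictStorage:
    """Toy storage configuration that just records its parameters."""

    def __init__(self, **kwargs):
        self.params = kwargs


registry = {
    "mem_zlib1": DictStorage(compression="zlib", level=1),
    "mem_lzf": DictStorage(compression="lzf"),
}
registry["default"] = registry["mem_zlib1"]


def resolve_storage(storage=None):
    """Return a storage object for None, a registry key, or an instance."""
    if storage is None:
        return registry["default"]
    if isinstance(storage, str):
        return registry[storage]  # raises KeyError for unknown names
    return storage  # assumed to be a storage instance already


# Changing the 'default' entry changes what storage=None resolves to.
registry["default"] = registry["mem_lzf"]
default_params = resolve_storage().params

# Passing an instance bypasses the registry and allows custom parameters.
custom = resolve_storage(DictStorage(compression="zlib", level=9))
```

Resolving the name in one place means every function that accepts a storage argument can treat None, a string and an instance uniformly.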



HDF5 (h5py)

class allel.chunked.storage_hdf5.HDF5Storage(**kwargs)

Storage layer using HDF5 dataset and group.

class allel.chunked.storage_hdf5.HDF5MemStorage(**kwargs)
class allel.chunked.storage_hdf5.HDF5TmpStorage(**kwargs)
allel.chunked.storage_hdf5.hdf5_storage = 'hdf5'

HDF5 storage with default parameters

allel.chunked.storage_hdf5.hdf5mem_storage = 'hdf5mem'

HDF5 in-memory storage with default compression

allel.chunked.storage_hdf5.hdf5tmp_storage = 'hdf5tmp'

HDF5 temporary file storage with default compression

allel.chunked.storage_hdf5.hdf5_zlib1_storage = 'hdf5_zlib1'

HDF5 storage with zlib level 1 compression

allel.chunked.storage_hdf5.hdf5mem_zlib1_storage = 'hdf5mem_zlib1'

HDF5 in-memory storage with zlib level 1 compression

allel.chunked.storage_hdf5.hdf5tmp_zlib1_storage = 'hdf5tmp_zlib1'

HDF5 temporary file storage with zlib level 1 compression

allel.chunked.storage_hdf5.hdf5_lzf_storage = 'hdf5_lzf'

HDF5 storage with LZF compression

allel.chunked.storage_hdf5.hdf5mem_lzf_storage = 'hdf5mem_lzf'

HDF5 in-memory storage with LZF compression

allel.chunked.storage_hdf5.hdf5tmp_lzf_storage = 'hdf5tmp_lzf'

HDF5 temporary file storage with LZF compression


Create an in-memory HDF5 file.


Create an HDF5 file backed by a temporary file.

Functions

..., arr, start=0, stop=None, offset=0, blen=None)

Copy data block-wise into arr.
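
As a rough illustration of what a block-wise copy into an existing array involves, here is a minimal sketch using plain Python lists. The name store_blockwise, the fallback block length of 4 and the bounds handling are assumptions made for the example; the parameter names follow the visible part of the signature above, with data taken from the description.

```python
# Illustrative sketch only; store_blockwise is a hypothetical name.

def store_blockwise(data, arr, start=0, stop=None, offset=0, blen=None):
    """Copy data[start:stop] into arr, beginning at arr[offset], one block at a time."""
    stop = len(data) if stop is None else min(stop, len(data))
    if stop < start:
        raise ValueError("stop must not be less than start")
    if blen is None:
        blen = 4  # arbitrary small block length, chosen for the demo
    for i in range(start, stop, blen):
        j = min(i + blen, stop)
        # Only one block of data is read and written per iteration.
        arr[offset:offset + (j - i)] = data[i:j]
        offset += j - i
    return arr


source = list(range(10))
target = [0] * 12
store_blockwise(source, target, start=2, stop=9, offset=3, blen=3)
```

Because the function relies only on len() and slicing, the same loop would work on any array-like object that supports slice reads and slice assignment.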

allel.chunked.core.copy(data, start=0, stop=None, blen=None, storage=None, create='array', **kwargs)

Copy data block-wise into a new array.

allel.chunked.core.map_blocks(data, f, blen=None, storage=None, create='array', **kwargs)

Apply function f block-wise over data.
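
The block-wise pattern can be sketched without any storage backend. The function below is a hypothetical stand-in (map_blocks_sketch is not a library name) that collects results in a Python list, whereas the library function writes its output to the configured storage.

```python
# Illustrative sketch only; map_blocks_sketch is a hypothetical name.

def map_blocks_sketch(data, f, blen=4):
    """Apply f to consecutive blocks of data and collect the results in a list."""
    out = []
    for i in range(0, len(data), blen):
        block = data[i:i + blen]
        # f receives a whole block and is expected to return a sequence.
        out.extend(f(block))
    return out


doubled = map_blocks_sketch(
    list(range(10)),
    lambda block: [2 * x for x in block],
    blen=3,
)
```

Note that f operates on a block rather than on single elements, so vectorised functions can be passed through unchanged.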

allel.chunked.core.reduce_axis(data, reducer, block_reducer, mapper=None, axis=None, blen=None, storage=None, create='array', **kwargs)

Apply an operation to data that reduces over one or more axes.
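
One way to read the reducer, block_reducer and mapper parameters is as a two-level reduction: each block is optionally mapped, then reduced, and the partial results are combined. The sketch below shows that idea for one-dimensional data only. The name reduce_blocks is hypothetical, and applying mapper to each block before reducing is an assumption about the role of these parameters, not a statement of the library's behaviour.

```python
# Illustrative sketch only; reduce_blocks is a hypothetical name and
# handles one-dimensional, non-empty data.
import operator
from functools import reduce


def reduce_blocks(data, reducer, block_reducer, mapper=None, blen=4):
    """Reduce each block with reducer, then fold the partials with block_reducer."""
    partials = []
    for i in range(0, len(data), blen):
        block = data[i:i + blen]
        if mapper is not None:
            block = mapper(block)  # transform the block before reducing it
        partials.append(reducer(block))
    return reduce(block_reducer, partials)


values = [3, -7, 12, 5, 0, 9, -2, 8, 1]
total = reduce_blocks(values, sum, operator.add)
largest = reduce_blocks(values, max, max)
nonzero = reduce_blocks(
    values, sum, operator.add,
    mapper=lambda block: [x != 0 for x in block],
)
```

Under this reading, maximum, minimum, sum and non-zero count differ only in the pair of reducers supplied.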

allel.chunked.core.amax(data, axis=None, mapper=None, blen=None, storage=None, create='array', **kwargs)

Compute the maximum value.

allel.chunked.core.amin(data, axis=None, mapper=None, blen=None, storage=None, create='array', **kwargs)

Compute the minimum value.

allel.chunked.core.asum(data, axis=None, mapper=None, blen=None, storage=None, create='array', **kwargs)

Compute the sum.

allel.chunked.core.count_nonzero(data, mapper=None, blen=None, storage=None, create='array', **kwargs)

Count the number of non-zero elements.

allel.chunked.core.compress(condition, data, axis=0, out=None, blen=None, storage=None, create='array', **kwargs)

Return selected slices of an array along given axis.
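
For the simplest case (selection along the first axis of one-dimensional data), block-wise selection with a boolean condition might look like the sketch below. The name compress_blocks is hypothetical and the length check is an assumption for the example; the sketch does not cover the axis, out or storage parameters.

```python
# Illustrative sketch only; compress_blocks is a hypothetical name.
from itertools import compress as select


def compress_blocks(condition, data, blen=4):
    """Keep the items of data whose matching condition entry is true, block by block."""
    if len(condition) != len(data):
        raise ValueError("condition and data must have the same length")
    out = []
    for i in range(0, len(data), blen):
        # Slice condition and data with the same bounds so they stay aligned.
        out.extend(select(data[i:i + blen], condition[i:i + blen]))
    return out


rows = ["a", "b", "c", "d", "e", "f"]
keep = [True, False, True, True, False, True]
selected = compress_blocks(keep, rows, blen=4)
```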

allel.chunked.core.take(data, indices, axis=0, out=None, mode='raise', blen=None, storage=None, create='array', **kwargs)

Take elements from an array along an axis.

allel.chunked.core.subset(data, sel0=None, sel1=None, blen=None, storage=None, create='array', **kwargs)

Return selected rows and columns of an array.

allel.chunked.core.concatenate(tup, axis=0, blen=None, storage=None, create='array', **kwargs)

Concatenate arrays.

allel.chunked.core.binary_op(data, op, other, blen=None, storage=None, create='array', **kwargs)

Compute a binary operation block-wise over data.

allel.chunked.core.copy_table(tbl, start=0, stop=None, blen=None, storage=None, create='table', **kwargs)

Copy tbl block-wise into a new table.

allel.chunked.core.compress_table(condition, tbl, axis=None, out=None, blen=None, storage=None, create='table', **kwargs)

Return selected rows of a table.

allel.chunked.core.take_table(tbl, indices, axis=None, out=None, mode='raise', blen=None, storage=None, create='table', **kwargs)

Return selected rows of a table.

allel.chunked.core.concatenate_table(tup, blen=None, storage=None, create='table', **kwargs)

Stack tables in sequence vertically (row-wise).

allel.chunked.core.eval_table(tbl, expression, vm='python', blen=None, storage=None, create='array', vm_kwargs=None, **kwargs)

Evaluate expression against columns of a table.


class allel.chunked.core.ChunkedArrayWrapper(data)

Wrapper class for chunked array-like data.

data : array_like

Data to be wrapped. May be a Zarr array, h5py dataset, or anything providing a similar interface.

class allel.chunked.core.ChunkedTableWrapper(data, names=None)

Wrapper class for chunked table-like data.

data : table_like

Data to be wrapped. May be a tuple or list of columns (array-like), a dict mapping names to columns, h5py group, numpy recarray, or anything providing a similar interface.

names : sequence of strings

Column names.
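
To make the accepted table-like inputs more concrete, the sketch below normalises two of them (a dict mapping names to columns, and a tuple or list of columns) into a pair of names and columns. It is a hypothetical helper, not the wrapper's implementation: the name normalize_table, the placeholder names 'f0', 'f1', ... for unnamed columns, the use of dict insertion order and the length checks are all assumptions. h5py groups and numpy recarrays are not handled.

```python
# Illustrative sketch only; normalize_table is a hypothetical helper.

def normalize_table(data, names=None):
    """Return (names, columns) for a dict of columns or a tuple/list of columns."""
    if isinstance(data, dict):
        if names is None:
            names = list(data)  # dict insertion order, an arbitrary choice
        columns = [data[name] for name in names]
    elif isinstance(data, (tuple, list)):
        columns = list(data)
        if names is None:
            # Placeholder names for unnamed columns, assumed for the demo.
            names = ["f%d" % i for i in range(len(columns))]
    else:
        raise TypeError("unsupported table-like input")
    if len(names) != len(columns):
        raise ValueError("expected one name per column")
    if len({len(column) for column in columns}) > 1:
        raise ValueError("all columns must have the same length")
    return list(names), columns


named, named_cols = normalize_table({"pos": [1, 2, 3], "qual": [30, 40, 50]})
unnamed, _ = normalize_table(([1, 2], [3, 4]))
chosen, chosen_cols = normalize_table({"a": [1], "b": [2]}, names=["b"])
```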