Chunked storage utilities

This module provides an abstraction layer over generic chunked array storage libraries. Currently HDF5 (via h5py), bcolz and zarr are supported storage layers.

Different storage configurations can be used with the functions and classes defined below. Wherever a function or method takes a storage keyword argument, the value of the argument will determine the storage used for the output.

If storage is a string, it will be used to look up one of several predefined storage configurations via the storage registry, which is a dictionary located at allel.chunked.storage_registry. The default storage can be changed globally by setting the value of the ‘default’ key in the storage registry.

Alternatively, storage may be an instance of one of the storage classes defined below, e.g., allel.chunked.storage_bcolz.BcolzMemStorage or allel.chunked.storage_hdf5.HDF5TmpStorage, which allows custom configuration of storage parameters such as compression type and level.

For example:

>>> from allel import chunked
>>> import numpy as np
>>> a = np.arange(10000000)
>>> chunked.copy(a)
Array((10000000,), int64, chunks=(39063,), order=C)
  ...
>>> chunked.copy(a, storage='bcolzmem')
carray((10000000,), int64)
  ...
>>> chunked.copy(a, storage='bcolztmp')
carray((10000000,), int64)
  ...
>>> chunked.copy(a, storage='zarrmem')
Array((10000000,), int64, chunks=(39063,), order=C)
  ...
>>> chunked.copy(a, storage='zarrtmp')
Array((10000000,), int64, chunks=(39063,), order=C)
  ...
>>> chunked.copy(a, storage=chunked.BcolzStorage(cparams=bcolz.cparams(cname='lz4')))
carray((10000000,), int64)
  ...
>>> chunked.copy(a, storage='hdf5mem_zlib1')
<HDF5 dataset "data": shape (10000000,), type "<i8">
>>> chunked.copy(a, storage='hdf5tmp_zlib1')
<HDF5 dataset "data": shape (10000000,), type "<i8">
>>> import h5py
>>> h5f = h5py.File('example.h5', mode='w')
>>> h5g = h5f.create_group('test')
>>> chunked.copy(a, storage='hdf5', group=h5g, name='data')
<HDF5 dataset "data": shape (10000000,), type "<i8">
>>> h5f['test/data']
<HDF5 dataset "data": shape (10000000,), type "<i8">

Storage

bcolz

class allel.chunked.storage_bcolz.BcolzStorage(**kwargs)[source]

Storage layer using bcolz carray and ctable.

class allel.chunked.storage_bcolz.BcolzMemStorage(**kwargs)[source]
class allel.chunked.storage_bcolz.BcolzTmpStorage(**kwargs)[source]
allel.chunked.storage_bcolz.bcolz_storage = ‘bcolz’

bcolz storage with default parameters

allel.chunked.storage_bcolz.bcolzmem_storage = ‘bcolzmem’

bcolz in-memory storage with default compression

allel.chunked.storage_bcolz.bcolztmp_storage = ‘bcolztmp’

bcolz temporary file storage with default compression

allel.chunked.storage_bcolz.bcolz_zlib1_storage = ‘bcolz_zlib1’

bcolz storage with zlib level 1 compression

allel.chunked.storage_bcolz.bcolzmem_zlib1_storage = ‘bcolzmem_zlib1’

bcolz in-memory storage with zlib level 1 compression

allel.chunked.storage_bcolz.bcolztmp_zlib1_storage = ‘bcolztmp_zlib1’

bcolz temporary file storage with zlib level 1 compression

HDF5 (h5py)

class allel.chunked.storage_hdf5.HDF5Storage(**kwargs)[source]

Storage layer using HDF5 dataset and group.

class allel.chunked.storage_hdf5.HDF5MemStorage(**kwargs)[source]
class allel.chunked.storage_hdf5.HDF5TmpStorage(**kwargs)[source]
allel.chunked.storage_hdf5.hdf5_storage = ‘hdf5’

HDF5 storage with default parameters

allel.chunked.storage_hdf5.hdf5mem_storage = ‘hdf5mem’

HDF5 in-memory storage with default compression

allel.chunked.storage_hdf5.hdf5tmp_storage = ‘hdf5tmp’

HDF5 temporary file storage with default compression

allel.chunked.storage_hdf5.hdf5_zlib1_storage = ‘hdf5_zlib1’

HDF5 storage with zlib level 1 compression

allel.chunked.storage_hdf5.hdf5mem_zlib1_storage = ‘hdf5mem_zlib1’

HDF5 in-memory storage with zlib level 1 compression

allel.chunked.storage_hdf5.hdf5tmp_zlib1_storage = ‘hdf5tmp_zlib1’

HDF5 temporary file storage with zlib level 1 compression

allel.chunked.storage_hdf5.hdf5_lzf_storage = ‘hdf5_lzf’

HDF5 storage with LZF compression

allel.chunked.storage_hdf5.hdf5mem_lzf_storage = ‘hdf5mem_lzf’

HDF5 in-memory storage with LZF compression

allel.chunked.storage_hdf5.hdf5tmp_lzf_storage = ‘hdf5tmp_lzf’

HDF5 temporary file storage with LZF compression

allel.chunked.storage_hdf5.h5fmem(**kwargs)[source]

Create an in-memory HDF5 file.

allel.chunked.storage_hdf5.h5ftmp(**kwargs)[source]

Create an HDF5 file backed by a temporary file.

Functions

allel.chunked.core.store(data, arr, start=0, stop=None, offset=0, blen=None)[source]

Copy data block-wise into arr.

allel.chunked.core.copy(data, start=0, stop=None, blen=None, storage=None, create=’array’, **kwargs)[source]

Copy data block-wise into a new array.

allel.chunked.core.map_blocks(data, f, blen=None, storage=None, create=’array’, **kwargs)[source]

Apply function f block-wise over data.

allel.chunked.core.reduce_axis(data, reducer, block_reducer, mapper=None, axis=None, blen=None, storage=None, create=’array’, **kwargs)[source]

Apply an operation to data that reduces over one or more axes.

allel.chunked.core.amax(data, axis=None, mapper=None, blen=None, storage=None, create=’array’, **kwargs)[source]

Compute the maximum value.

allel.chunked.core.amin(data, axis=None, mapper=None, blen=None, storage=None, create=’array’, **kwargs)[source]

Compute the minimum value.

allel.chunked.core.asum(data, axis=None, mapper=None, blen=None, storage=None, create=’array’, **kwargs)[source]

Compute the sum.

allel.chunked.core.count_nonzero(data, mapper=None, blen=None, storage=None, create=’array’, **kwargs)[source]

Count the number of non-zero elements.

allel.chunked.core.compress(condition, data, axis=0, out=None, blen=None, storage=None, create=’array’, **kwargs)[source]

Return selected slices of an array along given axis.

allel.chunked.core.take(data, indices, axis=0, out=None, mode=’raise’, blen=None, storage=None, create=’array’, **kwargs)[source]

Take elements from an array along an axis.

allel.chunked.core.subset(data, sel0=None, sel1=None, blen=None, storage=None, create=’array’, **kwargs)[source]

Return selected rows and columns of an array.

allel.chunked.core.concatenate(tup, axis=0, blen=None, storage=None, create=’array’, **kwargs)[source]

Concatenate arrays.

allel.chunked.core.binary_op(data, op, other, blen=None, storage=None, create=’array’, **kwargs)[source]

Compute a binary operation block-wise over data.

allel.chunked.core.copy_table(tbl, start=0, stop=None, blen=None, storage=None, create=’table’, **kwargs)[source]

Copy tbl block-wise into a new table.

allel.chunked.core.compress_table(condition, tbl, axis=None, out=None, blen=None, storage=None, create=’table’, **kwargs)[source]

Return selected rows of a table.

allel.chunked.core.take_table(tbl, indices, axis=None, out=None, mode=’raise’, blen=None, storage=None, create=’table’, **kwargs)[source]

Return selected rows of a table.

allel.chunked.core.concatenate_table(tup, blen=None, storage=None, create=’table’, **kwargs)[source]

Stack tables in sequence vertically (row-wise).

allel.chunked.core.eval_table(tbl, expression, vm=’python’, blen=None, storage=None, create=’array’, vm_kwargs=None, **kwargs)[source]

Evaluate expression against columns of a table.

Classes

class allel.chunked.core.ChunkedArrayWrapper(data)[source]

Wrapper class for chunked array-like data.

Parameters:

data : array_like

Data to be wrapped. May be a bcolz carray, h5py dataset, or anything providing a similar interface.

class allel.chunked.core.ChunkedTableWrapper(data, names=None)[source]

Wrapper class for chunked table-like data.

Parameters:

data: table_like

Data to be wrapped. May be a tuple or list of columns (array-like), a dict mapping names to columns, a bcolz ctable, h5py group, numpy recarray, or anything providing a similar interface.

names : sequence of strings

Column names.