bcolz arrays (deprecated)

This module provides alternative implementations of the array classes defined in the allel.model.ndarray module, using bcolz compressed arrays instead of numpy arrays for data storage.

Note: This module is deprecated and will be removed in a future release. It has been superseded by the allel.model.chunked module, which supports both bcolz and HDF5 as the underlying storage layer.
GenotypeCArray

class allel.model.bcolz.GenotypeCArray(data=None, copy=False, **kwargs)

Alternative implementation of the allel.model.ndarray.GenotypeArray class, using a bcolz.carray as the backing store.

Parameters:

data : array_like, int, shape (n_variants, n_samples, ploidy), optional
Data to initialise the array with. May be a bcolz carray, which will not be copied if copy=False. May also be None, in which case rootdir must be provided (disk-based array).
copy : bool, optional
If True, copy the input data into a new bcolz carray.
**kwargs : keyword arguments
Passed through to the bcolz carray constructor.
Examples
Instantiate a compressed genotype array from existing data:
>>> import allel
>>> g = allel.GenotypeCArray([[[0, 0], [0, 1]],
...                           [[0, 1], [1, 1]],
...                           [[0, 2], [-1, -1]]], dtype='i1')
>>> g
GenotypeCArray((3, 2, 2), int8)
  nbytes: 12; cbytes: 16.00 KB; ratio: 0.00
  cparams := cparams(clevel=5, shuffle=True, cname='blosclz')
[[[ 0  0]
  [ 0  1]]
 [[ 0  1]
  [ 1  1]]
 [[ 0  2]
  [-1 -1]]]
Obtain a numpy ndarray from a compressed array by slicing:
>>> g[:]
GenotypeArray((3, 2, 2), dtype=int8)
[[[ 0  0]
  [ 0  1]]
 [[ 0  1]
  [ 1  1]]
 [[ 0  2]
  [-1 -1]]]
Build incrementally:
>>> import bcolz
>>> data = bcolz.zeros((0, 2, 2), dtype='i1')
>>> data.append([[0, 0], [0, 1]])
>>> data.append([[0, 1], [1, 1]])
>>> data.append([[0, 2], [-1, -1]])
>>> g = allel.GenotypeCArray(data)
>>> g
GenotypeCArray((3, 2, 2), int8)
  nbytes: 12; cbytes: 16.00 KB; ratio: 0.00
  cparams := cparams(clevel=5, shuffle=True, cname='blosclz')
[[[ 0  0]
  [ 0  1]]
 [[ 0  1]
  [ 1  1]]
 [[ 0  2]
  [-1 -1]]]
Load from HDF5:
>>> import h5py
>>> with h5py.File('test1.h5', mode='w') as h5f:
...     h5f.create_dataset('genotype',
...                        data=[[[0, 0], [0, 1]],
...                              [[0, 1], [1, 1]],
...                              [[0, 2], [-1, -1]]],
...                        dtype='i1',
...                        chunks=(2, 2, 2))
...
<HDF5 dataset "genotype": shape (3, 2, 2), type "|i1">
>>> g = allel.GenotypeCArray.from_hdf5('test1.h5', 'genotype')
>>> g
GenotypeCArray((3, 2, 2), int8)
  nbytes: 12; cbytes: 16.00 KB; ratio: 0.00
  cparams := cparams(clevel=5, shuffle=True, cname='blosclz')
[[[ 0  0]
  [ 0  1]]
 [[ 0  1]
  [ 1  1]]
 [[ 0  2]
  [-1 -1]]]
Note that methods of this class will return bcolz carrays rather than numpy ndarrays where possible. E.g.:
>>> g.take([0, 2], axis=0)
GenotypeCArray((2, 2, 2), int8)
  nbytes: 8; cbytes: 16.00 KB; ratio: 0.00
  cparams := cparams(clevel=5, shuffle=True, cname='blosclz')
[[[ 0  0]
  [ 0  1]]
 [[ 0  2]
  [-1 -1]]]
>>> g.is_called()
CArrayWrapper((3, 2), bool)
  nbytes: 6; cbytes: 16.00 KB; ratio: 0.00
  cparams := cparams(clevel=5, shuffle=True, cname='blosclz')
[[ True  True]
 [ True  True]
 [ True False]]
>>> g.to_haplotypes()
HaplotypeCArray((3, 4), int8)
  nbytes: 12; cbytes: 16.00 KB; ratio: 0.00
  cparams := cparams(clevel=5, shuffle=True, cname='blosclz')
[[ 0  0  0  1]
 [ 0  1  1  1]
 [ 0  2 -1 -1]]
>>> g.count_alleles()
AlleleCountsCArray((3, 3), int32)
  nbytes: 36; cbytes: 16.00 KB; ratio: 0.00
  cparams := cparams(clevel=5, shuffle=True, cname='blosclz')
[[3 1 0]
 [1 3 0]
 [1 0 1]]
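For reference, the semantics of these results can be reproduced with plain numpy on the uncompressed data. The sketch below (numpy only, not the bcolz-backed implementation) mirrors the is_called() and to_haplotypes() outputs above:

```python
import numpy as np

# Same small genotype array as above: (n_variants, n_samples, ploidy).
g = np.array([[[0, 0], [0, 1]],
              [[0, 1], [1, 1]],
              [[0, 2], [-1, -1]]], dtype='i1')

# is_called: a genotype call is present when no allele is missing (-1).
is_called = (g >= 0).all(axis=-1)

# to_haplotypes: collapse the samples and ploidy dimensions into one.
haplotypes = g.reshape(g.shape[0], -1)

print(is_called)
print(haplotypes)
```

This illustrates why to_haplotypes() above yields shape (3, 4): each of the 2 samples contributes ploidy=2 haplotypes per variant.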
HaplotypeCArray

class allel.model.bcolz.HaplotypeCArray(data=None, copy=False, **kwargs)

Alternative implementation of the allel.model.ndarray.HaplotypeArray class, using a bcolz.carray as the backing store.

Parameters:

data : array_like, int, shape (n_variants, n_haplotypes), optional
Data to initialise the array with. May be a bcolz carray, which will not be copied if copy=False. May also be None, in which case rootdir must be provided (disk-based array).
copy : bool, optional
If True, copy the input data into a new bcolz carray.
**kwargs : keyword arguments
Passed through to the bcolz carray constructor.
AlleleCountsCArray

class allel.model.bcolz.AlleleCountsCArray(data=None, copy=False, **kwargs)

Alternative implementation of the allel.model.ndarray.AlleleCountsArray class, using a bcolz.carray as the backing store.

Parameters:

data : array_like, int, shape (n_variants, n_alleles), optional
Data to initialise the array with. May be a bcolz carray, which will not be copied if copy=False. May also be None, in which case rootdir must be provided (disk-based array).
copy : bool, optional
If True, copy the input data into a new bcolz carray.
**kwargs : keyword arguments
Passed through to the bcolz carray constructor.
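As an illustration of what an allele counts array of shape (n_variants, n_alleles) holds, the following numpy-only sketch reproduces the count_alleles() output shown for GenotypeCArray above; it is not the library's implementation:

```python
import numpy as np

# Same small genotype array used in the GenotypeCArray examples.
g = np.array([[[0, 0], [0, 1]],
              [[0, 1], [1, 1]],
              [[0, 2], [-1, -1]]], dtype='i1')

n_alleles = 3  # alleles 0, 1 and 2 are observed
# Count how many times each allele is observed per variant,
# ignoring missing calls (-1).
ac = np.zeros((g.shape[0], n_alleles), dtype='i4')
for i, calls in enumerate(g.reshape(g.shape[0], -1)):
    called = calls[calls >= 0]
    ac[i] = np.bincount(called, minlength=n_alleles)

print(ac)
```

Row i, column j of the result is the number of times allele j was observed at variant i, matching the AlleleCountsCArray shown earlier.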
VariantCTable

class allel.model.bcolz.VariantCTable(data=None, copy=False, index=None, **kwargs)

Alternative implementation of the allel.model.ndarray.VariantTable class, using a bcolz.ctable as the backing store.

Parameters:

data : tuple or list of column objects, optional
The list of column data to build the ctable object. This can also be a pure NumPy structured array. May also be a bcolz ctable, which will not be copied if copy=False. May also be None, in which case rootdir must be provided (disk-based array).
copy : bool, optional
If True, copy the input data into a new bcolz ctable.
index : string or pair of strings, optional
If a single string, name of column to use for a sorted index. If a pair of strings, name of columns to use for a sorted multi-index.
**kwargs : keyword arguments
Passed through to the bcolz ctable constructor.
Examples
Instantiate from existing data:
>>> import allel
>>> chrom = [b'chr1', b'chr1', b'chr2', b'chr2', b'chr3']
>>> pos = [2, 7, 3, 9, 6]
>>> dp = [35, 12, 78, 22, 99]
>>> qd = [4.5, 6.7, 1.2, 4.4, 2.8]
>>> ac = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]
>>> vt = allel.VariantCTable([chrom, pos, dp, qd, ac],
...                          names=['CHROM', 'POS', 'DP', 'QD', 'AC'],
...                          index=('CHROM', 'POS'))
>>> vt
VariantCTable((5,), [('CHROM', 'S4'), ('POS', '<i8'), ('DP', '<i8'), ('QD', '<f8'), ('AC', '<i8', (2,))])
  nbytes: 220; cbytes: 80.00 KB; ratio: 0.00
  cparams := cparams(clevel=5, shuffle=True, cname='blosclz')
[(b'chr1', 2, 35, 4.5, [1, 2]) (b'chr1', 7, 12, 6.7, [3, 4])
 (b'chr2', 3, 78, 1.2, [5, 6]) (b'chr2', 9, 22, 4.4, [7, 8])
 (b'chr3', 6, 99, 2.8, [9, 10])]
Slicing rows returns allel.model.ndarray.VariantTable:

>>> vt[:2]
VariantTable((2,), dtype=(numpy.record, [('CHROM', 'S4'), ('POS', '<i8'), ('DP', '<i8'), ('QD', '<f8'), ('AC', '<i8', (2,))]))
[(b'chr1', 2, 35, 4.5, array([1, 2])) (b'chr1', 7, 12, 6.7, array([3, 4]))]
Accessing columns returns allel.model.bcolz.VariantCTable:

>>> vt[['DP', 'QD']]
VariantCTable((5,), [('DP', '<i8'), ('QD', '<f8')])
  nbytes: 80; cbytes: 32.00 KB; ratio: 0.00
  cparams := cparams(clevel=5, shuffle=True, cname='blosclz')
[(35, 4.5) (12, 6.7) (78, 1.2) (22, 4.4) (99, 2.8)]
Use the index to locate variants:
>>> loc = vt.index.locate_range(b'chr2', 1, 10)
>>> vt[loc]
VariantTable((2,), dtype=(numpy.record, [('CHROM', 'S4'), ('POS', '<i8'), ('DP', '<i8'), ('QD', '<f8'), ('AC', '<i8', (2,))]))
[(b'chr2', 3, 78, 1.2, array([5, 6])) (b'chr2', 9, 22, 4.4, array([7, 8]))]
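Because the (CHROM, POS) index is sorted, a range query amounts to a pair of binary searches. A rough numpy-only sketch of the idea (not the library's actual index implementation; the locate_range function here is a hypothetical stand-in):

```python
import numpy as np

# Sorted (chromosome, position) columns, as in the example above.
chrom = np.array([b'chr1', b'chr1', b'chr2', b'chr2', b'chr3'])
pos = np.array([2, 7, 3, 9, 6])

def locate_range(key, start, stop):
    # Rows for one chromosome form a contiguous run in the sorted table.
    lo = np.searchsorted(chrom, key)
    hi = np.searchsorted(chrom, key, side='right')
    # Binary search within that run on position (stop is inclusive).
    lo2 = lo + np.searchsorted(pos[lo:hi], start)
    hi2 = lo + np.searchsorted(pos[lo:hi], stop, side='right')
    return slice(lo2, hi2)

loc = locate_range(b'chr2', 1, 10)
print(loc)  # selects the two chr2 rows
```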
FeatureCTable

class allel.model.bcolz.FeatureCTable(data=None, copy=False, **kwargs)

Alternative implementation of the allel.model.ndarray.FeatureTable class, using a bcolz.ctable as the backing store.

Parameters:

data : tuple or list of column objects, optional
The list of column data to build the ctable object. This can also be a pure NumPy structured array. May also be a bcolz ctable, which will not be copied if copy=False. May also be None, in which case rootdir must be provided (disk-based array).
copy : bool, optional
If True, copy the input data into a new bcolz ctable.
index : pair or triplet of strings, optional
Names of columns to use for positional index, e.g., ('start', 'stop') if table contains 'start' and 'stop' columns and records from a single chromosome/contig, or ('seqid', 'start', 'end') if table contains records from multiple chromosomes/contigs.
**kwargs : keyword arguments
Passed through to the bcolz ctable constructor.
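As an illustration of what such a positional index enables, the following numpy-only sketch locates features that overlap a query interval; the coordinates and the overlaps helper are hypothetical, not part of the library:

```python
import numpy as np

# Hypothetical feature coordinates on a single contig (inclusive).
start = np.array([1, 10, 30, 50])
stop = np.array([9, 25, 45, 60])

def overlaps(qstart, qstop):
    # A feature overlaps [qstart, qstop] when it starts no later than
    # the query end and ends no earlier than the query start.
    return (start <= qstop) & (stop >= qstart)

sel = overlaps(20, 40)
print(sel)  # boolean selection over the feature rows
```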
Utility functions

allel.model.bcolz.carray_from_hdf5(*args, **kwargs)

Load a bcolz carray from an HDF5 dataset.
Either provide an h5py dataset as a single positional argument, or provide two positional arguments giving the HDF5 file path and the dataset node path within the file.
The following optional parameters may be given. Any other keyword arguments are passed through to the bcolz.carray constructor.
Parameters: start : int, optional
Index to start loading from.
stop : int, optional
Index to finish loading at.
condition : array_like, bool, optional
A 1-dimensional boolean array of the same length as the first dimension of the dataset to load, indicating a selection of rows to load.
blen : int, optional
Block size to use when loading.
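Taken together, these parameters describe block-wise selective loading. A rough numpy-only sketch of the access pattern (using a plain array as a stand-in for an HDF5 dataset; not the actual implementation):

```python
import numpy as np

dataset = np.arange(10)          # stand-in for an HDF5 dataset
condition = (dataset % 2 == 0)   # boolean selection of rows to load
blen = 4                         # block size

loaded = []
for i in range(0, len(dataset), blen):
    block = dataset[i:i + blen]        # read one block at a time
    keep = condition[i:i + blen]       # matching slice of the condition
    loaded.append(block[keep])         # apply the row selection
result = np.concatenate(loaded)
print(result)
```

Reading in blocks keeps memory use bounded by blen rather than the full dataset size, which is the point of loading into a compressed container.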
allel.model.bcolz.carray_to_hdf5(carr, parent, name, **kwargs)

Write a bcolz carray to an HDF5 dataset.
Parameters: carr : bcolz.carray
Data to write.
parent : string or h5py group
Parent HDF5 file or group. If a string, will be treated as HDF5 file name.
name : string
Name or path of dataset to write data into.
kwargs : keyword arguments
Passed through to the h5py require_dataset() function.
Returns: h5d : h5py dataset
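The HDF5 side of this operation can be sketched with h5py alone. The sketch below uses a plain numpy array as a stand-in for the contents of a bcolz carray, and the dataset name 'mydata' and temporary file path are hypothetical:

```python
import os
import tempfile

import h5py
import numpy as np

# Stand-in for the contents of a bcolz carray (carr[:]).
data = np.arange(12, dtype='i1').reshape(3, 4)

path = os.path.join(tempfile.mkdtemp(), 'test_out.h5')
with h5py.File(path, mode='w') as h5f:
    # require_dataset creates the dataset if it does not exist, or
    # returns the existing one after checking shape and dtype.
    h5d = h5f.require_dataset('mydata', shape=data.shape, dtype=data.dtype)
    h5d[:] = data

# Read back to confirm the round trip.
with h5py.File(path, mode='r') as h5f:
    loaded = h5f['mydata'][:]
```

Using require_dataset rather than create_dataset makes the write idempotent with respect to an existing, compatible dataset.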
allel.model.bcolz.ctable_from_hdf5_group(*args, **kwargs)

Load a bcolz ctable from columns stored as separate datasets within an HDF5 group.
Either provide an h5py group as a single positional argument, or provide two positional arguments giving the HDF5 file path and the group node path within the file.
The following optional parameters may be given. Any other keyword arguments are passed through to the bcolz.carray constructor.
Parameters: start : int, optional
Index to start loading from.
stop : int, optional
Index to finish loading at.
condition : array_like, bool, optional
A 1-dimensional boolean array of the same length as the columns of the table to load, indicating a selection of rows to load.
blen : int, optional
Block size to use when loading.
allel.model.bcolz.ctable_to_hdf5_group(ctbl, parent, name, **kwargs)

Write each column in a bcolz ctable to a dataset in an HDF5 group.
Parameters: parent : string or h5py group
Parent HDF5 file or group. If a string, will be treated as HDF5 file name.
name : string
Name or path of group to write data into.
kwargs : keyword arguments
Passed through to the h5py require_dataset() function.
Returns: h5g : h5py group