Mendelian inheritance

allel.mendel_errors(parent_genotypes, progeny_genotypes)[source]

Locate genotype calls not consistent with Mendelian transmission of alleles.

Parameters:
parent_genotypes : array_like, int, shape (n_variants, 2, 2)

Genotype calls for the two parents.

progeny_genotypes : array_like, int, shape (n_variants, n_progeny, 2)

Genotype calls for the progeny.

Returns:
me : ndarray, int, shape (n_variants, n_progeny)

Count of Mendel errors for each progeny genotype call.

Examples

The following are all consistent with Mendelian transmission. Note that a value of 0 is returned for missing calls:

>>> import allel
>>> import numpy as np
>>> genotypes = np.array([
...     # aa x aa -> aa
...     [[0, 0], [0, 0], [0, 0], [-1, -1], [-1, -1], [-1, -1]],
...     [[1, 1], [1, 1], [1, 1], [-1, -1], [-1, -1], [-1, -1]],
...     [[2, 2], [2, 2], [2, 2], [-1, -1], [-1, -1], [-1, -1]],
...     # aa x ab -> aa or ab
...     [[0, 0], [0, 1], [0, 0], [0, 1], [-1, -1], [-1, -1]],
...     [[0, 0], [0, 2], [0, 0], [0, 2], [-1, -1], [-1, -1]],
...     [[1, 1], [0, 1], [1, 1], [0, 1], [-1, -1], [-1, -1]],
...     # aa x bb -> ab
...     [[0, 0], [1, 1], [0, 1], [-1, -1], [-1, -1], [-1, -1]],
...     [[0, 0], [2, 2], [0, 2], [-1, -1], [-1, -1], [-1, -1]],
...     [[1, 1], [2, 2], [1, 2], [-1, -1], [-1, -1], [-1, -1]],
...     # aa x bc -> ab or ac
...     [[0, 0], [1, 2], [0, 1], [0, 2], [-1, -1], [-1, -1]],
...     [[1, 1], [0, 2], [0, 1], [1, 2], [-1, -1], [-1, -1]],
...     # ab x ab -> aa or ab or bb
...     [[0, 1], [0, 1], [0, 0], [0, 1], [1, 1], [-1, -1]],
...     [[1, 2], [1, 2], [1, 1], [1, 2], [2, 2], [-1, -1]],
...     [[0, 2], [0, 2], [0, 0], [0, 2], [2, 2], [-1, -1]],
...     # ab x bc -> ab or ac or bb or bc
...     [[0, 1], [1, 2], [0, 1], [0, 2], [1, 1], [1, 2]],
...     [[0, 1], [0, 2], [0, 0], [0, 1], [0, 1], [1, 2]],
...     # ab x cd -> ac or ad or bc or bd
...     [[0, 1], [2, 3], [0, 2], [0, 3], [1, 2], [1, 3]],
... ])
>>> me = allel.mendel_errors(genotypes[:, :2], genotypes[:, 2:])
>>> me
array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])

The following are cases of ‘non-parental’ inheritance where one or two alleles are found in the progeny that are not present in either parent. Note that the number of errors may be 1 or 2 depending on the number of non-parental alleles:

>>> genotypes = np.array([
...     # aa x aa -> ab or ac or bb or cc
...     [[0, 0], [0, 0], [0, 1], [0, 2], [1, 1], [2, 2]],
...     [[1, 1], [1, 1], [0, 1], [1, 2], [0, 0], [2, 2]],
...     [[2, 2], [2, 2], [0, 2], [1, 2], [0, 0], [1, 1]],
...     # aa x ab -> ac or bc or cc
...     [[0, 0], [0, 1], [0, 2], [1, 2], [2, 2], [2, 2]],
...     [[0, 0], [0, 2], [0, 1], [1, 2], [1, 1], [1, 1]],
...     [[1, 1], [0, 1], [1, 2], [0, 2], [2, 2], [2, 2]],
...     # aa x bb -> ac or bc or cc
...     [[0, 0], [1, 1], [0, 2], [1, 2], [2, 2], [2, 2]],
...     [[0, 0], [2, 2], [0, 1], [1, 2], [1, 1], [1, 1]],
...     [[1, 1], [2, 2], [0, 1], [0, 2], [0, 0], [0, 0]],
...     # ab x ab -> ac or bc or cc
...     [[0, 1], [0, 1], [0, 2], [1, 2], [2, 2], [2, 2]],
...     [[0, 2], [0, 2], [0, 1], [1, 2], [1, 1], [1, 1]],
...     [[1, 2], [1, 2], [0, 1], [0, 2], [0, 0], [0, 0]],
...     # ab x bc -> ad or bd or cd or dd
...     [[0, 1], [1, 2], [0, 3], [1, 3], [2, 3], [3, 3]],
...     [[0, 1], [0, 2], [0, 3], [1, 3], [2, 3], [3, 3]],
...     [[0, 2], [1, 2], [0, 3], [1, 3], [2, 3], [3, 3]],
...     # ab x cd -> ae or be or ce or de
...     [[0, 1], [2, 3], [0, 4], [1, 4], [2, 4], [3, 4]],
... ])
>>> me = allel.mendel_errors(genotypes[:, :2], genotypes[:, 2:])
>>> me
array([[1, 1, 2, 2],
       [1, 1, 2, 2],
       [1, 1, 2, 2],
       [1, 1, 2, 2],
       [1, 1, 2, 2],
       [1, 1, 2, 2],
       [1, 1, 2, 2],
       [1, 1, 2, 2],
       [1, 1, 2, 2],
       [1, 1, 2, 2],
       [1, 1, 2, 2],
       [1, 1, 2, 2],
       [1, 1, 1, 2],
       [1, 1, 1, 2],
       [1, 1, 1, 2],
       [1, 1, 1, 1]])

The following are cases of ‘hemi-parental’ inheritance, where progeny appear to have inherited two copies of an allele found only once in one of the parents:

>>> genotypes = np.array([
...     # aa x ab -> bb
...     [[0, 0], [0, 1], [1, 1], [-1, -1]],
...     [[0, 0], [0, 2], [2, 2], [-1, -1]],
...     [[1, 1], [0, 1], [0, 0], [-1, -1]],
...     # ab x bc -> aa or cc
...     [[0, 1], [1, 2], [0, 0], [2, 2]],
...     [[0, 1], [0, 2], [1, 1], [2, 2]],
...     [[0, 2], [1, 2], [0, 0], [1, 1]],
...     # ab x cd -> aa or bb or cc or dd
...     [[0, 1], [2, 3], [0, 0], [1, 1]],
...     [[0, 1], [2, 3], [2, 2], [3, 3]],
... ])
>>> me = allel.mendel_errors(genotypes[:, :2], genotypes[:, 2:])
>>> me
array([[1, 0],
       [1, 0],
       [1, 0],
       [1, 1],
       [1, 1],
       [1, 1],
       [1, 1],
       [1, 1]])

The following are cases of ‘uni-parental’ inheritance, where progeny appear to have inherited both alleles from a single parent:

>>> genotypes = np.array([
...     # aa x bb -> aa or bb
...     [[0, 0], [1, 1], [0, 0], [1, 1]],
...     [[0, 0], [2, 2], [0, 0], [2, 2]],
...     [[1, 1], [2, 2], [1, 1], [2, 2]],
...     # aa x bc -> aa or bc
...     [[0, 0], [1, 2], [0, 0], [1, 2]],
...     [[1, 1], [0, 2], [1, 1], [0, 2]],
...     # ab x cd -> ab or cd
...     [[0, 1], [2, 3], [0, 1], [2, 3]],
... ])
>>> me = allel.mendel_errors(genotypes[:, :2], genotypes[:, 2:])
>>> me
array([[1, 1],
       [1, 1],
       [1, 1],
       [1, 1],
       [1, 1],
       [1, 1]])
allel.paint_transmission(parent_haplotypes, progeny_haplotypes)[source]

Paint haplotypes inherited from a single diploid parent according to their allelic inheritance.

Parameters:
parent_haplotypes : array_like, int, shape (n_variants, 2)

Both haplotypes from a single diploid parent.

progeny_haplotypes : array_like, int, shape (n_variants, n_progeny)

Haplotypes found in progeny of the given parent, inherited from the given parent. I.e., haplotypes from gametes of the given parent.

Returns:
painting : ndarray, uint8, shape (n_variants, n_progeny)

An array of integers coded as follows: 1 = allele inherited from first parental haplotype; 2 = allele inherited from second parental haplotype; 3 = reference allele, also carried by both parental haplotypes; 4 = non-reference allele, also carried by both parental haplotypes; 5 = non-parental allele; 6 = either or both parental alleles missing; 7 = missing allele; 0 = undetermined.

Examples

>>> import allel
>>> haplotypes = allel.HaplotypeArray([
...     [0, 0, 0, 1, 2, -1],
...     [0, 1, 0, 1, 2, -1],
...     [1, 0, 0, 1, 2, -1],
...     [1, 1, 0, 1, 2, -1],
...     [0, 2, 0, 1, 2, -1],
...     [0, -1, 0, 1, 2, -1],
...     [-1, 1, 0, 1, 2, -1],
...     [-1, -1, 0, 1, 2, -1],
... ], dtype='i1')
>>> painting = allel.paint_transmission(haplotypes[:, :2],
...                                           haplotypes[:, 2:])
>>> painting
array([[3, 5, 5, 7],
       [1, 2, 5, 7],
       [2, 1, 5, 7],
       [5, 4, 5, 7],
       [1, 5, 2, 7],
       [6, 6, 6, 7],
       [6, 6, 6, 7],
       [6, 6, 6, 7]], dtype=uint8)
allel.phase_by_transmission(g, window_size, copy=True)[source]

Phase genotypes in a trio or cross where possible using Mendelian transmission.

Parameters:
g : array_like, int, shape (n_variants, n_samples, 2)

Genotype array, with parents as first two columns and progeny as remaining columns.

window_size : int

Number of previous heterozygous sites to include when phasing each parent. A number somewhere between 10 and 100 may be appropriate, depending on levels of heterozygosity and quality of data.

copy : bool, optional

If False, attempt to phase genotypes in-place. Note that this is only possible if the input array has int8 dtype, otherwise a copy is always made regardless of this parameter.

Returns:
g : GenotypeArray

Genotype array with progeny phased where possible.

allel.phase_progeny_by_transmission(g)[source]

Phase progeny genotypes from a trio or cross using Mendelian transmission.

Parameters:
g : array_like, int, shape (n_variants, n_samples, 2)

Genotype array, with parents as first two columns and progeny as remaining columns.

Returns:
g : ndarray, int8, shape (n_variants, n_samples, 2)

Genotype array with progeny phased where possible.

Examples

>>> import allel
>>> g = allel.GenotypeArray([
...     [[0, 0], [0, 0], [0, 0]],
...     [[1, 1], [1, 1], [1, 1]],
...     [[0, 0], [1, 1], [0, 1]],
...     [[1, 1], [0, 0], [0, 1]],
...     [[0, 0], [0, 1], [0, 0]],
...     [[0, 0], [0, 1], [0, 1]],
...     [[0, 1], [0, 0], [0, 1]],
...     [[0, 1], [0, 1], [0, 1]],
...     [[0, 1], [1, 2], [0, 1]],
...     [[1, 2], [0, 1], [1, 2]],
...     [[0, 1], [2, 3], [0, 2]],
...     [[2, 3], [0, 1], [1, 3]],
...     [[0, 0], [0, 0], [-1, -1]],
...     [[0, 0], [0, 0], [1, 1]],
... ], dtype='i1')
>>> g = allel.phase_progeny_by_transmission(g)
>>> print(g.to_str(row_threshold=None))
0/0 0/0 0|0
1/1 1/1 1|1
0/0 1/1 0|1
1/1 0/0 1|0
0/0 0/1 0|0
0/0 0/1 0|1
0/1 0/0 1|0
0/1 0/1 0/1
0/1 1/2 0|1
1/2 0/1 2|1
0/1 2/3 0|2
2/3 0/1 3|1
0/0 0/0 ./.
0/0 0/0 1/1
>>> g.is_phased
array([[False, False,  True],
       [False, False,  True],
       [False, False,  True],
       [False, False,  True],
       [False, False,  True],
       [False, False,  True],
       [False, False,  True],
       [False, False, False],
       [False, False,  True],
       [False, False,  True],
       [False, False,  True],
       [False, False,  True],
       [False, False, False],
       [False, False, False]])
allel.phase_parents_by_transmission(g, window_size)[source]

Phase parent genotypes from a trio or cross, given progeny genotypes already phased by Mendelian transmission.

Parameters:
g : GenotypeArray

Genotype array, with parents as first two columns and progeny as remaining columns, where progeny genotypes are already phased.

window_size : int

Number of previous heterozygous sites to include when phasing each parent. A number somewhere between 10 and 100 may be appropriate, depending on levels of heterozygosity and quality of data.

Returns:
g : GenotypeArray

Genotype array with parents phased where possible.