KmerAlphabet
- class KmerAlphabet(words: tuple[str | bytes, ...], monomers: CharAlphabet, k: int, gap: str | None = None, missing: str | None = None)
k-mer alphabet representing complete non-monomer alphabets, i.e. every k-length word that can be formed from a monomer alphabet
- Attributes:
- gap_char
- gap_index
- missing_char
- missing_index
- moltype
- motif_len
- num_canonical
Methods
- count(value, /): Return number of occurrences of value.
- from_index(kmer_index): decodes an integer into a k-mer
- from_indices(kmer_indices[, independent_kmer])
- from_rich_dict(data): returns an instance from a serialised dictionary
- index(value[, start, stop]): Return first index of value.
- is_valid(seq): whether integers are within the valid range
- to_index(seq): encodes a k-mer as a single integer
- to_indices(seq[, independent_kmer]): returns a sequence of k-mer indices
- to_json(): returns a serialisable string
- to_rich_dict(): returns a serialisable dictionary
- with_gap_motif([include_missing]): returns a new KmerAlphabet with the gap motif added
Notes
Differs from SenseCodonAlphabet by representing all possible k-length words of the provided monomer alphabet (every ordered combination, with repetition). More efficient mapping between
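To make "all possible k-length words" concrete, the sketch below enumerates them with the standard library only; it does not call cogent3. `all_kmers` is a hypothetical helper, and the monomer order "TCAG" is an assumption chosen for illustration. Because words are drawn with repetition, the alphabet always has `len(monomers) ** k` members, e.g. 64 trinucleotides, whereas a sense-codon alphabet for the standard genetic code omits the three stop codons.

```python
from itertools import product


def all_kmers(monomers: str, k: int) -> tuple[str, ...]:
    """Hypothetical helper: every k-length word over `monomers`.

    Words are ordered by the position of each character in `monomers`,
    with the first position varying slowest.
    """
    # product(..., repeat=k) is the Cartesian product, i.e. words with repetition,
    # so the result always has len(monomers) ** k entries
    return tuple("".join(chars) for chars in product(monomers, repeat=k))


dinucs = all_kmers("TCAG", 2)
print(len(dinucs))   # 16
print(dinucs[:4])    # ('TT', 'TC', 'TA', 'TG')
```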
- count(value, /)
Return number of occurrences of value.
- from_index(kmer_index: int) -> ndarray
decodes an integer into a k-mer
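A minimal sketch of the decoding step, assuming a positional (base `num_states`) scheme in which the first monomer is the most significant digit. That ordering is an assumption, not something this page confirms. `from_index` below is a hypothetical stand-alone function that returns a list of monomer indices rather than an ndarray.

```python
def from_index(kmer_index: int, num_states: int, k: int) -> list[int]:
    """Hypothetical sketch: decode an integer into k monomer indices.

    Assumes the first position is the most significant base-num_states digit.
    """
    if not 0 <= kmer_index < num_states**k:
        raise ValueError(f"{kmer_index} is outside 0..{num_states**k - 1}")
    digits = [0] * k
    # peel off digits starting from the last (least significant) position
    for pos in range(k - 1, -1, -1):
        kmer_index, digits[pos] = divmod(kmer_index, num_states)
    return digits


# with monomers "TCAG" (T=0, C=1, A=2, G=3), index 6 decodes to [1, 2], i.e. "CA"
print(from_index(6, 4, 2))  # [1, 2]
```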
- from_indices(kmer_indices: ndarray, independent_kmer: bool = True) -> ndarray
- classmethod from_rich_dict(data: dict) -> KmerAlphabet
returns an instance from a serialised dictionary
- property gap_char: str | None
- property gap_index: int | None
- index(value, start=0, stop=sys.maxsize, /)
Return first index of value.
Raises ValueError if the value is not present.
- is_valid(seq: ndarray) -> bool
whether integers are within the valid range
- Parameters:
- seq
a numpy array of integers
Notes
Raises a TypeError for str or bytes input. Converting such input with to_indices() first ensures a valid result.
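The check described above can be sketched as a simple bounds test. This is a hypothetical stand-alone function, not the cogent3 implementation; it assumes the valid range is `0` up to (but excluding) the alphabet size, and it mirrors the documented TypeError for raw str or bytes input.

```python
from typing import Sequence


def is_valid(seq: Sequence[int], alphabet_size: int) -> bool:
    """Hypothetical sketch: True if every integer is a valid alphabet index.

    Assumes valid indices are 0 <= value < alphabet_size.
    """
    if isinstance(seq, (str, bytes)):
        # mirrors the documented behaviour: characters must be converted to indices first
        raise TypeError("convert str/bytes to indices before validating")
    return all(0 <= value < alphabet_size for value in seq)


print(is_valid([0, 5, 15], 16))  # True
print(is_valid([0, 16], 16))     # False
```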
- property missing_char: str | None
- property missing_index: int | None
- abstract property motif_len: int
- property num_canonical: int
- to_index(seq) -> int
- to_index(seq: str) -> int
- to_index(seq: bytes) -> int
- to_index(seq: ndarray) -> int
encodes a k-mer as a single integer
- Parameters:
- seq
the k-mer to be encoded, as a string, bytes or numpy array
- overlapping
if False, performs operation on sequential k-mers, e.g. codons
Notes
If self.gap_char is defined, then the following rules apply: returns num_states**k if a k-mer contains a gap character, otherwise returns num_states**k + 1 if a k-mer contains a non-canonical character. If self.gap_char is not defined, returns num_states**k for both cases.
- to_indices(seq, independent_kmer: bool = True) -> ndarray
- to_indices(seq: str, independent_kmer: bool = True) -> ndarray
- to_indices(seq: ndarray, independent_kmer: bool = True) -> ndarray
returns a sequence of k-mer indices
- Parameters:
- seq
a sequence of monomers
- independent_kmer
if True, returns non-overlapping k-mers
Notes
If self.gap_char is not None, the following rules apply: a k-mer containing a gap character is assigned the index num_states**k, where num_states is the number of monomer states; a k-mer containing a non-canonical, non-gap character is assigned num_states**k + 1. If self.gap_char is None, both cases are assigned num_states**k.
- to_json() -> str
returns a serialisable string
- to_rich_dict() -> dict
returns a serialisable dictionary
- with_gap_motif(include_missing: bool = False)
returns a new KmerAlphabet with the gap motif added
Notes
Adds the gap state to the monomers and recreates the k-mer alphabet from the extended monomer set.
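Adding a state to the monomers grows the k-mer alphabet from num_states**k to (num_states + 1)**k words. The sketch below illustrates that effect with hypothetical stand-alone helpers, not the cogent3 method. The gap character "-" and missing character "?" are assumptions, as is the treatment of `include_missing` as adding the missing character as one more monomer.

```python
from itertools import product


def kmer_words(monomers: str, k: int) -> tuple[str, ...]:
    """Hypothetical helper: all k-length words over monomers."""
    return tuple("".join(chars) for chars in product(monomers, repeat=k))


def with_gap_motif(
    monomers: str,
    k: int,
    include_missing: bool = False,
    gap: str = "-",
    missing: str = "?",
) -> tuple[str, ...]:
    """Hypothetical sketch: extend the monomers, then rebuild every k-mer."""
    extra = gap + (missing if include_missing else "")
    return kmer_words(monomers + extra, k)


print(len(kmer_words("TCAG", 2)))                              # 16
print(len(with_gap_motif("TCAG", 2)))                          # 25
print(len(with_gap_motif("TCAG", 2, include_missing=True)))    # 36
```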