CharAlphabet

class CharAlphabet(chars: Sequence[str | bytes], gap: str | None = None, missing: str | None = None)

Represents a fundamental monomer character set.

Attributes:
gap_char
gap_index
missing_char
missing_index
moltype
motif_len
num_canonical

Methods

array_to_bytes(seq): returns seq as a byte string
as_bytes(): returns self as a byte string
count(value, /): Return number of occurrences of value.
from_rich_dict(data): returns an instance from a serialised dictionary
get_kmer_alphabet(k[, include_gap]): returns kmer alphabet with words of size k
index(value[, start, stop]): Return first index of value.
to_json(): returns a serialisable string
to_rich_dict(): returns a serialisable dictionary
with_gap_motif([gap_char, missing_char]): returns new monomer alphabet with gap and missing characters added
from_indices
get_motif_len
get_word_alphabet
is_valid
to_indices

Notes

Provides methods for efficient conversion between characters and integer indices. Input can be a string, bytes or a numpy array.
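
The conversion described above can be sketched with a small stand-in class. This is an illustration of the idea only, not cogent3's implementation: `ToyCharAlphabet` is a hypothetical name, it handles only `str` input, and it returns a plain list where the real `to_indices` is documented to return an ndarray.

```python
# Illustrative stand-in only: NOT cogent3's implementation.
class ToyCharAlphabet(tuple):
    """Hypothetical minimal alphabet mapping single characters to indices."""

    def to_indices(self, seq: str) -> list[int]:
        # Build a character -> position lookup from the alphabet order.
        lookup = {char: i for i, char in enumerate(self)}
        return [lookup[char] for char in seq]

    def from_indices(self, indices) -> str:
        # Reverse mapping: each index selects a character of the alphabet.
        return "".join(self[i] for i in indices)

    def is_valid(self, seq: str) -> bool:
        # A sequence is valid if every character belongs to the alphabet.
        return set(seq) <= set(self)


dna = ToyCharAlphabet("TCAG")
indices = dna.to_indices("GATTACA")
round_trip = dna.from_indices(indices)
```

In this sketch the index of a character is simply its position in the alphabet, so converting to indices and back recovers the original string.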

array_to_bytes(seq: ndarray) -> bytes

returns seq as a byte string

as_bytes() -> bytes

returns self as a byte string

count(value, /)

Return number of occurrences of value.

from_indices(seq: str | bytes | ndarray) -> str

classmethod from_rich_dict(data: dict) -> CharAlphabet

returns an instance from a serialised dictionary

property gap_char: str | None

property gap_index: int | None

get_kmer_alphabet(k: int, include_gap: bool = True) -> KmerAlphabet

returns kmer alphabet with words of size k

Parameters:
k

word size

include_gap

if True and self.gap_char is defined, the gap character of the returned KmerAlphabet is set to self.gap_char * k
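
As a rough sketch of what "words of size k" and a gap of `gap_char * k` mean, the helper below enumerates k-mers with `itertools.product`. The function `toy_kmer_words` is hypothetical and is not part of cogent3. Appending the gap word at the end of the list is an assumption made for illustration only.

```python
from itertools import product


def toy_kmer_words(chars, k, gap_char=None, include_gap=True):
    """Hypothetical helper: all words of length k, plus a gap word if requested."""
    # Every ordered combination of k characters from the monomer alphabet.
    words = ["".join(p) for p in product(chars, repeat=k)]
    if include_gap and gap_char:
        # The k-mer gap is the monomer gap character repeated k times.
        words.append(gap_char * k)
    return words


dinucs = toy_kmer_words("TCAG", 2, gap_char="-")
```

With four monomers and k = 2 this yields 16 dinucleotide words, followed here by the two-character gap word.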

get_motif_len() -> int

get_word_alphabet(k: int, include_gap: bool = True) -> KmerAlphabet

index(value, start=0, stop=sys.maxsize, /)

Return first index of value.

Raises ValueError if the value is not present.
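
The signatures and descriptions of `count` and `index` match those of Python's built-in `tuple`. Assuming `CharAlphabet` follows the same semantics, the behaviour can be shown on a plain tuple of characters, which stands in for an alphabet instance here.

```python
# A plain tuple stands in for an alphabet; this does not import cogent3.
chars = ("T", "C", "A", "G")

position = chars.index("A")     # first position at which "A" occurs
occurrences = chars.count("A")  # number of times "A" occurs

try:
    chars.index("N")            # "N" is not a member of the tuple
except ValueError:
    missing = True
else:
    missing = False
```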

is_valid(seq: str | bytes | ndarray) -> bool

property missing_char: str | None

property missing_index: int | None

property moltype: MolType | None

property motif_len: int

property num_canonical: int

to_indices(seq: str | bytes | ndarray) -> ndarray[int]

to_json() -> str

returns a serialisable string

to_rich_dict() -> dict

returns a serialisable dictionary

with_gap_motif(gap_char='-', missing_char='?')

returns new monomer alphabet with gap and missing characters added

Parameters:
gap_char

the IUPAC gap character “-”

missing_char

the IUPAC missing character “?”
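
A minimal sketch of the idea follows, using a hypothetical helper rather than the cogent3 method. It assumes the gap and missing characters are appended after the existing characters. The actual ordering used by `with_gap_motif` is not stated here.

```python
def toy_with_gap_motif(chars, gap_char="-", missing_char="?"):
    """Hypothetical helper: returns chars extended with gap and missing characters."""
    # Assumption for illustration: the new characters go at the end.
    return tuple(chars) + (gap_char, missing_char)


gapped = toy_with_gap_motif("TCAG")
gap_index = gapped.index("-")
missing_index = gapped.index("?")
```

Under that assumption, the two looked-up positions play the role of the `gap_index` and `missing_index` attributes listed above.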