CharAlphabet
- class CharAlphabet(chars: Sequence[str | bytes], gap: str | None = None, missing: str | None = None)
Represents fundamental monomer character sets.
- Attributes:
- gap_char
- gap_index
- missing_char
- missing_index
- moltype
- motif_len
- num_canonical
Methods
- array_to_bytes(seq): returns seq as a byte string
- as_bytes(): returns self as a byte string
- convert_seq_array_to(*, alphabet, seq[, ...]): converts a numpy array with indices from self to other
- count(value, /): Return number of occurrences of value.
- from_rich_dict(data): returns an instance from a serialised dictionary
- get_kmer_alphabet(k[, include_gap]): returns kmer alphabet with words of size k
- get_subset(motif_subset[, excluded]): Returns a new Alphabet object containing a subset of motifs in self.
- index(value[, start, stop]): Return first index of value.
- to_json(): returns a serialisable string
- to_rich_dict([for_pickle]): returns a serialisable dictionary
- with_gap_motif([gap_char, missing_char, ...]): returns new monomer alphabet with gap and missing characters added
- from_indices
- get_motif_len
- get_word_alphabet
- is_valid
- to_indices
Notes
Provides methods for efficient conversion between characters and integers from fundamental types of strings, bytes and numpy arrays.
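A 256-entry lookup table applied with `bytes.translate` is one way to get this kind of efficient character-to-integer conversion, because it maps every byte in a single pass. The sketch below is a stand-in under stated assumptions and not the library's implementation. The helper names `make_to_index_table` and `to_indices`, the ASCII-only alphabet and the sentinel value 255 for unknown characters are all hypothetical. A Python list stands in for the uint8 numpy array.

```python
from typing import Union

# Hypothetical sentinel marking bytes that are not in the alphabet
# (assumes the alphabet has fewer than 255 characters).
INVALID = 255


def make_to_index_table(chars: str) -> bytes:
    """Build a 256-byte table sending each alphabet character to its index."""
    table = bytearray([INVALID] * 256)
    for index, code in enumerate(chars.encode("ascii")):
        table[code] = index
    return bytes(table)


def to_indices(seq: Union[str, bytes], table: bytes) -> list:
    """Convert a str or bytes sequence into a list of alphabet indices."""
    if isinstance(seq, str):
        seq = seq.encode("ascii")
    # bytes.translate replaces every byte via the table in one C-level pass
    indices = seq.translate(table)
    if INVALID in indices:
        raise ValueError("sequence contains characters not in the alphabet")
    return list(indices)


table = make_to_index_table("TCAG")
print(to_indices("GATTACA", table))  # [3, 2, 0, 0, 2, 1, 2]
```

With numpy available, `np.frombuffer(indices, dtype=np.uint8)` could wrap the translated bytes as an array without copying, instead of building a list.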
- array_to_bytes(seq: ndarray) → bytes
returns seq as a byte string
- as_bytes() → bytes
returns self as a byte string
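The reverse direction, described by array_to_bytes and as_bytes, can use the same trick with a table indexed by position. This is a minimal sketch, assuming an alphabet stored as the byte string `b"TCAG"`. The names `CHARS`, `FROM_INDEX` and `make_from_index_table` are hypothetical. Mapping out-of-range indices to `?` is an assumption, and a list of ints stands in for the uint8 array.

```python
# Hypothetical alphabet stored as bytes, i.e. what as_bytes() would return.
CHARS = b"TCAG"


def make_from_index_table(chars: bytes) -> bytes:
    """Table sending index i to chars[i]; unused slots map to b'?' (assumption)."""
    table = bytearray(b"?" * 256)
    table[: len(chars)] = chars
    return bytes(table)


FROM_INDEX = make_from_index_table(CHARS)


def array_to_bytes(seq) -> bytes:
    """Convert an iterable of ints in range(256), e.g. uint8 indices, to bytes."""
    # bytes(seq) packs the indices into raw bytes; translate maps each to its char
    return bytes(seq).translate(FROM_INDEX)


print(array_to_bytes([3, 2, 0, 0, 2, 1, 2]))  # b'GATTACA'
print(array_to_bytes([3, 2, 0, 0, 2, 1, 2]).decode("ascii"))  # GATTACA
```

Decoding the result as ASCII produces a `str`, which is the return type documented for `from_indices`.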
- convert_seq_array_to(*, alphabet: Self, seq: ndarray, check_valid: bool = True) → ndarray
converts a numpy array with indices from self to other
- Parameters:
- alphabet
alphabet to convert to
- seq
ndarray of uint8 integers
- check_valid
validates that the input and output sequences are valid for self and other, respectively. Validation failure raises an AlphabetError.
- Returns:
- the sequence with the indices of characters in common between self and other swapped from their index in self to their index in other
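The index swap performed by convert_seq_array_to amounts to a lookup from each source index to the target index of the same character. The sketch below uses plain strings as alphabets. `convert_indices` is a hypothetical helper, and the local `AlphabetError` only stands in for the library's exception.

```python
class AlphabetError(ValueError):
    """Local stand-in for the library's AlphabetError (assumption)."""


def convert_indices(src: str, dest: str, seq: list, check_valid: bool = True) -> list:
    """Re-express indices into alphabet src as indices into alphabet dest."""
    if check_valid and any(not 0 <= i < len(src) for i in seq):
        raise AlphabetError("input indices are not valid for the source alphabet")
    # lookup[i] is the position of src[i] in dest, or -1 if dest lacks it
    lookup = [dest.find(char) for char in src]
    result = [lookup[i] for i in seq]
    if check_valid and -1 in result:
        raise AlphabetError("sequence has characters absent from the target alphabet")
    return result


# "GATTACA" encoded with TCAG ordering, converted to ACGT ordering
print(convert_indices("TCAG", "ACGT", [3, 2, 0, 0, 2, 1, 2]))  # [2, 0, 3, 3, 0, 1, 0]
```

When `check_valid=False`, the sketch assumes the input is already valid and skips both checks. That is the trade-off the flag implies: it avoids extra passes over large arrays.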
- count(value, /)
Return number of occurrences of value.
- from_indices(seq: str | bytes | ndarray) → str
- from_indices(seq: str) → str
- from_indices(seq: bytes) → str
- from_indices(seq: ndarray) → str
- classmethod from_rich_dict(data: dict) → Self
returns an instance from a serialised dictionary
- property gap_char: str | None
- property gap_index: int | None
- get_kmer_alphabet(k: int, include_gap: bool = True) → KmerAlphabet
returns kmer alphabet with words of size k
- Parameters:
- k
word size
- include_gap
if True and self.gap_char is set, KmerAlphabet.gap_char is set to self.gap_char * k
Notes
If self.missing_char is set, it is included in the new alphabet as self.missing_char * k.
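A k-mer alphabet over n canonical characters has n**k words, and the index of a word can be computed positionally in base n. The sketch below assumes words are ordered the way `itertools.product` enumerates them, with the gap and missing words appended at the end. The ordering and the helper names `kmer_words` and `kmer_index` are assumptions and do not describe the actual KmerAlphabet layout.

```python
from itertools import product
from typing import Optional


def kmer_words(chars: str, k: int, gap: Optional[str] = None,
               missing: Optional[str] = None, include_gap: bool = True) -> list:
    """All words of length k over chars, plus gap*k and missing*k if set."""
    words = ["".join(word) for word in product(chars, repeat=k)]
    if include_gap and gap:
        words.append(gap * k)
    if missing:
        words.append(missing * k)
    return words


def kmer_index(indices: list, n: int) -> int:
    """Base-n positional value of a word given its per-character indices."""
    value = 0
    for i in indices:
        value = value * n + i
    return value


words = kmer_words("TCAG", 2, gap="-")
print(len(words))  # 17: 4**2 dinucleotides plus "--"
print(words[kmer_index([3, 2], 4)])  # GA
```

Positional indexing lets a sequence of monomer indices be turned into k-mer indices arithmetically, with no dictionary lookups.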
- get_motif_len() → int
- get_subset(motif_subset: Sequence[str | bytes], excluded: bool = False) → Self
Returns a new Alphabet object containing a subset of motifs in self.
Raises an exception if any of the items in the subset are not already in self.
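The following sketches the subsetting rule. It assumes that `excluded=True` means the listed motifs are removed rather than kept, and that the result keeps the order of self. `get_subset` here is a plain function over a tuple of characters, not the method itself.

```python
def get_subset(chars: tuple, motif_subset, excluded: bool = False) -> tuple:
    """Keep (or, if excluded, drop) the motifs in motif_subset, preserving order."""
    unknown = [motif for motif in motif_subset if motif not in chars]
    if unknown:
        # mirrors the documented rule: every requested motif must already exist
        raise ValueError(f"motifs not in alphabet: {unknown}")
    wanted = set(motif_subset)
    # (motif in wanted) != excluded keeps members normally, non-members if excluded
    return tuple(motif for motif in chars if (motif in wanted) != excluded)


dna = ("T", "C", "A", "G")
print(get_subset(dna, ["G", "A"]))  # ('A', 'G')
print(get_subset(dna, ["G", "A"], excluded=True))  # ('T', 'C')
```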
- get_word_alphabet(k: int, include_gap: bool = True) → KmerAlphabet
- index(value, start=0, stop=sys.maxsize, /)
Return first index of value.
Raises ValueError if the value is not present.
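The count and index entries have the same signatures and wording as the methods of Python's built-in tuple, which suggests CharAlphabet inherits them from tuple. Under that assumption, a minimal tuple subclass shows the behaviour. `TupleAlphabet` is hypothetical.

```python
class TupleAlphabet(tuple):
    """Hypothetical stand-in: an alphabet stored as a tuple of characters."""

    def __new__(cls, chars):
        # split a string such as "TCAG" into single-character elements
        return super().__new__(cls, tuple(chars))


alpha = TupleAlphabet("TCAG")
print(alpha.index("A"))  # 2
print(alpha.count("A"))  # 1
try:
    alpha.index("N")
except ValueError:
    print("N is not in the alphabet")
```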
- is_valid(seq: str | bytes | ndarray) → bool
- is_valid(seq: str) → bool
- is_valid(seq: bytes) → bool
- is_valid(seq: ndarray) → bool
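The is_valid overloads accept str, bytes or an index array. One plausible reading is that characters must belong to the alphabet, while integers must be valid indices into it. The sketch below follows that reading with a hypothetical `is_valid` function. Treating any input that is neither str nor bytes as a sequence of indices is an assumption.

```python
def is_valid(chars: str, seq) -> bool:
    """Return True if every element of seq is allowed by the alphabet chars."""
    if isinstance(seq, str):
        return set(seq) <= set(chars)
    if isinstance(seq, bytes):
        # iterating bytes yields ints, so compare against encoded characters
        return set(seq) <= set(chars.encode("ascii"))
    # otherwise assume integer indices, e.g. a uint8 array
    return all(0 <= i < len(chars) for i in seq)


print(is_valid("TCAG", "GATTACA"))  # True
print(is_valid("TCAG", b"GANTACA"))  # False
print(is_valid("TCAG", [0, 3, 4]))  # False
```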
- property missing_char: str | None
- property missing_index: int | None
- property motif_len: int
- property num_canonical: int
- to_indices(seq: str | bytes | ndarray | tuple) → ndarray[int]
- to_indices(seq: tuple) → ndarray[int]
- to_indices(seq: bytes) → ndarray[int]
- to_indices(seq: str) → ndarray[int]
- to_indices(seq: ndarray) → ndarray[int]
- to_json() → str
returns a serialisable string
- to_rich_dict(for_pickle: bool = False) → dict[str, Any]
returns a serialisable dictionary
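Together, to_rich_dict, to_json and from_rich_dict allow a round trip through JSON. The sketch uses a hypothetical `AlphabetRecord` dataclass. Its dictionary keys, including the `"type"` tag, are assumptions and do not describe the library's actual serialised format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class AlphabetRecord:
    """Hypothetical stand-in holding the constructor arguments of an alphabet."""

    chars: str
    gap: Optional[str] = None
    missing: Optional[str] = None

    def to_rich_dict(self) -> dict:
        data = asdict(self)
        data["type"] = "CharAlphabet"  # hypothetical type tag
        return data

    def to_json(self) -> str:
        return json.dumps(self.to_rich_dict())

    @classmethod
    def from_rich_dict(cls, data: dict) -> "AlphabetRecord":
        # drop the type tag, pass the remaining fields to the constructor
        fields = {key: value for key, value in data.items() if key != "type"}
        return cls(**fields)


original = AlphabetRecord("TCAG-", gap="-")
restored = AlphabetRecord.from_rich_dict(json.loads(original.to_json()))
print(restored == original)  # True
```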
- with_gap_motif(gap_char: str = '-', missing_char: str = '?', include_missing: bool = False, gap_as_state: bool = False) → Self
returns new monomer alphabet with gap and missing characters added
- Parameters:
- gap_char
the IUPAC gap character “-”
- missing_char
the IUPAC missing character “?”
- include_missing
if True and self.missing_char is set, the missing character is included in the new alphabet
- gap_as_state
include the gap character as a state in the alphabet; the gap_char attribute is dropped from the resulting alphabet
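The following sketches how these parameters interact, using a hypothetical `SimpleAlphabet` named tuple and `with_gap_motif` function. The gap character is always appended. `include_missing` also appends `missing_char`. `gap_as_state` keeps "-" as an ordinary state while leaving the gap attribute unset. The sketch adds `missing_char` whenever `include_missing` is True. It does not model whether the library also requires `self.missing_char` to be set, or how the real method treats an alphabet that already has gap or missing characters.

```python
from typing import NamedTuple, Optional


class SimpleAlphabet(NamedTuple):
    """Hypothetical stand-in: characters plus optional gap and missing chars."""

    chars: tuple
    gap: Optional[str] = None
    missing: Optional[str] = None


def with_gap_motif(alpha: SimpleAlphabet, gap_char: str = "-",
                   missing_char: str = "?", include_missing: bool = False,
                   gap_as_state: bool = False) -> SimpleAlphabet:
    """Return a new alphabet with gap (and optionally missing) characters added."""
    chars = alpha.chars + (gap_char,)
    missing = None
    if include_missing:
        chars += (missing_char,)
        missing = missing_char
    # with gap_as_state, "-" is just another state, so no gap attribute is set
    gap = None if gap_as_state else gap_char
    return SimpleAlphabet(chars, gap, missing)


dna = SimpleAlphabet(("T", "C", "A", "G"))
print(with_gap_motif(dna))
# SimpleAlphabet(chars=('T', 'C', 'A', 'G', '-'), gap='-', missing=None)
print(with_gap_motif(dna, include_missing=True, gap_as_state=True))
# SimpleAlphabet(chars=('T', 'C', 'A', 'G', '-', '?'), gap=None, missing='?')
```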