AlignedSeqsData

class AlignedSeqsData(*, gapped_seqs: ndarray, names: tuple[str], alphabet: AlphabetABC, ungapped_seqs: dict[str, ndarray] | None = None, gaps: dict[str, ndarray] | None = None, offset: dict[str, int] | None = None, align_len: int | None = None, check: bool = True)

The built-in cogent3 implementation of the container of aligned sequences that underlies an Alignment. Indexing this object returns an AlignedDataView, which can realise the corresponding slice as a string, bytes, or a numpy array, either gapped or ungapped.

Attributes:
align_len
alphabet
names
offset

Returns the offset of each sequence in the Alignment.

Methods

add_seqs(seqs[, force_unique_keys, offset])

Returns a new AlignedSeqsData object with added sequences.

from_names_and_array(*, names, data, alphabet)

Construct an AlignedSeqsData object from a list of names and a numpy array of aligned sequence data.

from_seqs(*, data, alphabet, **kwargs)

Construct an AlignedSeqsData object from a dict of aligned sequences.

from_seqs_and_gaps(*, seqs, gaps, alphabet, ...)

Construct an AlignedSeqsData object from a dict of ungapped sequences and a corresponding dict of gap data.

get_gapped_seq_array(*, seqid[, start, ...])

Return sequence data corresponding to seqid as an array of indices.

get_gapped_seq_bytes(*, seqid[, start, ...])

Return sequence corresponding to seqid as a bytes string.

get_gapped_seq_str(*, seqid[, start, stop, step])

Return sequence corresponding to seqid as a string.

get_positions(names[, start, stop, step])

Returns an array of the selected positions for names.

get_seq_array(*, seqid[, start, stop, step])

Return ungapped sequence corresponding to seqid as an array of indices.

get_seq_bytes(*, seqid[, start, stop, step])

Return ungapped sequence corresponding to seqid as a bytes string.

get_seq_length(seqid)

Return the length of the unaligned sequence for seqid.

get_seq_str(*, seqid[, start, stop, step])

Return ungapped sequence corresponding to seqid as a string.

get_ungapped(name_map[, start, stop, step])

Returns a dictionary of sequence data with no gaps or missing characters and a dictionary with information to construct a new SequenceCollection via make_unaligned_seqs.

to_alphabet(alphabet[, check_valid])

Returns a new AlignedSeqsData object with the same underlying data with a new alphabet.

get_gaps

get_view

Notes

Methods on this object only accept plus-strand start, stop and step indices for selecting segments of data. The object can also return the gap coordinates for a sequence, in the form used by IndelMap.

add_seqs(seqs: dict[str, str | ndarray[int]], force_unique_keys: bool = True, offset: dict[str, int] = None) → AlignedSeqsData

Returns a new AlignedSeqsData object with added sequences.

Parameters:
seqs

dict of sequences to add {name: seq, …}

force_unique_keys

if True, raises ValueError if any sequence names already exist in the collection

property align_len: int
property alphabet: CharAlphabet
classmethod from_names_and_array(*, names: Sequence[str], data: ndarray, alphabet: AlphabetABC)

Construct an AlignedSeqsData object from a list of names and a numpy array of aligned sequence data.

Parameters:
names

list of sequence names

data

numpy array of aligned sequence data

alphabet

alphabet object for the sequences
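The relationship between names and the rows of data can be sketched in plain Python. This sketch assumes, as the parameter descriptions suggest, that row i of the array holds the aligned data for names[i] and that every row spans the full alignment. The encode and names_and_rows helpers and the "TCAG-" alphabet ordering are hypothetical and exist only for illustration. A list of lists stands in for the numpy array.

```python
# Hypothetical gapped DNA alphabet: each character's code is its index here.
# This ordering is an assumption made for illustration only.
ALPHABET = "TCAG-"


def encode(seq: str, alphabet: str = ALPHABET) -> list[int]:
    """Convert a gapped sequence string into a list of alphabet indices."""
    return [alphabet.index(char) for char in seq]


def names_and_rows(names: list[str], rows: list[list[int]]) -> dict[str, list[int]]:
    """Pair each name with its row, mimicking a (num_seqs, align_len) array.

    Row i is assumed to hold the aligned data for names[i].
    """
    if len(names) != len(rows):
        raise ValueError("need exactly one row per name")
    if len({len(row) for row in rows}) > 1:
        raise ValueError("all rows must have the same length (align_len)")
    return dict(zip(names, rows))


names = ["s1", "s2"]
rows = [encode("AC-GT"), encode("ACGGT")]
data = names_and_rows(names, rows)
print(data["s1"])  # [2, 1, 4, 3, 0]
```

In cogent3 the rows would be a single 2D numpy array rather than a list of lists. The two checks shown are the invariants this sketch assumes, not a description of the library's own validation.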

classmethod from_seqs(*, data: dict[str, str | ndarray[int]], alphabet: AlphabetABC, **kwargs)

Construct an AlignedSeqsData object from a dict of aligned sequences.

Parameters:
data

dict of gapped sequences {name: seq, …}; all sequences must be the same length

alphabet

alphabet object for the sequences
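The requirement that every gapped sequence passed to from_seqs has the same length can be illustrated with a small check. check_aligned is a hypothetical helper, not part of cogent3. It returns the shared length, which is what the alignment length would be, and rejects ragged or empty input.

```python
def check_aligned(data: dict[str, str]) -> int:
    """Return the common length of the gapped sequences, or raise ValueError.

    Mirrors the documented requirement that all sequences in ``data``
    are the same length.
    """
    if not data:
        raise ValueError("no sequences provided")
    lengths = {name: len(seq) for name, seq in data.items()}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"sequences differ in length: {lengths}")
    return next(iter(lengths.values()))


aligned = {"s1": "AC-GT", "s2": "ACGGT"}
print(check_aligned(aligned))  # 5
```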

classmethod from_seqs_and_gaps(*, seqs: dict[str, str | bytes | ndarray[int]], gaps: dict[str, ndarray], alphabet: AlphabetABC, **kwargs)

Construct an AlignedSeqsData object from a dict of ungapped sequences and a corresponding dict of gap data.

Parameters:
seqs

dict of ungapped sequences {name: seq, …}

gaps

gap data {name: [[seq gap position, cumulative gap length], …], …}

alphabet

alphabet object for the sequences
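The gap data format above, [[seq gap position, cumulative gap length], …], can be illustrated by splitting a gapped sequence into its ungapped residues and gap records, and then rebuilding it. This is a pure-Python sketch. It assumes that "-" is the gap character, that each record marks a run of gaps inserted before the given ungapped position, and that the second value is the running total of gap characters. split_gaps and join_gaps are hypothetical helpers, not cogent3 functions.

```python
def split_gaps(gapped: str, gap_char: str = "-") -> tuple[str, list[list[int]]]:
    """Split a gapped sequence into (ungapped_seq, gaps).

    Each gaps entry is [seq gap position, cumulative gap length]: the
    ungapped position before which a run of gaps sits, and the total
    number of gap characters up to and including that run.
    """
    ungapped: list[str] = []
    gaps: list[list[int]] = []
    total = 0
    for char in gapped:
        if char == gap_char:
            total += 1
            if gaps and gaps[-1][0] == len(ungapped):
                gaps[-1][1] = total  # extend the current run of gaps
            else:
                gaps.append([len(ungapped), total])  # start a new run
        else:
            ungapped.append(char)
    return "".join(ungapped), gaps


def join_gaps(ungapped: str, gaps: list[list[int]], gap_char: str = "-") -> str:
    """Rebuild the gapped sequence from split_gaps output."""
    parts: list[str] = []
    prev_pos, prev_total = 0, 0
    for pos, total in gaps:
        parts.append(ungapped[prev_pos:pos])            # residues before this run
        parts.append(gap_char * (total - prev_total))   # length of this run
        prev_pos, prev_total = pos, total
    parts.append(ungapped[prev_pos:])                   # trailing residues
    return "".join(parts)


seq, gaps = split_gaps("AC--GT-")
print(seq, gaps)             # ACGT [[2, 2], [4, 3]]
print(join_gaps(seq, gaps))  # AC--GT-
```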

get_gapped_seq_array(*, seqid: str, start: int | None = None, stop: int | None = None, step: int | None = None) → ndarray

Return sequence data corresponding to seqid as an array of indices. start/stop are in alignment coordinates. Includes gaps.

get_gapped_seq_bytes(*, seqid: str, start: int | None = None, stop: int | None = None, step: int | None = None) → bytes

Return sequence corresponding to seqid as a bytes string. start/stop are in alignment coordinates. Includes gaps.

get_gapped_seq_str(*, seqid: str, start: int | None = None, stop: int | None = None, step: int | None = None) → str

Return sequence corresponding to seqid as a string. start/stop are in alignment coordinates. Includes gaps.
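The distinction between alignment coordinates, used by the gapped getters, and sequence coordinates, used by the ungapped getters, can be seen by applying the same start/stop to one aligned row. gapped_slice and ungapped_slice are hypothetical stand-ins operating on a plain string. The "-" gap character is an assumption of this sketch.

```python
from typing import Optional

GAP = "-"  # assumed gap character for this sketch


def gapped_slice(row: str, start: Optional[int] = None, stop: Optional[int] = None,
                 step: Optional[int] = None) -> str:
    """Alignment coordinates: index alignment columns, keeping gaps."""
    return row[start:stop:step]


def ungapped_slice(row: str, start: Optional[int] = None, stop: Optional[int] = None,
                   step: Optional[int] = None) -> str:
    """Sequence coordinates: remove gaps first, then index residues."""
    return row.replace(GAP, "")[start:stop:step]


row = "AC--GT-"
print(gapped_slice(row, 1, 5))    # C--G
print(ungapped_slice(row, 1, 5))  # CGT
```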

get_gaps(seqid: str) → ndarray
get_positions(names: Sequence[str], start: int | None = None, stop: int | None = None, step: int | None = None) → ndarray

Returns an array of the selected positions for names.
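The source does not state the shape of the returned array. A plausible reading is one row per requested name, in the order given, restricted to the start:stop:step columns in alignment coordinates. Under that assumption the selection can be sketched with a hypothetical select_positions helper, using a dict of strings in place of the numpy data.

```python
from typing import Optional


def select_positions(rows: dict[str, str], names: list[str], start: Optional[int] = None,
                     stop: Optional[int] = None, step: Optional[int] = None) -> list[str]:
    """Return one row per name (in the order given), limited to columns start:stop:step."""
    return [rows[name][start:stop:step] for name in names]


rows = {"s1": "AC-GT", "s2": "ACGGT", "s3": "A--GT"}
print(select_positions(rows, ["s3", "s1"], start=1, stop=4))  # ['--G', 'C-G']
```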

get_seq_array(*, seqid: str, start: int | None = None, stop: int | None = None, step: int | None = None) → ndarray

Return ungapped sequence corresponding to seqid as an array of indices. start/stop are in sequence coordinates. Excludes gaps.

get_seq_bytes(*, seqid: str, start: int | None = None, stop: int | None = None, step: int | None = None) → bytes

Return ungapped sequence corresponding to seqid as a bytes string. start/stop are in sequence coordinates. Excludes gaps.

get_seq_length(seqid: str) → int

Return the length of the unaligned sequence for seqid.
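Given gap data in the [seq gap position, cumulative gap length] form used by from_seqs_and_gaps, the unaligned length follows from simple arithmetic: the alignment length minus the total number of gap characters, which is the last cumulative value. seq_length is a hypothetical helper illustrating that relationship. It is not necessarily how cogent3 computes the value.

```python
def seq_length(align_len: int, gaps: list[list[int]]) -> int:
    """Ungapped length = alignment length - total number of gap characters.

    With [seq gap position, cumulative gap length] records, the last
    record's second value is the total gap count.
    """
    total_gaps = gaps[-1][1] if gaps else 0
    return align_len - total_gaps


# "AC--GT-" has 7 columns and 3 gap characters.
print(seq_length(7, [[2, 2], [4, 3]]))  # 4
```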

get_seq_str(*, seqid: str, start: int | None = None, stop: int | None = None, step: int | None = None) → str

Return ungapped sequence corresponding to seqid as a string. start/stop are in sequence coordinates. Excludes gaps.
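To relate a sequence-coordinate position, as accepted by the ungapped getters, to the alignment column it occupies, the gap records can be used directly. The position shifts right by the cumulative gap length of the last gap run inserted at or before it. seq_to_align_index is a hypothetical helper. It assumes the gap records are sorted by position and use the [seq gap position, cumulative gap length] form.

```python
def seq_to_align_index(seq_pos: int, gaps: list[list[int]]) -> int:
    """Map an ungapped (sequence) position to its alignment column."""
    shift = 0
    for gap_pos, cum_len in gaps:
        if gap_pos > seq_pos:
            break          # this and later gap runs lie after seq_pos
        shift = cum_len    # all gaps up to this run precede seq_pos
    return seq_pos + shift


gaps = [[2, 2], [4, 3]]  # gap records for "AC--GT-"
print([seq_to_align_index(i, gaps) for i in range(4)])  # [0, 1, 4, 5]
```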

get_ungapped(name_map: dict[str, str], start: int | None = None, stop: int | None = None, step: int | None = None) → tuple[dict, dict]

Returns a dictionary of sequence data with no gaps or missing characters and a dictionary with information to construct a new SequenceCollection via make_unaligned_seqs.

Parameters:
name_map

A dict of {aln_name: data_name, …} indicating the mapping between names in the encompassing Alignment (aln_name) and the names in self (data_name).

start

The alignment starting position.

stop

The alignment stopping position.

step

The step size.

Returns:
tuple

A tuple containing the following:

  • seqs (dict): A dictionary of {name: seq, …} where the sequences have no gaps or missing characters.

  • kwargs (dict): A dictionary of keyword arguments for make_unaligned_seqs, e.g., {“offset”: self.offset, “name_map”: name_map}.
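The get_ungapped behaviour described above, which slices in alignment coordinates, drops gap and missing characters, and returns the sequences together with keyword arguments, can be sketched in plain Python. Several details are assumptions of this sketch rather than documented facts. ungapped_seqs is a hypothetical helper. "-" and "?" are taken as the gap and missing characters. The output dict is keyed by data_name, and only name_map is placed in kwargs, with the offset entry omitted.

```python
from typing import Optional

GAP = "-"      # assumed gap character
MISSING = "?"  # assumed missing-data character


def ungapped_seqs(data: dict[str, str], name_map: dict[str, str],
                  start: Optional[int] = None, stop: Optional[int] = None,
                  step: Optional[int] = None) -> tuple[dict, dict]:
    """Slice each row in alignment coordinates, then drop gap/missing characters.

    data maps data_name -> gapped row; name_map maps aln_name -> data_name.
    """
    seqs = {}
    for data_name in name_map.values():
        segment = data[data_name][start:stop:step]
        seqs[data_name] = "".join(c for c in segment if c not in (GAP, MISSING))
    kwargs = {"name_map": name_map}  # cogent3 also passes offset; omitted here
    return seqs, kwargs


data = {"d1": "AC-?GT", "d2": "AT-CGT"}
name_map = {"human": "d1", "mouse": "d2"}
seqs, kwargs = ungapped_seqs(data, name_map, start=0, stop=5)
print(seqs)  # {'d1': 'ACG', 'd2': 'ATCG'}
```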

get_view(seqid: str)
get_view(seqid: int)
property names: tuple[str]
property offset: dict[str, int]

Returns the offset of each sequence in the Alignment.

to_alphabet(alphabet: AlphabetABC, check_valid: bool = True)

Returns a new AlignedSeqsData object with the same underlying data with a new alphabet.
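One way to read "the same underlying data with a new alphabet" is that the index array is kept unchanged and only its interpretation changes. For example, DNA and RNA alphabets may share an ordering, so index 0 means T in one and U in the other. The sketch below follows that reading. The "TCAG"/"UCAG" orderings, the decode and with_new_alphabet helpers, and the interpretation of check_valid as "every index must fit in the new alphabet" are all assumptions for illustration, not cogent3's implementation.

```python
def decode(indices: list[int], alphabet: str) -> str:
    """Turn alphabet indices back into characters."""
    return "".join(alphabet[i] for i in indices)


def with_new_alphabet(indices: list[int], new_alphabet: str,
                      check_valid: bool = True) -> tuple[list[int], str]:
    """Keep the index data unchanged and pair it with a new alphabet.

    With check_valid (as assumed here), every index must be a valid
    position in the new alphabet.
    """
    if check_valid and any(not 0 <= i < len(new_alphabet) for i in indices):
        raise ValueError("index data contains codes outside the new alphabet")
    return list(indices), new_alphabet


dna = "TCAG"  # assumed ordering
rna = "UCAG"  # assumed ordering
data = [2, 1, 3, 0]
print(decode(data, dna))  # ACGT
new_data, new_alpha = with_new_alphabet(data, rna)
print(decode(new_data, new_alpha))  # ACGU
```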