AlignedSeqsData#

class AlignedSeqsData(*, gapped_seqs: ndarray, names: tuple[str], alphabet: AlphabetABC, ungapped_seqs: dict[str, ndarray] | None = None, gaps: dict[str, ndarray] | None = None, offset: dict[str, int] | None = None, align_len: int | None = None, check: bool = True, reversed_seqs: set[str] | None = None)#

The builtin cogent3 implementation of a container of aligned sequences underlying an Alignment. Indexing this object returns an AlignedDataView which can realise the corresponding slice as a string, bytes, or numpy array, gapped or ungapped.

Attributes:

align_len
alphabet
names
offset: returns the offset of each sequence in the Alignment
reversed_seqs

Methods

`add_seqs`(seqs[, force_unique_keys, offset])	Returns a new AlignedSeqsData object with added sequences.
`copy`(**kwargs)	shallow copy of self
`from_names_and_array`(*, names, data, alphabet)	Construct an AlignedSeqsData object from a list of names and a numpy array of aligned sequence data.
`from_seqs`(*, data, alphabet, **kwargs)	Construct an AlignedSeqsData object from a dict of aligned sequences
`from_seqs_and_gaps`(*, seqs, gaps, alphabet, ...)	Construct an AlignedSeqsData object from a dict of ungapped sequences and a corresponding dict of gap data.
`get_gapped_seq_array`(*, seqid[, start, ...])	Return sequence data corresponding to seqid as an array of indices.
`get_gapped_seq_bytes`(*, seqid[, start, ...])	Return sequence corresponding to seqid as a bytes string.
`get_gapped_seq_str`(*, seqid[, start, stop, step])	Return sequence corresponding to seqid as a string.
`get_positions`(names[, start, stop, step])	returns an array of the selected positions for names.
`get_seq_array`(*, seqid[, start, stop, step])	Return ungapped sequence corresponding to seqid as an array of indices.
`get_seq_bytes`(*, seqid[, start, stop, step])	Return ungapped sequence corresponding to seqid as a bytes string.
`get_seq_length`(seqid)	return length of the unaligned seq for seqid
`get_seq_str`(*, seqid[, start, stop, step])	Return ungapped sequence corresponding to seqid as a string.
`get_ungapped`(name_map[, start, stop, step])	Returns a dictionary of sequence data with no gaps or missing characters and a dictionary with information to construct a new SequenceCollection via make_unaligned_seqs.
`to_alphabet`(alphabet[, check_valid])	Returns a new AlignedSeqsData object with the same underlying data with a new alphabet.

get_gaps
get_view

Notes

Methods on this object only accept plus strand start, stop and step indices for selecting segments of data. It can return the gap coordinates for a sequence as used by IndelMap.

add_seqs(seqs: dict[str, str | ndarray[int]], force_unique_keys: bool = True, offset: dict[str, int] | None = None) → AlignedSeqsData#

Returns a new AlignedSeqsData object with added sequences.

Parameters:

seqs: dict of sequences to add {name: seq, …}
force_unique_keys: if True, raises ValueError if any sequence names already exist in the collection

property align_len: int#

property alphabet: CharAlphabet#

copy(**kwargs) → Self#

shallow copy of self

Notes

kwargs are passed to the constructor and will override existing values

classmethod from_names_and_array(*, names: Sequence[str], data: ndarray, alphabet: AlphabetABC) → Self#

Construct an AlignedSeqsData object from a list of names and a numpy array of aligned sequence data.

Parameters:

names: list of sequence names
data: numpy array of aligned sequence data
alphabet: alphabet object for the sequences
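
As a rough illustration of what "a numpy array of aligned sequence data" can look like, the sketch below encodes equal-length gapped strings as rows of integer indices into an alphabet. It assumes the array holds alphabet indices, as the get_gapped_seq_array description suggests. The character ordering `CHARS` and the helpers `encode` and `decode` are hypothetical, and plain Python lists stand in for the numpy array and the cogent3 alphabet object that the real method expects.

```python
# Hypothetical character ordering; a real cogent3 alphabet defines its own.
CHARS = "TCAG-"
TO_INDEX = {char: index for index, char in enumerate(CHARS)}


def encode(seq: str) -> list[int]:
    """Convert a gapped sequence string into a list of alphabet indices."""
    return [TO_INDEX[char] for char in seq]


def decode(indices: list[int]) -> str:
    """Convert a list of alphabet indices back into a string."""
    return "".join(CHARS[index] for index in indices)


names = ("seq1", "seq2")
aligned = {"seq1": "AC-GT", "seq2": "A-CGT"}

# One row per name, one column per alignment position.
data = [encode(aligned[name]) for name in names]

# Every row must have the same length for the data to form a 2D array.
assert len({len(row) for row in data}) == 1

# Decoding each row recovers the original gapped strings.
round_trip = {name: decode(row) for name, row in zip(names, data)}
```

The row order follows `names`, so the two arguments must be kept in step with each other.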

classmethod from_seqs(*, data: dict[str, str | ndarray[int]], alphabet: AlphabetABC, **kwargs) → Self#

Construct an AlignedSeqsData object from a dict of aligned sequences

Parameters:

data: dict of gapped sequences {name: seq, …}. sequences must all be the same length
alphabet: alphabet object for the sequences

classmethod from_seqs_and_gaps(*, seqs: dict[str, str | bytes | ndarray[int]], gaps: dict[str, ndarray], alphabet: AlphabetABC, **kwargs) → Self#

Construct an AlignedSeqsData object from a dict of ungapped sequences and a corresponding dict of gap data.

Parameters:

seqs: dict of ungapped sequences {name: seq, …}
gaps: gap data {name: [[seq gap position, cumulative gap length], …], …}
alphabet: alphabet object for the sequences
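
The gap data format above can be illustrated with a small pure-Python sketch. It reflects one reading of "[seq gap position, cumulative gap length]": each record gives the position in the ungapped sequence before which a gap run is inserted, plus the total number of gap characters up to and including that run. The helpers `split_gaps` and `insert_gaps` are hypothetical and are not part of cogent3; the real method takes numpy arrays for the gap data.

```python
def split_gaps(gapped: str, gap_char: str = "-") -> tuple[str, list[list[int]]]:
    """Split a gapped string into its ungapped sequence and gap records.

    Each record is [sequence position of the gap, cumulative gap length].
    """
    ungapped = []
    gaps = []
    cumulative = 0
    in_gap = False
    for char in gapped:
        if char == gap_char:
            cumulative += 1
            if in_gap:
                # Extend the current gap run.
                gaps[-1][1] = cumulative
            else:
                # A new gap run starts before the next ungapped position.
                gaps.append([len(ungapped), cumulative])
                in_gap = True
        else:
            ungapped.append(char)
            in_gap = False
    return "".join(ungapped), gaps


def insert_gaps(ungapped: str, gaps: list[list[int]], gap_char: str = "-") -> str:
    """Rebuild the gapped string from the ungapped sequence and gap records."""
    parts = []
    prev_pos = 0
    prev_cumulative = 0
    for pos, cumulative in gaps:
        parts.append(ungapped[prev_pos:pos])
        # The run length is the increase in cumulative gap length.
        parts.append(gap_char * (cumulative - prev_cumulative))
        prev_pos, prev_cumulative = pos, cumulative
    parts.append(ungapped[prev_pos:])
    return "".join(parts)


seq, gaps = split_gaps("A--CG-T")
rebuilt = insert_gaps(seq, gaps)
```

Under this reading, "A--CG-T" splits into the ungapped sequence "ACGT" and the records [[1, 2], [3, 3]]: a run of two gaps before sequence position 1, then one more gap before position 3.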

get_gapped_seq_array(*, seqid: str, start: int | None = None, stop: int | None = None, step: int | None = None) → ndarray#: Return sequence data corresponding to seqid as an array of indices. start/stop are in alignment coordinates. Includes gaps.

get_gapped_seq_bytes(*, seqid: str, start: int | None = None, stop: int | None = None, step: int | None = None) → bytes#: Return sequence corresponding to seqid as a bytes string. start/stop are in alignment coordinates. Includes gaps.

get_gapped_seq_str(*, seqid: str, start: int | None = None, stop: int | None = None, step: int | None = None) → str#: Return sequence corresponding to seqid as a string. start/stop are in alignment coordinates. Includes gaps.

get_gaps(seqid: str) → ndarray#

get_positions(names: Sequence[str], start: int | None = None, stop: int | None = None, step: int | None = None) → ndarray#: returns an array of the selected positions for names.

get_seq_array(*, seqid: str, start: int | None = None, stop: int | None = None, step: int | None = None) → ndarray#

Return ungapped sequence corresponding to seqid as an array of indices.

Notes

Assumes start/stop are in sequence coordinates. If seqid is in reversed_seqs, that sequence will be in plus strand orientation. It is the client code's responsibility to ensure the coordinates are consistent with that.

get_seq_bytes(*, seqid: str, start: int | None = None, stop: int | None = None, step: int | None = None) → bytes#: Return ungapped sequence corresponding to seqid as a bytes string. start/stop are in sequence coordinates. Excludes gaps.

get_seq_length(seqid: str) → int#: return length of the unaligned seq for seqid

get_seq_str(*, seqid: str, start: int | None = None, stop: int | None = None, step: int | None = None) → str#: Return ungapped sequence corresponding to seqid as a string. start/stop are in sequence coordinates. Excludes gaps.
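
The gapped getters take start/stop in alignment coordinates, while the ungapped getters take them in sequence coordinates. The difference can be sketched with plain strings, without using cogent3 at all. The helper `seq_to_align_index` is hypothetical and only shows how the two coordinate systems relate.

```python
gapped = "A--CG-T"
ungapped = gapped.replace("-", "")

# Alignment coordinates: positions index columns, so gaps are included.
alignment_slice = gapped[1:5]

# Sequence coordinates: positions index residues only, so gaps are excluded.
sequence_slice = ungapped[1:3]


def seq_to_align_index(gapped: str, seq_index: int, gap_char: str = "-") -> int:
    """Return the alignment column holding the residue at seq_index."""
    seen = -1
    for column, char in enumerate(gapped):
        if char != gap_char:
            seen += 1
            if seen == seq_index:
                return column
    raise IndexError(f"sequence index {seq_index} out of range")


# Residue 1 of the ungapped sequence ("C") sits in alignment column 3.
column_of_c = seq_to_align_index(gapped, 1)
```

The same numeric range therefore selects different data depending on which family of getters is called.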

get_ungapped(name_map: dict[str, str], start: int | None = None, stop: int | None = None, step: int | None = None) → tuple[dict, dict]#

Returns a dictionary of sequence data with no gaps or missing characters and a dictionary with information to construct a new SequenceCollection via make_unaligned_seqs.

Parameters:

name_map: A dict of {aln_name: data_name, …} indicating the mapping between names in the encompassing Alignment (aln_name) and the names in self (data_name).
start: The alignment starting position.
stop: The alignment stopping position.
step: The step size.

Returns:

tuple

A tuple containing the following:

seqs (dict): A dictionary of {name: seq, …} where the sequences have no gaps or missing characters.
kwargs (dict): A dictionary of keyword arguments for make_unaligned_seqs, e.g., {"offset": self.offset, "name_map": name_map}.

get_view(seqid: str, slice_record: SliceRecord | None = None) → AlignedDataView#
get_view(seqid: int)

property names: tuple[str]#

property offset: dict[str, int]#: returns the offset of each sequence in the Alignment

property reversed_seqs: frozenset#

to_alphabet(alphabet: AlphabetABC, check_valid: bool = True) → Self#: Returns a new AlignedSeqsData object with the same underlying data with a new alphabet.