AlignedSeqsData
- class AlignedSeqsData(*, gapped_seqs: ndarray, names: tuple[str], alphabet: AlphabetABC, ungapped_seqs: dict[str, ndarray] | None = None, gaps: dict[str, ndarray] | None = None, offset: dict[str, int] | None = None, align_len: int | None = None, check: bool = True)
The builtin cogent3 implementation of a container of aligned sequences underlying an Alignment. Indexing this object returns an AlignedDataView, which can realise the corresponding slice as a string, bytes, or numpy array, gapped or ungapped.
- Attributes:
- align_len
- alphabet
- names
- offset
returns the offset of each sequence in the Alignment
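To make the container idea concrete, here is a minimal toy model, assuming gapped sequences are stored as lists of alphabet indices and that "-" is the gap character. The class `ToyAlignedSeqs`, its methods, and the alphabet string `ALPHABET` are hypothetical illustrations, not cogent3's implementation; they only show how one stored gapped row can be realised as a gapped string, an ungapped string, or bytes.

```python
# Hypothetical toy model of an aligned-sequence container; NOT cogent3's
# AlignedSeqsData. Gapped sequences are stored as lists of alphabet indices
# and realised as str or bytes, gapped or ungapped, on request.
ALPHABET = "TCAG-"  # assumption: "-" is the gap character, at index 4
GAP_INDEX = ALPHABET.index("-")


def encode(seq: str) -> list[int]:
    """Convert a gapped string into a list of alphabet indices."""
    return [ALPHABET.index(char) for char in seq]


def decode(indices: list[int]) -> str:
    """Convert a list of alphabet indices back into a string."""
    return "".join(ALPHABET[i] for i in indices)


class ToyAlignedSeqs:
    def __init__(self, seqs: dict[str, str]) -> None:
        lengths = {len(s) for s in seqs.values()}
        if len(lengths) != 1:
            raise ValueError("aligned sequences must all be the same length")
        self.names = tuple(seqs)
        self.align_len = lengths.pop()
        self._gapped = {name: encode(seq) for name, seq in seqs.items()}

    def gapped_str(self, name: str) -> str:
        return decode(self._gapped[name])

    def ungapped_str(self, name: str) -> str:
        # drop gap indices before decoding
        return decode([i for i in self._gapped[name] if i != GAP_INDEX])

    def ungapped_bytes(self, name: str) -> bytes:
        return self.ungapped_str(name).encode("ascii")


data = ToyAlignedSeqs({"s1": "AC--GT", "s2": "ACTTGT"})
print(data.align_len)  # 6
print(data.gapped_str("s1"))  # AC--GT
print(data.ungapped_str("s1"))  # ACGT
print(data.ungapped_bytes("s1"))  # b'ACGT'
```

Storing one encoded gapped row per name means every realisation (str, bytes, gapped, ungapped) is derived on demand rather than stored separately.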
Methods
- add_seqs(seqs[, force_unique_keys, offset]): Returns a new AlignedSeqsData object with added sequences.
- from_names_and_array(*, names, data, alphabet): Construct an AlignedSeqsData object from a list of names and a numpy array of aligned sequence data.
- from_seqs(*, data, alphabet, **kwargs): Construct an AlignedSeqsData object from a dict of aligned sequences.
- from_seqs_and_gaps(*, seqs, gaps, alphabet, ...): Construct an AlignedSeqsData object from a dict of ungapped sequences and a corresponding dict of gap data.
- get_gapped_seq_array(*, seqid[, start, ...]): Return sequence data corresponding to seqid as an array of indices.
- get_gapped_seq_bytes(*, seqid[, start, ...]): Return sequence corresponding to seqid as a bytes string.
- get_gapped_seq_str(*, seqid[, start, stop, step]): Return sequence corresponding to seqid as a string.
- get_positions(names[, start, stop, step]): Returns an array of the selected positions for names.
- get_seq_array(*, seqid[, start, stop, step]): Return ungapped sequence corresponding to seqid as an array of indices.
- get_seq_bytes(*, seqid[, start, stop, step]): Return ungapped sequence corresponding to seqid as a bytes string.
- get_seq_length(seqid): Return the length of the unaligned sequence for seqid.
- get_seq_str(*, seqid[, start, stop, step]): Return ungapped sequence corresponding to seqid as a string.
- get_ungapped(name_map[, start, stop, step]): Returns a dictionary of sequence data with no gaps or missing characters and a dictionary with information to construct a new SequenceCollection via make_unaligned_seqs.
- to_alphabet(alphabet[, check_valid]): Returns a new AlignedSeqsData object with the same underlying data and a new alphabet.
- get_gaps
- get_view
Notes
Methods on this object only accept plus strand start, stop and step indices for selecting segments of data. It can return the gap coordinates for a sequence as used by IndelMap.
- add_seqs(seqs: dict[str, str | ndarray[int]], force_unique_keys: bool = True, offset: dict[str, int] = None) → AlignedSeqsData
Returns a new AlignedSeqsData object with added sequences.
- Parameters:
- seqs
dict of sequences to add {name: seq, …}
- force_unique_keys
if True, raises ValueError if any sequence names already exist in the collection
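The contract described above (a new object is returned, and duplicate names raise ValueError when force_unique_keys is True) can be sketched on plain dicts. The function below is a hypothetical illustration, not cogent3 code; the length check assumes added sequences must match the existing alignment length, which the parameter list does not state explicitly.

```python
# Hypothetical sketch of add_seqs semantics on plain dicts; NOT cogent3 code.
# A new dict is returned; the input mapping is never mutated.
def add_seqs(
    existing: dict[str, str],
    new: dict[str, str],
    force_unique_keys: bool = True,
) -> dict[str, str]:
    if force_unique_keys:
        duplicates = existing.keys() & new.keys()
        if duplicates:
            raise ValueError(f"names already present: {sorted(duplicates)}")
    # assumption: every sequence, old or new, must share one alignment length
    lengths = {len(s) for s in existing.values()} | {len(s) for s in new.values()}
    if len(lengths) > 1:
        raise ValueError("added sequences must match the alignment length")
    # with force_unique_keys=False, this sketch lets new entries replace old ones
    return {**existing, **new}


original = {"s1": "AC--GT", "s2": "ACTTGT"}
extended = add_seqs(original, {"s3": "A-TTGT"})
print(sorted(extended))  # ['s1', 's2', 's3']
print(sorted(original))  # ['s1', 's2']
```

Returning a fresh mapping mirrors the documented behaviour of producing a new AlignedSeqsData object instead of modifying the existing one.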
- property align_len: int
- property alphabet: CharAlphabet
- classmethod from_names_and_array(*, names: Sequence[str], data: ndarray, alphabet: AlphabetABC)
Construct an AlignedSeqsData object from a list of names and a numpy array of aligned sequence data.
- Parameters:
- names
list of sequence names
- data
numpy array of aligned sequence data
- alphabet
alphabet object for the sequences
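A minimal sketch of pairing names with rows, assuming one row of the array per name in the same order and a list of lists standing in for a 2-D numpy array. The function name mirrors the classmethod for readability only; it is hypothetical and performs just the checks shown.

```python
# Hypothetical sketch: pair each name with one row of a 2-D array of alphabet
# indices (a list of lists stands in for a numpy array). NOT cogent3 code.
def from_names_and_array(
    names: list[str], data: list[list[int]]
) -> dict[str, list[int]]:
    if len(names) != len(data):
        raise ValueError("need exactly one row of data per name")
    if len(set(names)) != len(names):
        raise ValueError("names must be unique")
    # every row of an alignment spans the same number of columns
    if len({len(row) for row in data}) > 1:
        raise ValueError("all rows must have the same alignment length")
    return dict(zip(names, data))


rows = [[2, 1, 4, 4, 3, 0], [2, 1, 0, 0, 3, 0]]
seqs = from_names_and_array(["s1", "s2"], rows)
print(seqs["s2"])  # [2, 1, 0, 0, 3, 0]
```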
- classmethod from_seqs(*, data: dict[str, str | ndarray[int]], alphabet: AlphabetABC, **kwargs)
Construct an AlignedSeqsData object from a dict of aligned sequences.
- Parameters:
- data
dict of gapped sequences {name: seq, …}. Sequences must all be the same length.
- alphabet
alphabet object for the sequences
- classmethod from_seqs_and_gaps(*, seqs: dict[str, str | bytes | ndarray[int]], gaps: dict[str, ndarray], alphabet: AlphabetABC, **kwargs)
Construct an AlignedSeqsData object from a dict of ungapped sequences and a corresponding dict of gap data.
- Parameters:
- seqs
dict of ungapped sequences {name: seq, …}
- gaps
gap data {name: [[seq gap position, cumulative gap length], …], …}
- alphabet
alphabet object for the sequences
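The gap data format [[seq gap position, cumulative gap length], …] can be illustrated by converting a gapped string into an ungapped sequence plus gap data and back again. Each entry records where, in ungapped sequence coordinates, a gap run is inserted and the running total of gap characters up to and including that run. The helpers `split_gaps` and `join_gaps` are hypothetical and assume "-" is the gap character.

```python
# Hypothetical sketch of the gap encoding described above:
# [[seq gap position, cumulative gap length], ...]. NOT cogent3 code.
GAP = "-"


def split_gaps(gapped: str) -> tuple[str, list[list[int]]]:
    """Return the ungapped sequence and its gap data."""
    ungapped: list[str] = []
    gaps: list[list[int]] = []
    cumulative = 0
    for char in gapped:
        if char == GAP:
            cumulative += 1
            seq_pos = len(ungapped)
            if gaps and gaps[-1][0] == seq_pos:
                # extend the gap run that starts at this sequence position
                gaps[-1][1] = cumulative
            else:
                gaps.append([seq_pos, cumulative])
        else:
            ungapped.append(char)
    return "".join(ungapped), gaps


def join_gaps(ungapped: str, gaps: list[list[int]]) -> str:
    """Inverse of split_gaps: re-insert gaps into an ungapped sequence."""
    pieces: list[str] = []
    prev_pos = 0
    prev_cumulative = 0
    for seq_pos, cumulative in gaps:
        pieces.append(ungapped[prev_pos:seq_pos])
        # this run's length is the increase in the cumulative total
        pieces.append(GAP * (cumulative - prev_cumulative))
        prev_pos, prev_cumulative = seq_pos, cumulative
    pieces.append(ungapped[prev_pos:])
    return "".join(pieces)


seq, gaps = split_gaps("AC--GT-")
print(seq, gaps)  # ACGT [[2, 2], [4, 3]]
print(join_gaps(seq, gaps))  # AC--GT-
```

Storing cumulative rather than per-run lengths means the total number of gaps is simply the last entry's second value.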
- get_gapped_seq_array(*, seqid: str, start: int | None = None, stop: int | None = None, step: int | None = None) → ndarray
Return sequence data corresponding to seqid as an array of indices. start/stop are in alignment coordinates. Includes gaps.
- get_gapped_seq_bytes(*, seqid: str, start: int | None = None, stop: int | None = None, step: int | None = None) → bytes
Return sequence corresponding to seqid as a bytes string. start/stop are in alignment coordinates. Includes gaps.
- get_gapped_seq_str(*, seqid: str, start: int | None = None, stop: int | None = None, step: int | None = None) → str
Return sequence corresponding to seqid as a string. start/stop are in alignment coordinates. Includes gaps.
- get_gaps(seqid: str) → ndarray
- get_positions(names: Sequence[str], start: int | None = None, stop: int | None = None, step: int | None = None) → ndarray
Returns an array of the selected positions for names.
- get_seq_array(*, seqid: str, start: int | None = None, stop: int | None = None, step: int | None = None) → ndarray
Return ungapped sequence corresponding to seqid as an array of indices. start/stop are in sequence coordinates. Excludes gaps.
- get_seq_bytes(*, seqid: str, start: int | None = None, stop: int | None = None, step: int | None = None) → bytes
Return ungapped sequence corresponding to seqid as a bytes string. start/stop are in sequence coordinates. Excludes gaps.
- get_seq_length(seqid: str) → int
Return the length of the unaligned sequence for seqid.
- get_seq_str(*, seqid: str, start: int | None = None, stop: int | None = None, step: int | None = None) → str
Return ungapped sequence corresponding to seqid as a string. start/stop are in sequence coordinates. Excludes gaps.
- get_ungapped(name_map: dict[str, str], start: int | None = None, stop: int | None = None, step: int | None = None) → tuple[dict, dict]
Returns a dictionary of sequence data with no gaps or missing characters and a dictionary with information to construct a new SequenceCollection via make_unaligned_seqs.
- Parameters:
- name_map
A dict of {aln_name: data_name, …} indicating the mapping between names in the encompassing Alignment (aln_name) and the names in self (data_name).
- start
The alignment starting position.
- stop
The alignment stopping position.
- step
The step size.
- Returns:
- tuple
A tuple containing the following:
- seqs (dict): A dictionary of {name: seq, …} where the sequences have no gaps or missing characters.
- kwargs (dict): A dictionary of keyword arguments for make_unaligned_seqs, e.g., {"offset": self.offset, "name_map": name_map}.
- get_view(seqid: str)
- get_view(seqid: int)
- property names: tuple[str]
- property offset: dict[str, int]
returns the offset of each sequence in the Alignment
- to_alphabet(alphabet: AlphabetABC, check_valid: bool = True)
Returns a new AlignedSeqsData object with the same underlying data and a new alphabet.
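One plausible reading of "same underlying data with a new alphabet" is that the index arrays are reused and only their interpretation changes, for example between DNA and RNA alphabets of equal size. The sketch below assumes that reading; `to_alphabet` and `decode` here are hypothetical helpers, and the check_valid branch simply rejects indices the new alphabet cannot represent.

```python
# Hypothetical sketch of re-labelling index data with a new alphabet; NOT
# cogent3 code. The index lists are reused as-is; only the alphabet changes.
def to_alphabet(
    data: dict[str, list[int]], new_alphabet: str, check_valid: bool = True
) -> tuple[dict[str, list[int]], str]:
    if check_valid:
        for name, indices in data.items():
            if any(not 0 <= i < len(new_alphabet) for i in indices):
                raise ValueError(f"{name!r} has indices outside the new alphabet")
    return data, new_alphabet


def decode(indices: list[int], alphabet: str) -> str:
    return "".join(alphabet[i] for i in indices)


dna = "TCAG-"  # assumed orderings, for illustration only
rna = "UCAG-"
data = {"s1": [2, 1, 4, 4, 3, 0]}
same_data, alphabet = to_alphabet(data, rna)
print(decode(same_data["s1"], dna))  # AC--GT
print(decode(same_data["s1"], alphabet))  # AC--GU
```

Reusing the indices avoids copying sequence data, which is why validating them against the new alphabet is the main cost when check_valid is True.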