AlignedSeqsData
- class AlignedSeqsData(*, gapped_seqs: ndarray, names: tuple[str], alphabet: CharAlphabet, ungapped_seqs: dict[str, ndarray] | None = None, gaps: dict[str, ndarray] | None = None, offset: dict[str, int] | None = None, align_len: int | None = None, check: bool = True, reversed_seqs: set[str] | None = None)
The builtin cogent3 implementation of aligned sequences storage underlying an Alignment. Indexing this object returns an AlignedDataView, which can realise the corresponding slice as a string, bytes, or numpy array, gapped or ungapped.
- Attributes:
align_len
Return the length of the alignment.
alphabet
the character alphabet for validating, encoding, decoding sequences
names
returns the names of the sequences in the storage
offset
returns the offset of each sequence in the Alignment
reversed_seqs
names of sequences that are reverse complemented
Methods
add_seqs(seqs[, force_unique_keys, offset])
Returns a new AlignedSeqsData object with added sequences.
copy(**kwargs)
shallow copy of self
from_names_and_array(*, names, data, alphabet)
Construct an AlignedSeqsData object from a list of names and a numpy array of aligned sequence data.
from_seqs(*, data, alphabet, **kwargs)
Construct an AlignedSeqsData object from a dict of aligned sequences
from_seqs_and_gaps(*, seqs, gaps, alphabet, ...)
Construct an AlignedSeqsData object from a dict of ungapped sequences and a corresponding dict of gap data.
get_gapped_seq_array(*, seqid[, start, ...])
Return sequence data corresponding to seqid as an array of indices.
get_gapped_seq_bytes(*, seqid[, start, ...])
Return sequence corresponding to seqid as a bytes string.
get_gapped_seq_str(*, seqid[, start, stop, step])
Return sequence corresponding to seqid as a string.
get_gaps(seqid)
returns the gap data for seqid
get_positions(names[, start, stop, step])
returns an array of the selected positions for names.
get_seq_array(*, seqid[, start, stop, step])
Return ungapped sequence corresponding to seqid as an array of indices.
get_seq_bytes(*, seqid[, start, stop, step])
Return ungapped sequence corresponding to seqid as a bytes string.
get_seq_length(seqid)
return length of the unaligned seq for seqid
get_seq_str(*, seqid[, start, stop, step])
Return ungapped sequence corresponding to seqid as a string.
get_ungapped(name_map[, start, stop, step])
Returns a dictionary of sequence data with no gaps or missing characters and a dictionary with information to construct a new SequenceCollection via make_unaligned_seqs.
get_view()
returns a view of the aligned sequence data for seqid
to_alphabet(alphabet[, check_valid])
Returns a new AlignedSeqsData object with the same underlying data with a new alphabet.
Notes
Methods on this object only accept plus strand start, stop and step indices for selecting segments of data. It can return the gap coordinates for a sequence as used by IndelMap.
- add_seqs(seqs: dict[str, str | ndarray[int]], force_unique_keys: bool = True, offset: dict[str, int] | None = None) -> AlignedSeqsData
Returns a new AlignedSeqsData object with added sequences.
- Parameters:
- seqs
dict of sequences to add {name: seq, …}
- force_unique_keys
if True, raises ValueError if any sequence names already exist in the collection
- offset
dict of offsets for the new sequences.
- property align_len: int
Return the length of the alignment.
- property alphabet: CharAlphabet
the character alphabet for validating, encoding, decoding sequences
- copy(**kwargs) -> Self
shallow copy of self
Notes
kwargs are passed to the constructor and override existing values
- classmethod from_names_and_array(*, names: Sequence[str], data: ndarray, alphabet: AlphabetABC) -> Self
Construct an AlignedSeqsData object from a list of names and a numpy array of aligned sequence data.
- Parameters:
- names
list of sequence names
- data
numpy array of aligned sequence data
- alphabet
alphabet object for the sequences
- classmethod from_seqs(*, data: dict[str, str | ndarray[int]], alphabet: AlphabetABC, **kwargs) -> Self
Construct an AlignedSeqsData object from a dict of aligned sequences
- Parameters:
- data
dict of gapped sequences {name: seq, …}. sequences must all be the same length
- alphabet
alphabet object for the sequences
- classmethod from_seqs_and_gaps(*, seqs: dict[str, str | bytes | ndarray[int]], gaps: dict[str, ndarray], alphabet: AlphabetABC, **kwargs) -> Self
Construct an AlignedSeqsData object from a dict of ungapped sequences and a corresponding dict of gap data.
- Parameters:
- seqs
dict of ungapped sequences {name: seq, …}
- gaps
gap data {name: [[seq gap position, cumulative gap length], …], …}
- alphabet
alphabet object for the sequences
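The gap data layout described above can be illustrated without cogent3. The following is a minimal pure-Python sketch, assuming each row is [seq gap position, cumulative gap length] with rows sorted by position; `insert_gaps` is a hypothetical helper written for this illustration, not part of the library, and it uses plain lists where the real method expects numpy arrays.

```python
def insert_gaps(seq: str, gaps: list[list[int]], gap_char: str = "-") -> str:
    """Rebuild a gapped string from an ungapped one plus gap data.

    Assumption: each row of `gaps` is [seq gap position, cumulative gap length],
    sorted by position, as described for from_seqs_and_gaps.
    """
    parts = []
    prev_pos = 0
    prev_cum = 0
    for pos, cum in gaps:
        parts.append(seq[prev_pos:pos])            # residues before this gap
        parts.append(gap_char * (cum - prev_cum))  # length of this gap alone
        prev_pos, prev_cum = pos, cum
    parts.append(seq[prev_pos:])                   # residues after the last gap
    return "".join(parts)


# Illustrative data only: two ungapped sequences and their gap rows.
seqs = {"s1": "ACGT", "s2": "ACGGT"}
gaps = {"s1": [[2, 2], [4, 3]], "s2": [[0, 1], [5, 2]]}
gapped = {name: insert_gaps(seqs[name], gaps[name]) for name in seqs}
```

Under these assumptions both sequences expand to the same aligned length, which is the property an alignment storage needs.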
- get_gapped_seq_array(*, seqid: str, start: int | None = None, stop: int | None = None, step: int | None = None) -> ndarray
Return sequence data corresponding to seqid as an array of indices. start/stop are in alignment coordinates. Includes gaps.
- get_gapped_seq_bytes(*, seqid: str, start: int | None = None, stop: int | None = None, step: int | None = None) -> bytes
Return sequence corresponding to seqid as a bytes string. start/stop are in alignment coordinates. Includes gaps.
- get_gapped_seq_str(*, seqid: str, start: int | None = None, stop: int | None = None, step: int | None = None) -> str
Return sequence corresponding to seqid as a string. start/stop are in alignment coordinates. Includes gaps.
- get_gaps(seqid: str) -> ndarray
returns the gap data for seqid
- get_positions(names: Sequence[str], start: int | None = None, stop: int | None = None, step: int | None = None) -> ndarray
returns an array of the selected positions for names.
- get_seq_array(*, seqid: str, start: int | None = None, stop: int | None = None, step: int | None = None) -> ndarray
Return ungapped sequence corresponding to seqid as an array of indices.
Notes
Assumes start/stop are in sequence coordinates. If seqid is in reversed_seqs, that sequence will be in plus strand orientation. It is the client code's responsibility to ensure the coordinates are consistent with that.
- get_seq_bytes(*, seqid: str, start: int | None = None, stop: int | None = None, step: int | None = None) -> bytes
Return ungapped sequence corresponding to seqid as a bytes string. start/stop are in sequence coordinates. Excludes gaps.
- get_seq_length(seqid: str) -> int
return length of the unaligned seq for seqid
- get_seq_str(*, seqid: str, start: int | None = None, stop: int | None = None, step: int | None = None) -> str
Return ungapped sequence corresponding to seqid as a string. start/stop are in sequence coordinates. Excludes gaps.
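The get_gapped_seq_* methods take start/stop in alignment coordinates, while the get_seq_* methods take them in sequence coordinates. The sketch below illustrates that difference with plain Python strings rather than the library itself; the example sequence and the `seq_to_aln_index` helper are hypothetical and exist only for this illustration.

```python
gapped = "AC--GT-A"                  # one aligned (gapped) sequence
ungapped = gapped.replace("-", "")   # the same sequence without gaps

# Alignment coordinates index columns of the alignment, gaps included.
aln_slice = gapped[1:6]

# Sequence coordinates index residues only, gaps excluded.
seq_slice = ungapped[1:4]


def seq_to_aln_index(gapped_seq: str, seq_index: int, gap_char: str = "-") -> int:
    """Return the alignment column holding residue number seq_index.

    Hypothetical helper: walks the gapped string and counts non-gap characters.
    """
    residues_seen = -1
    for column, char in enumerate(gapped_seq):
        if char != gap_char:
            residues_seen += 1
            if residues_seen == seq_index:
                return column
    raise IndexError(f"sequence index {seq_index} out of range")
```

The same numeric range therefore selects different data depending on which family of methods is used, so callers need to know which coordinate system their indices are in.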
- get_ungapped(name_map: dict[str, str], start: int | None = None, stop: int | None = None, step: int | None = None) -> tuple[dict, dict]
Returns a dictionary of sequence data with no gaps or missing characters and a dictionary with information to construct a new SequenceCollection via make_unaligned_seqs.
- Parameters:
- name_map
A dict of {aln_name: data_name, …} indicating the mapping between names in the encompassing Alignment (aln_name) and the names in self (data_name).
- start
The alignment starting position.
- stop
The alignment stopping position.
- step
The step size.
- Returns:
- tuple
A tuple containing the following:
- seqs (dict): A dictionary of {name: seq, …} where the sequences have no gaps or missing characters.
- kwargs (dict): A dictionary of keyword arguments for make_unaligned_seqs, e.g., {"offset": self.offset, "name_map": name_map}.
- get_view(seqid: str, slice_record: SliceRecord | None = None) -> AlignedDataView
- get_view(seqid: int)
returns a view of the aligned sequence data for seqid
- Parameters:
- seqid
sequence name
- slice_record
slice record to use for slicing the data. If None, uses the default slice record for the entire sequence.
- property names: tuple[str, ...]
returns the names of the sequences in the storage
- property offset: dict[str, int]
returns the offset of each sequence in the Alignment
- property reversed_seqs: frozenset[str]
names of sequences that are reverse complemented
- to_alphabet(alphabet: AlphabetABC, check_valid: bool = True) -> Self
Returns a new AlignedSeqsData object with the same underlying data with a new alphabet.