SeqsData
- class SeqsData(*, data: dict[str, str | bytes | ndarray[int]], alphabet: CharAlphabet, offset: dict[str, int] | None = None, check: bool = True, reversed_seqs: set[str] | None = None)
The builtin cogent3 implementation of sequence storage underlying a SequenceCollection. The sequence data is stored as numpy arrays. Indexing this object (using an int or seq name) returns a SeqDataView, which can realise the corresponding slice as a string, bytes, or numpy array via the alphabet.
Attributes
alphabet: the character alphabet for validating, encoding, decoding sequences
names: returns the names of the sequences in the storage
offset: annotation offsets for each sequence
reversed_seqs: names of sequences that are reverse complemented
Methods
add_seqs(seqs[, force_unique_keys, offset]): Returns a new SeqsData object with added sequences.
copy(**kwargs): shallow copy of self
get_hash(seqid): returns hash of seqid
get_seq_length(seqid): return length for seqid
get_view(seqid): returns view of sequence data for seqid
from_seqs
get_seq_array
get_seq_bytes
get_seq_str
to_alphabet
Notes
Methods on this object only accept plus strand start, stop and step indices for selecting segments of data. It can return the gap coordinates for a sequence as used by IndelMap.
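Because only plus strand indices are accepted, a caller holding coordinates on the reverse strand has to convert them first. The following is a minimal sketch of that conversion in plain Python. `to_plus_strand` and `reverse_complement` are hypothetical helpers written for illustration, not part of cogent3, and the sketch assumes half-open intervals with a step of 1.

```python
def to_plus_strand(start: int, stop: int, seq_length: int) -> tuple[int, int]:
    """Map a half-open [start, stop) interval on the minus strand to the plus strand.

    Hypothetical helper for illustration only; not part of cogent3.
    """
    if not 0 <= start <= stop <= seq_length:
        raise ValueError("require 0 <= start <= stop <= seq_length")
    return seq_length - stop, seq_length - start


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


plus = "AACCGGTTAC"
minus = reverse_complement(plus)

# Positions 2..5 on the minus strand ...
m_start, m_stop = 2, 5
p_start, p_stop = to_plus_strand(m_start, m_stop, len(plus))

# ... select the same bases as the converted plus strand slice,
# once that slice is reverse complemented.
same = minus[m_start:m_stop] == reverse_complement(plus[p_start:p_stop])
```

The converted pair `(p_start, p_stop)` is what would be passed as start and stop to a storage method that works in plus strand coordinates.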
- add_seqs(seqs: dict[str, str | bytes | ndarray[int]], force_unique_keys: bool = True, offset: dict[str, int] | None = None) -> SeqsData
Returns a new SeqsData object with added sequences. If force_unique_keys is True, raises ValueError if any names already exist in the collection.
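The behaviour described above (a new object is returned, and duplicate names raise ValueError when force_unique_keys is True) can be sketched with a toy, dict-backed store. `ToyStore` is a hypothetical stand-in written for illustration, not the cogent3 implementation. In particular, letting the new data replace the old on a name clash when force_unique_keys is False is an assumption of this sketch.

```python
class ToyStore:
    """Hypothetical dict-backed stand-in for a sequence store (not cogent3)."""

    def __init__(self, data: dict[str, str]) -> None:
        self._data = dict(data)

    @property
    def names(self) -> tuple[str, ...]:
        return tuple(self._data)

    def add_seqs(self, seqs: dict[str, str], force_unique_keys: bool = True) -> "ToyStore":
        overlap = self._data.keys() & seqs.keys()
        if force_unique_keys and overlap:
            raise ValueError(f"names already exist: {sorted(overlap)}")
        # Build a new object; the original is left untouched.
        # Assumption of this sketch: on a name clash, the new data wins.
        return ToyStore({**self._data, **seqs})


store = ToyStore({"s1": "ACGT"})
bigger = store.add_seqs({"s2": "GGCC"})

try:
    store.add_seqs({"s1": "TTTT"})
    clash_raised = False
except ValueError:
    clash_raised = True
```

After this runs, `store` still holds only `s1`, while `bigger` holds both names.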
- property alphabet: CharAlphabet
the character alphabet for validating, encoding, decoding sequences
- copy(**kwargs) -> Self
shallow copy of self
Notes
kwargs are passed to the constructor and will override existing values
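One common way to get this behaviour is to collect the current constructor arguments, update them with kwargs, and call the constructor again. The sketch below uses a hypothetical `ToyStore` class to show the pattern. It is an assumption about how such a copy can work, not the cogent3 source.

```python
from __future__ import annotations


class ToyStore:
    """Hypothetical stand-in used to illustrate copy(**kwargs); not cogent3."""

    def __init__(self, *, data: dict[str, str], offset: dict[str, int] | None = None) -> None:
        self.data = data
        self.offset = offset or {}

    def copy(self, **kwargs) -> ToyStore:
        # Start from the current values, then let kwargs override them.
        init_args = {"data": self.data, "offset": self.offset, **kwargs}
        return type(self)(**init_args)


original = ToyStore(data={"s1": "ACGT"}, offset={"s1": 0})
shifted = original.copy(offset={"s1": 10})
```

Because the copy is shallow, `shifted.data` is the same dict object as `original.data`; only the overridden `offset` differs.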
- classmethod from_seqs(*, data: dict[str, str | bytes | ndarray[int]], alphabet: AlphabetABC, **kwargs)
- get_hash(seqid: str) -> str | None
returns hash of seqid
- get_seq_array(*, seqid: str, start: int | None = None, stop: int | None = None, step: int | None = None) -> ndarray
- get_seq_bytes(*, seqid: str, start: int | None = None, stop: int | None = None, step: int | None = None) -> bytes
- get_seq_length(seqid: str) -> int
return length for seqid
- get_seq_str(*, seqid: str, start: int | None = None, stop: int | None = None, step: int | None = None) -> str
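The three get_seq_* methods share the same keyword-only seqid, start, stop and step arguments and differ only in the return type. A toy model of that relationship might look like the sketch below. It uses a hypothetical four-letter alphabet and a plain list of ints in place of a numpy array, and none of the names in it are cogent3 API.

```python
ALPHABET = "TCAG"  # hypothetical character order, chosen for illustration
ENCODE = {char: index for index, char in enumerate(ALPHABET)}

# Stored form: one list of integer indices per sequence name.
STORE = {"s1": [ENCODE[c] for c in "ACGTTGCA"]}


def toy_seq_array(*, seqid, start=None, stop=None, step=None):
    """Slice the stored indices (stand-in for an array-returning getter)."""
    return STORE[seqid][start:stop:step]


def toy_seq_str(*, seqid, start=None, stop=None, step=None):
    """Decode the sliced indices back to characters via the alphabet."""
    indices = toy_seq_array(seqid=seqid, start=start, stop=stop, step=step)
    return "".join(ALPHABET[i] for i in indices)


def toy_seq_bytes(*, seqid, start=None, stop=None, step=None):
    """Same slice as toy_seq_str, encoded as bytes."""
    return toy_seq_str(seqid=seqid, start=start, stop=stop, step=step).encode("utf8")


segment = toy_seq_str(seqid="s1", start=2, stop=6)
every_other = toy_seq_str(seqid="s1", step=2)
```

Omitting start, stop and step selects the whole sequence, mirroring their None defaults in the signatures above.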
- get_view(seqid: str) -> SeqDataView
returns view of sequence data for seqid
- property names: tuple[str, ...]
returns the names of the sequences in the storage
- property offset: dict[str, int]
annotation offsets for each sequence
- property reversed_seqs: frozenset[str]
names of sequences that are reverse complemented