Molecular types#
Note
Alpha Release of the New MolType API
We are pleased to announce an alpha release of our new MolType API. This version can be accessed by specifying the argument new_type=True in the get_moltype() function.
Please be aware that this alpha release has not been fully integrated with the library. Users are encouraged to explore its capabilities but should proceed with caution!
The MolType object provides services for resolving ambiguities and for supplying the correct ambiguity code when recoding. It also maintains the mappings between different kinds of alphabets, sequences and alignments.
If your analysis involves handling ambiguous states, or translation via a genetic code, it’s critical to specify the appropriate moltype.
Available molecular types#
from cogent3 import available_moltypes
available_moltypes()
Abbreviation | Number of states | Moltype |
---|---|---|
'ab' | 2 | MolType(('a', 'b')) |
'dna' | 4 | MolType(('T', 'C', 'A', 'G')) |
'rna' | 4 | MolType(('U', 'C', 'A', 'G')) |
'protein' | 21 | MolType(('A', 'C', 'D', 'E', 'F', 'G', ... |
'protein_with_stop' | 22 | MolType(('A', 'C', 'D', 'E', 'F', 'G', ... |
'text' | 52 | MolType(('a', 'b', 'c', 'd', 'e', 'f', ... |
'bytes' | 256 | MolType(('\x00', '\x01', '\x02', '\x03'... |
7 rows x 3 columns
For functions that take a moltype argument, use the entry under the “Abbreviation” column. For example:
from cogent3 import load_aligned_seqs
seqs = load_aligned_seqs("data/brca1-bats.fasta", moltype="dna")
Getting a MolType#
from cogent3 import get_moltype
dna = get_moltype("dna")
dna
MolType(('T', 'C', 'A', 'G'))
Using a MolType to get ambiguity codes#
We reuse the dna moltype from above.
dna.ambiguities
{'?': ('T', 'C', 'A', 'G', '-'),
'-': ('-',),
'N': ('A', 'C', 'T', 'G'),
'R': ('A', 'G'),
'Y': ('C', 'T'),
'W': ('A', 'T'),
'S': ('C', 'G'),
'K': ('T', 'G'),
'M': ('C', 'A'),
'B': ('C', 'T', 'G'),
'D': ('A', 'T', 'G'),
'H': ('A', 'C', 'T'),
'V': ('A', 'C', 'G'),
'T': ('T',),
'C': ('C',),
'A': ('A',),
'G': ('G',)}
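To illustrate what "resolving" an ambiguity means, here is a minimal pure-Python sketch that expands an ambiguous sequence into every unambiguous sequence consistent with it. It copies a subset of the mapping shown above into a plain dict. The name resolve is hypothetical and is not part of cogent3, and this is not how the library implements it.

```python
from itertools import product

# A subset of the DNA ambiguity mapping shown above, copied into a plain dict.
AMBIGUITIES = {
    "N": ("A", "C", "T", "G"),
    "R": ("A", "G"),
    "Y": ("C", "T"),
    "W": ("A", "T"),
    "S": ("C", "G"),
    "K": ("T", "G"),
    "M": ("C", "A"),
}


def resolve(seq, ambiguities=AMBIGUITIES):
    """Return every unambiguous sequence consistent with seq.

    Hypothetical helper for illustration only; not a cogent3 function.
    Characters missing from the mapping are treated as unambiguous.
    """
    options = [ambiguities.get(char, (char,)) for char in seq]
    return ["".join(combo) for combo in product(*options)]


print(resolve("ART"))  # ['AAT', 'AGT']
```

The number of resolved sequences is the product of the number of states at each position, so it grows quickly with the number of ambiguous characters.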
Nucleic acid MolType and complementing#
dna.complement("AGG")
'TCC'
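As a conceptual aid, the following pure-Python sketch shows what complementing does, including for the standard IUPAC ambiguity codes (for example, R pairs with Y). The function names are hypothetical and the code is independent of cogent3. It also shows the difference between the complement and the reverse complement, which are easy to confuse.

```python
# Standard IUPAC DNA complements; S, W and N are their own complements.
COMPLEMENT = str.maketrans("ACGTRYKMBVDHSWN", "TGCAYRMKVBHDSWN")


def complement(seq):
    """Complement each character, preserving the original order."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """Complement each character, then reverse the result."""
    return complement(seq)[::-1]


print(complement("AGG"))  # TCC
print(reverse_complement("AGG"))  # CCT
print(complement("ARG"))  # TYC
```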
Making sequences#
Use either the top-level cogent3.make_seq function or the make_seq method on the MolType instance.
seq = dna.make_seq(seq="AGGCTT", name="seq1")
seq
0 | |
seq1 | AGGCTT |
DnaSequence, length=6
Verify sequences#
rna = get_moltype("rna")
rna.is_valid("ACGUACGUACGUACGU")
True
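Validation of this kind amounts to checking that every character of the sequence belongs to the set of characters the moltype allows. The sketch below illustrates the idea in pure Python. It assumes that ambiguity codes and the gap character count as valid alongside the four canonical RNA states; that is an assumption made for this example, not a statement of cogent3's exact rules, and the names used are hypothetical.

```python
# Canonical RNA states, as listed for 'rna' in the table of molecular types.
RNA_STATES = frozenset("UCAG")
# Assumption for this sketch: ambiguity codes and gaps are also accepted.
RNA_AMBIGUITY_CODES = frozenset("NRYWSKMBDHV?")
GAP = frozenset("-")

ALLOWED = RNA_STATES | RNA_AMBIGUITY_CODES | GAP


def is_valid(seq, allowed=ALLOWED):
    """Return True if every character in seq is in the allowed set.

    Hypothetical helper for illustration only; not the cogent3 method.
    """
    return set(seq) <= allowed


print(is_valid("ACGUACGUACGUACGU"))  # True
print(is_valid("ACGT"))  # False, because T is not an RNA state
```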
Making a custom MolType#
We demonstrate this by customising DNA so that it allows . as a gap character.
from cogent3.core import moltype as mt
DNAgapped = mt.MolType(
    seq_constructor=mt.DnaSequence,
    motifset=mt.IUPAC_DNA_chars,
    ambiguities=mt.IUPAC_DNA_ambiguities,
    complements=mt.IUPAC_DNA_ambiguities_complements,
    pairs=mt.DnaStandardPairs,
    gaps=".",
)
seq = DNAgapped.make_seq("ACG.")
seq
0 | |
None | ACG. |
DnaSequence, length=4