Molecular types#
Note
These docs now use the new_type core objects via the following setting.
import os
# using new types without requiring an explicit argument
os.environ["COGENT3_NEW_TYPE"] = "1"
The MolType object provides services for resolving ambiguities and for providing the correct ambiguity when recoding. It also maintains the mappings between different kinds of alphabets, sequences and alignments.
If your analysis involves handling ambiguous states, or translation via a genetic code, it’s critical to specify the appropriate moltype.
Available molecular types#
from cogent3 import available_moltypes
available_moltypes()
Abbreviation | Number of states | Moltype |
---|---|---|
'ab' | 2 | MolType(('a', 'b')) |
'dna' | 4 | MolType(('T', 'C', 'A', 'G')) |
'rna' | 4 | MolType(('U', 'C', 'A', 'G')) |
'protein' | 21 | MolType(('A', 'C', 'D', 'E', 'F', 'G', ... |
'protein_with_stop' | 22 | MolType(('A', 'C', 'D', 'E', 'F', 'G', ... |
'text' | 52 | MolType(('a', 'b', 'c', 'd', 'e', 'f', ... |
'bytes' | 256 | MolType(('\x00', '\x01', '\x02', '\x03'... |
7 rows x 3 columns
For statements that have a moltype argument, use the entry under the “Abbreviation” column. For example:
from cogent3 import load_aligned_seqs
seqs = load_aligned_seqs("data/brca1-bats.fasta", moltype="dna")
Getting a MolType#
from cogent3 import get_moltype
dna = get_moltype("dna")
dna
MolType(('T', 'C', 'A', 'G'))
Using a MolType to get ambiguity codes#
This uses the dna instance created above.
dna.ambiguities
{'N': frozenset({'A', 'C', 'G', 'T'}),
'R': frozenset({'A', 'G'}),
'Y': frozenset({'C', 'T'}),
'W': frozenset({'A', 'T'}),
'S': frozenset({'C', 'G'}),
'K': frozenset({'G', 'T'}),
'M': frozenset({'A', 'C'}),
'B': frozenset({'C', 'G', 'T'}),
'D': frozenset({'A', 'G', 'T'}),
'H': frozenset({'A', 'C', 'T'}),
'V': frozenset({'A', 'C', 'G'}),
'?': frozenset({'-', 'A', 'C', 'G', 'T'})}
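To make concrete what this mapping encodes, here is a minimal plain-Python sketch that expands a sequence containing ambiguity codes into every fully resolved sequence it could represent. It does not use cogent3: the `AMBIGUITIES` dict is a hand-copied subset of the output above, and `expand` is a hypothetical helper name chosen for illustration.

```python
from itertools import product

# A subset of the DNA ambiguity mapping shown above, copied by hand.
AMBIGUITIES = {
    "N": frozenset("ACGT"),
    "R": frozenset("AG"),
    "Y": frozenset("CT"),
}


def expand(seq):
    """Return a sorted list of every unambiguous sequence consistent with seq."""
    # Each position contributes either its possible states or the character itself.
    options = [sorted(AMBIGUITIES.get(char, char)) for char in seq]
    return sorted("".join(combo) for combo in product(*options))


print(expand("ARY"))
```

The number of resolved sequences is the product of the set sizes at each position, so it grows quickly for sequences with many ambiguous characters.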
Nucleic acid MolType and complementing#
dna.complement("AGG")
'TCC'
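For readers unfamiliar with complementing, the following plain-Python sketch shows the base-pair substitution behind the result above. It is not how cogent3 implements `complement`. It assumes only the four canonical DNA bases, and the helper names are hypothetical; any other character (ambiguity codes, gaps) is passed through unchanged.

```python
# Watson-Crick pairs for the four canonical DNA bases (an assumption of this
# sketch; characters not listed here are left untouched by str.translate).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Swap each base for its pairing partner, preserving order."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq):
    """Complement the sequence, then reverse it to read the opposite strand 5' to 3'."""
    return complement(seq)[::-1]


print(complement("AGG"))
print(reverse_complement("AGG"))
```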
Making sequences#
Use either the top level cogent3.make_seq function, or the make_seq method on the MolType instance.
seq = dna.make_seq(seq="AGGCTT", name="seq1")
seq
0 | |
seq1 | AGGCTT |
DnaSequence, length=6
Verify sequences#
rna = get_moltype("rna")
rna.is_valid("ACGUACGUACGUACGU")
True
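Conceptually, a validity check asks whether every character in a sequence belongs to the set of allowed states. The plain-Python sketch below illustrates that idea for the four canonical RNA states only. It is an assumption-laden illustration with a hypothetical helper name, and cogent3's own `is_valid` may accept additional characters such as ambiguity codes or gaps.

```python
# The four canonical RNA states, as listed for 'rna' in the table of moltypes.
RNA_STATES = frozenset("UCAG")


def only_canonical(seq, states=RNA_STATES):
    """Return True if every character of seq is one of the allowed states."""
    return set(seq) <= states


print(only_canonical("ACGUACGUACGUACGU"))
print(only_canonical("ACGT"))
```

The second call fails the check because T is a DNA state, not an RNA one.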
Making a custom MolType#
We demonstrate this by customising DNA so that it allows "." as a gap character.
from cogent3.core import new_moltype, new_sequence

mt = new_moltype.MolType(
    monomers="".join(new_moltype.IUPAC_DNA_chars),
    ambiguities=new_moltype.IUPAC_DNA_ambiguities,
    name="dna.gap",
    complements=new_moltype.IUPAC_DNA_ambiguities_complements,
    make_seq=new_sequence.DnaSequence,
    pairing_rules=new_moltype.DNA_STANDARD_PAIRS,
    mw_calculator=new_moltype.DnaMW,
    coerce_to=new_moltype.coerce_to_dna,
    gap=".",
)
seq = mt.make_seq(seq="ACG.")
seq
0 | |
None | ACG. |
DnaSequence, length=4