Molecular types#

A MolType object resolves ambiguity codes into the states they represent and supplies the correct ambiguity code when recoding. It also maintains the mappings between different kinds of alphabets, sequences and alignments.

If your analysis involves handling ambiguous states, or translation via a genetic code, it’s critical to specify the appropriate moltype.

Available molecular types#

from cogent3 import available_moltypes

available_moltypes()

Specify a moltype by the Abbreviation (case insensitive).
Abbreviation	Number of states	Moltype
'dna'	4	MolType(('T', 'C', 'A', 'G'))
'rna'	4	MolType(('U', 'C', 'A', 'G'))
'protein'	21	MolType(('A', 'C', 'D', 'E', 'F', 'G', ...
'protein_with_stop'	22	MolType(('A', 'C', 'D', 'E', 'F', 'G', ...
'text'	52	MolType(('a', 'b', 'c', 'd', 'e', 'f', ...
'bytes'	256	MolType((b'\x00', b'\x01', b'\x02', b'\...

6 rows x 3 columns

For functions that take a moltype argument, pass the value from the “Abbreviation” column. For example:

from cogent3 import load_aligned_seqs

seqs = load_aligned_seqs("data/brca1-bats.fasta", moltype="dna")

Getting a `MolType`#

from cogent3 import get_moltype

dna = get_moltype("dna")
dna

MolType(('T', 'C', 'A', 'G'))

Using a `MolType` to get ambiguity codes#

This uses the dna instance created above.

dna.ambiguities

{'N': frozenset({'A', 'C', 'G', 'T'}),
 'R': frozenset({'A', 'G'}),
 'Y': frozenset({'C', 'T'}),
 'W': frozenset({'A', 'T'}),
 'S': frozenset({'C', 'G'}),
 'K': frozenset({'G', 'T'}),
 'M': frozenset({'A', 'C'}),
 'B': frozenset({'C', 'G', 'T'}),
 'D': frozenset({'A', 'G', 'T'}),
 'H': frozenset({'A', 'C', 'T'}),
 'V': frozenset({'A', 'C', 'G'}),
 '?': frozenset({'-', 'A', 'C', 'G', 'T'})}
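To illustrate what such a mapping is for, here is a minimal sketch in plain Python that expands an ambiguous sequence into every unambiguous sequence it could represent. It does not use cogent3: the `AMBIGUITIES` dictionary is a hand-copied subset of the output above, and `expand_ambiguous` is a hypothetical helper name, not part of the library.

```python
from itertools import product

# Assumption: a small subset of the IUPAC DNA ambiguity codes,
# copied by hand from the dna.ambiguities output shown above.
AMBIGUITIES = {
    "R": frozenset({"A", "G"}),
    "Y": frozenset({"C", "T"}),
    "N": frozenset({"A", "C", "G", "T"}),
}


def expand_ambiguous(seq, ambiguities=AMBIGUITIES):
    """Return every unambiguous sequence compatible with seq.

    Hypothetical helper for illustration; not a cogent3 function.
    """
    # Each position offers either the states its code stands for,
    # or just the character itself when it is not an ambiguity code.
    options = [sorted(ambiguities.get(char, {char})) for char in seq]
    return ["".join(states) for states in product(*options)]


resolved = expand_ambiguous("ARY")
print(resolved)
```

The number of expansions is the product of the set sizes at each position, so sequences with many N characters grow quickly.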

Nucleic acid `MolType` and complementing#

dna.complement("AGG")

'TCC'
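The pairing rule behind that result can be sketched without cogent3. The following plain-Python example assumes only the four canonical DNA states and standard Watson-Crick pairing; `complement` and `reverse_complement` here are illustrative stand-alone functions, not the library's implementation, and they pass ambiguity codes and gaps through unchanged.

```python
# Assumption: standard Watson-Crick pairs for the four canonical DNA states.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Complement each base, keeping the original order."""
    return seq.translate(DNA_COMPLEMENT)


def reverse_complement(seq):
    """Complement each base, then reverse to read the opposite strand 5' to 3'."""
    return complement(seq)[::-1]


comp = complement("AGG")
revcomp = reverse_complement("AGG")
print(comp, revcomp)
```

Complementing alone preserves the order of positions, whereas the reverse complement gives the opposite strand in its conventional reading direction.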

Making sequences#

Use either the top-level cogent3.make_seq function or the make_seq method on the MolType instance.

seq = dna.make_seq(seq="AGGCTT", name="seq1")
seq

	0
seq1	AGGCTT

DnaSequence, length=6

Verify sequences#

rna = get_moltype("rna")
rna.is_valid("ACGUACGUACGUACGU")

True
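As a rough picture of what a validity check involves, the sketch below tests whether a string uses only the four canonical RNA states. It is plain Python, not cogent3 code: `uses_only_states` is a hypothetical name, and the library's own check may be more permissive (for example, it may also accept ambiguity codes and gap characters), which this sketch does not attempt to reproduce.

```python
# Assumption: only the four canonical RNA states count as valid here.
RNA_STATES = frozenset("UCAG")


def uses_only_states(seq, states=RNA_STATES):
    """Return True if every character of seq is one of the allowed states.

    Hypothetical helper for illustration; not a cogent3 function.
    """
    return set(seq) <= states


rna_ok = uses_only_states("ACGUACGUACGUACGU")
dna_as_rna = uses_only_states("ACGT")  # T is not an RNA state
print(rna_ok, dna_as_rna)
```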

Making a custom `MolType`#

We demonstrate this by customising DNA so that it accepts . as the gap character.

from cogent3.core import moltype, sequence

mt = moltype.MolType(
    monomers="".join(moltype.IUPAC_DNA_chars),
    ambiguities=moltype.IUPAC_DNA_ambiguities,
    name="dna.gap",
    complements=moltype.IUPAC_DNA_ambiguities_complements,
    make_seq=sequence.DnaSequence,
    pairing_rules=moltype.DNA_STANDARD_PAIRS,
    mw_calculator=moltype.DnaMW,
    coerce_to=moltype.coerce_to_dna,
    gap=".",
)
seq = mt.make_seq(seq="ACG.")
seq

	0
None	ACG.

DnaSequence, length=4
