Using a codon model#
We load the unaligned sequences we will use in our examples.
from cogent3 import get_app
loader = get_app("load_unaligned", format="fasta")
seqs = loader("data/SCA1-cds.fasta")
Note
We use an app loader, but since this is just a single file we could have used the cogent3.load_unaligned_seqs() function.
Codon alignment with default settings#
The default settings will result in estimation of a guide tree (using percent identity between the sequences). The default “codon” model is MG94HKY.
from cogent3 import get_app
codon_aligner = get_app("progressive_align", "codon")
aligned = codon_aligner(seqs)
aligned
0 | |
Human | ATGAAATCCAACCAAGAGCGGAGCAACGAATGCCTGCCTCCCAAGAAGCGCGAGATCCCC |
Chimp | ............................................................ |
Mouse | ...............................................A..T......... |
Rat | ........T.................T....................A..T......... |
Mouse Lemur | ..................................................T......... |
Macaque | ............................................................ |
6 x 2478 (truncated to 6 x 60) dna alignment
Note
If you specify unique_guides=True, a guide tree will be estimated for every alignment.
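To make "percent identity" concrete, here is a minimal sketch that computes the proportion of differing columns between pairs of rows from the truncated display above. It is not cogent3's implementation. The helpers expand_dots and p_distance are hypothetical names introduced for illustration, and the sketch assumes equal-length, already aligned rows, whereas the aligner has to estimate distances from unaligned sequences.

```python
def expand_dots(reference: str, row: str) -> str:
    """Replace each '.' in a display row with the reference character at that column."""
    if len(reference) != len(row):
        raise ValueError("rows must have the same length")
    return "".join(ref if char == "." else char for ref, char in zip(reference, row))


def p_distance(a: str, b: str) -> float:
    """Proportion of columns at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# First 60 columns only, copied from the display above (Human is the reference row).
human = "ATGAAATCCAACCAAGAGCGGAGCAACGAATGCCTGCCTCCCAAGAAGCGCGAGATCCCC"
mouse = expand_dots(human, "...............................................A..T.........")
rat = expand_dots(human, "........T.................T....................A..T.........")

for name, seq in [("Mouse", mouse), ("Rat", rat)]:
    print(f"Human vs {name}: {p_distance(human, seq):.3f}")
```

Percent identity is one minus this proportion. Because only the first 60 of 2478 columns are used, the values are illustrative and will differ from those computed on the full sequences.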
Specify a different distance measure for estimating the guide tree#
The distance measures available are the same as for the nucleotide case (percent, TN93 or paralinear).
Note
An estimated guide tree has its branch lengths scaled so they are consistent with usage in a codon model.
codon_aligner = get_app("progressive_align", "codon", distance="paralinear")
aligned = codon_aligner(seqs)
aligned
0 | |
Human | ATGAAATCCAACCAAGAGCGGAGCAACGAATGCCTGCCTCCCAAGAAGCGCGAGATCCCC |
Chimp | ............................................................ |
Mouse | ...............................................A..T......... |
Rat | ........T.................T....................A..T......... |
Mouse Lemur | ..................................................T......... |
Macaque | ............................................................ |
6 x 2478 (truncated to 6 x 60) dna alignment
Providing a guide tree#
tree = "((Chimp:0.001,Human:0.001):0.0076,Macaque:0.01,((Rat:0.01,Mouse:0.01):0.02,Mouse_Lemur:0.02):0.01)"
codon_aligner = get_app("progressive_align", "codon", guide_tree=tree)
aligned = codon_aligner(seqs)
aligned
0 | |
Human | ATGAAATCCAACCAAGAGCGGAGCAACGAATGCCTGCCTCCCAAGAAGCGCGAGATCCCC |
Chimp | ............................................................ |
Macaque | ............................................................ |
Rat | ........T.................T....................A..T......... |
Mouse | ...............................................A..T......... |
Mouse Lemur | ..................................................T......... |
6 x 2478 (truncated to 6 x 60) dna alignment
Warning
The guide tree must have branch lengths, otherwise a ValueError is raised.
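Since a missing branch length only surfaces as a ValueError when the aligner is built or run, it can help to check a Newick string beforehand. The following is a rough sketch using only the standard library. The function tips_without_lengths is a hypothetical helper, not part of cogent3, and it inspects tip labels only, so it will not detect an internal edge that lacks a length. It also assumes unquoted tip names.

```python
import re

# A tip label follows "(" or "," and runs until ":", ",", ")" or ";".
# The optional second group captures a ":<number>" branch length.
TIP = re.compile(r"[(,]\s*([^():,;\s][^():,;]*)(:[0-9.eE+-]+)?")


def tips_without_lengths(newick: str) -> list[str]:
    """Return the tip names that are not followed by a branch length."""
    return [name.strip() for name, length in TIP.findall(newick) if not length]


tree = "((Chimp:0.001,Human:0.001):0.0076,Macaque:0.01,((Rat:0.01,Mouse:0.01):0.02,Mouse_Lemur:0.02):0.01)"
missing = tips_without_lengths(tree)
if missing:
    print("tips lacking branch lengths:", missing)
else:
    print("all tips have branch lengths")
```

For a tree written without lengths, such as "((Chimp,Human),Macaque)", the function returns every tip name.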
Specifying the gap parameters#
codon_aligner = get_app(
    "progressive_align", "codon", guide_tree=tree, indel_rate=0.001, indel_length=0.01
)
aligned = codon_aligner(seqs)
aligned
0 | |
Human | ATGAAATCCAACCAAGAGCGGAGCAACGAATGCCTGCCTCCCAAGAAGCGCGAGATCCCC |
Chimp | ............................................................ |
Macaque | ............................................................ |
Rat | ........T.................T....................A..T......... |
Mouse | ...............................................A..T......... |
Mouse Lemur | ..................................................T......... |
6 x 2478 (truncated to 6 x 60) dna alignment
Specifying the substitution model and parameters#
Any cogent3 codon substitution model can be used. (See cogent3.available_models().)
codon_aligner = get_app(
    "progressive_align", "CNFHKY", guide_tree=tree, param_vals=dict(omega=0.1, kappa=3)
)
aligned = codon_aligner(seqs)
aligned
0 | |
Human | ATGAAATCCAACCAAGAGCGGAGCAACGAATGCCTGCCTCCCAAGAAGCGCGAGATCCCC |
Chimp | ............................................................ |
Macaque | ............................................................ |
Rat | ........T.................T....................A..T......... |
Mouse | ...............................................A..T......... |
Mouse Lemur | ..................................................T......... |
6 x 2478 (truncated to 6 x 60) dna alignment
Note
If you provide parameter values, those must be consistent with the model definition.
Alignment settings and file provenance are recorded in the info attribute#
The parameters used to construct the alignment, including the guide tree and substitution model, are recorded in the alignment's info attribute.
aligned.info
{'Refs': {},
'source': 'data/SCA1-cds.fasta',
'align_params': {'omega': 0.1,
'kappa': 3,
'indel_length': 0.1,
'indel_rate': 1e-10,
'guide_tree': "((Chimp:0.001,Human:0.001):0.0076,(Macaque:0.01,((Rat:0.01,Mouse:0.01):0.02,Mouse_Lemur:0.02):0.01)'AUTOGENERATED_NAME_lB':1e-06);",
'model': 'CNFHKY',
'lnL': -6211.755293809508}}
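The recorded settings can be pulled out for logging or for comparing runs. The sketch below works on a plain dict copied from the output above (with guide_tree left out for brevity). It assumes aligned.info supports the same key-based access, and provenance_summary is a hypothetical helper rather than a cogent3 function.

```python
# Values copied from the aligned.info display above; guide_tree omitted for brevity.
info = {
    "source": "data/SCA1-cds.fasta",
    "align_params": {
        "omega": 0.1,
        "kappa": 3,
        "indel_length": 0.1,
        "indel_rate": 1e-10,
        "model": "CNFHKY",
        "lnL": -6211.755293809508,
    },
}


def provenance_summary(info: dict) -> str:
    """Build a one-line description of how an alignment was produced."""
    params = info["align_params"]
    return (
        f"{info['source']}: model={params['model']}, "
        f"omega={params['omega']}, kappa={params['kappa']}, "
        f"lnL={params['lnL']:.2f}"
    )


print(provenance_summary(info))
```

In a real session the literal dict would be replaced by aligned.info, provided the assumption about key access holds.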