Using a codon model
We load the unaligned sequences we will use in our examples.
from cogent3.app import io
reader = io.load_unaligned(format="fasta")
seqs = reader("data/SCA1-cds.fasta")
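A codon model reads each sequence in triplets, so before aligning it can be worth confirming that every coding sequence has a length that is a multiple of three. The following is a minimal standard-library sketch. read_fasta and out_of_frame are hypothetical helpers, not part of cogent3, and the two records in the usage line are invented for illustration.

```python
from io import StringIO
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines (hypothetical helper)."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def out_of_frame(records: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the names of sequences whose length is not a multiple of three."""
    return [name for name, seq in records if len(seq) % 3 != 0]


# Invented records: "b" has 5 nucleotides, so it cannot be read as whole codons
example = StringIO(">a\nATGAAA\nTCC\n>b\nATGAA\n")
print(out_of_frame(read_fasta(example)))  # ['b']
```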
Codon alignment with default settings
With the default settings, a guide tree is estimated from the percent identity between the sequences. The default “codon” model is MG94HKY.
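Once two sequences are aligned, a percent-identity distance is simply the proportion of compared sites that differ. The sketch below computes it for two already-aligned sequences, skipping columns where either sequence has a gap. It assumes the pairwise alignment already exists (how cogent3 obtains it from unaligned input is not shown here), and percent_identity_distance is a hypothetical name, not a cogent3 function.

```python
def percent_identity_distance(seq1: str, seq2: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences (1 - identity)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned (equal length)")
    compared = differing = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == gap or b == gap:
            continue  # columns with a gap in either sequence are not compared
        compared += 1
        differing += a != b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


print(percent_identity_distance("ACGT-A", "ACTTGA"))  # 1 difference in 5 compared columns
```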
from cogent3.app.align import progressive_align
codon_aligner = progressive_align("codon")
aligned = codon_aligner(seqs)
aligned
0 | |
Human | ATGAAATCCAACCAAGAGCGGAGCAACGAATGCCTGCCTCCCAAGAAGCGCGAGATCCCC |
Chimp | ............................................................ |
Mouse | ...............................................A..T......... |
Rat | ........T.................T....................A..T......... |
Mouse Lemur | ..................................................T......... |
Macaque | ............................................................ |
6 x 2478 (truncated to 6 x 60) dna alignment
The parameters used to construct the alignment, including the guide tree and substitution model, are recorded in the info attribute.
aligned.info
{'Refs': {},
'source': 'data/SCA1-cds.fasta',
'align_params': {'omega': 0.4,
'kappa': 3,
'indel_length': 0.1,
'indel_rate': 1e-10,
'guide_tree': '((Human:0.0007811864063533017,Chimp:0.0010087075768492318):0.0025329760417166156,((Mouse:0.00754218023336982,Rat:0.00654034335485719):0.025249030951039707,Mouse_Lemur:0.015105797134349978):0.009160143497857723,Macaque:0.0031170774657144143);',
'model': 'MG94HKY',
'lnL': -6539.927872324639}}
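Because aligned.info prints like a plain dictionary, the recorded settings can presumably be read back with ordinary key lookups. The sketch below works on a copy of the dictionary printed above, so it runs without cogent3. tip_names is a hypothetical helper whose regular expression assumes simple, unquoted Newick labels.

```python
import re

# Copy of the dictionary printed above; in a live session, use aligned.info instead
info = {
    "Refs": {},
    "source": "data/SCA1-cds.fasta",
    "align_params": {
        "omega": 0.4,
        "kappa": 3,
        "indel_length": 0.1,
        "indel_rate": 1e-10,
        "guide_tree": "((Human:0.0007811864063533017,Chimp:0.0010087075768492318):0.0025329760417166156,((Mouse:0.00754218023336982,Rat:0.00654034335485719):0.025249030951039707,Mouse_Lemur:0.015105797134349978):0.009160143497857723,Macaque:0.0031170774657144143);",
        "model": "MG94HKY",
        "lnL": -6539.927872324639,
    },
}


def tip_names(newick: str) -> list:
    """Return tip labels: a label follows '(' or ',' and ends at its ':' branch length."""
    return re.findall(r"[(,]([^(),:;]+):", newick)


params = info["align_params"]
print(params["model"], params["lnL"])
print(tip_names(params["guide_tree"]))
```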
Note
You can also specify unique_guides=True, in which case a guide tree is estimated for every alignment.
Specify a different distance measure for estimating the guide tree
The available distance measures are the same as for nucleotide alignment: percent, TN93 and paralinear.
Note
The branch lengths of an estimated guide tree are scaled to be consistent with their use in a codon model.
codon_aligner = progressive_align("codon", distance="paralinear")
aligned = codon_aligner(seqs)
aligned
0 | |
Human | ATGAAATCCAACCAAGAGCGGAGCAACGAATGCCTGCCTCCCAAGAAGCGCGAGATCCCC |
Mouse Lemur | ..................................................T......... |
Mouse | ...............................................A..T......... |
Rat | ........T.................T....................A..T......... |
Macaque | ............................................................ |
Chimp | ............................................................ |
6 x 2478 (truncated to 6 x 60) dna alignment
Providing a guide tree
Note
The guide tree must have branch lengths; otherwise a ValueError is raised.
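A guide tree string can be checked for branch lengths before it is passed to the aligner. The sketch below relies on the fact that, in Newick, every non-root node ends just before a "," or ")", so the text between the previous structural character and that delimiter must contain a ":". has_branch_lengths and require_branch_lengths are hypothetical helpers that mirror the ValueError described above; they are not how cogent3 performs the check, and they assume unquoted labels.

```python
def has_branch_lengths(newick: str) -> bool:
    """True if every non-root node in a simple Newick string has a ':length'."""
    segment = ""
    for ch in newick.strip().rstrip(";"):
        if ch in ",)":
            # the node that just ended must carry a branch length
            if ":" not in segment:
                return False
            segment = ""
        elif ch == "(":
            segment = ""
        else:
            segment += ch
    return True


def require_branch_lengths(newick: str) -> str:
    """Raise ValueError, as the aligner does, when branch lengths are missing."""
    if not has_branch_lengths(newick):
        raise ValueError("guide tree must have branch lengths")
    return newick


tree = "((Chimp:0.001,Human:0.001):0.0076,Macaque:0.01,((Rat:0.01,Mouse:0.01):0.02,Mouse_Lemur:0.02):0.01)"
print(has_branch_lengths(tree))
print(has_branch_lengths("((Chimp,Human),Macaque)"))
```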
tree = "((Chimp:0.001,Human:0.001):0.0076,Macaque:0.01,((Rat:0.01,Mouse:0.01):0.02,Mouse_Lemur:0.02):0.01)"
codon_aligner = progressive_align("codon", guide_tree=tree)
aligned = codon_aligner(seqs)
aligned
0 | |
Human | ATGAAATCCAACCAAGAGCGGAGCAACGAATGCCTGCCTCCCAAGAAGCGCGAGATCCCC |
Chimp | ............................................................ |
Macaque | ............................................................ |
Rat | ........T.................T....................A..T......... |
Mouse | ...............................................A..T......... |
Mouse Lemur | ..................................................T......... |
6 x 2478 (truncated to 6 x 60) dna alignment
Specifying the gap parameters
codon_aligner = progressive_align(
"codon", guide_tree=tree, indel_rate=0.001, indel_length=0.01
)
aligned = codon_aligner(seqs)
aligned
0 | |
Human | ATGAAATCCAACCAAGAGCGGAGCAACGAATGCCTGCCTCCCAAGAAGCGCGAGATCCCC |
Chimp | ............................................................ |
Macaque | ............................................................ |
Rat | ........T.................T....................A..T......... |
Mouse | ...............................................A..T......... |
Mouse Lemur | ..................................................T......... |
6 x 2478 (truncated to 6 x 60) dna alignment
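To build intuition for these two settings, assume, as in a standard pair-HMM aligner, that indel_rate behaves like a gap-opening probability and indel_length like a gap-extension probability. Under that assumption gap lengths follow a geometric distribution whose mean is 1 / (1 - indel_length). The sketch below makes the assumption explicit; it is not taken from cogent3's source, and gap_log_prob and expected_gap_length are hypothetical names.

```python
import math


def gap_log_prob(length: int, open_prob: float, extend_prob: float) -> float:
    """Log-probability of opening a gap and extending it to `length` positions.

    ASSUMPTION: geometric gap model; open once, extend (length - 1) times,
    then stop with probability (1 - extend_prob).
    """
    if length < 1:
        raise ValueError("gap length must be at least 1")
    extend_term = (length - 1) * math.log(extend_prob) if length > 1 else 0.0
    return math.log(open_prob) + extend_term + math.log(1.0 - extend_prob)


def expected_gap_length(extend_prob: float) -> float:
    """Mean of a geometric length distribution that starts at 1."""
    return 1.0 / (1.0 - extend_prob)


# Compare the default setting (indel_length=0.1) with the value used above (0.01)
for extend in (0.1, 0.01):
    print(extend, round(expected_gap_length(extend), 4))
```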
Specifying the substitution model and parameters
Any codon substitution model can be used (see cogent3.available_models()). If you provide parameter values, they must be consistent with the model definition.
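The kappa and omega values passed in param_vals act on the codon rate matrix. In GY94/MG94-style codon models, the instantaneous rate between two sense codons that differ at exactly one position is multiplied by kappa when the change is a transition and by omega when it changes the encoded amino acid; codons that differ at two or three positions have rate zero. The sketch below computes only that kappa/omega multiplier, assuming the standard genetic code, and leaves out the equilibrium-frequency term, which is parameterised differently across model families. rate_multiplier is a hypothetical helper, not cogent3 code.

```python
from itertools import product

BASES = "TCAG"
# NCBI standard genetic code (table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
TRANSITIONS = ({"A", "G"}, {"C", "T"})  # purine<->purine, pyrimidine<->pyrimidine


def rate_multiplier(codon_i: str, codon_j: str, kappa: float, omega: float) -> float:
    """kappa/omega factor of a GY94/MG94-style codon rate (frequency term omitted)."""
    diffs = [(a, b) for a, b in zip(codon_i, codon_j) if a != b]
    if len(diffs) != 1:
        return 0.0  # only single-nucleotide changes are instantaneous
    aa_i, aa_j = GENETIC_CODE[codon_i], GENETIC_CODE[codon_j]
    if "*" in (aa_i, aa_j):
        return 0.0  # stop codons are excluded from the codon state space
    a, b = diffs[0]
    rate = kappa if {a, b} in TRANSITIONS else 1.0
    if aa_i != aa_j:
        rate *= omega  # nonsynonymous change
    return rate


# Using the param_vals in the example that follows (omega=0.1, kappa=3)
print(rate_multiplier("TCC", "TCT", kappa=3, omega=0.1))  # synonymous transition
print(rate_multiplier("ATG", "ATA", kappa=3, omega=0.1))  # nonsynonymous transition
```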
codon_aligner = progressive_align(
"CNFHKY", guide_tree=tree, param_vals=dict(omega=0.1, kappa=3)
)
aligned = codon_aligner(seqs)
aligned
0 | |
Human | ATGAAATCCAACCAAGAGCGGAGCAACGAATGCCTGCCTCCCAAGAAGCGCGAGATCCCC |
Chimp | ............................................................ |
Macaque | ............................................................ |
Rat | ........T.................T....................A..T......... |
Mouse | ...............................................A..T......... |
Mouse Lemur | ..................................................T......... |
6 x 2478 (truncated to 6 x 60) dna alignment
Alignment settings and file provenance are recorded in the info attribute
aligned.info
{'Refs': {},
'source': 'data/SCA1-cds.fasta',
'align_params': {'omega': 0.1,
'kappa': 3,
'indel_length': 0.1,
'indel_rate': 1e-10,
'guide_tree': '((Chimp:0.001,Human:0.001):0.0076,Macaque:0.01,((Rat:0.01,Mouse:0.01):0.02,Mouse_Lemur:0.02):0.01);',
'model': 'CNFHKY',
'lnL': -6211.757122737677}}