Using a protein model
We use apps to load unaligned DNA sequences and to translate them into amino acids.
from cogent3 import get_app
loader = get_app("load_unaligned", format_name="fasta")
to_aa = get_app("translate_seqs")
process = loader + to_aa
seqs = process("data/SCA1-cds.fasta")
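To make the translation step concrete, here is a minimal standard-library sketch of turning an in-frame coding sequence into amino acids with the standard genetic code. It is not how translate_seqs is implemented. The function name translate_cds, the choice to drop a single trailing stop codon, and the input DNA fragment are all assumptions made for illustration.

```python
from itertools import product

# Standard genetic code in the compact layout with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(dna: str) -> str:
    """Translate an in-frame coding sequence, dropping one trailing stop codon.

    Hypothetical helper for illustration only.
    """
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    protein = "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))
    return protein[:-1] if protein.endswith("*") else protein


# Hypothetical DNA fragment, chosen only so that it encodes M-K-S-N.
print(translate_cds("ATGAAATCCAACTAA"))  # prints MKSN
```

Ambiguous bases and codons containing gaps are not handled in this sketch; they would raise a KeyError.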
Protein alignment with default settings
With “protein”, the aligner uses the JTT92 substitution model by default; this is the model recorded under 'model' in the alignment's info attribute.
from cogent3 import get_app
aa_aligner = get_app("progressive_align", "protein")
aligned = aa_aligner(seqs)
aligned
0 | |
Human | MKSNQERSNECLPPKKREIPATSRSSEEKAPTLPSDNHRVEGTAWLPGNPGGRGHGGGRH |
Chimp | ............................................................ |
Macaque | ........................P......A............................ |
Mouse Lemur | ...............................A.......A..AP................ |
Mouse | ........................P.....TA......C...V....ST..I........ |
Rat | ........................P.....TA......C...V....ST..S........ |
6 x 825 (truncated to 6 x 60) protein alignment
Specify a different distance measure for estimating the guide tree
The distance measures available are percent or paralinear.
Note
An estimated guide tree has its branch lengths scaled so they are consistent with usage in a codon model.
aa_aligner = get_app("progressive_align", "protein", distance="paralinear")
aligned = aa_aligner(seqs)
aligned
0 | |
Human | MKSNQERSNECLPPKKREIPATSRSSEEKAPTLPSDNHRVEGTAWLPGNPGGRGHGGGRH |
Macaque | ........................P......A............................ |
Mouse | ........................P.....TA......C...V....ST..I........ |
Rat | ........................P.....TA......C...V....ST..S........ |
Mouse Lemur | ...............................A.......A..AP................ |
Chimp | ............................................................ |
6 x 825 (truncated to 6 x 60) protein alignment
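As a rough illustration of what a percent distance measures, the sketch below computes the proportion of differing columns between two aligned sequences. It also expands the dot notation used in the displays above, where a dot appears to mark a residue identical to the first row. The helper names, the decision to skip gapped columns, and the 10-column example row are assumptions for illustration; the library's own calculation may differ in detail.

```python
def expand_dots(reference: str, row: str) -> str:
    """Replace each '.' in a displayed row with the reference residue at that column."""
    if len(reference) != len(row):
        raise ValueError("rows must have the same length")
    return "".join(ref if ch == "." else ch for ref, ch in zip(reference, row))


def percent_distance(seq1: str, seq2: str) -> float:
    """Proportion of compared columns at which two aligned sequences differ.

    Columns where either sequence has a gap ('-') are skipped (an assumption).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq1, seq2) if "-" not in (a, b)]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


# Hypothetical 10-column block written in the same dot notation as the display.
reference = "MKSNQERSNE"
other_row = "..T....A.."

expanded = expand_dots(reference, other_row)
print(expanded)                               # prints MKTNQERANE
print(percent_distance(reference, expanded))  # prints 0.2
```

A pairwise matrix of such distances is the kind of input from which a guide tree can be estimated.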
Alignment settings provenance
The parameters used to construct the alignment, including the guide tree and substitution model, are recorded in the alignment's info attribute.
aligned.info
{'Refs': {},
'align_params': {'indel_length': 0.1,
'indel_rate': 1e-10,
'guide_tree': '(Human:0.0001974014915215993,((Macaque:0.0023127545121458537,((Mouse:0.011219581285708921,Rat:0.004763355238688913):0.052856143725369786,Mouse_Lemur:0.03580862702845759):0.024351474041303382):0.003074310394311757,Chimp:0.008168683695808834):1e-06);',
'model': 'JTT92',
'lnL': -3220.8943153214836}}
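The record shown above can be read like a nested dictionary. The sketch below assumes that behaviour (as the display suggests) and works on a literal copy of the output, so it runs without the alignment itself. It pulls out the model name and log-likelihood, then lists the tip names in the guide tree with a simple regular expression. The regular expression is an illustrative shortcut, not a general Newick parser.

```python
import re

# Literal copy of the aligned.info output shown above.
info = {
    "Refs": {},
    "align_params": {
        "indel_length": 0.1,
        "indel_rate": 1e-10,
        "guide_tree": "(Human:0.0001974014915215993,((Macaque:0.0023127545121458537,((Mouse:0.011219581285708921,Rat:0.004763355238688913):0.052856143725369786,Mouse_Lemur:0.03580862702845759):0.024351474041303382):0.003074310394311757,Chimp:0.008168683695808834):1e-06);",
        "model": "JTT92",
        "lnL": -3220.8943153214836,
    },
}

params = info["align_params"]
print(params["model"], params["lnL"])

# Tip names are the labels that directly precede a ':' in the Newick string.
tip_names = re.findall(r"[A-Za-z_]+(?=:)", params["guide_tree"])

# Newick writes spaces as underscores, so map them back for display.
display_names = [name.replace("_", " ") for name in tip_names]
print(display_names)
```

The tip order follows the guide tree string, which matches the row order of the alignment displayed for the paralinear run.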
The file from which the alignment was derived (the provenance) is recorded on the .source attribute.
aligned.source
'SCA1-cds.fasta'