Using a nucleotide model#
We load the unaligned sequences we will use in our examples.
from cogent3 import get_app
loader = get_app("load_unaligned", format="fasta")
seqs = loader("data/SCA1-cds.fasta")
Note
We use an app loader, but since this is just a single file we could have used the cogent3.load_unaligned_seqs()
function.
Nucleotide alignment with default settings#
The default setting for “nucleotide” is a HKY85 model.
from cogent3 import get_app
nt_aligner = get_app("progressive_align", "nucleotide")
aligned = nt_aligner(seqs)
aligned
0 | |
Human | ATGAAATCCAACCAAGAGCGGAGCAACGAATGCCTGCCTCCCAAGAAGCGCGAGATCCCC |
Chimp | ............................................................ |
Mouse Lemur | ..................................................T......... |
Rat | ........T.................T....................A..T......... |
Mouse | ...............................................A..T......... |
Macaque | ............................................................ |
6 x 2475 (truncated to 6 x 60) dna alignment
Note
If you specify unique_guides=True
, a guide tree will be estimated for every alignment.
Specify a different distance measure for estimating the guide tree#
For the nucleotide case, you can use TN93 or paralinear.
nt_aligner = get_app("progressive_align", "nucleotide", distance="TN93")
aligned = nt_aligner(seqs)
aligned
0 | |
Human | ATGAAATCCAACCAAGAGCGGAGCAACGAATGCCTGCCTCCCAAGAAGCGCGAGATCCCC |
Chimp | ............................................................ |
Mouse Lemur | ..................................................T......... |
Rat | ........T.................T....................A..T......... |
Mouse | ...............................................A..T......... |
Macaque | ............................................................ |
6 x 2475 (truncated to 6 x 60) dna alignment
Providing a guide tree#
tree = "((Chimp:0.001,Human:0.001):0.0076,Macaque:0.01,((Rat:0.01,Mouse:0.01):0.02,Mouse_Lemur:0.02):0.01)"
nt_aligner = get_app("progressive_align", "nucleotide", guide_tree=tree)
aligned = nt_aligner(seqs)
aligned
0 | |
Human | ATGAAATCCAACCAAGAGCGGAGCAACGAATGCCTGCCTCCCAAGAAGCGCGAGATCCCC |
Chimp | ............................................................ |
Macaque | ............................................................ |
Rat | ........T.................T....................A..T......... |
Mouse | ...............................................A..T......... |
Mouse Lemur | ..................................................T......... |
6 x 2475 (truncated to 6 x 60) dna alignment
Warning
The guide tree must have branch lengths, otherwise a ValueError
is raised.
Specifying the substitution model#
You can use any cogent3
nucleotide substitution model. For a list of all available, see cogent3.available_models()
.
tree = "((Chimp:0.001,Human:0.001):0.0076,Macaque:0.01,((Rat:0.01,Mouse:0.01):0.02,Mouse_Lemur:0.02):0.01)"
nt_aligner = get_app("progressive_align", "F81", guide_tree=tree)
aligned = nt_aligner(seqs)
aligned
0 | |
Human | ATGAAATCCAACCAAGAGCGGAGCAACGAATGCCTGCCTCCCAAGAAGCGCGAGATCCCC |
Chimp | ............................................................ |
Macaque | ............................................................ |
Rat | ........T.................T....................A..T......... |
Mouse | ...............................................A..T......... |
Mouse Lemur | ..................................................T......... |
6 x 2475 (truncated to 6 x 60) dna alignment
Alignment settings and file provenance are recorded in the info
attribute#
aligned.info
{'Refs': {},
'source': 'data/SCA1-cds.fasta',
'align_params': {'indel_length': 0.1,
'indel_rate': 1e-10,
'guide_tree': "((Chimp:0.001,Human:0.001):0.0076,(Macaque:0.01,((Rat:0.01,Mouse:0.01):0.02,Mouse_Lemur:0.02):0.01)'AUTOGENERATED_NAME_nM':1e-06);",
'model': 'F81',
'lnL': -6402.5569169915125}}