Building alignments

Using a cogent3 progressive aligner for nucleotides

We load a canned nucleotide substitution model and the progressive aligner TreeAlign function.

from cogent3 import load_unaligned_seqs, make_tree
from cogent3.align.progressive import TreeAlign

We first align without providing a guide tree. The TreeAlign algorithm builds pairwise alignments and estimates the substitution model parameters and pairwise distances. The distances are used to build a neighbour joining tree and the median value of substitution model parameters are provided to the substitution model for the progressive alignment step.

seqs = load_unaligned_seqs("data/test2.fasta", moltype="dna")
aln, tree = TreeAlign("HKY85", seqs, show_progress=False)
aln
0
DogFacedGCAAGGAGCCAGCAGAACAGATGGGTTGAAACTAAGGAAACATGTAATGATAGGCAGACT
HowlerMon...........A..T..........C.....G.G..........................
Mouse...GT...........G........C..C..G...A.G.........C..C......GT.
NineBande-.-----....A....G........C.....G............................
Human...........A..T..........C..G..G.......................G....

5 x 60 dna alignment

We then align using a guide tree (pre-estimated) and specifying the ratio of transitions to transversions (kappa).

tree = make_tree(
    "(((NineBande:0.013,Mouse:0.185):0.023,DogFaced:0.046):0.027,Human:0.034,HowlerMon:0.019)"
)
params = {"kappa": 4.0}
aln, tree = TreeAlign(
    "HKY85", seqs, tree=tree, param_vals=params, show_progress=False
)
aln
0
DogFacedGCAAGGAGCCAGCAGAACAGATGGGTTGAAACTAAGGAAACATGTAATGATAGGCAGACT
NineBande-.-----....A....G........C.....G............................
Mouse...GT...........G........C..C..G...A.G.........C..C......GT.
Human...........A..T..........C..G..G.......................G....
HowlerMon...........A..T..........C.....G.G..........................

5 x 60 dna alignment

Using a cogent3 progressive aligner for codons

We load a canned codon substitution model and use a pre-defined tree and parameter estimates.

from cogent3 import load_unaligned_seqs, make_tree
from cogent3.align.progressive import TreeAlign

seqs = load_unaligned_seqs("data/test2.fasta", moltype="dna")
tree = make_tree(
    "((NineBande:0.058,Mouse:0.595):0.079,DogFaced:0.142,(HowlerMon:0.062,Human:0.103):0.079)"
)
params = {"kappa": 4.0, "omega": 1.3}
aln, tree = TreeAlign(
    "MG94HKY", seqs, tree=tree, param_vals=params, show_progress=False
)
aln
0
DogFacedGCAAGGAGCCAGCAGAACAGATGGGTTGAAACTAAGGAAACATGTAATGATAGGCAGACT
NineBande------C....A....G........C.....G............................
Mouse...GT...........G........C..C..G...A.G.........C..C......GT.
HowlerMon...........A..T..........C.....G.G..........................
Human...........A..T..........C..G..G.......................G....

5 x 60 dna alignment

Converting gaps from aa-seq alignment to nuc seq alignment

We load some unaligned DNA sequences and show their translation.

from cogent3 import make_unaligned_seqs
from cogent3.evolve.models import get_model
from cogent3.align.progressive import TreeAlign

seqs = [
    (
        "hum",
        "AAGCAGATCCAGGAAAGCAGCGAGAATGGCAGCCTGGCCGCGCGCCAGGAGAGGCAGGCCCAGGTCAACCTCACT",
    ),
    (
        "mus",
        "AAGCAGATCCAGGAGAGCGGCGAGAGCGGCAGCCTGGCCGCGCGGCAGGAGAGGCAGGCCCAAGTCAACCTCACG",
    ),
    ("rat", "CTGAACAAGCAGCCACTTTCAAACAAGAAA"),
]
unaligned_DNA = make_unaligned_seqs(seqs, moltype="dna")
print(unaligned_DNA)
>hum
AAGCAGATCCAGGAAAGCAGCGAGAATGGCAGCCTGGCCGCGCGCCAGGAGAGGCAGGCC
CAGGTCAACCTCACT
>mus
AAGCAGATCCAGGAGAGCGGCGAGAGCGGCAGCCTGGCCGCGCGGCAGGAGAGGCAGGCC
CAAGTCAACCTCACG
>rat
CTGAACAAGCAGCCACTTTCAAACAAGAAA

print(unaligned_DNA.get_translation())
>hum
KQIQESSENGSLAARQERQAQVNLT
>mus
KQIQESGESGSLAARQERQAQVNLT
>rat
LNKQPLSNKK

We load an alignment of these protein sequences.

from cogent3 import make_aligned_seqs

aligned_aa_seqs = [
    ("hum", "KQIQESSENGSLAARQERQAQVNLT"),
    ("mus", "KQIQESGESGSLAARQERQAQVNLT"),
    ("rat", "LNKQ------PLS---------NKK"),
]
aligned_aa = make_aligned_seqs(aligned_aa_seqs, moltype="protein")

We then obtain an alignment of the DNA sequences from the alignment of their translation.

aligned_DNA = aligned_aa.replace_seqs(unaligned_DNA, aa_to_codon=True)
aligned_DNA
0
humAAGCAGATCCAGGAAAGCAGCGAGAATGGCAGCCTGGCCGCGCGCCAGGAGAGGCAGGCC
mus..............G...G......GC.................G...............
ratCT.A.C.AG...------------------CCA..TT.A---------------------

3 x 75 (truncated to 3 x 60) dna alignment