Building phylogenies

Building A Phylogenetic Tree From Pairwise Distances

Directly via alignment.quick_tree()

Both the ArrayAlignment and Alignment classes support this.

from cogent3 import load_aligned_seqs

aln = load_aligned_seqs("data/primate_brca1.fasta", moltype="dna")
tree = aln.quick_tree(calc="TN93", show_progress=False)
tree = tree.balanced()  # purely for display
print(tree.ascii_art())
                              /-Galago
                    /edge.0--|
          /edge.1--|          \-HowlerMon
         |         |
         |          \-Rhesus
-root----|
         |--Orangutan
         |
         |          /-Chimpanzee
          \edge.2--|
                   |          /-Human
                    \edge.3--|
                              \-Gorilla

The quick_tree() method also supports non-parametric bootstrapping. The number of resampled alignments is specified using the bootstrap argument. In the following, trees are estimated from 100 resampled alignments and merged into a single consensus topology using a weighted consensus tree algorithm.

tree = aln.quick_tree(calc="TN93", bootstrap=100, show_progress=False)

Using the DistanceMatrix object

from cogent3 import load_aligned_seqs

aln = load_aligned_seqs("data/primate_brca1.fasta", moltype="dna")
dists = aln.distance_matrix(calc="TN93")
tree = dists.quick_tree(show_progress=False)
tree = tree.balanced()  # purely for display
print(tree.ascii_art())
                              /-Galago
                    /edge.0--|
          /edge.1--|          \-HowlerMon
         |         |
         |          \-Rhesus
-root----|
         |--Orangutan
         |
         |          /-Chimpanzee
          \edge.2--|
                   |          /-Human
                    \edge.3--|
                              \-Gorilla

Explicitly via DistanceMatrix and cogent3.phylo.nj.nj()`

from cogent3.phylo import nj
from cogent3 import load_aligned_seqs

aln = load_aligned_seqs("data/primate_brca1.fasta", moltype="dna")
dists = aln.distance_matrix(calc="TN93")
tree = nj.nj(dists, show_progress=False)
tree = tree.balanced()  # purely for display
print(tree.ascii_art())
                              /-Galago
                    /edge.0--|
          /edge.1--|          \-HowlerMon
         |         |
         |          \-Rhesus
-root----|
         |--Orangutan
         |
         |          /-Chimpanzee
          \edge.2--|
                   |          /-Human
                    \edge.3--|
                              \-Gorilla

Directly from a pairwise distance dict

from cogent3.phylo import nj

dists = {("a", "b"): 2.7, ("c", "b"): 2.33, ("c", "a"): 0.73}
tree = nj.nj(dists, show_progress=False)
print(tree.ascii_art())
          /-c
         |
-root----|--b
         |
          \-a

By Least-squares

We illustrate the phylogeny reconstruction by least-squares using the F81 substitution model. We use the advanced-stepwise addition algorithm to search tree space. Here a is the number of taxa to exhaustively evaluate all possible phylogenies for. Successive taxa are added to the top k trees (measured by the least-squares metric) and k trees are kept at each iteration.

from cogent3.util.deserialise import deserialise_object
from cogent3.phylo.least_squares import WLS

dists = deserialise_object("data/dists_for_phylo.json")
ls = WLS(dists)
stat, tree = ls.trex(a=5, k=5, show_progress=False)

Other optional arguments that can be passed to the trex method are: return_all, whether the k best trees at the final step are returned as a ScoredTreeCollection object; order, a series of tip names whose order defines the sequence in which tips will be added during tree building (this allows the user to randomise the input order).

By ML

We illustrate the phylogeny reconstruction using maximum-likelihood using the F81 substitution model. We use the advanced-stepwise addition algorithm to search tree space.

from cogent3 import load_aligned_seqs
from cogent3.phylo.maximum_likelihood import ML
from cogent3.evolve.models import F81

aln = load_aligned_seqs("data/primate_brca1.fasta")
ml = ML(F81(), aln)