Trees#

Loading a tree from a file and visualizing it with ascii_art()#

from cogent3 import load_tree

tr = load_tree("data/test.tree")
print(tr.ascii_art())
                              /-Human
                    /edge.0--|
          /edge.1--|          \-HowlerMon
         |         |
         |          \-Mouse
-root----|
         |--NineBande
         |
          \-DogFaced

Note

See Phylogenetic Trees for an interactive graphical display of dendrograms.

Writing a tree to a file#

from cogent3 import load_tree

tr = load_tree("data/test.tree")
tr.write("data/temp.tree")

Getting a dict of nodes keyed by their name#

from cogent3 import load_tree

tr = load_tree("data/test.tree")
names_nodes = tr.get_nodes_dict()
names_nodes["Human"]
Tree("Human")

Getting the name of a node#

The root node name defaults to "root".

from cogent3 import load_tree

tr = load_tree("data/test.tree")
tr.name
'root'
hu = tr.get_node_matching_name("Human")
hu.name
'Human'

You can ensure that all internal nodes are named.

tr.name_unnamed_nodes()

The object type of a tree and its nodes is the same#

from cogent3 import load_tree

tr = load_tree("data/test.tree")
type(tr)
cogent3.core.tree.PhyloNode
nodes = tr.get_nodes_dict()
hu = tr.get_node_matching_name("Human")
type(hu)
cogent3.core.tree.PhyloNode

Working with the nodes of a tree#

Get all the nodes, tips and edges as a dict.

from cogent3 import load_tree

tr = load_tree("data/test.tree")
nodes = tr.get_nodes_dict()
for n in nodes.items():
    print(n)
('root', Tree("(((Human,HowlerMon),Mouse),NineBande,DogFaced);"))
('edge.1', Tree("((Human,HowlerMon),Mouse)"))
('edge.0', Tree("(Human,HowlerMon)"))
('Human', Tree("Human"))
('HowlerMon', Tree("HowlerMon"))
('Mouse', Tree("Mouse"))
('NineBande', Tree("NineBande"))
('DogFaced', Tree("DogFaced"))

As a list.

nodes = tr.get_edge_vector()

Only the tip (terminal) nodes as a list.

tips = tr.tips()

Iterate the tip nodes.

for n in tr.iter_tips():
    print(n.name)
Human
HowlerMon
Mouse
NineBande
DogFaced

Get just the internal nodes as a list

non_tips = tr.nontips()

or iteratively.

for n in tr.iter_nontips():
    print(n.name)
edge.1
edge.0

Getting the path between two tips or edges (connecting nodes)#

from cogent3 import load_tree

tr = load_tree("data/test.tree")
edges = tr.get_connecting_edges("edge.1", "Human")
for edge in edges:
    print(edge.name)
edge.1
edge.0
Human

Get tip-to-root distances#

For each tip, the sum of the branch lengths on the path connecting it to the root node.

from cogent3 import make_tree

tr = make_tree("(B:3,(C:2,D:4):5);")
tr.tip_to_root_distances()
{'B': 3.0, 'C': 7.0, 'D': 9.0}

Can also be done for a subset of tips.

tr.tip_to_root_distances(names=["B", "D"])
{'B': 3.0, 'D': 9.0}
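
To make the arithmetic behind these numbers explicit, here is a minimal pure-Python sketch that does not use cogent3. The `(name, length, children)` tuple layout and the `tip_to_root` helper are hypothetical stand-ins chosen for illustration; they are not part of the cogent3 API.

```python
# Hypothetical stand-in for "(B:3,(C:2,D:4):5);" as (name, length, children).
tree = (
    "root",
    0.0,
    [
        ("B", 3.0, []),
        ("inner", 5.0, [("C", 2.0, []), ("D", 4.0, [])]),
    ],
)


def tip_to_root(node, above=0.0):
    """Return {tip name: summed branch length from that tip up to the root}."""
    name, length, children = node
    total = above + length  # length accumulated on the path from the root
    if not children:  # a tip: record the accumulated length
        return {name: total}
    result = {}
    for child in children:
        result.update(tip_to_root(child, total))
    return result


distances = tip_to_root(tree)
print(distances)
```

Each tip's value is the sum of the lengths on its path to the root, for example 2 + 5 for C.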

Get tip-to-tip distances#

Get a distance matrix between all pairs of tips and a list of the tip nodes.

from cogent3 import make_tree

tr = make_tree("(B:3,(C:2,D:4)F:5)G;")
dmat = tr.tip_to_tip_distances()
dmat
names          B          C          D
B         0.0000    10.0000    12.0000
C        10.0000     0.0000     6.0000
D        12.0000     6.0000     0.0000

Note

tip_to_tip_distances() is an alias for get_distances().
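
The distance between two tips is the sum of the branch lengths on the path joining them, which passes through their last common ancestor. The sketch below works this out by hand for the same tree. The `parent` mapping and both helper functions are assumptions made for illustration and are not how cogent3 stores or computes distances.

```python
from itertools import combinations

# child -> (parent, branch length), mirroring "(B:3,(C:2,D:4)F:5)G;"
parent = {"B": ("G", 3.0), "F": ("G", 5.0), "C": ("F", 2.0), "D": ("F", 4.0)}


def path_to_root(name):
    """Return {node on the path to the root: distance from `name` to it}."""
    seen = {name: 0.0}
    dist = 0.0
    while name in parent:
        up, length = parent[name]
        dist += length
        seen[up] = dist
        name = up
    return seen


def tip_distance(a, b):
    """Path length between a and b via their last common ancestor."""
    up_a, up_b = path_to_root(a), path_to_root(b)
    # With non-negative lengths, the shared ancestor giving the smallest
    # summed distance is the last common ancestor.
    return min(up_a[n] + up_b[n] for n in up_a.keys() & up_b.keys())


pairs = {pair: tip_distance(*pair) for pair in combinations("BCD", 2)}
print(pairs)
```

The three values agree with the off-diagonal entries of the matrix above.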

Getting the distance between two nodes#

Via pairwise distances, which returns a DistanceMatrix instance.

from cogent3 import load_tree

tr = load_tree("data/test.tree")
dists = tr.get_distances(names=["Human", "Mouse"])
dists
names     Mouse     Human
Mouse    0.0000    0.3468
Human    0.3468    0.0000

Or directly between the node objects.

tr = load_tree("data/test.tree")
nodes = tr.get_nodes_dict()
hu = nodes["Human"]
mu = nodes["Mouse"]
hu.distance(mu)
0.3467553610937

Get sum of all branch lengths#

from cogent3 import make_tree

tr = make_tree("(B:3,(C:2,D:4)F:5)G;")
tr.total_length()
14.0

Getting the last common ancestor (LCA) for two nodes#

from cogent3 import load_tree

tr = load_tree("data/test.tree")
nodes = tr.get_nodes_dict()
hu = nodes["Human"]
mu = nodes["Mouse"]
lca = hu.last_common_ancestor(mu)
lca.name, lca
('edge.1', Tree("((Human,HowlerMon),Mouse)"))

Getting all the ancestors for a node#

A list of all nodes to the tree root.

from cogent3 import load_tree

tr = load_tree("data/test.tree")
hu = tr.get_node_matching_name("Human")
for a in hu.ancestors():
    print(a.name)
edge.0
edge.1
root

Getting all the children for a node#

from cogent3 import load_tree

tr = load_tree("data/test.tree")
node = tr.get_node_matching_name("edge.1")
children = list(node.iter_tips()) + list(node.iter_nontips())
for child in children:
    print(child.name)
Human
HowlerMon
Mouse
edge.0

Getting all the distances for a tree#

On a TreeNode, each branch has a weight of 1, so the distances represent the number of connected nodes. On a PhyloNode, the measure is the sum of branch lengths.

from cogent3 import load_tree

tr = load_tree("data/test.tree")
dists = tr.get_distances()
dists
names         Human    HowlerMon     Mouse    NineBande    DogFaced
Human        0.0000       0.0727    0.3468       0.1831      0.2023
HowlerMon    0.0727       0.0000    0.3572       0.1936      0.2128
Mouse        0.3468       0.3572    0.0000       0.3911      0.4103
NineBande    0.1831       0.1936    0.3911       0.0000      0.2072
DogFaced     0.2023       0.2128    0.4103       0.2072      0.0000

Getting the two nodes that are farthest apart#

from cogent3 import load_tree

tr = load_tree("data/test.tree")
tr.max_tip_tip_distance()
(np.float64(0.4102925130849), ('Mouse', 'DogFaced'))

Get the nodes within a given distance#

from cogent3 import load_tree

tr = load_tree("data/test.tree")
hu = tr.get_node_matching_name("Human")
tips = hu.tips_within_distance(0.2)
for t in tips:
    print(t)
HowlerMon:0.0415847131449
NineBande:0.0939768158209

Rerooting trees#

Reorienting a tree at a named node#

The method name is a bit misleading. If tr is an unrooted tree (loosely, a tree whose root node has more than 2 children), the result is a re-orientation of the tree rather than a true rooting.

from cogent3 import load_tree

tr = load_tree("data/test.tree")
print(tr.rooted_at("edge.0").ascii_art())
          /-Human
         |
-root----|--HowlerMon
         |
         |          /-Mouse
          \edge.0--|
                   |          /-NineBande
                    \edge.1--|
                              \-DogFaced

At the midpoint#

This does produce a rooted tree.

from cogent3 import load_tree

tr = load_tree("data/test.tree")
print(tr.root_at_midpoint().ascii_art())
          /-Mouse
         |
-root----|                    /-Human
         |          /edge.0--|
         |         |          \-HowlerMon
          \Mouse-root
                   |          /-NineBande
                    \edge.1--|
                              \-DogFaced

Root at a named edge#

The edge can be either a tip or an internal node.

from cogent3 import load_tree

tr = load_tree("data/test.tree")
print(tr.ascii_art())
                              /-Human
                    /edge.0--|
          /edge.1--|          \-HowlerMon
         |         |
         |          \-Mouse
-root----|
         |--NineBande
         |
          \-DogFaced
print(tr.rooted("Mouse").ascii_art())
          /-Mouse
         |
-root----|                    /-Human
         |          /edge.0--|
         |         |          \-HowlerMon
          \Mouse-root
                   |          /-NineBande
                    \edge.1--|
                              \-DogFaced

Tree representations#

Newick format#

from cogent3 import load_tree

tr = load_tree("data/test.tree")
tr.get_newick()
'(((Human,HowlerMon),Mouse),NineBande,DogFaced);'
tr.get_newick(with_distances=True)
'(((Human:0.0311054096183,HowlerMon:0.0415847131449):0.0382963424874,Mouse:0.277353608988):0.0197278502379,NineBande:0.0939768158209,DogFaced:0.113211053859);'
tr.get_newick(with_distances=True, with_node_names=True)
'(((Human:0.0311054096183,HowlerMon:0.0415847131449)edge.0:0.0382963424874,Mouse:0.277353608988)edge.1:0.0197278502379,NineBande:0.0939768158209,DogFaced:0.113211053859);'

Tree traversal#

Here is the example tree for reference:

from cogent3 import load_tree

tr = load_tree("data/test.tree")
print(tr.ascii_art())
                              /-Human
                    /edge.0--|
          /edge.1--|          \-HowlerMon
         |         |
         |          \-Mouse
-root----|
         |--NineBande
         |
          \-DogFaced

Preorder#

from cogent3 import load_tree

tr = load_tree("data/test.tree")
for t in tr.preorder():
    print(t.get_newick())
(((Human,HowlerMon),Mouse),NineBande,DogFaced);
((Human,HowlerMon),Mouse)
(Human,HowlerMon)
Human
HowlerMon
Mouse
NineBande
DogFaced

Postorder#

from cogent3 import load_tree

tr = load_tree("data/test.tree")
for t in tr.postorder():
    print(t.get_newick())
Human
HowlerMon
(Human,HowlerMon)
Mouse
((Human,HowlerMon),Mouse)
NineBande
DogFaced
(((Human,HowlerMon),Mouse),NineBande,DogFaced);
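
The difference between the two orders is only in when a node is visited relative to its children. The following sketch shows both on a plain nested-tuple version of the example tree. It does not use cogent3, and the `newick`, `preorder` and `postorder` helpers are illustrative names.

```python
# Nested tuples stand in for the tree; strings are tips.
tree = ((("Human", "HowlerMon"), "Mouse"), "NineBande", "DogFaced")


def newick(node):
    """Render a nested-tuple node as a Newick-like string (no trailing ';')."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(newick(child) for child in node) + ")"


def preorder(node):
    """Yield a node before any of its descendants."""
    yield node
    if not isinstance(node, str):
        for child in node:
            yield from preorder(child)


def postorder(node):
    """Yield a node only after all of its descendants."""
    if not isinstance(node, str):
        for child in node:
            yield from postorder(child)
    yield node


pre = [newick(n) for n in preorder(tree)]
post = [newick(n) for n in postorder(tree)]
print(pre)
print(post)
```

Preorder starts at the root and postorder ends there, matching the ordering of the outputs above.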

Selecting subtrees#

from cogent3 import make_tree

tr = make_tree("((a,b),((c,d),(e,f),(g,h)));")
print(tr.ascii_art(show_internal=False))
                    /-a
          /--------|
         |          \-b
         |
         |                    /-c
---------|          /--------|
         |         |          \-d
         |         |
         |         |          /-e
          \--------|---------|
                   |          \-f
                   |
                   |          /-g
                    \--------|
                              \-h

Provide the names of nodes you want the subtree for. The default behaviour is to force the subtree to have the same number of children at the root as the original tree, in this case 2.

subtree = tr.get_sub_tree(["c", "e", "g"])
print(subtree.ascii_art(show_internal=False))
          /-c
---------|
         |          /-e
          \--------|
                    \-g

Use the as_rooted argument to ensure the selected subtree topology is as it existed on the original tree.

subtree = tr.get_sub_tree(["c", "e", "g"], as_rooted=True)
print(subtree.ascii_art(show_internal=False))
          /-c
         |
---------|--e
         |
          \-g
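
Conceptually, the as_rooted=True result is the original tree restricted to the requested tips, with any internal node left holding a single child removed. The sketch below illustrates that idea on nested tuples. It is an assumption-laden illustration, not cogent3's implementation, and `induced` is a hypothetical helper.

```python
# Nested tuples stand in for "((a,b),((c,d),(e,f),(g,h)));"; strings are tips.
tree = (("a", "b"), (("c", "d"), ("e", "f"), ("g", "h")))


def induced(node, keep):
    """Return the subtree spanning the tips in `keep`, or None if empty."""
    if isinstance(node, str):
        return node if node in keep else None
    kept = [sub for sub in (induced(child, keep) for child in node) if sub is not None]
    if not kept:
        return None
    if len(kept) == 1:
        # A node left with one child adds no branching, so collapse it.
        return kept[0]
    return tuple(kept)


subtree = induced(tree, {"c", "e", "g"})
print(subtree)
```

The three tips end up as siblings under one node, as in the as_rooted=True output above.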

Tree manipulation methods#

Pruning the tree#

Remove internal nodes with only one child. Create new connections and branch lengths (if tree is a PhyloNode) to reflect the change.

from cogent3 import make_tree

simple_tree = make_tree("(B:0.2,(D:0.4)E:0.5);")
print(simple_tree.ascii_art())
          /-B
-root----|
          \E------- /-D

The prune() method modifies the tree in place.

simple_tree.prune()
print(simple_tree.ascii_art())
          /-B
-root----|
          \-D
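
The effect on branch lengths can be sketched without cogent3. The version below assumes that when a single-child node is removed, its child inherits the sum of the two branch lengths; that is an assumption about what "reflect the change" means here. Unlike prune(), this hypothetical helper returns a new tree rather than modifying its input.

```python
# (name, length, children) stands in for "(B:0.2,(D:0.4)E:0.5);"
tree = ("root", 0.0, [("B", 0.2, []), ("E", 0.5, [("D", 0.4, [])])])


def pruned(node):
    """Return a copy of the tree with single-child internal nodes removed."""
    name, length, children = node
    children = [pruned(child) for child in children]
    if len(children) == 1:
        # Splice this node out. Assumption: the child takes the summed length.
        child_name, child_length, grandchildren = children[0]
        return (child_name, length + child_length, grandchildren)
    return (name, length, children)


result = pruned(tree)
for child_name, child_length, _ in result[2]:
    print(child_name, child_length)
```

Under that assumption, D is attached directly to the root with a branch length of about 0.9.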

Create a full unrooted copy of the tree#

from cogent3 import load_tree

tr1 = load_tree("data/test.tree")
print(tr1.get_newick())
(((Human,HowlerMon),Mouse),NineBande,DogFaced);
tr2 = tr1.unrooted_deepcopy()
print(tr2.get_newick())
(((Human,HowlerMon),Mouse),NineBande,DogFaced);

Transform tree into a bifurcating tree#

Add internal nodes so that every node has 2 or fewer children.

from cogent3 import make_tree

tree_string = "(B:0.2,H:0.2,(C:0.3,D:0.4,E:0.1)F:0.5)G;"
tr = make_tree(tree_string)
print(tr.ascii_art())
          /-B
         |
         |--H
-G-------|
         |          /-C
         |         |
          \F-------|--D
                   |
                    \-E
print(tr.bifurcating().ascii_art())
          /-B
-G-------|
         |          /-H
          \--------|
                   |          /-C
                    \F-------|
                             |          /-D
                              \--------|
                                        \-E
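
One simple way to resolve a node with more than two children is to keep grouping its last two children under a new internal node until only two remain. The sketch below applies that rule to nested tuples. It reproduces the grouping shown above, but whether cogent3 uses this exact rule is an assumption, and branch lengths for the new nodes are ignored.

```python
# Nested tuples stand in for "(B,H,(C,D,E)F)G;"; strings are tips.
tree = ("B", "H", ("C", "D", "E"))


def bifurcate(node):
    """Return a copy in which no node has more than two children."""
    if isinstance(node, str):
        return node
    children = [bifurcate(child) for child in node]
    while len(children) > 2:
        # Group the last two children under a new internal node.
        children = children[:-2] + [(children[-2], children[-1])]
    return tuple(children)


resolved = bifurcate(tree)
print(resolved)
```

Nodes that already have two or fewer children are left as they are.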

Transform tree into a balanced tree#

Using a balanced tree can substantially improve performance of likelihood calculations for time-reversible models. Note that the resulting tree has a different orientation with the effect that specifying clades or stems for model parameterisation should be done using the “outgroup_name” argument.

from cogent3 import load_tree

tr = load_tree("data/test.tree")
print(tr.ascii_art())
                              /-Human
                    /edge.0--|
          /edge.1--|          \-HowlerMon
         |         |
         |          \-Mouse
-root----|
         |--NineBande
         |
          \-DogFaced
print(tr.balanced().ascii_art())
                    /-Human
          /edge.0--|
         |          \-HowlerMon
         |
-root----|--Mouse
         |
         |          /-NineBande
          \edge.1--|
                    \-DogFaced

Test two trees for same topology#

Branch lengths don’t matter.

from cogent3 import make_tree

tr1 = make_tree("(B:0.2,(C:0.2,D:0.2)F:0.2)G;")
tr2 = make_tree("((C:0.1,D:0.1)F:0.1,B:0.1)G;")
tr1.same_topology(tr2)
True

Measure topological distances between two trees#

A number of topological tree distance metrics are available. They include:

  • The Robinson-Foulds Distance for rooted trees.

  • The Matching Cluster Distance for rooted trees.

  • The Robinson-Foulds Distance for unrooted trees.

  • The Lin-Rajan-Moret Distance for unrooted trees.

There are several variations of the Robinson-Foulds metric in the literature. The definition used by cogent3 is the cardinality of the symmetric difference of the sets of clades/splits in the two rooted/unrooted trees. Other definitions sometimes divide this by two, or normalise it to the unit interval.

The Robinson-Foulds distance is quick to compute, but is known to saturate quickly. Moving a single leaf in a tree can maximise this metric.

The Matching Cluster and Lin-Rajan-Moret distances are two matching-based distances that are more statistically robust. Unlike the Robinson-Foulds distance, which counts how many of the splits/clades are not exactly the same, the matching-based distances measure the degree to which the splits/clades differ. The matching-based distances solve a min-weight matching problem, which may take longer to compute for large trees.

# Distance metrics for rooted trees
from cogent3 import make_tree

tr1 = make_tree(treestring="(a,(b,(c,(d,e))));")
tr2 = make_tree(treestring="(e,(d,(c,(b,a))));")

mc_distance = tr1.tree_distance(tr2, method="matching_cluster") # or method="mc" or method="matching"
rooted_rf_distance = tr1.tree_distance(tr2, method="rooted_robinson_foulds") # or method="rrf" or method="rf"

print("Matching Cluster Distance:", mc_distance)
print("Rooted Robinson Foulds Distance:", rooted_rf_distance)
Matching Cluster Distance: 10
Rooted Robinson Foulds Distance: 6
# Distance metrics for unrooted trees
from cogent3 import make_tree

tr1 = make_tree(treestring="(a,b,(c,(d,e)));")
tr2 = make_tree(treestring="((a,c),(b,d),e);")

lrm_distance = tr1.tree_distance(tr2, method="lin_rajan_moret") # or method="lrm" or method="matching"
unrooted_rf_distance = tr1.tree_distance(tr2, method="unrooted_robinson_foulds") # or method="urf" or method="rf"

print("Lin-Rajan-Moret Distance:", lrm_distance)
print("Unrooted Robinson Foulds Distance:", unrooted_rf_distance)
Lin-Rajan-Moret Distance: 3
Unrooted Robinson Foulds Distance: 4
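
The Robinson-Foulds definition given above (the cardinality of the symmetric difference of the clade or split sets) is short enough to work through by hand. The sketch below does so on nested tuples for the same four trees. It is a pure-Python illustration with hypothetical helper names, not the code cogent3 runs.

```python
def clades(tree):
    """Return (tips, clade set) for a nested-tuple tree; strings are tips."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    tips, found = frozenset(), set()
    for child in tree:
        child_tips, child_clades = clades(child)
        tips |= child_tips
        found |= child_clades
    found.add(tips)  # the tips below this internal node form one clade
    return tips, found


def splits(tree):
    """Return the non-trivial bipartitions of the tips implied by the tree."""
    all_tips, found = clades(tree)
    return {
        frozenset([c, all_tips - c])
        for c in found
        if 2 <= len(c) <= len(all_tips) - 2
    }


def rooted_rf(t1, t2):
    """Size of the symmetric difference of the two clade sets."""
    return len(clades(t1)[1] ^ clades(t2)[1])


def unrooted_rf(t1, t2):
    """Size of the symmetric difference of the two split sets."""
    return len(splits(t1) ^ splits(t2))


rooted = rooted_rf(("a", ("b", ("c", ("d", "e")))), ("e", ("d", ("c", ("b", "a")))))
unrooted = unrooted_rf(("a", "b", ("c", ("d", "e"))), (("a", "c"), ("b", "d"), "e"))
print(rooted, unrooted)
```

In the rooted pair, each tree has three clades that the other lacks; in the unrooted pair, each has two unshared splits. No division by two or normalisation is applied, consistent with the definition stated above.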