Trees#
Loading a tree from a file and visualizing it with ascii_art()#
from cogent3 import load_tree
tr = load_tree("data/test.tree")
print(tr.ascii_art())
                              /-Human
                    /edge.0--|
          /edge.1--|          \-HowlerMon
         |         |
         |          \-Mouse
-root----|
         |--NineBande
         |
          \-DogFaced
Note
See the Phylogenetic Trees section for an interactive graphical display of dendrograms.
Writing a tree to a file#
from cogent3 import load_tree
tr = load_tree("data/test.tree")
tr.write("data/temp.tree")
Getting a dict of nodes keyed by their name#
from cogent3 import load_tree
tr = load_tree("data/test.tree")
names_nodes = tr.get_nodes_dict()
names_nodes["Human"]
Tree("Human")
Getting the name of a node#
The root node name defaults to "root".
from cogent3 import load_tree
tr = load_tree("data/test.tree")
tr.name
'root'
hu = tr.get_node_matching_name("Human")
hu.name
'Human'
You can ensure internal nodes get named.
tr.name_unnamed_nodes()
The object type of a tree and its nodes is the same#
from cogent3 import load_tree
tr = load_tree("data/test.tree")
type(tr)
cogent3.core.tree.PhyloNode
nodes = tr.get_nodes_dict()
hu = tr.get_node_matching_name("Human")
type(hu)
cogent3.core.tree.PhyloNode
Working with the nodes of a tree#
Get all the nodes, tips and edges as a dict.
from cogent3 import load_tree
tr = load_tree("data/test.tree")
nodes = tr.get_nodes_dict()
for n in nodes.items():
    print(n)
('root', Tree("(((Human,HowlerMon),Mouse),NineBande,DogFaced);"))
('edge.1', Tree("((Human,HowlerMon),Mouse)"))
('edge.0', Tree("(Human,HowlerMon)"))
('Human', Tree("Human"))
('HowlerMon', Tree("HowlerMon"))
('Mouse', Tree("Mouse"))
('NineBande', Tree("NineBande"))
('DogFaced', Tree("DogFaced"))
As a list.
nodes = tr.get_edge_vector()
Only the tip (terminal) nodes as a list.
tips = tr.tips()
Iterate the tip nodes.
for n in tr.iter_tips():
    print(n.name)
Human
HowlerMon
Mouse
NineBande
DogFaced
Get just the internal nodes as a list
non_tips = tr.nontips()
or iteratively.
for n in tr.iter_nontips():
    print(n.name)
edge.1
edge.0
Getting the path between two tips or edges (connecting nodes)#
from cogent3 import load_tree
tr = load_tree("data/test.tree")
edges = tr.get_connecting_edges("edge.1", "Human")
for edge in edges:
    print(edge.name)
edge.1
edge.0
Human
Get tip-to-root distances#
For each tip, the sum of the branch lengths on the path from that tip to the root node.
from cogent3 import make_tree
tr = make_tree("(B:3,(C:2,D:4):5);")
tr.tip_to_root_distances()
{'B': 3.0, 'C': 7.0, 'D': 9.0}
Can also be done for a subset of tips.
tr.tip_to_root_distances(names=["B", "D"])
{'B': 3.0, 'D': 9.0}
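To make the calculation concrete, here is a minimal plain-Python sketch that reproduces the numbers above without cogent3. The nested-tuple layout and the function name tip_to_root are assumptions made for this illustration only; they are not part of the cogent3 API and say nothing about how the library computes the result.

```python
# Hypothetical nested representation of "(B:3,(C:2,D:4):5);".
# Each node is (name, branch_length, children); this is not a cogent3 structure.
tree = (
    "root",
    0.0,
    [
        ("B", 3.0, []),
        ("", 5.0, [("C", 2.0, []), ("D", 4.0, [])]),
    ],
)


def tip_to_root(node, dist_above=0.0):
    """Map each tip name to the summed branch lengths between it and the root."""
    name, length, children = node
    total = dist_above + length
    if not children:
        return {name: total}
    result = {}
    for child in children:
        result.update(tip_to_root(child, total))
    return result


distances = tip_to_root(tree)
print(distances)  # {'B': 3.0, 'C': 7.0, 'D': 9.0}
```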
Get tip-to-tip distances#
Get a distance matrix between all pairs of tips.
from cogent3 import make_tree
tr = make_tree("(B:3,(C:2,D:4)F:5)G;")
dmat = tr.tip_to_tip_distances()
dmat
names | B | C | D |
---|---|---|---|
B | 0.0000 | 10.0000 | 12.0000 |
C | 10.0000 | 0.0000 | 6.0000 |
D | 12.0000 | 6.0000 | 0.0000 |
Note
tip_to_tip_distances() is an alias for get_distances().
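The entries of the matrix above can be checked by hand: the distance between two tips is the length of the path that runs up from each tip to their nearest shared ancestor. The sketch below does this in plain Python for the same tree. The parents mapping and both helper functions are hypothetical names introduced for this example and are not cogent3 calls.

```python
from itertools import combinations

# Hypothetical child -> (parent, branch_length) map for "(B:3,(C:2,D:4)F:5)G;"
parents = {"B": ("G", 3.0), "F": ("G", 5.0), "C": ("F", 2.0), "D": ("F", 4.0)}


def path_to_root(name):
    """Return {node: distance from `name`} for `name` and each of its ancestors."""
    dists = {name: 0.0}
    total = 0.0
    while name in parents:
        parent, length = parents[name]
        total += length
        dists[parent] = total
        name = parent
    return dists


def tip_distance(a, b):
    """Path length between two nodes via their nearest shared ancestor."""
    up_a, up_b = path_to_root(a), path_to_root(b)
    shared = set(up_a) & set(up_b)
    # The nearest shared ancestor gives the smallest combined distance.
    return min(up_a[n] + up_b[n] for n in shared)


pairwise = {pair: tip_distance(*pair) for pair in combinations("BCD", 2)}
print(pairwise)
```

With the branch lengths above this gives 10.0 for B and C, 12.0 for B and D, and 6.0 for C and D, in agreement with the matrix.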
Getting the distance between two nodes#
Via pairwise distances, which returns a DistanceMatrix instance.
from cogent3 import load_tree
tr = load_tree("data/test.tree")
dists = tr.get_distances(names=["Human", "Mouse"])
dists
names | Mouse | Human |
---|---|---|
Mouse | 0.0000 | 0.3468 |
Human | 0.3468 | 0.0000 |
Or directly between the node objects.
tr = load_tree("data/test.tree")
nodes = tr.get_nodes_dict()
hu = nodes["Human"]
mu = nodes["Mouse"]
hu.distance(mu)
0.3467553610937
Get sum of all branch lengths#
from cogent3 import make_tree
tr = make_tree("(B:3,(C:2,D:4)F:5)G;")
tr.total_length()
14.0
Getting the last common ancestor (LCA) for two nodes#
from cogent3 import load_tree
tr = load_tree("data/test.tree")
nodes = tr.get_nodes_dict()
hu = nodes["Human"]
mu = nodes["Mouse"]
lca = hu.last_common_ancestor(mu)
lca.name, lca
('edge.1', Tree("((Human,HowlerMon),Mouse)"))
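For readers who want to see the idea behind the result, the following plain-Python sketch finds the same ancestor by walking parent links. The parent mapping mirrors the topology of data/test.tree as printed earlier; the mapping and the two function names are assumptions for this illustration and are not how cogent3 stores or searches a tree.

```python
# Hypothetical child -> parent map for the topology of data/test.tree
parent = {
    "Human": "edge.0",
    "HowlerMon": "edge.0",
    "edge.0": "edge.1",
    "Mouse": "edge.1",
    "edge.1": "root",
    "NineBande": "root",
    "DogFaced": "root",
}


def ancestors(name):
    """List the nodes from the parent of `name` up to the root."""
    out = []
    while name in parent:
        name = parent[name]
        out.append(name)
    return out


def last_common_ancestor(a, b):
    """Return the first node on b's path to the root that is also on a's path."""
    seen = {a, *ancestors(a)}
    for node in [b, *ancestors(b)]:
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


lca_name = last_common_ancestor("Human", "Mouse")
print(lca_name)  # edge.1
```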
Getting all the ancestors for a node#
A list of all the nodes on the path from a node to the tree root.
from cogent3 import load_tree
tr = load_tree("data/test.tree")
hu = tr.get_node_matching_name("Human")
for a in hu.ancestors():
    print(a.name)
edge.0
edge.1
root
Getting all the children for a node#
from cogent3 import load_tree
tr = load_tree("data/test.tree")
node = tr.get_node_matching_name("edge.1")
children = list(node.iter_tips()) + list(node.iter_nontips())
for child in children:
    print(child.name)
Human
HowlerMon
Mouse
edge.0
Getting all the distances for a tree#
On a TreeNode, each branch has a weight of 1, so the distances represent the number of connected nodes. On a PhyloNode, the measure is the sum of branch lengths.
from cogent3 import load_tree
tr = load_tree("data/test.tree")
dists = tr.get_distances()
dists
names | Human | HowlerMon | Mouse | NineBande | DogFaced |
---|---|---|---|---|---|
Human | 0.0000 | 0.0727 | 0.3468 | 0.1831 | 0.2023 |
HowlerMon | 0.0727 | 0.0000 | 0.3572 | 0.1936 | 0.2128 |
Mouse | 0.3468 | 0.3572 | 0.0000 | 0.3911 | 0.4103 |
NineBande | 0.1831 | 0.1936 | 0.3911 | 0.0000 | 0.2072 |
DogFaced | 0.2023 | 0.2128 | 0.4103 | 0.2072 | 0.0000 |
Getting the two nodes that are farthest apart#
from cogent3 import load_tree
tr = load_tree("data/test.tree")
tr.max_tip_tip_distance()
(np.float64(0.4102925130849), ('Mouse', 'DogFaced'))
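The same pair can be recovered from the pairwise distance matrix shown in the previous section by scanning its upper triangle. The sketch below does this in plain Python with the rounded values copied from that matrix; the variable names are assumptions for this illustration and the approach is not necessarily the one cogent3 uses.

```python
from itertools import combinations

# Rounded tip-to-tip distances copied from the matrix in the previous section
names = ["Human", "HowlerMon", "Mouse", "NineBande", "DogFaced"]
rows = [
    [0.0000, 0.0727, 0.3468, 0.1831, 0.2023],
    [0.0727, 0.0000, 0.3572, 0.1936, 0.2128],
    [0.3468, 0.3572, 0.0000, 0.3911, 0.4103],
    [0.1831, 0.1936, 0.3911, 0.0000, 0.2072],
    [0.2023, 0.2128, 0.4103, 0.2072, 0.0000],
]

# Compare every unordered pair once and keep the largest distance.
farthest_dist, farthest_pair = max(
    (rows[i][j], (names[i], names[j]))
    for i, j in combinations(range(len(names)), 2)
)
print(farthest_dist, farthest_pair)  # 0.4103 ('Mouse', 'DogFaced')
```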
Get the nodes within a given distance#
from cogent3 import load_tree
tr = load_tree("data/test.tree")
hu = tr.get_node_matching_name("Human")
tips = hu.tips_within_distance(0.2)
for t in tips:
    print(t)
HowlerMon:0.0415847131449
NineBande:0.0939768158209
Rerooting trees#
Reorienting a tree at a named node#
The method name is a bit misleading. If tr is an unrooted tree (loosely, a tree whose root node has more than 2 children), then the result is a re-orientation of the tree rather than a true rooting.
from cogent3 import load_tree
tr = load_tree("data/test.tree")
print(tr.rooted_at("edge.0").ascii_art())
          /-Human
         |
-root----|--HowlerMon
         |
         |          /-Mouse
          \edge.0--|
                   |          /-NineBande
                    \edge.1--|
                              \-DogFaced
At the midpoint#
This does produce a rooted tree.
from cogent3 import load_tree
tr = load_tree("data/test.tree")
print(tr.root_at_midpoint().ascii_art())
          /-Mouse
         |
-root----|                    /-Human
         |          /edge.0--|
         |         |          \-HowlerMon
          \Mouse-root
                   |          /-NineBande
                    \edge.1--|
                              \-DogFaced
Root at a named edge#
The edge can be either a tip or an internal node.
from cogent3 import load_tree
tr = load_tree("data/test.tree")
print(tr.ascii_art())
                              /-Human
                    /edge.0--|
          /edge.1--|          \-HowlerMon
         |         |
         |          \-Mouse
-root----|
         |--NineBande
         |
          \-DogFaced
print(tr.rooted("Mouse").ascii_art())
          /-Mouse
         |
-root----|                    /-Human
         |          /edge.0--|
         |         |          \-HowlerMon
          \Mouse-root
                   |          /-NineBande
                    \edge.1--|
                              \-DogFaced
Tree representations#
Newick format#
from cogent3 import load_tree
tr = load_tree("data/test.tree")
tr.get_newick()
'(((Human,HowlerMon),Mouse),NineBande,DogFaced);'
tr.get_newick(with_distances=True)
'(((Human:0.0311054096183,HowlerMon:0.0415847131449):0.0382963424874,Mouse:0.277353608988):0.0197278502379,NineBande:0.0939768158209,DogFaced:0.113211053859);'
tr.get_newick(with_distances=True, with_node_names=True)
'(((Human:0.0311054096183,HowlerMon:0.0415847131449)edge.0:0.0382963424874,Mouse:0.277353608988)edge.1:0.0197278502379,NineBande:0.0939768158209,DogFaced:0.113211053859);'
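As a rough guide to how such strings are assembled, the sketch below serialises a small hand-built tree to Newick in plain Python. The (name, length, children) tuples, the function to_newick and the example tree are assumptions for this illustration; this is not the cogent3 implementation, and it omits internal node names as the default output above does.

```python
def to_newick(node, with_distances=False):
    """Serialise a (name, length, children) node; internal names are omitted."""
    name, length, children = node
    if children:
        inner = ",".join(to_newick(child, with_distances) for child in children)
        text = f"({inner})"
    else:
        text = name
    if with_distances and length is not None:
        text += f":{length}"
    return text


# Hypothetical three-tip tree; the root carries no branch length.
tree = (
    "root",
    None,
    [
        ("A", 1.0, []),
        ("edge.0", 0.5, [("B", 2.0, []), ("C", 3.0, [])]),
    ],
)

plain = to_newick(tree) + ";"
with_lengths = to_newick(tree, with_distances=True) + ";"
print(plain)         # (A,(B,C));
print(with_lengths)  # (A:1.0,(B:2.0,C:3.0):0.5);
```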
Tree traversal#
Here is the example tree for reference:
from cogent3 import load_tree
tr = load_tree("data/test.tree")
print(tr.ascii_art())
                              /-Human
                    /edge.0--|
          /edge.1--|          \-HowlerMon
         |         |
         |          \-Mouse
-root----|
         |--NineBande
         |
          \-DogFaced
Preorder#
from cogent3 import load_tree
tr = load_tree("data/test.tree")
for t in tr.preorder():
    print(t.get_newick())
(((Human,HowlerMon),Mouse),NineBande,DogFaced);
((Human,HowlerMon),Mouse)
(Human,HowlerMon)
Human
HowlerMon
Mouse
NineBande
DogFaced
Postorder#
from cogent3 import load_tree
tr = load_tree("data/test.tree")
for t in tr.postorder():
    print(t.get_newick())
Human
HowlerMon
(Human,HowlerMon)
Mouse
((Human,HowlerMon),Mouse)
NineBande
DogFaced
(((Human,HowlerMon),Mouse),NineBande,DogFaced);
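The two orders differ only in when a node is visited relative to its children: preorder visits the parent first, postorder visits it last. The plain-Python sketch below shows this on a hand-built copy of the example tree. The (name, children) tuples and generator names are assumptions for this illustration and are independent of cogent3.

```python
# Hypothetical nested representation of the example tree: (name, children)
tree = (
    "root",
    [
        (
            "edge.1",
            [
                ("edge.0", [("Human", []), ("HowlerMon", [])]),
                ("Mouse", []),
            ],
        ),
        ("NineBande", []),
        ("DogFaced", []),
    ],
)


def preorder(node):
    """Yield a node's name before the names in its subtrees."""
    name, children = node
    yield name
    for child in children:
        yield from preorder(child)


def postorder(node):
    """Yield a node's name after the names in its subtrees."""
    name, children = node
    for child in children:
        yield from postorder(child)
    yield name


pre = list(preorder(tree))
post = list(postorder(tree))
print(pre)
print(post)
```

The name orders produced here correspond line by line to the Newick strings printed in the two examples above.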
Selecting subtrees#
from cogent3 import make_tree
tr = make_tree("((a,b),((c,d),(e,f),(g,h)));")
print(tr.ascii_art(show_internal=False))
                    /-a
          /--------|
         |          \-b
         |
         |                    /-c
---------|          /--------|
         |         |          \-d
         |         |
         |         |          /-e
          \--------|---------|
                   |          \-f
                   |
                   |          /-g
                    \--------|
                              \-h
Provide the names of nodes you want the subtree for. The default behaviour is to force the subtree to have the same number of children at the root as the original tree, in this case 2.
subtree = tr.get_sub_tree(["c", "e", "g"])
print(subtree.ascii_art(show_internal=False))
          /-c
---------|
         |          /-e
          \--------|
                    \-g
Use the as_rooted argument to ensure the selected subtree topology is as it existed on the original tree.
subtree = tr.get_sub_tree(["c", "e", "g"], as_rooted=True)
print(subtree.ascii_art(show_internal=False))
          /-c
         |
---------|--e
         |
          \-g
Tree manipulation methods#
Pruning the tree#
Remove internal nodes with only one child. Create new connections and branch lengths (if tree is a PhyloNode) to reflect the change.
from cogent3 import make_tree
simple_tree = make_tree("(B:0.2,(D:0.4)E:0.5);")
print(simple_tree.ascii_art())
          /-B
-root----|
          \E------- /-D
The prune() method modifies the tree in place.
simple_tree.prune()
print(simple_tree.ascii_art())
          /-B
-root----|
          \-D
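The effect on branch lengths is not visible in the ASCII art. The plain-Python sketch below shows one plausible reading of the description above, under the assumption that the length of a removed single-child node is added to the length of its child. The dict layout and the helper names are hypothetical and this is not the cogent3 implementation; the sketch also leaves a single-child root untouched.

```python
def prune(node):
    """Splice out internal nodes that have exactly one child, in place."""
    kept = []
    for child in node["children"]:
        prune(child)  # tidy the subtree below this child first
        if len(child["children"]) == 1:
            (only,) = child["children"]
            # Assumption: the removed branch length is added to the child's.
            only["length"] += child["length"]
            child = only
        kept.append(child)
    node["children"] = kept
    return node


def leaf(name, length):
    """Build a tip node in the hypothetical dict layout."""
    return {"name": name, "length": length, "children": []}


# Hand-built equivalent of "(B:0.2,(D:0.4)E:0.5);"
tree = {
    "name": "root",
    "length": 0.0,
    "children": [
        leaf("B", 0.2),
        {"name": "E", "length": 0.5, "children": [leaf("D", 0.4)]},
    ],
}

prune(tree)
remaining = [(c["name"], round(c["length"], 6)) for c in tree["children"]]
print(remaining)
```

Under that assumption, E disappears and D is attached directly to the root with a length of 0.4 + 0.5.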
Create a full unrooted copy of the tree#
from cogent3 import load_tree
tr1 = load_tree("data/test.tree")
print(tr1.get_newick())
(((Human,HowlerMon),Mouse),NineBande,DogFaced);
tr2 = tr1.unrooted_deepcopy()
print(tr2.get_newick())
(((Human,HowlerMon),Mouse),NineBande,DogFaced);
Transform tree into a bifurcating tree#
Add internal nodes so that every node has 2 or fewer children.
from cogent3 import make_tree
tree_string = "(B:0.2,H:0.2,(C:0.3,D:0.4,E:0.1)F:0.5)G;"
tr = make_tree(tree_string)
print(tr.ascii_art())
          /-B
         |
         |--H
-G-------|
         |          /-C
         |         |
          \F-------|--D
                   |
                    \-E
print(tr.bifurcating().ascii_art())
          /-B
-G-------|
         |          /-H
          \--------|
                   |          /-C
                    \F-------|
                             |          /-D
                              \--------|
                                        \-E
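One way to obtain a topology like the one printed above is to keep the first child of any node with more than two children and move the remaining children under a new unnamed node, repeating as needed. The plain-Python sketch below implements that rule. It is an assumption chosen because it matches this example, not a description of the rule cogent3 applies; branch lengths are ignored and the (name, children) layout is hypothetical.

```python
def bifurcate(node):
    """Return a copy of `node` in which every node has at most two children."""
    name, children = node
    children = [bifurcate(child) for child in children]
    if len(children) > 2:
        # Keep the first child and group the others under a new unnamed node.
        rest = bifurcate(("", children[1:]))
        children = [children[0], rest]
    return (name, children)


def max_children(node):
    """Largest number of children found on any node of the tree."""
    _, children = node
    return max([len(children)] + [max_children(child) for child in children])


# Hand-built topology of "(B,H,(C,D,E)F)G;"
tree = ("G", [("B", []), ("H", []), ("F", [("C", []), ("D", []), ("E", [])])])

resolved = bifurcate(tree)
print(max_children(tree), max_children(resolved))  # 3 2
```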
Transform tree into a balanced tree#
Using a balanced tree can substantially improve performance of likelihood calculations for time-reversible models. Note that the resulting tree has a different orientation with the effect that specifying clades or stems for model parameterisation should be done using the “outgroup_name” argument.
from cogent3 import load_tree
tr = load_tree("data/test.tree")
print(tr.ascii_art())
                              /-Human
                    /edge.0--|
          /edge.1--|          \-HowlerMon
         |         |
         |          \-Mouse
-root----|
         |--NineBande
         |
          \-DogFaced
print(tr.balanced().ascii_art())
                    /-Human
          /edge.0--|
         |          \-HowlerMon
         |
-root----|--Mouse
         |
         |          /-NineBande
          \edge.1--|
                    \-DogFaced
Test two trees for same topology#
Branch lengths don’t matter.
from cogent3 import make_tree
tr1 = make_tree("(B:0.2,(C:0.2,D:0.2)F:0.2)G;")
tr2 = make_tree("((C:0.1,D:0.1)F:0.1,B:0.1)G;")
tr1.same_topology(tr2)
True
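A simple way to think about this comparison is that two rooted trees with uniquely named tips share a topology exactly when they contain the same set of clades, regardless of child order or branch lengths. The plain-Python sketch below checks that condition on nested tuples. It treats both trees as rooted; the function names are assumptions for this illustration, and the cogent3 method may use a different procedure.

```python
def clades(node):
    """Return (tips under node, set of all clades at or below node)."""
    if isinstance(node, str):
        tips = frozenset([node])
        return tips, {tips}
    tips, found = frozenset(), set()
    for child in node:
        child_tips, child_found = clades(child)
        tips |= child_tips
        found |= child_found
    found.add(tips)
    return tips, found


def same_clades(a, b):
    """True when both rooted trees contain exactly the same clades."""
    return clades(a)[1] == clades(b)[1]


# Topologies of the two trees above, plus a rearranged one for contrast
t1 = ("B", ("C", "D"))
t2 = (("C", "D"), "B")
t3 = (("B", "C"), "D")

print(same_clades(t1, t2))  # True
print(same_clades(t1, t3))  # False
```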
Measure topological distances between two trees#
A number of topological tree distance metrics are available. They include:
The Robinson-Foulds Distance for rooted trees.
The Matching Cluster Distance for rooted trees.
The Robinson-Foulds Distance for unrooted trees.
The Lin-Rajan-Moret Distance for unrooted trees.
There are several variations of the Robinson-Foulds metric in the literature. The definition used by cogent3 is the cardinality of the symmetric difference of the sets of clades/splits in the two rooted/unrooted trees. Other definitions sometimes divide this by two, or normalise it to the unit interval.
The Robinson-Foulds distance is quick to compute, but is known to saturate quickly. Moving a single leaf in a tree can maximise this metric.
The Matching Cluster and Lin-Rajan-Moret distances are two matching-based distances that are more statistically robust. Unlike the Robinson-Foulds distance, which counts how many of the splits/clades are not exactly the same, the matching-based distances measure the degree to which the splits/clades differ. The matching-based distances solve a min-weight matching problem, which may take longer to compute for large trees.
# Distance metrics for rooted trees
from cogent3 import make_tree
tr1 = make_tree(treestring="(a,(b,(c,(d,e))));")
tr2 = make_tree(treestring="(e,(d,(c,(b,a))));")
mc_distance = tr1.tree_distance(tr2, method="matching_cluster") # or method="mc" or method="matching"
rooted_rf_distance = tr1.tree_distance(tr2, method="rooted_robinson_foulds") # or method="rrf" or method="rf"
print("Matching Cluster Distance:", mc_distance)
print("Rooted Robinson Foulds Distance:", rooted_rf_distance)
Matching Cluster Distance: 10
Rooted Robinson Foulds Distance: 6
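Both rooted values can be reproduced by hand from the definitions given earlier. The plain-Python sketch below collects the clades of each tree, takes the size of their symmetric difference for the Robinson-Foulds distance, and finds the cheapest one-to-one pairing of clades for the Matching Cluster distance, where the cost of a pair is the size of the symmetric difference of the two clades. The function names are assumptions for this illustration. The brute-force search over permutations is a stand-in that is only practical for very small trees, and it is not the algorithm cogent3 uses.

```python
from itertools import permutations


def clades(tree):
    """Collect the tip-name set of every internal node, root included."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(visit(child) for child in node))
        found.add(tips)
        return tips

    visit(tree)
    return found


def rooted_rf(a, b):
    """Number of clades present in exactly one of the two trees."""
    return len(clades(a) ^ clades(b))


def matching_cluster(a, b):
    """Cheapest one-to-one pairing of clades, found by brute force."""
    ca, cb = list(clades(a)), list(clades(b))
    size = max(len(ca), len(cb))
    # Pad the shorter list with empty clades so every clade has a partner.
    ca += [frozenset()] * (size - len(ca))
    cb += [frozenset()] * (size - len(cb))
    return min(
        sum(len(x ^ y) for x, y in zip(ca, perm)) for perm in permutations(cb)
    )


# Topologies of the two rooted trees used above
tr1 = ("a", ("b", ("c", ("d", "e"))))
tr2 = ("e", ("d", ("c", ("b", "a"))))

print(rooted_rf(tr1, tr2))         # 6
print(matching_cluster(tr1, tr2))  # 10
```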
# Distance metrics for unrooted trees
from cogent3 import make_tree
tr1 = make_tree(treestring="(a,b,(c,(d,e)));")
tr2 = make_tree(treestring="((a,c),(b,d),e);")
lrm_distance = tr1.tree_distance(tr2, method="lin_rajan_moret") # or method="lrm" or method="matching"
unrooted_rf_distance = tr1.tree_distance(tr2, method="unrooted_robinson_foulds") # or method="urf" or method="rf"
print("Lin-Rajan-Moret Distance:", lrm_distance)
print("Unrooted Robinson Foulds Distance:", unrooted_rf_distance)
Lin-Rajan-Moret Distance: 3
Unrooted Robinson Foulds Distance: 4
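For unrooted trees the Robinson-Foulds definition is applied to splits rather than clades: each internal branch divides the tips into two groups, and the distance counts the splits found in only one tree. The plain-Python sketch below reproduces the unrooted value above on nested tuples. The function names are assumptions for this illustration and the code is independent of cogent3; the Lin-Rajan-Moret distance is not covered here.

```python
def splits(tree):
    """Non-trivial bipartitions of the tip set, one per internal branch."""
    all_tips = set()
    below = []

    def visit(node):
        if isinstance(node, str):
            all_tips.add(node)
            return frozenset([node])
        tips = frozenset().union(*(visit(child) for child in node))
        below.append(tips)
        return tips

    visit(tree)
    everything = frozenset(all_tips)
    result = set()
    for side in below:
        other = everything - side
        # Skip the root and branches that separate a single tip.
        if len(side) > 1 and len(other) > 1:
            result.add(frozenset([side, other]))
    return result


def unrooted_rf(a, b):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(a) ^ splits(b))


# Topologies of the two unrooted trees used above
tr1 = ("a", "b", ("c", ("d", "e")))
tr2 = (("a", "c"), ("b", "d"), "e")

print(unrooted_rf(tr1, tr2))  # 4
```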