PhyloNode#
- class PhyloNode(*args, **kwargs)#
- Attributes:
- length
parent
Accessor for parent.
Methods
ancestors
()Returns all ancestors back to the root.
append
(i)Appends i to self.children, in-place, cleaning up refs.
ascii_art
([show_internal, compact])Returns a string containing an ascii drawing of the tree.
balanced
()Tree 'rooted' here with no neighbour having > 50% of the edges.
bifurcating
([eps, constructor, name_unnamed])Wrap multifurcating with a num of 2
child_groups
()Returns list containing lists of children sharing a state.
child_parent_map
()return dict of {<child name>: <parent name>, ...}
compare_by_names
(other)Equality test for trees by name
compare_by_subsets
(other[, exclude_absent_taxa])Returns fraction of overlapping subsets where self and other differ.
compare_by_tip_distances
(other[, sample, ...])Compares self to other using tip-to-tip distance matrices.
compare_name
(other)Compares TreeNode by name
copy
([memo, _nil, constructor])Returns a copy of self using an iterative approach
copy_topology
([constructor])Copies only the topology and labels of a tree, not any extra data.
deepcopy
([memo, _nil, constructor])Returns a copy of self using an iterative approach
descendant_array
([tip_list])Returns numpy array with nodes in rows and descendants in columns.
distance
(other)Returns branch length between self and other.
extend
(items)Extends self.children by items, in-place, cleaning up refs.
get_connecting_edges
(name1, name2)returns a list of edges connecting two nodes.
get_connecting_node
(name1, name2)Finds the last common ancestor of the two named edges.
get_distances
([endpoints])The distance matrix as a dictionary.
get_edge_names
(tip1name, tip2name[, clade, ...])Return the list of stem and/or sub tree (clade) edge name(s).
get_edge_vector
([include_root])Collect the list of edges in postfix order
get_figure
([style])gets Dendrogram for plotting the phylogeny
get_max_tip_tip_distance
()Returns the max tip-to-tip distance between any pair of tips
get_newick
([with_distances, semicolon, ...])Return the newick string of node and its descendants
get_node_names
([includeself, tipsonly])Return a list of edges from this edge - may or may not include self.
get_nodes_dict
()Returns a dict keyed by node name, value is node
get_param_value
(param, edge)returns the parameter value for named edge
get_sub_tree
(name_list[, ignore_missing, ...])A new instance of a sub tree that contains all the otus that are listed in name_list.
get_tip_names
([includeself])return the list of the names of all tips contained by this edge
get_xml
()Return XML formatted tree string.
index_in_parent
()Returns index of self in parent.
insert
(index, i)Inserts an item at specified position in self.children.
is_root
()Returns True if the current node is a root, i.e. has no parent.
is_tip
()Returns True if the current node is a tip, i.e. has no children.
isroot
()Returns True if root of a tree, i.e. no parent.
istip
()Returns True if is tip, i.e. no children.
iter_nontips
([include_self])Iterates over nontips descended from self, [] if none.
iter_tips
([include_self])Iterates over tips descended from self, [] if self is a tip.
last_common_ancestor
(other)Finds last common ancestor of self and other, or None.
lca
(other)Finds last common ancestor of self and other, or None.
levelorder
([include_self])Performs levelorder iteration over tree
lin_rajan_moret
(tree2)return the lin-rajan-moret distance between trees
lowest_common_ancestor
(tipnames)Lowest common ancestor for a list of tipnames
make_tree_array
([dec_list])Makes an array with nodes in rows and descendants in columns.
max_tip_tip_distance
()returns the max distance between any pair of tips
multifurcating
(num[, eps, constructor, ...])return a new tree with every node having num or fewer children
name_unnamed_nodes
()sets the Data property of unnamed nodes to an arbitrary value
non_tip_children
()Returns direct children in self that have descendants.
nontips
([include_self])Returns nontips descended from self.
pop
([index])Returns and deletes child of self at index (default: -1)
postorder
([include_self])performs postorder iteration over tree.
pre_and_postorder
([include_self])Performs iteration over tree, visiting node before and after.
preorder
([include_self])Performs preorder iteration over tree.
prune
()Reconstructs correct tree after nodes have been removed.
reassign_names
(mapping[, nodes])Reassigns node names based on a mapping dict
remove
(target)Removes node by name instead of identity.
remove_deleted
(is_deleted)Removes all nodes where is_deleted tests true.
remove_node
(target)Removes node by identity instead of value.
root
()Returns root of the tree self is in.
root_at_midpoint
()return a new tree rooted at midpoint of the two tips farthest apart
rooted_at
(edge_name)Return a new tree rooted at the provided node.
rooted_with_tip
(outgroup_name)A new tree with the named tip as one of the root's children
same_shape
(other)Ignores lengths and order, so trees should be sorted first
same_topology
(other)Tests whether two trees have the same topology.
scale_branch_lengths
([max_length, ultrametric])Scales BranchLengths in place to integers for ascii output.
separation
(other)Returns number of edges separating self and other.
set_max_tip_tip_distance
()Propagate tip distance information up the tree
set_param_value
(param, edge, value)sets the value for param at named edge
set_tip_distances
()Sets distance from each node to the most distant tip.
siblings
()Returns all nodes that are children of the same parent as self.
sorted
([sort_order])An equivalent tree sorted into a standard order.
subset
()Returns set of names that descend from specified node
subsets
()Returns all sets of names that come from specified node and its kids
tip_children
()Returns direct children of self that are tips.
tip_to_tip_distances
([endpoints, default_length])Returns distance matrix between all pairs of tips, and a tip order.
tips
([include_self])Returns tips descended from self, [] if self is a tip.
tips_within_distance
(distance)Returns tips within specified distance from self
to_json
()returns json formatted string {'newick': with edges and distances, 'edge_attributes': }
to_rich_dict
()returns {'newick': with node names, 'edge_attributes': {'tip1': {'length': ...}, ...}}
total_descending_branch_length
()Returns total descending branch length from self
total_length
()returns the sum of all branch lengths in tree
traverse
([self_before, self_after, include_self])Returns iterator over descendants.
tree_distance
(other[, method])Return the specified tree distance between this and another tree.
unrooted
()A tree with at least 3 children at the root.
write
(filename[, with_distances, format])Save the tree to filename
get_node_matching_name
unrooted_deepcopy
- ancestors()#
Returns all ancestors back to the root. Dynamically calculated.
- append(i)#
Appends i to self.children, in-place, cleaning up refs.
- ascii_art(show_internal=True, compact=False)#
Returns a string containing an ascii drawing of the tree.
- Parameters:
- show_internal
includes internal edge names.
- compact
use exactly one line per tip.
- balanced()#
Tree ‘rooted’ here with no neighbour having > 50% of the edges.
- Usage:
Using a balanced tree can substantially improve performance of the likelihood calculations. Note that the resulting tree has a different orientation with the effect that specifying clades or stems for model parameterisation should be done using the ‘outgroup_name’ argument.
- bifurcating(eps=None, constructor=None, name_unnamed=False)#
Wrap multifurcating with a num of 2
- child_groups()#
Returns list containing lists of children sharing a state.
In other words, returns runs of tip and nontip children.
- child_parent_map() dict[str, str] #
return dict of {<child name>: <parent name>, …}
- compare_by_names(other)#
Equality test for trees by name
- compare_by_subsets(other, exclude_absent_taxa=False)#
Returns fraction of overlapping subsets where self and other differ.
Other is expected to be a tree object compatible with PhyloNode.
Note: names present in only one of the two trees will count as mismatches: if you don’t want this behavior, strip out the non-matching tips first.
- compare_by_tip_distances(other, sample=None, dist_f=<function distance_from_r>, shuffle_f=<bound method Random.shuffle of <random.Random object>>)#
Compares self to other using tip-to-tip distance matrices.
Value returned is dist_f(m1, m2) for the two matrices. The default, distance_from_r, uses the Pearson correlation coefficient, with +1 giving a distance of 0 and -1 giving a distance of +1 (the maximum possible value). Depending on the application, you might instead want to use distance_from_r_squared, which counts correlations of both +1 and -1 as identical (0 distance).
Note: automatically strips out the names that don’t match (this is necessary for this method because the distance between non-matching names and matching names is undefined in the tree where they don’t match, and because we need to reorder the names in the two trees to match up the distance matrices).
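The mapping from correlation to distance described above can be sketched in plain Python. This is a minimal illustration, not the library's code: the helper names pearson_r, r_to_distance and r_squared_to_distance are hypothetical, and the linear map (1 - r) / 2 is assumed because it matches the stated endpoints (+1 gives 0, -1 gives 1).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def r_to_distance(r):
    """Assumed linear map of r in [-1, 1] onto [0, 1]: +1 -> 0, -1 -> 1."""
    return (1.0 - r) / 2.0


def r_squared_to_distance(r):
    """Treats +1 and -1 as identical: both give a distance of 0."""
    return 1.0 - r * r


# Flattened tip-to-tip distances for the same tip pairs (made-up numbers).
m1 = [1.0, 2.0, 3.0]
m2 = [2.0, 4.0, 6.0]  # perfectly correlated with m1
m3 = [6.0, 4.0, 2.0]  # perfectly anti-correlated with m1

same = r_to_distance(pearson_r(m1, m2))
opposite = r_to_distance(pearson_r(m1, m3))
opposite_squared = r_squared_to_distance(pearson_r(m1, m3))
```

Here same is (numerically) 0 and opposite is 1, while the squared variant scores the anti-correlated pair as 0.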
- compare_name(other)#
Compares TreeNode by name
- copy(memo=None, _nil=None, constructor='ignored')#
Returns a copy of self using an iterative approach
- copy_topology(constructor=None)#
Copies only the topology and labels of a tree, not any extra data.
Useful when you want another copy of the tree with the same structure and labels, but want to e.g. assign different branch lengths and environments. Does not use deepcopy from the copy module, so _much_ faster than the copy() method.
- deepcopy(memo=None, _nil=None, constructor='ignored')#
Returns a copy of self using an iterative approach
- descendant_array(tip_list=None)#
Returns numpy array with nodes in rows and descendants in columns.
A value of 1 indicates that the descendant (column) is a descendant of that node (row). A value of 0 indicates that it is not.
Also returns a list of nodes in the same order as they are listed in the array.
tip_list is a list of the names of the tips that will be considered, in the order they will appear as columns in the final array. Internal nodes will appear as rows in preorder traversal order.
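The layout of such an array can be sketched with plain lists. This is only an illustration under assumptions: the tree is a hypothetical nested (name, children) tuple rather than a PhyloNode, and the function name descendant_matrix is invented for the example.

```python
def descendant_matrix(tree, tip_list=None):
    """Rows are internal nodes (preorder), columns are tips in tip_list order.

    tree is a hypothetical (name, children) tuple; tips have an empty list.
    """

    def tips_under(node):
        name, children = node
        if not children:
            return [name]
        found = []
        for child in children:
            found.extend(tips_under(child))
        return found

    if tip_list is None:
        tip_list = tips_under(tree)

    rows, node_names = [], []
    stack = [tree]
    while stack:  # preorder walk, keeping internal nodes only
        node = stack.pop()
        name, children = node
        if not children:
            continue
        below = set(tips_under(node))
        rows.append([1 if tip in below else 0 for tip in tip_list])
        node_names.append(name)
        stack.extend(reversed(children))
    return rows, node_names


tree = ("root", [("a", []), ("b", []), ("e", [("c", []), ("d", [])])])
matrix, nodes = descendant_matrix(tree)
# matrix == [[1, 1, 1, 1], [0, 0, 1, 1]] and nodes == ["root", "e"]
```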
- distance(other)#
Returns branch length between self and other.
- extend(items)#
Extends self.children by items, in-place, cleaning up refs.
- get_connecting_edges(name1, name2)#
returns a list of edges connecting two nodes.
If both are tips, the LCA is excluded from the result.
- get_connecting_node(name1, name2)#
Finds the last common ancestor of the two named edges.
- get_distances(endpoints=None)#
The distance matrix as a dictionary.
- Usage:
Grabs the branch lengths (evolutionary distances) as a complete matrix (i.e. a,b and b,a).
- get_edge_names(tip1name, tip2name, clade=True, stem=False, outgroup_name=None)#
Return the list of stem and/or sub tree (clade) edge name(s). This is done by finding the common intersection, and then getting the list of names. If the clade traverses the root, then use the outgroup_name argument to ensure valid specification.
- Parameters:
- tip1/2name
edge 1/2 names
- stem
whether the name of the clade stem edge is returned.
- clade
whether the names of the edges within the clade are returned
- outgroup_name
if provided the calculation is done on a version of the tree re-rooted relative to the provided tip.
- Usage:
The returned list can be used to specify subtrees for special parameterisation. For instance, say you want to allow the primates to have a different value of a particular parameter. In this case, provide the results of this method to the parameter controller method set_param_rule() along with the parameter name etc.
- get_edge_vector(include_root=True)#
Collect the list of edges in postfix order
- Parameters:
- include_root
whether the root edge is included
- get_figure(style='square', **kwargs)#
gets Dendrogram for plotting the phylogeny
- Parameters:
- style: string
‘square’, ‘angular’, ‘radial’ or ‘circular’
- kwargs
arguments passed to Dendrogram constructor
- get_max_tip_tip_distance()#
Returns the max tip-to-tip distance between any pair of tips
Returns (dist, tip_names, internal_node)
- get_newick(with_distances=False, semicolon=True, escape_name=True, with_node_names=False)#
Return the newick string of node and its descendants
- Parameters:
- with_distances
include value of node length attribute if present.
- semicolon
end tree string with a semicolon
- escape_name
if a node’s name contains any of the characters []’”(), wrap the name in single quotes
- with_node_names
includes internal node names (except ‘root’)
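How these options shape the output can be sketched with a small serialiser. This is a simplified illustration, not the library's implementation: the tree is a hypothetical nested (name, length, children) tuple, internal node names are always omitted, and the set of characters that trigger quoting is an assumption based on the description above.

```python
SPECIAL_CHARS = set("[]'\"(),")


def escape(name):
    """Wrap the name in single quotes if it contains a special character."""
    return f"'{name}'" if SPECIAL_CHARS & set(name) else name


def to_newick(node, with_distances=False, semicolon=True):
    """Serialise a hypothetical (name, length, children) tuple as newick."""
    name, length, children = node
    if children:
        inner = ",".join(
            to_newick(child, with_distances, semicolon=False) for child in children
        )
        text = f"({inner})"  # internal node names are omitted in this sketch
    else:
        text = escape(name)
    if with_distances and length is not None:
        text += f":{length}"
    return text + (";" if semicolon else "")


tree = (
    "root",
    None,
    [
        ("a", 1.0, []),
        ("E(coli)", 2.0, []),
        ("e", 1.0, [("c", 3.0, []), ("d", 4.0, [])]),
    ],
)
plain = to_newick(tree)
with_lengths = to_newick(tree, with_distances=True)
# plain        == "(a,'E(coli)',(c,d));"
# with_lengths == "(a:1.0,'E(coli)':2.0,(c:3.0,d:4.0):1.0);"
```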
- get_node_matching_name(name)#
- get_node_names(includeself=True, tipsonly=False)#
Return a list of edges from this edge - may or may not include self. This node (or first connection) will be the first, and then they will be listed in the natural traverse order.
- Parameters:
- includeself: bool
if False, excludes self.name from the result
- tipsonly: bool
only tips returned
- get_nodes_dict()#
Returns a dict keyed by node name, value is node
Will raise TreeError if non-unique names are encountered
- get_param_value(param, edge)#
returns the parameter value for named edge
- get_sub_tree(name_list, ignore_missing=False, keep_root=False, tipsonly=False)#
A new instance of a sub tree that contains all the otus that are listed in name_list.
- Parameters:
- ignore_missing
if False, get_sub_tree will raise a ValueError if name_list contains names that aren’t nodes in the tree
- keep_root
if False, the root of the subtree will be the last common ancestor of all nodes kept in the subtree. Root to tip distance is then (possibly) different from the original tree. If True, the root to tip distance remains constant, but root may only have one child node.
- tipsonly
only tip names matching name_list are allowed
- get_tip_names(includeself=False)#
return the list of the names of all tips contained by this edge
- get_xml()#
Return XML formatted tree string.
- index_in_parent()#
Returns index of self in parent.
- insert(index, i)#
Inserts an item at specified position in self.children.
- is_root()#
Returns True if the current node is a root, i.e. has no parent.
- is_tip()#
Returns True if the current node is a tip, i.e. has no children.
- isroot()#
Returns True if root of a tree, i.e. no parent.
- istip()#
Returns True if is tip, i.e. no children.
- iter_nontips(include_self=False)#
Iterates over nontips descended from self, [] if none.
include_self, if True (default is False), will return the current node as part of the list of nontips if it is a nontip.
- iter_tips(include_self=False)#
Iterates over tips descended from self, [] if self is a tip.
- last_common_ancestor(other)#
Finds last common ancestor of self and other, or None.
Always tests by identity.
- lca(other)#
Finds last common ancestor of self and other, or None.
Always tests by identity.
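Finding a common ancestor by identity can be sketched with parent references alone. The Node class and function names below are hypothetical stand-ins, and the choice to count a node as part of its own lineage is an assumption of this sketch.

```python
class Node:
    """Hypothetical minimal node: a name and a reference to its parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def lineage(node):
    """The node itself followed by its ancestors back to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def last_common_ancestor(a, b):
    """First node on b's lineage that is also on a's lineage, else None."""
    seen = {id(n) for n in lineage(a)}  # identity, not name, is compared
    for candidate in lineage(b):
        if id(candidate) in seen:
            return candidate
    return None  # the two nodes are in different trees


root = Node("root")
e = Node("e", parent=root)
c = Node("c", parent=e)
d = Node("d", parent=e)
a = Node("a", parent=root)
stranger = Node("c")  # same name as c, but a different object in another tree
```

With this tree, last_common_ancestor(c, d) is e, last_common_ancestor(c, a) is root, and last_common_ancestor(c, stranger) is None even though the names match.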
- property length#
- levelorder(include_self=True)#
Performs levelorder iteration over tree
- lin_rajan_moret(tree2) int #
return the lin-rajan-moret distance between trees
- Returns:
- float
the Lin-Rajan-Moret distance
Notes
This is a distance measure that exhibits superior statistical properties compared to Robinson-Foulds. It can only be applied to unrooted trees.
see: Lin et al. 2012 A Metric for Phylogenetic Trees Based on Matching IEEE/ACM Transactions on Computational Biology and Bioinformatics vol. 9, no. 4, pp. 1014-1022, July-Aug. 2012
- lowest_common_ancestor(tipnames)#
Lowest common ancestor for a list of tipnames
This should be around O(H sqrt(n)), where H is height and n is the number of tips passed in.
- make_tree_array(dec_list=None)#
Makes an array with nodes in rows and descendants in columns.
A value of 1 indicates that the descendant (column) is a descendant of that node (row). A value of 0 indicates that it is not.
Also returns a list of nodes in the same order as they are listed in the array.
- max_tip_tip_distance()#
returns the max distance between any pair of tips
Also returns the tip names that it is between as a tuple
- multifurcating(num, eps=None, constructor=None, name_unnamed=False)#
return a new tree with every node having num or fewer children
- Parameters:
- num: int
the maximum number of children a node can have
- eps: float
default branch length to set if self or constructor is of PhyloNode type
- constructor
a TreeNode or subclass constructor. If None, uses self
- name_unnamed: bool
names unnamed nodes
- name_unnamed_nodes()#
sets the Data property of unnamed nodes to an arbitrary value
Internal nodes are often unnamed and so this function assigns a value for referencing.
- non_tip_children()#
Returns direct children in self that have descendants.
- nontips(include_self=False)#
Returns nontips descended from self.
- property parent#
Accessor for parent.
If using an algorithm that accesses parent a lot, it will be much faster to access self._parent directly, but don’t do it if mutating self._parent! (or, if you must, remember to clean up the refs).
- pop(index=-1)#
Returns and deletes child of self at index (default: -1)
- postorder(include_self=True)#
performs postorder iteration over tree.
Notes
This is somewhat inelegant compared to saving the node and its index on the stack, but is 30% faster in the average case and 3x faster in the worst case (for a comb tree).
- pre_and_postorder(include_self=True)#
Performs iteration over tree, visiting node before and after.
- preorder(include_self=True)#
Performs preorder iteration over tree.
- prune()#
Reconstructs correct tree after nodes have been removed.
Internal nodes with only one child will be removed and new connections and Branch lengths will be made to reflect change.
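The effect of removing single-child internal nodes can be sketched as follows. The Node class and the function name collapse_single_child_nodes are hypothetical, the root is left untouched for simplicity, and summing the two branch lengths is an assumption about how the merged branch should "reflect the change".

```python
class Node:
    """Hypothetical minimal node with a branch length and children."""

    def __init__(self, name, length=0.0, children=None):
        self.name = name
        self.length = length
        self.children = children or []


def collapse_single_child_nodes(node):
    """Replace each single-child descendant by its child, in place."""
    new_children = []
    for child in node.children:
        collapse_single_child_nodes(child)
        if len(child.children) == 1:
            only = child.children[0]
            only.length += child.length  # merged branch keeps the total length
            child = only
        new_children.append(child)
    node.children = new_children


b = Node("b", 3.0)
x = Node("x", 2.0, [b])  # internal node left with a single child
a = Node("a", 1.0)
root = Node("root", 0.0, [a, x])

collapse_single_child_nodes(root)
# root now has children a and b, and b.length == 5.0
```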
- reassign_names(mapping, nodes=None)#
Reassigns node names based on a mapping dict
mapping : dict, old_name -> new_name
nodes : specific nodes for renaming (such as just tips, etc…)
- remove(target)#
Removes node by name instead of identity.
Returns True if node was present, False otherwise.
- remove_deleted(is_deleted)#
Removes all nodes where is_deleted tests true.
Internal nodes that have no children as a result of removing deleted are also removed.
- remove_node(target)#
Removes node by identity instead of value.
Returns True if node was present, False otherwise.
- root()#
Returns root of the tree self is in. Dynamically calculated.
- root_at_midpoint()#
return a new tree rooted at midpoint of the two tips farthest apart
This function does not preserve the internal node naming or structure, but does keep tip-to-tip distances correct. Uses unrooted_deepcopy().
- rooted_at(edge_name)#
Return a new tree rooted at the provided node.
- Usage:
This can be useful for drawing unrooted trees with an orientation that reflects knowledge of the true root location.
- rooted_with_tip(outgroup_name)#
A new tree with the named tip as one of the root’s children
- same_shape(other)#
Ignores lengths and order, so trees should be sorted first
- same_topology(other)#
Tests whether two trees have the same topology.
- scale_branch_lengths(max_length=100, ultrametric=False)#
Scales BranchLengths in place to integers for ascii output.
Warning: tree might not be exactly the length you specify.
Set ultrametric=True if you want all the root-tip distances to end up precisely the same.
- separation(other)#
Returns number of edges separating self and other.
- set_max_tip_tip_distance()#
Propagate tip distance information up the tree
This method was originally implemented by Julia Goodrich with the intent of being able to determine max tip to tip distances between nodes on large trees efficiently. The code has been modified to track the specific tips the distance is between.
- set_param_value(param, edge, value)#
sets the value for param at named edge
- set_tip_distances()#
Sets distance from each node to the most distant tip.
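One way to compute the distance from each node to its most distant tip is a single bottom-up pass. The sketch below is an illustration only: the Node class and the attribute name max_tip_distance are hypothetical, and treating a length of None as 0 is an assumption.

```python
class Node:
    """Hypothetical minimal node with a branch length and children."""

    def __init__(self, name, length=None, children=None):
        self.name = name
        self.length = length
        self.children = children or []


def set_tip_distances(root):
    """Store on every node the longest path length down to a tip below it."""
    order = []
    stack = [root]
    while stack:  # every node is appended after its parent
        node = stack.pop()
        order.append(node)
        stack.extend(node.children)
    for node in reversed(order):  # so reversed order visits children first
        node.max_tip_distance = max(
            (c.max_tip_distance + (c.length or 0.0) for c in node.children),
            default=0.0,  # a tip is at distance 0 from itself
        )


c = Node("c", 3.0)
d = Node("d", 4.0)
e = Node("e", 1.0, [c, d])
a = Node("a", 1.0)
root = Node("root", None, [a, e])

set_tip_distances(root)
# root.max_tip_distance == 5.0, e.max_tip_distance == 4.0, a.max_tip_distance == 0.0
```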
- siblings()#
Returns all nodes that are children of the same parent as self.
Note: excludes self from the list. Dynamically calculated.
- sorted(sort_order=None)#
An equivalent tree sorted into a standard order. If sort_order is not specified then alphabetical order is used. At each node starting from root, the algorithm will try to put the descendant which contains the lowest scoring tip on the left.
- subset()#
Returns set of names that descend from specified node
- subsets()#
Returns all sets of names that come from specified node and its kids
- tip_children()#
Returns direct children of self that are tips.
- tip_to_tip_distances(endpoints=None, default_length=1)#
Returns distance matrix between all pairs of tips, and a tip order.
Warning: .__start and .__stop added to self and its descendants.
tip_order contains the actual node objects, not their names (may be confusing in some cases).
- tips(include_self=False)#
Returns tips descended from self, [] if self is a tip.
- tips_within_distance(distance)#
Returns tips within specified distance from self
Branch lengths of None will be interpreted as 0
- to_json()#
returns json formatted string {‘newick’: with edges and distances, ‘edge_attributes’: }
- to_rich_dict()#
returns {‘newick’: with node names, ‘edge_attributes’: {‘tip1’: {‘length’: …}, …}}
- total_descending_branch_length()#
Returns total descending branch length from self
- total_length()#
returns the sum of all branch lengths in tree
- traverse(self_before=True, self_after=False, include_self=True)#
Returns iterator over descendants. Iterative: safe for large trees.
self_before includes each node before its descendants if True.
self_after includes each node after its descendants if True.
include_self includes the initial node if True.
self_before and self_after are independent. If neither is True, only terminal nodes will be returned.
Note that if self is terminal, it will only be included once even if self_before and self_after are both True.
This is a depth-first traversal. Since the trees are not binary, preorder and postorder traversals are possible, but inorder traversals would depend on the data in the tree and are not handled here.
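The interaction of the three flags can be sketched with an explicit stack. This is a minimal illustration of the behaviour described above, not the library's code; the Node class is a hypothetical stand-in.

```python
class Node:
    """Hypothetical minimal node: a name and a list of children."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []


def traverse(root, self_before=True, self_after=False, include_self=True):
    """Iterative depth-first traversal, so deep trees cannot hit recursion limits."""
    stack = [(root, False)]  # (node, children_already_pushed)
    while stack:
        node, expanded = stack.pop()
        wanted = include_self or node is not root
        if not node.children:
            if wanted:
                yield node  # terminal nodes are yielded exactly once
            continue
        if expanded:
            if self_after and wanted:
                yield node  # second visit, after all descendants
            continue
        if self_before and wanted:
            yield node  # first visit, before any descendant
        stack.append((node, True))
        for child in reversed(node.children):
            stack.append((child, False))


tree = Node("root", [Node("a"), Node("e", [Node("c"), Node("d")])])

preorder = [n.name for n in traverse(tree)]
postorder = [n.name for n in traverse(tree, self_before=False, self_after=True)]
tips_only = [n.name for n in traverse(tree, self_before=False, self_after=False)]
# preorder  == ["root", "a", "e", "c", "d"]
# postorder == ["a", "c", "d", "e", "root"]
# tips_only == ["a", "c", "d"]
```

Setting both self_before and self_after to True yields each internal node twice, once on the way down and once on the way up.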
- tree_distance(other: TreeNode, method: str | None = None) int #
Return the specified tree distance between this and another tree.
Defaults to the Lin-Rajan-Moret distance on unrooted trees. Defaults to the Matching Cluster distance on rooted trees.
- Parameters:
- other: TreeNode
The other tree to calculate the distance between.
- method: str | None
The tree distance metric to use.
Options are:
- “rooted_robinson_foulds”: The Robinson-Foulds distance for rooted trees.
- “unrooted_robinson_foulds”: The Robinson-Foulds distance for unrooted trees.
- “matching_cluster”: The Matching Cluster distance for rooted trees.
- “lin_rajan_moret”: The Lin-Rajan-Moret distance for unrooted trees.
- “rrf”: An alias for rooted_robinson_foulds.
- “urf”: An alias for unrooted_robinson_foulds.
- “mc”: An alias for matching_cluster.
- “lrm”: An alias for lin_rajan_moret.
- “rf”: The unrooted/rooted Robinson-Foulds distance for unrooted/rooted trees.
- “matching”: The Lin-Rajan-Moret/Matching Cluster distance for unrooted/rooted trees.
Default is “matching”.
- Returns:
- int
the chosen distance between the two trees.
Notes
The Lin-Rajan-Moret distance [2] and Matching Cluster distance [1] display statistical properties superior to those of the Robinson-Foulds distance [3] on unrooted and rooted trees respectively.
References
[1]Bogdanowicz, D., & Giaro, K. (2013). On a matching distance between rooted phylogenetic trees. International Journal of Applied Mathematics and Computer Science, 23(3), 669-684.
[2]Lin et al. 2012 A Metric for Phylogenetic Trees Based on Matching IEEE/ACM Transactions on Computational Biology and Bioinformatics vol. 9, no. 4, pp. 1014-1022, July-Aug. 2012
[3]Robinson, David F., and Leslie R. Foulds. Comparison of phylogenetic trees. Mathematical biosciences 53.1-2 (1981): 131-147.
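As a point of comparison for the matching-based metrics, the rooted Robinson-Foulds distance can be sketched as the number of clades present in exactly one of the two trees. This is an illustration only, not the library's implementation: trees are hypothetical nested tuples of tip names, and the function names are invented for the example.

```python
def clades(tree):
    """Set of tip-name sets, one per internal node of a nested-tuple tree."""
    found = set()

    def visit(node):
        if isinstance(node, str):  # a tip is represented by its name
            return frozenset([node])
        tips = frozenset().union(*(visit(child) for child in node))
        found.add(tips)
        return tips

    visit(tree)
    return found


def rooted_rf(tree1, tree2):
    """Count the clades that occur in only one of the two rooted trees."""
    return len(clades(tree1) ^ clades(tree2))


t1 = (("a", "b"), ("c", "d"))
t2 = (("a", "c"), ("b", "d"))
t3 = (("b", "a"), ("d", "c"))  # same topology as t1, children reordered

different = rooted_rf(t1, t2)  # 4: {a,b}, {c,d}, {a,c}, {b,d} are unshared
identical = rooted_rf(t1, t3)  # 0: child order does not matter
```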
- unrooted()#
A tree with at least 3 children at the root.
- unrooted_deepcopy(constructor=None, parent=None)#
- write(filename, with_distances=True, format=None)#
Save the tree to filename
- Parameters:
- filename
path of the file the tree is written to
- with_distances
whether branch lengths are included in string.
- format
default is newick; xml and json are the alternatives. The value overrides the filename suffix. All attributes are saved in the xml format.
Notes
Only the cogent3 json and xml tree formats are supported.