Calculate pairwise distances between sequences
Section author: Gavin Huttley
An example of how to calculate the pairwise distances for a set of sequences.
from cogent3 import load_aligned_seqs
from cogent3.evolve import distance
Import a substitution model (or create your own).
from cogent3.evolve.models import HKY85
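HKY85 describes nucleotide substitution with base frequencies π and a transition/transversion rate ratio κ. As a rough illustration of what that model encodes, here is a minimal, cogent3-independent sketch that builds the HKY85 instantaneous rate matrix in plain Python. The `TCAG` ordering, the function name `hky85_rate_matrix` and the rescaling to one expected substitution per unit time are assumptions of this sketch, not a description of cogent3's internals.

```python
# Row/column order of Q (an assumption for this sketch).
NUCS = "TCAG"
TRANSITIONS = {("T", "C"), ("C", "T"), ("A", "G"), ("G", "A")}


def hky85_rate_matrix(pi, kappa):
    """Return a hypothetical HKY85 rate matrix Q as a list of lists.

    pi: dict mapping each nucleotide to its equilibrium frequency.
    kappa: transition/transversion rate ratio.
    Off-diagonal Q[i][j] = kappa * pi[j] for transitions and pi[j] for
    transversions; each diagonal entry makes its row sum to zero.
    """
    q = [[0.0] * 4 for _ in NUCS]
    for a, i in enumerate(NUCS):
        for b, j in enumerate(NUCS):
            if a != b:
                q[a][b] = (kappa if (i, j) in TRANSITIONS else 1.0) * pi[j]
        q[a][a] = -sum(q[a])  # diagonal is still 0.0 when summed
    # Rescale so the expected substitution rate at equilibrium is 1.
    rate = -sum(pi[i] * q[a][a] for a, i in enumerate(NUCS))
    return [[x / rate for x in row] for row in q]
```

For example, `hky85_rate_matrix({"T": 0.25, "C": 0.25, "A": 0.25, "G": 0.25}, 2.0)` gives a T→C rate twice the T→A rate. With equal frequencies HKY85 reduces to the Kimura two-parameter model.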
Load the alignment.
al = load_aligned_seqs("data/long_testseqs.fasta")
Create a pairwise distances object with your alignment and substitution model and run it.
d = distance.EstimateDistances(al, submodel=HKY85())
d.run(show_progress=False)
d.get_pairwise_distances()
names | DogFaced | HowlerMon | Human | Mouse | NineBande |
---|---|---|---|---|---|
DogFaced | 0.0000 | 0.2078 | 0.1972 | 0.4022 | 0.2019 |
HowlerMon | 0.2078 | 0.0000 | 0.0730 | 0.3487 | 0.1865 |
Human | 0.1972 | 0.0730 | 0.0000 | 0.3363 | 0.1804 |
Mouse | 0.4022 | 0.3487 | 0.3363 | 0.0000 | 0.3813 |
NineBande | 0.2019 | 0.1865 | 0.1804 | 0.3813 | 0.0000 |
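Each cell of the matrix is an estimate of the expected number of substitutions per site separating two sequences, and the matrix is symmetric with a zero diagonal. cogent3 obtains these values by fitting HKY85 numerically. To show the pairwise structure without cogent3, the sketch below swaps in the closed-form Kimura (1980) two-parameter distance, d = -½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q), where P and Q are the observed proportions of transitions and transversions. The function names and the pairwise deletion of gaps are assumptions for illustration; the numbers will not match the HKY85 table above.

```python
import math
from itertools import combinations

PURINES = set("AG")
PYRIMIDINES = set("CT")


def k80_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA strings."""
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # drop gaps/ambiguity codes for this pair only
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    # math.log raises ValueError if the sequences are saturated.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def pairwise_distances(seqs):
    """Return {(name1, name2): distance} for every unordered pair."""
    return {
        (n1, n2): k80_distance(seqs[n1], seqs[n2])
        for n1, n2 in combinations(sorted(seqs), 2)
    }
```

Only the n(n − 1)/2 unordered pairs are computed; the full matrix follows from symmetry.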
Note that the pairwise distance calculations can be distributed across multiple CPUs. In that case, when statistics (such as distances) are requested, only the master CPU returns data.
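The scatter/gather pattern behind that behaviour can be simulated serially: each worker ("rank") computes distances for its own share of the pairs, and only rank 0 assembles the full result. The sketch below is a hypothetical illustration of that idea, with made-up function names and a toy distance function; it does not show cogent3's actual parallel machinery.

```python
from itertools import combinations


def partition_pairs(names, n_workers):
    """Round-robin the unordered name pairs across n_workers ranks."""
    chunks = [[] for _ in range(n_workers)]
    for i, pair in enumerate(combinations(sorted(names), 2)):
        chunks[i % n_workers].append(pair)
    return chunks


def run_rank(rank, chunks, distance_fn):
    """Work done by one rank: compute distances for its own pairs only."""
    return {pair: distance_fn(*pair) for pair in chunks[rank]}


def gather_on_master(partials, rank):
    """Rank 0 (the master) merges all partial results; other ranks get None."""
    if rank != 0:
        return None
    merged = {}
    for part in partials:
        merged.update(part)
    return merged
```

With four names and two workers, each rank handles three of the six pairs, `gather_on_master(partials, 0)` returns all six distances and every other rank gets `None`, mirroring the note above.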
We’ll write the distances as a PHYLIP-formatted distance matrix.
d.write("dists_for_phylo.phylip", format="phylip")
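A square PHYLIP distance matrix starts with the number of taxa, followed by one row per taxon: the name in a 10-character field, then its distances to every taxon. The hypothetical `to_phylip` helper below sketches that layout in plain Python so the file contents are easier to picture; cogent3's own writer may differ in spacing, precision and name handling. Note that truncating names to 10 characters can make distinct names collide.

```python
def to_phylip(names, dists, precision=4):
    """Format symmetric pairwise distances as a square PHYLIP matrix string.

    dists maps (name1, name2) to a distance; either key order is accepted.
    Names are padded or truncated to 10 characters, as in strict PHYLIP.
    """
    lines = [str(len(names))]
    for n1 in names:
        row = []
        for n2 in names:
            if n1 == n2:
                d = 0.0
            elif (n1, n2) in dists:
                d = dists[(n1, n2)]
            else:
                d = dists[(n2, n1)]
            row.append(f"{d:.{precision}f}")
        lines.append(f"{n1[:10]:<10} " + " ".join(row))
    return "\n".join(lines) + "\n"
```

Using three values from the table above, `to_phylip(["HowlerMon", "Human", "Mouse"], {("HowlerMon", "Human"): 0.0730, ("HowlerMon", "Mouse"): 0.3487, ("Human", "Mouse"): 0.3363})` yields a header line `3` followed by three rows of four fields each.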
We’ll also save the distances to a file in Python’s pickle format.
import pickle

with open("dists_for_phylo.pickle", "wb") as f:
    pickle.dump(d.get_pairwise_distances(), f)
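Reading the file back uses `pickle.load` on a file opened in binary mode. The sketch below shows a write/read round trip with a plain dictionary standing in for the object returned by `get_pairwise_distances()`; the stand-in data and the `pickle_round_trip` helper are assumptions for illustration. Unpickling the real object requires cogent3 to be importable, because pickle stores a reference to the object's class rather than the class itself, and pickle files should only be loaded from trusted sources.

```python
import os
import pickle
import tempfile


def pickle_round_trip(obj):
    """Write obj to a temporary pickle file, then read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "dists_for_phylo.pickle")
        with open(path, "wb") as f:
            pickle.dump(obj, f)
        with open(path, "rb") as f:
            return pickle.load(f)
```

For example, `pickle_round_trip({("Human", "Mouse"): 0.3363})` returns an equal but distinct dictionary.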