Calculate pairwise distances between sequences

Section author: Gavin Huttley

This example shows how to calculate the pairwise distances for a set of aligned sequences.

from cogent3 import load_aligned_seqs
from cogent3.evolve import distance

Import a substitution model (or create your own)

from cogent3.evolve.models import HKY85

Load the alignment.

al = load_aligned_seqs("data/long_testseqs.fasta")

Create a pairwise distances object from the alignment and the substitution model, then run it.

d = distance.EstimateDistances(al, submodel=HKY85())
d.run(show_progress=False)
d.get_pairwise_distances()
names        DogFaced    HowlerMon    Human     Mouse     NineBande
-------------------------------------------------------------------
DogFaced       0.0000       0.2078   0.1972    0.4022        0.2019
HowlerMon      0.2078       0.0000   0.0730    0.3487        0.1865
Human          0.1972       0.0730   0.0000    0.3363        0.1804
Mouse          0.4022       0.3487   0.3363    0.0000        0.3813
NineBande      0.2019       0.1865   0.1804    0.3813        0.0000
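
As a quick sanity check on output like this, the sketch below copies the values from the matrix above into plain Python structures, confirms that the matrix is symmetric, and finds the closest and most distant pairs. It deliberately avoids the cogent3 API: the names `rows`, `dists`, `closest` and `farthest` are illustrative only and are not part of the library.

```python
from itertools import combinations

# Values copied by hand from the distance matrix shown above.
names = ["DogFaced", "HowlerMon", "Human", "Mouse", "NineBande"]
rows = [
    [0.0000, 0.2078, 0.1972, 0.4022, 0.2019],
    [0.2078, 0.0000, 0.0730, 0.3487, 0.1865],
    [0.1972, 0.0730, 0.0000, 0.3363, 0.1804],
    [0.4022, 0.3487, 0.3363, 0.0000, 0.3813],
    [0.2019, 0.1865, 0.1804, 0.3813, 0.0000],
]

# Map every ordered pair of names to its distance.
dists = {
    (a, b): rows[i][j]
    for i, a in enumerate(names)
    for j, b in enumerate(names)
}

# A distance matrix should be symmetric: d(a, b) == d(b, a).
is_symmetric = all(dists[a, b] == dists[b, a] for a, b in combinations(names, 2))

# Closest and most distant pairs among the unordered pairs.
closest = min(combinations(names, 2), key=lambda pair: dists[pair])
farthest = max(combinations(names, 2), key=lambda pair: dists[pair])

print("symmetric:", is_symmetric)
print("closest:", closest, dists[closest])
print("farthest:", farthest, dists[farthest])
```

For these values the closest pair is Human and HowlerMon, and the most distant pair is DogFaced and Mouse.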

Note that pairwise distance calculations can be distributed across multiple CPUs. In that case, when statistics (such as distances) are requested, only the master CPU returns data.

We’ll write the distances to file as a PHYLIP-formatted distance matrix.

d.write("dists_for_phylo.phylip", format="phylip")
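
For readers unfamiliar with the format, the following is a minimal sketch of the general layout of a square PHYLIP distance matrix: a first line giving the number of taxa, then one line per taxon with its name followed by its distances. The helper `to_phylip` is hypothetical and written here for illustration; it is not a cogent3 function, and the exact spacing and precision written by `d.write` may differ.

```python
def to_phylip(names, rows):
    """Return a square distance matrix as PHYLIP-style text.

    Hypothetical helper for illustration only, not part of cogent3.
    Names are truncated or padded to 10 characters, following the
    classic PHYLIP convention.
    """
    lines = [f"{len(names):>4}"]
    for name, row in zip(names, rows):
        label = name[:10].ljust(10)
        values = " ".join(f"{value:.4f}" for value in row)
        lines.append(f"{label} {values}")
    return "\n".join(lines) + "\n"


# Two taxa taken from the matrix above.
example_names = ["HowlerMon", "Human"]
example_rows = [
    [0.0000, 0.0730],
    [0.0730, 0.0000],
]

text = to_phylip(example_names, example_rows)
print(text)
```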

We’ll also save the distances to file in Python’s pickle format.

import pickle

with open("dists_for_phylo.pickle", "wb") as f:
    pickle.dump(d.get_pairwise_distances(), f)
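
Reading the distances back is the mirror image of saving them. The sketch below shows the round trip using only the standard library, with a plain dictionary (`stand_in`, a made-up placeholder) in place of the object returned by `d.get_pairwise_distances()`, and a temporary directory so that it does not touch the file written above. To unpickle the real file, cogent3 must be importable in the Python session, and pickle files should only be loaded from trusted sources.

```python
import os
import pickle
import tempfile

# Placeholder standing in for the real distances object (an assumption
# made so that this sketch runs without cogent3 or the data file).
stand_in = {("Human", "HowlerMon"): 0.0730, ("Human", "Mouse"): 0.3363}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dists_for_phylo.pickle")

    # Save, as in the example above.
    with open(path, "wb") as f:
        pickle.dump(stand_in, f)

    # Load: open in binary read mode and call pickle.load.
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == stand_in)
```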