Calculate pairwise distances between sequences

Section author: Gavin Huttley

Note

These docs now use the new_type core objects via the following setting.

import os

# using new types without requiring an explicit argument
os.environ["COGENT3_NEW_TYPE"] = "1"

This example shows how to calculate the pairwise distances for a set of sequences.

from cogent3 import load_aligned_seqs
from cogent3.evolve import distance

Import a substitution model (or create your own)

from cogent3.evolve.models import HKY85

Load the alignment

al = load_aligned_seqs("data/long_testseqs.fasta", moltype="dna")

Create a pairwise distances object from the alignment and the substitution model, then run it.

d = distance.EstimateDistances(al, submodel=HKY85())
d.run(show_progress=False)
d.get_pairwise_distances()
names        DogFaced    HowlerMon     Human     Mouse    NineBande
DogFaced       0.0000       0.2078    0.1972    0.4022       0.2019
HowlerMon      0.2078       0.0000    0.0730    0.3487       0.1865
Human          0.1972       0.0730    0.0000    0.3363       0.1804
Mouse          0.4022       0.3487    0.3363    0.0000       0.3813
NineBande      0.2019       0.1865    0.1804    0.3813       0.0000
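To make clear what each entry in such a matrix represents, here is a minimal pure-Python sketch that computes one distance per unordered pair of sequences. It swaps in the much simpler Jukes-Cantor (JC69) correction, d = -3/4 ln(1 - 4p/3), rather than the HKY85 model used above, and it does not use cogent3 at all. The function names and the toy alignment are hypothetical, chosen only for illustration.

```python
import math
from itertools import combinations


def p_distance(seq1, seq2):
    """Proportion of differing sites, skipping positions that are not A, C, G or T."""
    valid = set("ACGT")
    compared = differing = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in valid and b in valid:
            compared += 1
            differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def jc69_distance(seq1, seq2):
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the JC69 correction")
    return -0.75 * math.log(1 - 4 * p / 3)


# hypothetical toy alignment, not the data used above
toy = {
    "a": "ACGTACGTAC",
    "b": "ACGTACGTAT",
    "c": "ACGAACGTTT",
}

# one distance per unordered pair, keyed by the pair of names
dists = {
    (n1, n2): jc69_distance(toy[n1], toy[n2])
    for n1, n2 in combinations(toy, 2)
}
for pair, d in dists.items():
    print(pair, round(d, 4))
```

Each pair is estimated independently of the others, which is why the full matrix is symmetric with zeros on the diagonal.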

Note that the computation of pairwise distances can be distributed across multiple CPUs. In this case, when statistics (like distances) are requested, only the master CPU returns data.

We’ll write a PHYLIP-formatted distance matrix.

d.write("dists_for_phylo.phylip", format="phylip")
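For readers unfamiliar with the format, the sketch below builds a square PHYLIP-style distance matrix as text: a first line with the number of taxa, then one line per taxon with its name followed by its distances. It is an illustration only. The `format_phylip` helper is hypothetical, and the exact spacing and precision written by `d.write` may differ. The values are a subset of the distances shown above.

```python
def format_phylip(names, rows):
    """Return a square distance matrix as PHYLIP-style text."""
    if len(names) != len(rows) or any(len(r) != len(names) for r in rows):
        raise ValueError("matrix must be square and match the names")
    # first line: the number of taxa
    lines = [f"{len(names):>5}"]
    for name, row in zip(names, rows):
        # classic PHYLIP pads or truncates names to 10 characters
        label = name[:10].ljust(10)
        values = "  ".join(f"{v:.4f}" for v in row)
        lines.append(f"{label}{values}")
    return "\n".join(lines) + "\n"


names = ["Human", "HowlerMon", "Mouse"]
rows = [
    [0.0000, 0.0730, 0.3363],
    [0.0730, 0.0000, 0.3487],
    [0.3363, 0.3487, 0.0000],
]
text = format_phylip(names, rows)
print(text)
```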

We’ll also save the distances to file in Python’s pickle format.

import pickle

with open("dists_for_phylo.pickle", "wb") as f:
    pickle.dump(d.get_pairwise_distances(), f)
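Reading a pickle back follows the same pattern with the file opened in binary read mode. The sketch below is self-contained and uses a plain dictionary as a stand-in for the real distances (an assumption made so the example runs without cogent3), written to a temporary directory rather than the working directory.

```python
import pickle
import tempfile
from pathlib import Path

# stand-in for the real distances, values copied from the output above
dists = {("Human", "HowlerMon"): 0.0730, ("Human", "Mouse"): 0.3363}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "dists_for_phylo.pickle"
    with open(path, "wb") as f:
        pickle.dump(dists, f)

    # read in binary mode, mirroring the "wb" used for writing
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == dists)
```

Two general properties of pickle are worth keeping in mind: loading an object requires the class it was created from to be importable in the reading environment, and pickle files should only be loaded from trusted sources.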