Making Sense from Sequence
cogent3 is a Python library for the analysis of biological sequence data. We endeavour to provide a first-class experience within Jupyter notebooks, but the algorithms also support parallel execution on compute systems with thousands of processors.
Check out the other tabs on this page for installation instructions and highlights of what you can do with cogent3. See the links at the top of the page for an image gallery and detailed user guides.
For most uses, we recommend installing with the "extra" dependencies, as these add support for visualisation and Jupyter notebooks.
pip install "cogent3[extra]"
For users on HPC systems, we recommend the vanilla installation.
pip install cogent3
cogent3 provides an extensive suite of capabilities for manipulating and analysing sequence data. These include reading standard biological data formats, manipulating sequences by their annotations, performing multiple sequence alignment (app docs) using any of our substitution models, phylogenetic reconstruction and tree manipulation, manipulating tabular data, visualising phylogenies (image gallery) and much more.
🎬 Data wrangling with sequence annotations
Differences in the frequency of nucleotides between species are common. In such cases, non-reversible models of sequence evolution are required for robust estimation of important quantities such as branch lengths, or for measuring natural selection [1, 2] (see using non-stationary models). We have done more than just invent these new methods; we have established the most robust algorithms for their implementation [3] and their suitability for real data [4].
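Before reaching for a non-stationary model, it can help to check whether the sequences really differ in composition. The plain-Python sketch below computes the standard contingency-table chi-square statistic for the hypothesis that all sequences share the same nucleotide frequencies. It only illustrates the idea and is not cogent3's implementation; the helper name `composition_chisq` is hypothetical, and it ignores gaps and ambiguity codes.

```python
from collections import Counter
from typing import Dict, Tuple

NUCLEOTIDES = "ACGT"


def composition_chisq(seqs: Dict[str, str]) -> Tuple[float, int]:
    """Chi-square statistic and degrees of freedom for the null hypothesis
    that all sequences share the same nucleotide composition.

    Hypothetical helper: gaps and ambiguity codes are ignored, as are
    sequences or nucleotides with no observations.
    """
    # observed counts: one row per sequence, one column per nucleotide
    rows = []
    for seq in seqs.values():
        counts = Counter(seq.upper())
        row = [counts[n] for n in NUCLEOTIDES]
        if sum(row) > 0:
            rows.append(row)
    if len(rows) < 2:
        raise ValueError("need at least two non-empty sequences")

    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)

    # sum (observed - expected)^2 / expected over every cell
    stat = 0.0
    for row, row_total in zip(rows, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected

    # only nucleotides that were actually observed contribute a column
    n_cols = sum(1 for total in col_totals if total > 0)
    df = (len(rows) - 1) * (n_cols - 1)
    return stat, df


same = {"human": "AAAACCCCGGGGTTTT", "mouse": "AAAACCCCGGGGTTTT"}
skewed = {"human": "AAAA", "mouse": "TTTT"}
print(composition_chisq(same))    # (0.0, 3)
print(composition_chisq(skewed))  # (8.0, 1)
```

A large statistic relative to its degrees of freedom suggests compositional heterogeneity, which is the situation where a stationary model can bias estimates.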
🎬 Testing a hypothesis involving a non-stationary nucleotide process
You don't have to be an expert in a programming language like Python to use cogent3! Interactive usage in Jupyter notebooks and a functional programming style interface lower the barrier to entry. Individuals comfortable with R should find this interface less complex. (See the cogent3.app documentation.)
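In a functional style, each step is an "app": a callable that takes one input and returns one output, and apps can be chained into a pipeline. The plain-Python sketch below illustrates only that composition idea using a hypothetical `App` class; it is not cogent3's code. cogent3's own apps, obtained with `cogent3.get_app`, are described in the cogent3.app documentation.

```python
from typing import Any, Callable, Optional


class App:
    """Hypothetical minimal 'app': wraps a function of one argument."""

    def __init__(self, func: Callable[[Any], Any], name: Optional[str] = None):
        self.func = func
        self.name = name or func.__name__

    def __call__(self, data: Any) -> Any:
        return self.func(data)

    def __add__(self, other: "App") -> "App":
        # composition: the output of self becomes the input of other
        return App(lambda data: other(self(data)), name=f"{self.name} + {other.name}")


def to_upper(seq: str) -> str:
    return seq.upper()


def degap(seq: str) -> str:
    return seq.replace("-", "")


def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


# build the pipeline once, then apply it to any number of inputs
pipeline = App(to_upper) + App(degap) + App(gc_fraction)
print(pipeline.name)      # to_upper + degap + gc_fraction
print(pipeline("acg-t"))  # 0.5
```

Because each step has a single input and output, a user only has to decide the order of steps, not how data flows between them.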
🎬 Using cogent3 apps
🆕 Features & 📣 Announcements
📣 Migration to new type core objects ‼️
The first release after July 1st 2025 will remove all of the old type classes! We are changing the migration strategy from old type to new type cogent3 core classes. While this is a major change, we have been using these ourselves consistently and feel confident that the disruption to users should be small. We strongly advise all users to migrate now and report any errors. To do this, add the following statements to the top of your scripts or notebooks:
import os
# enable the new type core classes; put this at the top, before importing cogent3
os.environ["COGENT3_NEW_TYPE"] = "1"
🆕 Cogent3 implements plugin hooks 🔌🪝🎉
We have implemented the infrastructure to support hook-style plugins. So far we have defined a single hook: the new type Alignment.quick_tree() method checks for an external plugin to perform the calculation. piqtree 0.5.0 implements support for this.
🆕 Cogent3 supports plugins for reading, writing, storing sequence data 🔌🎉
Who doesn't love the myriad of file formats for biological sequences!! Or that sequence collections can now have millions of records!? We now support third-party plugins for reading and writing sequences. We also support alternative storage backends for our sequence collection classes. For example, the cogent3-h5seqs project uses HDF5 plus compression for efficient storage of large volumes of sequences. See the docs for an example of how to use third-party storage.
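Format plugins usually work through a registry that maps a format name to a parser, which third parties can extend without modifying the host library. The plain-Python sketch below illustrates that idea with a hypothetical `register_reader` decorator and a minimal FASTA parser. It is not cogent3's plugin API; the registry, decorator and function names are assumptions for illustration.

```python
from typing import Callable, Dict, Iterator, Tuple

Parser = Callable[[str], Iterator[Tuple[str, str]]]

# hypothetical registry: format name -> parser yielding (name, sequence) pairs
READERS: Dict[str, Parser] = {}


def register_reader(fmt: str) -> Callable[[Parser], Parser]:
    """Decorator that registers the decorated function as the reader for `fmt`."""
    def decorator(func: Parser) -> Parser:
        READERS[fmt.lower()] = func
        return func
    return decorator


@register_reader("fasta")
def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # the sequence name is the first word of the header line
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def read_seqs(text: str, fmt: str) -> Dict[str, str]:
    """Dispatch to whichever reader is registered for `fmt`."""
    try:
        parser = READERS[fmt.lower()]
    except KeyError:
        raise ValueError(f"no reader registered for format {fmt!r}") from None
    return dict(parser(text))


print(read_seqs(">seq1 human\nACG\nT\n>seq2\nTT\n", "FASTA"))
# {'seq1': 'ACGT', 'seq2': 'TT'}
```

Supporting a new format then only requires registering another parser under a new name; the calling code in `read_seqs` does not change.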
🆕 Cogent3 and Plotly blog post 😎
A demo of the combined power of cogent3 and Plotly applied to the analysis of SARS-CoV-2 genomes.
Citations
[1] Benjamin D Kaehler, Von Bing Yap, Rongli Zhang, and Gavin A Huttley. Genetic distance for a general non-stationary Markov substitution process. Systematic Biology, 64:281–293, 2015. URL: https://www.ncbi.nlm.nih.gov/pubmed/25503772.
[2] Benjamin D Kaehler, Von Bing Yap, and Gavin A Huttley. Standard codon substitution models overestimate purifying selection for non-stationary data. Genome Biology and Evolution, 9:134–149, 2017. URL: https://www.ncbi.nlm.nih.gov/pubmed/28175284.
[3] Harold W Schranz, Von Bing Yap, Simon Easteal, Rob Knight, and Gavin A Huttley. Pathological rate matrices: from primates to pathogens. BMC Bioinformatics, 9:550, 2008. URL: https://www.ncbi.nlm.nih.gov/pubmed/19099591.
[4] Klara L Verbyla, Von Bing Yap, Anuj Pahwa, Yunli Shao, and Gavin A Huttley. The embedding problem for Markov models of nucleotide substitution. PLoS ONE, 8:e69187, 2013. URL: https://pubmed.ncbi.nlm.nih.gov/23935949/.