Making Sense from Sequence

cogent3 is a Python library for the analysis of biological sequence data. We endeavour to provide a first-class experience within Jupyter notebooks, but the algorithms also support parallel execution on compute systems with thousands of processors. It can be used for…

cogent3 provides an extensive suite of capabilities for manipulating and analysing sequence data. These include reading standard biological data formats, manipulating sequences by their annotations, performing multiple sequence alignment (app docs) using any of our substitution models, phylogenetic reconstruction and tree manipulation, manipulating tabular data, visualising phylogenies (image gallery) and much more.

🎬 Data wrangling with sequence annotations

Differences in nucleotide frequencies between species are common. In such cases, non-stationary models of sequence evolution are required for robust estimation of important quantities such as branch lengths, or for measuring natural selection [1, 2] (see using non-stationary models). We have done more than just invent these new methods; we have established the most robust algorithms for their implementation [3] and their suitability for real data [4].
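To make the compositional problem concrete, here is a minimal sketch of a distance that remains consistent when base frequencies differ between two sequences. It swaps in the paralinear distance (Lake, 1994), a member of the LogDet family, defined as d = -1/4 [ln det J - 1/2 (ln det Πx + ln det Πy)], where J is the 4×4 matrix of joint base proportions and Πx, Πy are diagonal matrices of each sequence's base frequencies. This is not the non-stationary method of [1] and not cogent3's implementation; the function names `paralinear` and `determinant` are hypothetical, and only the standard library is used.

```python
import math
from collections import Counter

BASES = "ACGT"


def determinant(matrix: list[list[float]]) -> float:
    """Determinant via Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]  # work on a copy
    n = len(m)
    det = 1.0
    for col in range(n):
        # pick the row with the largest entry in this column as the pivot
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0  # singular matrix
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def paralinear(seq_x: str, seq_y: str) -> float:
    """Paralinear distance between two aligned DNA sequences (hypothetical helper).

    Sites where either sequence has a non-ACGT character are skipped.
    """
    pairs = [(a, b) for a, b in zip(seq_x, seq_y) if a in BASES and b in BASES]
    n = len(pairs)
    counts = Counter(pairs)
    # joint[i][j]: proportion of sites with base i in x and base j in y
    joint = [[counts[(a, b)] / n for b in BASES] for a in BASES]
    freq_x = [sum(row) for row in joint]  # row sums: composition of x
    freq_y = [sum(row[j] for row in joint) for j in range(4)]  # column sums: y
    det_j = determinant(joint)
    if det_j <= 0.0 or min(freq_x + freq_y) == 0.0:
        raise ValueError("paralinear distance is undefined for these sequences")
    log_det_pi = sum(map(math.log, freq_x)) + sum(map(math.log, freq_y))
    return -0.25 * (math.log(det_j) - 0.5 * log_det_pi)


print(paralinear("ACGTACGTACGTACGT", "ACGTACGTACGTACGA"))
```

Identical sequences give a distance of (numerically) zero. Because the base frequencies of each sequence enter separately through Πx and Πy, the distance does not assume both sequences share one stationary composition, which is the property the paragraph above is concerned with.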

🎬 Testing a hypothesis involving a non-stationary nucleotide process

You don’t have to be an expert in object-oriented programming languages (like Python) to use cogent3! Interactive usage in Jupyter notebooks and a functional-programming-style interface lower the barrier to entry. Individuals comfortable with R should find this interface less complex. (See the cogent3.app documentation.)
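The functional style means each processing step is a callable, and steps are chained into a pipeline. The toy sketch below illustrates that idea only: `ToyApp` and the three step functions are hypothetical and are not cogent3's app classes. It uses `+` to mean "run the left app, then feed its result to the right app".

```python
from __future__ import annotations

from typing import Any, Callable


class ToyApp:
    """Hypothetical composable step: wraps a function; `a + b` runs a, then b."""

    def __init__(self, func: Callable[[Any], Any], name: str | None = None) -> None:
        self.func = func
        self.name = name or func.__name__

    def __call__(self, data: Any) -> Any:
        return self.func(data)

    def __add__(self, other: ToyApp) -> ToyApp:
        # the composed app passes this app's output into `other`
        return ToyApp(lambda data: other(self(data)), name=f"{self.name} + {other.name}")


def strip_gaps(seq: str) -> str:
    return seq.replace("-", "")


def to_upper(seq: str) -> str:
    return seq.upper()


def gc_content(seq: str) -> float:
    # fraction of G or C bases; 0.0 for an empty sequence
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


pipeline = ToyApp(strip_gaps) + ToyApp(to_upper) + ToyApp(gc_content)
print(pipeline.name)
print(pipeline("ac-gt-gc"))  # 4 of the 6 remaining bases are G or C
```

The appeal of this design is that a user only needs to know what each step takes and returns; the chaining itself carries no extra syntax.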

🎬 Using cogent3 apps

📣 New Features & Announcements

🆕 The release of piqtree 🎉

The piqtree project has made a major release. It now supports parallel execution for some functions.

🆕 Cogent3 and Plotly blog post 😎

A demo of the combined power of cogent3 and Plotly, applied to the analysis of SARS-CoV-2 genomes.

🆕 New core data types improve efficiency and flexibility

The cogent3 development team 👾 have been hard at work modernising the core internals 💪🛠.

The grand rewrite of the alignment classes is ready for use! The new approach unifies the best features of the old classes and gives us the foundation for major performance improvements in the future (see the next announcement). Try them out by setting new_type=True in the top-level functions make_aligned_seqs() and load_aligned_seqs(). Note that the new types are not fully integrated into the existing code, and their API can differ from that of the old-style classes.

Please try them out and give us feedback!

🆕 Faster pairwise genetic distance calculations 🚀

We have completely rewritten a subset of the genetic distance calculators. These are now available only via the distance_matrix() method of the new-type Alignment class. They are waaay faster 💨 on a single CPU, plus they support parallel execution.
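For readers unfamiliar with what a pairwise distance calculator computes, here is a minimal standard-library sketch using the Jukes-Cantor (JC69) correction, d = -3/4 ln(1 - 4p/3), where p is the proportion of differing sites. It is an illustration only, not cogent3's `distance_matrix()` implementation; the functions `p_distance`, `jc69` and `distance_matrix` here are hypothetical, and gap handling (dropping any site with a gap) is an assumption.

```python
import math
from itertools import combinations


def p_distance(seq_x: str, seq_y: str) -> float:
    """Proportion of differing sites, ignoring sites with a gap in either sequence."""
    pairs = [(a, b) for a, b in zip(seq_x, seq_y) if a != "-" and b != "-"]
    return sum(a != b for a, b in pairs) / len(pairs)


def jc69(seq_x: str, seq_y: str) -> float:
    """Jukes-Cantor corrected distance between two aligned DNA sequences."""
    p = p_distance(seq_x, seq_y)
    arg = 1.0 - 4.0 * p / 3.0
    if arg <= 0.0:
        raise ValueError("sequences too divergent for the JC69 correction")
    return -0.75 * math.log(arg)


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """All pairwise JC69 distances, stored symmetrically with a zero diagonal."""
    dists = {(name, name): 0.0 for name in seqs}
    # every unordered pair is computed independently of the others,
    # so the pairs could be distributed across worker processes
    for a, b in combinations(seqs, 2):
        dists[(a, b)] = dists[(b, a)] = jc69(seqs[a], seqs[b])
    return dists


seqs = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAA", "s3": "ACGAACCTAC"}
for pair, dist in distance_matrix(seqs).items():
    print(pair, round(dist, 4))
```

The number of pairs grows quadratically with the number of sequences, and because each pair is independent, this kind of calculation is a natural candidate for the parallel execution mentioned above.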

🆕 A new tutorial on using non-stationary amino acid models 🧐

A new contribution from Peter Goodman and Andrew Wheeler demonstrates how to specify a non-stationary amino acid substitution model. Check out their tutorial and the original paper. Thanks Peter, Andrew and their colleagues!

🆕 Faster sequence coevolution measures 🚀

We have completely rewritten all the mutual-information-based coevolution statistic calculators. Single-CPU performance is orders of magnitude faster than the old implementation, and we now also support parallel execution. The existing <alignment>.coevolution() method uses these, so you don’t need to do anything differently to use the new algorithms.
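As a reminder of what a mutual information (MI) coevolution statistic measures, the sketch below computes MI in bits between two alignment columns: MI = Σ p(a,b) log2[p(a,b) / (p(a) p(b))]. It is a plain standard-library illustration, not cogent3's `coevolution()` code; the function name and the choice to skip rows with a gap in either column are assumptions.

```python
import math
from collections import Counter


def mutual_information(col_x: str, col_y: str) -> float:
    """MI (bits) between two alignment columns, each given as one residue per sequence."""
    pairs = [(a, b) for a, b in zip(col_x, col_y) if a != "-" and b != "-"]
    n = len(pairs)
    joint = Counter(pairs)              # counts of residue pairs
    px = Counter(a for a, _ in pairs)   # marginal counts for column x
    py = Counter(b for _, b in pairs)   # marginal counts for column y
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


# rows are sequences; zip(*aln) turns them into columns
aln = ["ACGA", "ACTC", "GCGA", "GCTC"]
columns = ["".join(col) for col in zip(*aln)]
print(mutual_information(columns[2], columns[3]))  # columns covary perfectly
print(mutual_information(columns[0], columns[2]))  # columns are independent
```

Calculating this statistic for every pair of columns is quadratic in alignment length, which is why the speed of the underlying implementation matters so much for real alignments.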

🆕 Supporting third-party apps as plugins 🔌

Third-party developers can deploy their code as cogent3 apps with just a few lines of code. See the app demo project for an example of how easy it is to share your cogent3 apps.

Please post any questions you have about writing apps or sharing them on cogent3 discussions.


Citations

[1]

Benjamin D Kaehler, Von Bing Yap, Rongli Zhang, and Gavin A Huttley. Genetic distance for a general non-stationary Markov substitution process. Systematic Biology, 64:281–293, 2015. URL: https://www.ncbi.nlm.nih.gov/pubmed/25503772.

[2]

Benjamin D Kaehler, Von Bing Yap, and Gavin A Huttley. Standard codon substitution models overestimate purifying selection for non-stationary data. Genome Biology and Evolution, 9:134–149, 2017. URL: https://www.ncbi.nlm.nih.gov/pubmed/28175284.

[3]

Harold W Schranz, Von Bing Yap, Simon Easteal, Rob Knight, and Gavin A Huttley. Pathological rate matrices: from primates to pathogens. BMC Bioinformatics, 9:550, 2008. URL: https://www.ncbi.nlm.nih.gov/pubmed/19099591.

[4]

Klara L Verbyla, Von Bing Yap, Anuj Pahwa, Yunli Shao, and Gavin A Huttley. The embedding problem for Markov models of nucleotide substitution. PLoS ONE, 8:e69187, 2013. URL: https://pubmed.ncbi.nlm.nih.gov/23935949/.