Counting gaps per sequence

There are several ways of counting gaps per sequence, and of visualising the results. By default, the count_gaps_per_seq() method returns a matrix of counts with no plot attached. Setting the argument unique=True restricts the counts to gaps uniquely induced by each sequence, which can be a useful indicator of highly divergent sequences.

from cogent3 import load_aligned_seqs

aln = load_aligned_seqs("data/brca1.fasta", moltype="dna")

counts = aln.count_gaps_per_seq(unique=True)
counts[10:20]  # limiting the width of the displayed output
Pangolin  Cat  Dog  Llama  Pig  Cow  Hippo  SpermWhale  HumpbackW  Mole
       0    0    0      0    0    3      0           0          0     0
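
To make the idea of a uniquely induced gap more concrete, here is a minimal pure-Python sketch. It assumes that a gap is "uniquely induced" by a sequence when that sequence is the only one with a non-gap character in an alignment column, so that it forces a gap in every other sequence. The function name unique_gap_counts and the toy alignment are hypothetical and for illustration only; this is not the library's implementation, and it treats only "-" as a gap.

```python
def unique_gap_counts(aligned):
    """Count, per sequence, the columns where it alone has a non-gap character.

    aligned: dict mapping sequence name -> aligned string (all equal length).
    Assumption: only "-" is treated as a gap character.
    """
    names = list(aligned)
    counts = {name: 0 for name in names}
    # zip(*values) walks the alignment one column at a time
    for column in zip(*aligned.values()):
        non_gap = [name for name, char in zip(names, column) if char != "-"]
        if len(non_gap) == 1:
            # exactly one sequence has a residue here, so it induces
            # the gap seen in all the other sequences
            counts[non_gap[0]] += 1
    return counts


# hypothetical toy alignment: seq1 carries a 2-column insertion
toy = {
    "seq1": "ACGT--AC",
    "seq2": "AC--TTAC",
    "seq3": "AC--TTAC",
}
result = unique_gap_counts(toy)
print(result)
```

In the toy data, columns 3 and 4 contain a residue only in seq1, so seq1 scores 2. Columns 5 and 6 are gapped in seq1 only, but the residues there are shared by seq2 and seq3, so under this assumption they are not uniquely induced by either.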

Plotting counts of unique gaps

Passing the drawable argument gives the returned object a drawable attribute (of type Drawable, which has show() and write() methods) for the corresponding plot type. The three supported plot types are shown below. In all cases, hovering the mouse pointer over a data point shows the sequence name and its number of unique gaps.

Displaying unique gaps as a bar chart

counts = aln.count_gaps_per_seq(unique=True, drawable="bar")
counts.drawable.show(width=500)

Displaying unique gaps as a violin plot

counts = aln.count_gaps_per_seq(unique=True, drawable="violin")
counts.drawable.show(width=300, height=500)

Displaying unique gaps as a box plot

counts = aln.count_gaps_per_seq(unique=True, drawable="box")
counts.drawable.show(width=300, height=500)