Counting gaps per sequence

There are several ways to count sequence gaps and to visualise the results. By default, the count_gaps_per_seq() method returns a matrix of counts with no plot attached. Setting the argument unique=True restricts the counts to gaps uniquely induced by each sequence, which can be a useful indicator of highly divergent sequences.

from cogent3 import load_aligned_seqs

aln = load_aligned_seqs("data/brca1.fasta", moltype="dna")

counts = aln.count_gaps_per_seq(unique=True)
counts[10:20]  # limiting the width of the displayed output

Pangolin	Cat	Dog	Llama	Pig	Cow	Hippo	SpermWhale	HumpbackW	Mole
0	0	0	0	0	3	0	0	0	0
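To make the idea of a "unique" gap concrete, here is a minimal, library-independent sketch. It assumes that a unique gap is an alignment column in which exactly one sequence carries a gap character; that is an illustrative reading of unique=True, not a statement of how cogent3 implements it. The function name count_unique_gaps and the toy alignment are hypothetical.

```python
GAP = "-"  # assumed gap character for this sketch


def count_unique_gaps(aligned):
    """Count, per sequence, the columns in which it alone has a gap.

    `aligned` maps sequence names to equal-length aligned strings.
    """
    names = list(aligned)
    length = len(aligned[names[0]])
    if any(len(seq) != length for seq in aligned.values()):
        raise ValueError("sequences must all have the same aligned length")

    counts = dict.fromkeys(names, 0)
    # Walk the alignment one column at a time.
    for column in zip(*(aligned[name] for name in names)):
        gapped = [name for name, char in zip(names, column) if char == GAP]
        # Only columns gapped in exactly one sequence count as unique.
        if len(gapped) == 1:
            counts[gapped[0]] += 1
    return counts


# Hypothetical toy alignment, not taken from data/brca1.fasta.
toy = {
    "seq1": "ACG-TA-C",
    "seq2": "ACGGTA-C",
    "seq3": "A-GG-AAC",
    "seq4": "ACGGT--C",
}
unique_gaps = count_unique_gaps(toy)
print(unique_gaps)
```

In this toy case the column gapped in three sequences contributes to none of the counts, while seq3 ends up with the largest count because two of its gaps are shared with no other sequence.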

Plotting counts of unique gaps

Passing the drawable argument gives the returned object a drawable attribute of type Drawable, which has show() and write() methods, for the corresponding plot type. The three supported plot types are shown below. In all cases, hovering the mouse pointer over a data point shows the number of unique gaps and the sequence name.

Displaying unique gaps as a bar chart

counts = aln.count_gaps_per_seq(unique=True, drawable="bar")
counts.drawable.show(width=500)

Displaying unique gaps as a violin plot

counts = aln.count_gaps_per_seq(unique=True, drawable="violin")
counts.drawable.show(width=300, height=500)

Displaying unique gaps as a box plot

counts = aln.count_gaps_per_seq(unique=True, drawable="box")
counts.drawable.show(width=300, height=500)
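As a rough illustration of what a box plot summarises, the sketch below computes quartiles for the ten counts displayed earlier in this section and flags any sequence above the conventional upper fence (third quartile plus 1.5 times the interquartile range). It uses only the Python standard library. The quartile method and the fence rule are assumptions made for this example, and the plotting backend may compute its quartiles differently.

```python
from statistics import quantiles

# Counts copied from the ten-sequence slice displayed earlier in this section.
unique_gaps = {
    "Pangolin": 0, "Cat": 0, "Dog": 0, "Llama": 0, "Pig": 0,
    "Cow": 3, "Hippo": 0, "SpermWhale": 0, "HumpbackW": 0, "Mole": 0,
}

values = sorted(unique_gaps.values())
# Quartiles by linear interpolation over the observed range (an assumption).
q1, median, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1
upper_fence = q3 + 1.5 * iqr  # conventional outlier threshold for box plots

# Sequences whose unique gap count lies above the upper fence.
flagged = [name for name, n in unique_gaps.items() if n > upper_fence]
print(q1, median, q3, flagged)
```

With this small slice every quartile is zero, so the single sequence with a non-zero count is the only one flagged.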