Sequence logos

Sequence logos display sequence information. They are widely used to display transcription factor binding sites (TFBS), but they can also be applied to sequence alignments more generally.

Drawing a logo for a TFBS

We use the TFBS for the TATA box binding protein.

from cogent3 import load_aligned_seqs
from cogent3.parse import jaspar

_, pwm = jaspar.read("data/tbp.jaspar")
freqarr = pwm.to_freq_array()
freqarr[:5]  # illustrating the contents of the MotifFreqsArray
        T       C       A       G
0  0.0797  0.3728  0.1568  0.3907
1  0.7943  0.1183  0.0411  0.0463
2  0.0900  0.0000  0.9049  0.0051
3  0.9614  0.0257  0.0077  0.0051
4  0.0771  0.0000  0.9100  0.0129
logo = freqarr.logo()
logo.show(height=250, width=500)
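To give a feel for what a logo encodes, here is a minimal sketch of the conventional calculation: the information at a position is the maximum possible entropy minus the observed Shannon entropy, and each letter's height is its frequency times that information. This is an illustration using only the five rows printed above; the names `FREQS` and `position_information` are made up for this example, and it is an assumption that cogent3 computes heights in exactly this way.

```python
from math import log2

# First five positions of the TBP motif shown above; columns are T, C, A, G.
FREQS = [
    [0.0797, 0.3728, 0.1568, 0.3907],
    [0.7943, 0.1183, 0.0411, 0.0463],
    [0.0900, 0.0000, 0.9049, 0.0051],
    [0.9614, 0.0257, 0.0077, 0.0051],
    [0.0771, 0.0000, 0.9100, 0.0129],
]


def position_information(row):
    """Information in bits: log2(number of states) minus Shannon entropy."""
    # Zero frequencies contribute nothing to the entropy, so skip them.
    entropy = -sum(p * log2(p) for p in row if p > 0)
    return log2(len(row)) - entropy


# One information value per position (hypothetical helper, not a cogent3 call).
info = [position_information(row) for row in FREQS]

# Letter heights: each frequency scaled by the information at its position.
heights = [[p * bits for p in row] for row, bits in zip(FREQS, info)]

for pos, bits in enumerate(info):
    print(pos, round(bits, 3))
```

Positions dominated by a single base, such as position 3, carry the most information and so produce the tallest stacks, while near-uniform positions such as position 0 are short.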

Drawing a sequence logo from a multiple sequence alignment

This can be done for an entire alignment, but bear in mind it can take some time to render. Note that we include gap characters in the display.

aln = load_aligned_seqs("data/brca1-bats.fasta", moltype="dna")
# logo for the first 311 alignment columns, wrapped into rows of 60 positions
logo = aln[:311].seqlogo(height=300, width=500, wrap=60, vspace=0.05)
logo.show()
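Including gaps in the display means the gap character is treated as one more state when tallying each alignment column. The sketch below shows that idea on a toy alignment invented for this example (it is not taken from `brca1-bats.fasta`); `column_frequencies` is a hypothetical helper and is not how cogent3 necessarily does the counting internally.

```python
from collections import Counter

# Hypothetical toy alignment: four sequences, five columns, "-" marks a gap.
toy_alignment = [
    "ACG-T",
    "ACGAT",
    "A-GAT",
    "ACG-T",
]


def column_frequencies(seqs):
    """Return one {state: frequency} dict per column, counting "-" as a state."""
    num_seqs = len(seqs)
    return [
        {state: count / num_seqs for state, count in Counter(column).items()}
        for column in zip(*seqs)  # zip(*seqs) iterates over alignment columns
    ]


col_freqs = column_frequencies(toy_alignment)

for index, freqs in enumerate(col_freqs):
    print(index, freqs)
```

A column where half the sequences have a gap ends up with a frequency of 0.5 for "-", so the gap would occupy part of the stack at that position.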

Sequence logo of a protein alignment

The call is the same as for a DNA alignment. The only difference is that the logo uses the built-in colour scheme from the protein MolType.

aa = aln.get_translation(incomplete_ok=True)[:120]
logo = aa.seqlogo(width=500, height=300, wrap=50, vspace=0.1)
logo.show()