Sequences#

The Sequence object provides generic biological sequence manipulation functions, plus functions that are critical for the evolve module calculations.

Generic molecular types#

Sequence properties are affected by the moltype you specify. The default type for a sequence is "text".

from cogent3 import make_seq

my_seq = make_seq("AGTACACTGGT")
my_seq.moltype.label
'text'
my_seq
     0
None AGTACACTGGT

Sequence, length=11

In some circumstances a sequence can also have the "bytes" moltype, which is constructed explicitly here.

my_seq = make_seq("AGTACACTGGT", moltype="bytes")
my_seq.moltype.label
'bytes'
my_seq
     0
None AGTACACTGGT

ByteSequence, length=11

DNA and RNA sequences#

Creating a DNA sequence from a string#

Sequence properties are affected by the moltype you specify. Here we specify the DNA MolType.

from cogent3 import make_seq

my_seq = make_seq("AGTACACTGGT", moltype="dna")
my_seq
     0
None AGTACACTGGT

DnaSequence, length=11

Creating an RNA sequence from a string#

from cogent3 import make_seq

rnaseq = make_seq("ACGUACGUACGUACGU", moltype="rna")

Converting to FASTA format#

from cogent3 import make_seq

my_seq = make_seq("AGTACACTGGT", moltype="dna")
my_seq.to_fasta()

Converting an RNA sequence to FASTA format#

from cogent3 import make_seq

rnaseq = make_seq("ACGUACGUACGUACGU", moltype="rna")
rnaseq.to_fasta()
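
For readers unfamiliar with the format, a FASTA record is a header line starting with ">" followed by the sequence, usually wrapped to a fixed line width. The following is a minimal plain-Python sketch of that layout. The helper name format_fasta and the default block_size of 60 are assumptions for illustration only; this is not cogent3's implementation.

```python
def format_fasta(name, seq, block_size=60):
    """Format a single sequence as a FASTA record (hypothetical helper)."""
    # Split the sequence into fixed-width blocks, one per output line.
    blocks = [seq[i : i + block_size] for i in range(0, len(seq), block_size)]
    # The header line is ">" followed by the sequence name.
    return "\n".join([f">{name}"] + blocks) + "\n"


record = format_fasta("my_gene", "AGTACACTGGT")
wrapped = format_fasta("x", "ACGT" * 4, block_size=10)
```

Wrapping only affects presentation; a parser joins the blocks back into one sequence.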

Creating a named sequence#

Pass the name as the second argument to make_seq() (or as the name keyword argument), with the moltype given as a string.

from cogent3 import make_seq

my_seq = make_seq("AGTACACTGGT", "my_gene", moltype="dna")
my_seq
type(my_seq)
cogent3.core.sequence.DnaSequence

Setting or changing the name of a sequence#

from cogent3 import make_seq

my_seq = make_seq("AGTACACTGGT", moltype="dna")
my_seq.name = "my_gene"
my_seq
        0
my_gene AGTACACTGGT

DnaSequence, length=11

Complementing a DNA sequence#

from cogent3 import make_seq

my_seq = make_seq("AGTACACTGGT", moltype="dna")
my_seq.complement()
     0
None TCATGTGACCA

DnaSequence, length=11

Reverse complementing a DNA sequence#

my_seq.rc()
     0
None ACCAGTGTACT

DnaSequence, length=11

Translate a DnaSequence to protein#

from cogent3 import make_seq

my_seq = make_seq("GCTTGGGAAAGTCAAATGGAA", name="s1", moltype="dna")
pep = my_seq.get_translation()
type(pep)
cogent3.core.sequence.ProteinSequence
pep
   0
s1 AWESQME

ProteinSequence, length=7
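
To make the translation step concrete, here is a minimal plain-Python sketch, independent of cogent3, that translates a DNA string with the standard genetic code. The names CODON_TABLE and translate are hypothetical helpers introduced for illustration. The sketch assumes an ungapped, unambiguous, upper-case DNA string and is not meant to reflect how get_translation() is implemented.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate complete codons of dna, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i : i + 3]] for i in range(0, usable, 3))


pep = translate("GCTTGGGAAAGTCAAATGGAA")
```

Here pep is the string "AWESQME", matching the ProteinSequence shown above.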

Converting a DNA sequence to RNA#

from cogent3 import make_seq

my_seq = make_seq("ACGTACGTACGTACGT", moltype="dna")
rnaseq = my_seq.to_rna()
rnaseq
     0
None ACGUACGUACGUACGU

RnaSequence, length=16

Convert an RNA sequence to DNA#

from cogent3 import make_seq

rnaseq = make_seq("ACGUACGUACGUACGU", moltype="rna")
dnaseq = rnaseq.to_dna()
dnaseq
     0
None ACGTACGTACGTACGT

DnaSequence, length=16

Testing complementarity#

from cogent3 import make_seq

a = make_seq("AGTACACTGGT", moltype="dna")
a.can_pair(a.complement())
False
a.can_pair(a.rc())
True
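
The two results above are consistent with pairing being tested in antiparallel orientation: the first base of one strand is matched against the last base of the other. A minimal plain-Python sketch of that idea follows. The helper names are hypothetical, only the four canonical DNA bases are handled, and the antiparallel rule is an assumption inferred from the output above rather than a description of cogent3's internals.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Swap each base for its Watson-Crick partner, keeping the order."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """Complement the sequence and reverse it."""
    return complement(seq)[::-1]


def can_pair(a, b):
    """True if a pairs with b when b is read in the opposite direction."""
    return len(a) == len(b) and complement(a) == b[::-1]


a = "AGTACACTGGT"
pairs_with_complement = can_pair(a, complement(a))
pairs_with_rc = can_pair(a, reverse_complement(a))
```

With this definition pairs_with_complement is False and pairs_with_rc is True, mirroring the behaviour shown above.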

Joining two DNA sequences#

from cogent3 import make_seq

my_seq = make_seq("AGTACACTGGT", moltype="dna")
extra_seq = make_seq("CTGAC", moltype="dna")
long_seq = my_seq + extra_seq
long_seq
     0
None AGTACACTGGTCTGAC

DnaSequence, length=16

Slicing DNA sequences#

my_seq[1:6]
     0
None GTACA

DnaSequence, length=5

Getting 3rd positions from codons#

The easiest approach is to slice the sequence with a step of 3, starting at index 2 (the third position of the first codon).

from cogent3 import make_seq

seq = make_seq("ATGATGATGATG", moltype="dna")
pos3 = seq[2::3]
assert str(pos3) == "GGGG"

Getting 1st and 2nd positions from codons#

In this instance we can use a feature whose spans cover the first two positions of each codon, then slice the sequence with it.

from cogent3 import make_seq

seq = make_seq("ATGATGATGATG", moltype="dna")
indices = [(i, i + 2) for i in range(len(seq))[::3]]
pos12 = seq.add_feature(biotype="pos12", name="pos12", spans=indices)
pos12 = pos12.get_slice()
assert str(pos12) == "ATATATAT"
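
The index arithmetic behind both codon-position examples can be checked on a plain Python string. The sketch below is an illustration only: codon_positions is a hypothetical helper, not part of cogent3, and it assumes the sequence starts in frame at index 0.

```python
def codon_positions(seq, keep):
    """Return the characters whose within-codon offset (0, 1 or 2) is in keep."""
    return "".join(base for i, base in enumerate(seq) if i % 3 in keep)


seq = "ATGATGATGATG"
pos3 = codon_positions(seq, {2})  # third codon positions
pos12 = codon_positions(seq, {0, 1})  # first and second codon positions
```

The result pos3 equals seq[2::3], and pos12 matches the concatenated feature spans (i, i + 2) used above.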

Return a randomized version of the sequence#

rnaseq.shuffle()
     0
None GCGGCUUGUCCAAAUA

RnaSequence, length=16

Remove gaps from a sequence#

from cogent3 import make_seq

s = make_seq("--AUUAUGCUAU-UAu--", moltype="rna")
s.degap()
     0
None AUUAUGCUAUUAU

RnaSequence, length=13