Sequences
The Sequence object provides generic biological sequence manipulation functions, plus functions that the evolve module relies on for its calculations.
Generic molecular types
Sequence properties are affected by the moltype you specify. The default moltype for a sequence is "text".
from cogent3 import make_seq
my_seq = make_seq("AGTACACTGGT")
my_seq.moltype.label
'text'
my_seq
0 | |
None | AGTACACTGGT |
Sequence, length=11
In some circumstances you can also have a "bytes" moltype, which I'll explicitly construct here.
my_seq = make_seq("AGTACACTGGT", moltype="bytes")
my_seq.moltype.label
'bytes'
my_seq
0 | |
None | AGTACACTGGT |
ByteSequence, length=11
DNA and RNA sequences
Creating a DNA sequence from a string
Sequence properties are affected by the moltype you specify. Here we specify the DNA MolType.
from cogent3 import make_seq
my_seq = make_seq("AGTACACTGGT", moltype="dna")
my_seq
0 | |
None | AGTACACTGGT |
DnaSequence, length=11
Creating an RNA sequence from a string
from cogent3 import make_seq
rnaseq = make_seq("ACGUACGUACGUACGU", moltype="rna")
Converting to FASTA format
from cogent3 import make_seq
my_seq = make_seq("AGTACACTGGT", name="my_gene", moltype="dna")
# to_fasta() returns a string: a ">name" header line followed by the sequence
print(my_seq.to_fasta())
>my_gene
AGTACACTGGT
Convert an RNA sequence to FASTA format
from cogent3 import make_seq
rnaseq = make_seq("ACGUACGUACGUACGU", name="my_rna", moltype="rna")
print(rnaseq.to_fasta())
>my_rna
ACGUACGUACGUACGU
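A FASTA record is a header line made of ">" plus the sequence name, followed by the sequence split into fixed-width lines. The plain-Python sketch below shows that layout without cogent3. The helper name to_fasta_record and the 60-character default line width are assumptions made for illustration. This is not cogent3's implementation.

```python
def to_fasta_record(name, seq, block_size=60):
    """Format one sequence as a FASTA record (hypothetical helper).

    The header is ">" + name. The sequence is wrapped so that no line is
    longer than block_size characters.
    """
    lines = [f">{name}"]
    # consecutive chunks of at most block_size characters
    lines.extend(seq[i : i + block_size] for i in range(0, len(seq), block_size))
    return "\n".join(lines)


print(to_fasta_record("my_gene", "AGTACACTGGT"))
print(to_fasta_record("short_lines", "ACGTACGT", block_size=3))
```

The second call uses a tiny block_size to show the wrapping. Real files normally use 60 or 80 characters per line.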
Creating a named sequence
Pass the name to make_seq() as the second argument (or as the name keyword argument), alongside the moltype string.
from cogent3 import make_seq
my_seq = make_seq("AGTACACTGGT", "my_gene", moltype="dna")
my_seq
type(my_seq)
cogent3.core.sequence.DnaSequence
Setting or changing the name of a sequence
from cogent3 import make_seq
my_seq = make_seq("AGTACACTGGT", moltype="dna")
my_seq.name = "my_gene"
my_seq
0 | |
my_gene | AGTACACTGGT |
DnaSequence, length=11
Complementing a DNA sequence
from cogent3 import make_seq
my_seq = make_seq("AGTACACTGGT", moltype="dna")
my_seq.complement()
0 | |
None | TCATGTGACCA |
DnaSequence, length=11
Reverse complementing a DNA sequence
my_seq.rc()
0 | |
None | ACCAGTGTACT |
DnaSequence, length=11
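complement() swaps each base for its Watson-Crick partner and keeps the original order. rc() also reverses the result, which gives the opposite strand read 5' to 3'. The pure-Python sketch below illustrates both operations. It assumes unambiguous uppercase DNA only. cogent3's DNA moltype additionally handles IUPAC ambiguity codes and gaps, and this sketch does not.

```python
# Assumption: only unambiguous uppercase DNA bases are handled here.
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Replace each base with its complement, keeping the original order."""
    return seq.translate(_DNA_COMPLEMENT)


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return complement(seq)[::-1]


print(complement("AGTACACTGGT"))          # TCATGTGACCA
print(reverse_complement("AGTACACTGGT"))  # ACCAGTGTACT
```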
Translate a DnaSequence to protein
from cogent3 import make_seq
my_seq = make_seq("GCTTGGGAAAGTCAAATGGAA", name="s1", moltype="dna")
pep = my_seq.get_translation()
type(pep)
cogent3.core.sequence.ProteinSequence
pep
0 | |
s1 | AWESQME |
ProteinSequence, length=7
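Translation reads the DNA three bases at a time and looks each codon up in a genetic code. The sketch below builds the standard code (NCBI translation table 1) from its usual compact string form and then translates codon by codon. It is an illustration only. It assumes uppercase, unambiguous, in-frame DNA. It marks stop codons as "*" and does not treat them specially, and it silently drops trailing bases that do not complete a codon. None of this is necessarily how get_translation() behaves.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code: amino acids for codons in the order TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate an in-frame DNA string. Incomplete trailing codons are ignored."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i : i + 3]] for i in range(0, usable, 3))


print(translate("GCTTGGGAAAGTCAAATGGAA"))  # AWESQME
```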
Converting a DNA sequence to RNA
from cogent3 import make_seq
my_seq = make_seq("ACGTACGTACGTACGT", moltype="dna")
rnaseq = my_seq.to_rna()
rnaseq
0 | |
None | ACGUACGUACGUACGU |
RnaSequence, length=16
Convert an RNA sequence to DNA
from cogent3 import make_seq
rnaseq = make_seq("ACGUACGUACGUACGU", moltype="rna")
dnaseq = rnaseq.to_dna()
dnaseq
0 | |
None | ACGTACGTACGTACGT |
DnaSequence, length=16
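At the level of the characters, converting between DNA and RNA swaps thymine (T) and uracil (U), as the outputs above show. The minimal sketch below shows that substitution on plain strings. The helper names are hypothetical, and the sketch only handles uppercase input. cogent3's to_rna() and to_dna() also change the moltype of the returned object.

```python
def dna_to_rna(seq):
    """Replace thymine (T) with uracil (U). Uppercase input only."""
    return seq.replace("T", "U")


def rna_to_dna(seq):
    """Replace uracil (U) with thymine (T). Uppercase input only."""
    return seq.replace("U", "T")


print(dna_to_rna("ACGTACGTACGTACGT"))  # ACGUACGUACGUACGU
print(rna_to_dna("ACGUACGUACGUACGU"))  # ACGTACGTACGTACGT
```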
Testing complementarity
from cogent3 import make_seq
a = make_seq("AGTACACTGGT", moltype="dna")
a.can_pair(a.complement())
False
a.can_pair(a.rc())
True
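can_pair() reads the other sequence in reverse, so the last base of the other sequence is set against the first base of a. That is why the reverse complement pairs and the plain complement does not. The sketch below illustrates that rule in plain Python under stated assumptions: strict Watson-Crick DNA pairs, sequences of equal length, and no gaps or ambiguity codes. cogent3's own method covers more cases than this.

```python
# Assumption: strict Watson-Crick pairs for uppercase DNA only.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def can_pair(seq, other):
    """True if every base of seq pairs with other read in reverse order."""
    if len(seq) != len(other):
        return False
    return all((a, b) in WATSON_CRICK for a, b in zip(seq, reversed(other)))


a = "AGTACACTGGT"
print(can_pair(a, "TCATGTGACCA"))  # complement: False
print(can_pair(a, "ACCAGTGTACT"))  # reverse complement: True
```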
Joining two DNA sequences
from cogent3 import make_seq
my_seq = make_seq("AGTACACTGGT", moltype="dna")
extra_seq = make_seq("CTGAC", moltype="dna")
long_seq = my_seq + extra_seq
long_seq
0 | |
None | AGTACACTGGTCTGAC |
DnaSequence, length=16
Slicing DNA sequences
my_seq[1:6]
0 | |
None | GTACA |
DnaSequence, length=5
Getting 3rd positions from codons
The easiest approach is to slice the sequence with a step of 3, starting at index 2 (the third base of the first codon).
from cogent3 import make_seq
seq = make_seq("ATGATGATGATG", moltype="dna")
pos3 = seq[2::3]
assert str(pos3) == "GGGG"
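The same strided slice picks out any single codon position once the start offset is changed. The sketch below wraps this in a small function on a plain string. The function name codon_position is hypothetical. The sketch assumes the sequence starts in frame with the first base of a codon.

```python
def codon_position(seq, position):
    """Return the bases at codon position 1, 2 or 3 (1-based) of an in-frame sequence."""
    if position not in (1, 2, 3):
        raise ValueError("position must be 1, 2 or 3")
    # start at the 0-based offset of that position, then take every third base
    return seq[position - 1 :: 3]


for pos in (1, 2, 3):
    print(pos, codon_position("ATGATGATGATG", pos))
```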
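Getting 1st and 2nd positions from codons
Positions 1 and 2 are not a single regular stride, so here we use features: annotate each codon's first two bases as one span of a feature, then slice the feature out.
from cogent3 import make_seq
seq = make_seq("ATGATGATGATG", moltype="dna")
# one (start, end) span covering positions 1 and 2 of every codon
indices = [(i, i + 2) for i in range(0, len(seq), 3)]
pos12 = seq.add_feature(biotype="pos12", name="pos12", spans=indices)
# get_slice() joins the feature's spans into a single sequence
pos12 = pos12.get_slice()
assert str(pos12) == "ATATATAT"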
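For a plain string, the same result can be produced without features by joining the first two bases of each codon. The sketch below does this. The helper name is hypothetical, and the sketch assumes an in-frame sequence. Unlike the feature approach above, it returns a str rather than a cogent3 sequence object.

```python
def first_two_positions(seq):
    """Concatenate codon positions 1 and 2, dropping every third base."""
    return "".join(seq[i : i + 2] for i in range(0, len(seq), 3))


print(first_two_positions("ATGATGATGATG"))  # ATATATAT
```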
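Return a randomized version of the sequence
shuffle() returns a new sequence with the same bases in a random order, so the output below differs from run to run.
rnaseq.shuffle()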
0 | |
None | UCUCUACCAUGAGGAG |
RnaSequence, length=16
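A shuffle is a random permutation, so the base composition is unchanged and only the order varies. In the output above there are still four each of A, C, G and U. The plain-Python sketch below shows this. The helper name shuffled and the optional seeded random.Random argument (used for reproducibility) are assumptions for illustration, not part of cogent3.

```python
import random
from collections import Counter


def shuffled(seq, rng=None):
    """Return a random permutation of seq's characters (hypothetical helper)."""
    rng = rng or random.Random()
    return "".join(rng.sample(seq, len(seq)))


original = "ACGUACGUACGUACGU"
permuted = shuffled(original, random.Random(1))
print(permuted)
print(Counter(permuted) == Counter(original))  # True: composition is preserved
```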
Remove gaps from a sequence
from cogent3 import make_seq
s = make_seq("--AUUAUGCUAU-UAu--", moltype="rna")
s.degap()
0 | |
None | AUUAUGCUAUUAU |
RnaSequence, length=13
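Degapping drops every gap character and keeps the remaining bases in order. The minimal sketch below shows this for a plain string. It assumes "-" is the only gap character. A cogent3 moltype defines its own gap and missing-data characters, and the uppercasing seen in the output above comes from the RNA moltype, not from this sketch.

```python
def degap(seq, gap_chars="-"):
    """Remove every character in gap_chars from seq (assumed gap set: '-')."""
    return "".join(base for base in seq if base not in gap_chars)


print(degap("--AUUAUGCUAU-UAU--"))  # AUUAUGCUAUUAU
```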