Sample nucleotides from a given codon position#
The take_codon_positions app allows you to extract all nucleotides at a given codon position from an alignment.
Let’s create a sample alignment for our example.
from cogent3 import make_aligned_seqs
aln = make_aligned_seqs({"s1": "ACGACGACG", "s2": "GATGATGAT"}, moltype="dna")
aln
0 | |
s1 | ACGACGACG |
s2 | GATGATGAT |
2 x 9 dna alignment
Extract the third codon position from an alignment#
We can achieve this by creating the take_codon_positions app with 3 as a positional argument.
from cogent3 import get_app
take_pos3 = get_app("take_codon_positions", 3, moltype="dna")
result = take_pos3(aln)
result
0 | |
s1 | GGG |
s2 | TTT |
2 x 3 dna alignment
Extract the first and second codon positions from an alignment#
We can achieve this by creating the take_codon_positions app with 1 and 2 as positional arguments.
from cogent3 import get_app
take_pos12 = get_app("take_codon_positions", 1, 2, moltype="dna")
result = take_pos12(aln)
result
0 | |
s1 | ACACAC |
s2 | GAGAGA |
2 x 6 dna alignment
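To make the selection rule concrete, here is a minimal plain-Python sketch of what the outputs above correspond to. It does not use cogent3 and is not the library's implementation; the function name take_positions_sketch is hypothetical. It assumes the sequence starts in frame, and it ignores any trailing bases that do not form a complete codon.

```python
def take_positions_sketch(seq, *positions):
    """Return the bases at the given 1-based codon positions of an in-frame sequence."""
    # Convert 1-based codon positions (1, 2, 3) to 0-based offsets within a codon.
    offsets = sorted(p - 1 for p in positions)
    if not offsets or any(o not in (0, 1, 2) for o in offsets):
        raise ValueError("codon positions must be chosen from 1, 2 and 3")
    # Ignore an incomplete trailing codon, if there is one.
    usable = len(seq) - len(seq) % 3
    return "".join(seq[start + o] for start in range(0, usable, 3) for o in offsets)


third = take_positions_sketch("ACGACGACG", 3)
first_second = take_positions_sketch("ACGACGACG", 1, 2)
```

With the sample sequence s1, third is "GGG" and first_second is "ACACAC", which agrees with the alignments displayed above.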
Extract only the third codon positions from four-fold degenerate codons#
We can achieve this by creating the take_codon_positions app with the argument fourfold_degenerate=True.
from cogent3 import get_app, make_aligned_seqs
aln_ff = make_aligned_seqs({"s1": "GCAAGCGTTTAT", "s2": "GCTTTTGTCAAT"}, moltype="dna")
take_fourfold = get_app("take_codon_positions", fourfold_degenerate=True, moltype="dna")
result = take_fourfold(aln_ff)
result
0 | |
s1 | AT |
s2 | TC |
2 x 2 dna alignment
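The result above can be reproduced by hand with a short plain-Python sketch. It rests on two assumptions that are mine rather than statements about cogent3's internals: the standard genetic code is used, and a codon column is kept only when every sequence's codon shares the same first two bases and those two bases determine the amino acid regardless of the third base. The names below are hypothetical, and gaps or ambiguous characters are not handled.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Two-base prefixes for which all four third bases encode the same amino acid.
FOURFOLD_PREFIXES = {
    a + b
    for a, b in product(BASES, repeat=2)
    if len({CODON_TABLE[a + b + c] for c in BASES}) == 1
}


def fourfold_third_positions(seqs):
    """Keep third positions of codon columns where all sequences share a four-fold prefix."""
    length = len(next(iter(seqs.values())))
    kept = {name: [] for name in seqs}
    for start in range(0, length - length % 3, 3):
        prefixes = {s[start:start + 2] for s in seqs.values()}
        if len(prefixes) == 1 and prefixes <= FOURFOLD_PREFIXES:
            for name, s in seqs.items():
                kept[name].append(s[start + 2])
    return {name: "".join(chars) for name, chars in kept.items()}


ff_result = fourfold_third_positions({"s1": "GCAAGCGTTTAT", "s2": "GCTTTTGTCAAT"})
```

Here only the first codon column (GCA, GCT; alanine) and the third (GTT, GTC; valine) qualify, so ff_result maps s1 to "AT" and s2 to "TC", as in the alignment shown above.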
Create a composed process which samples only the third codon position#
Let’s set up a data store containing all the files with the “.fasta” suffix in the data directory, limiting the data store to two members to keep the example minimal.
from cogent3 import open_data_store
fasta_seq_dstore = open_data_store("data", suffix="fasta", mode="r", limit=2)
Now let’s set up a process composing the following apps: load_aligned (loads the sequences), take_codon_positions (extracts the third codon position), and write_seqs (writes the filtered sequences to a data store).
Note
Learn the basics of turning apps into composed processes here!
from cogent3 import get_app, open_data_store
out_dstore = open_data_store(path_to_dir, suffix="fa", mode="w")
loader = get_app("load_aligned", format="fasta", moltype="dna")
cpos3 = get_app("take_codon_positions", 3)
writer = get_app("write_seqs", out_dstore, format="fasta")
process = loader + cpos3 + writer
Tip
When running this code on your machine, remember to replace path_to_dir
with an actual directory path.
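If the idea of adding apps together is unfamiliar, the following toy analogy may help. It is plain Python with no cogent3 involved and it is not how the library implements composition; the Step class and the three toy steps are hypothetical. It only illustrates the general pattern in which `+` chains steps so that each step's output becomes the next step's input.

```python
class Step:
    """Hypothetical stand-in for an app: wraps a function and supports `+`."""

    def __init__(self, func):
        self.func = func

    def __call__(self, value):
        return self.func(value)

    def __add__(self, other):
        # The combined step runs this step first, then feeds the result to `other`.
        return Step(lambda value: other(self(value)))


# Toy loader: parse "name sequence" lines into a dict.
toy_loader = Step(lambda text: dict(line.split() for line in text.splitlines()))
# Toy filter: keep every third base, starting from the third.
toy_cpos3 = Step(lambda seqs: {name: seq[2::3] for name, seq in seqs.items()})
# Toy writer: format the sequences as FASTA text instead of writing to disk.
toy_writer = Step(lambda seqs: "\n".join(f">{name}\n{seq}" for name, seq in seqs.items()))

toy_process = toy_loader + toy_cpos3 + toy_writer
toy_output = toy_process("s1 ACGACGACG\ns2 GATGATGAT")
```

In this toy version, toy_output is the FASTA text for s1 with GGG and s2 with TTT. The real composed process differs in that it reads its input from, and writes its output to, data stores.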
Now let’s apply process to our data store! This populates out_dstore (which is returned by the .apply_to() call) with the filtered alignments. We can index out_dstore to see individual data members. We could take a closer look using the .read() method on data members.
out_dstore = process.apply_to(fasta_seq_dstore)
out_dstore.describe
record type | number |
---|---|
completed | 1 |
not_completed | 1 |
logs | 1 |
3 rows x 2 columns