Sample nucleotides from a given codon position

The take_codon_positions app allows you to extract all nucleotides at a given codon position from an alignment.

Let’s create a sample alignment for our example.

from cogent3 import make_aligned_seqs

aln = make_aligned_seqs({"s1": "ACGACGACG", "s2": "GATGATGAT"}, moltype="dna")
aln

	0
s1	ACGACGACG
s2	GATGATGAT

2 x 9 dna alignment
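To make explicit what selecting codon positions means, here is a minimal pure-Python sketch applied to the two sequences above. It is an illustration only, not how cogent3 implements the app: the helper name take_positions is hypothetical, and the sketch assumes each sequence is in frame, starts at codon position 1, and that any trailing incomplete codon is ignored.

```python
def take_positions(seq: str, *positions: int) -> str:
    """Return the bases found at the given 1-based codon positions.

    Hypothetical helper for illustration; assumes seq is in frame.
    """
    if not positions or any(p not in (1, 2, 3) for p in positions):
        raise ValueError("positions must be drawn from 1, 2 and 3")
    # convert to 0-based offsets within a codon, in sequence order
    offsets = sorted({p - 1 for p in positions})
    # only complete codons are considered (an assumption of this sketch)
    usable = len(seq) - len(seq) % 3
    return "".join(
        seq[start + offset]
        for start in range(0, usable, 3)
        for offset in offsets
    )


seqs = {"s1": "ACGACGACG", "s2": "GATGATGAT"}
third = {name: take_positions(s, 3) for name, s in seqs.items()}
first_second = {name: take_positions(s, 1, 2) for name, s in seqs.items()}
print(third)
print(first_second)
```

The app performs the equivalent selection on whole alignments, as the following sections show.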

Extract the third codon position from an alignment

We can achieve this by creating the take_codon_positions app with 3 as a positional argument.

from cogent3 import get_app

take_pos3 = get_app("take_codon_positions", 3, moltype="dna")
result = take_pos3(aln)
result

	0
s1	GGG
s2	TTT

2 x 3 dna alignment

Extract the first and second codon positions from an alignment

We can achieve this by creating the take_codon_positions app with 1 and 2 as positional arguments.

from cogent3 import get_app

take_pos12 = get_app("take_codon_positions", 1, 2, moltype="dna")
result = take_pos12(aln)
result

	0
s1	ACACAC
s2	GAGAGA

2 x 6 dna alignment

Extract only the third codon positions from four-fold degenerate codons

We can achieve this by creating the take_codon_positions app with the argument fourfold_degenerate=True.

from cogent3 import get_app, make_aligned_seqs

aln_ff = make_aligned_seqs({"s1": "GCAAGCGTTTAT", "s2": "GCTTTTGTCAAT"}, moltype="dna")
take_fourfold = get_app("take_codon_positions", fourfold_degenerate=True, moltype="dna")
result = take_fourfold(aln_ff)
result

	0
s1	AT
s2	TC

2 x 2 dna alignment
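The result above can be reasoned through by hand. The following pure-Python sketch is a hedged illustration rather than the cogent3 implementation: the function name fourfold_third_positions is hypothetical, and it assumes the standard genetic code, in-frame sequences without gaps or ambiguity characters, and that a codon column is kept only when every sequence carries a codon from the same four-fold degenerate family.

```python
# First two bases of the eight four-fold degenerate codon families in the
# standard genetic code (Leu, Val, Ser, Pro, Thr, Ala, Arg, Gly).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def fourfold_third_positions(seqs: dict[str, str]) -> dict[str, str]:
    """Keep third positions of codon columns that are four-fold degenerate.

    Hypothetical helper; assumes aligned, in-frame, ungapped sequences.
    """
    length = min(len(s) for s in seqs.values())
    kept: dict[str, list[str]] = {name: [] for name in seqs}
    for start in range(0, length - length % 3, 3):
        codons = {name: s[start:start + 3] for name, s in seqs.items()}
        prefixes = {codon[:2] for codon in codons.values()}
        # assumption: all sequences must share one four-fold family
        if len(prefixes) == 1 and prefixes <= FOURFOLD_PREFIXES:
            for name, codon in codons.items():
                kept[name].append(codon[2])
    return {name: "".join(bases) for name, bases in kept.items()}


ff = fourfold_third_positions({"s1": "GCAAGCGTTTAT", "s2": "GCTTTTGTCAAT"})
print(ff)
```

Here only the first codon column (GCA/GCT, alanine) and the third (GTT/GTC, valine) qualify, which is why two columns remain.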

Create a composed process which samples only the third codon position

Let’s set up a data store containing all the files with the “.fasta” suffix in the data directory, limiting the data store to two members to keep the example minimal.

from cogent3 import open_data_store

fasta_seq_dstore = open_data_store("data", suffix="fasta", mode="r", limit=2)

Now let’s set up a process composed of the following apps: load_aligned (loads the sequences), take_codon_positions (extracts the third codon position), and write_seqs (writes the filtered sequences to a data store).

Note

The basics of turning apps into composed processes are covered elsewhere in the cogent3 documentation.

from cogent3 import get_app, open_data_store

# path_to_dir is a placeholder for the output directory path
out_dstore = open_data_store(path_to_dir, suffix="fa", mode="w")

loader = get_app("load_aligned", format_name="fasta", moltype="dna")
cpos3 = get_app("take_codon_positions", 3)
writer = get_app("write_seqs", out_dstore, format_name="fasta")

process = loader + cpos3 + writer

Tip

When running this code on your machine, remember to replace path_to_dir with an actual directory path.

Now let’s apply process to our data store. This populates out_dstore (which is returned by the .apply_to() call) with the filtered alignments. We can index out_dstore to access individual data members, and call the .read() method on a member to inspect its contents.

out_dstore = process.apply_to(fasta_seq_dstore)
out_dstore.describe

Directory datastore
record type	number
completed	2
not_completed	0
logs	1

3 rows x 2 columns