Select named sequences from a collection#
Let’s load alignment of primates to use in examples.
from cogent3 import get_app
loader = get_app("load_aligned", moltype="dna")
aln = loader("data/primate_brca1.fasta")
aln
0 | |
Chimpanzee | TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGTTTATTACTCACTAAA |
Galago | .......A................................G................... |
HowlerMon | ...............................................G............ |
Rhesus | ...............................................G............ |
Orangutan | ............................................................ |
Gorilla | ............................................................ |
Human | ............................................................ |
7 x 2814 (truncated to 7 x 60) dna alignment
Selecting sequences from a collection#
Using the take_named_seqs
app we can select sequences from a collection that match provided names. For instance, we could be interested in an alignment of sequences from Humans with those from our two closest relatives.
from cogent3 import get_app
greater_ape_selector = get_app("take_named_seqs", "Human", "Chimpanzee", "Gorilla")
ape_aln = greater_ape_selector(aln)
ape_aln
0 | |
Chimpanzee | TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGTTTATTACTCACTAAA |
Human | ............................................................ |
Gorilla | ............................................................ |
3 x 2814 (truncated to 3 x 60) dna alignment
Using take_named_seqs
in a composed app process#
from cogent3 import get_app
loader = get_app("load_aligned", format="fasta", moltype="dna")
select_seqs = get_app("take_named_seqs", "Human", "Rhesus", "Galago")
process = loader + select_seqs
hrg_aln = process("data/primate_brca1.fasta")
hrg_aln.names
['Human', 'Rhesus', 'Galago']
Removing sequences from a collection#
Creating the take_named_seqs
app with the argument negate=True
will exclude sequences that match the provided names. For instance, perhaps the Galago has got to go.
from cogent3 import get_app
no_galagos = get_app("take_named_seqs", "Galago", negate=True)
no_galago_aln = no_galagos(hrg_aln)
no_galago_aln.names
['Human', 'Rhesus']