Select n sequences from a collection
Note
These docs now use the new_type core objects via the following setting.
import os
# using new types without requiring an explicit argument
os.environ["COGENT3_NEW_TYPE"] = "1"
Let’s load an alignment of primates to use in examples.
from cogent3 import get_app
loader = get_app("load_aligned", moltype="dna")
aln = loader("data/primate_brca1.fasta")
aln
0 | |
Chimpanzee | TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGTTTATTACTCACTAAA |
Galago | .......A................................G................... |
HowlerMon | ...............................................G............ |
Rhesus | ...............................................G............ |
Orangutan | ............................................................ |
Gorilla | ............................................................ |
Human | ............................................................ |
7 x 2814 (truncated to 7 x 60) dna alignment
Select the first n sequences from an alignment
Initialising take_n_seqs with the argument number=3 creates an app that returns the first 3 sequences from an alignment.
Note
“First n” refers to the order in which the sequences appear in the FASTA file.
from cogent3 import get_app
first_3 = get_app("take_n_seqs", number=3)
first_3(aln)
0 | |
Galago | TGTGGCAAAAATACTCATGCCAGCTCATTACAGCATGAGAGCAGTTTATTACTCACTAAA |
HowlerMon | .......C................................A......G............ |
Rhesus | .......C................................A......G............ |
3 x 2814 (truncated to 3 x 60) dna alignment
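To make the "first n" semantics concrete, here is a minimal plain-Python sketch that works on an ordered dict mapping sequence names to sequences, rather than on a cogent3 alignment. The helper take_first_n and the toy sequences are hypothetical and exist only for illustration; this is not how cogent3 implements take_n_seqs.

```python
from __future__ import annotations


def take_first_n(seqs: dict[str, str], number: int) -> dict[str, str]:
    """Return the first `number` entries of an ordered name -> sequence mapping.

    Hypothetical helper for illustration only; not part of cogent3.
    """
    if number > len(seqs):
        # this sketch raises; cogent3's own handling of this case may differ
        raise ValueError(f"requested {number} sequences, only {len(seqs)} available")
    names = list(seqs)[:number]  # dicts preserve insertion order, i.e. file order
    return {name: seqs[name] for name in names}


# toy placeholder sequences, listed in the same name order as data/primate_brca1.fasta
toy = {
    "Galago": "ACGTAC",
    "HowlerMon": "ACGTCC",
    "Rhesus": "ACGTCC",
    "Orangutan": "ACGTCA",
}
print(list(take_first_n(toy, 3)))  # ['Galago', 'HowlerMon', 'Rhesus']
```

Because the selection depends only on order, the same names come back on every call.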
Randomly selecting n sequences from an alignment
Using random=True and number=3 returns 3 randomly chosen sequences. Providing the optional seed argument ensures the same sequences are returned each time the app is called.
from cogent3 import get_app
random_n = get_app("take_n_seqs", random=True, number=3, seed=1)
random_n(aln)
0 | |
Chimpanzee | TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGTTTATTACTCACTAAA |
Rhesus | ...............................................G............ |
HowlerMon | ...............................................G............ |
3 x 2814 (truncated to 3 x 60) dna alignment
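The role of the seed can be illustrated with a minimal plain-Python sketch, assuming the sequences are held in a dict mapping names to sequences. The helper take_random_n is hypothetical and uses the standard library's random.Random; cogent3 may use a different random number generator, so the names this sketch picks will not match the output above. What it does show is that a fixed seed yields the same selection on every call.

```python
from __future__ import annotations

import random


def take_random_n(seqs: dict[str, str], number: int, seed: int | None = None) -> dict[str, str]:
    """Pick `number` sequences at random; a fixed `seed` makes the pick reproducible.

    Hypothetical helper for illustration only; not part of cogent3.
    """
    rng = random.Random(seed)  # private generator, so the global random state is untouched
    names = rng.sample(list(seqs), number)  # sampling without replacement
    return {name: seqs[name] for name in names}


toy = {
    "Galago": "ACGTAC",
    "HowlerMon": "ACGTCC",
    "Rhesus": "ACGTCC",
    "Orangutan": "ACGTCA",
    "Human": "ACGTCG",
}
first = take_random_n(toy, 3, seed=1)
again = take_random_n(toy, 3, seed=1)
print(list(first) == list(again))  # True: same seed, same names in the same order
```

With seed=None, random.Random seeds itself from system entropy, so repeated calls will generally return different selections.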
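Selecting the same sequences from multiple alignments
Providing the argument fixed_choice=True ensures the same sequences are returned when randomly sampling sequences across several alignments. The sampled sequences must therefore be present in every alignment; in this example, all seven primates in aln1 also occur in aln2.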
from cogent3 import get_app
loader = get_app("load_aligned", moltype="dna")
aln1 = loader("data/primate_brca1.fasta")
aln2 = loader("data/brca1.fasta")
aln1.names
('Galago',
'HowlerMon',
'Rhesus',
'Orangutan',
'Gorilla',
'Human',
'Chimpanzee')
aln2.names
('FlyingFox',
'DogFaced',
'FreeTaile',
'LittleBro',
'TombBat',
'RoundEare',
'FalseVamp',
'LeafNose',
'Horse',
'Rhino',
'Pangolin',
'Cat',
'Dog',
'Llama',
'Pig',
'Cow',
'Hippo',
'SpermWhale',
'HumpbackW',
'Mole',
'Hedgehog',
'TreeShrew',
'FlyingLem',
'Galago',
'HowlerMon',
'Rhesus',
'Orangutan',
'Gorilla',
'Human',
'Chimpanzee',
'Jackrabbit',
'FlyingSqu',
'OldWorld',
'Mouse',
'Rat',
'NineBande',
'HairyArma',
'Anteater',
'Sloth',
'Dugong',
'Manatee',
'AfricanEl',
'AsianElep',
'RockHyrax',
'TreeHyrax',
'Aardvark',
'GoldenMol',
'Madagascar',
'Tenrec',
'LesserEle',
'GiantElep',
'Caenolest',
'Phascogale',
'Wombat',
'Bandicoot')
fixed_choice = get_app("take_n_seqs", number=2, random=True, fixed_choice=True)
result1 = fixed_choice(aln1).names
result2 = fixed_choice(aln2).names
result1 == result2
True
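The fixed-choice behaviour can be sketched as an object that draws names at random from the first collection it sees and then reuses those names for every later collection. The class TakeNFixed and the toy data are hypothetical, and the assumption that the first collection determines the choice is made for this sketch only; it is not a statement about cogent3's internals.

```python
from __future__ import annotations

import random


class TakeNFixed:
    """Hypothetical sketch: names drawn from the first collection are reused for all later ones."""

    def __init__(self, number: int, seed: int | None = None) -> None:
        self.number = number
        self._rng = random.Random(seed)
        self._chosen: list[str] | None = None  # set on the first call

    def __call__(self, seqs: dict[str, str]) -> dict[str, str]:
        if self._chosen is None:
            # first collection: draw the names at random and remember them
            self._chosen = self._rng.sample(list(seqs), self.number)
        missing = [name for name in self._chosen if name not in seqs]
        if missing:
            raise KeyError(f"collection lacks previously chosen names: {missing}")
        return {name: seqs[name] for name in self._chosen}


primates = {"Galago": "ACGT", "HowlerMon": "ACGA", "Rhesus": "ACGA", "Human": "ACGC"}
# a larger toy collection that contains every primate plus other names
mammals = {"Mouse": "TCGT", **primates, "Rat": "TCGA"}

pick_two = TakeNFixed(2, seed=1)
result1 = list(pick_two(primates))
result2 = list(pick_two(mammals))
print(result1 == result2)  # True
```

Raising an error when a later collection lacks one of the chosen names is a design choice of this sketch; it makes explicit the requirement that every collection contain the sampled sequences.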