Select n sequences from a collection

Let’s load an alignment of primates to use in examples.

from cogent3 import get_app

loader = get_app("load_aligned", moltype="dna")
aln = loader("data/primate_brca1.fasta")
aln
            0
Chimpanzee  TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGTTTATTACTCACTAAA
Galago      .......A................................G...................
HowlerMon   ...............................................G............
Rhesus      ...............................................G............
Orangutan   ............................................................
Gorilla     ............................................................
Human       ............................................................

7 x 2814 (truncated to 7 x 60) dna alignment

Select the first n sequences from an alignment

Initialising take_n_seqs with the argument number=3 creates an app that returns the first 3 sequences from an alignment.

Note

“first n” refers to the ordering in the fasta file.

from cogent3 import get_app

first_3 = get_app("take_n_seqs", number=3)
first_3(aln)
           0
Galago     TGTGGCAAAAATACTCATGCCAGCTCATTACAGCATGAGAGCAGTTTATTACTCACTAAA
Rhesus     .......C................................A......G............
HowlerMon  .......C................................A......G............

3 x 2814 (truncated to 3 x 60) dna alignment

Randomly selecting n sequences from an alignment

Setting random=True with number=3 returns 3 randomly chosen sequences. The optional seed argument can be provided so that the same sequences are returned each time the app is called.

from cogent3 import get_app

random_n = get_app("take_n_seqs", random=True, number=3, seed=1)
random_n(aln)
            0
Chimpanzee  TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGTTTATTACTCACTAAA
HowlerMon   ...............................................G............
Rhesus      ...............................................G............

3 x 2814 (truncated to 3 x 60) dna alignment
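
The effect of a seed can be illustrated without cogent3. The following is a minimal sketch using only the Python standard library; take_random is a hypothetical helper written for this illustration and is not how take_n_seqs is necessarily implemented. It only shows that a random draw with a fixed seed is repeatable.

```python
import random

# Sequence names from the primate alignment above. The helper below is
# hypothetical: it illustrates seeded sampling and is not part of cogent3.
names = ["Galago", "HowlerMon", "Rhesus", "Orangutan", "Gorilla", "Human", "Chimpanzee"]


def take_random(names, number, seed=None):
    """Return `number` names drawn without replacement."""
    rng = random.Random(seed)  # private generator, global random state is untouched
    return rng.sample(names, number)


first_call = take_random(names, number=3, seed=1)
second_call = take_random(names, number=3, seed=1)
print(first_call == second_call)  # same seed, same selection
```

Omitting the seed in this sketch gives a draw that generally differs between runs.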

Selecting the same sequences from multiple alignments

Setting fixed_choice=True ensures that sequences with the same names are returned when (randomly) sampling across several alignments.

from cogent3 import get_app

loader = get_app("load_aligned", moltype="dna")
aln1 = loader("data/primate_brca1.fasta")
aln2 = loader("data/brca1.fasta")

aln1.names
('Galago',
 'HowlerMon',
 'Rhesus',
 'Orangutan',
 'Gorilla',
 'Human',
 'Chimpanzee')
aln2.names
('FlyingFox',
 'DogFaced',
 'FreeTaile',
 'LittleBro',
 'TombBat',
 'RoundEare',
 'FalseVamp',
 'LeafNose',
 'Horse',
 'Rhino',
 'Pangolin',
 'Cat',
 'Dog',
 'Llama',
 'Pig',
 'Cow',
 'Hippo',
 'SpermWhale',
 'HumpbackW',
 'Mole',
 'Hedgehog',
 'TreeShrew',
 'FlyingLem',
 'Galago',
 'HowlerMon',
 'Rhesus',
 'Orangutan',
 'Gorilla',
 'Human',
 'Chimpanzee',
 'Jackrabbit',
 'FlyingSqu',
 'OldWorld',
 'Mouse',
 'Rat',
 'NineBande',
 'HairyArma',
 'Anteater',
 'Sloth',
 'Dugong',
 'Manatee',
 'AfricanEl',
 'AsianElep',
 'RockHyrax',
 'TreeHyrax',
 'Aardvark',
 'GoldenMol',
 'Madagascar',
 'Tenrec',
 'LesserEle',
 'GiantElep',
 'Caenolest',
 'Phascogale',
 'Wombat',
 'Bandicoot')
fixed_choice = get_app("take_n_seqs", number=2, random=True, fixed_choice=True)
result1 = fixed_choice(aln1).names
result2 = fixed_choice(aln2).names
result1 == result2
True
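
One way to picture a fixed choice is "choose names once, then reuse them". The sketch below uses only the Python standard library and assumes that behaviour; FixedChoiceSampler is a hypothetical class written for this illustration, not the cogent3 implementation, and the mammals list is a shortened stand-in for the names in aln2.

```python
import random


class FixedChoiceSampler:
    """Hypothetical helper: pick names on the first call, then reuse them."""

    def __init__(self, number, seed=None):
        self.number = number
        self._rng = random.Random(seed)
        self._chosen = None  # filled in on the first call

    def __call__(self, names):
        if self._chosen is None:
            # the first collection seen determines the selection
            self._chosen = self._rng.sample(list(names), self.number)
        missing = [n for n in self._chosen if n not in names]
        if missing:
            raise KeyError(f"names not found: {missing}")
        return list(self._chosen)


primates = ["Galago", "HowlerMon", "Rhesus", "Orangutan", "Gorilla", "Human", "Chimpanzee"]
mammals = ["Mouse", "Rat", "Horse"] + primates  # superset of primates (shortened)

sampler = FixedChoiceSampler(number=2, seed=0)
print(sampler(primates) == sampler(mammals))
```

Under this assumption the first collection determines the names, so every later collection must contain them. That holds in the example above because every name in aln1 also appears in aln2.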