Select n sequences from a collection

Note

These docs now use the new_type core objects via the following setting.

import os

# using new types without requiring an explicit argument
os.environ["COGENT3_NEW_TYPE"] = "1"

Let’s load an alignment of primates to use in examples.

from cogent3 import get_app

loader = get_app("load_aligned", moltype="dna")
aln = loader("data/primate_brca1.fasta")
aln
            0
Chimpanzee  TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGTTTATTACTCACTAAA
Galago      .......A................................G...................
HowlerMon   ...............................................G............
Rhesus      ...............................................G............
Orangutan   ............................................................
Gorilla     ............................................................
Human       ............................................................

7 x 2814 (truncated to 7 x 60) dna alignment

Select the first n sequences from an alignment

Initialising take_n_seqs with the argument number=3 creates an app that returns the first 3 sequences from an alignment.

Note

“first n” refers to the order of the sequences in the FASTA file, which can differ from the row order shown when the alignment is displayed.

from cogent3 import get_app

first_3 = get_app("take_n_seqs", number=3)
first_3(aln)
           0
Galago     TGTGGCAAAAATACTCATGCCAGCTCATTACAGCATGAGAGCAGTTTATTACTCACTAAA
HowlerMon  .......C................................A......G............
Rhesus     .......C................................A......G............

3 x 2814 (truncated to 3 x 60) dna alignment
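To make the "first n" behaviour concrete, here is a minimal pure-Python sketch of the same idea applied to a plain dict of sequences. It is an illustration only, not the cogent3 implementation: the helper take_first_n, the toy sequences, and the choice to raise ValueError when too few sequences are available are all assumptions made for this example.

```python
from itertools import islice


def take_first_n(seqs: dict[str, str], number: int) -> dict[str, str]:
    """Return the first `number` entries of `seqs`, keeping insertion order.

    Hypothetical helper for illustration; not part of cogent3.
    """
    if number > len(seqs):
        # Assumed behaviour for this sketch only.
        raise ValueError(f"requested {number} sequences, only {len(seqs)} available")
    return dict(islice(seqs.items(), number))


# Toy data: the names follow the file order used above, the sequences are made up.
toy_seqs = {
    "Galago": "TGTGGCAAAA",
    "HowlerMon": "TGTGGCACAA",
    "Rhesus": "TGTGGCACAA",
    "Orangutan": "TGTGGCACAA",
}

first_3_names = list(take_first_n(toy_seqs, 3))
print(first_3_names)  # ['Galago', 'HowlerMon', 'Rhesus']
```

Because a dict preserves insertion order, "first" here means the order in which the sequences were added, which plays the role of the file order in the example above.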

Randomly selecting n sequences from an alignment

Setting random=True with number=3 returns 3 randomly chosen sequences. Pass the optional seed argument to make the selection reproducible, so the same sequences are returned each time the app is called.

from cogent3 import get_app

random_n = get_app("take_n_seqs", random=True, number=3, seed=1)
random_n(aln)
            0
Chimpanzee  TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGTTTATTACTCACTAAA
Rhesus      ...............................................G............
HowlerMon   ...............................................G............

3 x 2814 (truncated to 3 x 60) dna alignment

Selecting the same sequences from multiple alignments

Providing the argument fixed_choice=True ensures the same sequences are returned when (randomly) sampling sequences across several alignments.

from cogent3 import get_app

loader = get_app("load_aligned", moltype="dna")
aln1 = loader("data/primate_brca1.fasta")
aln2 = loader("data/brca1.fasta")

aln1.names
('Galago',
 'HowlerMon',
 'Rhesus',
 'Orangutan',
 'Gorilla',
 'Human',
 'Chimpanzee')
aln2.names
('FlyingFox',
 'DogFaced',
 'FreeTaile',
 'LittleBro',
 'TombBat',
 'RoundEare',
 'FalseVamp',
 'LeafNose',
 'Horse',
 'Rhino',
 'Pangolin',
 'Cat',
 'Dog',
 'Llama',
 'Pig',
 'Cow',
 'Hippo',
 'SpermWhale',
 'HumpbackW',
 'Mole',
 'Hedgehog',
 'TreeShrew',
 'FlyingLem',
 'Galago',
 'HowlerMon',
 'Rhesus',
 'Orangutan',
 'Gorilla',
 'Human',
 'Chimpanzee',
 'Jackrabbit',
 'FlyingSqu',
 'OldWorld',
 'Mouse',
 'Rat',
 'NineBande',
 'HairyArma',
 'Anteater',
 'Sloth',
 'Dugong',
 'Manatee',
 'AfricanEl',
 'AsianElep',
 'RockHyrax',
 'TreeHyrax',
 'Aardvark',
 'GoldenMol',
 'Madagascar',
 'Tenrec',
 'LesserEle',
 'GiantElep',
 'Caenolest',
 'Phascogale',
 'Wombat',
 'Bandicoot')
fixed_choice = get_app("take_n_seqs", number=2, random=True, fixed_choice=True)
result1 = fixed_choice(aln1).names
result2 = fixed_choice(aln2).names
result1 == result2
True