Data stores – collections of data records
The scinexus package provides data stores. A data store is a collection of data members of the same type (e.g. all .fasta files in a directory). Data stores let you apply an app or composed pipeline to many data records without writing loops.
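To make the idea concrete, here is a minimal sketch using only the Python standard library. `ToyDataStore`, its `apply` method, `count_records` and `demo` are hypothetical names invented for illustration; they are not part of scinexus or cogent3 and only mimic the concept of "members selected by suffix, processed without a caller-side loop".

```python
import tempfile
from pathlib import Path


class ToyDataStore:
    """Hypothetical stand-in for a read-only directory data store.

    This is NOT the scinexus API, only an illustration of the concept.
    """

    def __init__(self, source, suffix):
        self.source = Path(source)
        self.suffix = suffix
        # members are the files in `source` whose extension matches `suffix`
        self.members = sorted(self.source.glob(f"*.{suffix}"))

    def __len__(self):
        return len(self.members)

    def __getitem__(self, index):
        return self.members[index]

    def apply(self, func):
        # the loop lives here, so callers do not have to write one
        return [func(member) for member in self.members]


def count_records(path):
    """Count FASTA records in a file by counting header lines."""
    return sum(line.startswith(">") for line in path.read_text().splitlines())


def demo():
    """Build a temporary directory with two .fa files and one unrelated file."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        (tmp / "a.fa").write_text(">s1\nACGT\n>s2\nACGA\n")
        (tmp / "b.fa").write_text(">s1\nTTTT\n")
        (tmp / "notes.txt").write_text("ignored\n")
        store = ToyDataStore(tmp, suffix="fa")
        return len(store), store.apply(count_records)


print(demo())
```

Only the two `.fa` files become members; `notes.txt` is skipped because its suffix does not match.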
Using data stores with cogent3
Use the open_data_store() function to open a data store and a loader app (see app types) to read member data.
from cogent3 import get_app, open_data_store
dstore = open_data_store("data/raw.zip", suffix="fa", mode="r")
print(dstore)
1035x member ReadOnlyDataStoreZipped(source='/home/runner/work/cogent3.github.io/cogent3.github.io/c3org/doc/doc/data/raw.zip', members=[DataMember(data_store=/home/runner/work/cogent3.github.io/cogent3.github.io/c3org/doc/doc/data/raw.zip, unique_id=ENSG00000157184.fa), DataMember(data_store=/home/runner/work/cogent3.github.io/cogent3.github.io/c3org/doc/doc/data/raw.zip, unique_id=ENSG00000131791.fa)]...)
loader = get_app("load_unaligned", moltype="dna")
seqs = loader(dstore[0])
seqs
| 0 | |
| Human | ATGGTGCCCCGCCTGCTGCTGCGCGCCTGGCCCCGGGGCCCCGCGGTTGGTCCGGGAGCC |
| Opossum | ATGAAGCCGCTGTTGGTGCGCCTGAGGTTCGGGTCCCTTCCGGGGCCGCTGTGGCTGCCG |
| Platypus | ATGAGAAGATACCTGAATGCCCAAAAGCCTCTTTTAGATGACAGCCAATTCAGGAACACA |
3 x {min=1791, median=1974.0, max=1998} dna sequence collection
Applying a pipeline to a data store
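# path_to_dir must be set to a writable output directory before running this
out_dstore = open_data_store(path_to_dir, suffix="fa", mode="w")
# load each member as an alignment, keep third codon positions, write the result
loader = get_app("load_aligned", moltype="dna", format_name="fasta")
take3 = get_app("take_codon_positions", 3)
writer = get_app("write_seqs", data_store=out_dstore, format_name="fasta")
# compose the apps into one pipeline and apply it to every member of dstore
app = loader + take3 + writer
result = app.apply_to(dstore)
# summarise how many members completed
result.describe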
| Condition | Value |
|---|---|
| completed | 41 |
| not_completed | 994 |
| logs | 1 |
3 rows x 2 columns
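The summary above counts members that completed and members that did not, which together account for all 1035 inputs. The following sketch shows one way such a tally can arise when failures are recorded per member instead of stopping the run. It is an illustration under stated assumptions: `take_third_positions` and `apply_to_all` are hypothetical helpers, the failure condition is invented for the example, and none of this reflects how cogent3 or scinexus implement `apply_to`.

```python
from collections import Counter


def take_third_positions(seq):
    """Return every third character; reject lengths that are not a multiple of 3."""
    if len(seq) % 3:
        raise ValueError(f"length {len(seq)} is not a multiple of 3")
    return seq[2::3]


def apply_to_all(func, records):
    """Apply func to each record, tallying outcomes instead of stopping on failure."""
    results = {}
    tally = Counter(completed=0, not_completed=0)
    for name, value in records.items():
        try:
            results[name] = func(value)
            tally["completed"] += 1
        except ValueError:
            # record the failure and carry on with the next member
            tally["not_completed"] += 1
    return results, dict(tally)


# hypothetical input records, keyed by a made-up identifier
records = {"gene1": "ATGGTGCCC", "gene2": "ATGAA", "gene3": "ATGAGA"}
results, summary = apply_to_all(take_third_positions, records)
print(results)
print(summary)
```

Here `gene2` fails the length check, so it is counted as not completed while the other two records are processed normally.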
See the scinexus documentation for full details on data store types, structure, operations, locking, logging, and citations.