Data stores – collections of data records

The scinexus package provides data stores. A data store is a collection of data records of the same type (e.g. all .fasta files in a directory). Data stores let you apply an app, or a composed pipeline of apps, to many data records without writing explicit loops.
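
To make the idea concrete, here is a minimal pure-Python sketch of a read-only directory data store. DirectoryStore and Member are hypothetical names invented for illustration, not the scinexus API. The sketch only assumes that a store's members are the files in one location that share a suffix, and that each member has a unique identifier and can be read.

```python
from pathlib import Path
import tempfile


class Member:
    """One record in a hypothetical store (not the scinexus DataMember)."""

    def __init__(self, source: Path, unique_id: str):
        self.source = source
        self.unique_id = unique_id

    def read(self) -> str:
        # Return the raw text of this record.
        return (self.source / self.unique_id).read_text()


class DirectoryStore:
    """Hypothetical read-only store: the files in `source` ending in `.suffix`."""

    def __init__(self, source, suffix: str):
        self.source = Path(source)
        suffix = suffix.lstrip(".")
        # Sorted so that member order is the same on every run.
        self.members = [
            Member(self.source, p.name)
            for p in sorted(self.source.glob(f"*.{suffix}"))
            if p.is_file()
        ]

    def __len__(self) -> int:
        return len(self.members)

    def __getitem__(self, index: int) -> Member:
        return self.members[index]

    def __iter__(self):
        return iter(self.members)


# Usage: the two .fa files become members, the .txt file does not.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.fa").write_text(">x\nACGT\n")
    Path(tmp, "b.fa").write_text(">y\nGGCC\n")
    Path(tmp, "notes.txt").write_text("ignored")
    store = DirectoryStore(tmp, suffix="fa")
    ids = [m.unique_id for m in store]
    first_text = store[0].read()

print(len(ids), ids)  # 2 ['a.fa', 'b.fa']
```

Filtering by suffix is what keeps members "of the same type": any app applied to the store can assume every member has the same format.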

Using data stores with cogent3

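Use the open_data_store() function to open a data store, and a loader app (see app types) to read a member's data. In the example below, suffix="fa" selects the .fa files in the archive as members, and mode="r" opens the store read-only.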

from cogent3 import get_app, open_data_store

dstore = open_data_store("data/raw.zip", suffix="fa", mode="r")
print(dstore)
1035x member ReadOnlyDataStoreZipped(source='/home/runner/work/cogent3.github.io/cogent3.github.io/c3org/doc/doc/data/raw.zip', members=[DataMember(data_store=/home/runner/work/cogent3.github.io/cogent3.github.io/c3org/doc/doc/data/raw.zip, unique_id=ENSG00000157184.fa), DataMember(data_store=/home/runner/work/cogent3.github.io/cogent3.github.io/c3org/doc/doc/data/raw.zip, unique_id=ENSG00000131791.fa)]...)
loader = get_app("load_unaligned", moltype="dna")
seqs = loader(dstore[0])
seqs
Human     ATGGTGCCCCGCCTGCTGCTGCGCGCCTGGCCCCGGGGCCCCGCGGTTGGTCCGGGAGCC
Opossum   ATGAAGCCGCTGTTGGTGCGCCTGAGGTTCGGGTCCCTTCCGGGGCCGCTGTGGCTGCCG
Platypus  ATGAGAAGATACCTGAATGCCCAAAAGCCTCTTTTAGATGACAGCCAATTCAGGAACACA

3 x {min=1791, median=1974.0, max=1998} dna sequence collection
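
Conceptually, a zipped data store treats each archive entry with the requested suffix as a member, and a loader turns that member's raw text into a sequence collection. The sketch below illustrates that idea with the standard library only. zip_members, read_member, parse_fasta and summarise are hypothetical helpers, not how cogent3 or scinexus implement this, and the summary line only imitates the style of the display above.

```python
import io
import statistics
import zipfile


def zip_members(zip_bytes: bytes, suffix: str) -> list[str]:
    """Hypothetical helper: sorted names of archive entries ending in `.suffix`."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return sorted(n for n in zf.namelist() if n.endswith(f".{suffix}"))


def read_member(zip_bytes: bytes, name: str) -> str:
    # Decode one archive entry as text.
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return zf.read(name).decode("utf-8")


def parse_fasta(text: str) -> dict[str, str]:
    """Map each '>name' header to its (possibly multi-line) sequence."""
    seqs: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line.upper()
    return seqs


def summarise(seqs: dict[str, str]) -> str:
    # Count plus min/median/max sequence length.
    lengths = [len(s) for s in seqs.values()]
    return (f"{len(lengths)} x {{min={min(lengths)}, "
            f"median={statistics.median(lengths)}, max={max(lengths)}}}")


# Usage: build a small archive in memory; the .txt entry is not a member.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("raw/GENE1.fa", ">Human\nATGGTG\n>Mouse\nATGAAGCCG\n")
    zf.writestr("raw/GENE2.fa", ">Human\nATG\n")
    zf.writestr("raw/README.txt", "not a member")
archive = buf.getvalue()

names = zip_members(archive, "fa")
first = parse_fasta(read_member(archive, names[0]))
print(names)               # ['raw/GENE1.fa', 'raw/GENE2.fa']
print(summarise(first))    # 2 x {min=6, median=7.5, max=9}
```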

Applying a pipeline to a data store

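# path_to_dir: a directory for the output data store (define it before running)
out_dstore = open_data_store(path_to_dir, suffix="fa", mode="w")
# load each member as aligned DNA sequences
loader = get_app("load_aligned", moltype="dna", format_name="fasta")
# keep only third codon positions
take3 = get_app("take_codon_positions", 3)
# write each result as fasta into out_dstore
writer = get_app("write_seqs", data_store=out_dstore, format_name="fasta")
# compose the apps into a single pipeline
app = loader + take3 + writer
# apply the pipeline to every member of the input data store opened above
result = app.apply_to(dstore)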
result.describe
describe
Condition        Value
completed           41
not_completed      994
logs                 1

3 rows x 2 columns
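
The describe table counts how many members the pipeline completed and how many it did not. A simplified sketch of that bookkeeping follows, assuming each record either passes through every step or stops at its first failing step, whose reason is kept. Step, NotDone, require_aligned and third_positions are hypothetical; third_positions only mimics keeping third codon positions by slicing, and the failure condition is invented for illustration. It does not explain the specific counts above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class NotDone:
    """Hypothetical stand-in for a failed result: which record, and why."""
    unique_id: str
    message: str


class Step:
    """Hypothetical app wrapping functions; `a + b` runs a, then b."""

    def __init__(self, *funcs):
        self.funcs = funcs

    def __add__(self, other: "Step") -> "Step":
        return Step(*self.funcs, *other.funcs)

    def __call__(self, unique_id: str, data):
        for func in self.funcs:
            try:
                data = func(data)
            except Exception as err:
                # Stop this record at its first failing step and keep the reason.
                return NotDone(unique_id, f"{func.__name__}: {err}")
        return data

    def apply_to(self, records: dict) -> Counter:
        """Run the pipeline on every record and count the outcomes."""
        tally = Counter()
        for uid, data in records.items():
            result = self(uid, data)
            tally["not_completed" if isinstance(result, NotDone) else "completed"] += 1
        return tally


def require_aligned(seqs: dict) -> dict:
    # Invented failure condition: all sequences must have the same length.
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError(f"unequal lengths {sorted(lengths)}")
    return seqs


def third_positions(seqs: dict) -> dict:
    # Keep every third base, starting at the third (0-based index 2).
    return {name: s[2::3] for name, s in seqs.items()}


pipeline = Step(require_aligned) + Step(third_positions)
records = {
    "g1.fa": {"a": "ATGAAA", "b": "ATGCCC"},
    "g2.fa": {"a": "ATG", "b": "ATGAAA"},
}
tally = pipeline.apply_to(records)
print(dict(tally))  # {'completed': 1, 'not_completed': 1}
# NotDone(unique_id='g2.fa', message='require_aligned: unequal lengths [3, 6]')
print(pipeline("g2.fa", records["g2.fa"]))
```

Composing with + mirrors how the pipeline above is built from loader + take3 + writer. Keeping the failure reason per record, rather than aborting the whole run, is what makes a completed/not-completed summary possible; the real apps also handle writing and logging, which this sketch omits.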

See the scinexus documentation for full details on data store types, structure, operations, locking, logging, and citations.