evolocity.tl.onehot_msa

evolocity.tl.onehot_msa(adata, reference=None, seq_id_fields=None, key='onehot', seq_key='seq', backend='mafft', dirname='target/evolocity_alignments', n_threads=1, copy=False)

Aligns and one-hot-encodes sequences.

By default, uses the MAFFT aligner (https://mafft.cbrc.jp/alignment/software/), which can be installed via conda using:

conda install -c bioconda mafft
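The one-hot step can be illustrated with a small NumPy sketch. It assumes the sequences are already aligned to a common length, with '-' marking gaps, and it uses a hypothetical 21-symbol alphabet (the 20 standard amino acids plus the gap). The alphabet, its ordering, the flattened position-major layout, and the helper names `onehot_encode` and `onehot_decode` are illustrative assumptions, not evolocity's internals. The decoder shows how a row of the one-hot matrix maps back to an aligned string, which mirrors the relationship between X_onehot and seqs_msa in the outputs.

```python
import numpy as np

# Hypothetical alphabet: the 20 standard amino acids plus the gap symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
CHAR_TO_IDX = {c: i for i, c in enumerate(ALPHABET)}


def onehot_encode(aligned_seqs):
    """One-hot-encode equal-length aligned sequences.

    Returns a matrix of shape (n_seqs, alignment_length * len(ALPHABET)):
    each alignment column contributes len(ALPHABET) features.
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")

    X = np.zeros((len(aligned_seqs), length, len(ALPHABET)), dtype=np.float32)
    for i, seq in enumerate(aligned_seqs):
        for j, char in enumerate(seq):
            # Characters outside ALPHABET raise KeyError in this sketch.
            X[i, j, CHAR_TO_IDX[char]] = 1.0
    # Flatten the position and alphabet axes into one feature axis.
    return X.reshape(len(aligned_seqs), -1)


def onehot_decode(row):
    """Recover the aligned string from one flattened one-hot row."""
    positions = row.reshape(-1, len(ALPHABET))
    return "".join(ALPHABET[k] for k in positions.argmax(axis=1))


seqs_msa = ["MK-V", "MKLV"]  # toy alignment, '-' marks a gap
X_onehot = onehot_encode(seqs_msa)
print(X_onehot.shape)                        # (2, 84)
print([onehot_decode(r) for r in X_onehot])  # ['MK-V', 'MKLV']
```

Aligning first is what makes this encoding possible: every sequence ends up with the same number of columns, so every sequence gets a feature vector of the same length and all of them fit in one embedding matrix.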

Parameters
adata : AnnData

Annotated data matrix.

reference : int (default: None)

Index corresponding to a sequence in adata to be used as the main reference sequence for the alignment.

seq_id_fields : list (default: None)

List of fields in adata.obs to store in FASTA IDs.

key : str (default: ‘onehot’)

Key under which the one-hot embedding is stored in .obsm.

seq_key : str (default: ‘seq’)

Name of the column in adata.obs that holds the sequences.

backend : str (default: ‘mafft’)

Sequence alignment tool.

dirname : str (default: ‘target/evolocity_alignments’)

Directory under which to place alignment files.

n_threads : int (default: 1)

Number of threads for sequence alignment.

copy : bool (default: False)

Return a copy instead of writing to adata.
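The seq_id_fields, dirname, and n_threads parameters describe an external alignment run. The sketch below shows one way such a run could be wired up with the Python standard library. It joins metadata values into FASTA headers, writes the FASTA file under dirname, and calls MAFFT with its documented `--auto` and `--thread` options. The header format (row index plus '|'-separated field values), the input file name, and the helpers `to_fasta`, `mafft_command`, and `align` are hypothetical, and evolocity may format IDs and invoke MAFFT differently. Only the string-building parts run without MAFFT installed.

```python
import os
import subprocess


def to_fasta(seqs, obs_rows, seq_id_fields=None, sep="|"):
    """Build FASTA text with one record per sequence.

    obs_rows stands in for the rows of adata.obs (one dict per sequence);
    values of seq_id_fields are appended to the row index in each header.
    """
    lines = []
    for idx, (seq, row) in enumerate(zip(seqs, obs_rows)):
        fields = [str(idx)]  # the row index keeps headers unique
        if seq_id_fields:
            fields += [str(row[f]) for f in seq_id_fields]
        lines.append(">" + sep.join(fields))
        lines.append(seq)
    return "\n".join(lines) + "\n"


def mafft_command(in_fasta, n_threads=1):
    """Argument list for aligning in_fasta with MAFFT's automatic strategy."""
    return ["mafft", "--auto", "--thread", str(n_threads), in_fasta]


def align(fasta_text, dirname="target/evolocity_alignments", n_threads=1):
    """Write the FASTA under dirname, run MAFFT, and return the aligned FASTA.

    Requires the mafft executable on PATH.
    """
    os.makedirs(dirname, exist_ok=True)
    in_fasta = os.path.join(dirname, "input.fasta")  # hypothetical file name
    with open(in_fasta, "w") as f:
        f.write(fasta_text)
    # MAFFT writes the alignment to stdout; check=True raises on failure.
    result = subprocess.run(
        mafft_command(in_fasta, n_threads),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


fasta = to_fasta(
    ["MKV", "MKLV"],
    [{"species": "human", "year": 2001}, {"species": "mouse", "year": 2010}],
    seq_id_fields=["species", "year"],
)
print(fasta)
print(mafft_command("input.fasta", n_threads=4))
# ['mafft', '--auto', '--thread', '4', 'input.fasta']
# aligned = align(fasta, n_threads=4)  # uncomment when MAFFT is installed
```

Putting the row index first in each header keeps every header unique, even when the chosen metadata fields repeat across sequences.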

Returns

Returns or updates adata with the attributes:

  • X_onehot (.obsm) – one-hot embeddings

  • seqs_msa (.obs) – aligned sequences
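The "returns or updates" behavior matches the copy convention used by scanpy-style tools. With copy=False, the object passed in is modified and nothing is returned. With copy=True, the original is left untouched and an annotated copy is returned. The sketch below shows that pattern with a hypothetical stand-in class `Data` and a hypothetical function `annotate`. That onehot_msa returns None when copy=False is an assumption based on this convention.

```python
import copy as copy_module


class Data:
    """Minimal stand-in for an AnnData object (hypothetical, for illustration)."""

    def __init__(self):
        self.obsm = {}

    def copy(self):
        return copy_module.deepcopy(self)


def annotate(adata, copy=False):
    # Work on a copy only when requested; otherwise mutate the caller's object.
    adata = adata.copy() if copy else adata
    adata.obsm["X_onehot"] = [[1, 0], [0, 1]]
    # Return the new object only when copy=True.
    return adata if copy else None


original = Data()
result = annotate(original, copy=True)
print("X_onehot" in original.obsm, "X_onehot" in result.obsm)  # False True
annotate(original)  # returns None and modifies original in place
print("X_onehot" in original.obsm)  # True
```

In practice, call the function for its side effect when copy=False, and assign its return value when copy=True.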