evolocity.tl.onehot_msa
evolocity.tl.onehot_msa(adata, reference=None, seq_id_fields=None, key='onehot', seq_key='seq', backend='mafft', dirname='target/evolocity_alignments', n_threads=1, copy=False)
Aligns and one-hot-encodes sequences.
By default, this function uses the MAFFT aligner (https://mafft.cbrc.jp/alignment/software/), which can be installed via conda with:
conda install -c bioconda mafft
- Parameters
- adata : AnnData
Annotated data matrix.
- reference : int (default: None)
Index corresponding to a sequence in adata to be used as the main reference sequence for the alignment.
- seq_id_fields : list (default: None)
List of fields in adata.obs to store in FASTA IDs.
- key : str (default: 'onehot')
Name under which the embedding is stored.
- seq_key : str (default: 'seq')
Name of the field in .obs that holds the sequences.
- backend : str (default: 'mafft')
Sequence alignment tool.
- dirname : str (default: 'target/evolocity_alignments')
Directory under which to place alignment files.
- n_threads : int (default: 1)
Number of threads for sequence alignment.
- copy : bool (default: False)
Return a copy instead of writing to adata.
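To make the roles of seq_key, seq_id_fields, dirname and n_threads more concrete, the following is a minimal sketch of how sequences and metadata might be turned into FASTA input and a MAFFT command line. It is not evolocity's implementation: the helpers format_fasta and mafft_command, the "|" delimiter in the IDs, and the file name seqs.fasta are all assumptions made for illustration. The only MAFFT options used are --thread and --quiet.

```python
def format_fasta(records, seq_key="seq", seq_id_fields=None):
    """Return FASTA text for a list of dict-like records.

    Hypothetical helper: each ID is the record index followed by the
    requested metadata fields, joined with '|' (an assumed delimiter).
    """
    lines = []
    for i, record in enumerate(records):
        fields = [str(record[name]) for name in (seq_id_fields or [])]
        lines.append(">" + "|".join([str(i)] + fields))
        lines.append(record[seq_key])
    return "\n".join(lines) + "\n"


def mafft_command(fasta_path, n_threads=1):
    """Build (but do not run) a MAFFT invocation as an argument list.

    MAFFT prints the alignment to stdout, so a caller would capture
    that output and write it into the alignment directory.
    """
    return ["mafft", "--thread", str(n_threads), "--quiet", str(fasta_path)]


# Toy records standing in for rows of adata.obs (values are made up).
records = [
    {"seq": "MKV", "host": "human", "year": 2020},
    {"seq": "MKAV", "host": "bat", "year": 2019},
]
fasta_text = format_fasta(records, seq_key="seq", seq_id_fields=["host", "year"])

# 'seqs.fasta' is a hypothetical file name under the default dirname.
command = mafft_command("target/evolocity_alignments/seqs.fasta", n_threads=4)
```

The argument list could be passed to subprocess.run with stdout captured, provided MAFFT is installed and on the PATH.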
- Returns
Returns or updates adata with the attributes
X_onehot (.obsm) – one-hot embeddings
seqs_msa (.obs) – aligned sequences
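As an illustration of what a one-hot embedding of aligned sequences can look like, here is a small pure-Python sketch. It assumes a 21-character alphabet (20 amino acids plus the gap character), a flattened layout with one block of alphabet-sized slots per alignment column, and all-zero blocks for unknown characters. The exact alphabet, ordering, and layout used by evolocity may differ, and the function name onehot_encode is hypothetical.

```python
# Assumed alphabet: 20 standard amino acids plus '-' for alignment gaps.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def onehot_encode(aligned_seqs, alphabet=ALPHABET):
    """One-hot encode equal-length aligned sequences into flat 0/1 rows."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("aligned sequences must all have the same length")

    encoded = []
    for seq in aligned_seqs:
        row = [0] * (length * len(alphabet))
        for pos, ch in enumerate(seq):
            # Characters outside the alphabet leave their block as all zeros.
            if ch in index:
                row[pos * len(alphabet) + index[ch]] = 1
        encoded.append(row)
    return encoded


# Toy alignment: three sequences padded with gaps to a common length.
aligned = ["MK-V", "MKAV", "M-AV"]
X = onehot_encode(aligned)
```

Each row of X has one entry per (alignment column, alphabet character) pair, so every sequence maps to a vector of the same length regardless of how many gaps it contains. With the function documented here, a call such as evolocity.tl.onehot_msa(adata, n_threads=4) would store the corresponding embeddings and aligned sequences on adata as listed above.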