openprotein.embeddings #

logits(sequences, prompt=None, query=None, use_query_structure_in_decoder=True, decoder_type=None)[source]#

Compute logits for sequences using this model.

Parameters:

sequences (list of bytes) – Sequences to analyze.
prompt (str or Prompt or None, optional) – Prompt, prompt ID, or prompt from an align workflow used to condition the PoET model.
query (str or bytes or Protein or Query or None, optional) – Query to use with prompt.
use_query_structure_in_decoder (bool, optional) – Whether to use query structure in decoder. Default is True.
decoder_type ({'mlm', 'clm'} or None, optional) – Decoder type. Default is None.

Returns:

A future object that returns the logits of the submitted sequences.

Return type:
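
Logits are unnormalized scores, so they are usually passed through a softmax before being read as probabilities. The sketch below assumes each sequence position yields one logit per vocabulary token; the toy four-token vocabulary and the name softmax are illustrative and not part of this API.

```python
import math


def softmax(logits):
    """Convert one position's logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical logits for a single position over a toy 4-token vocabulary.
position_logits = [2.0, 1.0, 0.0, -1.0]
probs = softmax(position_logits)
```

Applying the same function to every position gives a per-position probability table over tokens.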

score(sequences, prompt=None, query=None, use_query_structure_in_decoder=True, decoder_type=None)[source]#

Score query sequences using the specified prompt.

Parameters:

sequences (list of bytes) – Sequences to score.
prompt (str or Prompt or None, optional) – Prompt, prompt ID, or prompt from an align workflow used to condition the PoET model.
query (str or bytes or Protein or Query or None, optional) – Query to use with prompt.
use_query_structure_in_decoder (bool, optional) – Whether to use query structure in decoder. Default is True.
decoder_type ({'mlm', 'clm'} or None, optional) – Decoder type. Default is None.

Returns:

A future object that returns the scores of the submitted sequences.

Return type:

indel(sequence, prompt=None, query=None, use_query_structure_in_decoder=True, decoder_type=None, insert=None, delete=None, **kwargs)[source]#

Score all indels of the query sequence using the specified prompt.

Parameters:

sequence (bytes) – Sequence to analyze.
prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition the PoET model.
query (str or bytes or Protein or Query or None, optional) – Query to use with prompt.
use_query_structure_in_decoder (bool, optional) – Whether to use query structure in decoder. Default is True.
decoder_type ({'mlm', 'clm'} or None, optional) – Decoder type. Default is None.
insert (str or None, optional) – Fragment to insert at each site.
delete (list of int or None, optional) – Range of fragment sizes to delete at each site.
**kwargs – Additional keyword arguments.

Returns:

A future object that returns the scores of the indel variants of the sequence.

Return type:

Raises:

ValueError – If neither insert nor delete is provided.
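
To make the insert and delete parameters concrete, here is a minimal local sketch of how indel variants of a sequence could be enumerated. It is not the service's implementation: enumerate_indels is a hypothetical helper, it works on str rather than bytes for brevity, and it assumes each integer in delete is a fragment size to remove at every site, which this page does not spell out.

```python
def enumerate_indels(sequence, insert=None, delete=None):
    """List indel variants of `sequence` (illustrative enumeration only)."""
    if insert is None and delete is None:
        # Mirrors the documented ValueError when neither option is given.
        raise ValueError("Provide at least one of insert or delete.")
    variants = []
    if insert is not None:
        # Insert the fragment at every site, including both ends.
        for i in range(len(sequence) + 1):
            variants.append(sequence[:i] + insert + sequence[i:])
    if delete is not None:
        # Assumed reading: each entry is a fragment size to delete at each site.
        for size in delete:
            for i in range(len(sequence) - size + 1):
                variants.append(sequence[:i] + sequence[i + size:])
    return variants


variants = enumerate_indels("MKV", insert="A", delete=[1, 2])
```

For the three-residue toy sequence this yields four insertions and five deletions.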

single_site(sequence, prompt=None, query=None, use_query_structure_in_decoder=True, decoder_type=None)[source]#

Score all single substitutions of the query sequence using the specified prompt.

Parameters:

sequence (bytes) – Sequence to analyze.
prompt (str or Prompt or None, optional) – Prompt, prompt ID, or prompt from an align workflow used to condition the PoET model.
query (str or bytes or Protein or Query or None, optional) – Query to use with prompt.
use_query_structure_in_decoder (bool, optional) – Whether to use query structure in decoder. Default is True.
decoder_type ({'mlm', 'clm'} or None, optional) – Decoder type. Default is None.

Returns:

A future object that returns the scores of the single-substitution variants of the sequence.

Return type:
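
"All single substitutions" can be pictured as every position mutated to every other residue. The sketch below enumerates them locally, assuming the 20 canonical amino acids as the alphabet; the model's actual token set may differ, and single_site_variants is a hypothetical helper rather than part of this API.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 canonical residues (assumed alphabet)


def single_site_variants(sequence):
    """Yield (position, wild_type, mutant, variant) for every single substitution."""
    for pos, wild_type in enumerate(sequence):
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:
                variant = sequence[:pos] + mutant + sequence[pos + 1:]
                yield pos, wild_type, mutant, variant


variants = list(single_site_variants("MKV"))
```

Under this assumption a sequence of length L has 19 × L single-site variants.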

generate(prompt, query=None, use_query_structure_in_decoder=True, num_samples=100, temperature=1.0, topk=None, topp=None, max_length=1000, seed=None, ensemble_weights=None, ensemble_method=None)[source]#

Generate protein sequences conditioned on a prompt.

Parameters:

prompt (str or Prompt) – Prompt from an align workflow to condition the PoET model.
query (str or bytes or Protein or Query or None, optional) – Query to use with prompt.
use_query_structure_in_decoder (bool, optional) – Whether to use query structure in decoder. Default is True.
num_samples (int, optional) – The number of samples to generate. Default is 100.
temperature (float, optional) – The temperature for sampling. Higher values produce more random outputs. Default is 1.0.
topk (float or None, optional) – Number of highest-probability residues to sample from at each step (top-k sampling). Default is None.
topp (float or None, optional) – The cumulative probability threshold for top-p sampling. Default is None.
max_length (int, optional) – The maximum length of generated proteins. Default is 1000.
seed (int or None, optional) – Seed for random number generation. Default is None.
ensemble_weights (Sequence of float or None, optional) – Weights for combining likelihoods from multiple prompts in the ensemble. The length of this sequence must match the number of prompts. All weights must be finite. If ensemble_method is “arithmetic”, then weights must also be non-negative, and have a non-zero sum.
ensemble_method ({'arithmetic', 'geometric'} or None, optional) – Method used to combine likelihoods from multiple prompts in the ensemble. If “arithmetic”, the weighted mean is used; if “geometric”, the weighted geometric mean is used. If None (default), the method defaults to “arithmetic”, but this behavior may change in the future.

Returns:

A future object representing the status and information about the generation job.

Return type:

EmbeddingsGenerateFuture
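
The two ensemble_method options differ in how per-prompt likelihoods are averaged. The following is a small sketch of the weighted arithmetic and weighted geometric means together with the weight checks listed above. combine_likelihoods is a hypothetical helper, and dividing the geometric exponent by the sum of the weights is the textbook definition, assumed here rather than confirmed for the service.

```python
import math


def combine_likelihoods(likelihoods, weights, method="arithmetic"):
    """Combine per-prompt likelihoods with a weighted mean (illustrative only)."""
    if len(likelihoods) != len(weights):
        raise ValueError("Need exactly one weight per prompt.")
    if not all(math.isfinite(w) for w in weights):
        raise ValueError("All weights must be finite.")
    total = sum(weights)
    if method == "arithmetic":
        # Arithmetic weights must be non-negative with a non-zero sum.
        if any(w < 0 for w in weights) or total == 0:
            raise ValueError("Weights must be non-negative with a non-zero sum.")
        return sum(w * p for w, p in zip(weights, likelihoods)) / total
    if method == "geometric":
        # Weighted geometric mean: exp(sum(w_i * log p_i) / sum(w_i)).
        return math.exp(sum(w * math.log(p) for w, p in zip(weights, likelihoods)) / total)
    raise ValueError(f"Unknown ensemble method: {method}")


arithmetic = combine_likelihoods([0.2, 0.8], [1.0, 1.0], "arithmetic")
geometric = combine_likelihoods([0.2, 0.8], [1.0, 1.0], "geometric")
```

With equal weights the geometric mean is never larger than the arithmetic mean, so it penalizes sequences that any single prompt considers unlikely.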

fit_svd(sequences=None, assay=None, n_components=1024, reduction=None, prompt=None, query=None, use_query_structure_in_decoder=True)[source]#

Fit an SVD on the embedding results of PoET.

This function will create an SVDModel based on the embeddings from this model as well as the hyperparameters specified in the arguments.

Parameters:

sequences (list of bytes or list of str or None, optional) – Sequences to fit SVD. If None, assay must be provided.
assay (AssayDataset or None, optional) – Assay containing sequences to fit SVD. Ignored if sequences are provided.
n_components (int, optional) – Number of components in SVD. Determines output shapes. Default is 1024.
reduction (ReductionType or None, optional) – Embeddings reduction to use (e.g. mean).
prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition the PoET model.
query (str or bytes or Protein or Query or None, optional) – Query to use with prompt.
use_query_structure_in_decoder (bool, optional) – Whether to use query structure in decoder. Default is True.

Returns:

A future that represents the fitted SVD model.

Return type:

fit_umap(sequences=None, assay=None, n_components=2, reduction=ReductionType.MEAN, prompt=None, query=None, use_query_structure_in_decoder=True)[source]#

Fit a UMAP on sequences or an assay using PoET embeddings and the specified hyperparameters.

This function will create a UMAP based on the embeddings from this PoET model as well as the hyperparameters specified in the arguments.

Parameters:

sequences (list of bytes or list of str or None, optional) – Sequences to fit UMAP. If None, assay must be provided.
assay (AssayDataset or None, optional) – Assay containing sequences to fit UMAP. Ignored if sequences are provided.
n_components (int, optional) – Number of components in UMAP fit. Determines output shapes. Default is 2.
reduction (ReductionType or None, optional) – Embeddings reduction to use (e.g. mean). Default is ReductionType.MEAN.
prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition the PoET model.
query (str or bytes or Protein or Query or None, optional) – Query to use with prompt.
use_query_structure_in_decoder (bool, optional) – Whether to use query structure in decoder. Default is True.

Returns:

A future that represents the fitted UMAP model.

Return type:

fit_gp(assay, properties, prompt=None, query=None, use_query_structure_in_decoder=True, **kwargs)[source]#

Fit a Gaussian Process (GP) on an assay using this embedding model and the specified hyperparameters.

Parameters:

assay (AssayMetadata or AssayDataset or str) – Assay to fit GP on.
properties (list of str) – Properties in the assay to fit the GP on.
prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition the PoET model.
query (str or bytes or Protein or Query or None, optional) – Query to use with prompt.
use_query_structure_in_decoder (bool, optional) – Whether to use query structure in decoder. Default is True.
**kwargs – Additional keyword arguments.

Returns:

A future that represents the trained predictor model.

Return type:

classmethod create(session, model_id, default=None, **kwargs)#

Create and return an instance of the appropriate EmbeddingModel subclass based on the model_id.

Parameters:

session (APISession) – The API session to use.
model_id (str) – The model identifier.
default (type variable of EmbeddingModel or None, optional) – Default EmbeddingModel subclass to use if no match is found.
kwargs – Additional keyword arguments to pass to the model constructor.

Returns:

An instance of the appropriate EmbeddingModel subclass.

Return type:

Raises:

ValueError – If no suitable EmbeddingModel subclass is found and no default is provided.
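
The dispatch described above (pick a subclass by model_id, fall back to default, otherwise raise ValueError) can be sketched as follows. This is not the library's code: the *Sketch classes, the model_ids attribute and the exact-match lookup are all assumptions made for illustration, and the real matching rule may differ.

```python
class EmbeddingModelSketch:
    """Hypothetical stand-in for the base class; not the library's code."""

    model_ids = []  # assumed per-subclass list of supported model IDs

    def __init__(self, session, model_id, **kwargs):
        self.session = session
        self.model_id = model_id

    @classmethod
    def get_model(cls):
        """Return the model IDs this class claims to handle."""
        return list(cls.model_ids)

    @classmethod
    def create(cls, session, model_id, default=None, **kwargs):
        # Try each direct subclass; exact matching is an assumption of this sketch.
        for subclass in cls.__subclasses__():
            if model_id in subclass.get_model():
                return subclass(session, model_id, **kwargs)
        if default is not None:
            return default(session, model_id, **kwargs)
        raise ValueError(f"Unsupported model_id: {model_id}")


class PoETSketch(EmbeddingModelSketch):
    model_ids = ["poet"]


class ESMSketch(EmbeddingModelSketch):
    model_ids = ["esm2_t12_35M_UR50D"]


model = EmbeddingModelSketch.create(session=None, model_id="poet")
```

Passing default lets a caller handle unrecognized model IDs with a generic class instead of catching the error.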

get_metadata()#

Get model metadata for this model.

Returns:

The metadata associated with this model.

Return type:

ModelMetadata

classmethod get_model()#

Get the model_id(s) for this EmbeddingModel subclass.

Returns:

List of model_id strings associated with this class.

Return type:

list of str

property metadata#

ModelMetadata for this model.

Returns:

The metadata associated with this model.

Return type:

ModelMetadata

class openprotein.embeddings.PoETModel(session, model_id, metadata=None)[source]#

Class for OpenProtein’s foundation model PoET.

Note

PoET functions depend on a prompt supplied via the prompt endpoints.

Examples

View specific model details (including supported tokens) with the ? operator.

>>> import openprotein
>>> session = openprotein.connect(username="user", password="password")
>>> session.embedding.poet.<embeddings_method>

Parameters:

session (APISession)
model_id (list[str] | str)
metadata (ModelMetadata | None)

embed(sequences, prompt=None, reduction=ReductionType.MEAN, **kwargs)[source]#

Embed sequences using the PoET model.

Parameters:

sequences (list of bytes) – Sequences to embed.
prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition the PoET model.
reduction (ReductionType or None, optional) – Embeddings reduction to use (e.g., mean). Default is ReductionType.MEAN.
**kwargs – Additional keyword arguments.

Returns:

Future object that returns the embeddings of the submitted sequences.

Return type:
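
A mean reduction collapses a variable-length set of per-residue vectors into one fixed-length vector per sequence. The sketch below shows the arithmetic on a toy input, assuming the unreduced embedding is one vector per residue; mean_reduce and the toy values are illustrative and not part of this API.

```python
def mean_reduce(per_residue):
    """Average a list of per-residue vectors into one fixed-length vector."""
    length = len(per_residue)
    # zip(*rows) walks the embedding dimensions column by column.
    return [sum(column) / length for column in zip(*per_residue)]


# Toy embedding: 3 residues, 2 dimensions each.
per_residue = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pooled = mean_reduce(per_residue)
```

The pooled vector has the same dimensionality for every sequence regardless of its length, which is what downstream models such as SVD, UMAP or a GP need.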

logits(sequences, prompt=None, **kwargs)[source]#

Compute logits for sequences using the PoET model.

Parameters:

sequences (list of bytes) – Sequences to analyze.
prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition the PoET model.
**kwargs – Additional keyword arguments.

Returns:

Future object that returns the logits of the submitted sequences.

Return type:

score(sequences, prompt=None, **kwargs)[source]#

Score query sequences using the specified prompt.

Parameters:

sequences (list of bytes) – Sequences to score.
prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition the PoET model.
**kwargs – Additional keyword arguments.

Returns:

Future object that returns the scores of the submitted sequences.

Return type:

indel(sequence, prompt=None, insert=None, delete=None, **kwargs)[source]#

Score all indels of the query sequence using the specified prompt.

Parameters:

sequence (bytes) – Sequence to analyze.
prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition the PoET model.
insert (str or None, optional) – Fragment to insert at each site.
delete (list of int or None, optional) – Range of fragment sizes to delete at each site.
**kwargs – Additional keyword arguments.

Returns:

Future object that returns the scores of the indel variants of the sequence.

Return type:

Raises:

ValueError – If neither insert nor delete is provided.

single_site(sequence, prompt=None, **kwargs)[source]#

Score all single substitutions of the query sequence using the specified prompt.

Parameters:

sequence (bytes) – Sequence to analyze.
prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition the PoET model.
**kwargs – Additional keyword arguments.

Returns:

Future object that returns the scores of the single-substitution variants of the sequence.

Return type:

generate(prompt, num_samples=100, temperature=1.0, topk=None, topp=None, max_length=1000, seed=None, **kwargs)[source]#

Generate protein sequences conditioned on a prompt.

Parameters:

prompt (str or Prompt) – Prompt from an align workflow to condition the PoET model.
num_samples (int, optional) – Number of samples to generate. Default is 100.
temperature (float, optional) – Temperature for sampling. Higher values produce more random outputs. Default is 1.0.
topk (float or None, optional) – Number of highest-probability residues to sample from at each step (top-k sampling). Default is None.
topp (float or None, optional) – Cumulative probability threshold for top-p sampling. Default is None.
max_length (int, optional) – Maximum length of generated proteins. Default is 1000.
seed (int or None, optional) – Seed for random number generation. Default is None.
**kwargs – Additional keyword arguments.

Returns:

Future object representing the status and information about the generation job.

Return type:

EmbeddingsGenerateFuture
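
The temperature, topk and topp parameters follow the usual sampling conventions for autoregressive models. The sketch below shows one common way they interact for a single sampling step. It is a generic illustration, not PoET's implementation: the function names, the toy logits and the order in which the filters are applied are assumptions.

```python
import math
import random


def truncated_distribution(logits, temperature=1.0, topk=None, topp=None):
    """Return renormalized probabilities after temperature, top-k and top-p."""
    # Temperature rescales logits: values above 1 flatten the distribution.
    scaled = {res: value / temperature for res, value in logits.items()}
    peak = max(scaled.values())
    weights = {res: math.exp(value - peak) for res, value in scaled.items()}
    total = sum(weights.values())
    # Rank residues from most to least probable.
    ranked = sorted(((w / total, res) for res, w in weights.items()), reverse=True)
    if topk is not None:
        ranked = ranked[: int(topk)]  # keep only the k most probable residues
    if topp is not None:
        kept, cumulative = [], 0.0
        for prob, res in ranked:
            kept.append((prob, res))
            cumulative += prob
            if cumulative >= topp:  # smallest prefix reaching the threshold
                break
        ranked = kept
    norm = sum(prob for prob, _ in ranked)
    return {res: prob / norm for prob, res in ranked}


def sample_residue(logits, rng, **options):
    """Draw one residue from the truncated distribution."""
    dist = truncated_distribution(logits, **options)
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]


toy_logits = {"A": 3.0, "G": 1.0, "L": 0.0}  # hypothetical next-residue logits
rng = random.Random(0)  # a fixed seed makes the draw reproducible
greedy = sample_residue(toy_logits, rng, topk=1)
```

Setting topk=1 reduces sampling to picking the most likely residue, while a larger temperature spreads probability toward less likely residues.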

fit_svd(prompt=None, sequences=None, assay=None, n_components=1024, reduction=None, **kwargs)[source]#

Fit an SVD on the embedding results of PoET.

This function creates an SVDModel based on the embeddings from this model as well as the hyperparameters specified in the arguments.

Parameters:

prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition the PoET model.
sequences (list of bytes or list of str or None, optional) – Sequences to use for SVD.
assay (AssayDataset or None, optional) – Assay dataset to use for SVD.
n_components (int, optional) – Number of components in SVD. Determines output shapes. Default is 1024.
reduction (ReductionType or None, optional) – Embeddings reduction to use (e.g., mean).
**kwargs – Additional keyword arguments.

Returns:

Future that represents the fitted SVD model.

Return type:

fit_umap(prompt=None, sequences=None, assay=None, n_components=2, reduction=ReductionType.MEAN, **kwargs)[source]#

Fit a UMAP on sequences or an assay using PoET embeddings and the specified hyperparameters.

This function creates a UMAP based on the embeddings from this PoET model as well as the hyperparameters specified in the arguments.

Parameters:

prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition the PoET model.
sequences (list of bytes or list of str or None, optional) – Sequences to fit the UMAP on. Provide either sequences or assay; sequences is preferred.
assay (AssayDataset or None, optional) – Assay containing sequences to fit the UMAP on. Provide either sequences or assay. Ignored if sequences are provided.
n_components (int, optional) – Number of components in UMAP fit. Determines output shapes. Default is 2.
reduction (ReductionType or None, optional) – Embeddings reduction to use (e.g., mean). Default is ReductionType.MEAN.
**kwargs – Additional keyword arguments.

Returns:

Future that represents the fitted UMAP model.

Return type:

fit_gp(assay, properties, prompt=None, **kwargs)[source]#

Fit a Gaussian Process (GP) on an assay using this embedding model and the specified hyperparameters.

Parameters:

assay (AssayMetadata or AssayDataset or str) – Assay to fit GP on.
properties (list of str) – Properties in the assay to fit the GP on.
prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition the PoET model.
**kwargs – Additional keyword arguments.

Returns:

Future that represents the trained predictor model.

Return type:

classmethod create(session, model_id, default=None, **kwargs)#

Create and return an instance of the appropriate EmbeddingModel subclass based on the model_id.

Parameters:

session (APISession) – The API session to use.
model_id (str) – The model identifier.
default (type variable of EmbeddingModel or None, optional) – Default EmbeddingModel subclass to use if no match is found.
kwargs – Additional keyword arguments to pass to the model constructor.

Returns:

An instance of the appropriate EmbeddingModel subclass.

Return type:

Raises:

ValueError – If no suitable EmbeddingModel subclass is found and no default is provided.

get_metadata()#

Get model metadata for this model.

Returns:

The metadata associated with this model.

Return type:

ModelMetadata

classmethod get_model()#

Get the model_id(s) for this EmbeddingModel subclass.

Returns:

List of model_id strings associated with this class.

Return type:

list of str

property metadata#

ModelMetadata for this model.

Returns:

The metadata associated with this model.

Return type:

ModelMetadata

class openprotein.embeddings.OpenProteinModel(session, model_id, metadata=None)[source]#

Proprietary protein embedding models served by OpenProtein.

Examples

View specific model details (including supported tokens) with the ? operator.

>>> import openprotein
>>> session = openprotein.connect(username="user", password="password")
>>> session.embedding.prot_seq?

Parameters:

session (APISession)
model_id (list[str] | str)
metadata (ModelMetadata | None)

attn(sequences, **kwargs)#

Compute attention embeddings for sequences using this model.

Parameters:

sequences (list of bytes or list of str) – Sequences to compute attention embeddings for.
kwargs – Additional keyword arguments for the underlying foundation model.

Returns:

Future object representing the attention result.

Return type:

classmethod create(session, model_id, default=None, **kwargs)#

Create and return an instance of the appropriate EmbeddingModel subclass based on the model_id.

Parameters:

session (APISession) – The API session to use.
model_id (str) – The model identifier.
default (type variable of EmbeddingModel or None, optional) – Default EmbeddingModel subclass to use if no match is found.
kwargs – Additional keyword arguments to pass to the model constructor.

Returns:

An instance of the appropriate EmbeddingModel subclass.

Return type:

Raises:

ValueError – If no suitable EmbeddingModel subclass is found and no default is provided.

embed(sequences, reduction=ReductionType.MEAN, **kwargs)#

Embed sequences using this model.

Parameters:

sequences (list of bytes or list of str) – Sequences to embed.
reduction (ReductionType or None, optional) – Reduction to use (e.g. mean). Defaults to mean embedding.
kwargs – Additional keyword arguments for the underlying foundation model, e.g. prompt_id for PoET models.

Returns:

Future object representing the embedding result.

Return type:

fit_gp(assay, properties, reduction, name=None, description=None, **kwargs)#

Fit a Gaussian Process (GP) on an assay using this embedding model and hyperparameters.

Parameters:

assay (AssayMetadata, AssayDataset, or str) – Assay to fit GP on.
properties (list of str) – Properties in the assay to fit the GP on.
reduction (ReductionType) – Type of embedding reduction to use for computing features. A reduction is required for protein language model (PLM) embeddings.
name (str or None, optional) – Optional name for the predictor model.
description (str or None, optional) – Optional description for the predictor model.
kwargs – Additional keyword arguments for the underlying foundation model, e.g. prompt_id for PoET models.

Returns:

The fitted predictor model.

Return type:

Raises:

InvalidParameterError – If no properties are provided, properties are not a subset of assay measurements, or multitask GP is requested.

fit_svd(sequences=None, assay=None, n_components=1024, reduction=None, **kwargs)#

Fit an SVD on the embedding results of this model.

This function will create an SVDModel based on the embeddings from this model as well as the hyperparameters specified in the arguments.

Parameters:

sequences (list of bytes or list of str or None, optional) – Sequences to fit SVD on.
assay (AssayDataset or None, optional) – Assay containing sequences to fit SVD on.
n_components (int, optional) – Number of components in SVD. Determines output shapes. Default is 1024.
reduction (ReductionType or None, optional) – Embeddings reduction to use (e.g. mean).
kwargs – Additional keyword arguments for the underlying foundation model, e.g. prompt_id for PoET models.

Returns:

The fitted SVD model.

Return type:

Raises:

InvalidParameterError – If neither or both of assay and sequences are provided.
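
The "neither or both" rule is an exactly-one-of check on sequences and assay. A minimal sketch of such a check is shown below. The InvalidParameterError class defined here is a local stand-in for the library's exception, and resolve_fit_inputs is a hypothetical helper, not part of this API.

```python
class InvalidParameterError(ValueError):
    """Local stand-in for the library's exception of the same name."""


def resolve_fit_inputs(sequences=None, assay=None):
    """Return whichever input was given, requiring exactly one of the two."""
    # Both None or both set compare equal here, covering "neither or both".
    if (sequences is None) == (assay is None):
        raise InvalidParameterError("Provide exactly one of sequences or assay.")
    return sequences if sequences is not None else assay


chosen = resolve_fit_inputs(sequences=[b"MKV", b"MKL"])
```

Checking this before submitting a job avoids a round trip to the server for an input error.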

fit_umap(sequences=None, assay=None, n_components=2, reduction='MEAN', **kwargs)#

Fit a UMAP on the embedding results of this model.

This function will create a UMAPModel based on the embeddings from this model as well as the hyperparameters specified in the arguments.

Parameters:

sequences (list of bytes or list of str or None, optional) – Sequences to fit the UMAP on. Provide either sequences or assay; sequences is preferred.
assay (AssayDataset or AssayMetadata or None, optional) – Assay containing sequences to fit the UMAP on. Provide either sequences or assay. Ignored if sequences are provided.
n_components (int, optional) – Number of components in UMAP fit. Determines output shapes. Default is 2.
reduction (Reduction or ReductionType or None, optional) – Embeddings reduction to use (e.g. mean). Defaults to MEAN.
kwargs – Additional keyword arguments for the underlying foundation model, e.g. prompt_id for PoET models.

Returns:

The fitted UMAP model.

Return type:

Raises:

InvalidParameterError – If neither or both of assay and sequences are provided.

get_metadata()#

Get model metadata for this model.

Returns:

The metadata associated with this model.

Return type:

ModelMetadata

classmethod get_model()#

Get the model_id(s) for this EmbeddingModel subclass.

Returns:

List of model_id strings associated with this class.

Return type:

list of str

logits(sequences, **kwargs)#

Compute logits for sequences using this model.

Parameters:

sequences (list of bytes or list of str) – Sequences to compute logits for.
kwargs – Additional keyword arguments for the underlying foundation model, e.g. prompt_id for PoET models.

Returns:

Future object representing the logits result.

Return type:

property metadata#

ModelMetadata for this model.

Returns:

The metadata associated with this model.

Return type:

ModelMetadata

class openprotein.embeddings.ESMModel(session, model_id, metadata=None)[source]#

Class providing inference endpoints for Facebook’s ESM protein language models.

Examples

View specific model details (including supported tokens) with the ? operator.

>>> import openprotein
>>> session = openprotein.connect(username="user", password="password")
>>> session.embedding.esm2_t12_35M_UR50D?

Parameters:

session (APISession)
model_id (list[str] | str)
metadata (ModelMetadata | None)

attn(sequences, **kwargs)#

Compute attention embeddings for sequences using this model.

Parameters:

sequences (list of bytes or list of str) – Sequences to compute attention embeddings for.
kwargs – Additional keyword arguments for the underlying foundation model.

Returns:

Future object representing the attention result.

Return type:

classmethod create(session, model_id, default=None, **kwargs)#

Create and return an instance of the appropriate EmbeddingModel subclass based on the model_id.

Parameters:

session (APISession) – The API session to use.
model_id (str) – The model identifier.
default (type variable of EmbeddingModel or None, optional) – Default EmbeddingModel subclass to use if no match is found.
kwargs – Additional keyword arguments to pass to the model constructor.

Returns:

An instance of the appropriate EmbeddingModel subclass.

Return type:

Raises:

ValueError – If no suitable EmbeddingModel subclass is found and no default is provided.

embed(sequences, reduction=ReductionType.MEAN, **kwargs)#

Embed sequences using this model.

Parameters:

sequences (list of bytes or list of str) – Sequences to embed.
reduction (ReductionType or None, optional) – Reduction to use (e.g. mean). Defaults to mean embedding.
kwargs – Additional keyword arguments for the underlying foundation model, e.g. prompt_id for PoET models.

Returns:

Future object representing the embedding result.

Return type:

fit_gp(assay, properties, reduction, name=None, description=None, **kwargs)#

Fit a Gaussian Process (GP) on an assay using this embedding model and hyperparameters.

Parameters:

assay (AssayMetadata, AssayDataset, or str) – Assay to fit GP on.
properties (list of str) – Properties in the assay to fit the GP on.
reduction (ReductionType) – Type of embedding reduction to use for computing features. A reduction is required for protein language model (PLM) embeddings.
name (str or None, optional) – Optional name for the predictor model.
description (str or None, optional) – Optional description for the predictor model.
kwargs – Additional keyword arguments for the underlying foundation model, e.g. prompt_id for PoET models.

Returns:

The fitted predictor model.

Return type:

Raises:

InvalidParameterError – If no properties are provided, properties are not a subset of assay measurements, or multitask GP is requested.

fit_svd(sequences=None, assay=None, n_components=1024, reduction=None, **kwargs)#

Fit an SVD on the embedding results of this model.

This function will create an SVDModel based on the embeddings from this model as well as the hyperparameters specified in the arguments.

Parameters:

sequences (list of bytes or list of str or None, optional) – Sequences to fit SVD on.
assay (AssayDataset or None, optional) – Assay containing sequences to fit SVD on.
n_components (int, optional) – Number of components in SVD. Determines output shapes. Default is 1024.
reduction (ReductionType or None, optional) – Embeddings reduction to use (e.g. mean).
kwargs – Additional keyword arguments for the underlying foundation model, e.g. prompt_id for PoET models.

Returns:

The fitted SVD model.

Return type:

Raises:

InvalidParameterError – If neither or both of assay and sequences are provided.

fit_umap(sequences=None, assay=None, n_components=2, reduction='MEAN', **kwargs)#

Fit a UMAP on the embedding results of this model.

This function will create a UMAPModel based on the embeddings from this model as well as the hyperparameters specified in the arguments.

Parameters:

sequences (list of bytes or list of str or None, optional) – Sequences to fit the UMAP on. Provide either sequences or assay; sequences is preferred.
assay (AssayDataset or AssayMetadata or None, optional) – Assay containing sequences to fit the UMAP on. Provide either sequences or assay. Ignored if sequences are provided.
n_components (int, optional) – Number of components in UMAP fit. Determines output shapes. Default is 2.
reduction (Reduction or ReductionType or None, optional) – Embeddings reduction to use (e.g. mean). Defaults to MEAN.
kwargs – Additional keyword arguments for the underlying foundation model, e.g. prompt_id for PoET models.

Returns:

The fitted UMAP model.

Return type:

Raises:

InvalidParameterError – If neither or both of assay and sequences are provided.

get_metadata()#

Get model metadata for this model.

Returns:

The metadata associated with this model.

Return type:

ModelMetadata

classmethod get_model()#

Get the model_id(s) for this EmbeddingModel subclass.

Returns:

List of model_id strings associated with this class.

Return type:

list of str

logits(sequences, **kwargs)#

Compute logits for sequences using this model.

Parameters:

sequences (list of bytes or list of str) – Sequences to compute logits for.
kwargs – Additional keyword arguments for the underlying foundation model, e.g. prompt_id for PoET models.

Returns:

Future object representing the logits result.

Return type: