openprotein.embeddings#

Create embeddings for your protein sequences using open-source and proprietary models!

Note that for PoET models, you will also need to use our align workflow.

Endpoints#

class openprotein.embeddings.EmbeddingsAPI[source]#

Embeddings API providing the interface for creating embeddings using protein language models.

You can access all our models either via get_model() or directly through the session’s embedding attribute using the model’s ID and the desired method. For example, to use the attention method on the protein sequence model, you would use session.embedding.prot_seq.attn().

Examples

Accessing a model’s method:

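# To call the attention method on the protein sequence model:
import openprotein
session = openprotein.connect(username="user", password="password")
# attn() requires the sequences to analyze; this sequence is only a placeholder
future = session.embedding.prot_seq.attn(sequences=[b"MSKGEELFTG"])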

Using the get_model method:

# Get a model instance by name:
import openprotein
session = openprotein.connect(username="user", password="password")
# List the available models:
print(session.embedding.list_models())
# Initialize a model by name:
model = session.embedding.get_model('prot-seq')
prot_seq: OpenProteinModel#
rotaprot_large_uniref50w: OpenProteinModel#
rotaprot_large_uniref90_ft: OpenProteinModel#
poet: PoETModel#
poet_2: PoET2Model#
poet2: PoET2Model#
esm1b: ESMModel#
esm1b_t33_650M_UR50S: ESMModel#
esm1v: ESMModel#
esm1v_t33_650M_UR90S_1: ESMModel#
esm1v_t33_650M_UR90S_2: ESMModel#
esm1v_t33_650M_UR90S_3: ESMModel#
esm1v_t33_650M_UR90S_4: ESMModel#
esm1v_t33_650M_UR90S_5: ESMModel#
esm2: ESMModel#
esm2_t12_35M_UR50D: ESMModel#
esm2_t30_150M_UR50D: ESMModel#
esm2_t33_650M_UR50D: ESMModel#
esm2_t36_3B_UR50D: ESMModel#
esm2_t6_8M_UR50D: ESMModel#
__init__(session)[source]#
Parameters:

session (APISession)

list_models()[source]#

List the models available for creating embeddings of your sequences.

Return type:

list[EmbeddingModel]

get_model(name)[source]#

Get a model by its model_id.

ProtembedModel allows all the usual job manipulation, e.g. making POST and GET requests for this model specifically.

Parameters:

name (str) – The model identifier (model_id).

Returns:

The model

Return type:

ProtembedModel

Raises:

HTTPError – If the GET request does not succeed.

Models#

class openprotein.embeddings.OpenProteinModel[source]#

Class providing inference endpoints for proprietary protein embedding models served by OpenProtein.

Examples

View specific model details (including supported tokens) with the IPython ? operator.


>>> import openprotein
>>> session = openprotein.connect(username="user", password="password")
>>> session.embedding.prot_seq?
model_id: list[str] | str = ['prot-seq', 'rotaprot-large-uniref50w', 'rotaprot_large_uniref90_ft']#
__init__(session, model_id, metadata=None)#
Parameters:
  • session (APISession)

  • model_id (str)

  • metadata (ModelMetadata | None)

attn(sequences, **kwargs)#

Compute attention embeddings for sequences using this model.

Parameters:
  • sequences (list of bytes or list of str) – Sequences to compute attention embeddings for.

  • **kwargs (dict, optional) – Additional keyword arguments to pass to the attention request.

Returns:

Future object representing the attention result.

Return type:

EmbeddingsResultFuture

classmethod create(session, model_id, default=None, **kwargs)#

Create and return an instance of the appropriate EmbeddingModel subclass based on the model_id.

Parameters:
  • session (APISession) – The API session to use.

  • model_id (str) – The model identifier.

  • default (type[EmbeddingModel] or None, optional) – Default EmbeddingModel subclass to use if no match is found.

  • **kwargs (dict, optional) – Additional keyword arguments to pass to the model constructor.

Returns:

An instance of the appropriate EmbeddingModel subclass.

Return type:

EmbeddingModel

Raises:

ValueError – If no suitable EmbeddingModel subclass is found and no default is provided.

embed(sequences, reduction=ReductionType.MEAN, **kwargs)#

Embed sequences using this model.

Parameters:
  • sequences (list of bytes or list of str) – Sequences to embed.

  • reduction (ReductionType or None, optional) – Reduction to use (e.g. mean). Defaults to mean embedding.

  • **kwargs (dict, optional) – Additional keyword arguments to pass to the embedding request.

Returns:

Future object representing the embedding result.

Return type:

EmbeddingsResultFuture
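A minimal sketch of how the calls documented on this page might chain together. It assumes session is the object returned by openprotein.connect(...). The helper name embed_sequences is hypothetical; only get_model(), embed(), and get() are taken from this reference, and whether get() blocks until the job has finished is not stated here.

```python
def embed_sequences(session, model_name, sequences):
    """Request embeddings from a named model and fetch the results.

    Hypothetical helper: it only chains calls documented on this page.
    """
    # Look up the model by its identifier, e.g. "prot-seq".
    model = session.embedding.get_model(model_name)
    # Submit the request; the documented default reduction is the mean.
    future = model.embed(sequences=sequences)
    # Documented return type of get() is list[ndarray].
    return future.get()
```

A call would look like embed_sequences(session, "prot-seq", [b"MSKGEELFTG"]), where the sequence is only a placeholder.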

fit_gp(assay, properties, reduction, name=None, description=None, **kwargs)#

Fit a Gaussian Process (GP) on an assay using this embedding model and hyperparameters.

Parameters:
  • assay (AssayMetadata, AssayDataset, or str) – Assay to fit GP on.

  • properties (list of str) – Properties in the assay to fit the GP on.

  • reduction (ReductionType) – Type of embedding reduction to use for computing features. Protein language models (PLMs) require a reduction.

  • name (str or None, optional) – Optional name for the predictor model.

  • description (str or None, optional) – Optional description for the predictor model.

  • **kwargs (dict, optional) – Additional keyword arguments to pass to the GP fitting.

Returns:

The fitted predictor model.

Return type:

PredictorModel

Raises:

InvalidParameterError – If no properties are provided, properties are not a subset of assay measurements, or multitask GP is requested.

fit_svd(sequences=None, assay=None, n_components=1024, reduction=None, **kwargs)#

Fit an SVD on the embedding results of this model.

This function will create an SVDModel based on the embeddings from this model as well as the hyperparameters specified in the arguments.

Parameters:
  • sequences (list of bytes or list of str or None, optional) – Sequences to fit SVD on.

  • assay (AssayDataset or None, optional) – Assay containing sequences to fit SVD on.

  • n_components (int, optional) – Number of components in SVD. Determines output shapes. Default is 1024.

  • reduction (ReductionType or None, optional) – Embeddings reduction to use (e.g. mean).

  • **kwargs (dict, optional) – Additional keyword arguments to pass to the SVD fitting.

Returns:

The fitted SVD model.

Return type:

SVDModel

Raises:

InvalidParameterError – If neither or both of assay and sequences are provided.
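The "neither or both" rule above amounts to an exactly-one-of check. The sketch below illustrates that rule in isolation; check_exactly_one is a hypothetical name, and the built-in ValueError stands in for InvalidParameterError, whose import path is not given on this page.

```python
def check_exactly_one(sequences=None, assay=None):
    """Return which input was supplied, or raise if zero or two were given."""
    provided = [value is not None for value in (sequences, assay)]
    if sum(provided) != 1:
        # Stand-in for the library's InvalidParameterError (assumption).
        raise ValueError("provide exactly one of `sequences` or `assay`")
    return "sequences" if sequences is not None else "assay"
```

Running such a check before submitting a fit request can surface the mistake locally instead of waiting for the API call to fail.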

fit_umap(sequences=None, assay=None, n_components=2, reduction=ReductionType.MEAN, **kwargs)#

Fit a UMAP on the embedding results of this model.

This function will create a UMAPModel based on the embeddings from this model as well as the hyperparameters specified in the arguments.

Parameters:
  • sequences (list of bytes or list of str or None, optional) – Optional sequences to fit UMAP with. Either use sequences or assay. Sequences is preferred.

  • assay (AssayDataset or None, optional) – Optional assay containing sequences to fit UMAP with. Either use sequences or assay. Ignored if sequences are provided.

  • n_components (int, optional) – Number of components in UMAP fit. Determines output shapes. Default is 2.

  • reduction (ReductionType or None, optional) – Embeddings reduction to use (e.g. mean). Defaults to MEAN.

  • **kwargs (dict, optional) – Additional keyword arguments to pass to the UMAP fitting.

Returns:

The fitted UMAP model.

Return type:

UMAPModel

Raises:

InvalidParameterError – If neither or both of assay and sequences are provided.

get_metadata()#

Get model metadata for this model.

Returns:

The metadata associated with this model.

Return type:

ModelMetadata

classmethod get_model()#

Get the model_id(s) for this EmbeddingModel subclass.

Returns:

List of model_id strings associated with this class.

Return type:

list of str

logits(sequences, **kwargs)#

Compute logit embeddings for sequences using this model.

Parameters:
  • sequences (list of bytes or list of str) – Sequences to compute logits for.

  • **kwargs (dict, optional) – Additional keyword arguments to pass to the logits request.

Returns:

Future object representing the logits result.

Return type:

EmbeddingsResultFuture

property metadata#

ModelMetadata for this model.

Returns:

The metadata associated with this model.

Return type:

ModelMetadata

class openprotein.embeddings.ESMModel[source]#

Class providing inference endpoints for Facebook’s ESM protein language models.

Examples

View specific model details (including supported tokens) with the IPython ? operator.


>>> import openprotein
>>> session = openprotein.connect(username="user", password="password")
>>> session.embedding.esm2_t12_35M_UR50D?
__init__(session, model_id, metadata=None)#
Parameters:
  • session (APISession)

  • model_id (str)

  • metadata (ModelMetadata | None)

attn(sequences, **kwargs)#

Compute attention embeddings for sequences using this model.

Parameters:
  • sequences (list of bytes or list of str) – Sequences to compute attention embeddings for.

  • **kwargs (dict, optional) – Additional keyword arguments to pass to the attention request.

Returns:

Future object representing the attention result.

Return type:

EmbeddingsResultFuture

classmethod create(session, model_id, default=None, **kwargs)#

Create and return an instance of the appropriate EmbeddingModel subclass based on the model_id.

Parameters:
  • session (APISession) – The API session to use.

  • model_id (str) – The model identifier.

  • default (type[EmbeddingModel] or None, optional) – Default EmbeddingModel subclass to use if no match is found.

  • **kwargs (dict, optional) – Additional keyword arguments to pass to the model constructor.

Returns:

An instance of the appropriate EmbeddingModel subclass.

Return type:

EmbeddingModel

Raises:

ValueError – If no suitable EmbeddingModel subclass is found and no default is provided.

embed(sequences, reduction=ReductionType.MEAN, **kwargs)#

Embed sequences using this model.

Parameters:
  • sequences (list of bytes or list of str) – Sequences to embed.

  • reduction (ReductionType or None, optional) – Reduction to use (e.g. mean). Defaults to mean embedding.

  • **kwargs (dict, optional) – Additional keyword arguments to pass to the embedding request.

Returns:

Future object representing the embedding result.

Return type:

EmbeddingsResultFuture
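To make the default mean reduction concrete, here is a small pure-Python sketch. It assumes the reduction averages the per-residue vectors over the sequence-length axis, which is the usual meaning of a mean embedding but is not spelled out on this page. The function name mean_reduce and the toy numbers are illustrative only.

```python
def mean_reduce(per_residue):
    """Average per-residue vectors (length L, dimension D) into one D-vector."""
    if not per_residue:
        raise ValueError("need at least one residue embedding")
    length = len(per_residue)
    dim = len(per_residue[0])
    # Average each embedding dimension across all residue positions.
    return [sum(row[j] for row in per_residue) / length for j in range(dim)]


# Three residues, two embedding dimensions.
toy = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(mean_reduce(toy))  # [3.0, 4.0]
```

The result has a fixed size regardless of sequence length, which is why a reduction is needed before fitting models on sequences of different lengths.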

fit_gp(assay, properties, reduction, name=None, description=None, **kwargs)#

Fit a Gaussian Process (GP) on an assay using this embedding model and hyperparameters.

Parameters:
  • assay (AssayMetadata, AssayDataset, or str) – Assay to fit GP on.

  • properties (list of str) – Properties in the assay to fit the GP on.

  • reduction (ReductionType) – Type of embedding reduction to use for computing features. Protein language models (PLMs) require a reduction.

  • name (str or None, optional) – Optional name for the predictor model.

  • description (str or None, optional) – Optional description for the predictor model.

  • **kwargs (dict, optional) – Additional keyword arguments to pass to the GP fitting.

Returns:

The fitted predictor model.

Return type:

PredictorModel

Raises:

InvalidParameterError – If no properties are provided, properties are not a subset of assay measurements, or multitask GP is requested.

fit_svd(sequences=None, assay=None, n_components=1024, reduction=None, **kwargs)#

Fit an SVD on the embedding results of this model.

This function will create an SVDModel based on the embeddings from this model as well as the hyperparameters specified in the arguments.

Parameters:
  • sequences (list of bytes or list of str or None, optional) – Sequences to fit SVD on.

  • assay (AssayDataset or None, optional) – Assay containing sequences to fit SVD on.

  • n_components (int, optional) – Number of components in SVD. Determines output shapes. Default is 1024.

  • reduction (ReductionType or None, optional) – Embeddings reduction to use (e.g. mean).

  • **kwargs (dict, optional) – Additional keyword arguments to pass to the SVD fitting.

Returns:

The fitted SVD model.

Return type:

SVDModel

Raises:

InvalidParameterError – If neither or both of assay and sequences are provided.

fit_umap(sequences=None, assay=None, n_components=2, reduction=ReductionType.MEAN, **kwargs)#

Fit a UMAP on the embedding results of this model.

This function will create a UMAPModel based on the embeddings from this model as well as the hyperparameters specified in the arguments.

Parameters:
  • sequences (list of bytes or list of str or None, optional) – Optional sequences to fit UMAP with. Either use sequences or assay. Sequences is preferred.

  • assay (AssayDataset or None, optional) – Optional assay containing sequences to fit UMAP with. Either use sequences or assay. Ignored if sequences are provided.

  • n_components (int, optional) – Number of components in UMAP fit. Determines output shapes. Default is 2.

  • reduction (ReductionType or None, optional) – Embeddings reduction to use (e.g. mean). Defaults to MEAN.

  • **kwargs (dict, optional) – Additional keyword arguments to pass to the UMAP fitting.

Returns:

The fitted UMAP model.

Return type:

UMAPModel

Raises:

InvalidParameterError – If neither or both of assay and sequences are provided.

get_metadata()#

Get model metadata for this model.

Returns:

The metadata associated with this model.

Return type:

ModelMetadata

classmethod get_model()#

Get the model_id(s) for this EmbeddingModel subclass.

Returns:

List of model_id strings associated with this class.

Return type:

list of str

logits(sequences, **kwargs)#

Compute logit embeddings for sequences using this model.

Parameters:
  • sequences (list of bytes or list of str) – Sequences to compute logits for.

  • **kwargs (dict, optional) – Additional keyword arguments to pass to the logits request.

Returns:

Future object representing the logits result.

Return type:

EmbeddingsResultFuture

property metadata#

ModelMetadata for this model.

Returns:

The metadata associated with this model.

Return type:

ModelMetadata

class openprotein.embeddings.PoETModel[source]#

Class for OpenProtein’s foundation model PoET.

Note

PoET functions are dependent on a prompt supplied via the prompt endpoints.

Examples

View specific model details (including supported tokens) with the ? operator.

>>> import openprotein
>>> session = openprotein.connect(username="user", password="password")
>>> session.embedding.poet.<embeddings_method>
__init__(session, model_id, metadata=None)[source]#
Parameters:
  • session (APISession)

  • model_id (str)

  • metadata (ModelMetadata | None)

embed(sequences, prompt=None, reduction=ReductionType.MEAN, **kwargs)[source]#

Embed sequences using the PoET model.

Parameters:
  • sequences (list of bytes) – Sequences to embed.

  • prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition the PoET model.

  • reduction (ReductionType or None, optional) – Embeddings reduction to use (e.g., mean). Default is ReductionType.MEAN.

  • **kwargs – Additional keyword arguments.

Returns:

Future object that returns the embeddings of the submitted sequences.

Return type:

EmbeddingsResultFuture

logits(sequences, prompt=None, **kwargs)[source]#

Compute logits for sequences using the PoET model.

Parameters:
  • sequences (list of bytes) – Sequences to analyze.

  • prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition the PoET model.

  • **kwargs – Additional keyword arguments.

Returns:

Future object that returns the logits of the submitted sequences.

Return type:

EmbeddingsResultFuture

attn()[source]#

Attention is not available for PoET.

Raises:

ValueError – Always raised, as attention is not supported for PoET.

score(sequences, prompt=None, **kwargs)[source]#

Score query sequences using the specified prompt.

Parameters:
  • sequences (list of bytes) – Sequences to score.

  • prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition the PoET model.

  • **kwargs – Additional keyword arguments.

Returns:

Future object that returns the scores of the submitted sequences.

Return type:

EmbeddingsScoreFuture

indel(sequence, prompt=None, insert=None, delete=None, **kwargs)[source]#

Score all indels of the query sequence using the specified prompt.

Parameters:
  • sequence (bytes) – Sequence to analyze.

  • prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition the PoET model.

  • insert (str or None, optional) – Insertion fragment at each site.

  • delete (list of int or None, optional) – Range of size of fragment to delete at each site.

  • **kwargs – Additional keyword arguments.

Returns:

Future object that returns the scores of the indel variants of the sequence.

Return type:

EmbeddingsScoreFuture

Raises:

ValueError – If neither insert nor delete is provided.
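The sketch below shows one plausible reading of "all indels": inserting a fragment at every position, and deleting a fragment of each requested size at every site. This is an assumption for illustration; the exact enumeration performed by the service is not specified on this page, and both function names are hypothetical. For simplicity the fragment is passed as bytes here, although insert is documented as str.

```python
def insertion_variants(sequence: bytes, insert: bytes):
    """Insert `insert` before each position and after the last residue."""
    return [sequence[:i] + insert + sequence[i:] for i in range(len(sequence) + 1)]


def deletion_variants(sequence: bytes, sizes):
    """Delete a fragment of each size in `sizes` starting at every valid site."""
    variants = []
    for size in sizes:
        for start in range(len(sequence) - size + 1):
            variants.append(sequence[:start] + sequence[start + size:])
    return variants
```

For example, insertion_variants(b"AC", b"G") yields b"GAC", b"AGC", and b"ACG".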

single_site(sequence, prompt=None, **kwargs)[source]#

Score all single substitutions of the query sequence using the specified prompt.

Parameters:
  • sequence (bytes) – Sequence to analyze.

  • prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition the PoET model.

  • **kwargs – Additional keyword arguments.

Returns:

Future object that returns the scores of the mutated sequence.

Return type:

EmbeddingsScoreFuture
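For intuition about what "all single substitutions" covers, the following sketch enumerates them locally. It assumes the 20 canonical amino acids; the tokens actually supported by a given model may differ, and single_substitutions is a hypothetical name.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 canonical residues (assumption)


def single_substitutions(sequence: str):
    """Return (position, new_residue, variant) for every single substitution."""
    variants = []
    for i, wild_type in enumerate(sequence):
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                variants.append((i, aa, sequence[:i] + aa + sequence[i + 1:]))
    return variants
```

Under this assumption a sequence of length L has 19 × L single-substitution variants, so the size of the result grows linearly with sequence length.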

generate(prompt, num_samples=100, temperature=1.0, topk=None, topp=None, max_length=1000, seed=None, **kwargs)[source]#

Generate protein sequences conditioned on a prompt.

Parameters:
  • prompt (str or Prompt) – Prompt from an align workflow to condition the PoET model.

  • num_samples (int, optional) – Number of samples to generate. Default is 100.

  • temperature (float, optional) – Temperature for sampling. Higher values produce more random outputs. Default is 1.0.

  • topk (float or None, optional) – Number of top-k residues to consider during sampling. Default is None.

  • topp (float or None, optional) – Cumulative probability threshold for top-p sampling. Default is None.

  • max_length (int, optional) – Maximum length of generated proteins. Default is 1000.

  • seed (int or None, optional) – Seed for random number generation. Default is None.

  • **kwargs – Additional keyword arguments.

Returns:

Future object representing the status and information about the generation job.

Return type:

EmbeddingsGenerateFuture
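The temperature, topk, and topp parameters correspond to standard sampling controls. The sketch below shows the textbook versions of these filters applied to one position's logits; it is not the service's implementation, and filter_distribution is a hypothetical name. It assumes temperature is greater than zero.

```python
import math


def filter_distribution(logits, temperature=1.0, topk=None, topp=None):
    """Apply temperature, top-k, and top-p, then renormalize the probabilities."""
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank tokens from most to least likely and keep the top-k of them.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = order if topk is None else order[: int(topk)]

    # Top-p: keep the smallest prefix whose cumulative probability reaches topp.
    if topp is not None:
        kept, cumulative = [], 0.0
        for i in keep:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= topp:
                break
        keep = kept

    # Zero out everything else and renormalize over the kept tokens.
    mass = sum(probs[i] for i in keep)
    kept_set = set(keep)
    return [probs[i] / mass if i in kept_set else 0.0 for i in range(len(probs))]
```

Raising the temperature flattens the distribution before filtering, while smaller topk or topp values restrict sampling to the most likely residues.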

fit_svd(prompt=None, sequences=None, assay=None, n_components=1024, reduction=None, **kwargs)[source]#

Fit an SVD on the embedding results of PoET.

This function creates an SVDModel based on the embeddings from this model as well as the hyperparameters specified in the arguments.

Parameters:
  • prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition the PoET model.

  • sequences (list of bytes or list of str or None, optional) – Sequences to use for SVD.

  • assay (AssayDataset or None, optional) – Assay dataset to use for SVD.

  • n_components (int, optional) – Number of components in SVD. Determines output shapes. Default is 1024.

  • reduction (ReductionType or None, optional) – Embeddings reduction to use (e.g., mean).

  • **kwargs – Additional keyword arguments.

Returns:

Future that represents the fitted SVD model.

Return type:

SVDModel

fit_umap(prompt=None, sequences=None, assay=None, n_components=2, reduction=ReductionType.MEAN, **kwargs)[source]#

Fit a UMAP on the embedding results of PoET.

This function creates a UMAPModel based on the embeddings from this PoET model as well as the hyperparameters specified in the arguments.

Parameters:
  • prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition the PoET model.

  • sequences (list of bytes or list of str or None, optional) – Optional sequences to fit UMAP with. Either use sequences or assay. Sequences is preferred.

  • assay (AssayDataset or None, optional) – Optional assay containing sequences to fit UMAP with. Either use sequences or assay. Ignored if sequences are provided.

  • n_components (int, optional) – Number of components in UMAP fit. Determines output shapes. Default is 2.

  • reduction (ReductionType or None, optional) – Embeddings reduction to use (e.g., mean). Default is ReductionType.MEAN.

  • **kwargs – Additional keyword arguments.

Returns:

Future that represents the fitted UMAP model.

Return type:

UMAPModel

fit_gp(assay, properties, prompt=None, **kwargs)[source]#

Fit a Gaussian Process (GP) on an assay using this embedding model and hyperparameters.

Parameters:
  • assay (AssayMetadata or AssayDataset or str) – Assay to fit GP on.

  • properties (list of str) – Properties in the assay to fit the GP on.

  • prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition the PoET model.

  • **kwargs – Additional keyword arguments.

Returns:

Future that represents the trained predictor model.

Return type:

PredictorModel

classmethod create(session, model_id, default=None, **kwargs)#

Create and return an instance of the appropriate EmbeddingModel subclass based on the model_id.

Parameters:
  • session (APISession) – The API session to use.

  • model_id (str) – The model identifier.

  • default (type[EmbeddingModel] or None, optional) – Default EmbeddingModel subclass to use if no match is found.

  • **kwargs (dict, optional) – Additional keyword arguments to pass to the model constructor.

Returns:

An instance of the appropriate EmbeddingModel subclass.

Return type:

EmbeddingModel

Raises:

ValueError – If no suitable EmbeddingModel subclass is found and no default is provided.

get_metadata()#

Get model metadata for this model.

Returns:

The metadata associated with this model.

Return type:

ModelMetadata

classmethod get_model()#

Get the model_id(s) for this EmbeddingModel subclass.

Returns:

List of model_id strings associated with this class.

Return type:

list of str

property metadata#

ModelMetadata for this model.

Returns:

The metadata associated with this model.

Return type:

ModelMetadata

class openprotein.embeddings.PoET2Model[source]#

Class for OpenProtein’s foundation model PoET 2.

PoET functions are dependent on a prompt supplied via the prompt endpoints.

Examples

View specific model details (including supported tokens) with the ? operator.

>>> import openprotein
>>> session = openprotein.connect(username="user", password="password")
>>> session.embedding.poet2.<embeddings_method>
__init__(session, model_id, metadata=None)[source]#
Parameters:
  • session (APISession)

  • model_id (str)

  • metadata (ModelMetadata | None)

embed(sequences, reduction=ReductionType.MEAN, prompt=None, query=None, use_query_structure_in_decoder=True, decoder_type=None)[source]#

Embed sequences using this model.

Parameters:
  • sequences (list of bytes) – Sequences to embed.

  • reduction (ReductionType or None, optional) – Embeddings reduction to use (e.g. mean). Default is ReductionType.MEAN.

  • prompt (str or Prompt or None, optional) – Prompt, prompt_id, or prompt from an align workflow used to condition the PoET model.

  • query (str or bytes or Protein or Query or None, optional) – Query to use with prompt.

  • use_query_structure_in_decoder (bool, optional) – Whether to use query structure in decoder. Default is True.

  • decoder_type ({'mlm', 'clm'} or None, optional) – Decoder type. Default is None.

Returns:

A future object that returns the embeddings of the submitted sequences.

Return type:

EmbeddingsResultFuture

logits(sequences, prompt=None, query=None, use_query_structure_in_decoder=True, decoder_type=None)[source]#

Compute logit embeddings for sequences using this model.

Parameters:
  • sequences (list of bytes) – Sequences to analyze.

  • prompt (str or Prompt or None, optional) – Prompt, prompt_id, or prompt from an align workflow used to condition the PoET model.

  • query (str or bytes or Protein or Query or None, optional) – Query to use with prompt.

  • use_query_structure_in_decoder (bool, optional) – Whether to use query structure in decoder. Default is True.

  • decoder_type ({'mlm', 'clm'} or None, optional) – Decoder type. Default is None.

Returns:

A future object that returns the logits of the submitted sequences.

Return type:

EmbeddingsResultFuture

score(sequences, prompt=None, query=None, use_query_structure_in_decoder=True, decoder_type=None)[source]#

Score query sequences using the specified prompt.

Parameters:
  • sequences (list of bytes) – Sequences to score.

  • prompt (str or Prompt or None, optional) – Prompt, prompt_id, or prompt from an align workflow used to condition the PoET model.

  • query (str or bytes or Protein or Query or None, optional) – Query to use with prompt.

  • use_query_structure_in_decoder (bool, optional) – Whether to use query structure in decoder. Default is True.

  • decoder_type ({'mlm', 'clm'} or None, optional) – Decoder type. Default is None.

Returns:

A future object that returns the scores of the submitted sequences.

Return type:

EmbeddingsScoreFuture

indel(sequence, prompt=None, query=None, use_query_structure_in_decoder=True, decoder_type=None, insert=None, delete=None, **kwargs)[source]#

Score all indels of the query sequence using the specified prompt.

Parameters:
  • sequence (bytes) – Sequence to analyze.

  • prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition the PoET model.

  • query (str or bytes or Protein or Query or None, optional) – Query to use with prompt.

  • use_query_structure_in_decoder (bool, optional) – Whether to use query structure in decoder. Default is True.

  • decoder_type ({'mlm', 'clm'} or None, optional) – Decoder type. Default is None.

  • insert (str or None, optional) – Insertion fragment at each site.

  • delete (list of int or None, optional) – Range of size of fragment to delete at each site.

  • **kwargs – Additional keyword arguments.

Returns:

A future object that returns the scores of the indel variants of the sequence.

Return type:

EmbeddingsScoreFuture

Raises:

ValueError – If neither insert nor delete is provided.

single_site(sequence, prompt=None, query=None, use_query_structure_in_decoder=True, decoder_type=None)[source]#

Score all single substitutions of the query sequence using the specified prompt.

Parameters:
  • sequence (bytes) – Sequence to analyze.

  • prompt (str or Prompt or None, optional) – Prompt, prompt_id, or prompt from an align workflow used to condition the PoET model.

  • query (str or bytes or Protein or Query or None, optional) – Query to use with prompt.

  • use_query_structure_in_decoder (bool, optional) – Whether to use query structure in decoder. Default is True.

  • decoder_type ({'mlm', 'clm'} or None, optional) – Decoder type. Default is None.

Returns:

A future object that returns the scores of the mutated sequence.

Return type:

EmbeddingsScoreFuture

generate(prompt, query=None, use_query_structure_in_decoder=True, num_samples=100, temperature=1.0, topk=None, topp=None, max_length=1000, seed=None, ensemble_weights=None, ensemble_method=None)[source]#

Generate protein sequences conditioned on a prompt.

Parameters:
  • prompt (str or Prompt) – Prompt from an align workflow to condition PoET model.

  • query (str or bytes or Protein or Query or None, optional) – Query to use with prompt.

  • use_query_structure_in_decoder (bool, optional) – Whether to use query structure in decoder. Default is True.

  • num_samples (int, optional) – The number of samples to generate. Default is 100.

  • temperature (float, optional) – The temperature for sampling. Higher values produce more random outputs. Default is 1.0.

  • topk (float or None, optional) – The number of top-k residues to consider during sampling. Default is None.

  • topp (float or None, optional) – The cumulative probability threshold for top-p sampling. Default is None.

  • max_length (int, optional) – The maximum length of generated proteins. Default is 1000.

  • seed (int or None, optional) – Seed for random number generation. Default is None.

  • ensemble_weights (Sequence of float or None, optional) – Weights for combining likelihoods from multiple prompts in the ensemble. The length of this sequence must match the number of prompts. All weights must be finite. If ensemble_method is “arithmetic”, then weights must also be non-negative, and have a non-zero sum.

  • ensemble_method ({'arithmetic', 'geometric'} or None, optional) – Method used to combine likelihoods from multiple prompts in the ensemble. If “arithmetic”, the weighted mean is used; if “geometric”, the weighted geometric mean is used. If None (default), the method defaults to “arithmetic”, but this behavior may change in the future.

Returns:

A future object representing the status and information about the generation job.

Return type:

EmbeddingsGenerateFuture
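The two ensemble_method options can be illustrated with the standard definitions of a weighted arithmetic mean and a weighted geometric mean. This sketch is an assumption-laden illustration, not the service's code: combine_likelihoods is a hypothetical name, normalizing the geometric mean by the total weight is an assumption, and likelihoods must be strictly positive for the geometric case.

```python
import math


def combine_likelihoods(likelihoods, weights, method="arithmetic"):
    """Combine per-prompt likelihoods with a weighted mean."""
    if len(likelihoods) != len(weights):
        raise ValueError("need exactly one weight per prompt")
    if not all(math.isfinite(w) for w in weights):
        raise ValueError("all weights must be finite")
    total = sum(weights)
    if total == 0:
        # Documented for "arithmetic"; applied to both methods here (assumption).
        raise ValueError("weights must have a non-zero sum")
    if method == "arithmetic":
        if any(w < 0 for w in weights):
            raise ValueError("arithmetic weights must be non-negative")
        return sum(w * p for w, p in zip(weights, likelihoods)) / total
    if method == "geometric":
        # Weighted geometric mean computed in log space.
        log_mean = sum(w * math.log(p) for w, p in zip(weights, likelihoods)) / total
        return math.exp(log_mean)
    raise ValueError("method must be 'arithmetic' or 'geometric'")
```

With equal weights, likelihoods 0.25 and 1.0 combine to 0.625 arithmetically and to 0.5 geometrically, so the geometric mean penalizes a residue that any one prompt considers unlikely more strongly.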

fit_svd(sequences=None, assay=None, n_components=1024, reduction=None, prompt=None, query=None, use_query_structure_in_decoder=True)[source]#

Fit an SVD on the embedding results of PoET.

This function will create an SVDModel based on the embeddings from this model as well as the hyperparameters specified in the arguments.

Parameters:
  • sequences (list of bytes or list of str or None, optional) – Sequences to fit SVD. If None, assay must be provided.

  • assay (AssayDataset or None, optional) – Assay containing sequences to fit SVD. Ignored if sequences are provided.

  • n_components (int, optional) – Number of components in SVD. Determines output shapes. Default is 1024.

  • reduction (ReductionType or None, optional) – Embeddings reduction to use (e.g. mean).

  • prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition PoET model.

  • query (str or bytes or Protein or Query or None, optional) – Query to use with prompt.

  • use_query_structure_in_decoder (bool, optional) – Whether to use query structure in decoder. Default is True.

Returns:

A future that represents the fitted SVD model.

Return type:

SVDModel

fit_umap(sequences=None, assay=None, n_components=2, reduction=ReductionType.MEAN, prompt=None, query=None, use_query_structure_in_decoder=True)[source]#

Fit a UMAP on the embedding results of PoET.

This function will create a UMAPModel based on the embeddings from this PoET model as well as the hyperparameters specified in the arguments.

Parameters:
  • sequences (list of bytes or list of str or None, optional) – Sequences to fit UMAP. If None, assay must be provided.

  • assay (AssayDataset or None, optional) – Assay containing sequences to fit UMAP. Ignored if sequences are provided.

  • n_components (int, optional) – Number of components in UMAP fit. Determines output shapes. Default is 2.

  • reduction (ReductionType or None, optional) – Embeddings reduction to use (e.g. mean). Default is ReductionType.MEAN.

  • prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition PoET model.

  • query (str or bytes or Protein or Query or None, optional) – Query to use with prompt.

  • use_query_structure_in_decoder (bool, optional) – Whether to use query structure in decoder. Default is True.

Returns:

A future that represents the fitted UMAP model.

Return type:

UMAPModel

fit_gp(assay, properties, prompt=None, query=None, use_query_structure_in_decoder=True, **kwargs)[source]#

Fit a Gaussian Process (GP) on an assay using this embedding model and hyperparameters.

Parameters:
  • assay (AssayMetadata or AssayDataset or str) – Assay to fit GP on.

  • properties (list of str) – Properties in the assay to fit the GP on.

  • prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition PoET model.

  • query (str or bytes or Protein or Query or None, optional) – Query to use with prompt.

  • use_query_structure_in_decoder (bool, optional) – Whether to use query structure in decoder. Default is True.

  • **kwargs – Additional keyword arguments.

Returns:

A future that represents the trained predictor model.

Return type:

PredictorModel

attn()#

Attention is not available for PoET.

Raises:

ValueError – Always raised, as attention is not supported for PoET.

classmethod create(session, model_id, default=None, **kwargs)#

Create and return an instance of the appropriate EmbeddingModel subclass based on the model_id.

Parameters:
  • session (APISession) – The API session to use.

  • model_id (str) – The model identifier.

  • default (type[EmbeddingModel] or None, optional) – Default EmbeddingModel subclass to use if no match is found.

  • **kwargs (dict, optional) – Additional keyword arguments to pass to the model constructor.

Returns:

An instance of the appropriate EmbeddingModel subclass.

Return type:

EmbeddingModel

Raises:

ValueError – If no suitable EmbeddingModel subclass is found and no default is provided.
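
The dispatch described above (match a subclass, else fall back to default, else raise) can be sketched as follows. This is an illustration of the pattern only, not the library's implementation: `SketchModel`, its subclasses, the `model_ids` attribute, and prefix matching are all assumptions made for the example.

```python
from typing import Any, List, Optional, Type


class SketchModel:
    """Hypothetical stand-in for EmbeddingModel; not the library class."""

    # Assumption: each subclass lists the model_id prefixes it handles.
    model_ids: List[str] = []

    def __init__(self, session: Any, model_id: str, **kwargs: Any) -> None:
        self.session = session
        self.model_id = model_id

    @classmethod
    def create(
        cls,
        session: Any,
        model_id: str,
        default: Optional[Type["SketchModel"]] = None,
        **kwargs: Any,
    ) -> "SketchModel":
        # Try each direct subclass in turn and return the first match.
        for subclass in cls.__subclasses__():
            if any(model_id.startswith(p) for p in subclass.model_ids):
                return subclass(session, model_id, **kwargs)
        # No match: use the default class if one was supplied.
        if default is not None:
            return default(session, model_id, **kwargs)
        raise ValueError(f"unsupported model_id: {model_id}")


class SketchESM(SketchModel):
    model_ids = ["esm"]


class SketchPoET(SketchModel):
    model_ids = ["poet"]
```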

get_metadata()#

Get model metadata for this model.

Returns:

The metadata associated with this model.

Return type:

ModelMetadata

classmethod get_model()#

Get the model_id(s) for this EmbeddingModel subclass.

Returns:

List of model_id strings associated with this class.

Return type:

list of str

property metadata#

ModelMetadata for this model.

Returns:

The metadata associated with this model.

Return type:

ModelMetadata

Results#

class openprotein.embeddings.EmbeddingsResultFuture[source]#

Future for manipulating results for embeddings-related requests.

__init__(session, job, sequences=None, max_workers=10)[source]#

Retrieve results from asynchronous, mapped endpoints.

Use max_workers > 0 to enable concurrent retrieval of multiple pages.

Parameters:
  • session (APISession)

  • job (EmbeddingsJob | AttnJob | LogitsJob)

  • sequences (list[bytes] | list[str] | None)

  • max_workers (int)

stream()[source]#

Retrieve results for this job as a stream.

get(verbose=False)[source]#

Return the results from this job.

Return type:

list[ndarray]

get_item(sequence)[source]#

Get embedding results for the specified sequence.

Parameters:

sequence (bytes) – Sequence to fetch results for.

Returns:

The embeddings for the sequence.

Return type:

np.ndarray
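
Since get_item() takes a single sequence as bytes, collecting results for several sequences takes a small loop. The sketch below uses a hypothetical helper, `results_by_sequence`, and assumes only the `get_item(sequence)` method documented above. The str-to-bytes conversion is a convenience added for the example.

```python
from typing import Any, Dict, Iterable, Union


def results_by_sequence(
    future: Any, sequences: Iterable[Union[str, bytes]]
) -> Dict[bytes, Any]:
    """Map each sequence (as bytes) to its result from future.get_item()."""
    results: Dict[bytes, Any] = {}
    for seq in sequences:
        # get_item is documented as taking bytes, so encode str inputs.
        key = seq.encode() if isinstance(seq, str) else seq
        results[key] = future.get_item(key)
    return results
```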

cancelled()#

Check if the job is cancelled.

Return type:

bool

done()#

Check if job is complete

Return type:

bool

refresh()#

Refresh job status.

wait(interval=5, timeout=None, verbose=False)#

Wait for job to complete, then fetch results.

Parameters:
  • interval (int, optional) – Time between polling attempts. Defaults to config.POLLING_INTERVAL.

  • timeout (int, optional) – Maximum time to wait. Defaults to None.

  • verbose (bool, optional) – Verbosity flag. Defaults to False.

Returns:

The results of the job.

Return type:

results

wait_until_done(interval=5, timeout=None, verbose=False)#

Wait for the job to complete without fetching results (unlike wait()).

Parameters:
  • interval (int, optional) – Time between polling attempts. Defaults to config.POLLING_INTERVAL.

  • timeout (int, optional) – Maximum time to wait. Defaults to None.

  • verbose (bool, optional) – Verbosity flag. Defaults to False.

Returns:

results of job

Return type:

results
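
The difference between wait() and wait_until_done() can be illustrated with a polling loop built from the documented done(), refresh() and get() methods. This is a sketch under stated assumptions, not the library's implementation: `poll_until_done` and `poll_and_get` are hypothetical names, and the loop assumes refresh() must be called for done() to change.

```python
import time
from typing import Any, Optional


def poll_until_done(
    future: Any, interval: float = 5.0, timeout: Optional[float] = None
) -> None:
    """Block until future.done() is true; do not fetch results."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while not future.done():
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError("job did not finish within the timeout")
        time.sleep(interval)
        # Assumption: the status only changes after an explicit refresh.
        future.refresh()


def poll_and_get(
    future: Any, interval: float = 5.0, timeout: Optional[float] = None
) -> Any:
    """Block until the job is done, then fetch and return its results."""
    poll_until_done(future, interval=interval, timeout=timeout)
    return future.get()
```

In practice the built-in wait() and wait_until_done() should be preferred; the sketch only makes the "fetch or do not fetch" distinction explicit.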

class openprotein.embeddings.EmbeddingsScoreFuture[source]#

Future for manipulating results for embeddings score-related requests.

__init__(session, job, sequences=None)[source]#
Parameters:
  • session (APISession)

  • job (ScoreJob | ScoreSingleSiteJob | GenerateJob)

  • sequences (list[bytes] | list[str] | None)

stream()[source]#

Return the results from this job as a generator.

Return type:

Generator
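
Because stream() returns a generator, results can be consumed lazily instead of being collected all at once. A minimal sketch, assuming only that `future.stream()` yields results one at a time; `take_streamed` is a hypothetical helper, and the shape of each yielded item is not specified here.

```python
from itertools import islice
from typing import Any, List


def take_streamed(future: Any, n: int) -> List[Any]:
    """Return at most the first n items yielded by future.stream()."""
    # islice stops pulling from the generator once n items are taken.
    return list(islice(future.stream(), n))
```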

cancelled()#

Check if the job is cancelled.

Return type:

bool

done()#

Check if job is complete

Return type:

bool

get(verbose=False, **kwargs)#

Return the results from this job.

Parameters:

verbose (bool)

Return type:

list

refresh()#

Refresh job status.

wait(interval=5, timeout=None, verbose=False)#

Wait for job to complete, then fetch results.

Parameters:
  • interval (int, optional) – Time between polling attempts. Defaults to config.POLLING_INTERVAL.

  • timeout (int, optional) – Maximum time to wait. Defaults to None.

  • verbose (bool, optional) – Verbosity flag. Defaults to False.

Returns:

The results of the job.

Return type:

results

wait_until_done(interval=5, timeout=None, verbose=False)#

Wait for the job to complete without fetching results (unlike wait()).

Parameters:
  • interval (int, optional) – Time between polling attempts. Defaults to config.POLLING_INTERVAL.

  • timeout (int, optional) – Maximum time to wait. Defaults to None.

  • verbose (bool, optional) – Verbosity flag. Defaults to False.

Returns:

results of job

Return type:

results

class openprotein.embeddings.EmbeddingsGenerateFuture[source]#

Future for manipulating results for embeddings generate-related requests.

__init__(session, job, sequences=None)#
Parameters:
  • session (APISession)

  • job (ScoreJob | ScoreSingleSiteJob | GenerateJob)

  • sequences (list[bytes] | list[str] | None)

cancelled()#

Check if the job is cancelled.

Return type:

bool

done()#

Check if job is complete

Return type:

bool

get(verbose=False, **kwargs)#

Return the results from this job.

Parameters:

verbose (bool)

Return type:

list

refresh()#

Refresh job status.

stream()#

Return the results from this job as a generator.

Return type:

Generator

wait(interval=5, timeout=None, verbose=False)#

Wait for job to complete, then fetch results.

Parameters:
  • interval (int, optional) – Time between polling attempts. Defaults to config.POLLING_INTERVAL.

  • timeout (int, optional) – Maximum time to wait. Defaults to None.

  • verbose (bool, optional) – Verbosity flag. Defaults to False.

Returns:

The results of the job.

Return type:

results

wait_until_done(interval=5, timeout=None, verbose=False)#

Wait for the job to complete without fetching results (unlike wait()).

Parameters:
  • interval (int, optional) – Time between polling attempts. Defaults to config.POLLING_INTERVAL.

  • timeout (int, optional) – Maximum time to wait. Defaults to None.

  • verbose (bool, optional) – Verbosity flag. Defaults to False.

Returns:

results of job

Return type:

results