openprotein.embeddings#
Create embeddings for your protein sequences using open-source and proprietary models!
Note that for PoET models, you will also need to use our align workflow.
Endpoints#
- class openprotein.embeddings.EmbeddingsAPI[source]#
Embeddings API providing the interface for creating embeddings using protein language models.
You can access all our models either via get_model() or directly through the session’s embedding attribute using the model’s ID and the desired method. For example, to use the attention method on the protein sequence model, you would use session.embedding.prot_seq.attn().
Examples
Accessing a model’s method:
```python
import openprotein

session = openprotein.connect(username="user", password="password")
# Call the attention method on the protein sequence model.
# attn() requires a list of sequences (bytes or str).
future = session.embedding.prot_seq.attn([b"MKTAYIAKQR"])
```
Using the get_model method:
```python
import openprotein

session = openprotein.connect(username="user", password="password")
# List available models:
print(session.embedding.list_models())
# Initialize a model by name:
model = session.embedding.get_model("prot-seq")
```
- prot_seq: OpenProteinModel#
- rotaprot_large_uniref50w: OpenProteinModel#
- rotaprot_large_uniref90_ft: OpenProteinModel#
- poet_2: PoET2Model#
- poet2: PoET2Model#
- __init__(session)[source]#
- Parameters:
session (APISession)
- list_models()[source]#
List the models available for creating embeddings of your sequences.
- Return type:
list[EmbeddingModel]
- get_model(name)[source]#
Get a model by its model ID.
The returned ProtembedModel supports the usual job manipulation, e.g. making POST and GET requests for this model specifically.
- Parameters:
name (str) – the model identifier
- Returns:
The model
- Return type:
ProtembedModel
- Raises:
HTTPError – If the GET request does not succeed.
Models#
- class openprotein.embeddings.OpenProteinModel[source]#
Class providing inference endpoints for proprietary protein embedding models served by OpenProtein.
Examples
View specific model details (including supported tokens) with the ? operator.
```python
>>> import openprotein
>>> session = openprotein.connect(username="user", password="password")
>>> session.embedding.prot_seq?
```
- model_id: list[str] | str = ['prot-seq', 'rotaprot-large-uniref50w', 'rotaprot_large_uniref90_ft']#
- __init__(session, model_id, metadata=None)#
- Parameters:
session (APISession)
model_id (str)
metadata (ModelMetadata | None)
- attn(sequences, **kwargs)#
Compute attention embeddings for sequences using this model.
- Parameters:
sequences (list of bytes or list of str) – Sequences to compute attention embeddings for.
**kwargs (dict, optional) – Additional keyword arguments to pass to the attention request.
- Returns:
Future object representing the attention result.
- Return type:
EmbeddingsResultFuture
- classmethod create(session, model_id, default=None, **kwargs)#
Create and return an instance of the appropriate EmbeddingModel subclass based on the model_id.
- Parameters:
session (APISession) – The API session to use.
model_id (str) – The model identifier.
default (type[EmbeddingModel] or None, optional) – Default EmbeddingModel subclass to use if no match is found.
**kwargs (dict, optional) – Additional keyword arguments to pass to the model constructor.
- Returns:
An instance of the appropriate EmbeddingModel subclass.
- Return type:
EmbeddingModel
- Raises:
ValueError – If no suitable EmbeddingModel subclass is found and no default is provided.
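The dispatch that create describes (pick the subclass whose model IDs include model_id, fall back to default, otherwise raise ValueError) can be sketched as a registry lookup. This is a minimal illustration, not the library's implementation: the Model base class, the MODEL_IDS attribute, and both subclasses are hypothetical stand-ins for EmbeddingModel and its model_id lists.

```python
from __future__ import annotations

from typing import ClassVar, Optional


class Model:
    """Hypothetical stand-in for EmbeddingModel."""

    # Each subclass lists the model IDs it serves (hypothetical attribute).
    MODEL_IDS: ClassVar[list[str]] = []

    def __init__(self, model_id: str) -> None:
        self.model_id = model_id

    @classmethod
    def get_model(cls) -> list[str]:
        # Mirrors get_model(): the model_id(s) associated with this subclass.
        return list(cls.MODEL_IDS)

    @classmethod
    def create(cls, model_id: str, default: Optional[type] = None) -> Model:
        # Return an instance of the first subclass that serves model_id.
        for subclass in cls.__subclasses__():
            if model_id in subclass.get_model():
                return subclass(model_id)
        if default is not None:
            return default(model_id)
        raise ValueError(f"no model class found for {model_id!r}")


class ProprietaryModel(Model):
    MODEL_IDS = ["prot-seq", "rotaprot-large-uniref50w"]


class ESMLikeModel(Model):
    MODEL_IDS = ["esm2_t12_35M_UR50D"]


print(type(Model.create("prot-seq")).__name__)  # ProprietaryModel
print(type(Model.create("unknown-id", default=ESMLikeModel)).__name__)  # ESMLikeModel
```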
- embed(sequences, reduction=ReductionType.MEAN, **kwargs)#
Embed sequences using this model.
- Parameters:
sequences (list of bytes or list of str) – Sequences to embed.
reduction (ReductionType or None, optional) – Reduction to use (e.g. mean). Defaults to mean embedding.
**kwargs (dict, optional) – Additional keyword arguments to pass to the embedding request.
- Returns:
Future object representing the embedding result.
- Return type:
EmbeddingsResultFuture
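To make the reduction parameter concrete: a protein language model produces one vector per residue, and a mean reduction collapses that L × D matrix into a single D-dimensional vector per sequence. The sketch below assumes that ReductionType.MEAN averages over residue positions; mean_reduce is a hypothetical helper and the numbers are made up, not real model output.

```python
def mean_reduce(per_residue: list[list[float]]) -> list[float]:
    """Average an L x D matrix of per-residue embeddings over the L residues."""
    if not per_residue:
        raise ValueError("need at least one residue embedding")
    length = len(per_residue)
    dim = len(per_residue[0])
    # Sum each embedding dimension across residues, then divide by sequence length.
    return [sum(row[d] for row in per_residue) / length for d in range(dim)]


# Toy 3-residue sequence with 4-dimensional embeddings (made-up values).
toy = [
    [1.0, 0.0, 2.0, 4.0],
    [3.0, 0.0, 2.0, 0.0],
    [2.0, 3.0, 2.0, 2.0],
]
print(mean_reduce(toy))  # [2.0, 1.0, 2.0, 2.0]
```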
- fit_gp(assay, properties, reduction, name=None, description=None, **kwargs)#
Fit a Gaussian Process (GP) on an assay using this embedding model and hyperparameters.
- Parameters:
assay (AssayMetadata, AssayDataset, or str) – Assay to fit GP on.
properties (list of str) – Properties in the assay to fit the GP on.
reduction (ReductionType) – Type of embedding reduction to use for computing features. Protein language models (PLMs) must use a reduction.
name (str or None, optional) – Optional name for the predictor model.
description (str or None, optional) – Optional description for the predictor model.
**kwargs (dict, optional) – Additional keyword arguments to pass to the GP fitting.
- Returns:
The fitted predictor model.
- Return type:
- Raises:
InvalidParameterError – If no properties are provided, properties are not a subset of assay measurements, or multitask GP is requested.
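For intuition about what fit_gp produces, here is a minimal Gaussian process regression over embedding vectors in pure Python: the posterior mean of a zero-mean GP, k(x*, X)(K + σ²I)⁻¹y. It is a textbook sketch, not OpenProtein's predictor. The RBF kernel, the fixed lengthscale and noise values (no hyperparameter fitting), and the toy features are all assumptions.

```python
import math


def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel between two feature vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2 * lengthscale ** 2))


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def gp_posterior_mean(train_x, train_y, test_x, lengthscale=1.0, noise=1e-6):
    """Posterior mean of a zero-mean GP with an RBF kernel and Gaussian noise."""
    n = len(train_x)
    gram = [
        [rbf(train_x[i], train_x[j], lengthscale) + (noise if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    alpha = solve(gram, train_y)  # alpha = (K + noise * I)^-1 y
    return [sum(rbf(x, xi, lengthscale) * a for xi, a in zip(train_x, alpha)) for x in test_x]


# Toy 2-D "embeddings" and one measured property (made-up values).
features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
measurements = [0.5, 1.5, -0.2]
print(gp_posterior_mean(features, measurements, features))  # close to the training values
print(gp_posterior_mean(features, measurements, [[5.0, 5.0]]))  # near 0, the prior mean
```

Far from the training data the prediction reverts to the prior mean, which is one reason GP predictors are usually paired with an uncertainty estimate.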
- fit_svd(sequences=None, assay=None, n_components=1024, reduction=None, **kwargs)#
Fit an SVD on the embedding results of this model.
This function will create an SVDModel based on the embeddings from this model as well as the hyperparameters specified in the arguments.
- Parameters:
sequences (list of bytes or list of str or None, optional) – Sequences to fit SVD on.
assay (AssayDataset or None, optional) – Assay containing sequences to fit SVD on.
n_components (int, optional) – Number of components in SVD. Determines output shapes. Default is 1024.
reduction (ReductionType or None, optional) – Embeddings reduction to use (e.g. mean).
**kwargs (dict, optional) – Additional keyword arguments to pass to the SVD fitting.
- Returns:
The fitted SVD model.
- Return type:
SVDModel
- Raises:
InvalidParameterError – If neither or both of assay and sequences are provided.
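n_components fixes the width of the reduced features: N embeddings of dimension D are projected to an N × n_components matrix. The pure-Python sketch below illustrates this for n_components=1 by finding the leading right singular vector of mean-centered embeddings with power iteration. Whether the service centers the data, and which SVD routine it uses, are assumptions; the helpers and toy matrix are hypothetical.

```python
import math


def center(rows):
    """Subtract the per-dimension mean from every row."""
    dim = len(rows[0])
    means = [sum(r[d] for r in rows) / len(rows) for d in range(dim)]
    return [[r[d] - means[d] for d in range(dim)] for r in rows]


def leading_component(centered, iters=200):
    """Leading right singular vector of X (top eigenvector of X^T X) via power iteration."""
    dim = len(centered[0])
    v = [1.0] * dim
    for _ in range(iters):
        u = [sum(a * b for a, b in zip(row, v)) for row in centered]  # u = X v
        w = [sum(row[d] * ui for row, ui in zip(centered, u)) for d in range(dim)]  # w = X^T u
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Toy embeddings (N=4, D=3) that vary mostly along the first axis (made-up values).
embeddings = [[2.0, 0.1, 0.0], [-2.0, -0.1, 0.0], [1.0, 0.0, 0.1], [-1.0, 0.0, -0.1]]
centered = center(embeddings)
v = leading_component(centered)
# Project each embedding onto the component: an N x n_components (4 x 1) result.
reduced = [[sum(a * b for a, b in zip(row, v))] for row in centered]
print(len(reduced), len(reduced[0]))  # 4 1
```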
- fit_umap(sequences=None, assay=None, n_components=2, reduction=ReductionType.MEAN, **kwargs)#
Fit a UMAP on the embedding results of this model.
This function will create a UMAPModel based on the embeddings from this model as well as the hyperparameters specified in the arguments.
- Parameters:
sequences (list of bytes or list of str or None, optional) – Optional sequences to fit UMAP with. Either use sequences or assay. Sequences is preferred.
assay (AssayDataset or None, optional) – Optional assay containing sequences to fit UMAP with. Either use sequences or assay. Ignored if sequences are provided.
n_components (int, optional) – Number of components in UMAP fit. Determines output shapes. Default is 2.
reduction (ReductionType or None, optional) – Embeddings reduction to use (e.g. mean). Defaults to MEAN.
**kwargs (dict, optional) – Additional keyword arguments to pass to the UMAP fitting.
- Returns:
The fitted UMAP model.
- Return type:
UMAPModel
- Raises:
InvalidParameterError – If neither or both of assay and sequences are provided.
- get_metadata()#
Get model metadata for this model.
- Returns:
The metadata associated with this model.
- Return type:
ModelMetadata
- classmethod get_model()#
Get the model_id(s) for this EmbeddingModel subclass.
- Returns:
List of model_id strings associated with this class.
- Return type:
list of str
- logits(sequences, **kwargs)#
Compute logit embeddings for sequences using this model.
- Parameters:
sequences (list of bytes or list of str) – Sequences to compute logits for.
**kwargs (dict, optional) – Additional keyword arguments to pass to the logits request.
- Returns:
Future object representing the logits result.
- Return type:
EmbeddingsResultFuture
- property metadata#
ModelMetadata for this model.
- Returns:
The metadata associated with this model.
- Return type:
ModelMetadata
- class openprotein.embeddings.ESMModel[source]#
Class providing inference endpoints for Facebook’s ESM protein language models.
Examples
View specific model details (including supported tokens) with the ? operator.
```python
>>> import openprotein
>>> session = openprotein.connect(username="user", password="password")
>>> session.embedding.esm2_t12_35M_UR50D?
```
- __init__(session, model_id, metadata=None)#
- Parameters:
session (APISession)
model_id (str)
metadata (ModelMetadata | None)
- attn(sequences, **kwargs)#
Compute attention embeddings for sequences using this model.
- Parameters:
sequences (list of bytes or list of str) – Sequences to compute attention embeddings for.
**kwargs (dict, optional) – Additional keyword arguments to pass to the attention request.
- Returns:
Future object representing the attention result.
- Return type:
EmbeddingsResultFuture
- classmethod create(session, model_id, default=None, **kwargs)#
Create and return an instance of the appropriate EmbeddingModel subclass based on the model_id.
- Parameters:
session (APISession) – The API session to use.
model_id (str) – The model identifier.
default (type[EmbeddingModel] or None, optional) – Default EmbeddingModel subclass to use if no match is found.
**kwargs (dict, optional) – Additional keyword arguments to pass to the model constructor.
- Returns:
An instance of the appropriate EmbeddingModel subclass.
- Return type:
EmbeddingModel
- Raises:
ValueError – If no suitable EmbeddingModel subclass is found and no default is provided.
- embed(sequences, reduction=ReductionType.MEAN, **kwargs)#
Embed sequences using this model.
- Parameters:
sequences (list of bytes or list of str) – Sequences to embed.
reduction (ReductionType or None, optional) – Reduction to use (e.g. mean). Defaults to mean embedding.
**kwargs (dict, optional) – Additional keyword arguments to pass to the embedding request.
- Returns:
Future object representing the embedding result.
- Return type:
EmbeddingsResultFuture
- fit_gp(assay, properties, reduction, name=None, description=None, **kwargs)#
Fit a Gaussian Process (GP) on an assay using this embedding model and hyperparameters.
- Parameters:
assay (AssayMetadata, AssayDataset, or str) – Assay to fit GP on.
properties (list of str) – Properties in the assay to fit the GP on.
reduction (ReductionType) – Type of embedding reduction to use for computing features. Protein language models (PLMs) must use a reduction.
name (str or None, optional) – Optional name for the predictor model.
description (str or None, optional) – Optional description for the predictor model.
**kwargs (dict, optional) – Additional keyword arguments to pass to the GP fitting.
- Returns:
The fitted predictor model.
- Return type:
- Raises:
InvalidParameterError – If no properties are provided, properties are not a subset of assay measurements, or multitask GP is requested.
- fit_svd(sequences=None, assay=None, n_components=1024, reduction=None, **kwargs)#
Fit an SVD on the embedding results of this model.
This function will create an SVDModel based on the embeddings from this model as well as the hyperparameters specified in the arguments.
- Parameters:
sequences (list of bytes or list of str or None, optional) – Sequences to fit SVD on.
assay (AssayDataset or None, optional) – Assay containing sequences to fit SVD on.
n_components (int, optional) – Number of components in SVD. Determines output shapes. Default is 1024.
reduction (ReductionType or None, optional) – Embeddings reduction to use (e.g. mean).
**kwargs (dict, optional) – Additional keyword arguments to pass to the SVD fitting.
- Returns:
The fitted SVD model.
- Return type:
SVDModel
- Raises:
InvalidParameterError – If neither or both of assay and sequences are provided.
- fit_umap(sequences=None, assay=None, n_components=2, reduction=ReductionType.MEAN, **kwargs)#
Fit a UMAP on the embedding results of this model.
This function will create a UMAPModel based on the embeddings from this model as well as the hyperparameters specified in the arguments.
- Parameters:
sequences (list of bytes or list of str or None, optional) – Optional sequences to fit UMAP with. Either use sequences or assay. Sequences is preferred.
assay (AssayDataset or None, optional) – Optional assay containing sequences to fit UMAP with. Either use sequences or assay. Ignored if sequences are provided.
n_components (int, optional) – Number of components in UMAP fit. Determines output shapes. Default is 2.
reduction (ReductionType or None, optional) – Embeddings reduction to use (e.g. mean). Defaults to MEAN.
**kwargs (dict, optional) – Additional keyword arguments to pass to the UMAP fitting.
- Returns:
The fitted UMAP model.
- Return type:
UMAPModel
- Raises:
InvalidParameterError – If neither or both of assay and sequences are provided.
- get_metadata()#
Get model metadata for this model.
- Returns:
The metadata associated with this model.
- Return type:
ModelMetadata
- classmethod get_model()#
Get the model_id(s) for this EmbeddingModel subclass.
- Returns:
List of model_id strings associated with this class.
- Return type:
list of str
- logits(sequences, **kwargs)#
Compute logit embeddings for sequences using this model.
- Parameters:
sequences (list of bytes or list of str) – Sequences to compute logits for.
**kwargs (dict, optional) – Additional keyword arguments to pass to the logits request.
- Returns:
Future object representing the logits result.
- Return type:
EmbeddingsResultFuture
- property metadata#
ModelMetadata for this model.
- Returns:
The metadata associated with this model.
- Return type:
ModelMetadata
- class openprotein.embeddings.PoETModel[source]#
Class for OpenProtein’s foundation model PoET.
Note
PoET functions are dependent on a prompt supplied via the prompt endpoints.
Examples
View specific model details (including supported tokens) with the ? operator.
```python
>>> import openprotein
>>> session = openprotein.connect(username="user", password="password")
>>> session.embedding.poet.<embeddings_method>
```
- __init__(session, model_id, metadata=None)[source]#
- Parameters:
session (APISession)
model_id (str)
metadata (ModelMetadata | None)
- embed(sequences, prompt=None, reduction=ReductionType.MEAN, **kwargs)[source]#
Embed sequences using the PoET model.
- Parameters:
sequences (list of bytes) – Sequences to embed.
prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition the PoET model.
reduction (ReductionType or None, optional) – Embeddings reduction to use (e.g., mean). Default is ReductionType.MEAN.
**kwargs – Additional keyword arguments.
- Returns:
Future object that returns the embeddings of the submitted sequences.
- Return type:
EmbeddingsResultFuture
- logits(sequences, prompt=None, **kwargs)[source]#
Compute logits for sequences using the PoET model.
- Parameters:
sequences (list of bytes) – Sequences to analyze.
prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition the PoET model.
**kwargs – Additional keyword arguments.
- Returns:
Future object that returns the logits of the submitted sequences.
- Return type:
EmbeddingsResultFuture
- attn()[source]#
Attention is not available for PoET.
- Raises:
ValueError – Always raised, as attention is not supported for PoET.
- score(sequences, prompt=None, **kwargs)[source]#
Score query sequences using the specified prompt.
- Parameters:
sequences (list of bytes) – Sequences to score.
prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition the PoET model.
**kwargs – Additional keyword arguments.
- Returns:
Future object that returns the scores of the submitted sequences.
- Return type:
EmbeddingsScoreFuture
- indel(sequence, prompt=None, insert=None, delete=None, **kwargs)[source]#
Score all indels of the query sequence using the specified prompt.
- Parameters:
sequence (bytes) – Sequence to analyze.
prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition the PoET model.
insert (str or None, optional) – Insertion fragment at each site.
delete (list of int or None, optional) – Range of fragment sizes to delete at each site.
**kwargs – Additional keyword arguments.
- Returns:
Future object that returns the scores of each indel variant of the sequence.
- Return type:
- Raises:
ValueError – If neither insert nor delete is provided.
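As a rough picture of what indel scores, the sketch below enumerates candidate variants: the insert fragment placed at every site, and, for each deletion size in delete, that many residues removed starting at every site. The site convention (insertion before each residue and at the end; deletions only where enough residues remain) and treating delete as a list of sizes are assumptions, so the service may enumerate differently. enumerate_indels is a hypothetical helper.

```python
from typing import Optional


def enumerate_indels(
    sequence: str, insert: Optional[str] = None, delete: Optional[list[int]] = None
) -> list[str]:
    """Hypothetical helper listing the indel variants of a sequence."""
    if insert is None and delete is None:
        # Mirrors the documented ValueError when neither option is given.
        raise ValueError("provide insert and/or delete")
    variants = []
    if insert:
        # Insert the fragment before each residue and once at the end.
        for site in range(len(sequence) + 1):
            variants.append(sequence[:site] + insert + sequence[site:])
    for size in delete or []:
        # Remove `size` residues starting at each site where that many remain.
        for site in range(len(sequence) - size + 1):
            variants.append(sequence[:site] + sequence[site + size:])
    return variants


print(enumerate_indels("MKT", insert="G"))  # ['GMKT', 'MGKT', 'MKGT', 'MKTG']
print(enumerate_indels("MKT", delete=[1]))  # ['KT', 'MT', 'MK']
```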
- single_site(sequence, prompt=None, **kwargs)[source]#
Score all single substitutions of the query sequence using the specified prompt.
- Parameters:
sequence (bytes) – Sequence to analyze.
prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition the PoET model.
**kwargs – Additional keyword arguments.
- Returns:
Future object that returns the scores of all single-substitution variants.
- Return type:
EmbeddingsScoreFuture
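single_site covers every single substitution of the query. Assuming the 20 standard amino acids as the alphabet (the model's actual token set may differ; check its metadata), a sequence of length L has 19 × L such variants. The sketch below uses a hypothetical helper, single_substitutions, and labels variants in the common wild-type/position/mutant notation with 1-based positions; the library's own result keys may be formatted differently.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed alphabet: 20 standard residues


def single_substitutions(sequence: str) -> dict[str, str]:
    """Map mutation codes such as 'M1A' to the corresponding mutated sequence."""
    variants = {}
    for index, wild_type in enumerate(sequence):
        for residue in AMINO_ACIDS:
            if residue == wild_type:
                continue  # skip the no-op "substitution"
            code = f"{wild_type}{index + 1}{residue}"
            variants[code] = sequence[:index] + residue + sequence[index + 1:]
    return variants


mutants = single_substitutions("MKT")
print(len(mutants))  # 57 (19 substitutions x 3 positions)
print(mutants["K2A"])  # MAT
```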
- generate(prompt, num_samples=100, temperature=1.0, topk=None, topp=None, max_length=1000, seed=None, **kwargs)[source]#
Generate protein sequences conditioned on a prompt.
- Parameters:
prompt (str or Prompt) – Prompt from an align workflow to condition the PoET model.
num_samples (int, optional) – Number of samples to generate. Default is 100.
temperature (float, optional) – Temperature for sampling. Higher values produce more random outputs. Default is 1.0.
topk (float or None, optional) – Number of top-k residues to consider during sampling. Default is None.
topp (float or None, optional) – Cumulative probability threshold for top-p sampling. Default is None.
max_length (int, optional) – Maximum length of generated proteins. Default is 1000.
seed (int or None, optional) – Seed for random number generation. Default is None.
**kwargs – Additional keyword arguments.
- Returns:
Future object representing the status and information about the generation job.
- Return type:
EmbeddingsGenerateFuture
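The temperature, topk and topp parameters follow the usual conventions for sampling from a language model. The sketch below shows one common way to apply them to a single next-residue distribution: divide logits by the temperature, keep the top-k tokens, renormalize, keep the smallest prefix whose cumulative probability reaches top-p, and renormalize again. It is a generic illustration with made-up logits and a hypothetical filtered_distribution helper; PoET's sampler may apply these steps differently.

```python
import math
import random


def filtered_distribution(logits, temperature=1.0, topk=None, topp=None):
    """Temperature-scale logits, then apply top-k and top-p (nucleus) filtering."""
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    # Unnormalized softmax weights, shifted by the max for numerical stability.
    weights = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    ranked = sorted(((w, tok) for tok, w in weights.items()), reverse=True)
    if topk is not None:
        ranked = ranked[: int(topk)]  # topk is documented as a float; treat it as a count
    total = sum(w for w, _ in ranked)
    ranked = [(w / total, tok) for w, tok in ranked]
    if topp is not None:
        kept, cumulative = [], 0.0
        for prob, tok in ranked:
            kept.append((prob, tok))
            cumulative += prob
            if cumulative >= topp:
                break
        ranked = kept
    norm = sum(p for p, _ in ranked)
    return {tok: p / norm for p, tok in ranked}


logits = {"A": 2.0, "G": 1.0, "L": 0.5, "W": -1.0}  # made-up next-residue logits
dist = filtered_distribution(logits, temperature=1.0, topk=3, topp=0.8)
rng = random.Random(0)  # a fixed seed, analogous to the `seed` parameter
residue = rng.choices(list(dist), weights=list(dist.values()))[0]
print(sorted(dist))  # ['A', 'G']
```

Raising the temperature flattens the distribution before filtering, so more residues survive the top-p cut.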
- fit_svd(prompt=None, sequences=None, assay=None, n_components=1024, reduction=None, **kwargs)[source]#
Fit an SVD on the embedding results of PoET.
This function creates an SVDModel based on the embeddings from this model as well as the hyperparameters specified in the arguments.
- Parameters:
prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition the PoET model.
sequences (list of bytes or list of str or None, optional) – Sequences to use for SVD.
assay (AssayDataset or None, optional) – Assay dataset to use for SVD.
n_components (int, optional) – Number of components in SVD. Determines output shapes. Default is 1024.
reduction (ReductionType or None, optional) – Embeddings reduction to use (e.g., mean).
**kwargs – Additional keyword arguments.
- Returns:
Future that represents the fitted SVD model.
- Return type:
SVDModel
- fit_umap(prompt=None, sequences=None, assay=None, n_components=2, reduction=ReductionType.MEAN, **kwargs)[source]#
Fit a UMAP on assay using PoET and hyperparameters.
This function creates a UMAP based on the embeddings from this PoET model as well as the hyperparameters specified in the arguments.
- Parameters:
prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition the PoET model.
sequences (list of bytes or list of str or None, optional) – Optional sequences to fit UMAP with. Either use sequences or assay. Sequences is preferred.
assay (AssayDataset or None, optional) – Optional assay containing sequences to fit UMAP with. Either use sequences or assay. Ignored if sequences are provided.
n_components (int, optional) – Number of components in UMAP fit. Determines output shapes. Default is 2.
reduction (ReductionType or None, optional) – Embeddings reduction to use (e.g., mean). Default is ReductionType.MEAN.
**kwargs – Additional keyword arguments.
- Returns:
Future that represents the fitted UMAP model.
- Return type:
UMAPModel
- fit_gp(assay, properties, prompt=None, **kwargs)[source]#
Fit a Gaussian Process (GP) on assay using this embedding model and hyperparameters.
- Parameters:
assay (AssayMetadata or AssayDataset or str) – Assay to fit GP on.
properties (list of str) – Properties in the assay to fit the GP on.
prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition the PoET model.
**kwargs – Additional keyword arguments.
- Returns:
Future that represents the trained predictor model.
- Return type:
- classmethod create(session, model_id, default=None, **kwargs)#
Create and return an instance of the appropriate EmbeddingModel subclass based on the model_id.
- Parameters:
session (APISession) – The API session to use.
model_id (str) – The model identifier.
default (type[EmbeddingModel] or None, optional) – Default EmbeddingModel subclass to use if no match is found.
**kwargs (dict, optional) – Additional keyword arguments to pass to the model constructor.
- Returns:
An instance of the appropriate EmbeddingModel subclass.
- Return type:
EmbeddingModel
- Raises:
ValueError – If no suitable EmbeddingModel subclass is found and no default is provided.
- get_metadata()#
Get model metadata for this model.
- Returns:
The metadata associated with this model.
- Return type:
ModelMetadata
- classmethod get_model()#
Get the model_id(s) for this EmbeddingModel subclass.
- Returns:
List of model_id strings associated with this class.
- Return type:
list of str
- property metadata#
ModelMetadata for this model.
- Returns:
The metadata associated with this model.
- Return type:
ModelMetadata
- class openprotein.embeddings.PoET2Model[source]#
Class for OpenProtein’s foundation model PoET 2.
PoET functions are dependent on a prompt supplied via the prompt endpoints.
Examples
View specific model details (including supported tokens) with the ? operator.
```python
>>> import openprotein
>>> session = openprotein.connect(username="user", password="password")
>>> session.embedding.poet2.<embeddings_method>
```
- __init__(session, model_id, metadata=None)[source]#
- Parameters:
session (APISession)
model_id (str)
metadata (ModelMetadata | None)
- embed(sequences, reduction=ReductionType.MEAN, prompt=None, query=None, use_query_structure_in_decoder=True, decoder_type=None)[source]#
Embed sequences using this model.
- Parameters:
sequences (list of bytes) – Sequences to embed.
reduction (ReductionType or None, optional) – Embeddings reduction to use (e.g. mean). Default is ReductionType.MEAN.
prompt (str or Prompt or None, optional) – A Prompt object, a prompt_id, or a prompt from an align workflow, used to condition the PoET model.
query (str or bytes or Protein or Query or None, optional) – Query to use with prompt.
use_query_structure_in_decoder (bool, optional) – Whether to use query structure in decoder. Default is True.
decoder_type ({'mlm', 'clm'} or None, optional) – Decoder type. Default is None.
- Returns:
A future object that returns the embeddings of the submitted sequences.
- Return type:
EmbeddingsResultFuture
- logits(sequences, prompt=None, query=None, use_query_structure_in_decoder=True, decoder_type=None)[source]#
Compute logit embeddings for sequences using this model.
- Parameters:
sequences (list of bytes) – Sequences to analyze.
prompt (str or Prompt or None, optional) – A Prompt object, a prompt_id, or a prompt from an align workflow, used to condition the PoET model.
query (str or bytes or Protein or Query or None, optional) – Query to use with prompt.
use_query_structure_in_decoder (bool, optional) – Whether to use query structure in decoder. Default is True.
decoder_type ({'mlm', 'clm'} or None, optional) – Decoder type. Default is None.
- Returns:
A future object that returns the logits of the submitted sequences.
- Return type:
EmbeddingsResultFuture
- score(sequences, prompt=None, query=None, use_query_structure_in_decoder=True, decoder_type=None)[source]#
Score query sequences using the specified prompt.
- Parameters:
sequences (list of bytes) – Sequences to score.
prompt (str or Prompt or None, optional) – A Prompt object, a prompt_id, or a prompt from an align workflow, used to condition the PoET model.
query (str or bytes or Protein or Query or None, optional) – Query to use with prompt.
use_query_structure_in_decoder (bool, optional) – Whether to use query structure in decoder. Default is True.
decoder_type ({'mlm', 'clm'} or None, optional) – Decoder type. Default is None.
- Returns:
A future object that returns the scores of the submitted sequences.
- Return type:
EmbeddingsScoreFuture
- indel(sequence, prompt=None, query=None, use_query_structure_in_decoder=True, decoder_type=None, insert=None, delete=None, **kwargs)[source]#
Score all indels of the query sequence using the specified prompt.
- Parameters:
sequence (bytes) – Sequence to analyze.
prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition the PoET model.
query (str or bytes or Protein or Query or None, optional) – Query to use with prompt.
use_query_structure_in_decoder (bool, optional) – Whether to use query structure in decoder. Default is True.
decoder_type ({'mlm', 'clm'} or None, optional) – Decoder type. Default is None.
insert (str or None, optional) – Insertion fragment at each site.
delete (list of int or None, optional) – Range of fragment sizes to delete at each site.
**kwargs – Additional keyword arguments.
- Returns:
A future object that returns the scores of each indel variant of the sequence.
- Return type:
- Raises:
ValueError – If neither insert nor delete is provided.
- single_site(sequence, prompt=None, query=None, use_query_structure_in_decoder=True, decoder_type=None)[source]#
Score all single substitutions of the query sequence using the specified prompt.
- Parameters:
sequence (bytes) – Sequence to analyze.
prompt (str or Prompt or None, optional) – A Prompt object, a prompt_id, or a prompt from an align workflow, used to condition the PoET model.
query (str or bytes or Protein or Query or None, optional) – Query to use with prompt.
use_query_structure_in_decoder (bool, optional) – Whether to use query structure in decoder. Default is True.
decoder_type ({'mlm', 'clm'} or None, optional) – Decoder type. Default is None.
- Returns:
A future object that returns the scores of all single-substitution variants.
- Return type:
EmbeddingsScoreFuture
- generate(prompt, query=None, use_query_structure_in_decoder=True, num_samples=100, temperature=1.0, topk=None, topp=None, max_length=1000, seed=None, ensemble_weights=None, ensemble_method=None)[source]#
Generate protein sequences conditioned on a prompt.
- Parameters:
prompt (str or Prompt) – Prompt from an align workflow to condition PoET model.
query (str or bytes or Protein or Query or None, optional) – Query to use with prompt.
use_query_structure_in_decoder (bool, optional) – Whether to use query structure in decoder. Default is True.
num_samples (int, optional) – The number of samples to generate. Default is 100.
temperature (float, optional) – The temperature for sampling. Higher values produce more random outputs. Default is 1.0.
topk (float or None, optional) – The number of top-k residues to consider during sampling. Default is None.
topp (float or None, optional) – The cumulative probability threshold for top-p sampling. Default is None.
max_length (int, optional) – The maximum length of generated proteins. Default is 1000.
seed (int or None, optional) – Seed for random number generation. Default is None.
ensemble_weights (Sequence of float or None, optional) – Weights for combining likelihoods from multiple prompts in the ensemble. The length of this sequence must match the number of prompts. All weights must be finite. If ensemble_method is “arithmetic”, then weights must also be non-negative, and have a non-zero sum.
ensemble_method ({'arithmetic', 'geometric'} or None, optional) – Method used to combine likelihoods from multiple prompts in the ensemble. If “arithmetic”, the weighted mean is used; if “geometric”, the weighted geometric mean is used. If None (default), the method defaults to “arithmetic”, but this behavior may change in the future.
- Returns:
A future object representing the status and information about the generation job.
- Return type:
EmbeddingsGenerateFuture
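The two ensemble_method options can be illustrated on per-prompt likelihoods of a single sequence. The sketch below works in log space and assumes the weights are normalized by their sum in both cases; the exact normalization PoET-2 uses, and whether it combines per-token or per-sequence likelihoods, are assumptions. combine_log_likelihoods is hypothetical; its validation mirrors the documented rules (one finite weight per prompt; for "arithmetic", non-negative weights with a non-zero sum) and the documented default of "arithmetic".

```python
import math
from typing import Optional, Sequence


def combine_log_likelihoods(
    log_likelihoods: Sequence[float],
    weights: Optional[Sequence[float]] = None,
    method: Optional[str] = None,
) -> float:
    """Combine per-prompt log-likelihoods with a weighted arithmetic or geometric mean."""
    method = method or "arithmetic"  # documented default when None
    if weights is None:
        weights = [1.0] * len(log_likelihoods)
    if len(weights) != len(log_likelihoods):
        raise ValueError("need one weight per prompt")
    if not all(math.isfinite(w) for w in weights):
        raise ValueError("weights must be finite")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must have a non-zero sum")  # assumed for geometric too
    if method == "arithmetic":
        if any(w < 0 for w in weights):
            raise ValueError("arithmetic weights must be non-negative")
        # log(sum_i w_i * p_i / sum_i w_i), computed stably relative to the largest
        # log-likelihood among prompts with positive weight.
        peak = max(ll for w, ll in zip(weights, log_likelihoods) if w > 0)
        mixed = sum(w * math.exp(ll - peak) for w, ll in zip(weights, log_likelihoods))
        return peak + math.log(mixed / total)
    if method == "geometric":
        # The log of a weighted geometric mean is the weighted mean of the logs.
        return sum(w * ll for w, ll in zip(weights, log_likelihoods)) / total
    raise ValueError(f"unknown ensemble method: {method!r}")


lls = [math.log(0.2), math.log(0.05)]  # made-up likelihoods from two prompts
print(round(math.exp(combine_log_likelihoods(lls, [1.0, 1.0], "arithmetic")), 6))  # 0.125
print(round(math.exp(combine_log_likelihoods(lls, [1.0, 1.0], "geometric")), 6))  # 0.1
```

The geometric mean favors sequences that every prompt finds plausible, while the arithmetic mean can be dominated by a single prompt that assigns high likelihood.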
- fit_svd(sequences=None, assay=None, n_components=1024, reduction=None, prompt=None, query=None, use_query_structure_in_decoder=True)[source]#
Fit an SVD on the embedding results of PoET.
This function will create an SVDModel based on the embeddings from this model as well as the hyperparameters specified in the arguments.
- Parameters:
sequences (list of bytes or list of str or None, optional) – Sequences to fit SVD. If None, assay must be provided.
assay (AssayDataset or None, optional) – Assay containing sequences to fit SVD. Ignored if sequences are provided.
n_components (int, optional) – Number of components in SVD. Determines output shapes. Default is 1024.
reduction (ReductionType or None, optional) – Embeddings reduction to use (e.g. mean).
prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition PoET model.
query (str or bytes or Protein or Query or None, optional) – Query to use with prompt.
use_query_structure_in_decoder (bool, optional) – Whether to use query structure in decoder. Default is True.
- Returns:
A future that represents the fitted SVD model.
- Return type:
SVDModel
- fit_umap(sequences=None, assay=None, n_components=2, reduction=ReductionType.MEAN, prompt=None, query=None, use_query_structure_in_decoder=True)[source]#
Fit a UMAP on assay using PoET and hyperparameters.
This function will create a UMAP based on the embeddings from this PoET model as well as the hyperparameters specified in the arguments.
- Parameters:
sequences (list of bytes or list of str or None, optional) – Sequences to fit UMAP. If None, assay must be provided.
assay (AssayDataset or None, optional) – Assay containing sequences to fit UMAP. Ignored if sequences are provided.
n_components (int, optional) – Number of components in UMAP fit. Determines output shapes. Default is 2.
reduction (ReductionType or None, optional) – Embeddings reduction to use (e.g. mean). Default is ReductionType.MEAN.
prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition PoET model.
query (str or bytes or Protein or Query or None, optional) – Query to use with prompt.
use_query_structure_in_decoder (bool, optional) – Whether to use query structure in decoder. Default is True.
- Returns:
A future that represents the fitted UMAP model.
- Return type:
UMAPModel
- fit_gp(assay, properties, prompt=None, query=None, use_query_structure_in_decoder=True, **kwargs)[source]#
Fit a Gaussian Process (GP) on assay using this embedding model and hyperparameters.
- Parameters:
assay (AssayMetadata or AssayDataset or str) – Assay to fit GP on.
properties (list of str) – Properties in the assay to fit the GP on.
prompt (str or Prompt or None, optional) – Prompt from an align workflow to condition PoET model.
query (str or bytes or Protein or Query or None, optional) – Query to use with prompt.
use_query_structure_in_decoder (bool, optional) – Whether to use query structure in decoder. Default is True.
**kwargs – Additional keyword arguments.
- Returns:
A future that represents the trained predictor model.
- Return type:
- attn()#
Attention is not available for PoET.
- Raises:
ValueError – Always raised, as attention is not supported for PoET.
- classmethod create(session, model_id, default=None, **kwargs)#
Create and return an instance of the appropriate EmbeddingModel subclass based on the model_id.
- Parameters:
session (APISession) – The API session to use.
model_id (str) – The model identifier.
default (type[EmbeddingModel] or None, optional) – Default EmbeddingModel subclass to use if no match is found.
**kwargs (dict, optional) – Additional keyword arguments to pass to the model constructor.
- Returns:
An instance of the appropriate EmbeddingModel subclass.
- Return type:
EmbeddingModel
- Raises:
ValueError – If no suitable EmbeddingModel subclass is found and no default is provided.
- get_metadata()#
Get model metadata for this model.
- Returns:
The metadata associated with this model.
- Return type:
ModelMetadata
- classmethod get_model()#
Get the model_id(s) for this EmbeddingModel subclass.
- Returns:
List of model_id strings associated with this class.
- Return type:
list of str
- property metadata#
ModelMetadata for this model.
- Returns:
The metadata associated with this model.
- Return type:
ModelMetadata
Results#
- class openprotein.embeddings.EmbeddingsResultFuture[source]#
Future for manipulating results for embeddings-related requests.
- __init__(session, job, sequences=None, max_workers=10)[source]#
Retrieve results from asynchronous, mapped endpoints.
Use max_workers > 0 to enable concurrent retrieval of multiple pages.
- Parameters:
session (APISession)
job (EmbeddingsJob | AttnJob | LogitsJob)
sequences (list[bytes] | list[str] | None)
max_workers (int)
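The max_workers option controls concurrent retrieval of result pages. A minimal sketch of that pattern with concurrent.futures.ThreadPoolExecutor follows. fetch_page is a hypothetical stand-in for one paged results request (it builds placeholder results locally), and the paging scheme is an assumption; treating max_workers <= 0 as sequential follows the note that values above 0 enable concurrency.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_page(page: int, page_size: int = 2) -> list[str]:
    """Hypothetical stand-in for a single paged results request."""
    start = page * page_size
    return [f"result-{i}" for i in range(start, start + page_size)]


def fetch_all(num_pages: int, max_workers: int = 10) -> list[str]:
    """Fetch all pages, concurrently when max_workers > 0, preserving page order."""
    if max_workers <= 0:
        pages = [fetch_page(p) for p in range(num_pages)]  # sequential fallback
    else:
        with ThreadPoolExecutor(max_workers=max_workers) as executor:
            # executor.map yields results in input order, even if pages finish out of order.
            pages = list(executor.map(fetch_page, range(num_pages)))
    return [item for page in pages for item in page]


print(fetch_all(3))  # ['result-0', 'result-1', 'result-2', 'result-3', 'result-4', 'result-5']
```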
- get_item(sequence)[source]#
Get embedding results for specified sequence.
- Parameters:
sequence (bytes) – Sequence to fetch results for.
- Returns:
The embeddings for the sequence.
- Return type:
np.ndarray
- cancelled()#
Check if the job is cancelled.
- Return type:
bool
- done()#
Check if job is complete
- Return type:
bool
- refresh()#
Refresh job status.
- wait(interval=5, timeout=None, verbose=False)#
Wait for job to complete, then fetch results.
- Parameters:
interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.
timeout (int, optional) – max time to wait. Defaults to None.
verbose (bool, optional) – verbosity flag. Defaults to False.
- Returns:
results of job
- Return type:
results
- wait_until_done(interval=5, timeout=None, verbose=False)#
Wait for the job to complete without fetching results (unlike wait()).
- Parameters:
interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.
timeout (int, optional) – max time to wait. Defaults to None.
verbose (bool, optional) – verbosity flag. Defaults to False.
- Returns:
results of job
- Return type:
results
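wait and wait_until_done follow a standard poll-until-done pattern: refresh the status, stop when done, otherwise sleep for interval and give up after timeout. The sketch below shows that pattern with a hypothetical FakeJob whose refresh()/done() methods loosely mirror the future's; it is not the library's implementation, and raising TimeoutError on timeout is an assumption.

```python
import time
from typing import Optional


class FakeJob:
    """Hypothetical job that reports completion after a fixed number of refreshes."""

    def __init__(self, polls_needed: int) -> None:
        self.polls_left = polls_needed

    def refresh(self) -> None:
        self.polls_left -= 1  # pretend the server made progress

    def done(self) -> bool:
        return self.polls_left <= 0


def wait_until_done(job: FakeJob, interval: float = 5, timeout: Optional[float] = None) -> int:
    """Poll job.refresh()/job.done() every `interval` seconds; return the number of polls."""
    start = time.monotonic()
    polls = 0
    while True:
        job.refresh()
        polls += 1
        if job.done():
            return polls
        if timeout is not None and time.monotonic() - start > timeout:
            raise TimeoutError(f"job not done after {timeout} seconds")
        time.sleep(interval)


print(wait_until_done(FakeJob(polls_needed=3), interval=0.01))  # 3
```

In this pattern, wait would simply call wait_until_done and then fetch the results with get().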
- class openprotein.embeddings.EmbeddingsScoreFuture[source]#
Future for manipulating results for embeddings score-related requests.
- __init__(session, job, sequences=None)[source]#
- Parameters:
session (APISession)
job (ScoreJob | ScoreSingleSiteJob | GenerateJob)
sequences (list[bytes] | list[str] | None)
- cancelled()#
Check if the job is cancelled.
- Return type:
bool
- done()#
Check if job is complete
- Return type:
bool
- get(verbose=False, **kwargs)#
Return the results from this job.
- Parameters:
verbose (bool)
- Return type:
list
- refresh()#
Refresh job status.
- wait(interval=5, timeout=None, verbose=False)#
Wait for job to complete, then fetch results.
- Parameters:
interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.
timeout (int, optional) – max time to wait. Defaults to None.
verbose (bool, optional) – verbosity flag. Defaults to False.
- Returns:
results of job
- Return type:
results
- wait_until_done(interval=5, timeout=None, verbose=False)#
Wait for the job to complete without fetching results (unlike wait()).
- Parameters:
interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.
timeout (int, optional) – max time to wait. Defaults to None.
verbose (bool, optional) – verbosity flag. Defaults to False.
- Returns:
results of job
- Return type:
results
- class openprotein.embeddings.EmbeddingsGenerateFuture[source]#
Future for manipulating results for embeddings generate-related requests.
- __init__(session, job, sequences=None)#
- Parameters:
session (APISession)
job (ScoreJob | ScoreSingleSiteJob | GenerateJob)
sequences (list[bytes] | list[str] | None)
- cancelled()#
Check if the job is cancelled.
- Return type:
bool
- done()#
Check if job is complete
- Return type:
bool
- get(verbose=False, **kwargs)#
Return the results from this job.
- Parameters:
verbose (bool)
- Return type:
list
- refresh()#
Refresh job status.
- stream()#
Return the results from this job as a generator.
- Return type:
Generator
- wait(interval=5, timeout=None, verbose=False)#
Wait for job to complete, then fetch results.
- Parameters:
interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.
timeout (int, optional) – max time to wait. Defaults to None.
verbose (bool, optional) – verbosity flag. Defaults to False.
- Returns:
results of job
- Return type:
results
- wait_until_done(interval=5, timeout=None, verbose=False)#
Wait for the job to complete without fetching results (unlike wait()).
- Parameters:
interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.
timeout (int, optional) – max time to wait. Defaults to None.
verbose (bool, optional) – verbosity flag. Defaults to False.
- Returns:
results of job
- Return type:
results