openprotein.api.embedding#
Create embeddings for your protein sequences using open-source and proprietary models!
Note that for PoET models, you will also need to use our align workflow.
Endpoints#
- class openprotein.api.embedding.EmbeddingAPI[source]#
This class defines a high level interface for accessing the embeddings API.
You can access all our models either via get_model() or directly through the session’s embedding attribute, using the model’s ID and the desired method. For example, to use the attention method on the protein sequence model, you would use session.embedding.prot_seq.attn().
Examples
Accessing a model’s method:
# To call the attention method on the protein sequence model:
import openprotein
session = openprotein.connect(username="user", password="password")
session.embedding.prot_seq.attn()
Using the get_model method:
# Get a model instance by name:
import openprotein
session = openprotein.connect(username="user", password="password")
# list available models:
print(session.embedding.list_models())
# init model by name
model = session.embedding.get_model('prot-seq')
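The two access paths above differ in naming: the attribute form is prot_seq, while get_model is called with 'prot-seq'. The following is a minimal sketch of that relationship, assuming the attribute name is simply the model ID with hyphens replaced by underscores. That mapping is inferred from the examples, not confirmed by the library. `attribute_name` and `FakeEmbeddingAPI` are hypothetical stand-ins, and no network calls are made.

```python
def attribute_name(model_id: str) -> str:
    """Turn a model ID such as 'prot-seq' into a valid attribute name.

    Assumption: hyphens map to underscores, as the prot_seq / 'prot-seq'
    pair in the examples suggests.
    """
    return model_id.replace("-", "_")


class FakeEmbeddingAPI:
    """Hypothetical stand-in for session.embedding; no requests are sent."""

    def __init__(self, model_ids):
        self._models = {}
        for model_id in model_ids:
            model = {"model_id": model_id}  # placeholder for a model object
            self._models[model_id] = model
            # Expose the same object as an attribute, e.g. api.prot_seq
            setattr(self, attribute_name(model_id), model)

    def get_model(self, name):
        # Look the model up by its ID, mirroring get_model(name)
        return self._models[name]


api = FakeEmbeddingAPI(["prot-seq"])
same_object = api.prot_seq is api.get_model("prot-seq")
```

In this sketch both paths return the same object, so the choice between them is a matter of convenience: attribute access reads well interactively, while lookup by ID works when the ID is only known at run time.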
- prot_seq: OpenProteinModel#
- rotaprot_large_uniref50w: OpenProteinModel#
- rotaprot_large_uniref90_ft: OpenProteinModel#
- __init__(session)[source]#
- Parameters:
session (APISession)
- list_models()[source]#
List the models available for creating embeddings of your sequences.
- Return type:
List[ProtembedModel]
- get_model(name)[source]#
Get a model by its model ID.
ProtembedModel allows all the usual job manipulation, e.g. making POST and GET requests for this model specifically.
- Parameters:
name (str) – the model identifier (model_id)
- Returns:
The model
- Return type:
ProtembedModel
- Raises:
HTTPError – If the GET request does not succeed.
Models#
- class openprotein.api.embedding.OpenProteinModel[source]#
Class providing inference endpoints for proprietary protein embedding models served by OpenProtein.
Examples
View specific model details (including supported tokens) with the ? operator.
import openprotein
session = openprotein.connect(username="user", password="password")
session.embedding.prot_seq?
- __init__(session, model_id, metadata=None)#
- attn(sequences)#
Attention embeddings for sequences using this model.
- Parameters:
sequences (List[bytes]) – sequences to compute attention embeddings for
- Return type:
- embed(sequences, reduction='MEAN')#
Embed sequences using this model.
- Parameters:
sequences (List[bytes]) – sequences to embed
reduction (str) – embeddings reduction to use (e.g. mean)
- Return type:
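To make the reduction parameter concrete, here is a minimal sketch of what a mean reduction does to per-residue embeddings. It assumes that 'MEAN' averages over the sequence-length axis to give one fixed-length vector per sequence. The values are made up, `mean_reduce` is a hypothetical helper, and the server-side computation may differ.

```python
def mean_reduce(per_residue):
    """Average per-residue vectors into one fixed-length vector."""
    length = len(per_residue)
    dim = len(per_residue[0])
    return [sum(row[d] for row in per_residue) / length for d in range(dim)]


# Three residues, each with a 2-dimensional embedding (made-up values)
per_residue = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pooled = mean_reduce(per_residue)
print(pooled)  # [3.0, 4.0]
```

The pooled vector has the embedding dimension regardless of sequence length, which is what lets sequences of different lengths be compared directly.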
- fit_svd(sequences, n_components=1024, reduction=None)#
Fit an SVD on the embedding results of this model.
This function will create an SVDModel based on the embeddings from this model as well as the hyperparameters specified in the args.
- Parameters:
sequences (List[bytes]) – sequences to SVD
n_components (int) – number of components in the SVD; determines the output shape
reduction (str) – embeddings reduction to use (e.g. mean)
- Return type:
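The effect of n_components on output shape can be illustrated with a plain projection. This is a sketch only: the component vectors below are hand-picked placeholders rather than the result of a fitted SVD, and `project` is a hypothetical helper, not part of the library.

```python
def project(embeddings, components):
    """Project each embedding onto each component (a plain matrix product)."""
    return [
        [sum(e * c for e, c in zip(emb, comp)) for comp in components]
        for emb in embeddings
    ]


# Two sequences with 4-dimensional embeddings (made-up values)
embeddings = [[1.0, 0.0, 2.0, 0.0], [0.0, 3.0, 0.0, 1.0]]
# n_components = 2: hypothetical component vectors, not a fitted SVD
components = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]

reduced = project(embeddings, components)
shape = (len(reduced), len(reduced[0]))
print(shape)  # (2, 2): one row per sequence, one column per component
```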
- get_metadata()#
Get model metadata for this model.
- Return type:
ModelMetadata
- classmethod get_model()#
- logits(sequences)#
Logit embeddings for sequences using this model.
- Parameters:
sequences (List[bytes]) – sequences to compute logits for
- Return type:
- property metadata#
- model_id = 'protembed'#
- class openprotein.api.embedding.ESMModel[source]#
Class providing inference endpoints for Facebook’s ESM protein language models.
Examples
View specific model details (including supported tokens) with the ? operator.
import openprotein
session = openprotein.connect(username="user", password="password")
session.embedding.esm2_t12_35M_UR50D?
- __init__(session, model_id, metadata=None)#
- attn(sequences)#
Attention embeddings for sequences using this model.
- Parameters:
sequences (List[bytes]) – sequences to compute attention embeddings for
- Return type:
- embed(sequences, reduction='MEAN')#
Embed sequences using this model.
- Parameters:
sequences (List[bytes]) – sequences to embed
reduction (str) – embeddings reduction to use (e.g. mean)
- Return type:
- fit_svd(sequences, n_components=1024, reduction=None)#
Fit an SVD on the embedding results of this model.
This function will create an SVDModel based on the embeddings from this model as well as the hyperparameters specified in the args.
- Parameters:
sequences (List[bytes]) – sequences to SVD
n_components (int) – number of components in the SVD; determines the output shape
reduction (str) – embeddings reduction to use (e.g. mean)
- Return type:
- get_metadata()#
Get model metadata for this model.
- Return type:
ModelMetadata
- logits(sequences)#
Logit embeddings for sequences using this model.
- Parameters:
sequences (List[bytes]) – sequences to compute logits for
- Return type:
- class openprotein.api.embedding.PoETModel[source]#
Class for OpenProtein’s foundation model PoET. Note that PoET functions depend on a prompt supplied via the align endpoints.
Examples
View specific model details (including supported tokens) with the ? operator.
import openprotein
session = openprotein.connect(username="user", password="password")
session.embedding.poet?
- embed(prompt, sequences, reduction='MEAN')[source]#
Embed sequences using this model.
- Parameters:
prompt (Union[str, PromptFuture]) – prompt from an align workflow to condition the PoET model
sequences (List[bytes]) – sequences to embed
reduction (str) – embeddings reduction to use (e.g. mean)
- Return type:
- logits(prompt, sequences)[source]#
Logit embeddings for sequences using this model.
- Parameters:
prompt (Union[str, PromptFuture]) – prompt from an align workflow to condition the PoET model
sequences (List[bytes]) – sequences to analyse
- Return type:
- score(prompt, sequences)[source]#
Score query sequences using the specified prompt.
- Parameters:
prompt (Union[str, PromptFuture]) – prompt from an align workflow to condition the PoET model
sequences (List[bytes]) – sequences to score
- Returns:
The scores of the query sequences.
- Return type:
results
- single_site(prompt, sequence)[source]#
Score all single substitutions of the query sequence using the specified prompt.
- Parameters:
prompt (Union[str, PromptFuture]) – prompt from an align workflow to condition the PoET model
sequence (bytes) – Sequence to analyse.
- Returns:
The scores of the mutated sequence.
- Return type:
results
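To show what "all single substitutions" covers, the sketch below enumerates every variant of a short query that differs from it at exactly one position. The mutation code format (wild-type residue, 1-based position, mutant residue, e.g. 'M1A') and the 20-letter alphabet are assumptions for illustration; the codes returned by the service may be formatted differently.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_substitutions(sequence: bytes):
    """Yield (mutation_code, variant) for every single substitution.

    The 'M1A' style code (wild type, 1-based position, mutant) is an
    assumed format used for illustration only.
    """
    seq = sequence.decode()
    for i, wild_type in enumerate(seq):
        for mutant in AMINO_ACIDS:
            if mutant == wild_type:
                continue  # skip the unchanged residue
            code = f"{wild_type}{i + 1}{mutant}"
            variant = seq[:i] + mutant + seq[i + 1:]
            yield code, variant.encode()


variants = dict(single_substitutions(b"MKT"))
print(len(variants))  # 57 = 3 positions x 19 alternatives
```

The count grows linearly with sequence length (19 variants per position under this alphabet), which is why a single-site scan stays tractable even for long proteins.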
- generate(prompt, num_samples=100, temperature=1.0, topk=None, topp=None, max_length=1000, seed=None)[source]#
Generate protein sequences conditioned on a prompt.
- Parameters:
prompt (Union[str, PromptFuture]) – prompt from an align workflow to condition the PoET model
num_samples (int, optional) – The number of samples to generate, by default 100.
temperature (float, optional) – The temperature for sampling. Higher values produce more random outputs, by default 1.0.
topk (int, optional) – The number of top-k residues to consider during sampling, by default None.
topp (float, optional) – The cumulative probability threshold for top-p sampling, by default None.
max_length (int, optional) – The maximum length of generated proteins, by default 1000.
seed (int, optional) – Seed for random number generation, by default a random number.
- Raises:
APIError – If there is an issue with the API request.
- Returns:
An object representing the status and information about the generation job.
- Return type:
Job
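The temperature, topk and topp parameters correspond to standard sampling controls for autoregressive models. The sketch below shows the generic technique for a single sampling step over made-up logits. It is not PoET's implementation, and `sample_token` is a hypothetical helper.

```python
import math
import random


def sample_token(logits, temperature=1.0, topk=None, topp=None, rng=random):
    """Sample one index using temperature, top-k and top-p filtering."""
    # Temperature scaling, then a numerically stable softmax
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Rank candidates from most to least probable
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if topk is not None:
        order = order[:topk]  # keep only the k most probable candidates
    if topp is not None:
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= topp:  # smallest set reaching the threshold
                break
        order = kept

    # random.choices renormalises the remaining weights
    return rng.choices(order, weights=[probs[i] for i in order], k=1)[0]


rng = random.Random(0)  # fixed seed for reproducible sampling
logits = [2.0, 1.0, 0.1, -1.0]
picks = {sample_token(logits, topk=2, rng=rng) for _ in range(200)}
```

With topk=2 only the two highest-scoring candidates can ever be drawn. Lowering the temperature sharpens the distribution toward the top candidate, while raising it flattens the distribution.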
- fit_svd(prompt, sequences, n_components=1024, reduction=None)[source]#
Fit an SVD on the embedding results of this model.
This function will create an SVDModel based on the embeddings from this model as well as the hyperparameters specified in the args.
- Parameters:
prompt (Union[str, PromptFuture]) – prompt from an align workflow to condition the PoET model
sequences (List[bytes]) – sequences to SVD
n_components (int) – number of components in the SVD; determines the output shape
reduction (str) – embeddings reduction to use (e.g. mean)
- Return type:
- get_metadata()#
Get model metadata for this model.
- Return type:
ModelMetadata
- class openprotein.api.embedding.SVDModel[source]#
Class providing embedding endpoint for SVD models. Also allows retrieving embeddings of sequences used to fit the SVD with get.
- __init__(session, metadata)[source]#
- Parameters:
session (APISession)
metadata (SVDMetadata)
- get_inputs()[source]#
Get sequences used for embeddings job.
- Returns:
list of sequences
- Return type:
List[bytes]
- get_embeddings()[source]#
Get SVD embedding results for this model.
- Returns:
class for further job manipulation
- Return type:
EmbeddingResultFuture
- embed(sequences)[source]#
Use this SVD model to reduce embeddings results.
- Parameters:
sequences (List[bytes]) – List of protein sequences.
- Returns:
Class for further job manipulation.
- Return type:
- classmethod get_job_type()#
Return the job type associated with this Future class.
- refresh()#
Refresh job status.
- wait(interval=5, timeout=None, verbose=False)#
Wait for job to complete, then fetch results.
- Parameters:
interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.
timeout (int, optional) – max time to wait. Defaults to None.
verbose (bool, optional) – verbosity flag. Defaults to False.
- Returns:
results of job
- Return type:
results
- wait_until_done(interval=5, timeout=None, verbose=False)#
Wait for job to complete. Do not fetch results (unlike wait())
- Parameters:
interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.
timeout (int, optional) – max time to wait. Defaults to None.
verbose (bool, optional) – verbosity flag. Defaults to False.
- Returns:
results of job
- Return type:
results
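The interval and timeout parameters describe a polling loop. The sketch below shows one way such a loop can work. `FakeJob` and the status strings "PENDING" and "SUCCESS" are hypothetical, and the library's own loop may differ in how it reports progress and handles failures.

```python
import time


class FakeJob:
    """Hypothetical job that reports success after a few status checks."""

    def __init__(self, checks_needed=3):
        self.checks_needed = checks_needed
        self.status = "PENDING"

    def refresh(self):
        # Each refresh stands in for one status request to the server
        self.checks_needed -= 1
        if self.checks_needed <= 0:
            self.status = "SUCCESS"


def wait_until_done(job, interval=5, timeout=None):
    """Poll job.refresh() every `interval` seconds until it finishes."""
    start = time.monotonic()
    while True:
        job.refresh()
        if job.status == "SUCCESS":
            return True
        if timeout is not None and time.monotonic() - start > timeout:
            raise TimeoutError(f"job not finished after {timeout} seconds")
        time.sleep(interval)


job = FakeJob(checks_needed=3)
finished = wait_until_done(job, interval=0.01, timeout=1)
```

A longer interval reduces the number of status requests at the cost of noticing completion later. Leaving timeout as None means the loop waits indefinitely.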
Results#
- class openprotein.api.embedding.EmbeddingResultFuture[source]#
Future Job for manipulating results
- __init__(session, job, sequences=None, max_workers=10)[source]#
Retrieve results from asynchronous, mapped endpoints. Use max_workers > 0 to enable concurrent retrieval of multiple pages.
- Parameters:
session (APISession)
job (Job)
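As an illustration of what max_workers enables, the sketch below retrieves several result pages concurrently with a thread pool. `fetch_page` is a hypothetical stand-in for a request to a paged endpoint, and the data is made up. This shows the general pattern, not the library's code.

```python
from concurrent.futures import ThreadPoolExecutor

RESULTS = list(range(23))  # stand-in for results stored on the server


def fetch_page(page, page_size=10):
    """Hypothetical page fetch; a real client would issue a GET here."""
    start = page * page_size
    return RESULTS[start:start + page_size]


def fetch_all(num_pages, max_workers=10):
    """Fetch pages concurrently; executor.map yields them in page order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        pages = executor.map(fetch_page, range(num_pages))
        return [item for page in pages for item in page]


all_results = fetch_all(num_pages=3, max_workers=4)
```

Because executor.map preserves input order, the combined list matches the server-side ordering even when pages finish out of order.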
- get_item(sequence)[source]#
Get embedding results for specified sequence.
- Parameters:
sequence (bytes) – sequence to fetch results for
- Returns:
embeddings
- Return type:
np.ndarray
- classmethod get_job_type()#
Return the job type associated with this Future class.
- refresh()#
Refresh job status.
- wait(interval=5, timeout=None, verbose=False)#
Wait for job to complete, then fetch results.
- Parameters:
interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.
timeout (int, optional) – max time to wait. Defaults to None.
verbose (bool, optional) – verbosity flag. Defaults to False.
- Returns:
results of job
- Return type:
results
- wait_until_done(interval=5, timeout=None, verbose=False)#
Wait for job to complete. Do not fetch results (unlike wait())
- Parameters:
interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.
timeout (int, optional) – max time to wait. Defaults to None.
verbose (bool, optional) – verbosity flag. Defaults to False.
- Returns:
results of job
- Return type:
results
- class openprotein.api.poet.PoetScoreFuture[source]#
Represents a result of a PoET scoring job.
- session#
An instance of APISession for API interactions.
- Type:
- job#
The PoET scoring job.
- Type:
Job
- page_size#
The number of results to fetch in a single page.
- Type:
int
- __init__(session, job, page_size=50000, **kwargs)[source]#
Initialize a PoetScoreFuture instance.
- Parameters:
session (APISession) – An instance of APISession for API interactions.
job (Job) – The PoET scoring job.
page_size (int, optional) – The number of results to fetch in a single page. Defaults to config.POET_PAGE_SIZE.
- get(verbose=False)[source]#
Get the final results of the PoET scoring job.
- Parameters:
verbose (bool, optional) – If True, print verbose output. Defaults to False.
- Raises:
APIError – If there is an issue with the API request.
- Returns:
A list of PoetScoreResult objects representing the scoring results.
- Return type:
List[PoetScoreResult]
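The page_size attribute implies that results are retrieved in consecutive pages. A minimal sketch of that pattern follows, assuming an offset-based endpoint that returns at most page_size rows per request. `fetch_page`, the tuple layout and the scores are all hypothetical.

```python
def fetch_page(offset, page_size, store):
    """Hypothetical paged endpoint returning at most page_size rows."""
    return store[offset:offset + page_size]


def fetch_all(store, page_size=50000):
    """Request consecutive pages until a short (or empty) page arrives."""
    results, offset = [], 0
    while True:
        page = fetch_page(offset, page_size, store)
        results.extend(page)
        if len(page) < page_size:
            return results  # a short page means nothing is left
        offset += page_size


# Made-up (name, score) rows standing in for scoring results
scores = [("seq%d" % i, -float(i)) for i in range(7)]
fetched = fetch_all(scores, page_size=3)
```

A larger page size means fewer round trips but more data per response.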
- get_input(input_type)#
See child function docs.
- Parameters:
input_type (PoetInputType)
- classmethod get_job_type()#
Return the job type associated with this Future class.
- get_msa()#
See child function docs.
- get_prompt(prompt_index=None)#
See child function docs.
- Parameters:
prompt_index (int | None)
- get_seed()#
See child function docs.
- refresh()#
Refresh job status.
- wait(interval=5, timeout=None, verbose=False)#
Wait for job to complete, then fetch results.
- Parameters:
interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.
timeout (int, optional) – max time to wait. Defaults to None.
verbose (bool, optional) – verbosity flag. Defaults to False.
- Returns:
results of job
- Return type:
results
- wait_until_done(interval=5, timeout=None, verbose=False)#
Wait for job to complete. Do not fetch results (unlike wait())
- Parameters:
interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.
timeout (int, optional) – max time to wait. Defaults to None.
verbose (bool, optional) – verbosity flag. Defaults to False.
- Returns:
results of job
- Return type:
results
- class openprotein.api.poet.PoetSingleSiteFuture[source]#
Represents a result of a PoET single-site analysis job.
- session#
An instance of APISession for API interactions.
- Type:
- job#
The PoET single-site analysis job.
- Type:
Job
- page_size#
The number of results to fetch in a single page.
- Type:
int
- __init__(session, job, page_size=50000, **kwargs)[source]#
Initialize a PoetSingleSiteFuture instance.
- Parameters:
session (APISession) – An instance of APISession for API interactions.
job (Job) – The PoET single-site analysis job.
page_size (int, optional) – The number of results to fetch in a single page. Defaults to config.POET_PAGE_SIZE.
- get(verbose=False)[source]#
Get the results of a PoET single-site analysis job.
- Parameters:
verbose (bool, optional) – If True, print verbose output. Defaults to False.
- Returns:
A dictionary mapping mutation codes to scores.
- Return type:
Dict[bytes, float]
- Raises:
APIError – If there is an issue with the API request.
- get_input(input_type)#
See child function docs.
- Parameters:
input_type (PoetInputType)
- classmethod get_job_type()#
Return the job type associated with this Future class.
- get_msa()#
See child function docs.
- get_prompt(prompt_index=None)#
See child function docs.
- Parameters:
prompt_index (int | None)
- get_seed()#
See child function docs.
- refresh()#
Refresh job status.
- wait(interval=5, timeout=None, verbose=False)#
Wait for job to complete, then fetch results.
- Parameters:
interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.
timeout (int, optional) – max time to wait. Defaults to None.
verbose (bool, optional) – verbosity flag. Defaults to False.
- Returns:
results of job
- Return type:
results
- wait_until_done(interval=5, timeout=None, verbose=False)#
Wait for job to complete. Do not fetch results (unlike wait())
- Parameters:
interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.
timeout (int, optional) – max time to wait. Defaults to None.
verbose (bool, optional) – verbosity flag. Defaults to False.
- Returns:
results of job
- Return type:
results
- class openprotein.api.poet.PoetGenerateFuture[source]#
Represents a result of a PoET generation job.
- session#
An instance of APISession for API interactions.
- Type:
- job#
The PoET generation job.
- Type:
Job
- Methods#
- stream() -> Iterator[PoetScoreResult]:
Stream the results of the PoET generation job.
- stream()[source]#
Stream the results from the response.
- Yields:
PoetScoreResult – A result object containing the sequence, score, and name.
- Raises:
APIError – If the request fails.
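Streaming means results are yielded one at a time instead of being collected first. The sketch below shows that pattern with a generator. The CSV layout and the `ScoreResult` record are assumptions chosen to match the three fields named above (sequence, score, name), since the actual wire format is not documented here.

```python
import csv
import io
from typing import Iterator, NamedTuple


class ScoreResult(NamedTuple):
    """Hypothetical result record with the three documented fields."""
    name: str
    sequence: str
    score: float


def stream_results(lines) -> Iterator[ScoreResult]:
    """Lazily parse CSV lines into records, one at a time."""
    reader = csv.DictReader(lines)
    for row in reader:
        yield ScoreResult(row["name"], row["sequence"], float(row["score"]))


# Made-up payload; the real response format may differ
payload = io.StringIO("name,sequence,score\ngen-1,MKT,-1.5\ngen-2,MKV,-2.25\n")
first = next(stream_results(payload))
```

Because the generator is lazy, a caller can start processing the first generated sequences before the rest of the response has been read.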
- __init__(session, job)#
- Parameters:
session (APISession)
job (Job | str)
- get_input(input_type)#
See child function docs.
- Parameters:
input_type (PoetInputType)
- classmethod get_job_type()#
Return the job type associated with this Future class.
- get_msa()#
See child function docs.
- get_prompt(prompt_index=None)#
See child function docs.
- Parameters:
prompt_index (int | None)
- get_seed()#
See child function docs.
- refresh()#
Refresh job status.
- wait(interval=5, timeout=None, verbose=False)#
Wait for job to complete, then fetch results.
- Parameters:
interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.
timeout (int, optional) – max time to wait. Defaults to None.
verbose (bool, optional) – verbosity flag. Defaults to False.
- Returns:
results of job
- Return type:
results
- wait_until_done(interval=5, timeout=None, verbose=False)#
Wait for job to complete. Do not fetch results (unlike wait())
- Parameters:
interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.
timeout (int, optional) – max time to wait. Defaults to None.
verbose (bool, optional) – verbosity flag. Defaults to False.
- Returns:
results of job
- Return type:
results