openprotein.svd
Fit SVD models on top of our protein language models to produce reduced embeddings, which can be used to train predictors!
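As a conceptual illustration of what "reduced embeddings" means, the sketch below fits a plain truncated SVD on a matrix of embeddings with NumPy and projects the embeddings onto the top components. It is not the OpenProtein implementation: the random matrix stands in for real protein language model embeddings, the helper names `fit_truncated_svd` and `reduce_embeddings` are hypothetical, and skipping mean-centering is an assumption.

```python
import numpy as np


def fit_truncated_svd(embeddings: np.ndarray, n_components: int) -> np.ndarray:
    """Return the top `n_components` right singular vectors as rows."""
    # Economy-size SVD; rows of vt are ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(embeddings, full_matrices=False)
    return vt[:n_components]


def reduce_embeddings(embeddings: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project embeddings onto the fitted components."""
    return embeddings @ components.T


# Stand-in data: 50 sequences with 128-dimensional embeddings (assumed sizes).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(50, 128))

components = fit_truncated_svd(embeddings, n_components=8)
reduced = reduce_embeddings(embeddings, components)
print(reduced.shape)  # (50, 8)
```

The reduced matrix keeps one row per sequence but only `n_components` columns, which is the kind of compact feature vector a downstream predictor can be trained on.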
Endpoints
- class openprotein.svd.SVDAPI
SVD API providing the interface for creating and using SVD models.
- __init__(session)
- Parameters:
session (APISession)
- fit_svd(model_id, sequences=None, assay=None, n_components=1024, reduction=None, **kwargs)
Fit an SVD on the sequences with the specified model_id and hyperparameters (n_components).
- Parameters:
model_id (str) – The ID of the model to fit the SVD on.
sequences (list[bytes]) – The list of sequences to use for the SVD fitting.
n_components (int, optional) – The number of components for the SVD, by default 1024.
reduction (str, optional) – The reduction method to apply to the embeddings, by default None.
assay (AssayDataset | None)
- Returns:
The model with the SVD fit.
- Return type:
SVDModel
- class openprotein.svd.SVDModel
Class providing embedding endpoint for SVD models. Also allows retrieving embeddings of sequences used to fit the SVD with get. Implements a Future to allow waiting for a fit job.
- job: SVDFitJob
- __init__(session, job=None, metadata=None)
Construct the SVD model from either a fit job or SVD metadata.
- Parameters:
session (APISession)
job (SVDFitJob | None)
metadata (SVDMetadata | None)
- property id
- property n_components
- property sequence_length
- property reduction
- property metadata
- property model: EmbeddingModel
- get_inputs()
Get the sequences used for the SVD fit job.
- Returns:
List of sequences.
- Return type:
List[bytes]
- embed(sequences, **kwargs)
Use this SVD model to get reduced embeddings from input sequences.
- Parameters:
sequences (List[bytes]) – List of protein sequences.
- Returns:
Class for further job manipulation.
- Return type:
EmbeddingResultFuture
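A minimal sketch of how `fit_svd` and `embed` might be chained, using only the signatures documented above. The helper `fit_and_embed` and its `wait` hook are hypothetical: this section states that `SVDModel` implements a Future but does not name the method used to wait for the fit job, so the caller supplies that step.

```python
def fit_and_embed(svd_api, model_id, sequences, n_components=1024,
                  reduction=None, wait=None):
    """Fit an SVD on `sequences`, then request reduced embeddings for them.

    `svd_api` is assumed to be an openprotein.svd.SVDAPI instance, or any
    object exposing the same `fit_svd` method.
    """
    svd_model = svd_api.fit_svd(
        model_id,
        sequences=sequences,
        n_components=n_components,
        reduction=reduction,
    )
    # Hypothetical hook: block until the fit job finishes, if the caller
    # knows how to wait on the returned model.
    if wait is not None:
        wait(svd_model)
    # Per the docs, embed returns an EmbeddingResultFuture for further
    # job manipulation; reading results from it is not shown here.
    future = svd_model.embed(sequences)
    return svd_model, future
```

Sequences are passed as bytes, for example `[b"MKTAYIAKQR", b"MKTAYIAKQL"]`, matching the `list[bytes]` type in the parameter lists above.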
- fit_umap(sequences=None, assay=None, n_components=2, **kwargs)
Fit a UMAP on the embedding results of this model.
This function will create a UMAPModel based on the embeddings from this model as well as the hyperparameters specified in the args.
- Parameters:
sequences (List[bytes]) – Sequences to fit the UMAP on.
n_components (int) – Number of components in the UMAP. Determines the output shape.
reduction (ReductionType | None) – Embeddings reduction to use (e.g. mean).
assay (AssayDataset | None)
- Return type:
UMAPModel
- fit_gp(assay, properties, name=None, description=None, **kwargs)
Fit a GP on assay using this embedding model and hyperparameters.
- Parameters:
assay (AssayMetadata | str) – Assay to fit GP on.
properties (list[str]) – Properties in the assay to fit the GP on.
name (str | None)
description (str | None)
- Return type:
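The following sketch shows one way `fit_umap` and `fit_gp` might be called on an already fitted SVD model, based only on the signatures listed above. The wrapper `fit_umap_and_gp` is hypothetical, and whether a given workflow should pass `sequences` or `assay` to `fit_umap` is an assumption that this section does not settle.

```python
def fit_umap_and_gp(svd_model, sequences, assay, properties,
                    name=None, description=None):
    """Fit a UMAP and a GP on top of a fitted SVD model.

    `svd_model` is assumed to be an openprotein.svd.SVDModel, or any
    object exposing the same `fit_umap` and `fit_gp` methods.
    """
    # n_components=2 is the documented default and yields a 2-D projection.
    umap_model = svd_model.fit_umap(sequences=sequences, n_components=2)
    # properties is a list of property names present in the assay.
    gp_model = svd_model.fit_gp(
        assay,
        properties,
        name=name,
        description=description,
    )
    return umap_model, gp_model
```

Keeping both fits behind one helper is only a convenience for illustration; the two calls are independent and can be made separately.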