openprotein.svd

Fit singular value decomposition (SVD) models on top of our protein language models to produce reduced embeddings, which can then be used to train predictors!

Endpoints

class openprotein.svd.SVDAPI

SVD API providing the interface for creating and using SVD models.

__init__(session)
Parameters:

session (APISession)

fit_svd(model_id, sequences=None, assay=None, n_components=1024, reduction=None, **kwargs)

Fit an SVD on the sequences with the specified model_id and hyperparameters (n_components).

Parameters:
  • model_id (str) – The ID of the model to fit the SVD on.

  • sequences (list[bytes], optional) – The list of sequences to use for the SVD fitting, by default None.

  • assay (AssayDataset | None, optional) – An assay dataset to use for the SVD fitting instead of sequences, by default None.

  • n_components (int, optional) – The number of components for the SVD, by default 1024.

  • reduction (str, optional) – The reduction method to apply to the embeddings, by default None.

Returns:

The model with the SVD fit.

Return type:

SVDModel
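The call above can be sketched as a small wrapper. This is a minimal sketch, not the library's own code: `fit_svd_on_sequences` is a hypothetical helper, `svd_api` is assumed to be an already-constructed SVDAPI instance, and the empty-input check is a client-side guard added for illustration. Only the `fit_svd` signature shown on this page is relied on.

```python
def fit_svd_on_sequences(svd_api, model_id, sequences, n_components=1024, reduction=None):
    """Hypothetical helper: fit an SVD through SVDAPI.fit_svd from str or bytes sequences."""
    # The documented parameter type is list[bytes], so encode any str inputs.
    encoded = [s.encode() if isinstance(s, str) else s for s in sequences]
    if not encoded:
        # Assumption: an empty fit set is never useful, so fail early on the client.
        raise ValueError("at least one sequence is required")
    return svd_api.fit_svd(
        model_id=model_id,
        sequences=encoded,
        n_components=n_components,
        reduction=reduction,
    )
```

The returned object is whatever `fit_svd` returns (an SVDModel according to this page). The value passed as `model_id` must be an ID that exists for your account; no specific ID is assumed here.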

get_svd(svd_id)

Get the results of an SVD job, including the SVD dimension and sequence lengths.

Requires a successful SVD job from fit_svd.

Parameters:

svd_id (str) – The ID of the SVD job.

Returns:

The model with the SVD fit.

Return type:

SVDModel
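As an illustration of retrieving an SVD by ID, the sketch below collects the properties that this page lists for SVDModel into a plain dictionary. `describe_svd` is a hypothetical helper and `svd_api` is assumed to be an existing SVDAPI instance; the types of the property values are not documented here, so they are passed through unchanged.

```python
def describe_svd(svd_api, svd_id):
    """Hypothetical helper: fetch an SVD by ID and collect its documented properties."""
    svd = svd_api.get_svd(svd_id)
    # These attribute names are the properties listed for SVDModel on this page.
    return {
        "id": svd.id,
        "n_components": svd.n_components,
        "sequence_length": svd.sequence_length,
        "reduction": svd.reduction,
    }
```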

list_svd()

List the SVD models created by the user.

Takes no arguments.

Returns:

The user's SVD models.

Return type:

list[SVDModel]
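One way to use the listing is to filter it on a documented property. The following is a minimal sketch under the assumption that `n_components` on each returned SVDModel compares equal to a plain integer; `find_svds_by_components` is a hypothetical name, not part of the library.

```python
def find_svds_by_components(svd_api, n_components):
    """Hypothetical helper: return the user's SVD models with the given n_components."""
    # list_svd takes no arguments and returns a list of SVDModel objects.
    return [svd for svd in svd_api.list_svd() if svd.n_components == n_components]
```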

class openprotein.svd.SVDModel

Class providing the embedding endpoint for SVD models. It also allows retrieving the embeddings of the sequences used to fit the SVD with get, and implements a Future so that callers can wait for a fit job to complete.

job: SVDFitJob
__init__(session, job=None, metadata=None)

Construct the SVD model from either a fit job or SVD metadata.

Parameters:
  • session (APISession)

  • job (SVDFitJob | None)

  • metadata (SVDMetadata | None)

property id
property n_components
property sequence_length
property reduction
property metadata
get_model()

Fetch the embedding model.

Return type:

EmbeddingModel

property model: EmbeddingModel
delete()

Delete this SVD model.

Return type:

bool

get(verbose=False)

Return the results from this job.

Parameters:

verbose (bool)

get_inputs()

Get the sequences used for the SVD job.

Returns:

The list of sequences.

Return type:

List[bytes]

embed(sequences, **kwargs)

Use this SVD model to get reduced embeddings from input sequences.

Parameters:

sequences (List[bytes]) – List of protein sequences.

Returns:

A future for tracking the embedding job and retrieving its results.

Return type:

EmbeddingResultFuture
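If a large input is easier to manage as several jobs, the call can be wrapped in a simple batching loop. This is a sketch under stated assumptions: `embed_in_batches` and its `batch_size` default are hypothetical, chosen only for illustration, and do not reflect any documented limit; `svd_model` is assumed to be an SVDModel whose fit has already completed.

```python
def embed_in_batches(svd_model, sequences, batch_size=100):
    """Hypothetical helper: submit sequences to SVDModel.embed in fixed-size batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # The documented parameter type is List[bytes], so encode any str inputs.
    encoded = [s.encode() if isinstance(s, str) else s for s in sequences]
    futures = []
    for start in range(0, len(encoded), batch_size):
        batch = encoded[start:start + batch_size]
        # Each call returns one result object per submitted batch.
        futures.append(svd_model.embed(sequences=batch))
    return futures
```

The helper returns the objects produced by `embed` in submission order; how results are then read from each EmbeddingResultFuture is not covered on this page.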

fit_umap(sequences=None, assay=None, n_components=2, **kwargs)

Fit a UMAP on the embedding results of this model.

This function creates a UMAPModel based on the embeddings from this model and the hyperparameters specified in the arguments.

Parameters:
  • sequences (List[bytes]) – Sequences to fit the UMAP on.

  • n_components (int) – Number of components in the UMAP, which determines the output shape. Defaults to 2.

  • reduction (ReductionType | None) – Embeddings reduction to use (e.g. mean).

  • assay (AssayDataset | None)

Return type:

UMAPModel
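A common use is a two-dimensional projection for plotting. The sketch below wraps `fit_umap` with `n_components=2`. The helper name `fit_umap_2d` is hypothetical, and the check that exactly one of `sequences` or `assay` is supplied is an assumption made for this example, not behavior stated on this page.

```python
def fit_umap_2d(svd_model, sequences=None, assay=None):
    """Hypothetical helper: fit a 2-component UMAP on top of an SVD model."""
    # Assumption: the data source is either sequences or an assay, never both or neither.
    if (sequences is None) == (assay is None):
        raise ValueError("provide exactly one of sequences or assay")
    return svd_model.fit_umap(sequences=sequences, assay=assay, n_components=2)
```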

fit_gp(assay, properties, name=None, description=None, **kwargs)

Fit a Gaussian process (GP) on an assay using this embedding model and the given hyperparameters.

Parameters:
  • assay (AssayMetadata | str) – Assay to fit the GP on.

  • properties (list[str]) – Properties in the assay to fit the GP on.

  • name (str | None)

  • description (str | None)

Return type:

PredictorModel
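To illustrate the call, here is a minimal sketch that normalizes `properties` to the documented `list[str]` type before delegating to `fit_gp`. The helper `fit_gp_on_properties` is hypothetical, and accepting a single property name as a bare string is a convenience added for this example only.

```python
def fit_gp_on_properties(svd_model, assay, properties, name=None, description=None):
    """Hypothetical helper: call SVDModel.fit_gp with properties normalized to a list."""
    # Convenience (not part of the documented API): allow a single property name.
    if isinstance(properties, str):
        properties = [properties]
    properties = list(properties)
    if not properties:
        raise ValueError("at least one property is required")
    return svd_model.fit_gp(
        assay=assay,
        properties=properties,
        name=name,
        description=description,
    )
```

Per the parameter list above, `assay` may be either an AssayMetadata object or a string; the sketch forwards it unchanged.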