Foundation Models

Our Python API provides a suite of foundation protein language models, including both our proprietary models and open-source models, and enables you to generate high-quality embeddings for your protein sequences.

Each model has unique characteristics, such as the number of parameters, maximum sequence length, embedding dimension, and supported output types, allowing you to select the most relevant model for your project. For each model, you can retrieve attention maps, embeddings, and logits, and fit an SVD for reduced-dimension embeddings.
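For example, you can list the available models and inspect their metadata before choosing one. The sketch below assumes the `openprotein` Python client; the attribute names (`session.embedding`, `list_models`, `metadata`, and the metadata fields) are illustrative assumptions, so consult the API Reference for the exact interface.

```python
import openprotein

# Authenticate and open a session (credentials are placeholders).
session = openprotein.connect(username="USERNAME", password="PASSWORD")

# List the available foundation models and print key properties such as
# embedding dimension, maximum sequence length, and supported output types.
# The metadata field names here are assumptions, not guaranteed names.
for model in session.embedding.list_models():
    meta = model.metadata
    print(meta.id, meta.dimension, meta.max_sequence_length, meta.output_types)
```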

Models included in the embeddings endpoint are listed below; a short usage sketch follows the list:

  • PoET: An OpenProtein.AI conditional protein language model that enables embedding, scoring, and generating sequences conditioned on an input protein family of interest. Publication

  • Prot-seq: An OpenProtein.AI model delivering high-performance protein sequence embeddings.

  • Rotaprot-large-uniref50w: An OpenProtein.AI model specifically trained for robust inference capabilities.

  • Rotaprot-large-uniref90-ft: A fine-tuned version of rotaprot-large-uniref50w.

  • ESM1 models: Open-source community models built on the ESM1 language model. ESM1b publication, ESM1v publication

  • ESM2 models: Open-source community models built on the ESM2 language model. Publication
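Once you've chosen a model, you can submit sequences and retrieve results when the server-side job completes. The following is a minimal sketch, again assuming the `openprotein` client; the model ID and the `get_model`, `embed`, `logits`, `fit_svd`, and `wait` names are assumptions for illustration, not guaranteed API names.

```python
import openprotein

session = openprotein.connect(username="USERNAME", password="PASSWORD")

# Look up a specific foundation model by ID (the ID shown is illustrative).
model = session.embedding.get_model("esm2_t33_650M_UR50D")

sequences = ["MSILVTRPSPAGEEL", "MTETLPKAVLDLAQS"]

# Jobs run asynchronously; wait() blocks until the results are ready.
embeddings = model.embed(sequences=sequences).wait()  # one vector per sequence

# Logits are requested the same way, and an SVD can be fit on the model's
# embeddings to produce reduced-dimension representations (names assumed).
logits = model.logits(sequences=sequences).wait()
svd = model.fit_svd(sequences=sequences, n_components=128)
```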

Get started using foundation models

Tutorials:

API Reference