Assay-based Sequence Learning

Welcome to the Assay-based Sequence Learning section of the documentation! This section describes how to use our library for the core tasks of processing your data and using the platform’s machine learning capabilities.

The Assay-based Sequence Learning functionality of OpenProtein’s Python client library is divided into four main modules: AssayData, Train, Predict, and Design.


Our AssayData module allows you to upload your dataset to OpenProtein’s engineering platform. This dataset forms the basis for the training, prediction, and evaluation tasks. Your data should be formatted as a CSV with at least two columns: one containing the full sequence of each variant, and one or more columns for your measured properties.
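As a rough illustration of the layout described above, the following sketch builds such a CSV using only Python's standard library. The column names (`sequence`, `activity`, `stability`), the sequences, and the measured values are made up for the example, and `write_assay_csv` is a hypothetical helper, not part of the client library. See the AssayData documentation for the actual upload call.

```python
import csv
import io

# Hypothetical example data: one full sequence per variant, plus one
# column per measured property. Names and values are illustrative only.
rows = [
    {"sequence": "MKTAYIAKQR", "activity": 1.25, "stability": 0.80},
    {"sequence": "MKTAYIAKQK", "activity": 0.95, "stability": 0.91},
    {"sequence": "MKTAYIARQR", "activity": 1.40, "stability": 0.77},
]


def write_assay_csv(rows, fieldnames):
    """Serialize variant rows to CSV text with a header row (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = write_assay_csv(rows, ["sequence", "activity", "stability"])
print(csv_text)
```

Each row holds the complete variant sequence rather than a mutation code, with every measured property in its own column.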

See the AssayData documentation for more details.

Our Train module provides functions to train models on your measured properties. This step is essential for enabling predictions for new sequences. These workflows also perform cross-validation on your models to estimate uncertainty.
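The source does not describe how the platform splits data for cross-validation. The sketch below shows only the general k-fold idea: each sample is held out exactly once while a model is fit on the rest. The function name `k_fold_indices` and the interleaved fold assignment are assumptions made for illustration.

```python
def k_fold_indices(n_samples, k):
    """Return k (train, held_out) index splits; illustrative only."""
    # Interleaved assignment: sample i goes to fold i % k.
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for held_out in folds:
        held_out_set = set(held_out)
        train = [i for i in range(n_samples) if i not in held_out_set]
        splits.append((train, held_out))
    return splits


splits = k_fold_indices(n_samples=10, k=5)
for train, held_out in splits:
    # A model would be fit on `train` and evaluated on `held_out` here.
    print(f"train={train} held_out={held_out}")
```

Comparing held-out predictions with the measured values gives an estimate of how far predictions for unseen sequences can be trusted.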

See the Train documentation for more details.


With the Predict module, you can make predictions on arbitrary sequences using your trained OpenProtein models. This includes predictions for individual sequences as well as for the single-mutant variants of a sequence.
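To make "single-mutant variants" concrete, the sketch below lists every sequence that differs from a parent at exactly one position. It illustrates the concept only and is not the Predict module's implementation. The helper `single_mutants` and the short parent sequence are hypothetical.

```python
# The 20 standard amino acids, as one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_mutants(sequence):
    """Return all sequences that differ from `sequence` at exactly one position."""
    variants = []
    for pos, wild_type in enumerate(sequence):
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                variants.append(sequence[:pos] + aa + sequence[pos + 1:])
    return variants


parent = "MKT"  # hypothetical, deliberately short parent sequence
variants = single_mutants(parent)
print(len(variants))  # 3 positions x 19 substitutions = 57
```

For a parent of length L made up of standard residues, this gives 19 × L variants, so the list grows linearly with sequence length.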

See the Predict documentation for more details.


The Design module provides the capability to design new sequences based on your objectives using our genetic algorithm.

See the Design documentation for more details.

Remember that the Predict and Design workflows require you to first upload your datasets using the AssayData module and then train your models using the Train module.


For a practical example of using this workflow, see the core workflow notebook.

In addition to this documentation, we offer demos of key workflows and provide demo datasets to help you familiarize yourself with our workflows. Happy learning and exploring!