Designing sequences

This tutorial shows you how to design protein sequences that meet objectives you choose. Using sequence-to-function models trained on your data, you can define custom design criteria and generate variant libraries.

What you need before getting started

To design sequences, you need an uploaded dataset and a trained model. See Uploading data and Training models for more information.

Initializing your design

Set up a design job that uses your trained model as a criterion. You also need a mutation map that defines which positions may be mutated and which residues are allowed at each of them.

Our helper functions can be used to initialize your design mutation dictionary:

[ ]:
from openprotein.design import DesignConstraint

# `assay` is assumed to be the dataset you uploaded beforehand (see Uploading data).
parent_sequence = assay.get_first().loc[0, 'sequence']
print(parent_sequence)
WRHGDISSSNDTVGVAVVNYKMPRLHTAAEVLDNARKIAEMIVGMKQGLPGMDLVVFPEYSLQGIMYDPAEMMETAVAIPGEETEIFSRACRKANVWGVFSLTGERHEEHPRKAPYNTLVLIDNNGEIVQKYRKIIPWCPIEGWYPGGQTYVSEGPKGMKISLIICDDGNYPEIWRDCAMKGAELIVRCQGYMYPAKDQQVMMAKAMAWANNCYVAVANAAGFDGVYSYFGHSAIIGFDGRTLGECGEEEMGIQYAQLSLSQIRDARANDQSQNHLFKILHRGYSGLQASGDGDRGLAECPFEFYRTWVTDAEKARENVERLTRSTTGVAQCPVGRLPYEGLEKEA

Initialize the design constraints:

[ ]:
constraints = DesignConstraint(parent_sequence)

Use the constraints to indicate which mutations are allowed at which positions.

For example, restrict positions 3 and 6 to G or L, and position 9 to C only. Positions are 1-indexed.

[ ]:
constraints.allow([3, 6], ["G", "L"])

constraints.allow(9, "C")
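
To make the idea of a mutation map concrete, here is a minimal pure-Python sketch of the same restrictions. It is an illustration only, not the openprotein implementation: `build_mutation_map` and `is_permitted` are hypothetical helpers, and the sketch assumes that positions without an explicit rule stay fixed at the parent residue. Whether the library keeps the parent residue available at a restricted position is not stated here, so the sketch takes "G, L only" literally.

```python
# Hypothetical illustration of a mutation map; NOT the openprotein API.

def build_mutation_map(parent, allowed):
    """Map each 1-indexed position to the set of residues permitted there.

    Assumption for this sketch: positions missing from `allowed`
    may only hold the parent residue.
    """
    mutation_map = {}
    for pos, residue in enumerate(parent, start=1):
        mutation_map[pos] = set(allowed.get(pos, residue))
    return mutation_map


def is_permitted(candidate, mutation_map):
    """Return True if every residue of `candidate` is allowed at its position."""
    if len(candidate) != len(mutation_map):
        return False
    return all(
        residue in mutation_map[pos]
        for pos, residue in enumerate(candidate, start=1)
    )


# First ten residues of the parent sequence printed above.
parent_prefix = "WRHGDISSSN"

# Positions 3 and 6 may be G or L; position 9 may only be C.
allowed = {3: "GL", 6: "GL", 9: "C"}
mutation_map = build_mutation_map(parent_prefix, allowed)

print(is_permitted("WRGGDLSSCN", mutation_map))  # G at 3, L at 6, C at 9
print(is_permitted("WRAGDISSSN", mutation_map))  # A at 3 is not allowed
```

The first candidate satisfies every rule, while the second is rejected because alanine is not among the residues allowed at position 3.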

Create the design criteria from your trained model, together with a criterion on the number of mutations.

Here, the model criterion requires the predicted isobutyramide_normalized_fitness to be less than -0.5, and it is scored with a weight of 1.0.

[ ]:
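from openprotein.design import n_mutations

# `predictor` is the trained model (see Training models).
# The threshold is -0.5, matching the criterion described above.
criteria = (1.0 * (predictor < -0.5)) | n_mutations()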
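
For intuition about what a thresholded criterion such as "predicted fitness less than -0.5" could measure, the sketch below computes the log-probability that a Gaussian prediction falls below the threshold. This is an assumption for illustration, not a documented description of how the platform scores designs. The helper `log_prob_below` is hypothetical, and the mean and variance are the values from the example result shown under "View the design results" in this tutorial.

```python
import math


def log_prob_below(mean, variance, threshold):
    """Return log P(y < threshold) for y ~ Normal(mean, variance)."""
    z = (threshold - mean) / math.sqrt(variance)
    # Standard normal CDF expressed with the error function.
    prob = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return math.log(prob)


# Predicted mean and variance taken from the example design result.
mean, variance = -0.41518891, 0.1528153
weight = 1.0  # criterion weight used in this tutorial

score = weight * log_prob_below(mean, variance, threshold=-0.5)
print(round(score, 4))
```

Under these assumptions the value comes out at roughly -0.8816, which agrees with the first entry of `scores` in the example result. That agreement suggests, but does not confirm, that the model score is a weighted log-probability of meeting the threshold.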

Create the design job:

[ ]:
design_job = session.design.create_genetic_algorithm_design(
    assay=assay,
    criteria=criteria,
    allowed_tokens=constraints
)

design_id = design_job.id
design_job
DesignJob(job_id='606dda22-1891-4620-949f-852a1dac6ab1', job_type=<JobType.designer: '/design'>, status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2024, 12, 3, 5, 30, 10, 659481, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)

Wait for the job to finish, then view the last entry of the design results:

[ ]:
results = design_job.wait()
[ ]:
results[-1]._asdict()
{'step': 1,
 'sample_index': 35,
 'sequence': 'WRLGDLSSCNDTVGVAVVNYKMPRLHTAAEVLDNARKIAEMIVGMKQGLPGMDLVVFPEYSLQGIMYDPAEMMETAVAIPGEETEIFSRACRKANVWGVFSLTGERHEEHPRKAPYNTLVLIDNNGEIVQKYRKIIPWCPIEGWYPGGQTYVSEGPKGMKISLIICDDGNYPEIWRDCAMKGAELIVRCQGYMYPAKDQQVMMAKAMAWANNCYVAVANAAGFDGVYSYFGHSAIIGFDGRTLGECGEEEMGIQYAQLSLSQIRDARANDQSQNHLFKILHRGYSGLQASGDGDRGLAECPFEFYRTWVTDAEKARENVERLTRSTTGVAQCPVGRLPYEGLEKEA',
 'scores': array([ -0.88159519, 344.        ]),
 'subscores': array([ -0.88159519, 344.        ]),
 'means': array([-0.41518891]),
 'vars': array([0.1528153])}
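
To see which substitutions a design carries, you can compare it with the parent residue by residue. The sketch below uses a hypothetical helper, `list_mutations`, and only the first ten residues of the parent and of the designed sequence shown above, to keep the example short.

```python
def list_mutations(parent, variant):
    """List substitutions as '<parent residue><1-indexed position><new residue>'."""
    if len(parent) != len(variant):
        raise ValueError("sequences must have the same length")
    return [
        f"{p}{pos}{v}"
        for pos, (p, v) in enumerate(zip(parent, variant), start=1)
        if p != v
    ]


# First ten residues of the parent and of the designed sequence.
parent_prefix = "WRHGDISSSN"
design_prefix = "WRLGDLSSCN"

print(list_mutations(parent_prefix, design_prefix))  # ['H3L', 'I6L', 'S9C']
```

The substitutions fall at positions 3, 6 and 9 and use only the residues allowed by the constraints defined earlier.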

Next steps

Our Design API reference contains more information about designing your sequences.

Explore your designed sequence’s structure with our Structure prediction models, or evaluate the single substitution variants of your sequence with our Property Regression models. Visit Using single site analysis for more information.