
Using BoltzGen#

This tutorial shows you how to use the BoltzGen model to design novel protein structures.

The examples here are adapted from the original BoltzGen documentation to show how the model can be run on the OpenProtein platform, where it can be combined with our other workflows.

Full credit for the examples and the model goes to the authors of BoltzGen!

Unconditional monomer design#

The most basic use of BoltzGen is the unconditional design of a protein structure of a given length. You need three things:

  1. An authenticated OpenProtein session

  2. Length of the protein

  3. Number of designs N desired

[1]:
import openprotein
session = openprotein.connect()
length = 150
N = 3
[2]:
boltzgen = session.models.boltzgen
boltzgen.generate?
Signature:
boltzgen.generate(
    query: str | bytes | openprotein.molecules.protein.Protein | openprotein.molecules.complex.Complex | openprotein.prompt.models.Query | None = None,
    design_spec: openprotein.models.foundation.boltzgen_schema.BoltzGenDesignSpec | dict[str, typing.Any] | None = None,
    structure_file: str | bytes | typing.BinaryIO | None = None,
    N: int = 1,
    diffusion_batch_size: int | None = None,
    step_scale: float | None = None,
    noise_scale: float | None = None,
    scaffolds: dict[str, str | bytes | typing.BinaryIO] | None = None,
    scaffold_set: openprotein.scaffolds.Scaffolds | str | None = None,
    extra_structure_files: dict[str, str | bytes | typing.BinaryIO] | None = None,
    **kwargs,
) -> openprotein.models.foundation.boltzgen.BoltzGenFuture
Docstring:
Run a protein structure generate job using BoltzGen.

Parameters
----------
query : str or bytes or Protein or Complex or Query, optional
    A query representing the design specification. Either `query` or `design_spec`
    must be provided.
    `query` provides a unified way to represent design specifications on the
    OpenProtein platform. In this case, the structure mask of the containing Complex
    proteins are specified to be designed. Other parameters like binding, group,
    secondary structures, etc. are also passed through to BoltzGen.
design_spec : BoltzGenDesignSpec | dict[str, Any] | None, optional
    The BoltzGen design specification to run. Either `query` or `design_spec`
    must be provided.
    `design_spec` exposes a low-level interface to using BoltzGen by accepting the YAML
    specification used by official BoltzGen examples.
    Can be a typed BoltzGenDesignSpec object or a dict representing the
    BoltzGen yaml request specification.
    Note: If the design_spec includes file paths, provide
    these extra files either using `scaffolds` or `extra_structure_files`.
structure_file : str | bytes | BinaryIO | None, optional
    (Deprecated: use `extra_structure_files`)
    An input PDB/CIF file used for inpainting or other guided design tasks
    where parts of an existing structure are provided. This parameter provides
    the actual structure content that corresponds to any FileEntity `path`
    fields in the design_spec. Can be:
    - A file path (str) to read from
    - Raw file content (bytes)
    - A file-like object (BinaryIO)
n : int, optional
    The number of unique design trajectories to run (default is 1).
diffusion_batch_size : int, optional
    The batch size for diffusion sampling. Controls how many samples are
    processed in parallel during the diffusion process.
step_scale : float, optional
    Scaling factor for the number of diffusion steps. Higher values may
    improve quality at the cost of longer generation time.
noise_scale : float, optional
    Scaling factor for the noise schedule during diffusion. Controls the
    amount of noise added at each step of the reverse diffusion process.
scaffolds : dict[str, str | bytes | BinaryIO] | None, optional
    Dictionary mapping scaffold filenames to their content. Each value can be:
    - A file path (str) to read from
    - Raw file content (bytes)
    - A file-like object (BinaryIO)
    These files will be packaged into a gzipped tar archive and made available
    to the design process under the 'scaffolds/' directory.
scaffold_set : Scaffolds | str | None, optional
    A pre-defined scaffold set object. Alternative to providing individual
    scaffold files via the `scaffolds` parameter.
extra_structure_files : dict[str, str | bytes | BinaryIO] | None, optional
    Dictionary mapping additional structure filenames to their content, with
    the same format options as `scaffolds`. These files will be packaged into
    the same archive under the 'extra/' directory and can be referenced in
    the design specification.

Other Parameters
----------------
**kwargs : dict
    Additional keyword args that are passed directly to the boltzgen
    inference script. Overwrites any preceding options.

Returns
-------
BoltzGenFuture
    A future object that can be used to retrieve the results of the design
    job upon completion.
File:      ~/Projects/openprotein/openprotein-python-private/openprotein/models/foundation/boltzgen.py
Type:      method

To generate designs, we can use our convenient Query interface.

Alternatively, our Python interface also supports the official BoltzGen design specification. See the Appendix for an example.

[3]:
from openprotein.molecules import Protein

unconditional_monomer = Protein.from_expr(length)
print("sequence:", unconditional_monomer.sequence)
print("structure mask:", unconditional_monomer.get_structure_mask())
sequence: b'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
structure mask: [ True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True]
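
Printing the full mask gets unwieldy for longer chains. As a minimal sketch, assuming the mask is an iterable of per-residue booleans (as the output above suggests) and that True marks a residue to be designed, a summary count can be easier to read. Note that `summarize_mask` is a hypothetical helper, not part of the OpenProtein client, and the list below is a stand-in for the value returned by `get_structure_mask()`.

```python
def summarize_mask(mask):
    """Return (n_designed, n_total) for a per-residue boolean mask."""
    flags = [bool(flag) for flag in mask]
    return sum(flags), len(flags)


# Stand-in for unconditional_monomer.get_structure_mask()
# (assumption: an iterable of booleans, one per residue).
example_mask = [True] * 150
n_designed, n_total = summarize_mask(example_mask)
print(f"{n_designed}/{n_total} residues marked for design")
```

The same helper could be applied to each chain of a multi-chain query to check which chains are fixed and which are to be designed before submitting a job.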

Run the design using BoltzGen:

[4]:
unconditional_design_job = boltzgen.generate(N=N, query=unconditional_monomer)
unconditional_design_job
[4]:
BoltzGenJob(job_id='612017a6-cf64-4642-a100-eb55167c645d', job_type='/models/boltzgen', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2026, 1, 17, 13, 20, 26, 44950, tzinfo=TzInfo(0)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)

Wait for the job to finish running with wait_until_done.

[5]:
unconditional_design_job.wait_until_done(verbose=True, timeout=600)
Waiting: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [03:18<00:00,  1.98s/it, status=SUCCESS]
[5]:
True
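
For intuition, a timeout-bounded wait of this kind can be thought of as a polling loop. The sketch below is a generic illustration and not the OpenProtein client's implementation: `wait_for` and `fake_job_is_done` are hypothetical names, and returning False on timeout is an assumption (the client may behave differently when the timeout is reached).

```python
import time


def wait_for(is_done, timeout=600.0, interval=2.0, sleep=time.sleep):
    """Poll is_done() until it returns True or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        if is_done():
            return True
        if time.monotonic() >= deadline:
            return False  # assumption: signal a timeout by returning False
        sleep(interval)


# Hypothetical stand-in for a job that reports success on the third status check.
checks = {"count": 0}


def fake_job_is_done():
    checks["count"] += 1
    return checks["count"] >= 3


finished = wait_for(fake_job_is_done, timeout=5.0, interval=0.01)
print(finished)
```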

Retrieve the designs as a list of N Complex objects. A Complex represents a multimer and can hold multiple protein (and other) chains. Here, each design contains only a single chain. Let’s look at the first one.

[6]:
from molviewspec import create_builder

def display_structure(structure_string):
    builder = create_builder()
    structure = builder.download(url="mystructure.cif")\
        .parse(format="mmcif")\
        .model_structure()\
        .component()\
        .representation()\
        .color_from_source(schema="atom",
                            category_name="atom_site",
                            field_name="auth_asym_id",
                            palette={"kind": "categorical", # color by chain
                                    "colors": ["blue", "red", "green", "orange"],
                                    "mode": "ordinal"}
                          )
    return builder.molstar_notebook(data={'mystructure.cif': structure_string}, width=500, height=400)

unconditional_design = unconditional_design_job.get()[0]
display_structure(unconditional_design.to_string())
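
If you want to keep all N designs rather than only viewing the first, one option is to write each to its own file. This is a minimal sketch, assuming `to_string()` returns mmCIF text (the viewer above parses it as mmcif). Here `save_designs` is a hypothetical helper and `StubDesign` is a stand-in so the sketch runs without a session; in practice you would pass the list returned by `unconditional_design_job.get()`.

```python
import tempfile
from pathlib import Path


def save_designs(designs, out_dir, prefix="design"):
    """Write each design to <out_dir>/<prefix>_<i>.cif and return the paths."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for i, design in enumerate(designs):
        text = design.to_string()
        if isinstance(text, bytes):  # tolerate bytes, in case they are returned
            text = text.decode("utf-8")
        target = out_path / f"{prefix}_{i}.cif"
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written


class StubDesign:
    """Hypothetical stand-in for a returned Complex; only to_string() is used."""

    def __init__(self, text):
        self._text = text

    def to_string(self):
        return self._text


with tempfile.TemporaryDirectory() as tmp:
    paths = save_designs([StubDesign("data_a\n"), StubDesign("data_b\n")], tmp)
    names = [p.name for p in paths]
    contents = [p.read_text(encoding="utf-8") for p in paths]

print(names)
```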

Vanilla Protein Binding#

One of the basic examples in BoltzGen is vanilla protein binder design. To run it, we first retrieve the structure file and parse it as a Protein. We then build a Complex by adding a second chain, to be designed, alongside the target chain. Here, `& "80"` appends a new 80-residue chain to design.

[7]:
import requests
import yaml
import json
from openprotein.molecules import Complex

example_cif_string = requests.get("https://raw.githubusercontent.com/HannesStark/boltzgen/refs/heads/main/example/vanilla_protein/1g13.cif").text
example_target = Protein.from_string(example_cif_string, format="cif", chain_id="A")
binder_query = example_target & "80"

print("target sequence:", binder_query.get_protein("A").sequence)
print("binder sequence:", binder_query.get_protein("B").sequence)
print("target structure mask:", binder_query.get_protein("A").get_structure_mask())
print("binder structure mask:", binder_query.get_protein("B").get_structure_mask())
target sequence: b'SSFSWDNCDEGKDPAVIRSLTLEPDPIIVPGNVTLSVMGSTSVPLSSPLKVDLVLEKEVAGLWIKIPCTDYIGSCTFEHFCDVLDMLIPTGEPCPEPLRTYGLPCHCPFKEGTYSLPKSEFVVPDLELPSWLTTGNYRIESVLSSSGKRLGCIKIAASLKGI'
binder sequence: b'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
target structure mask: [False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False]
binder structure mask: [ True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True]

Now we can run the example:

[8]:
vanilla_protein_design_job = boltzgen.generate(
    query=binder_query,
    N=1,
)
vanilla_protein_design_job
[8]:
BoltzGenJob(job_id='4efcf6c7-ee98-4a17-a6a7-b4d1f0e60f0f', job_type='/models/boltzgen', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2026, 1, 17, 13, 32, 17, 327108, tzinfo=TzInfo(0)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[9]:
vanilla_protein_design_job.wait_until_done(verbose=True, timeout=600)
Waiting: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [01:43<00:00,  1.04s/it, status=SUCCESS]
[9]:
True

Display the target and binder together. Note that the chain IDs differ from the query: in the output, chain B is the target (chain A in the query above) and chain A is the designed binder.

[10]:
vanilla_protein_design = vanilla_protein_design_job.get()[0]
display_structure(vanilla_protein_design.to_string())

Vanilla Peptide with Target Binding Site#

Let’s run another example, which designs a peptide binder. We retrieve the structure from the official example, mark the target binding-site residues with set_binding_at, and then add a 15-residue chain to design.

[11]:
from openprotein.molecules import Binding

example_cif_string = requests.get("https://raw.githubusercontent.com/HannesStark/boltzgen/refs/heads/main/example/vanilla_peptide_with_target_binding_site/5cqg.cif").text
example_target = Protein.from_string(example_cif_string, format="cif", chain_id="A")
example_target.set_binding_at([343,344,251], Binding.BINDING)
peptide_binder_query = example_target & "15"

print("target sequence:", peptide_binder_query.get_protein("A").sequence)
print("binder sequence:", peptide_binder_query.get_protein("B").sequence)
print("target structure mask:", peptide_binder_query.get_protein("A").get_structure_mask())
print("binder structure mask:", peptide_binder_query.get_protein("B").get_structure_mask())

# Display our target
display_structure(example_target.to_string())
target sequence: b'MVHYYRLSLKSRQKAPKIVNSKYNSILNIALKNFRLCKKHKTKKPVQILALLQEIIPKSYFGTTTNLKRFYKVVEKILTQSSFECIHLSVLHKCYDYDAIPWLQNVEPNLRPKLLLKHNLFLLDNIVKPIIAFYYKPIKTLNGHEIKFIRKEEYISFESKVFHKLKKMKYLVEVQDEVKPRGVLNIIPKQDNFRAIVSIFPDSARKPFFKLLTSKIYKVLEEKYKTSGSLYTCWSEFTQKTQGQIYGIKVDIRDAYGNVKIPVLCKLIQSIPTHLLDSEKKNFIVDHISNQFVAFRRKIYKWNHGLLQGDPLSGCLCELYMAFMDRLYFSNLDKDAFIHRTVDDYFFCSPHPHKVYDFELLIKGVYQVNPTKTRTNLPTHRHPQDEIPYCGKIFNLTTRQVRTLYKLPPNYEIRHKFKLWNFNNQISDDNPARFLQKAMDFPFICNSFTKFEFNTVFNDQRTVFANFYDAMICVAYKFDAAMMALRTSFLVNDFGFIWLVLSSTVRAYASRAFKKIVTYKGGKYRKVTFQCLKSIAWRAFLAVLKRRTEIYKGLIDRIKSREKLTMKFHDGEVDASYFCKLPEKFRFVKINRKASI'
binder sequence: b'XXXXXXXXXXXXXXX'
target structure mask: [False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False False False False False
 False False False False False False False False]
binder structure mask: [ True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True]
[12]:
vanilla_peptide_design_job = boltzgen.generate(
    query=peptide_binder_query,
    N=1,
)
vanilla_peptide_design_job
[12]:
BoltzGenJob(job_id='6fee8571-cae6-4ffc-bbe7-23ec5a49dddc', job_type='/models/boltzgen', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2026, 1, 17, 13, 40, 7, 797809, tzinfo=TzInfo(0)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)

Wait for and retrieve the result. Display the output design.

[13]:
vanilla_peptide_design_job.wait_until_done(verbose=True, timeout=900)
vanilla_peptide_design = vanilla_peptide_design_job.get()[0]
display_structure(vanilla_peptide_design.to_string())
Waiting: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [02:53<00:00,  1.74s/it, status=SUCCESS]

Next Steps#

You can run more of the examples from the BoltzGen repository. Note that any command-line arguments to boltzgen run can be passed as keyword arguments (kwargs) to the boltzgen.generate method.

You can also move on to the next step of the design pipeline by running inverse folding using PoET-2. Refer to the walkthrough of Inverse Folding with PoET-2 for an example.

Appendix#

Using the BoltzGen design specification#

To support non-standard workflows that may not be fully covered by our Query interface, our Python interface also supports the raw official BoltzGen specification, which can be supplied directly as design_spec.

However, when doing so, it will usually be necessary to also provide additional structure files to accompany the design specification. These can be provided using extra_structure_files as a mapping of filenames to files.

The following is an example of running the vanilla protein design job with this interface.

[ ]:
import io

design_spec = yaml.safe_load(requests.get("https://raw.githubusercontent.com/HannesStark/boltzgen/refs/heads/main/example/vanilla_protein/1g13prot.yaml").text)
structure_file = requests.get("https://raw.githubusercontent.com/HannesStark/boltzgen/refs/heads/main/example/vanilla_protein/1g13.cif").text

vanilla_protein_design_job_ = boltzgen.generate(
    design_spec=design_spec,
    extra_structure_files={"1g13.cif": io.BytesIO(structure_file.encode())},
    N=1,
)
vanilla_protein_design_job_
BoltzGenJob(job_id='8146f068-ce59-43d3-9aec-782cce54099e', job_type='/models/boltzgen', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2026, 1, 17, 14, 2, 7, 381776, tzinfo=TzInfo(0)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
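
The cell above wraps the downloaded text in `io.BytesIO` before passing it on. According to the docstring shown earlier, a plain str value is read as a file path, so in-memory text needs to be encoded to bytes (or wrapped in a file-like object) first. As a minimal sketch, a hypothetical helper `to_structure_files` (not part of the OpenProtein client) could do this for several files at once. The file contents below are placeholders, and the filename keys presumably need to match the paths referenced in the design specification.

```python
import io


def to_structure_files(named_contents):
    """Wrap in-memory str/bytes contents as BytesIO objects keyed by filename."""
    files = {}
    for filename, content in named_contents.items():
        if isinstance(content, str):
            # Encode text first: per the docstring, a bare str is read as a file path.
            content = content.encode("utf-8")
        files[filename] = io.BytesIO(content)
    return files


# Placeholder contents; in the example above this would be the downloaded 1g13.cif text.
extra_files = to_structure_files({"1g13.cif": "data_1G13\n"})
print(sorted(extra_files))
```

The resulting mapping could then be passed as `extra_structure_files=extra_files` in place of the inline dictionary.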