Using BoltzGen#
This tutorial shows you how to use the BoltzGen model to design novel protein structures.
The examples are adapted from the original BoltzGen documentation to show how the model can be run on the OpenProtein platform and then combined with our other workflows.
Full credit for the examples and the model goes to the authors of BoltzGen!
Unconditional monomer design#
The most basic use of BoltzGen is the unconditional design of a protein structure of a given length. You need three things:
An authenticated OpenProtein session
The length of the protein
The number of designs desired, N
[1]:
import openprotein
session = openprotein.connect()
length = 150
N = 3
[2]:
boltzgen = session.models.boltzgen
boltzgen.generate?
Signature:
boltzgen.generate(
    query: str | bytes | openprotein.molecules.protein.Protein | openprotein.molecules.complex.Complex | openprotein.prompt.models.Query | None = None,
    design_spec: openprotein.models.foundation.boltzgen_schema.BoltzGenDesignSpec | dict[str, typing.Any] | None = None,
    structure_file: str | bytes | typing.BinaryIO | None = None,
    N: int = 1,
    diffusion_batch_size: int | None = None,
    step_scale: float | None = None,
    noise_scale: float | None = None,
    scaffolds: dict[str, str | bytes | typing.BinaryIO] | None = None,
    scaffold_set: openprotein.scaffolds.Scaffolds | str | None = None,
    extra_structure_files: dict[str, str | bytes | typing.BinaryIO] | None = None,
    **kwargs,
) -> openprotein.models.foundation.boltzgen.BoltzGenFuture
Docstring:
Run a protein structure generate job using BoltzGen.

Parameters
----------
query : str or bytes or Protein or Complex or Query, optional
    A query representing the design specification. Either `query` or `design_spec`
    must be provided.
    `query` provides a unified way to represent design specifications on the
    OpenProtein platform. In this case, the structure mask of the containing Complex
    proteins are specified to be designed. Other parameters like binding, group,
    secondary structures, etc. are also passed through to BoltzGen.
design_spec : BoltzGenDesignSpec | dict[str, Any] | None, optional
    The BoltzGen design specification to run. Either `query` or `design_spec`
    must be provided.
    `design_spec` exposes a low-level interface to using BoltzGen by accepting the YAML
    specification used by official BoltzGen examples.
    Can be a typed BoltzGenDesignSpec object or a dict representing the
    BoltzGen yaml request specification.
    Note: If the design_spec includes file paths, provide
    these extra files either using `scaffolds` or `extra_structure_files`.
structure_file : str | bytes | BinaryIO | None, optional
    (Deprecated: use `extra_structure_files`)
    An input PDB/CIF file used for inpainting or other guided design tasks
    where parts of an existing structure are provided. This parameter provides
    the actual structure content that corresponds to any FileEntity `path`
    fields in the design_spec. Can be:
    - A file path (str) to read from
    - Raw file content (bytes)
    - A file-like object (BinaryIO)
n : int, optional
    The number of unique design trajectories to run (default is 1).
diffusion_batch_size : int, optional
    The batch size for diffusion sampling. Controls how many samples are
    processed in parallel during the diffusion process.
step_scale : float, optional
    Scaling factor for the number of diffusion steps. Higher values may
    improve quality at the cost of longer generation time.
noise_scale : float, optional
    Scaling factor for the noise schedule during diffusion. Controls the
    amount of noise added at each step of the reverse diffusion process.
scaffolds : dict[str, str | bytes | BinaryIO] | None, optional
    Dictionary mapping scaffold filenames to their content. Each value can be:
    - A file path (str) to read from
    - Raw file content (bytes)
    - A file-like object (BinaryIO)
    These files will be packaged into a gzipped tar archive and made available
    to the design process under the 'scaffolds/' directory.
scaffold_set : Scaffolds | str | None, optional
    A pre-defined scaffold set object. Alternative to providing individual
    scaffold files via the `scaffolds` parameter.
extra_structure_files : dict[str, str | bytes | BinaryIO] | None, optional
    Dictionary mapping additional structure filenames to their content, with
    the same format options as `scaffolds`. These files will be packaged into
    the same archive under the 'extra/' directory and can be referenced in
    the design specification.

Other Parameters
----------------
**kwargs : dict
    Additional keyword args that are passed directly to the boltzgen
    inference script. Overwrites any preceding options.

Returns
-------
BoltzGenFuture
    A future object that can be used to retrieve the results of the design
    job upon completion.

File: ~/Projects/openprotein/openprotein-python-private/openprotein/models/foundation/boltzgen.py
Type: method
To generate designs, we can use our convenient Query interface.
Alternatively, our Python interface also supports the official BoltzGen design specification. See the Appendix for an example.
[3]:
from openprotein.molecules import Protein
unconditional_monomer = Protein.from_expr(length)
print("sequence:", unconditional_monomer.sequence)
print("structure mask:", unconditional_monomer.get_structure_mask())
sequence: b'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
structure mask: [ True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True]
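To make the printed output above concrete, here is a minimal sketch in plain Python of what a fully unconditional query amounts to: every residue is the unknown placeholder X, and every position in the structure mask is True (to be designed). The helper placeholder_design is hypothetical and only mirrors the printed values; it is not part of the OpenProtein API, and Protein.from_expr may do more than this.

```python
def placeholder_design(length):
    """Return an all-unknown sequence and an all-True design mask.

    Hypothetical illustration only; it mimics the values printed above.
    """
    if length <= 0:
        raise ValueError("length must be a positive integer")
    sequence = b"X" * length          # every residue is unknown
    structure_mask = [True] * length  # every position is to be designed
    return sequence, structure_mask


sequence, structure_mask = placeholder_design(150)
print(len(sequence), all(structure_mask))
```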
Run the design using BoltzGen:
[4]:
unconditional_design_job = boltzgen.generate(N=N, query=unconditional_monomer)
unconditional_design_job
[4]:
BoltzGenJob(job_id='612017a6-cf64-4642-a100-eb55167c645d', job_type='/models/boltzgen', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2026, 1, 17, 13, 20, 26, 44950, tzinfo=TzInfo(0)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
Wait for the job to finish running with wait_until_done.
[5]:
unconditional_design_job.wait_until_done(verbose=True, timeout=600)
Waiting: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [03:18<00:00, 1.98s/it, status=SUCCESS]
[5]:
True
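Conceptually, waiting on a job is a polling loop with a deadline. The sketch below shows that pattern in plain Python so the role of timeout is clear. It is not how wait_until_done is implemented: wait_for_status and get_status are hypothetical names, and the RUNNING and FAILURE status strings are assumptions (only PENDING and SUCCESS appear in the output above).

```python
import time


def wait_for_status(get_status, timeout=600.0, interval=2.0,
                    terminal=("SUCCESS", "FAILURE")):
    """Poll get_status() until it returns a terminal status.

    Raises TimeoutError if no terminal status is seen within `timeout` seconds.
    The terminal status names are assumptions made for this sketch.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} s")
        time.sleep(interval)


# Demo with a scripted status source instead of a real job.
scripted = iter(["PENDING", "RUNNING", "SUCCESS"])
final_status = wait_for_status(lambda: next(scripted), timeout=5.0, interval=0.0)
print(final_status)
```

If a job is expected to run longer than the chosen timeout, raising the timeout is usually simpler than resubmitting the job.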
Retrieve the designs as a list of N Complex objects. Complex objects represent multimers, and can hold multiple protein (and other) chains. For now, our design will only return a single chain. Let’s look at the first one.
[6]:
from molviewspec import create_builder

def display_structure(structure_string):
    builder = create_builder()
    structure = builder.download(url="mystructure.cif")\
        .parse(format="mmcif")\
        .model_structure()\
        .component()\
        .representation()\
        .color_from_source(schema="atom",
                           category_name="atom_site",
                           field_name="auth_asym_id",
                           palette={"kind": "categorical",  # color by chain
                                    "colors": ["blue", "red", "green", "orange"],
                                    "mode": "ordinal"}
                           )
    return builder.molstar_notebook(data={'mystructure.cif': structure_string}, width=500, height=400)

unconditional_design = unconditional_design_job.get()[0]
display_structure(unconditional_design.to_string())
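Since the job returns a list of designs, it is often handy to write each one to disk for later use. The following is a minimal sketch, assuming only that each design has a to_string() method returning mmCIF content, as used in the cell above. The helper save_designs and the StandInDesign class are hypothetical and exist only so the example runs without a session.

```python
import tempfile
from pathlib import Path


def save_designs(designs, out_dir, prefix="design"):
    """Write each design's to_string() output to <out_dir>/<prefix>_<i>.cif."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for index, design in enumerate(designs):
        content = design.to_string()
        target = out_path / f"{prefix}_{index}.cif"
        # Accept either text or raw bytes, since the return type is assumed.
        if isinstance(content, bytes):
            target.write_bytes(content)
        else:
            target.write_text(content)
        written.append(target)
    return written


class StandInDesign:
    """Stand-in for a design object; only provides to_string()."""

    def __init__(self, text):
        self.text = text

    def to_string(self):
        return self.text


with tempfile.TemporaryDirectory() as tmp:
    paths = save_designs([StandInDesign("data_a\n"), StandInDesign("data_b\n")], tmp)
    saved_names = [p.name for p in paths]
    saved_first = paths[0].read_text()
print(saved_names)
```

With a real job, the list passed in would be the result of unconditional_design_job.get().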
Vanilla Protein Binding#
One of the basic BoltzGen examples is designing a binder for a protein target. To do so, we first retrieve the structure file and parse it as a Protein. We can then build a molecular Complex with an additional chain to be designed alongside the target chain.
[7]:
import requests
import yaml
import json
from openprotein.molecules import Complex
example_cif_string = requests.get("https://raw.githubusercontent.com/HannesStark/boltzgen/refs/heads/main/example/vanilla_protein/1g13.cif").text
example_target = Protein.from_string(example_cif_string, format="cif", chain_id="A")
binder_query = example_target & "80"
print("target sequence:", binder_query.get_protein("A").sequence)
print("binder sequence:", binder_query.get_protein("B").sequence)
print("target structure mask:", binder_query.get_protein("A").get_structure_mask())
print("binder structure mask:", binder_query.get_protein("B").get_structure_mask())
target sequence: b'SSFSWDNCDEGKDPAVIRSLTLEPDPIIVPGNVTLSVMGSTSVPLSSPLKVDLVLEKEVAGLWIKIPCTDYIGSCTFEHFCDVLDMLIPTGEPCPEPLRTYGLPCHCPFKEGTYSLPKSEFVVPDLELPSWLTTGNYRIESVLSSSGKRLGCIKIAASLKGI'
binder sequence: b'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
target structure mask: [False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False]
binder structure mask: [ True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True True True True True
True True True True True True True True]
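The full mask printouts are long, so a compact summary can be easier to check before submitting a job. Here is a small sketch that counts designed (True) and fixed (False) positions in a mask. summarize_mask is a hypothetical helper, and the stand-in masks below simply mirror the lengths printed above (162 fixed target positions, 80 designed binder positions); with a real query you would pass the result of get_structure_mask().

```python
def summarize_mask(mask):
    """Count designed (True) and fixed (False) positions in a structure mask."""
    flags = [bool(flag) for flag in mask]
    designed = sum(flags)
    return {
        "length": len(flags),
        "designed": designed,
        "fixed": len(flags) - designed,
    }


# Stand-in masks mirroring the printed output above.
target_summary = summarize_mask([False] * 162)
binder_summary = summarize_mask([True] * 80)
print("target:", target_summary)
print("binder:", binder_summary)
```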
Now we can run the example:
[8]:
vanilla_protein_design_job = boltzgen.generate(
    query=binder_query,
    N=1,
)
vanilla_protein_design_job
[8]:
BoltzGenJob(job_id='4efcf6c7-ee98-4a17-a6a7-b4d1f0e60f0f', job_type='/models/boltzgen', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2026, 1, 17, 13, 32, 17, 327108, tzinfo=TzInfo(0)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[9]:
vanilla_protein_design_job.wait_until_done(verbose=True, timeout=600)
Waiting: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [01:43<00:00, 1.04s/it, status=SUCCESS]
[9]:
True
Display the target and binder together. Note that in the output, chain B is the target (chain A in the query above), and chain A is the designed binder.
[10]:
vanilla_protein_design = vanilla_protein_design_job.get()[0]
display_structure(vanilla_protein_design.to_string())
Vanilla Peptide with Target Binding Site#
Let’s run another example, which designs a peptide binder. We retrieve the structure from the official example, then mark the target binding site residues.
[11]:
from openprotein.molecules import Binding
example_cif_string = requests.get("https://raw.githubusercontent.com/HannesStark/boltzgen/refs/heads/main/example/vanilla_peptide_with_target_binding_site/5cqg.cif").text
example_target = Protein.from_string(example_cif_string, format="cif", chain_id="A")
example_target.set_binding_at([343,344,251], Binding.BINDING)
peptide_binder_query = example_target & "15"
print("target sequence:", peptide_binder_query.get_protein("A").sequence)
print("binder sequence:", peptide_binder_query.get_protein("B").sequence)
print("target structure mask:", peptide_binder_query.get_protein("A").get_structure_mask())
print("binder structure mask:", peptide_binder_query.get_protein("B").get_structure_mask())
# Display our target
display_structure(example_target.to_string())
target sequence: b'MVHYYRLSLKSRQKAPKIVNSKYNSILNIALKNFRLCKKHKTKKPVQILALLQEIIPKSYFGTTTNLKRFYKVVEKILTQSSFECIHLSVLHKCYDYDAIPWLQNVEPNLRPKLLLKHNLFLLDNIVKPIIAFYYKPIKTLNGHEIKFIRKEEYISFESKVFHKLKKMKYLVEVQDEVKPRGVLNIIPKQDNFRAIVSIFPDSARKPFFKLLTSKIYKVLEEKYKTSGSLYTCWSEFTQKTQGQIYGIKVDIRDAYGNVKIPVLCKLIQSIPTHLLDSEKKNFIVDHISNQFVAFRRKIYKWNHGLLQGDPLSGCLCELYMAFMDRLYFSNLDKDAFIHRTVDDYFFCSPHPHKVYDFELLIKGVYQVNPTKTRTNLPTHRHPQDEIPYCGKIFNLTTRQVRTLYKLPPNYEIRHKFKLWNFNNQISDDNPARFLQKAMDFPFICNSFTKFEFNTVFNDQRTVFANFYDAMICVAYKFDAAMMALRTSFLVNDFGFIWLVLSSTVRAYASRAFKKIVTYKGGKYRKVTFQCLKSIAWRAFLAVLKRRTEIYKGLIDRIKSREKLTMKFHDGEVDASYFCKLPEKFRFVKINRKASI'
binder sequence: b'XXXXXXXXXXXXXXX'
target structure mask: [False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False False False False False
False False False False False False False False]
binder structure mask: [ True True True True True True True True True True True True
True True True]
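Before marking binding residues, it can help to confirm that the chosen positions actually fall inside the target. The sketch below is one way to do that in plain Python. check_positions is a hypothetical helper, and whether set_binding_at expects 1-based or 0-based positions is an assumption to confirm against the OpenProtein documentation; the length of 596 is taken from the printed target mask above.

```python
def check_positions(positions, length, one_based=True):
    """Validate residue positions against a chain length.

    Returns the unique positions in sorted order, or raises ValueError.
    The indexing convention (one_based) is an assumption of this sketch.
    """
    first = 1 if one_based else 0
    last = length if one_based else length - 1
    out_of_range = [p for p in positions if not first <= p <= last]
    if out_of_range:
        raise ValueError(
            f"positions {out_of_range} fall outside {first}..{last}"
        )
    return sorted(set(positions))


binding_site = check_positions([343, 344, 251], length=596)
print(binding_site)
```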
[12]:
vanilla_peptide_design_job = boltzgen.generate(
    query=peptide_binder_query,
    N=1,
)
vanilla_peptide_design_job
[12]:
BoltzGenJob(job_id='6fee8571-cae6-4ffc-bbe7-23ec5a49dddc', job_type='/models/boltzgen', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2026, 1, 17, 13, 40, 7, 797809, tzinfo=TzInfo(0)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
Wait for and retrieve the result. Display the output design.
[13]:
vanilla_peptide_design_job.wait_until_done(verbose=True, timeout=900)
vanilla_peptide_design = vanilla_peptide_design_job.get()[0]
display_structure(vanilla_peptide_design.to_string())
Waiting: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [02:53<00:00, 1.74s/it, status=SUCCESS]
Next Steps#
You can run more of the examples from the BoltzGen repository. Note that any command-line arguments to boltzgen run can be passed as keyword arguments (kwargs) to the boltzgen.generate function.
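When experimenting with sampling settings, it can be convenient to collect the optional parameters in one place and unpack them into the call. Here is a minimal sketch using only parameter names from the boltzgen.generate signature shown earlier. sampling_options is a hypothetical helper, and the values used are arbitrary placeholders, not recommendations.

```python
def sampling_options(N=1, diffusion_batch_size=None, step_scale=None,
                     noise_scale=None):
    """Collect optional sampling settings, dropping any left as None.

    Parameter names follow the generate signature; values are up to the user.
    """
    if N < 1:
        raise ValueError("N must be at least 1")
    optional = {
        "diffusion_batch_size": diffusion_batch_size,
        "step_scale": step_scale,
        "noise_scale": noise_scale,
    }
    chosen = {key: value for key, value in optional.items() if value is not None}
    return {"N": N, **chosen}


options = sampling_options(N=3, step_scale=1.5)
print(options)
# With a session, these could then be unpacked into the call, for example:
# boltzgen.generate(query=unconditional_monomer, **options)
```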
You can also move on to the next step of the design pipeline by running inverse folding using PoET-2. Refer to the walkthrough of Inverse Folding with PoET-2 for an example.
Appendix#
Using the BoltzGen design specification#
To support non-standard workflows that may not be fully covered by our Query interface, our Python interface also supports the raw official BoltzGen specification. To use it, supply the specification directly as design_spec.
However, when doing so, it will usually be necessary to also provide additional structure files to accompany the design specification. These can be provided using extra_structure_files as a mapping of filenames to files.
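One way to catch a missing structure file before submitting is to walk the specification for path fields and compare them against the filenames you plan to upload. The sketch below does this for a nested dict. referenced_paths and missing_files are hypothetical helpers, and example_spec is a made-up shape for illustration only, not the actual BoltzGen schema.

```python
def referenced_paths(spec):
    """Recursively collect string values stored under any "path" key."""
    found = []
    if isinstance(spec, dict):
        for key, value in spec.items():
            if key == "path" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(referenced_paths(value))
    elif isinstance(spec, list):
        for item in spec:
            found.extend(referenced_paths(item))
    return found


def missing_files(spec, provided):
    """Return referenced filenames that are absent from `provided`."""
    return sorted(set(referenced_paths(spec)) - set(provided))


# Made-up specification shape, for illustration only.
example_spec = {
    "entities": [
        {"file": {"path": "target.cif"}},
        {"protein": {"id": "B", "sequence": "80"}},
    ]
}
print(missing_files(example_spec, provided={}))
print(missing_files(example_spec, provided={"target.cif": b"..."}))
```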
The following is an example of running the vanilla protein design job with this interface.
[ ]:
import io
design_spec = yaml.safe_load(requests.get("https://raw.githubusercontent.com/HannesStark/boltzgen/refs/heads/main/example/vanilla_protein/1g13prot.yaml").text)
structure_file = requests.get("https://raw.githubusercontent.com/HannesStark/boltzgen/refs/heads/main/example/vanilla_protein/1g13.cif").text
vanilla_protein_design_job_ = boltzgen.generate(
    design_spec=design_spec,
    extra_structure_files={"1g13.cif": io.BytesIO(structure_file.encode())},
    N=1,
)
vanilla_protein_design_job_
BoltzGenJob(job_id='8146f068-ce59-43d3-9aec-782cce54099e', job_type='/models/boltzgen', status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2026, 1, 17, 14, 2, 7, 381776, tzinfo=TzInfo(0)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
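The docstring notes that a str value in extra_structure_files is treated as a file path to read from, which is why the cell above wraps the downloaded text in io.BytesIO. The sketch below generalizes that step for several files at once. as_binary_files is a hypothetical helper, and the filename and content are placeholders.

```python
import io


def as_binary_files(contents):
    """Wrap in-memory file contents (str or bytes) as BytesIO objects.

    Text is encoded as UTF-8 so it is not mistaken for a file path.
    """
    files = {}
    for name, data in contents.items():
        raw = data.encode("utf-8") if isinstance(data, str) else bytes(data)
        files[name] = io.BytesIO(raw)
    return files


# Placeholder filename and content, for illustration only.
files = as_binary_files({"example.cif": "data_example\n#\n"})
print(files["example.cif"].getvalue())
```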