Using Boltz-1 and Boltz-2#
This tutorial demonstrates how to use the Boltz-2 model to predict the structure of a molecular complex, including proteins and ligands. We will also show how to request and retrieve predicted binding affinities and other quality metrics.
What you need before getting started#
First, ensure you have an active OpenProtein session. Then, import the necessary classes for defining the components of your complex.
[1]:
import openprotein
from openprotein.molecules import Complex, Protein, Ligand
# Log in to your session
session = openprotein.connect()
Defining the Molecules#
Boltz-2 can model various molecule types, including proteins, ligands, DNA, and RNA. For this example, we’ll predict the structure of a protein dimer in complex with a ligand.
We will define a dimer and one ligand. To do this, we will create a Complex with a dictionary of chains and their respective chain ids.
Note that for affinity prediction, the ligand that is binding must be a single, unique ligand in the complex.
[2]:
# Define the molecular complex to predict
# Start with the protein in a homodimer
protein = Protein(sequence="MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQFYERLIGLWAPAVMEAAHELGVFAALAEAPADSGELARRLDCDARAMRVLLDALYAYDVIDRIHDTNGFRYLLSAEARECLLPGTLFSLVGKFMHDINVAWPAWRNLAEVVRHGARDTSGAESPNGIAQEDYESLVGGINFWAPPIVTTLSRKLRASGRSGDATASVLDVGCGTGLYSQLLLREFPRWTATGLDVERIATLANAQALRLGVEERFATRAGDFWRGGWGTGYDLVLFANIFHLQTPASAVRLMRHAAACLAPDGLVAVVDQIVDADREPKTPQDRFALLFAASMTNTGGGDAYTFQEYEEWFTAAGLQRIETLDTPMHRILLARRATEPSAVPEGQASENLYFQ")
# You can also specify the protein to be cyclic by setting the property
# protein.cyclic = True
# Define the ligand in our complex
ligand = Ligand(ccd="SAH")
# Assemble the complex
complex = Complex({
    "A": protein,
    "B": protein,
    "C": ligand,
})
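The chain dictionary above reuses the same protein object under two chain IDs to express a homodimer. As a minimal sketch of that pattern for larger assemblies, the hypothetical helper below (`assign_chain_ids` is not part of the openprotein package) hands out sequential chain IDs to copies of each entity. Plain strings stand in for the `Protein` and `Ligand` objects so the snippet runs without a session.

```python
from string import ascii_uppercase


def assign_chain_ids(entities):
    """Map sequential chain IDs ("A", "B", ...) to repeated entities.

    `entities` is a list of (entity, copies) pairs. This hypothetical helper
    supports at most 26 chains because it only uses single uppercase letters.
    """
    total = sum(copies for _, copies in entities)
    if total > len(ascii_uppercase):
        raise ValueError("this sketch supports at most 26 chains")
    chains = {}
    ids = iter(ascii_uppercase)
    for entity, copies in entities:
        for _ in range(copies):
            chains[next(ids)] = entity
    return chains


# Stand-in strings replace the Protein and Ligand objects from the cell above.
chains = assign_chain_ids([("protein", 2), ("ligand", 1)])
print(chains)  # {'A': 'protein', 'B': 'protein', 'C': 'ligand'}
```

The resulting dictionary has the same shape as the one passed to `Complex` above.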
Create MSA for the Protein using Homology Search#
When using Boltz with protein sequences, we need to supply an MSA to help inform the model, or explicitly opt in to single sequence mode. Each protein must have protein.msa set either to an MSA or to Protein.single_sequence_mode.
Here, we will be building an MSA using our platform capabilities. Note the syntax here: creating an MSA with a complex uses ColabFold’s syntax of joining sequences with :.
[3]:
msa_query = []
for p in complex.get_proteins().values():
    msa_query.append(p.sequence)
msa = session.align.create_msa(seed=b":".join(msa_query))
for p in complex.get_proteins().values():
    p.msa = msa
    # If desired, use single sequence mode to specify no msa
    # p.msa = Protein.single_sequence_mode
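To make the ColabFold-style seed concrete, here is a small sketch of the joining step on its own. The helper name `build_msa_seed` is hypothetical, and whether `Protein.sequence` is stored as `str` or `bytes` is an assumption this sketch sidesteps by accepting both.

```python
def build_msa_seed(sequences):
    """Join chain sequences with ':' (ColabFold style) and return bytes."""
    encoded = [s if isinstance(s, bytes) else s.encode("ascii") for s in sequences]
    if not encoded or any(len(s) == 0 for s in encoded):
        raise ValueError("every chain needs a non-empty sequence")
    return b":".join(encoded)


# Two copies of a short toy sequence, one given as str and one as bytes.
seed = build_msa_seed(["MVTPEG", b"MVTPEG"])
print(seed)  # b'MVTPEG:MVTPEG'
```

For a homodimer, the seed repeats the same sequence once per chain, separated by a colon.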
Predicting the Complex Structure and Affinity#
Now, we can call the fold method on the Boltz-2 model.
The key steps are:
- Access the model via session.fold.boltz2 (or session.fold.boltz1, or session.fold.boltz1x).
- Pass the defined proteins and ligands.
- To request binding affinity prediction, include the properties argument. This argument takes a list of dictionaries. For affinity, you specify the binder as the chain_id of the ligand you defined. (Note that Boltz-1 doesn't support affinity.)
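The shape of the properties argument can be sketched in plain Python. The helper `affinity_properties` below is hypothetical and only builds and sanity-checks the payload. It does not call the platform, and the check that the binder is one of the complex's chain IDs is an illustrative safeguard rather than documented client behavior.

```python
def affinity_properties(chain_ids, binder):
    """Build the `properties` list requesting affinity for one ligand chain."""
    if binder not in chain_ids:
        raise ValueError(f"binder {binder!r} is not one of the chains {sorted(chain_ids)}")
    # A list of dictionaries, one per requested property.
    return [{"affinity": {"binder": binder}}]


properties = affinity_properties(["A", "B", "C"], binder="C")
print(properties)  # [{'affinity': {'binder': 'C'}}]
```

The returned value matches the literal passed as properties in the fold request.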
[4]:
# Request the fold, including an affinity prediction for our ligand.
fold_job = session.fold.boltz2.fold(
sequences=[complex], # list for batch requests
properties=[{"affinity": {"binder": "C"}}]
)
fold_job
[4]:
FoldJob(num_records=1, job_id='7bce7fe5-e946-4ae8-a9aa-bde6e2b7b0c0', job_type=<JobType.embeddings_fold: '/embeddings/fold'>, status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2026, 1, 16, 12, 56, 7, 411147, tzinfo=TzInfo(0)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
The call returns a FoldResultFuture object immediately. This is a reference to your job running on the OpenProtein platform. You can monitor its status or wait for it to complete.
[5]:
# Wait for the job to finish
fold_job.wait_until_done(verbose=True)
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 938.65it/s, status=SUCCESS]
[5]:
True
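If you prefer to monitor a job yourself rather than block on wait_until_done, the general pattern is a poll loop with a timeout. The sketch below is generic and does not describe how the openprotein client implements waiting: `wait_for` and `get_status` are hypothetical, and of the status names used, only PENDING and SUCCESS appear in the output above. The others are assumptions.

```python
import time


def wait_for(get_status, poll_interval=1.0, timeout=60.0):
    """Poll get_status() until a terminal state is seen or the timeout elapses."""
    # FAILURE and CANCELED are assumed names for other terminal states.
    terminal = {"SUCCESS", "FAILURE", "CANCELED"}
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status} after {timeout} seconds")
        time.sleep(poll_interval)


# A fake status source standing in for a real job.
statuses = iter(["PENDING", "RUNNING", "SUCCESS"])
final = wait_for(lambda: next(statuses), poll_interval=0.0)
print(final)  # SUCCESS
```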
Retrieving the Results#
Once the job is complete, you can retrieve the various outputs from the future object.
Getting the Structure#
The primary result is the Structure, which contains the parsed molecular structure from the Boltz inference. A Structure can hold multiple Complexes, each of which can hold multiple different chains, including Proteins, which in turn hold the predicted 3D coordinates of their atoms.
The number of Complexes in the resulting Structure depends on the diffusion_samples parameter in the request.
The output result is a list type because the API supports submitting multiple Complexes for prediction and each result maps to what was submitted in order.
[6]:
result = fold_job.get()
structure = result[0]
predicted_complex = structure[0]
print("Predicted structures:", result)
print("Predicted molecular complex:", result[0][0])
print("Predicted protein A:\n", predicted_complex.get_protein("A"))
print("Predicted protein B:\n", predicted_complex.get_protein("B"))
print("Predicted ligand C:\n", predicted_complex.get_ligand("C"))
Predicted structures: [<openprotein.molecules.structure.Structure object at 0x7f83dfb8c9b0>]
Predicted molecular complex: <openprotein.molecules.complex.Complex object at 0x7f8394df5940>
Predicted protein A:
0 SEQUENCE MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQFYERLIGLWAPAVMEAAHELGVFAALAEA
60 SEQUENCE PADSGELARRLDCDARAMRVLLDALYAYDVIDRIHDTNGFRYLLSAEARECLLPGTLFSL
120 SEQUENCE VGKFMHDINVAWPAWRNLAEVVRHGARDTSGAESPNGIAQEDYESLVGGINFWAPPIVTT
180 SEQUENCE LSRKLRASGRSGDATASVLDVGCGTGLYSQLLLREFPRWTATGLDVERIATLANAQALRL
240 SEQUENCE GVEERFATRAGDFWRGGWGTGYDLVLFANIFHLQTPASAVRLMRHAAACLAPDGLVAVVD
300 SEQUENCE QIVDADREPKTPQDRFALLFAASMTNTGGGDAYTFQEYEEWFTAAGLQRIETLDTPMHRI
360 SEQUENCE LLARRATEPSAVPEGQASENLYFQ
Predicted protein B:
0 SEQUENCE MVTPEGNVSLVDESLLVGVTDEDRAVRSAHQFYERLIGLWAPAVMEAAHELGVFAALAEA
60 SEQUENCE PADSGELARRLDCDARAMRVLLDALYAYDVIDRIHDTNGFRYLLSAEARECLLPGTLFSL
120 SEQUENCE VGKFMHDINVAWPAWRNLAEVVRHGARDTSGAESPNGIAQEDYESLVGGINFWAPPIVTT
180 SEQUENCE LSRKLRASGRSGDATASVLDVGCGTGLYSQLLLREFPRWTATGLDVERIATLANAQALRL
240 SEQUENCE GVEERFATRAGDFWRGGWGTGYDLVLFANIFHLQTPASAVRLMRHAAACLAPDGLVAVVD
300 SEQUENCE QIVDADREPKTPQDRFALLFAASMTNTGGGDAYTFQEYEEWFTAAGLQRIETLDTPMHRI
360 SEQUENCE LLARRATEPSAVPEGQASENLYFQ
Predicted ligand C:
Ligand(ccd='SAH', smiles=None, _structure_block=<openprotein.utils.cif.StructureCIFBlock object at 0x7f8394df6150>)
Visualize the structure using molviewspec.
[7]:
%pip install molviewspec
from molviewspec import create_builder

def display_structure(structure_string):
    builder = create_builder()
    structure = builder.download(url="mystructure.cif")\
        .parse(format="mmcif")\
        .model_structure()\
        .component()\
        .representation()\
        .color_from_source(schema="atom",
                           category_name="atom_site",
                           field_name="auth_asym_id",
                           palette={"kind": "categorical",  # color by chain
                                    "colors": ["blue", "red", "green", "orange"],
                                    "mode": "ordinal"}
                           )
    return builder.molstar_notebook(data={'mystructure.cif': structure_string}, width=500, height=400)

display_structure(structure.to_string(format="cif"))
Getting Confidence Metrics (pLDDT, PAE, PDE, and Confidence Score)#
Boltz provides AlphaFold3-style confidence metrics: pLDDT, PAE, PDE, and a set of summary confidence scores.
- pLDDT (predicted Local Distance Difference Test): A per-token confidence score indicating how reliably each position's coordinates are predicted. Boltz reports it on a 0.0–1.0 scale (some tools use 0–100). Tokens cover protein residues and ligand atoms, which is why this example yields 794 values (2 × 384 residues plus 26 SAH heavy atoms).
- PAE (Predicted Aligned Error): An N × N matrix estimating the expected position error between pairs of tokens, useful for assessing relative positions (e.g., domains or chains).
- PDE (Predicted Distance Error): An N × N matrix estimating the expected error in the predicted distance between each pair of tokens.
- Overall confidence score: An aggregate used to rank predictions, computed as 0.8 × complex_plddt + 0.2 × iptm (ptm replaces iptm for single-chain inputs). It is returned together with pTM, ipTM, ligand and protein ipTM, complex-level pLDDT and PDE, and per-chain and chain-pair scores.
By combining pLDDT with pairwise PAE and PDE, and optionally a summary confidence score, users can evaluate confidence both at the residue level and in global or interface contexts.
Note that the NumPy arrays have a leading dimension equal to the number of models in the structure. This is controlled by the diffusion_samples parameter, which defaults to 1.
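To read these arrays per chain, it helps to know which positions along the token axis belong to which chain. The sketch below assumes that tokens are laid out chain by chain in submission order, with one token per protein residue and one per ligand heavy atom. That assumption is consistent with the 794-long axis in this example (384 + 384 + 26) but is not confirmed by the API documentation. `chain_token_slices` is a hypothetical helper, and the pLDDT values are synthetic.

```python
def chain_token_slices(token_counts):
    """Return {chain_id: slice} for chains laid out consecutively along the token axis."""
    slices = {}
    start = 0
    for chain_id, count in token_counts.items():
        slices[chain_id] = slice(start, start + count)
        start += count
    return slices


# 384 residues per protein chain and 26 heavy atoms for SAH (assumed token counts).
slices = chain_token_slices({"A": 384, "B": 384, "C": 26})
print(slices["C"])  # slice(768, 794, None)

# Synthetic per-token scores standing in for one model's pLDDT row.
plddt = [0.5] * 384 + [0.9] * 384 + [0.8] * 26
mean_ligand = sum(plddt[slices["C"]]) / len(plddt[slices["C"]])
print(f"mean ligand pLDDT: {mean_ligand:.2f}")
```

The same slice objects index NumPy arrays directly, for example `pae_matrix[0, slices["A"], slices["C"]]` for the protein A to ligand block of the first model.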
[8]:
# Retrieve the pLDDT scores
plddt_scores = fold_job.get_plddt()[0] # note that we are indexing into the first one
print("pLDDT scores shape:", plddt_scores.shape)
print("First 10 scores:", plddt_scores[0, :10])
# Retrieve the PAE matrix
pae_matrix = fold_job.get_pae()[0]
print("\nPAE matrix shape:", pae_matrix.shape)
# Retrieve the PDE matrix
pde_matrix = fold_job.get_pde()[0]
print("\nPDE matrix shape:", pde_matrix.shape)
# Retrieve the confidence scores
import json
confidence_scores = fold_job.get_confidence()[0]
print("\nConfidence scores:", json.dumps(confidence_scores[0].model_dump(), indent=2))
pLDDT scores shape: (1, 794)
First 10 scores: [0.5299579 0.56342727 0.6073508 0.6089854 0.6477817 0.6406661
0.6705705 0.75753546 0.86440575 0.9493773 ]
PAE matrix shape: (1, 794, 794)
PDE matrix shape: (1, 794, 794)
Confidence scores: {
"confidence_score": 0.9335882067680359,
"ptm": 0.9233882427215576,
"iptm": 0.9229851961135864,
"ligand_iptm": 0.9664735794067383,
"protein_iptm": 0.9221698641777039,
"complex_plddt": 0.9362390041351318,
"complex_iplddt": 0.9502238035202026,
"complex_pde": 0.5904649496078491,
"complex_ipde": 1.6942747831344604,
"chains_ptm": {
"0": 0.9499543309211731,
"1": 0.9469538331031799,
"2": 0.9903270602226257
},
"pair_chains_iptm": {
"0": {
"0": 0.9499543309211731,
"1": 0.9221698641777039,
"2": 0.8560048937797546
},
"1": {
"0": 0.9177331924438477,
"1": 0.9469538331031799,
"2": 0.7558805346488953
},
"2": {
"0": 0.9664735794067383,
"1": 0.9350272417068481,
"2": 0.9903270602226257
}
}
}
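According to the Boltz documentation, the aggregate confidence_score is 0.8 × complex_plddt + 0.2 × iptm (with ptm used for single-chain inputs). As a quick check, the sketch below recomputes it from the values printed above. The small tolerance allows for single-precision rounding in the reported numbers.

```python
import math

# Values copied from the confidence output above.
complex_plddt = 0.9362390041351318
iptm = 0.9229851961135864
reported = 0.9335882067680359

recomputed = 0.8 * complex_plddt + 0.2 * iptm
print(f"recomputed: {recomputed:.6f}")
print(f"reported:   {reported:.6f}")

# The two agree to within float32 rounding error.
matches = math.isclose(recomputed, reported, abs_tol=1e-6)
print("match:", matches)
```

Because pLDDT carries 80% of the weight, a high confidence_score can coexist with a weaker interface, so it is worth reading iptm and the chain-pair scores alongside it.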
Getting Predicted Binding Affinity#
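Since we requested it, we can now retrieve the predicted binding affinity. The result is a BoltzAffinity object containing detailed predictions. In Boltz-2, affinity_pred_value is reported as log10(IC50) with the IC50 in μM, so lower values indicate stronger predicted binding, and affinity_probability_binary is the predicted probability that the ligand is a binder.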
[9]:
# Retrieve the affinity prediction
affinity_data = fold_job.get_affinity()[0]
print("Affinity for ligand 'C':")
print(f"  predicted: {affinity_data.affinity_pred_value}")
print(f"  probability: {affinity_data.affinity_probability_binary}")
# The numbered fields come from the two models of the Boltz-2 affinity ensemble
# (they are not per-chain values); the unnumbered fields above are their averages.
print("  per ensemble model:")
print("    model 1:")
print(f"      predicted: {affinity_data.affinity_pred_value1}")
print(f"      probability: {affinity_data.affinity_probability_binary1}")
print("    model 2:")
print(f"      predicted: {affinity_data.affinity_pred_value2}")
print(f"      probability: {affinity_data.affinity_probability_binary2}")
Affinity for ligand 'C':
  predicted: -1.8190799951553345
  probability: 0.9927991628646851
  per ensemble model:
    model 1:
      predicted: -2.1024317741394043
      probability: 0.9960522055625916
    model 2:
      predicted: -1.5357282161712646
      probability: 0.9895461201667786
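To put the predicted value in more familiar units, the sketch below converts it to an IC50 in nM and a pIC50. It assumes, following the Boltz-2 documentation, that affinity_pred_value is log10(IC50) with the IC50 expressed in μM. The helper `describe_affinity` is hypothetical. The last lines also check that the two numbered values printed above average to the headline value.

```python
def describe_affinity(log10_ic50_um):
    """Convert log10(IC50 in uM) into IC50 in nM and pIC50 (IC50 in M)."""
    ic50_um = 10 ** log10_ic50_um
    return {
        "ic50_nM": ic50_um * 1000.0,
        # pIC50 = -log10(IC50 in M) = 6 - log10(IC50 in uM)
        "pIC50": 6.0 - log10_ic50_um,
    }


# Values copied from the affinity output above.
pred = -1.8190799951553345
pred1 = -2.1024317741394043
pred2 = -1.5357282161712646

summary = describe_affinity(pred)
print(f"IC50  ~ {summary['ic50_nM']:.1f} nM")
print(f"pIC50 ~ {summary['pIC50']:.2f}")

# The headline value is the mean of the two numbered values.
mean_of_numbered = (pred1 + pred2) / 2
print(f"mean of numbered values: {mean_of_numbered:.6f}")
```

These are model predictions on a log scale, so small differences between candidates should be read as a ranking signal rather than as measured potencies.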
Next Steps#
You can examine the predicted structure, or work on binder design with BoltzGen on our platform. You can save your predicted structure like so:
[10]:
with open("boltz_prediction.cif", "w") as f:
    f.write(structure.to_string(format="cif"))