openprotein.fold
Create PDB structures of your protein sequences with our folding models!
Note that for the Boltz and AlphaFold2 models, you will also need to use our align workflow to create MSAs.
Interface
- class openprotein.fold.FoldAPI(session)
The Fold API provides a high-level interface for making protein structure predictions.
- boltz2: Boltz2Model
Boltz-2 model
- boltz_2: Boltz2Model
Alias of boltz2.
- boltz1x: Boltz1xModel
Boltz-1x model
- boltz_1x: Boltz1xModel
Alias of boltz1x.
- boltz1: Boltz1Model
Boltz-1 model
- boltz_1: Boltz1Model
Alias of boltz1.
- af2: AlphaFold2Model
AlphaFold-2 model
- alphafold2: AlphaFold2Model
Alias of af2.
- rf3: RosettaFold3Model
RosettaFold-3 model
- rosettafold_3: RosettaFold3Model
Alias of rf3.
- esmfold: ESMFoldModel
ESMFold model
- minifold: MiniFoldModel
MiniFold model
- get_model(model_id)
Get a model by model_id.
The returned FoldModel supports the usual job operations, such as making POST and GET requests for this specific model.
- Parameters:
model_id (str) – the model identifier
- Returns:
The model
- Return type:
FoldModel
- Raises:
HTTPError – If the GET request does not succeed.
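The attribute list above exposes most models under two spellings (for example boltz2 and boltz_2). If you pick a model from a configuration string, a small lookup table restating that pairing can normalize the name before you access the attribute. The `FOLD_MODEL_ALIASES` table and `canonical_model_attr` helper below are hypothetical conveniences, not part of openprotein.

```python
# Hypothetical helper, not part of openprotein: maps every attribute name
# listed on FoldAPI to the canonical attribute for that model.
FOLD_MODEL_ALIASES: dict[str, str] = {
    "boltz2": "boltz2",
    "boltz_2": "boltz2",
    "boltz1x": "boltz1x",
    "boltz_1x": "boltz1x",
    "boltz1": "boltz1",
    "boltz_1": "boltz1",
    "af2": "af2",
    "alphafold2": "af2",
    "rf3": "rf3",
    "rosettafold_3": "rf3",
    "esmfold": "esmfold",
    "minifold": "minifold",
}


def canonical_model_attr(name: str) -> str:
    """Return the canonical FoldAPI attribute name for a model alias."""
    # Accept "Boltz-2", "boltz_2", "BOLTZ2", ... by lowercasing and
    # turning hyphens into underscores before the lookup.
    key = name.strip().lower().replace("-", "_")
    try:
        return FOLD_MODEL_ALIASES[key]
    except KeyError:
        raise ValueError(f"unknown fold model: {name!r}") from None
```

For example, assuming `fold_api` is a FoldAPI instance, `getattr(fold_api, canonical_model_attr("Boltz-2"))` would return the same object as `fold_api.boltz2`.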
Models
- class openprotein.fold.Boltz2Model(session, model_id, metadata=None)
Class providing inference endpoints for the Boltz-2 structure prediction model, which jointly models complex structures and binding affinities.
- fold(sequences, diffusion_samples=1, num_recycles=3, num_steps=200, step_scale=1.638, use_potentials=False, constraints=None, templates=None, properties=None, method=None)
Request structure prediction with Boltz-2 model.
- Parameters:
sequences (Sequence[Complex | Protein | str | bytes] | MSAFuture) – Protein sequences or complexes to include in the folded output. Protein objects must be tagged with an MSA; use Protein.single_sequence_mode to run in single-sequence mode. Alternatively, supply an MSAFuture to fold all of its query sequences together as a multimer.
diffusion_samples (int) – Number of diffusion samples to generate.
num_recycles (int) – Number of recycling steps to use.
num_steps (int) – Number of sampling steps to use.
step_scale (float) – Scaling factor for diffusion steps.
use_potentials (bool, default False) – Whether to use inference potentials.
constraints (list[dict] | None, default None) – List of constraints.
templates (list[dict] | None, default None) – List of templates to use for structure prediction.
properties (list[dict] | None, default None) – List of additional properties to predict. Each entry should match the BoltzProperties schema.
method (str | None) – The experimental method or supervision source to condition the prediction on. Defaults to None. Supported values (case-insensitive) are: ‘MD’, ‘X-RAY DIFFRACTION’, ‘ELECTRON MICROSCOPY’, ‘SOLUTION NMR’, ‘SOLID-STATE NMR’, ‘NEUTRON DIFFRACTION’, ‘ELECTRON CRYSTALLOGRAPHY’, ‘FIBER DIFFRACTION’, ‘POWDER DIFFRACTION’, ‘INFRARED SPECTROSCOPY’, ‘FLUORESCENCE TRANSFER’, ‘EPR’, ‘THEORETICAL MODEL’, ‘SOLUTION SCATTERING’, ‘OTHER’, ‘AFDB’, ‘BOLTZ-1’. See the upstream Boltz documentation for details.
- Returns:
Future for the folding result.
- Return type:
FoldResultFuture
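The constraints and properties parameters take lists of dicts whose keys this page does not spell out. As a sketch, assuming they mirror the upstream Boltz YAML schema (a `pocket` constraint with `binder`, `contacts` and `max_distance`, and an `affinity` property with `binder`), the hypothetical helper below builds them and validates `method` case-insensitively against the values listed above. Check the schema against the Boltz documentation and the BoltzProperties type before relying on it.

```python
from typing import Any, Optional

# Values documented for the `method` parameter of Boltz2Model.fold.
SUPPORTED_METHODS = frozenset({
    "MD", "X-RAY DIFFRACTION", "ELECTRON MICROSCOPY", "SOLUTION NMR",
    "SOLID-STATE NMR", "NEUTRON DIFFRACTION", "ELECTRON CRYSTALLOGRAPHY",
    "FIBER DIFFRACTION", "POWDER DIFFRACTION", "INFRARED SPECTROSCOPY",
    "FLUORESCENCE TRANSFER", "EPR", "THEORETICAL MODEL",
    "SOLUTION SCATTERING", "OTHER", "AFDB", "BOLTZ-1",
})


def normalize_method(method: Optional[str]) -> Optional[str]:
    """Validate `method` case-insensitively; None passes through unchanged."""
    if method is None:
        return None
    upper = method.strip().upper()
    if upper not in SUPPORTED_METHODS:
        raise ValueError(f"unsupported method: {method!r}")
    return upper


def build_boltz2_kwargs(
    binder_chain: str,
    pocket_contacts: list[tuple[str, int]],
    max_distance: float = 6.0,  # illustrative value in angstroms
    method: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build keyword arguments for Boltz2Model.fold.

    ASSUMPTION: the dict keys follow the upstream Boltz YAML schema
    ("pocket", "affinity", ...); openprotein may expect a different shape.
    """
    constraints = [{
        "pocket": {
            "binder": binder_chain,
            # Each contact is [chain_id, residue_index].
            "contacts": [[chain, res] for chain, res in pocket_contacts],
            "max_distance": max_distance,
        }
    }]
    # Ask for a binding-affinity prediction for the same binder chain.
    properties = [{"affinity": {"binder": binder_chain}}]
    return {
        "constraints": constraints,
        "properties": properties,
        "method": normalize_method(method),
    }
```

Usage would look like `model.fold(sequences, **build_boltz2_kwargs("B", [("A", 42)], method="x-ray diffraction"))`, where `model` is a Boltz2Model and chain B is assumed to be the ligand.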
- class openprotein.fold.Boltz1xModel(session, model_id, metadata=None)
Class providing inference endpoints for the open-source Boltz-1x structure prediction model, which adds inference potentials to Boltz-1 to improve the physical plausibility of predicted structures.
- fold(sequences, diffusion_samples=1, num_recycles=3, num_steps=200, step_scale=1.638, constraints=None)
Request structure prediction with the Boltz-1x model, i.e. the Boltz-1 model with inference potentials applied.
- Parameters:
sequences (Sequence[Complex | Protein | str | bytes] | MSAFuture) – Protein sequences or complexes to include in the folded output. Protein objects must be tagged with an MSA; use Protein.single_sequence_mode to run in single-sequence mode. Alternatively, supply an MSAFuture to fold all of its query sequences together as a multimer.
diffusion_samples (int) – Number of diffusion samples to generate.
num_recycles (int) – Number of recycling steps to use.
num_steps (int) – Number of sampling steps to use.
step_scale (float) – Scaling factor for diffusion steps.
constraints (Optional[List[dict]]) – List of constraints.
- Returns:
Future for the folding complex result.
- Return type:
FoldResultFuture
- class openprotein.fold.Boltz1Model(session, model_id, metadata=None)
Class providing inference endpoints for the open-source Boltz-1 structure prediction model.
- fold(sequences, diffusion_samples=1, num_recycles=3, num_steps=200, step_scale=1.638, use_potentials=False, constraints=None)
Request structure prediction with the Boltz-1 model.
- Parameters:
sequences (Sequence[Complex | Protein | str | bytes] | MSAFuture) – Protein sequences or complexes to include in the folded output. Protein objects must be tagged with an MSA; use Protein.single_sequence_mode to run in single-sequence mode. Alternatively, supply an MSAFuture to fold all of its query sequences together as a multimer.
diffusion_samples (int) – Number of diffusion samples to generate.
num_recycles (int) – Number of recycling steps to use.
num_steps (int) – Number of sampling steps to use.
step_scale (float) – Scaling factor for diffusion steps.
use_potentials (bool, default False) – Whether to use inference potentials.
constraints (Optional[List[dict]]) – List of constraints.
- Returns:
Future for the folding complex result.
- Return type:
FoldResultFuture
- class openprotein.fold.AlphaFold2Model(session, model_id, metadata=None)
Class providing inference endpoints for AlphaFold2 structure prediction models, based on the ColabFold implementation.
- fold(sequences=None, num_recycles=None, num_models=1, num_relax=0, **kwargs)
Request structure prediction with the AlphaFold2 model.
- Parameters:
sequences (List[Complex | Protein | str] | MSAFuture) – Protein sequences or complexes to include in the folded output. Protein objects must be tagged with an MSA; use Protein.single_sequence_mode to run in single-sequence mode. Alternatively, supply an MSAFuture to fold all of its query sequences together as a multimer.
num_recycles (int) – Number of recycling iterations.
num_models (int) – Number of AlphaFold2 models to run; the best-ranked prediction is used.
num_relax (int) – Maximum number of iterations for relaxation.
- Returns:
Future for the folding result.
- Return type:
FoldResultFuture
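With num_models greater than 1, several AlphaFold2 models run and the best prediction is kept. The sketch below shows one way to make that choice, assuming the service follows ColabFold's default ranking: mean pLDDT for monomers and 0.8 * ipTM + 0.2 * pTM for multimers. The `ranking_confidence` and `best_model_index` helpers and the per-prediction dict layout are hypothetical; the service performs its own selection, which may differ.

```python
from statistics import fmean
from typing import Optional, Sequence


def ranking_confidence(plddt: Sequence[float],
                       ptm: Optional[float] = None,
                       iptm: Optional[float] = None) -> float:
    """Score one prediction in the style of AlphaFold2/ColabFold ranking.

    Multimers (ptm and iptm given): 0.8 * ipTM + 0.2 * pTM.
    Monomers: mean per-residue pLDDT.
    """
    if iptm is not None and ptm is not None:
        return 0.8 * iptm + 0.2 * ptm
    return fmean(plddt)


def best_model_index(predictions: Sequence[dict]) -> int:
    """Return the index of the highest-ranked prediction.

    Each dict is assumed (hypothetically) to hold "plddt" and, for
    multimers, "ptm" and "iptm".
    """
    scores = [ranking_confidence(p["plddt"], p.get("ptm"), p.get("iptm"))
              for p in predictions]
    return max(range(len(scores)), key=scores.__getitem__)
```

Ranking multimers by interface confidence rather than mean pLDDT avoids favoring predictions whose chains are individually confident but poorly placed relative to each other.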
- class openprotein.fold.ESMFoldModel(session, model_id, metadata=None)
Class providing inference endpoints for Facebook’s ESMFold structure prediction model.
- model_id: str = 'esmfold'
Results
- class openprotein.fold.FoldResultFuture(session, job=None, metadata=None, sequences=None, complexes=None, max_workers=10)
Fold results represented as a future.
- job
The fold job associated with this future.
- Type:
FoldJob
- classmethod create(session, job=None, metadata=None, **kwargs)
Factory method to create a FoldResultFuture.
- Parameters:
session (APISession) – The API session to use for requests.
job (FoldJob) – The fold job associated with this future.
**kwargs – Additional keyword arguments.
- Returns:
An instance of FoldResultFuture.
- Return type:
FoldResultFuture
- property sequences: list[bytes]
Get the sequences submitted for the fold request.
- Returns:
List of sequences.
- Return type:
list[bytes]
- property complexes: list[Complex]
Get the molecular complexes submitted for the fold request.
- Returns:
List of complexes.
- Return type:
list[Complex]
- property id
Get the ID of the fold request.
- Returns:
Fold job ID.
- Return type:
str
- property metadata: FoldMetadata
The fold metadata.
- property model_id: str
The fold model used.
- get_item(index: int, key: None = None) Structure
- get_item(index: int, key: Literal['pae', 'pde', 'plddt', 'ptm'] | None = None) ndarray
- get_item(index: int, key: Literal['affinity']) BoltzAffinity
- get_item(index: int, key: Literal['confidence']) list[BoltzConfidence]
- get_item(index: int, key: Literal['score', 'metrics'] | None = None) DataFrame
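Get fold results for the sequence or complex at a given index.
- Parameters:
index (int) – Position of the submitted sequence or complex to fetch results for.
key (str | None) – Which result to fetch. None (the default) returns the folded structure; the other keys return the data types shown in the overloads above.
- Returns:
The folded Structure when key is None; otherwise the requested result.
- Return type:
Structure | ndarray | BoltzAffinity | list[BoltzConfidence] | DataFrame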
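The overloads listed for get_item (and likewise for stream and get) let a static type checker infer the return type from the literal value of key. The toy class below reproduces that pattern with typing.overload and typing.Literal on hard-coded data; it illustrates the typing technique only and is not openprotein's implementation.

```python
from typing import Literal, Optional, Union, overload


class ToyResults:
    """Toy stand-in for FoldResultFuture; data is hard-coded, not fetched."""

    def __init__(self) -> None:
        # index -> {key: result}; the None key plays the role of "structure".
        self._data = {0: {None: "structure-0", "plddt": [91.0, 88.5]}}

    @overload
    def get_item(self, index: int, key: None = None) -> str: ...

    @overload
    def get_item(self, index: int, key: Literal["plddt"]) -> list[float]: ...

    def get_item(self, index: int,
                 key: Optional[str] = None) -> Union[str, list[float]]:
        # A type checker chooses an overload from the literal value of `key`;
        # at runtime only this implementation is executed.
        return self._data[index][key]
```

With these overloads, a checker such as mypy types `ToyResults().get_item(0)` as `str` and `ToyResults().get_item(0, "plddt")` as `list[float]`, mirroring how get_item(0) maps to Structure and get_item(0, "plddt") to ndarray above.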
- property args: dict[str, Any]
The registered job arguments.
- cancelled()
Check if the job has been cancelled.
- Returns:
True if the job is cancelled, False otherwise.
- Return type:
bool
- property created_date: datetime
The creation timestamp of the job.
- done()
Check if the job has completed.
- Returns:
True if the job is done, False otherwise.
- Return type:
bool
- property end_date: datetime | None
The end timestamp of the job.
- property job_id: str
The unique identifier of the job.
- property job_type: str
The type of the job.
- property progress_counter: int
The progress counter of the job.
- refresh()
Refresh the job status and internal job object.
- property start_date: datetime | None
The start timestamp of the job.
- property status: JobStatus
The current status of the job.
- stream(key: None = None) Iterator[Structure]
- stream(key: Literal['pae', 'pde', 'plddt', 'ptm'] | None = None) Iterator[ndarray]
- stream(key: Literal['affinity']) Iterator[BoltzAffinity]
- stream(key: Literal['confidence']) Iterator[list[BoltzConfidence]]
- stream(key: Literal['score', 'metrics'] | None = None) Iterator[DataFrame]
Retrieve results for this job as a stream.
- Returns:
An iterator over the job's results; the item type depends on key, as shown in the overloads above.
- Return type:
Iterator
- wait(interval=5, timeout=None, verbose=False)
Wait for the job to complete, then fetch results.
- Parameters:
interval (int, optional) – Time in seconds between polling. Defaults to config.POLLING_INTERVAL.
timeout (int | None, optional) – Maximum time in seconds to wait. Defaults to None.
verbose (bool, optional) – Verbosity flag. Defaults to False.
- Returns:
The results of the job.
- Return type:
Any
- wait_until_done(interval=5, timeout=None, verbose=False)
Wait for the job to complete.
- Parameters:
interval (float, optional) – Time in seconds between polling. Defaults to config.POLLING_INTERVAL.
timeout (int, optional) – Maximum time in seconds to wait. Defaults to None.
verbose (bool, optional) – Verbosity flag. Defaults to False.
- Returns:
True if the job completed successfully.
- Return type:
bool
Notes
This method does not fetch the job results, unlike wait().
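wait_until_done() polls the job every interval seconds until it finishes or timeout elapses. A minimal sketch of that loop, written against a generic `done()` callable rather than a real job, is below. The `wait_until` helper is hypothetical, and raising TimeoutError on expiry is an assumption that may not match the library's behavior.

```python
import time
from typing import Callable, Optional


def wait_until(done: Callable[[], bool],
               interval: float = 5.0,
               timeout: Optional[float] = None) -> bool:
    """Poll `done()` every `interval` seconds until it returns True.

    Returns True on completion. Raises TimeoutError if `timeout` seconds
    pass first (an assumption; the real wait_until_done may differ).
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while not done():
        if deadline is None:
            time.sleep(interval)
            continue
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"job not done after {timeout} s")
        # Never sleep past the deadline.
        time.sleep(min(interval, remaining))
    return True
```

Against the real API, the equivalent is `future.wait_until_done(interval=5, timeout=600)` followed by `future.get()`, since wait_until_done() does not fetch the results itself.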
- get(verbose: bool = False, key: None = None) list[Structure]
- get(verbose: bool = False, key: Literal['pae', 'pde', 'plddt', 'ptm'] | None = None) list[ndarray]
- get(verbose: bool = False, key: Literal['affinity'] | None = None) list[BoltzAffinity]
- get(verbose: bool = False, key: Literal['confidence'] | None = None) list[list[BoltzConfidence]]
- get(verbose: bool = False, key: Literal['score', 'metrics'] | None = None) list[DataFrame]
Return all results from the job by consuming the stream.
- Parameters:
verbose (bool, optional) – If True, display a progress bar. Defaults to False.
key (str | None, optional) – Which result to fetch; passed to stream(). Defaults to None, which returns structures.
- Returns:
A list containing all results from the job.
- Return type:
list
- get_pae()
Get the Predicted Aligned Error (PAE) matrix for each output.
- Returns:
PAE matrices, one per output.
- Return type:
list[np.ndarray]
- Raises:
AttributeError – If PAE is not supported for the model.
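A common way to summarize a PAE matrix for a two-chain complex is the mean error across the inter-chain blocks: lower values suggest a more confident relative placement of the chains. The hypothetical helper below computes it for one matrix from get_pae(), assuming the first `split` residues belong to the first chain. Averaging both off-diagonal blocks makes the result independent of which axis holds the aligned residue.

```python
from statistics import fmean
from typing import Sequence


def mean_interchain_pae(pae: Sequence[Sequence[float]], split: int) -> float:
    """Mean PAE over the two off-diagonal (chain A x chain B) blocks.

    Residues [0, split) are assumed to be chain A and [split, n) chain B.
    Works on nested lists or a 2-D NumPy array.
    """
    n = len(pae)
    if not 0 < split < n:
        raise ValueError("split must fall strictly inside the matrix")
    # Block A->B (rows of chain A, columns of chain B) ...
    cross = [pae[i][j] for i in range(split) for j in range(split, n)]
    # ... plus block B->A, so the result is symmetric in the two axes.
    cross += [pae[i][j] for i in range(split, n) for j in range(split)]
    return fmean(cross)
```

For complexes with more than two chains, the same idea extends to averaging every block whose row and column belong to different chains.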
- get_pde()
Get the Predicted Distance Error (PDE) matrix for each output.
- Returns:
PDE matrices, one per output.
- Return type:
list[np.ndarray]
- Raises:
AttributeError – If PDE is not supported for the model.
- get_plddt()
Get the predicted Local Distance Difference Test (pLDDT) scores for each output.
- Returns:
pLDDT scores, one array per output.
- Return type:
list[np.ndarray]
- Raises:
AttributeError – If pLDDT is not supported for the model.
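pLDDT is usually read through the confidence bands used by the AlphaFold Protein Structure Database: very high (above 90), confident (70 to 90), low (50 to 70) and very low (below 50). The sketch below counts residues per band for one array from get_plddt(). It assumes a 0-100 scale; some models may report pLDDT on a 0-1 scale, in which case pass scale=100. Assigning boundary values to the lower band is a choice made here, not a standard.

```python
from collections import Counter
from typing import Iterable


def plddt_band(score: float) -> str:
    """Map a per-residue pLDDT (0-100 scale) to an AlphaFold DB-style band."""
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


def summarize_plddt(scores: Iterable[float], scale: float = 1.0) -> Counter:
    """Count residues per confidence band.

    `scale` rescales inputs first, e.g. scale=100 for 0-1 scores.
    Accepts any iterable of numbers, including a 1-D NumPy array.
    """
    return Counter(plddt_band(float(s) * scale) for s in scores)
```

A large "very low" count often marks disordered regions rather than a failed prediction, so it is worth checking where along the sequence those residues fall.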
- get_ptm()
Get the predicted TM-score (pTM) for each output.
- Returns:
pTM scores.
- Return type:
list[np.ndarray]
- Raises:
AttributeError – If pTM is not supported for the model.
- get_score()
Get the predicted scores for each output.
- Returns:
Structure prediction scores.
- Return type:
list[pd.DataFrame]
- Raises:
AttributeError – If scores are not supported for the model.
- get_metrics()
Get the predicted metrics for each output.
- Returns:
Structure prediction metrics.
- Return type:
list[pd.DataFrame]
- Raises:
AttributeError – If metrics are not supported for the model.