openprotein.models#
Unified access to the models on the OpenProtein AI platform. Use them directly when you need lower-level control to build your own workflows.
Note that the Models API is a work in progress; we are bringing all models under this interface to provide a consistent and simple developer experience.
Interface#
Models#
RFdiffusion#
RFdiffusion is a diffusion model that can be used for de novo structure design and binder design. It can be used with our Query interface to define structure prediction objectives in a unified manner. It also accepts the contigs defined in the official RFdiffusion repository.
- class openprotein.models.RFdiffusionModel(session, model_id='rfdiffusion')[source]#
RFdiffusion model for generating de novo protein structures.
This model supports functionalities like unconditional design, scaffolding, and binder design.
- get_metadata()[source]#
Get model metadata for this model.
- Returns:
The metadata associated with this model.
- Return type:
ModelMetadata
- generate(query=None, contigs=None, structure_file=None, N=1, inpaint_seq=None, provide_seq=None, hotspot=None, T=None, partial_T=None, use_active_site_model=None, use_beta_model=None, symmetry=None, order=None, add_potential=None, scaffold_target_structure_file=None, scaffold_target_use_struct=False, **kwargs)[source]#
Run a protein structure generation job using RFdiffusion.
- Parameters:
query (str or bytes or Protein or Complex or Query, optional) – A query representing the design specification. Use either query or contigs for default design, or provide scaffold_target_structure_file for scaffold-guided design. query provides a unified way to represent design specifications on the OpenProtein platform. Here, the masked structure regions of the proteins in the Complex are the regions to be designed. Other parameters, such as binding, are passed to RFdiffusion as hotspots.
contigs (int, str, optional) – Defines the lengths and connectivity of chain segments for the desired structure, specified in RFdiffusion’s contig string format. Required for most design tasks. Example: 150, ‘10-20/A100-110/10-20’ for a binder design.
structure_file (BinaryIO, optional) – An input PDB file (as a file-like object) used for inpainting or other guided design tasks where parts of an existing structure are provided.
N (int, optional) – The number of unique design trajectories to run (default is 1).
inpaint_seq (str, optional) – A string specifying the regions in the input structure to mask for in-painting. Example: ‘A1-A10/A30-40’.
provide_seq (str, optional) – A string specifying which segments of the contig have a provided sequence. Example: ‘A1-A10/A30-40’.
hotspot (str, optional) – A string specifying hotspot residues to constrain during design, typically for functional sites. Example: ‘A10,A12,A14’.
T (int, optional) – The number of timesteps for the diffusion process.
partial_T (int, optional) – The number of timesteps for partial diffusion.
use_active_site_model (bool, optional) – If True, uses the active site model checkpoint, which has been finetuned to better keep very small motifs in place in the output for motif scaffolding (default is False).
use_beta_model (bool, optional) – If True, uses the complex beta model checkpoint, which generates a greater diversity of topologies but has not been extensively experimentally validated (default is False).
symmetry ({"cyclic", "dihedral", "tetrahedral"}, optional) – The type of symmetry to apply to the design.
order (int, optional) – The order of the symmetry (e.g., 3 for C3 or D3 symmetry). Must be provided if symmetry is set.
add_potential (bool, optional) – A flag to toggle an additional potential to guide the design. Defaults to True for symmetric designs.
scaffold_target_structure_file (str, bytes, BinaryIO, optional) – A PDB file (as a text string, bytes, or a file-like object) containing a scaffold structure to be used as a structural guide. It can also be used as a target for scaffold-guided binder design when combined with scaffold_target_use_struct.
scaffold_target_use_struct (bool, optional) – Whether to use the provided scaffold structure as a target. If False, it is used only as a topology guide.
**kwargs (dict) – Additional keyword arguments that are passed directly to the RFdiffusion inference script. These override any preceding options.
- Returns:
A future object that can be used to retrieve the results of the design job upon completion.
- Return type:
RFdiffusionFuture
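The contig string passed as contigs can be easier to reason about once it is split into segments. The following is a minimal sketch and not part of the openprotein client: `parse_contig` and `ContigSegment` are hypothetical names. It assumes the convention from the RFdiffusion repository that a segment with a leading chain letter refers to residues kept from the input structure, while a bare number or range gives the length of a newly designed segment. Chain breaks and other extended syntax are not handled.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class ContigSegment:
    """One '/'-separated piece of a contig string (hypothetical helper type)."""

    chain: Optional[str]  # chain ID for a segment taken from the input, else None
    low: int  # first residue number, or minimum designed length
    high: int  # last residue number, or maximum designed length


# Optional chain letter, a number, and an optional "-number" upper bound.
_SEGMENT = re.compile(r"^([A-Za-z]?)(\d+)(?:-(\d+))?$")


def parse_contig(contig: Union[int, str]) -> List[ContigSegment]:
    """Split a contig such as '10-20/A100-110/10-20' (or 150) into segments."""
    segments = []
    for part in str(contig).split("/"):
        match = _SEGMENT.match(part.strip())
        if match is None:
            raise ValueError(f"Unrecognized contig segment: {part!r}")
        chain, low, high = match.groups()
        low_value = int(low)
        high_value = int(high) if high is not None else low_value
        if high_value < low_value:
            raise ValueError(f"Range is reversed in segment: {part!r}")
        segments.append(ContigSegment(chain or None, low_value, high_value))
    return segments


for segment in parse_contig("10-20/A100-110/10-20"):
    kind = "from input structure" if segment.chain else "designed"
    print(segment, kind)
```

Checking a contig this way before submitting a job can catch typos locally, although the service remains the authority on what it accepts.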
- predict(query=None, contigs=None, structure_file=None, N=1, inpaint_seq=None, provide_seq=None, hotspot=None, T=None, partial_T=None, use_active_site_model=None, use_beta_model=None, symmetry=None, order=None, add_potential=None, scaffold_target_structure_file=None, scaffold_target_use_struct=False, **kwargs)#
Run a protein structure generation job using RFdiffusion.
- Parameters:
query (str or bytes or Protein or Complex or Query, optional) – A query representing the design specification. Use either query or contigs for default design, or provide scaffold_target_structure_file for scaffold-guided design. query provides a unified way to represent design specifications on the OpenProtein platform. Here, the masked structure regions of the proteins in the Complex are the regions to be designed. Other parameters, such as binding, are passed to RFdiffusion as hotspots.
contigs (int, str, optional) – Defines the lengths and connectivity of chain segments for the desired structure, specified in RFdiffusion’s contig string format. Required for most design tasks. Example: 150, ‘10-20/A100-110/10-20’ for a binder design.
structure_file (BinaryIO, optional) – An input PDB file (as a file-like object) used for inpainting or other guided design tasks where parts of an existing structure are provided.
N (int, optional) – The number of unique design trajectories to run (default is 1).
inpaint_seq (str, optional) – A string specifying the regions in the input structure to mask for in-painting. Example: ‘A1-A10/A30-40’.
provide_seq (str, optional) – A string specifying which segments of the contig have a provided sequence. Example: ‘A1-A10/A30-40’.
hotspot (str, optional) – A string specifying hotspot residues to constrain during design, typically for functional sites. Example: ‘A10,A12,A14’.
T (int, optional) – The number of timesteps for the diffusion process.
partial_T (int, optional) – The number of timesteps for partial diffusion.
use_active_site_model (bool, optional) – If True, uses the active site model checkpoint, which has been finetuned to better keep very small motifs in place in the output for motif scaffolding (default is False).
use_beta_model (bool, optional) – If True, uses the complex beta model checkpoint, which generates a greater diversity of topologies but has not been extensively experimentally validated (default is False).
symmetry ({"cyclic", "dihedral", "tetrahedral"}, optional) – The type of symmetry to apply to the design.
order (int, optional) – The order of the symmetry (e.g., 3 for C3 or D3 symmetry). Must be provided if symmetry is set.
add_potential (bool, optional) – A flag to toggle an additional potential to guide the design. Defaults to True for symmetric designs.
scaffold_target_structure_file (str, bytes, BinaryIO, optional) – A PDB file (as a text string, bytes, or a file-like object) containing a scaffold structure to be used as a structural guide. It can also be used as a target for scaffold-guided binder design when combined with scaffold_target_use_struct.
scaffold_target_use_struct (bool, optional) – Whether to use the provided scaffold structure as a target. If False, it is used only as a topology guide.
**kwargs (dict) – Additional keyword arguments that are passed directly to the RFdiffusion inference script. These override any preceding options.
- Returns:
A future object that can be used to retrieve the results of the design job upon completion.
- Return type:
RFdiffusionFuture
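Because order must accompany symmetry, it can help to validate the pair before submitting a job. The helper below is a hypothetical client-side check, not something the openprotein package is known to provide; the set of allowed values is copied from the parameter description above, and rejecting an order without a symmetry is an assumption made for this sketch.

```python
from typing import Any, Dict, Optional

# Values listed in the documentation for the symmetry parameter.
ALLOWED_SYMMETRIES = {"cyclic", "dihedral", "tetrahedral"}


def symmetry_kwargs(
    symmetry: Optional[str] = None, order: Optional[int] = None
) -> Dict[str, Any]:
    """Return keyword arguments for a symmetric design, or {} for none."""
    if symmetry is None:
        # Assumption: an order without a symmetry type is a caller mistake.
        if order is not None:
            raise ValueError("order was given but symmetry is not set")
        return {}
    if symmetry not in ALLOWED_SYMMETRIES:
        raise ValueError(f"Unsupported symmetry: {symmetry!r}")
    if order is None:
        raise ValueError("order must be provided when symmetry is set")
    return {"symmetry": symmetry, "order": order}


# Keyword arguments for a C3-symmetric design.
print(symmetry_kwargs("cyclic", 3))
```

The returned dict could then be unpacked into a call such as `model.generate(contigs=..., **symmetry_kwargs("cyclic", 3))`.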
Results#
- class openprotein.models.RFdiffusionFuture(session, job, N=None, **kwargs)[source]#
Future for handling the results of an RFdiffusion job.
- get_item(replicate=0)[source]#
Retrieve the output Complex for a specific design.
- Parameters:
replicate (int) – The 0-based index of the design to retrieve.
- Returns:
The designed Complex.
- Return type:
Complex
- property args: dict[str, Any]#
The registered job arguments.
- cancelled()#
Check if the job has been cancelled.
- Returns:
True if the job is cancelled, False otherwise.
- Return type:
bool
- property created_date: datetime#
The creation timestamp of the job.
- done()#
Check if the job has completed.
- Returns:
True if the job is done, False otherwise.
- Return type:
bool
- property end_date: datetime | None#
The end timestamp of the job.
- get(**kwargs)#
Return all results from the job by consuming the stream.
- Parameters:
verbose (bool, optional) – If True, display a progress bar. Defaults to False.
**kwargs – Keyword arguments passed to the stream method.
- Returns:
A list containing all results from the job.
- Return type:
list
- property id: str#
The unique identifier of the job.
- property job_id: str#
The unique identifier of the job.
- property job_type: str#
The type of the job.
- property progress_counter: int#
The progress counter of the job.
- refresh()#
Refresh the job status and internal job object.
- property start_date: datetime | None#
The start timestamp of the job.
- property status: JobStatus#
The current status of the job.
- stream(**kwargs)#
Retrieve results for this job as a stream.
- Returns:
A generator that yields (key, value) tuples.
- Return type:
Generator
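Since stream() is described as a generator of (key, value) tuples and get() as consuming that stream into a list, the relationship between the two can be sketched with a stand-in generator. Everything here is illustrative: `fake_stream` replaces a real future's stream(), and the keys and values are placeholders, because the documentation does not say what they contain.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple


def fake_stream() -> Iterator[Tuple[int, str]]:
    """Stand-in for future.stream(); yields (key, value) pairs lazily."""
    for key, value in enumerate(["result-0", "result-1", "result-2"]):
        yield key, value


def collect(stream: Iterable[Tuple[Any, Any]]) -> List[Tuple[Any, Any]]:
    """Consume the whole stream into a list, similar in spirit to get()."""
    return list(stream)


def index_by_key(stream: Iterable[Tuple[Any, Any]]) -> Dict[Any, Any]:
    """Build a lookup table from the streamed pairs."""
    return dict(stream)


# Items can also be handled one at a time as they arrive.
for key, value in fake_stream():
    print(key, value)
```

Iterating over the stream directly avoids holding every result in memory at once, which may matter for jobs with many outputs.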
- wait(interval=5, timeout=None, verbose=False)#
Wait for the job to complete, then fetch results.
- Parameters:
interval (int, optional) – Time in seconds between polling. Defaults to config.POLLING_INTERVAL.
timeout (int | None, optional) – Maximum time in seconds to wait. Defaults to None.
verbose (bool, optional) – Verbosity flag. Defaults to False.
- Returns:
The results of the job.
- Return type:
Any
- wait_until_done(interval=5, timeout=None, verbose=False)#
Wait for the job to complete.
- Parameters:
interval (float, optional) – Time in seconds between polling. Defaults to config.POLLING_INTERVAL.
timeout (int, optional) – Maximum time in seconds to wait. Defaults to None.
verbose (bool, optional) – Verbosity flag. Defaults to False.
- Returns:
True if the job completed successfully.
- Return type:
bool
Notes
This method does not fetch the job results, unlike wait().
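The interval and timeout parameters of wait() and wait_until_done() describe a polling loop. The sketch below shows one way such a loop can be written; it is not the library's implementation. `poll_until_done` is a hypothetical name, and raising TimeoutError when the deadline passes is an assumption, since the documentation does not state what happens on timeout.

```python
import time
from typing import Callable, Optional


def poll_until_done(
    is_done: Callable[[], bool],
    interval: float = 5,
    timeout: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call is_done() every `interval` seconds until it returns True."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while not is_done():
        # Assumption: give up with an exception once the deadline has passed.
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)
    return True


# Demo with a fake job that reports completion on the third poll.
polls = iter([False, False, True])
finished = poll_until_done(lambda: next(polls), interval=0)
print(finished)
```

With a real future, `future.done` could be passed as `is_done`, although calling `future.wait_until_done()` directly is the simpler route.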
BoltzGen#
BoltzGen is a structure generation model that can be used for generating de novo structures along with nanobody scaffolds. It can be used with our Query interface to define structure prediction objectives in a unified manner. It also accepts a design_spec that follows the official BoltzGen design specification.
- class openprotein.models.BoltzGenModel(session, model_id='boltzgen')[source]#
BoltzGen model for generating de novo protein structures.
This model supports functionalities like unconditional design, scaffolding, and binder design.
- get_metadata()[source]#
Get model metadata for this model.
- Returns:
The metadata associated with this model.
- Return type:
ModelMetadata
- generate(query=None, design_spec=None, structure_file=None, N=1, diffusion_batch_size=None, step_scale=None, noise_scale=None, scaffolds=None, scaffold_set=None, extra_structure_files=None, **kwargs)[source]#
Run a protein structure generation job using BoltzGen.
- Parameters:
query (str or bytes or Protein or Complex or Query, optional) – A query representing the design specification. Either query or design_spec must be provided. query provides a unified way to represent design specifications on the OpenProtein platform. Here, the masked structure regions of the proteins in the Complex are the regions to be designed. Other parameters, such as binding, group, and secondary structure, are also passed through to BoltzGen.
design_spec (BoltzGenDesignSpec | dict[str, Any] | None, optional) – The BoltzGen design specification to run. Either query or design_spec must be provided. design_spec exposes a low-level interface to BoltzGen by accepting the YAML specification used in the official BoltzGen examples. It can be a typed BoltzGenDesignSpec object or a dict representing the BoltzGen YAML request specification. Note: if the design_spec includes file paths, provide those files using either scaffolds or extra_structure_files.
structure_file (str | bytes | BinaryIO | None, optional) – (Deprecated: use extra_structure_files) An input PDB/CIF file used for inpainting or other guided design tasks where parts of an existing structure are provided. This parameter provides the actual structure content that corresponds to any FileEntity path fields in the design_spec. It can be a file path (str) to read from, raw file content (bytes), or a file-like object (BinaryIO).
N (int, optional) – The number of unique design trajectories to run (default is 1).
diffusion_batch_size (int, optional) – The batch size for diffusion sampling. Controls how many samples are processed in parallel during the diffusion process.
step_scale (float, optional) – Scaling factor for the number of diffusion steps. Higher values may improve quality at the cost of longer generation time.
noise_scale (float, optional) – Scaling factor for the noise schedule during diffusion. Controls the amount of noise added at each step of the reverse diffusion process.
scaffolds (dict[str, str | bytes | BinaryIO] | None, optional) – Dictionary mapping scaffold filenames to their content. Each value can be a file path (str) to read from, raw file content (bytes), or a file-like object (BinaryIO). These files are packaged into a gzipped tar archive and made available to the design process under the ‘scaffolds/’ directory.
scaffold_set (Scaffolds | str | None, optional) – A pre-defined scaffold set object. Alternative to providing individual scaffold files via the scaffolds parameter.
extra_structure_files (dict[str, str | bytes | BinaryIO] | None, optional) – Dictionary mapping additional structure filenames to their content, with the same format options as scaffolds. These files will be packaged into the same archive under the ‘extra/’ directory and can be referenced in the design specification.
**kwargs (dict) – Additional keyword arguments that are passed directly to the BoltzGen inference script. These override any preceding options.
- Returns:
A future object that can be used to retrieve the results of the design job upon completion.
- Return type:
BoltzGenFuture
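The scaffolds and extra_structure_files parameters are described as being packaged into a single gzipped tar archive under ‘scaffolds/’ and ‘extra/’. The sketch below illustrates that layout using only the standard library. It is not the client's actual packaging code: `build_archive` and `list_archive` are hypothetical helpers, the file names and contents are placeholders, and only raw bytes values are handled (not file paths or file-like objects).

```python
import io
import tarfile
from typing import Dict, List


def build_archive(scaffolds: Dict[str, bytes], extra: Dict[str, bytes]) -> bytes:
    """Pack files into an in-memory .tar.gz with scaffolds/ and extra/ folders."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for folder, files in (("scaffolds", scaffolds), ("extra", extra)):
            for name, content in files.items():
                info = tarfile.TarInfo(name=f"{folder}/{name}")
                info.size = len(content)  # required so the member data is written
                archive.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


def list_archive(data: bytes) -> List[str]:
    """Return the member paths stored in a .tar.gz held in memory."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as archive:
        return archive.getnames()


payload = build_archive(
    scaffolds={"nanobody.cif": b"placeholder scaffold content"},
    extra={"target.pdb": b"placeholder target content"},
)
print(list_archive(payload))
```

A design_spec that refers to files by path would then need paths that match this layout; the exact form the service expects is not stated in this reference.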
- predict(query=None, design_spec=None, structure_file=None, N=1, diffusion_batch_size=None, step_scale=None, noise_scale=None, scaffolds=None, scaffold_set=None, extra_structure_files=None, **kwargs)#
Run a protein structure generation job using BoltzGen.
- Parameters:
query (str or bytes or Protein or Complex or Query, optional) – A query representing the design specification. Either query or design_spec must be provided. query provides a unified way to represent design specifications on the OpenProtein platform. Here, the masked structure regions of the proteins in the Complex are the regions to be designed. Other parameters, such as binding, group, and secondary structure, are also passed through to BoltzGen.
design_spec (BoltzGenDesignSpec | dict[str, Any] | None, optional) – The BoltzGen design specification to run. Either query or design_spec must be provided. design_spec exposes a low-level interface to BoltzGen by accepting the YAML specification used in the official BoltzGen examples. It can be a typed BoltzGenDesignSpec object or a dict representing the BoltzGen YAML request specification. Note: if the design_spec includes file paths, provide those files using either scaffolds or extra_structure_files.
structure_file (str | bytes | BinaryIO | None, optional) – (Deprecated: use extra_structure_files) An input PDB/CIF file used for inpainting or other guided design tasks where parts of an existing structure are provided. This parameter provides the actual structure content that corresponds to any FileEntity path fields in the design_spec. It can be a file path (str) to read from, raw file content (bytes), or a file-like object (BinaryIO).
N (int, optional) – The number of unique design trajectories to run (default is 1).
diffusion_batch_size (int, optional) – The batch size for diffusion sampling. Controls how many samples are processed in parallel during the diffusion process.
step_scale (float, optional) – Scaling factor for the number of diffusion steps. Higher values may improve quality at the cost of longer generation time.
noise_scale (float, optional) – Scaling factor for the noise schedule during diffusion. Controls the amount of noise added at each step of the reverse diffusion process.
scaffolds (dict[str, str | bytes | BinaryIO] | None, optional) – Dictionary mapping scaffold filenames to their content. Each value can be a file path (str) to read from, raw file content (bytes), or a file-like object (BinaryIO). These files are packaged into a gzipped tar archive and made available to the design process under the ‘scaffolds/’ directory.
scaffold_set (Scaffolds | str | None, optional) – A pre-defined scaffold set object. Alternative to providing individual scaffold files via the scaffolds parameter.
extra_structure_files (dict[str, str | bytes | BinaryIO] | None, optional) – Dictionary mapping additional structure filenames to their content, with the same format options as scaffolds. These files will be packaged into the same archive under the ‘extra/’ directory and can be referenced in the design specification.
**kwargs (dict) – Additional keyword arguments that are passed directly to the BoltzGen inference script. These override any preceding options.
- Returns:
A future object that can be used to retrieve the results of the design job upon completion.
- Return type:
BoltzGenFuture
Results#
- class openprotein.models.BoltzGenFuture(session, job, N=None, **kwargs)[source]#
Future for handling the results of a BoltzGen job.
- get_item(replicate=0)[source]#
Retrieve the output Complex for a specific design.
- Parameters:
replicate (int) – The 0-based index of the design to retrieve.
- Returns:
The designed Complex.
- Return type:
Complex
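Putting the documented pieces together, a design run amounts to calling generate(), waiting for completion, and reading each replicate with get_item(). The sketch below uses only method names and parameters that appear in this reference, but it runs against stub objects so it can be executed without credentials. `run_design`, `StubModel`, and `StubFuture` are hypothetical, and a real model would presumably be constructed with something like `BoltzGenModel(session)`.

```python
from typing import Any, List


def run_design(model: Any, n: int, **generate_kwargs: Any) -> List[Any]:
    """Submit a generation job, wait for it, and collect all n designs."""
    future = model.generate(N=n, **generate_kwargs)
    if not future.wait_until_done():
        raise RuntimeError("design job did not complete successfully")
    # Replicates are addressed by a 0-based index.
    return [future.get_item(replicate=i) for i in range(n)]


class StubFuture:
    """Stand-in for a job future; returns labels instead of Complex objects."""

    def wait_until_done(self, interval: float = 5, timeout: Any = None) -> bool:
        return True

    def get_item(self, replicate: int = 0) -> str:
        return f"complex-{replicate}"


class StubModel:
    """Stand-in for a model object exposing generate()."""

    def generate(self, N: int = 1, **kwargs: Any) -> StubFuture:
        return StubFuture()


designs = run_design(StubModel(), 3, query="placeholder query")
print(designs)
```

The same helper shape would apply to the RFdiffusion model, since both futures document get_item() and wait_until_done().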
- property args: dict[str, Any]#
The registered job arguments.
- cancelled()#
Check if the job has been cancelled.
- Returns:
True if the job is cancelled, False otherwise.
- Return type:
bool
- property created_date: datetime#
The creation timestamp of the job.
- done()#
Check if the job has completed.
- Returns:
True if the job is done, False otherwise.
- Return type:
bool
- property end_date: datetime | None#
The end timestamp of the job.
- get(**kwargs)#
Return all results from the job by consuming the stream.
- Parameters:
verbose (bool, optional) – If True, display a progress bar. Defaults to False.
**kwargs – Keyword arguments passed to the stream method.
- Returns:
A list containing all results from the job.
- Return type:
list
- property id: str#
The unique identifier of the job.
- property job_id: str#
The unique identifier of the job.
- property job_type: str#
The type of the job.
- property progress_counter: int#
The progress counter of the job.
- refresh()#
Refresh the job status and internal job object.
- property start_date: datetime | None#
The start timestamp of the job.
- property status: JobStatus#
The current status of the job.
- stream(**kwargs)#
Retrieve results for this job as a stream.
- Returns:
A generator that yields (key, value) tuples.
- Return type:
Generator
- wait(interval=5, timeout=None, verbose=False)#
Wait for the job to complete, then fetch results.
- Parameters:
interval (int, optional) – Time in seconds between polling. Defaults to config.POLLING_INTERVAL.
timeout (int | None, optional) – Maximum time in seconds to wait. Defaults to None.
verbose (bool, optional) – Verbosity flag. Defaults to False.
- Returns:
The results of the job.
- Return type:
Any
- wait_until_done(interval=5, timeout=None, verbose=False)#
Wait for the job to complete.
- Parameters:
interval (float, optional) – Time in seconds between polling. Defaults to config.POLLING_INTERVAL.
timeout (int, optional) – Maximum time in seconds to wait. Defaults to None.
verbose (bool, optional) – Verbosity flag. Defaults to False.
- Returns:
True if the job completed successfully.
- Return type:
bool
Notes
This method does not fetch the job results, unlike wait().
ProteinMPNN#
ProteinMPNN is a sequence generation model that can be used for inverse folding, and is a natural next step after using structure generation models. It can be used with our Query interface to define sequence generation objectives in a unified manner, similar to our PoET2Model.
- class openprotein.models.ProteinMPNNModel(session)[source]#
Class for ProteinMPNN model.
Model inference requires an input structure which is provided by a query.
Examples
View specific model details (including supported tokens) with the ? operator.
>>> import openprotein
>>> session = openprotein.connect(username="user", password="password")
>>> session.models.proteinmpnn?
- get_metadata()[source]#
Get model metadata for this model.
- Returns:
The metadata associated with this model.
- Return type:
ModelMetadata
- score(sequences, query)[source]#
Score query sequences based on the specified query.
- Parameters:
- Returns:
A future object that returns the scores of the submitted sequences.
- Return type:
- indel(sequence, query, insert=None, delete=None, **kwargs)[source]#
Score all indels of the query sequence based on the specified query.
- Parameters:
sequence (bytes) – Sequence to analyze.
query (str or bytes or Protein or Query or None, optional) – Query to use with prompt.
insert (str or None, optional) – Insertion fragment at each site.
delete (list of int or None, optional) – Range of fragment sizes to delete at each site.
**kwargs – Additional keyword arguments.
- Returns:
A future object that returns the scores of the indel-ed sequence.
- Return type:
- Raises:
ValueError – If neither insert nor delete is provided.
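To make concrete what scoring all indels involves, the sketch below enumerates the variant sequences locally. It is an illustration under stated assumptions, not the service's algorithm: `enumerate_indels` is a hypothetical helper, it works on str rather than bytes, delete is treated as a list of fragment sizes, insertions are placed before every position and after the last residue, and deletions that would remove the whole sequence are skipped.

```python
from typing import List, Optional, Sequence


def enumerate_indels(
    sequence: str,
    insert: Optional[str] = None,
    delete: Optional[Sequence[int]] = None,
) -> List[str]:
    """List the sequences produced by inserting or deleting at each site."""
    if insert is None and delete is None:
        # Mirrors the documented ValueError when neither option is given.
        raise ValueError("provide at least one of insert or delete")
    variants: List[str] = []
    if insert is not None:
        for site in range(len(sequence) + 1):
            variants.append(sequence[:site] + insert + sequence[site:])
    if delete is not None:
        for size in delete:
            if size < 1:
                raise ValueError("deletion sizes must be positive")
            if size >= len(sequence):
                continue  # assumption: never delete the entire sequence
            for site in range(len(sequence) - size + 1):
                variants.append(sequence[:site] + sequence[site + size:])
    return variants


print(enumerate_indels("ACD", insert="G"))
print(enumerate_indels("ACD", delete=[1, 2]))
```

The number of variants grows with both the sequence length and the number of deletion sizes, which gives a rough idea of how large an indel scoring job can become.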
- single_site(sequence, query)[source]#
Score all single substitutions of the query sequence using the specified query.
- Parameters:
- Returns:
A future object that returns the scores of the mutated sequence.
- Return type:
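A single-site scan covers every substitution of one residue for another. The sketch below enumerates those variants for the 20 standard amino acids; it is an illustration rather than the service's implementation. `single_site_variants` is a hypothetical helper, and the 1-based mutation labels such as M1A are an assumed naming convention.

```python
from typing import List, Tuple

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_site_variants(sequence: str) -> List[Tuple[str, str]]:
    """Return (label, mutated sequence) for every single substitution."""
    variants = []
    for position, wild_type in enumerate(sequence):
        for residue in AMINO_ACIDS:
            if residue == wild_type:
                continue  # skip the unchanged residue
            label = f"{wild_type}{position + 1}{residue}"
            mutated = sequence[:position] + residue + sequence[position + 1:]
            variants.append((label, mutated))
    return variants


scan = single_site_variants("MK")
print(len(scan), scan[:3])
```

For a sequence of standard residues of length L this produces 19 × L variants, so the size of the returned score table scales linearly with sequence length.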
- generate(query, num_samples=100, temperature=0.1, seed=None)[source]#
Generate protein sequences based on a masked input query.
- Parameters:
query (str or bytes or Protein or Complex or Query) – Query specifying the structure to generate sequences for.
num_samples (int, optional) – The number of samples to generate. Default is 100.
temperature (float, optional) – The temperature for sampling. Higher values produce more random outputs. Default is 0.1.
seed (int or None, optional) – Random seed for sampling. Default is None.
- Returns:
A future object representing the status and information about the generation job.
- Return type:
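The effect of the temperature parameter can be illustrated with a temperature-scaled softmax over per-residue scores. This is a generic sketch of the technique and makes no claim about ProteinMPNN's internal implementation; `softmax_with_temperature` and the example logits are made up for illustration.

```python
import math
from typing import List, Sequence


def softmax_with_temperature(
    logits: Sequence[float], temperature: float
) -> List[float]:
    """Convert scores to probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical scores for three candidate residues at one position.
logits = [2.0, 1.0, 0.5]
for temperature in (0.1, 1.0, 2.0):
    probabilities = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probabilities])
```

At low temperature almost all probability mass sits on the top-scoring residue, so samples are nearly deterministic; raising the temperature spreads the mass out and yields more diverse sequences.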