openprotein.api.align#
Some tools (e.g. PoET, AlphaFold2) require an MSA to be generated. The tools below will help you achieve this.
- class openprotein.api.align.AlignAPI[source]#
API interface for calling the PoET and Align endpoints.
- __init__(session)[source]#
- Parameters:
session (APISession)
- upload_msa(msa_file)[source]#
Upload an MSA from file.
- Parameters:
msa_file (str, optional) – Ready-made MSA to upload. Defaults to None.
- Raises:
APIError – If there is an issue with the API request.
- Returns:
Job object containing the details of the MSA upload.
- Return type:
MSAJob
- create_msa(seed)[source]#
Construct an MSA via homology search with the seed sequence.
- Parameters:
seed (bytes) – Seed sequence for the MSA construction.
- Raises:
APIError – If there is an issue with the API request.
- Returns:
Job object containing the details of the MSA construction.
- Return type:
MSAJob
- upload_prompt(prompt_file)[source]#
Directly upload a prompt.
This bypasses the post_msa and prompt_post steps entirely, so PoET uses the prompt as is. To specify multiple prompts (one per replicate), separate consecutive CSVs with <END_PROMPT> and a newline.
- Parameters:
prompt_file (BinaryIO) – Binary I/O object representing the prompt file.
- Raises:
APIError – If there is an issue with the API request.
- Returns:
An object representing the status and results of the prompt job.
- Return type:
PromptJob
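The multi-prompt layout described for upload_prompt can be sketched as follows. This is a minimal illustration, not the library's own helper: `build_prompt_file` and `END_PROMPT` are hypothetical names, and the exact placement of the delimiter (on its own line, followed by a newline) is an assumption drawn from the description above.

```python
import io
from typing import BinaryIO, Sequence

# Delimiter named in the upload_prompt description.
END_PROMPT = "<END_PROMPT>"


def build_prompt_file(prompts: Sequence[str]) -> BinaryIO:
    """Join per-replicate prompt CSVs into one binary file-like object.

    Assumption: the delimiter sits on its own line between CSVs.
    """
    # Drop trailing newlines so each CSV is followed by exactly one.
    blocks = [p.rstrip("\n") for p in prompts]
    body = f"\n{END_PROMPT}\n".join(blocks) + "\n"
    return io.BytesIO(body.encode("utf-8"))


# Two single-column prompts, one per replicate (made-up sequences).
prompt_file = build_prompt_file(["MKTAYIAK\nMKSAYLAK\n", "MKTAYIAR\n"])
payload = prompt_file.getvalue()
```

The resulting object is a BinaryIO, which matches the documented type of the prompt_file parameter.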
- get_prompt(job, prompt_index=None)[source]#
Get prompts for a given job.
- Parameters:
job (Job) – The job for which to retrieve data.
prompt_index (Optional[int]) – The replicate number for the prompt (applies to input_type=PROMPT only).
- Returns:
A CSV reader for the response data.
- Return type:
csv.reader
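Since get_prompt returns a csv.reader, the result is consumed by iterating over rows. The sketch below uses an in-memory stand-in rather than a real API response; the single-column layout and the sequences are assumptions made for illustration only.

```python
import csv
import io


def rows_from_reader(reader):
    """Materialize a csv.reader into a list of rows, skipping empty lines."""
    return [row for row in reader if row]


# Stand-in for the reader returned by get_prompt (hypothetical data).
fake_response = io.StringIO("MKTAYIAK\nMKSAYLAK\n\n")
reader = csv.reader(fake_response)

# Assumption: the first column of each row holds a sequence.
sequences = [row[0] for row in rows_from_reader(reader)]
```

A csv.reader can only be iterated once, so collect the rows into a list if they are needed more than once.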
- class openprotein.api.align.PromptFuture[source]#
Represents a result of a prompt job.
- session#
An instance of APISession for API interactions.
- Type:
APISession
- job#
The prompt job.
- Type:
Job
- page_size#
The number of results to fetch in a single page.
- Type:
int
- Returns:
The list of results from the PoET scoring job.
- Return type:
List[PoetScoreResult]
- __init__(session, job, page_size=50000, msa_id=None)[source]#
Initialize a PromptFuture instance.
- Parameters:
session (APISession)
job (Job)
page_size (int, optional) – The number of results to fetch in a single page. Defaults to config.POET_PAGE_SIZE.
msa_id (str | None)
- get_input(input_type)#
See child function docs.
- Parameters:
input_type (PoetInputType)
- classmethod get_job_type()#
Return the job type associated with this Future class.
- get_msa()#
See child function docs.
- get_prompt(prompt_index=None)#
See child function docs.
- Parameters:
prompt_index (int | None)
- get_seed()#
See child function docs.
- refresh()#
Refresh the job status.
- sample_prompt(num_sequences=None, num_residues=None, method=MSASamplingMethod.NEIGHBORS_NONGAP_NORM_NO_LIMIT, homology_level=0.8, max_similarity=1.0, min_similarity=0.0, always_include_seed_sequence=False, num_ensemble_prompts=1, random_seed=None)#
Create a protein sequence prompt from a linked MSA (Multiple Sequence Alignment) for PoET Jobs.
- Parameters:
num_sequences (int, optional) – Maximum number of sequences in the prompt. Must be less than 100.
num_residues (int, optional) – Maximum number of residues (tokens) in the prompt. Must be less than 24577.
method (MSASamplingMethod, optional) – Method to use for MSA sampling. Defaults to NEIGHBORS_NONGAP_NORM_NO_LIMIT.
homology_level (float, optional) – Level of homology for sequences in the MSA (neighbors methods only). Must be between 0 and 1. Defaults to 0.8.
max_similarity (float, optional) – Maximum similarity between sequences in the MSA and the seed. Must be between 0 and 1. Defaults to 1.0.
min_similarity (float, optional) – Minimum similarity between sequences in the MSA and the seed. Must be between 0 and 1. Defaults to 0.0.
always_include_seed_sequence (bool, optional) – Whether to always include the seed sequence in the MSA. Defaults to False.
num_ensemble_prompts (int, optional) – Number of ensemble jobs to run. Defaults to 1.
random_seed (int, optional) – Seed for random number generation. Defaults to a random number between 0 and 2**32-1.
- Raises:
InvalidParameterError – If provided parameter values are not in the allowed range.
MissingParameterError – If both or neither of ‘num_sequences’ and ‘num_residues’ are specified.
- Return type:
PromptJob
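The documented constraints on sample_prompt can be checked on the client before submitting a job. The sketch below is not the library's validation code: `check_sample_prompt_args` is a hypothetical helper, the two exception classes are local stand-ins that only reuse the documented names, and treating the 0 to 1 ranges as inclusive is an assumption.

```python
from typing import Optional


class InvalidParameterError(ValueError):
    """Local stand-in for the documented exception of the same name."""


class MissingParameterError(ValueError):
    """Local stand-in for the documented exception of the same name."""


def check_sample_prompt_args(
    num_sequences: Optional[int] = None,
    num_residues: Optional[int] = None,
    homology_level: float = 0.8,
    max_similarity: float = 1.0,
    min_similarity: float = 0.0,
) -> None:
    """Raise if the arguments violate the documented limits."""
    # Exactly one of the two size limits must be given.
    if (num_sequences is None) == (num_residues is None):
        raise MissingParameterError(
            "Specify exactly one of num_sequences or num_residues."
        )
    if num_sequences is not None and num_sequences >= 100:
        raise InvalidParameterError("num_sequences must be less than 100.")
    if num_residues is not None and num_residues >= 24577:
        raise InvalidParameterError("num_residues must be less than 24577.")
    # Assumption: the endpoints 0 and 1 are both allowed.
    for name, value in (
        ("homology_level", homology_level),
        ("max_similarity", max_similarity),
        ("min_similarity", min_similarity),
    ):
        if not 0.0 <= value <= 1.0:
            raise InvalidParameterError(f"{name} must be between 0 and 1.")


check_sample_prompt_args(num_sequences=50)    # passes silently
check_sample_prompt_args(num_residues=12000)  # passes silently
```

Checking locally gives an immediate error message instead of a failed request, but the server-side rules remain authoritative.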
- wait(verbose=False)#
Wait for job to complete, then fetch results.
- Parameters:
interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.
timeout (int, optional) – max time to wait. Defaults to None.
verbose (bool, optional) – verbosity flag. Defaults to False.
- Returns:
results of job
- Return type:
results
- wait_until_done(interval=5, timeout=None, verbose=False)#
Wait for the job to complete without fetching results (unlike wait()).
- Parameters:
interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.
timeout (int, optional) – max time to wait. Defaults to None.
verbose (bool, optional) – verbosity flag. Defaults to False.
- Returns:
results of job
- Return type:
results
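The interval and timeout parameters suggest a simple polling loop. The sketch below illustrates that general pattern only; it is not the library's implementation. `poll_until_done` and `is_done` are hypothetical names, and returning False on timeout (rather than raising) is an assumption.

```python
import time
from typing import Callable, Optional


def poll_until_done(
    is_done: Callable[[], bool],
    interval: float = 5,
    timeout: Optional[float] = None,
    verbose: bool = False,
) -> bool:
    """Call is_done() every `interval` seconds until it returns True.

    Returns True once the job is done, or False if `timeout` seconds
    elapse first. A timeout of None waits indefinitely.
    """
    start = time.monotonic()
    while True:
        if is_done():
            return True
        if timeout is not None and time.monotonic() - start >= timeout:
            return False
        if verbose:
            print("Job not finished yet, waiting...")
        time.sleep(interval)


# Simulated job that reports completion on the third status check.
checks = {"count": 0}


def fake_status_check() -> bool:
    checks["count"] += 1
    return checks["count"] >= 3


finished = poll_until_done(fake_status_check, interval=0)
```

Using time.monotonic() keeps the timeout unaffected by changes to the system clock.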
- class openprotein.api.align.MSAFuture[source]#
Represents a result of an MSA job.
- session#
An instance of APISession for API interactions.
- Type:
APISession
- job#
The MSA job.
- Type:
Job
- page_size#
The number of results to fetch in a single page.
- Type:
int
- Returns:
The list of results from the PoET scoring job.
- Return type:
List[PoetScoreResult]
- __init__(session, job, page_size=50000)[source]#
Initialize an MSAFuture instance.
- Parameters:
session (APISession) – An instance of APISession for API interactions.
job (Job) – The MSA job.
page_size (int) – The number of results to fetch in a single page.
- wait(verbose=False)[source]#
Wait for job to complete, then fetch results.
- Parameters:
interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.
timeout (int, optional) – max time to wait. Defaults to None.
verbose (bool, optional) – verbosity flag. Defaults to False.
- Returns:
results of job
- Return type:
results
- sample_prompt(num_sequences=None, num_residues=None, method=MSASamplingMethod.NEIGHBORS_NONGAP_NORM_NO_LIMIT, homology_level=0.8, max_similarity=1.0, min_similarity=0.0, always_include_seed_sequence=False, num_ensemble_prompts=1, random_seed=None)[source]#
Create a protein sequence prompt from a linked MSA (Multiple Sequence Alignment) for PoET Jobs.
- Parameters:
num_sequences (int, optional) – Maximum number of sequences in the prompt. Must be less than 100.
num_residues (int, optional) – Maximum number of residues (tokens) in the prompt. Must be less than 24577.
method (MSASamplingMethod, optional) – Method to use for MSA sampling. Defaults to NEIGHBORS_NONGAP_NORM_NO_LIMIT.
homology_level (float, optional) – Level of homology for sequences in the MSA (neighbors methods only). Must be between 0 and 1. Defaults to 0.8.
max_similarity (float, optional) – Maximum similarity between sequences in the MSA and the seed. Must be between 0 and 1. Defaults to 1.0.
min_similarity (float, optional) – Minimum similarity between sequences in the MSA and the seed. Must be between 0 and 1. Defaults to 0.0.
always_include_seed_sequence (bool, optional) – Whether to always include the seed sequence in the MSA. Defaults to False.
num_ensemble_prompts (int, optional) – Number of ensemble jobs to run. Defaults to 1.
random_seed (int, optional) – Seed for random number generation. Defaults to a random number between 0 and 2**32-1.
- Raises:
InvalidParameterError – If provided parameter values are not in the allowed range.
MissingParameterError – If both or neither of ‘num_sequences’ and ‘num_residues’ are specified.
- Return type:
PromptJob
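Taken together, create_msa, wait_until_done, and sample_prompt suggest a seed-to-prompt workflow. The sketch below strings those documented method names together against offline stand-ins so that it runs without a network connection. `prompt_from_seed` and the `_Stub*` classes are hypothetical. It is also an assumption that the object returned by create_msa exposes the MSAFuture methods, and that waiting before sampling is needed.

```python
class _StubMSA:
    """Offline stand-in for the object returned by create_msa (hypothetical)."""

    def __init__(self):
        self.waited = False
        self.sample_kwargs = None

    def wait_until_done(self, interval=5, timeout=None, verbose=False):
        self.waited = True
        return True

    def sample_prompt(self, **kwargs):
        self.sample_kwargs = kwargs
        return "prompt-job"


class _StubAlign:
    """Offline stand-in for AlignAPI (hypothetical)."""

    def __init__(self):
        self.msa = _StubMSA()
        self.seed = None

    def create_msa(self, seed):
        self.seed = seed
        return self.msa


def prompt_from_seed(align_api, seed: bytes, num_sequences: int = 50,
                     random_seed=None):
    """Build an MSA from a seed sequence, then sample a prompt from it."""
    msa = align_api.create_msa(seed)
    # Conservative assumption: let the MSA job finish before sampling.
    msa.wait_until_done(interval=5, timeout=None, verbose=False)
    # Exactly one of num_sequences / num_residues may be passed.
    return msa.sample_prompt(num_sequences=num_sequences,
                             random_seed=random_seed)


stub_api = _StubAlign()
prompt_job = prompt_from_seed(stub_api, b"MKTAYIAKQR", num_sequences=20,
                              random_seed=1)
```

With a real session, an AlignAPI instance would take the place of `_StubAlign`. Passing a fixed random_seed makes the sampled prompt reproducible.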
- get_input(input_type)#
See child function docs.
- Parameters:
input_type (PoetInputType)
- classmethod get_job_type()#
Return the job type associated with this Future class.
- get_msa()#
See child function docs.
- get_prompt(prompt_index=None)#
See child function docs.
- Parameters:
prompt_index (int | None)
- get_seed()#
See child function docs.
- refresh()#
Refresh the job status.
- wait_until_done(interval=5, timeout=None, verbose=False)#
Wait for the job to complete without fetching results (unlike wait()).
- Parameters:
interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.
timeout (int, optional) – max time to wait. Defaults to None.
verbose (bool, optional) – verbosity flag. Defaults to False.
- Returns:
results of job
- Return type:
results