openprotein.align#

Some tools (e.g. PoET, AlphaFold2) take a multiple sequence alignment (MSA) as input. The classes below let you upload an existing MSA, build one from a seed sequence, and sample prompts from it.

class openprotein.align.AlignAPI[source]#

API interface for calling the PoET and Align endpoints.

__init__(session)[source]#
Parameters:

session (APISession)

upload_msa(msa_file)[source]#

Upload an MSA from file.

Parameters:

msa_file (str, optional) – Ready-made MSA. If not provided, default value is None.

Raises:

APIError – If there is an issue with the API request.

Returns:

Future object awaiting the contents of the MSA upload.

Return type:

MSAFuture

create_msa(seed)[source]#

Construct an MSA via homology search with the seed sequence.

Parameters:

seed (bytes) – Seed sequence for the MSA construction.

Raises:

APIError – If there is an issue with the API request.

Returns:

Job object containing the details of the MSA construction.

Return type:

MSAJob
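
Because create_msa takes the seed as bytes, it can help to normalize the sequence before encoding it. The following is a minimal sketch, not part of openprotein: the helper name to_seed_bytes is hypothetical, and restricting the input to the 20 standard amino-acid codes is an assumption about what the service accepts.

```python
# Hypothetical helper (not part of openprotein): normalize a seed sequence
# and encode it as bytes, the type that create_msa documents for `seed`.
# Assumption: only the 20 standard amino-acid codes are allowed.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def to_seed_bytes(sequence: str) -> bytes:
    # Drop all whitespace (spaces, newlines) and uppercase the residues.
    cleaned = "".join(sequence.split()).upper()
    if not cleaned:
        raise ValueError("seed sequence is empty")
    unknown = sorted(set(cleaned) - STANDARD_AMINO_ACIDS)
    if unknown:
        raise ValueError(f"unexpected residue codes: {unknown}")
    return cleaned.encode("ascii")


seed = to_seed_bytes("mktay iakqr\n")
```

The resulting bytes object could then be passed as the seed argument of create_msa on an AlignAPI instance.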

upload_prompt(prompt_file)[source]#

Directly upload a prompt.

Bypasses the post_msa and prompt_post steps entirely; PoET uses the prompt as is. To supply multiple prompts (one per replicate), separate the CSVs with an <END_PROMPT> followed by a newline.

Parameters:

prompt_file (BinaryIO) – Binary I/O object representing the prompt file.

Raises:

APIError – If there is an issue with the API request.

Returns:

An object representing the status and results of the prompt job.

Return type:

PromptJob
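
The multi-prompt layout described above can be assembled in memory before uploading. This is a sketch under stated assumptions: the helper build_prompt_file is hypothetical, and the two-column (name, sequence) row layout is an assumption, since the exact prompt CSV columns are not specified here.

```python
import csv
import io

END_PROMPT = "<END_PROMPT>"


def build_prompt_file(replicates):
    """Hypothetical helper: join one CSV per replicate, separated by an
    <END_PROMPT> line, and return a binary file-like object."""
    chunks = []
    for rows in replicates:
        buf = io.StringIO()
        # Use "\n" line endings so the separator sits on its own line.
        csv.writer(buf, lineterminator="\n").writerows(rows)
        chunks.append(buf.getvalue())
    text = (END_PROMPT + "\n").join(chunks)
    return io.BytesIO(text.encode("utf-8"))


# Assumed row layout: (name, sequence). Two replicates are shown.
prompt_file = build_prompt_file(
    [
        [["seq1", "MKTAYIAK"], ["seq2", "MKTAYLAK"]],
        [["seq3", "MKSAYIAK"]],
    ]
)
```

The returned io.BytesIO object is a BinaryIO, which is the type upload_prompt documents for prompt_file.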

get_prompt(job, prompt_index=None)[source]#

Get prompts for a given job.

Parameters:
  • job (Job) – The job for which to retrieve data.

  • prompt_index (Optional[int]) – The replicate number for the prompt (input_type=PROMPT only).

Returns:

A CSV reader for the response data.

Return type:

csv.reader

get_seed(job)[source]#

Get the input data for a given MSA job.

Parameters:

job (Job) – The job for which to retrieve data.

Returns:

A CSV reader for the response data.

Return type:

csv.reader

get_msa(job)[source]#

Get generated MSA for a given job.

Parameters:

job (Job) – The job for which to retrieve data.

Returns:

A CSV reader for the response data.

Return type:

csv.reader
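
get_prompt, get_seed and get_msa each return a csv.reader, which yields one list of strings per row. The sketch below shows one way to consume such a reader; it uses an in-memory stand-in instead of a real API response, and the (name, sequence) row layout is an assumption.

```python
import csv
import io

# Stand-in for the response data; a real reader would come from get_msa(job).
# Assumed row layout: name, aligned sequence.
fake_response = io.StringIO("seed,MKT-AYIAK\nhit1,MKTLAYI-K\n")
reader = csv.reader(fake_response)

# A csv.reader can only be iterated once, so materialize the rows.
msa = {name: sequence for name, sequence in reader}

# In an alignment every row should have the same width (gaps included).
widths = {len(sequence) for sequence in msa.values()}
is_aligned = len(widths) == 1
```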

class openprotein.align.PromptFuture[source]#

Represents the result of a prompt job.

session#

An instance of APISession for API interactions.

Type:

APISession

job#

The prompt job.

Type:

Job

page_size#

The number of results to fetch in a single page.

Type:

int

get(prompt_index=None, verbose=False)[source]#

Get the final results of the prompt job.

Parameters:
  • prompt_index (int | None)

  • verbose (bool)

Returns:

The rows of the prompt produced by the job.

Return type:

Iterator[list[str]]

__init__(session, job, page_size=50000, msa_id=None)[source]#

Initialize a PromptFuture instance.

Parameters:
  • session (APISession) – An instance of APISession for API interactions.

  • job (PromptJob) – The prompt job.

  • page_size (int, optional) – The number of results to fetch in a single page. Defaults to config.POET_PAGE_SIZE.

  • msa_id (str | None)

get(prompt_index=None, verbose=False)[source]#

Return the results from this job.

Parameters:
  • prompt_index (int | None)

  • verbose (bool)

Return type:

Iterator[list[str]]

cancelled()#

Check if the job is cancelled.

Return type:

bool

done()#

Check if the job is complete.

Return type:

bool

get_input(input_type)#

See child function docs.

Parameters:

input_type (AlignType)

get_seed()#

See child function docs.

refresh()#

Refresh job status.

wait(interval=5, timeout=None, verbose=False)#

Wait for job to complete, then fetch results.

Parameters:
  • interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.

  • timeout (int, optional) – max time to wait. Defaults to None.

  • verbose (bool, optional) – verbosity flag. Defaults to False.

Returns:

results of job

Return type:

results

wait_until_done(interval=5, timeout=None, verbose=False)#

Wait for the job to complete without fetching results (unlike wait()).

Parameters:
  • interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.

  • timeout (int, optional) – max time to wait. Defaults to None.

  • verbose (bool, optional) – verbosity flag. Defaults to False.

Returns:

results of job

Return type:

results
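
The interval and timeout parameters of wait() and wait_until_done() describe a polling loop. The following is a sketch of that pattern, not the library's implementation: FakeFuture and poll_until_done are hypothetical names, and the stand-in only mimics the refresh() and done() methods listed above.

```python
import time


class FakeFuture:
    """Stand-in for a job future: reports done after a fixed number of
    refresh() calls. Not part of openprotein."""

    def __init__(self, refreshes_needed):
        self._remaining = refreshes_needed

    def refresh(self):
        self._remaining = max(0, self._remaining - 1)

    def done(self):
        return self._remaining == 0


def poll_until_done(future, interval=5, timeout=None):
    """Refresh the job every `interval` seconds until it is done.

    Raises TimeoutError if `timeout` seconds pass first.
    """
    start = time.monotonic()
    while not future.done():
        if timeout is not None and time.monotonic() - start > timeout:
            raise TimeoutError("job did not finish in time")
        time.sleep(interval)
        future.refresh()
    return True


future = FakeFuture(refreshes_needed=3)
finished = poll_until_done(future, interval=0, timeout=1)
```

With a real future, a non-zero interval avoids hammering the API, and a timeout keeps a stuck job from blocking indefinitely.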

class openprotein.align.MSAFuture[source]#

Represents the result of an MSA job.

session#

An instance of APISession for API interactions.

Type:

APISession

job#

The MSA job.

Type:

Job

page_size#

The number of results to fetch in a single page.

Type:

int

get(verbose=False)[source]#

Get the final results of the MSA job.

Parameters:

verbose (bool)

Returns:

The rows of the MSA produced by the job.

Return type:

Iterator[list[str]]

__init__(session, job, page_size=50000)[source]#

Initialize an MSAFuture instance.

Parameters:
  • session (APISession) – An instance of APISession for API interactions.

  • job (Job) – The MSA job.

  • page_size (int) – The number of results to fetch in a single page.

get(verbose=False)[source]#

Return the results from this job.

Parameters:

verbose (bool)

Return type:

Iterator[list[str]]

sample_prompt(num_sequences=None, num_residues=None, method=MSASamplingMethod.NEIGHBORS_NONGAP_NORM_NO_LIMIT, homology_level=0.8, max_similarity=1.0, min_similarity=0.0, always_include_seed_sequence=False, num_ensemble_prompts=1, random_seed=None)[source]#

Create a protein sequence prompt from a linked MSA (Multiple Sequence Alignment) for PoET Jobs.

Parameters:
  • num_sequences (int, optional) – Maximum number of sequences in the prompt. Must be less than 100.

  • num_residues (int, optional) – Maximum number of residues (tokens) in the prompt. Must be less than 24577.

  • method (MSASamplingMethod, optional) – Method to use for MSA sampling. Defaults to NEIGHBORS_NONGAP_NORM_NO_LIMIT.

  • homology_level (float, optional) – Level of homology for sequences in the MSA (neighbors methods only). Must be between 0 and 1. Defaults to 0.8.

  • max_similarity (float, optional) – Maximum similarity between sequences in the MSA and the seed. Must be between 0 and 1. Defaults to 1.0.

  • min_similarity (float, optional) – Minimum similarity between sequences in the MSA and the seed. Must be between 0 and 1. Defaults to 0.0.

  • always_include_seed_sequence (bool, optional) – Whether to always include the seed sequence in the MSA. Defaults to False.

  • num_ensemble_prompts (int, optional) – Number of ensemble jobs to run. Defaults to 1.

  • random_seed (int, optional) – Seed for random number generation. Defaults to a random number between 0 and 2**32-1.

Raises:
  • InvalidParameterError – If provided parameter values are not in the allowed range.

  • MissingParameterError – If both or neither of ‘num_sequences’ and ‘num_residues’ are specified; exactly one must be given.

Return type:

PromptJob
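
The constraints on sample_prompt arguments can be checked locally before submitting a job. This is a minimal sketch of those rules as documented above, not the library's own validation: check_sample_prompt_args is a hypothetical helper, it raises ValueError instead of InvalidParameterError or MissingParameterError, and treating the 0 to 1 ranges as inclusive is an assumption.

```python
def check_sample_prompt_args(
    num_sequences=None,
    num_residues=None,
    homology_level=0.8,
    max_similarity=1.0,
    min_similarity=0.0,
):
    """Hypothetical pre-flight check mirroring the documented limits."""
    # Exactly one of num_sequences / num_residues must be given.
    if (num_sequences is None) == (num_residues is None):
        raise ValueError("specify exactly one of num_sequences or num_residues")
    if num_sequences is not None and not num_sequences < 100:
        raise ValueError("num_sequences must be less than 100")
    if num_residues is not None and not num_residues < 24577:
        raise ValueError("num_residues must be less than 24577")
    # Assumption: the documented 0..1 ranges are inclusive.
    fractions = {
        "homology_level": homology_level,
        "max_similarity": max_similarity,
        "min_similarity": min_similarity,
    }
    for name, value in fractions.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return {"num_sequences": num_sequences, "num_residues": num_residues, **fractions}


checked = check_sample_prompt_args(num_residues=12288)
```

The returned dictionary holds only the checked values; the remaining sample_prompt arguments (method, always_include_seed_sequence, num_ensemble_prompts, random_seed) are left out of this sketch.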

cancelled()#

Check if the job is cancelled.

Return type:

bool

done()#

Check if the job is complete.

Return type:

bool

get_input(input_type)#

See child function docs.

Parameters:

input_type (AlignType)

get_seed()#

See child function docs.

refresh()#

Refresh job status.

wait(interval=5, timeout=None, verbose=False)#

Wait for job to complete, then fetch results.

Parameters:
  • interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.

  • timeout (int, optional) – max time to wait. Defaults to None.

  • verbose (bool, optional) – verbosity flag. Defaults to False.

Returns:

results of job

Return type:

results

wait_until_done(interval=5, timeout=None, verbose=False)#

Wait for the job to complete without fetching results (unlike wait()).

Parameters:
  • interval (int, optional) – time between polling. Defaults to config.POLLING_INTERVAL.

  • timeout (int, optional) – max time to wait. Defaults to None.

  • verbose (bool, optional) – verbosity flag. Defaults to False.

Returns:

results of job

Return type:

results