[2]:
%matplotlib inline

Getting started with sequence-based learning

This notebook briefly covers how to run the assay data, training, prediction, and design workflows.

For more information, please read the docs.

[3]:
import numpy as np
import matplotlib.pyplot as plt
import json
import pandas as pd
import seaborn as sns
sns.set()

Setup

Connect to the OpenProtein backend with your credentials:

[4]:
import openprotein

with open('secrets.config', 'r') as f:
    config = json.load(f)

session = openprotein.connect(username=config['username'], password=config['password'])
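A config file with a missing key only fails at the `connect` call, with a bare `KeyError`. The sketch below shows one way to validate the file first. The helper name `load_credentials` and the demo values are hypothetical and not part of the `openprotein` package; the only assumption carried over from the cell above is that the file is JSON with `username` and `password` keys.

```python
import json
import tempfile
from pathlib import Path

REQUIRED_KEYS = ("username", "password")


def load_credentials(path):
    """Read a JSON config file and return (username, password).

    Hypothetical helper: raises KeyError naming every missing key.
    """
    config = json.loads(Path(path).read_text())
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"missing keys in {path}: {', '.join(missing)}")
    return config["username"], config["password"]


# Demo with a throwaway file standing in for secrets.config.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "secrets.config"
    demo_path.write_text(json.dumps({"username": "demo_user", "password": "demo_pass"}))
    username, password = load_credentials(demo_path)
```

The returned pair can then be passed to `openprotein.connect` exactly as in the cell above.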

We will use a small sample of the AMIE PSEAE dataset as a demo. The full dataset is available on our website:

[5]:
dataset = pd.read_csv("./data/AMIE_PSEAE.csv")
dataset.head(2)
[5]:
sequence isobutyramide_normalized_fitness acetamide_normalized_fitness propionamide_normalized_fitness
0 WRHGDISSSNDTVGVAVVNYKMPRLHTAAEVLDNARKIAEMIVGMK... -0.5174 NaN NaN
1 WRHGDISSSNDTVGVAVVNYKMPRLHTAAEVLDNARKIAEMIVGMK... -0.5154 -2.1514 -1.1457
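The preview shows that not every variant has a value for every measurement (the `NaN` entries). Before uploading, it can help to count how many rows actually carry each measurement. The following is a minimal standard-library sketch on toy data: the column names and fitness values come from the preview above, while the `SEQ_A`/`SEQ_B` sequences are placeholders and `count_measured` is a hypothetical helper.

```python
import csv
import io

# Toy rows in the same column layout as the demo CSV; sequences are placeholders.
toy_csv = """sequence,isobutyramide_normalized_fitness,acetamide_normalized_fitness,propionamide_normalized_fitness
SEQ_A,-0.5174,,
SEQ_B,-0.5154,-2.1514,-1.1457
"""


def count_measured(csv_text):
    """Return {measurement_name: number of rows with a non-missing value}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames if name != "sequence"}
    for row in reader:
        for name in counts:
            value = row[name].strip()
            # Treat empty cells and literal "nan" as missing.
            if value and value.lower() != "nan":
                counts[name] += 1
    return counts


measured = count_measured(toy_csv)
```

With the dataset already loaded in pandas, `dataset.notna().sum()` gives the same per-column counts.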

Data Upload

Upload the demo data to the backend so that it can be used with our suite of tools:

[6]:
# Create
assay = session.data.create(dataset, "Dataset Name", "Dataset description")
assay_id = assay.id
assay
[6]:
AssayMetadata(model_config={'protected_namespaces': ()}, assay_name='Dataset Name', assay_description='Dataset description', assay_id='78134472-be5b-4041-8bf0-21ebb480a9d7', original_filename='assay_data', created_date=datetime.datetime(2023, 10, 27, 3, 26, 22, 330136), num_rows=15, num_entries=41, measurement_names=['isobutyramide_normalized_fitness', 'acetamide_normalized_fitness', 'propionamide_normalized_fitness'], sequence_length=346)

We could also have loaded the assay data from an existing job ID, which is faster and more efficient when resuming a workflow:

[7]:
assay = session.data.load_job(assay_id) # can reload job to resume workflows
[8]:
assay.get_first()
[8]:
sequence isobutyramide_normalized_fitness acetamide_normalized_fitness propionamide_normalized_fitness
0 WRHGDISSSNDTVGVAVVNYKMPRLHTAAEVLDNARKIAEMIVGMK... -0.5174 None None
[9]:
assay.get_slice(start=3, end=5)
[9]:
sequence isobutyramide_normalized_fitness acetamide_normalized_fitness propionamide_normalized_fitness
0 MRHGDISSSNDTVGVAVVNYKMPRLHTAAEVLDNARKWAEMIVGMK... NaN NaN -0.7550
1 MRHGDISSSNDTVGVAVVNYKMPRLHTAAEVLDNARKYAEMIVGMK... -0.7448 -1.7992 -0.9711
[10]:
assay.sequence_length
[10]:
346

Model training

We can use the assay object to create a training job:

[11]:
train = session.train.create_training_job(assay,
                                          measurement_name=["isobutyramide_normalized_fitness", "acetamide_normalized_fitness"],
                                          model_name="mymodel") # name the resulting model

train_id = train.id
train
[11]:
Jobplus(model_config={'protected_namespaces': ()}, status=<JobStatus.PENDING: 'PENDING'>, job_id='1def5433-70d1-42bd-b520-e58b196a273b', job_type='/workflow/train', created_date=datetime.datetime(2023, 10, 27, 3, 26, 22, 575393), start_date=None, end_date=None, prerequisite_job_id='696689f4-bf76-4cd3-b565-5d0b031e78e2', progress_message=None, progress_counter=None, num_records=None, sequence_length=346)
[12]:
#train = session.train.load_job(train_id)
#train
[13]:
train.refresh()
train.status
[13]:
<JobStatus.PENDING: 'PENDING'>

We can wait for the results before proceeding:

[14]:
results = train.wait(verbose=False)
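Conceptually, waiting on a job amounts to refreshing its status until it reaches a terminal state. The sketch below illustrates that pattern with a stand-in object. It is not the library's implementation of `wait`: `FakeJob` and `poll_until_done` are hypothetical, and the `RUNNING` and `FAILURE` status names are assumptions (only `PENDING` and `SUCCESS` appear in this notebook).

```python
import time

TERMINAL_STATES = {"SUCCESS", "FAILURE"}  # "FAILURE" is an assumed status name


class FakeJob:
    """Stand-in job that reports PENDING, RUNNING, then SUCCESS on refresh."""

    def __init__(self):
        self._states = iter(["PENDING", "RUNNING", "SUCCESS"])
        self.status = "PENDING"

    def refresh(self):
        # Advance to the next status; stay on the last one once exhausted.
        self.status = next(self._states, self.status)


def poll_until_done(job, interval=0.01, timeout=5.0):
    """Refresh the job until it is terminal or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        job.refresh()
        if job.status in TERMINAL_STATES:
            return job.status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job.status} after {timeout} s")
        time.sleep(interval)


final_status = poll_until_done(FakeJob())
```

A timeout is worth having in any hand-rolled loop like this so that a stuck job does not block the notebook indefinitely.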
[15]:
isobut_results = [i for i in results.traingraph if i.tag=="isobutyramide_normalized_fitness"]
sns.scatterplot(x=[i.step for i in isobut_results], y=[i.loss for i in isobut_results])
plt.xlabel("Steps")
plt.ylabel("Loss");
[Figure: training loss against steps for isobutyramide_normalized_fitness]

We can also request a cross-validation job to see the training results in more detail:

[16]:
cvjob = train.crossvalidate()
cvjob.status
[16]:
<JobStatus.PENDING: 'PENDING'>
[17]:
cvdata = cvjob.wait()
[18]:
cvresult = [i for i in cvdata if i.measurement_name == "isobutyramide_normalized_fitness"]

sns.regplot(x=[i.y for i in cvresult], y=[i.y_mu for i in cvresult])
plt.xlabel("Y")
plt.ylabel("Y-hat");
[Figure: cross-validation predictions (Y-hat) against measured values (Y) for isobutyramide_normalized_fitness]
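A regression plot gives a visual impression of cross-validation quality; a single number such as the Pearson correlation between measured and predicted values makes it easier to compare runs. The sketch below computes it from two plain lists. The `y_true` and `y_pred` values are illustrative only and are not output from the demo models; with real results, they would be the `y` and `y_mu` fields used in the plot.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Illustrative values only, not results from the notebook.
y_true = [-0.5, -0.7, -1.1, -0.3, -0.9]
y_pred = [-0.6, -0.6, -1.0, -0.4, -0.8]
r = pearson_r(y_true, y_pred)
```

For ranking-oriented tasks a rank correlation such as Spearman's may be the more relevant summary; `scipy.stats.spearmanr` provides one if SciPy is installed.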

We can list the models associated with a training job or with an assay dataset. The two lists are identical here, but a single assay dataset can have multiple training jobs:

[19]:
train.list_models()
[19]:
[{'name': 'mymodel - acetamide_normalized_fitness',
  'description': '',
  'model_id': '70424c5f-2133-44d9-8e5c-40b1f8f22daa',
  'training_assaydata': '78134472-be5b-4041-8bf0-21ebb480a9d7',
  'job_id': '1def5433-70d1-42bd-b520-e58b196a273b',
  'created_date': '2023-10-27T03:33:37.657289',
  'model_type': 'EXACT_GP',
  'additional_metadata': {'input_dims': 13,
   'embedding_model': 'TorchLowRankSVD',
   'sequence_length': 346,
   'projection_layer': '78134472-be5b-4041-8bf0-21ebb480a9d7_pca.joblib',
   'measurement_names': ['acetamide_normalized_fitness'],
   'original_task_index': 1}},
 {'name': 'mymodel - isobutyramide_normalized_fitness',
  'description': '',
  'model_id': '1def5433-70d1-42bd-b520-e58b196a273b',
  'training_assaydata': '78134472-be5b-4041-8bf0-21ebb480a9d7',
  'job_id': '1def5433-70d1-42bd-b520-e58b196a273b',
  'created_date': '2023-10-27T03:33:37.033717',
  'model_type': 'EXACT_GP',
  'additional_metadata': {'input_dims': 13,
   'embedding_model': 'TorchLowRankSVD',
   'sequence_length': 346,
   'projection_layer': '78134472-be5b-4041-8bf0-21ebb480a9d7_pca.joblib',
   'measurement_names': ['isobutyramide_normalized_fitness'],
   'original_task_index': 0}}]
[20]:
assay.list_models()
[20]:
[{'name': 'mymodel - acetamide_normalized_fitness',
  'description': '',
  'model_id': '70424c5f-2133-44d9-8e5c-40b1f8f22daa',
  'training_assaydata': '78134472-be5b-4041-8bf0-21ebb480a9d7',
  'job_id': '1def5433-70d1-42bd-b520-e58b196a273b',
  'created_date': '2023-10-27T03:33:37.657289',
  'model_type': 'EXACT_GP',
  'additional_metadata': {'input_dims': 13,
   'embedding_model': 'TorchLowRankSVD',
   'sequence_length': 346,
   'projection_layer': '78134472-be5b-4041-8bf0-21ebb480a9d7_pca.joblib',
   'measurement_names': ['acetamide_normalized_fitness'],
   'original_task_index': 1}},
 {'name': 'mymodel - isobutyramide_normalized_fitness',
  'description': '',
  'model_id': '1def5433-70d1-42bd-b520-e58b196a273b',
  'training_assaydata': '78134472-be5b-4041-8bf0-21ebb480a9d7',
  'job_id': '1def5433-70d1-42bd-b520-e58b196a273b',
  'created_date': '2023-10-27T03:33:37.033717',
  'model_type': 'EXACT_GP',
  'additional_metadata': {'input_dims': 13,
   'embedding_model': 'TorchLowRankSVD',
   'sequence_length': 346,
   'projection_layer': '78134472-be5b-4041-8bf0-21ebb480a9d7_pca.joblib',
   'measurement_names': ['isobutyramide_normalized_fitness'],
   'original_task_index': 0}}]

Let’s take one of these models for further use:

[21]:
model_id = train.list_models()[0]['model_id']
train.list_models()[0]
[21]:
{'name': 'mymodel - acetamide_normalized_fitness',
 'description': '',
 'model_id': '70424c5f-2133-44d9-8e5c-40b1f8f22daa',
 'training_assaydata': '78134472-be5b-4041-8bf0-21ebb480a9d7',
 'job_id': '1def5433-70d1-42bd-b520-e58b196a273b',
 'created_date': '2023-10-27T03:33:37.657289',
 'model_type': 'EXACT_GP',
 'additional_metadata': {'input_dims': 13,
  'embedding_model': 'TorchLowRankSVD',
  'sequence_length': 346,
  'projection_layer': '78134472-be5b-4041-8bf0-21ebb480a9d7_pca.joblib',
  'measurement_names': ['acetamide_normalized_fitness'],
  'original_task_index': 1}}

Sequence design

We can set up a design job using our trained model as a criterion:

[22]:
from openprotein.models import DesignJobCreate, ModelCriterion, NMutationCriterion, Criterion
design_data = DesignJobCreate(
    assay_id=assay.id,
    criteria=[
        [
            ModelCriterion(
                criterion_type='model',
                model_id=model_id,
                measurement_name="acetamide_normalized_fitness",
                criterion=Criterion(target=-0.5, weight=1.0, direction="<")
            ),
        ],
        [NMutationCriterion(criterion_type="n_mutations")]
    ],
    mutation_positions=[2,13],
    num_steps=10
)


json.loads(design_data.json())


[22]:
{'model_config': {'protected_namespaces': []},
 'assay_id': '78134472-be5b-4041-8bf0-21ebb480a9d7',
 'criteria': [[{'model_config': {'protected_namespaces': []},
    'criterion_type': 'model',
    'model_id': '70424c5f-2133-44d9-8e5c-40b1f8f22daa',
    'measurement_name': 'acetamide_normalized_fitness',
    'criterion': {'model_config': {'protected_namespaces': []},
     'target': -0.5,
     'weight': 1.0,
     'direction': '<'}}],
  [{'model_config': {'protected_namespaces': []},
    'criterion_type': 'n_mutations'}]],
 'num_steps': 10,
 'pop_size': None,
 'n_offsprings': None,
 'crossover_prob': None,
 'crossover_prob_pointwise': None,
 'mutation_average_mutations_per_seq': None,
 'mutation_positions': [2, 13]}
[23]:
# create the design job
design_job = session.design.create_design_job(design_data)
design_id = design_job.id
design_job
[23]:
Job(model_config={'protected_namespaces': ()}, status=<JobStatus.PENDING: 'PENDING'>, job_id='da3e4296-671b-45b4-bcff-25f943d4587d', job_type='/workflow/design', created_date=datetime.datetime(2023, 10, 27, 3, 33, 56, 146691), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=None, num_records=None)
[24]:

#design_job = session.design.load_job(design_id)
[25]:
results = design_job.wait()
results[-3:]
[25]:
[DesignStep(model_config={'protected_namespaces': ()}, step=9, sample_index=2557, sequence='MRHGDIMSSNMTVGVAVVFPKMPRDRSGEWRLDNADKIKYMTAGMKRKQQQYQLVVEPIRVWQGQMWDYAEYTEQMCYIPGCETSIHSDACKKVNVWGVYSLMGEKHEEHIDKAQYYTCDLIDMDGTFADKYYKISPWYQIEQWYWGQQDYVSRDPVDMKIYLAFCDCHNYPEIWYDTAMKGAYCCVPCNGSMQPAKDDEQQMAKAMQWCHNCYVEVKNMASGSGVQSDFRVSASIIFDQRIVAETGCTEMCIQYAQLSLSDQRYARCEPQSINKQFTIQMRGYSGLQASGDGDRPLIACPRHFVRTYVGTRVIYRESVHGNIRSTTGSAQADVGAWEYKMYENDA', initial_scores=[0.0, 195.0], scores=[[DesignSubscore(model_config={'protected_namespaces': ()}, score=0.0, metadata=DesignMetadata(model_config={'protected_namespaces': ()}, y_mu=-1.052441954612732, y_var=0.008634247817099094))], [DesignSubscore(model_config={'protected_namespaces': ()}, score=195.0, metadata=DesignMetadata(model_config={'protected_namespaces': ()}, y_mu=None, y_var=None))]], umap1=0.8074265122413635, umap2=4.540599346160889),
 DesignStep(model_config={'protected_namespaces': ()}, step=9, sample_index=2558, sequence='RRHGDISSNWDTYGVRVVNYTCPRLGHWAEVLANAPNCPGQILGMRLMLRGATGGRCPMYSLMGIMLTCAERMLTAQACVSETVHDFSEACRVATVWGVFKAGTQRCEECGIKGPYNCLVLIPQNGEAQQCYRKILLPCPMEGDYAQTQTYDSANPKGFESSQNHCRDPNEPSEWRDCASFGAELIVRCQGYRYPAKQIWPMNPKNMRWANNWYTGVANAACRDPHESIFPHSMIRGFDGRTWGYQGWEECITQCFQESLQQILSCRANDQSQNETFKIVKRSWRVLQALKRGDRGLNELCFRFYRTWVNDCPKARENVGRLTRSSPGCAQWSVGGLNYWGLEHRA', initial_scores=[0.0, 196.0], scores=[[DesignSubscore(model_config={'protected_namespaces': ()}, score=0.0, metadata=DesignMetadata(model_config={'protected_namespaces': ()}, y_mu=-1.052441954612732, y_var=0.008634313941001892))], [DesignSubscore(model_config={'protected_namespaces': ()}, score=196.0, metadata=DesignMetadata(model_config={'protected_namespaces': ()}, y_mu=None, y_var=None))]], umap1=0.8681785464286804, umap2=4.545353889465332),
 DesignStep(model_config={'protected_namespaces': ()}, step=9, sample_index=2559, sequence='MRHGDISSSNDTVGVAVVNYKMPRLHTAAEVLDNARKLAEMIVGMKQGLPGMDLVVFPEYSLQGIMYDPAEMMETAVAIPGEETEIFSRACRKANVWGVFSLTGERHEEHPRKAPYNTLVLIDNNGEIVQKYRKIIPWCPIEGWYPGGQTYVSEGPKGMKISLIICDDGNYPEIWRDCAMKGAELIVRCQGYMYPAKDQQVMMAKAMAWANNCYVAVANAAGFDGVYSYFGHSAIIGFDGRTLGECGEEEMGIQYAQLSLSQIRDARANDQSQNHLFKILHRGYSGLQASGDGDRGLAECPFEFYRTWVTDAEKARENVERLTRSTTGVAQCPVGRLPYEGLEKEA', initial_scores=[0.0, 346.0], scores=[[DesignSubscore(model_config={'protected_namespaces': ()}, score=0.0, metadata=DesignMetadata(model_config={'protected_namespaces': ()}, y_mu=-1.022948145866394, y_var=0.008386557921767235))], [DesignSubscore(model_config={'protected_namespaces': ()}, score=346.0, metadata=DesignMetadata(model_config={'protected_namespaces': ()}, y_mu=None, y_var=None))]], umap1=0.20108361542224884, umap2=4.529971122741699)]

We can access the design results:

[26]:
results[-1].scores
[26]:
[[DesignSubscore(model_config={'protected_namespaces': ()}, score=0.0, metadata=DesignMetadata(model_config={'protected_namespaces': ()}, y_mu=-1.022948145866394, y_var=0.008386557921767235))],
 [DesignSubscore(model_config={'protected_namespaces': ()}, score=346.0, metadata=DesignMetadata(model_config={'protected_namespaces': ()}, y_mu=None, y_var=None))]]
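The nested `scores` structure is easier to compare across designs once it is flattened into one row per result. The sketch below does this with small stand-in dataclasses that mirror only the field names visible in the printed output (`step`, `sample_index`, `scores`, `score`, `metadata`, `y_mu`, `y_var`). The classes `Step`, `Subscore`, `Metadata` and the helpers `summarize` and `make_step` are hypothetical; the numeric values are copied from the three results printed above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Metadata:  # stand-in for the printed DesignMetadata fields
    y_mu: Optional[float]
    y_var: Optional[float]


@dataclass
class Subscore:  # stand-in for the printed DesignSubscore fields
    score: float
    metadata: Metadata


@dataclass
class Step:  # stand-in for a subset of the printed DesignStep fields
    step: int
    sample_index: int
    scores: List[List[Subscore]]


def summarize(steps):
    """Flatten each result into a dict, assuming two criteria groups per result."""
    rows = []
    for item in steps:
        model_sub = item.scores[0][0]     # first group: the model criterion here
        mutation_sub = item.scores[1][0]  # second group: the n_mutations criterion here
        rows.append({
            "step": item.step,
            "sample_index": item.sample_index,
            "y_mu": model_sub.metadata.y_mu,
            "y_var": model_sub.metadata.y_var,
            "n_mutations_score": mutation_sub.score,
        })
    return rows


def make_step(step, sample_index, y_mu, y_var, mutation_score):
    """Build a stand-in result with the same nesting as the printed output."""
    return Step(step, sample_index, [
        [Subscore(0.0, Metadata(y_mu, y_var))],
        [Subscore(mutation_score, Metadata(None, None))],
    ])


# Values copied from the three printed results; sequences are omitted.
demo_steps = [
    make_step(9, 2557, -1.052441954612732, 0.008634247817099094, 195.0),
    make_step(9, 2558, -1.052441954612732, 0.008634313941001892, 196.0),
    make_step(9, 2559, -1.022948145866394, 0.008386557921767235, 346.0),
]
summary = summarize(demo_steps)
lowest = min(summary, key=lambda row: row["n_mutations_score"])
```

Because the stand-ins use the same attribute names, `summarize` should also accept the real `results` list, provided the job used the same two criteria groups in the same order.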

Sequence predictions

We can also predict scores for new sequences using the models trained on our existing sequences:

[27]:
import random
def fakeseq(length=100, amino_acids="GAVLIMFWPSTCYNQDEKRH"):
    simulated_protein = ""
    for i in range(length):
        aa = random.choice(amino_acids)
        simulated_protein += aa
    return simulated_protein

# Create some random sequences to predict
random.seed(111)  # fakeseq draws from Python's random module, so seed that rather than NumPy
p_seqs = [fakeseq(assay.sequence_length) for i in range(3)]
p_seqs

[27]:
['CRMQLEAWVWPTPTIWQHTNTAAWFRRQIPPTRLKDYRSTIFKPFLYPKQDRFCQDLDGMPGNHYNGAVLTDVWFDQLAVMDVTNKPQYACHGDAETAVASIEKSITFWDMIEPDWHAMGPYGSNGYPDTLPRMVTDTPQEIAHCYVAYWYQANCCYIAMWYAKQHMVEWVWWMNEIPMHQKHEPIDHPSTLGAVYQMSKNMEERTMMKCNMKCPCRIGTQPCIGHDLRIIWYTGHRNPLYGHDIDQTYGKQMMRDESLDAMSKLVDQKKYLQIHLKCRRFFHFSVYILPHWPYAGMNKYQKQALDAYCLGYDLPNDKLKDVLDAYLSFFRQWHVIQYYIFHVKQP',
 'LPEDADIPAREQKNGPCWHRICTHCTGYPTKCWLVWAWPMFCFIRTTGAPCYPVHRHLPDNWHGEKLTQWRAQISWCHKIFDFEQCCMRVSVFSPGGWMGNEWAIANCWQMDLTGVQWMENYLPIHKLQLYNSDSFYEVRLPSSNDYICIWQYWPIFVNERPYKYQVKDRAHQASGFITVGKTCNGVQSNPDQTFQMWYCNYRGIKIYEQNEKWIDLDNSAKHMQGGPNRYEPRTQDTSDGLSDPDTFPQERPHGNDWYAEAKNYAKMTNFSRELDFKQPMLPADVMFRPVVIQCEQSRRYWCRGSCLQNVIPYCKYFYWFSHDWGNCPMLYMGMPFDLHFCDPAL',
 'YKALHMRKQEWCLRWDWEIETIWWGWNPHSGCYWTVCGDHYCDVPYICATDFSNSEEEYYCYFSEVSHEIPAFKKNVTQETDPKMQNNMRNYHCWPEGHQEQFARISNCVCKFHHMCVFISVWPMLADCPPGEKRICQATCMSCKKMKIMMGALGMIDYKTIATITQEEQAYNCPRHIHHMTDKVRHIMRMEKPTAYIHHHIFGWPPQAGWPWTCLLNWLIWISPKHPYVVIRPEYPDNCANNNLFDTEAWWVVFISFVTKKHELQQWVHRIHCHSQWSKWCIESISGRENKQGWGVEGLMMEQKMVLNLPAFCIKCSNNTSHGQTQSWKGSQITGDYQNIPWSGY']
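Because `fakeseq` draws from Python's `random` module, reproducibility depends on seeding that module rather than NumPy. One way to keep the seed local to the generator, without touching global state, is a dedicated `random.Random` instance. The sketch below is an alternative formulation with a hypothetical helper name, `random_sequences`; it uses the same amino-acid alphabet as the cell above.

```python
import random

AMINO_ACIDS = "GAVLIMFWPSTCYNQDEKRH"


def random_sequences(n, length, seed, alphabet=AMINO_ACIDS):
    """Return n random sequences of the given length, reproducible per seed."""
    rng = random.Random(seed)  # private generator; global random state is untouched
    return ["".join(rng.choices(alphabet, k=length)) for _ in range(n)]


first = random_sequences(3, 346, seed=111)
second = random_sequences(3, 346, seed=111)
```

Calling the function twice with the same seed yields identical lists, so a prediction job can be re-submitted later with exactly the same inputs.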
[28]:
pjob = session.predict.create_predict_job(sequences=p_seqs, train_job=train)
pjob_id = pjob.id
pjob

[28]:
Job(model_config={'protected_namespaces': ()}, status=<JobStatus.PENDING: 'PENDING'>, job_id='5c61f674-7a5b-434a-b86a-327de4e95004', job_type='/workflow/predict', created_date=None, start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, num_records=None)
[29]:
results = pjob.wait(verbose=True)
Waiting: 100%|██████████| 100/100 [00:35<00:00,  2.81it/s, status=SUCCESS]
[30]:
results[0].dict()
[30]:
{'model_config': {'protected_namespaces': ()},
 'sequence': 'CRMQLEAWVWPTPTIWQHTNTAAWFRRQIPPTRLKDYRSTIFKPFLYPKQDRFCQDLDGMPGNHYNGAVLTDVWFDQLAVMDVTNKPQYACHGDAETAVASIEKSITFWDMIEPDWHAMGPYGSNGYPDTLPRMVTDTPQEIAHCYVAYWYQANCCYIAMWYAKQHMVEWVWWMNEIPMHQKHEPIDHPSTLGAVYQMSKNMEERTMMKCNMKCPCRIGTQPCIGHDLRIIWYTGHRNPLYGHDIDQTYGKQMMRDESLDAMSKLVDQKKYLQIHLKCRRFFHFSVYILPHWPYAGMNKYQKQALDAYCLGYDLPNDKLKDVLDAYLSFFRQWHVIQYYIFHVKQP',
 'predictions': [{'model_config': {'protected_namespaces': ()},
   'model_id': '1def5433-70d1-42bd-b520-e58b196a273b',
   'model_name': 'mymodel - isobutyramide_normalized_fitness',
   'properties': {'isobutyramide_normalized_fitness': {'y_mu': -0.5799487829208374,
     'y_var': 0.07197286188602448}}},
  {'model_config': {'protected_namespaces': ()},
   'model_id': '70424c5f-2133-44d9-8e5c-40b1f8f22daa',
   'model_name': 'mymodel - acetamide_normalized_fitness',
   'properties': {'acetamide_normalized_fitness': {'y_mu': -1.052441954612732,
     'y_var': 0.008634313941001892}}}]}

We can also send a single sequence for single-site mutation analysis:

[31]:
sequence = assay.get_first().sequence[0]

sspredict = session.predict.create_predict_single_site(sequence, train)
[32]:
ssp_results = sspredict.wait(verbose=True)
ssp_results[0:3]
Waiting: 100%|██████████| 100/100 [04:31<00:00,  2.72s/it, status=SUCCESS]
[32]:
[SequencePrediction(model_config={'protected_namespaces': ()}, position=0, amino_acid='A', predictions=[Prediction(model_config={'protected_namespaces': ()}, model_id='1def5433-70d1-42bd-b520-e58b196a273b', model_name='mymodel - isobutyramide_normalized_fitness', properties={'isobutyramide_normalized_fitness': {'y_mu': -0.5117661356925964, 'y_var': 0.04643872752785683}}), Prediction(model_config={'protected_namespaces': ()}, model_id='70424c5f-2133-44d9-8e5c-40b1f8f22daa', model_name='mymodel - acetamide_normalized_fitness', properties={'acetamide_normalized_fitness': {'y_mu': -1.058506965637207, 'y_var': 0.0085744708776474}})]),
 SequencePrediction(model_config={'protected_namespaces': ()}, position=0, amino_acid='R', predictions=[Prediction(model_config={'protected_namespaces': ()}, model_id='1def5433-70d1-42bd-b520-e58b196a273b', model_name='mymodel - isobutyramide_normalized_fitness', properties={'isobutyramide_normalized_fitness': {'y_mu': -0.3882776200771332, 'y_var': 0.01813693717122078}}), Prediction(model_config={'protected_namespaces': ()}, model_id='70424c5f-2133-44d9-8e5c-40b1f8f22daa', model_name='mymodel - acetamide_normalized_fitness', properties={'acetamide_normalized_fitness': {'y_mu': -1.0603437423706055, 'y_var': 0.008427632972598076}})]),
 SequencePrediction(model_config={'protected_namespaces': ()}, position=0, amino_acid='N', predictions=[Prediction(model_config={'protected_namespaces': ()}, model_id='1def5433-70d1-42bd-b520-e58b196a273b', model_name='mymodel - isobutyramide_normalized_fitness', properties={'isobutyramide_normalized_fitness': {'y_mu': -0.396081805229187, 'y_var': 0.019559185951948166}}), Prediction(model_config={'protected_namespaces': ()}, model_id='70424c5f-2133-44d9-8e5c-40b1f8f22daa', model_name='mymodel - acetamide_normalized_fitness', properties={'acetamide_normalized_fitness': {'y_mu': -1.0605361461639404, 'y_var': 0.008434966206550598}})])]
[33]:
ssp_results[0].dict()
[33]:
{'model_config': {'protected_namespaces': ()},
 'position': 0,
 'amino_acid': 'A',
 'predictions': [{'model_config': {'protected_namespaces': ()},
   'model_id': '1def5433-70d1-42bd-b520-e58b196a273b',
   'model_name': 'mymodel - isobutyramide_normalized_fitness',
   'properties': {'isobutyramide_normalized_fitness': {'y_mu': -0.5117661356925964,
     'y_var': 0.04643872752785683}}},
  {'model_config': {'protected_namespaces': ()},
   'model_id': '70424c5f-2133-44d9-8e5c-40b1f8f22daa',
   'model_name': 'mymodel - acetamide_normalized_fitness',
   'properties': {'acetamide_normalized_fitness': {'y_mu': -1.058506965637207,
     'y_var': 0.0085744708776474}}}]}
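Each result above is keyed by a `position` and an `amino_acid`, which corresponds to one substitution of the input sequence. The sketch below enumerates such substitutions locally to make the size of the scan concrete. It illustrates the concept only and is not how the backend generates variants: `single_site_variants` is a hypothetical helper, the alphabet order beyond the first five letters seen in the output (A, R, N, D, C) is an assumption, and whether the service also scores the wild-type residue at each position is not stated here.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # order assumed beyond A, R, N, D, C


def single_site_variants(sequence, alphabet=AMINO_ACIDS):
    """Yield (position, amino_acid, variant) for every single substitution.

    The wild-type residue is included, so each position yields
    len(alphabet) entries.
    """
    for position in range(len(sequence)):
        for amino_acid in alphabet:
            variant = sequence[:position] + amino_acid + sequence[position + 1:]
            yield position, amino_acid, variant


# A 4-residue toy sequence keeps the demo small: 4 positions x 20 amino acids.
variants = list(single_site_variants("MRHG"))
```

For the 346-residue sequence used in this notebook, the same enumeration would produce 346 × 20 = 6920 entries, which gives a sense of why the single-site job takes longer than the three-sequence prediction job.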
[34]:
preds = pd.DataFrame([i.dict() for i in ssp_results])
preds['acetamide_normalized_fitness'] = [i[1]['properties']['acetamide_normalized_fitness']['y_mu'] for i in preds.predictions]
preds.head()
[34]:
model_config position amino_acid predictions acetamide_normalized_fitness
0 {'protected_namespaces': ()} 0 A [{'model_config': {'protected_namespaces': ()}... -1.058507
1 {'protected_namespaces': ()} 0 R [{'model_config': {'protected_namespaces': ()}... -1.060344
2 {'protected_namespaces': ()} 0 N [{'model_config': {'protected_namespaces': ()}... -1.060536
3 {'protected_namespaces': ()} 0 D [{'model_config': {'protected_namespaces': ()}... -1.064618
4 {'protected_namespaces': ()} 0 C [{'model_config': {'protected_namespaces': ()}... -1.067208
[35]:

df_pivot = preds.pivot(columns='position', index='amino_acid', values='acetamide_normalized_fitness')

# Create heatmap: positions run along the x-axis, amino acids along the y-axis
plt.figure(figsize=(14, 5))
sns.heatmap(df_pivot, cmap='coolwarm', annot=False, fmt=".2f")
plt.title('Acetamide Normalized Fitness Heatmap')
plt.xlabel('Position')
plt.ylabel('Amino Acid')
plt.show()
[Figure: heatmap of predicted acetamide_normalized_fitness by position and amino acid]
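Beyond the heatmap, a common follow-up is to pick the top-scoring substitution at each position. The sketch below does this over flat records with `position`, `amino_acid`, and `y_mu` keys. `best_per_position` is a hypothetical helper, and whether higher or lower fitness is desirable depends on the assay, so the direction is a parameter. The demo rows reuse the five position-0 values shown in the table above.

```python
def best_per_position(records, higher_is_better=True):
    """Return {position: record} keeping the best-scoring record per position."""
    best = {}
    for record in records:
        current = best.get(record["position"])
        if current is None:
            best[record["position"]] = record
        elif higher_is_better and record["y_mu"] > current["y_mu"]:
            best[record["position"]] = record
        elif not higher_is_better and record["y_mu"] < current["y_mu"]:
            best[record["position"]] = record
    return best


# Position-0 predictions for acetamide_normalized_fitness, as printed in the table.
records = [
    {"position": 0, "amino_acid": "A", "y_mu": -1.058507},
    {"position": 0, "amino_acid": "R", "y_mu": -1.060344},
    {"position": 0, "amino_acid": "N", "y_mu": -1.060536},
    {"position": 0, "amino_acid": "D", "y_mu": -1.064618},
    {"position": 0, "amino_acid": "C", "y_mu": -1.067208},
]
highest = best_per_position(records)
lowest = best_per_position(records, higher_is_better=False)
```

Among these five rows, the highest predicted mean at position 0 belongs to `A` and the lowest to `C`. The differences are small relative to the predicted variances reported earlier, so such rankings should be read with the uncertainty in mind.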

Resume workflows

Lastly, it’s possible to resume from where you left off using the job ID:

[36]:
train = session.train.load_job(train_id)
train
[36]:
Jobplus(model_config={'protected_namespaces': ()}, status=<JobStatus.SUCCESS: 'SUCCESS'>, job_id='1def5433-70d1-42bd-b520-e58b196a273b', job_type='/workflow/train', created_date=datetime.datetime(2023, 10, 27, 3, 26, 22, 575393), start_date=datetime.datetime(2023, 10, 27, 3, 33, 6, 881307), end_date=datetime.datetime(2023, 10, 27, 3, 33, 37, 832716), prerequisite_job_id='696689f4-bf76-4cd3-b565-5d0b031e78e2', progress_message=None, progress_counter=None, num_records=None, sequence_length=346)

This reloaded job can be used as above for predict or design tasks, and those jobs can be reloaded too:

[37]:
pjob = session.predict.load_job(pjob_id)
pjob
[37]:
Job(model_config={'protected_namespaces': ()}, status=<JobStatus.SUCCESS: 'SUCCESS'>, job_id='5c61f674-7a5b-434a-b86a-327de4e95004', job_type='/workflow/predict', created_date=datetime.datetime(2023, 10, 27, 3, 46, 44, 300406), start_date=datetime.datetime(2023, 10, 27, 3, 46, 53, 985978), end_date=datetime.datetime(2023, 10, 27, 3, 47, 17, 961787), prerequisite_job_id='1def5433-70d1-42bd-b520-e58b196a273b', progress_message=None, progress_counter=None, num_records=None)
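Resuming in a new session requires the job IDs to outlive the notebook kernel. A minimal sketch of one way to do that is to write them to a small JSON file and read them back later. The helper names `save_job_ids` and `load_job_ids` and the file name are hypothetical; the IDs are the ones printed in this notebook.

```python
import json
import tempfile
from pathlib import Path


def save_job_ids(path, **job_ids):
    """Write the given name -> job ID pairs to a JSON file."""
    Path(path).write_text(json.dumps(job_ids, indent=2))


def load_job_ids(path):
    """Read the name -> job ID mapping back from the JSON file."""
    return json.loads(Path(path).read_text())


# Demo in a temporary directory; a real workflow would use a persistent path.
with tempfile.TemporaryDirectory() as tmp:
    ids_path = Path(tmp) / "job_ids.json"
    save_job_ids(
        ids_path,
        assay_id="78134472-be5b-4041-8bf0-21ebb480a9d7",
        train_id="1def5433-70d1-42bd-b520-e58b196a273b",
        design_id="da3e4296-671b-45b4-bcff-25f943d4587d",
        pjob_id="5c61f674-7a5b-434a-b86a-327de4e95004",
    )
    restored = load_job_ids(ids_path)
```

In a later session, `restored["train_id"]` can be passed to `session.train.load_job` and `restored["pjob_id"]` to `session.predict.load_job`, as in the cells above.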