Using single site analysis
This tutorial shows you how to run a single site analysis with a trained model. The analysis enumerates every single-substitution variant of a selected sequence and predicts, for each variant, the property the model was trained on. The Predict module can score individual sequences as well as the single-mutant variants of a sequence.
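To make "all single mutants" concrete, here is a minimal sketch in plain Python. It is not the platform's implementation: the function name `enumerate_single_mutants`, the `Variant` tuple, the residue ordering, and the choice to skip the wild-type residue at each position are all assumptions made for illustration.

```python
from typing import Iterator, NamedTuple

# The 20 standard amino acids. The ordering is an assumption for this sketch.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


class Variant(NamedTuple):
    label: str     # e.g. "M1A": wild-type residue, 1-based position, new residue
    sequence: str  # full variant sequence


def enumerate_single_mutants(
    sequence: str, alphabet: str = AMINO_ACIDS
) -> Iterator[Variant]:
    """Yield every variant that differs from `sequence` at exactly one position."""
    for position, wild_type in enumerate(sequence):
        for residue in alphabet:
            if residue == wild_type:
                continue  # skip the unmutated residue (an assumption of this sketch)
            mutated = sequence[:position] + residue + sequence[position + 1:]
            yield Variant(f"{wild_type}{position + 1}{residue}", mutated)


# A toy three-residue sequence, used only for illustration.
variants = list(enumerate_single_mutants("MKT"))
print(len(variants))  # 3 positions x 19 substitutions = 57
print(variants[0])
```

A sequence of length L yields 19 × L variants under these assumptions, which is why a single site job on a sequence of a few hundred residues scores several thousand sequences.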
What you need before getting started
This workflow requires experimental data. If you don’t have experimental data, get started with our PoET single site analysis tool, which uses evolutionary data from a multiple sequence alignment instead. See Creating a multiple sequence alignment for more information.
To perform a single site analysis with the Predict module, you need a trained model. To upload a dataset and train a model, see Uploading data and Training models.
Predicting your sequences
Create a new Predict job for single site mutation analysis using your trained model:
[ ]:
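# Use the first sequence in the uploaded assay dataset as the base sequence.
sequence = assay.get_first().sequence[0]
# Submit a single site prediction job with the trained model.
sspredict = predictor.single_site(sequence)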
[ ]:
sspredict
PredictSingleSiteJob(job_id='d0e7b585-a98c-45ea-8d27-1d81d8aad64e', job_type=<JobType.predictor_predict_single_site: '/predictor/predict_single_site'>, status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2024, 12, 3, 6, 5, 22, 897647, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
Wait for the job to finish and retrieve the results:
[ ]:
ssp_results = sspredict.wait(verbose=True)
Waiting: 100%|██████████| 100/100 [09:44<00:00, 5.84s/it, status=SUCCESS]
[ ]:
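from collections import namedtuple
Score = namedtuple("Score", ["sequence", "mean", "var"])
# ssp_results[0] holds the predicted means and ssp_results[1] the variances; .T[0] selects the first column.
means, variances = ssp_results[0].T[0], ssp_results[1].T[0]
# Show the first five variants with their predicted mean and variance.
[Score(s, mu.item(), var.item()) for s, mu, var in zip(sspredict.sequences[:5], means[:5], variances[:5])]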
[Score(sequence=b'ARHGDISSSNDTVGVAVVNYKMPRLHTAAEVLDNARKIAEMIVGMKQGLPGMDLVVFPEYSLQGIMYDPAEMMETAVAIPGEETEIFSRACRKANVWGVFSLTGERHEEHPRKAPYNTLVLIDNNGEIVQKYRKIIPWCPIEGWYPGGQTYVSEGPKGMKISLIICDDGNYPEIWRDCAMKGAELIVRCQGYMYPAKDQQVMMAKAMAWANNCYVAVANAAGFDGVYSYFGHSAIIGFDGRTLGECGEEEMGIQYAQLSLSQIRDARANDQSQNHLFKILHRGYSGLQASGDGDRGLAECPFEFYRTWVTDAEKARENVERLTRSTTGVAQCPVGRLPYEGLEKEA', mean=-0.5376995205879211, var=0.0051577556878328),
Score(sequence=b'RRHGDISSSNDTVGVAVVNYKMPRLHTAAEVLDNARKIAEMIVGMKQGLPGMDLVVFPEYSLQGIMYDPAEMMETAVAIPGEETEIFSRACRKANVWGVFSLTGERHEEHPRKAPYNTLVLIDNNGEIVQKYRKIIPWCPIEGWYPGGQTYVSEGPKGMKISLIICDDGNYPEIWRDCAMKGAELIVRCQGYMYPAKDQQVMMAKAMAWANNCYVAVANAAGFDGVYSYFGHSAIIGFDGRTLGECGEEEMGIQYAQLSLSQIRDARANDQSQNHLFKILHRGYSGLQASGDGDRGLAECPFEFYRTWVTDAEKARENVERLTRSTTGVAQCPVGRLPYEGLEKEA', mean=-0.6088411808013916, var=0.0051713902503252),
Score(sequence=b'NRHGDISSSNDTVGVAVVNYKMPRLHTAAEVLDNARKIAEMIVGMKQGLPGMDLVVFPEYSLQGIMYDPAEMMETAVAIPGEETEIFSRACRKANVWGVFSLTGERHEEHPRKAPYNTLVLIDNNGEIVQKYRKIIPWCPIEGWYPGGQTYVSEGPKGMKISLIICDDGNYPEIWRDCAMKGAELIVRCQGYMYPAKDQQVMMAKAMAWANNCYVAVANAAGFDGVYSYFGHSAIIGFDGRTLGECGEEEMGIQYAQLSLSQIRDARANDQSQNHLFKILHRGYSGLQASGDGDRGLAECPFEFYRTWVTDAEKARENVERLTRSTTGVAQCPVGRLPYEGLEKEA', mean=-0.5321740508079529, var=0.0051800329238176),
Score(sequence=b'DRHGDISSSNDTVGVAVVNYKMPRLHTAAEVLDNARKIAEMIVGMKQGLPGMDLVVFPEYSLQGIMYDPAEMMETAVAIPGEETEIFSRACRKANVWGVFSLTGERHEEHPRKAPYNTLVLIDNNGEIVQKYRKIIPWCPIEGWYPGGQTYVSEGPKGMKISLIICDDGNYPEIWRDCAMKGAELIVRCQGYMYPAKDQQVMMAKAMAWANNCYVAVANAAGFDGVYSYFGHSAIIGFDGRTLGECGEEEMGIQYAQLSLSQIRDARANDQSQNHLFKILHRGYSGLQASGDGDRGLAECPFEFYRTWVTDAEKARENVERLTRSTTGVAQCPVGRLPYEGLEKEA', mean=-0.5016323924064636, var=0.0051677394658327),
Score(sequence=b'CRHGDISSSNDTVGVAVVNYKMPRLHTAAEVLDNARKIAEMIVGMKQGLPGMDLVVFPEYSLQGIMYDPAEMMETAVAIPGEETEIFSRACRKANVWGVFSLTGERHEEHPRKAPYNTLVLIDNNGEIVQKYRKIIPWCPIEGWYPGGQTYVSEGPKGMKISLIICDDGNYPEIWRDCAMKGAELIVRCQGYMYPAKDQQVMMAKAMAWANNCYVAVANAAGFDGVYSYFGHSAIIGFDGRTLGECGEEEMGIQYAQLSLSQIRDARANDQSQNHLFKILHRGYSGLQASGDGDRGLAECPFEFYRTWVTDAEKARENVERLTRSTTGVAQCPVGRLPYEGLEKEA', mean=-0.5288747549057007, var=0.0051603335887193)]
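The raw output lists full variant sequences, which are hard to compare by eye. The sketch below labels each variant by its substitution (for example `M1A`) and ranks variants by predicted mean. The helper names `label_mutation` and `rank_variants`, the toy wild-type sequence, and the toy scores are hypothetical and not part of the Predictor API. Whether a higher mean is better depends on the property your model was trained on.

```python
from typing import Iterable, List, NamedTuple, Tuple, Union


class RankedVariant(NamedTuple):
    mutation: str  # e.g. "M1A", or "WT" if the sequence is unchanged
    mean: float    # predicted mean
    std: float     # predicted standard deviation (square root of the variance)


def label_mutation(wildtype: str, variant: Union[str, bytes]) -> str:
    """Describe a single-substitution variant relative to the wild type."""
    if isinstance(variant, bytes):
        variant = variant.decode()  # sequences may be returned as bytes
    if len(variant) != len(wildtype):
        raise ValueError("variant and wild type must have the same length")
    diffs = [
        (i, w, v) for i, (w, v) in enumerate(zip(wildtype, variant)) if w != v
    ]
    if not diffs:
        return "WT"
    if len(diffs) > 1:
        raise ValueError("variant differs from the wild type at more than one site")
    position, wt_residue, new_residue = diffs[0]
    return f"{wt_residue}{position + 1}{new_residue}"


def rank_variants(
    wildtype: str,
    scores: Iterable[Tuple[Union[str, bytes], float, float]],
    top_k: int = 3,
) -> List[RankedVariant]:
    """Return the top_k variants sorted by predicted mean, highest first."""
    ranked = [
        RankedVariant(label_mutation(wildtype, seq), mean, var ** 0.5)
        for seq, mean, var in scores
    ]
    return sorted(ranked, key=lambda r: r.mean, reverse=True)[:top_k]


# Toy data for illustration only: (sequence, mean, variance) triples.
toy_wildtype = "MRH"
toy_scores = [
    (b"ARH", -0.54, 0.0049),
    (b"RRH", -0.61, 0.0049),
    (b"DRH", -0.50, 0.0049),
    (b"MRH", -0.40, 0.0100),
]
top = rank_variants(toy_wildtype, toy_scores)
for row in top:
    print(row)
```

The same functions accept the `Score` tuples shown above, since each one unpacks into a sequence, a mean, and a variance. Reporting the standard deviation rather than the variance keeps the uncertainty in the same units as the predicted mean.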
Next steps
Our Predictor API page contains more information about single site analysis using your trained model.
Once you’re finished evaluating single substitution variants, use Structure prediction to visualize your sequence of interest.