Using AlphaFold2

This tutorial shows you how to use the AlphaFold2 model to predict the structure of your protein sequence of interest. We recommend using AlphaFold2 with multi-chain sequences. If you have a single-chain sequence, please visit Using ESMFold. If you have ligands or DNA/RNA of interest, please try Using Boltz instead.

What you need before getting started

Specify a sequence of interest whose structure you want to predict. The example used here is interleukin 2:

[1]:
import openprotein

# Login to your session
session = openprotein.connect()

# Specify your sequence
sequence = "MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGMYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSEP"

Creating an MSA

AlphaFold2 requires evolutionary context from a multiple sequence alignment (MSA) to make structure predictions. This section demonstrates how to create an MSA based on the sequence you wish to fold.

Start by getting the AlphaFold2 model object:

[2]:
afmodel = session.fold.get_model('alphafold2')
afmodel.fold?
Signature:
afmodel.fold(
    proteins: list[openprotein.protein.Protein] | openprotein.align.msa.MSAFuture | None = None,
    num_recycles: int | None = None,
    num_models: int = 1,
    num_relax: int = 0,
    **kwargs,
) -> openprotein.fold.future.FoldComplexResultFuture
Docstring:
Post sequences to alphafold model.

Parameters
----------
proteins : List[Protein] | MSAFuture
    List of protein sequences to fold. `Protein` objects must be tagged with an `msa`. Alternatively, supply an `MSAFuture` to use all query sequences as a multimer.
num_recycles : int
    number of times to recycle models
num_models : int
    number of models to train - best model will be used
num_relax : int
    maximum number of iterations for relax

Returns
-------
job : Job
File:      ~/Projects/openprotein/openprotein-python-private/openprotein/fold/alphafold2.py
Type:      method

You can review the metadata for the AlphaFold2 model. Note that input_tokens is None because the model accepts an MSA rather than raw sequences.

[3]:
afmodel.metadata
[3]:
ModelMetadata(id='alphafold2', description=ModelDescription(citation_title='Highly accurate protein structure prediction with AlphaFold.', doi='10.1038/s41586-021-03819-2', summary='alphafold2 model.'), max_sequence_length=2400, dimension=-1, output_types=['fold'], input_tokens=None, output_tokens=None, token_descriptions=[[TokenInfo(id=0, token='A', primary=True, description='Alanine')], [TokenInfo(id=1, token='R', primary=True, description='Arginine')], [TokenInfo(id=2, token='N', primary=True, description='Asparagine')], [TokenInfo(id=3, token='D', primary=True, description='Aspartic acid')], [TokenInfo(id=4, token='C', primary=True, description='Cysteine')], [TokenInfo(id=5, token='Q', primary=True, description='Glutamine')], [TokenInfo(id=6, token='E', primary=True, description='Glutamic acid')], [TokenInfo(id=7, token='G', primary=True, description='Glycine')], [TokenInfo(id=8, token='H', primary=True, description='Histidine')], [TokenInfo(id=9, token='I', primary=True, description='Isoleucine')], [TokenInfo(id=10, token='L', primary=True, description='Leucine')], [TokenInfo(id=11, token='K', primary=True, description='Lysine')], [TokenInfo(id=12, token='M', primary=True, description='Methionine')], [TokenInfo(id=13, token='F', primary=True, description='Phenylalanine')], [TokenInfo(id=14, token='P', primary=True, description='Proline')], [TokenInfo(id=15, token='S', primary=True, description='Serine')], [TokenInfo(id=16, token='T', primary=True, description='Threonine')], [TokenInfo(id=17, token='W', primary=True, description='Tryptophan')], [TokenInfo(id=18, token='Y', primary=True, description='Tyrosine')], [TokenInfo(id=19, token='V', primary=True, description='Valine')], [TokenInfo(id=20, token=':', primary=False, description='Chain token, used for polymers')]])
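
The metadata above lists the 20 standard amino acids plus the ':' chain token, and reports max_sequence_length=2400. As a rough illustration, a local pre-check against those values might look like the sketch below. The helper name `check_sequence` is hypothetical and not part of the openprotein client, and whether the length limit counts chain separators is an assumption.

```python
# Tokens listed in the metadata above: 20 standard amino acids plus ':'.
AMINO_ACIDS = set("ARNDCQEGHILKMFPSTWYV")
CHAIN_TOKEN = ":"
MAX_SEQUENCE_LENGTH = 2400  # max_sequence_length reported in the metadata


def check_sequence(sequence: str) -> list[str]:
    """Return a list of problems found in `sequence` (empty if none).

    Hypothetical helper for illustration; not part of the openprotein client.
    """
    problems = []
    chains = sequence.split(CHAIN_TOKEN)
    if any(len(chain) == 0 for chain in chains):
        problems.append("empty chain (leading, trailing, or doubled ':')")
    unknown = sorted(set(sequence) - AMINO_ACIDS - {CHAIN_TOKEN})
    if unknown:
        problems.append(f"unsupported tokens: {''.join(unknown)}")
    # Assumption: the limit applies to residues only, not chain separators.
    n_residues = sum(len(chain) for chain in chains)
    if n_residues > MAX_SEQUENCE_LENGTH:
        problems.append(f"{n_residues} residues exceeds {MAX_SEQUENCE_LENGTH}")
    return problems


print(check_sequence("MYRMQ:LLSCI"))   # two well-formed chains
print(check_sequence("MYRMQ:XLSCI:"))  # unknown token and trailing ':'
```

Running a check like this before creating the MSA can catch typos early, since the alignment and folding jobs take much longer than a local string scan.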

Use your seed sequence to create an MSA:

[4]:
msa = session.align.create_msa(sequence.encode())
print(msa)
job_id='df4da7b0-55ac-4db7-8cca-a7a52d5911bc' job_type=<JobType.align_align: '/align/align'> status=<JobStatus.SUCCESS: 'SUCCESS'> created_date=datetime.datetime(2025, 8, 21, 7, 36, 6, 317723) start_date=None end_date=datetime.datetime(2025, 8, 21, 7, 36, 6, 317880) prerequisite_job_id=None progress_message=None progress_counter=None sequence_length=None

Examine the outputs once the MSA is complete:

[5]:
msa.wait_until_done(verbose=True)

print(list(msa.get())[0:3])
Waiting: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 381.79it/s, status=SUCCESS]
[('101', 'MYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGMYRMQLLSCIALSLALVTNSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVLELKGSEP'), ('UniRef100_G1RE34\t243\t0.764\t2.142E-68\t0\t138\t239\t0\t152\t153', 'MYRMQLLSCIALSLALVTNGAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRPRDLISNINVIVQELKGSETTFMCEyadetativeflnrWITFCQSIISTLT----------------------------------------------------------------------------------------------------'), ('UniRef100_A0A2K5MA48\t234\t0.753\t1.582E-65\t0\t138\t239\t0\t153\t154', 'MYRMQLLSCIALSLALVANSAPTSSSTKKTQLQLEHLLLDLQMILNGINNYKNPKLTRMLTFKFYMPKKATELKHLQCLEEELKPLEEVLNLAQSKNFHLRdTKDLISNINVIVLELKGSETTLMCEyadetativeflnrWITFCQSIISTLT----------------------------------------------------------------------------------------------------')]
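
The rows above are (name, sequence) pairs in which hits contain '-' gaps and some lowercase residues. Assuming they follow the A3M convention, where lowercase letters mark insertions relative to the query, a minimal sketch for estimating each hit's identity to the query could look like this. The function names and the toy rows are illustrative only and are not part of the openprotein client.

```python
def strip_insertions(aligned: str) -> str:
    """Drop lowercase characters (assumed to be A3M insertions)."""
    return "".join(ch for ch in aligned if not ch.islower())


def identity_to_query(query: str, aligned: str) -> float:
    """Fraction of query positions matched by an identical residue in the hit."""
    columns = strip_insertions(aligned)
    if len(columns) != len(query):
        raise ValueError("row does not align to the query length")
    # Gap characters never equal a residue, so they count as mismatches.
    matches = sum(q == h for q, h in zip(query, columns))
    return matches / len(query)


# Toy rows in the same (name, sequence) shape as the MSA output above.
rows = [
    ("query", "MYRMQLLSCI"),
    ("hit_1", "MYRMQLLSCI"),
    ("hit_2", "MYRMQLaaLSCV"),  # two inserted residues, one substitution
    ("hit_3", "MYRM------"),    # covers only the first four positions
]
query = rows[0][1]
for name, aligned in rows[1:]:
    print(name, round(identity_to_query(query, aligned), 2))
```

With a real alignment, the same functions could be applied to `list(msa.get())`, taking the first row as the query.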

Predicting your sequence

Call the AlphaFold2 model by sending the MSA to the fold endpoint. The call returns a fold job that you can await:

[6]:
fold = afmodel.fold(msa, num_models=1)

fold
[6]:
FoldJob(num_records=1, job_id='4e50f2d1-f921-46ac-8d23-cdcebba3ebbb', job_type=<JobType.embeddings_fold: '/embeddings/fold'>, status=<JobStatus.PENDING: 'PENDING'>, created_date=datetime.datetime(2025, 8, 21, 7, 36, 8, 793708, tzinfo=TzInfo(UTC)), start_date=None, end_date=None, prerequisite_job_id=None, progress_message=None, progress_counter=0, sequence_length=None)
[7]:
fold.wait_until_done(verbose=True, timeout=900)
Waiting: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:52<00:00,  1.91it/s, status=SUCCESS]
[7]:
True

Alternatively, wait for the job to complete and fetch the results in a single call with wait():

[8]:
result = fold.wait(verbose=True)
print("\n".join(result.decode().splitlines()[100:110]))
Waiting: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 517.18it/s, status=SUCCESS]
ATOM 80   C CB  . ILE A ? 10  ? -22.625 -1.933  1.770   1.0 58.75 10  A 1
ATOM 81   O O   . ILE A ? 10  ? -21.391 -1.501  -1.284  1.0 58.75 10  A 1
ATOM 82   C CG1 . ILE A ? 10  ? -22.359 -2.760  3.031   1.0 58.75 10  A 1
ATOM 83   C CG2 . ILE A ? 10  ? -23.484 -2.715  0.774   1.0 58.75 10  A 1
ATOM 84   C CD1 . ILE A ? 10  ? -23.609 -3.113  3.818   1.0 58.75 10  A 1
ATOM 85   N N   . ALA A ? 11  ? -21.844 0.433   -0.271  1.0 55.09 11  A 1
ATOM 86   C CA  . ALA A ? 11  ? -22.062 1.217   -1.481  1.0 55.09 11  A 1
ATOM 87   C C   . ALA A ? 11  ? -20.781 1.376   -2.287  1.0 55.09 11  A 1
ATOM 88   C CB  . ALA A ? 11  ? -22.641 2.586   -1.127  1.0 55.09 11  A 1
ATOM 89   O O   . ALA A ? 11  ? -20.797 1.257   -3.514  1.0 55.09 11  A 1
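
The atom records above carry one value per residue in the column after the occupancy (58.75 for ILE 10, 55.09 for ALA 11). AlphaFold-style outputs conventionally store the per-residue confidence (pLDDT) in the B-factor field, although whether this result does so is an assumption here. The sketch below reads column names from the `_atom_site` loop and averages that field per residue. The helper name is hypothetical, and the loop header in the sample is an assumed standard mmCIF layout, since the header is not shown in the output above.

```python
from collections import defaultdict


def per_residue_b_factor(cif_text: str) -> dict[tuple[str, int], float]:
    """Map (chain, residue number) to the mean B_iso_or_equiv of its atoms."""
    columns: list[str] = []
    values = defaultdict(list)
    for line in cif_text.splitlines():
        line = line.strip()
        if line.startswith("_atom_site."):
            # Collect column names in the order the loop declares them.
            columns.append(line.split(".", 1)[1])
        elif line.startswith("ATOM") and columns:
            # Whitespace splitting is enough for unquoted protein records.
            record = dict(zip(columns, line.split()))
            key = (record["label_asym_id"], int(record["label_seq_id"]))
            values[key].append(float(record["B_iso_or_equiv"]))
    return {key: sum(v) / len(v) for key, v in values.items()}


# Assumed header, followed by three records copied from the output above.
sample = """loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_alt_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_entity_id
_atom_site.label_seq_id
_atom_site.pdbx_PDB_ins_code
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
_atom_site.occupancy
_atom_site.B_iso_or_equiv
_atom_site.auth_seq_id
_atom_site.auth_asym_id
_atom_site.pdbx_PDB_model_num
ATOM 84   C CD1 . ILE A ? 10  ? -23.609 -3.113  3.818   1.0 58.75 10  A 1
ATOM 85   N N   . ALA A ? 11  ? -21.844 0.433   -0.271  1.0 55.09 11  A 1
ATOM 86   C CA  . ALA A ? 11  ? -22.062 1.217   -1.481  1.0 55.09 11  A 1
"""
scores = per_residue_b_factor(sample)
print(scores)
```

For the real prediction, the same function could be called as `per_residue_b_factor(result.decode())`. A dedicated mmCIF parser would be more robust for files that contain quoted values.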

Visualize the structure using molviewspec:

[9]:
%pip install molviewspec
Collecting molviewspec
  Downloading molviewspec-1.6.0-py3-none-any.whl.metadata (10 kB)
Requirement already satisfied: pydantic<3,>=1 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from molviewspec) (2.11.4)
Requirement already satisfied: annotated-types>=0.6.0 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from pydantic<3,>=1->molviewspec) (0.7.0)
Requirement already satisfied: pydantic-core==2.33.2 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from pydantic<3,>=1->molviewspec) (2.33.2)
Requirement already satisfied: typing-extensions>=4.12.2 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from pydantic<3,>=1->molviewspec) (4.13.2)
Requirement already satisfied: typing-inspection>=0.4.0 in /home/jmage/Projects/openprotein/openprotein-python-private/.pixi/envs/dev/lib/python3.12/site-packages (from pydantic<3,>=1->molviewspec) (0.4.0)
Downloading molviewspec-1.6.0-py3-none-any.whl (31 kB)
Installing collected packages: molviewspec
Successfully installed molviewspec-1.6.0
Note: you may need to restart the kernel to use updated packages.
[10]:
from molviewspec import create_builder

# Build a view that loads the structure from an in-notebook data entry
builder = create_builder()
structure = (
    builder.download(url="mystructure.cif")
    .parse(format="mmcif")
    .model_structure()
    .component()
    .representation()
    .color(color="blue")
)
# Map the URL used above to the predicted structure bytes
builder.molstar_notebook(data={'mystructure.cif': result}, width=500, height=400)

Next steps

Compare the predicted structure with your query structure, try another structure predictor such as Boltz, or save the structure for future use:

[11]:
with open("my_structure.cif", "wb") as f:
    f.write(result)