Creating a multiple sequence alignment

Multiple sequence alignment (MSA) is a technique in biological sequence analysis for inferring sequence homology and for phylogenetic analysis of the sequences’ shared evolutionary origins. You can create an MSA from a seed sequence or upload a ready-made file; this tutorial covers the workflow for both options.

What you need before getting started

You need either a seed sequence or an existing MSA formatted as a .fa, .fasta, or .csv file.
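Before uploading an existing file, it can help to confirm that it parses cleanly and that every aligned row has the same length, as MSA rows conventionally do. The sketch below is a minimal, dependency-free parser written for illustration: parse_fasta and check_equal_lengths are hypothetical helpers, not part of the platform's client, and this tutorial does not say whether the upload endpoint itself rejects ragged rows.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (name, sequence) pairs; a sequence may span several lines."""
    records: list[tuple[str, str]] = []
    name = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def check_equal_lengths(records: list[tuple[str, str]]) -> int:
    """Return the shared row length, or raise ValueError if rows are ragged or missing."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError(f"expected one row length, found {sorted(lengths)}")
    return lengths.pop()


# Illustrative input only; in practice read your file, e.g. open("my_msa.fasta").read()
example = ">seq_a\nMKV-LAG\n>seq_b\nMKVE\nLSG\n"
records = parse_fasta(example)
print(records)
print(check_equal_lengths(records))
```

The second record deliberately wraps across two lines to show that multi-line FASTA entries are joined before the length check.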

Creating an MSA from a seed sequence

Initiate the seed workflow by specifying your seed sequence. This example uses human alpha-synuclein:

seed = "MDVFMKGLSKAKEGVVAAAEKTKQGVAEAAGKTKEGVLYVGSKTKEGVVHGVATVAEKTKEQVTNVGGAVVTGVTAVAQKTVEGAGSIAAATGFVKKDQLGKNEEGAPQEGILEDMPVDPDNEAYEMPSEEGYQDYEPEA"
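A quick local check can catch typos in a seed before a job is submitted. This sketch is an illustrative assumption rather than a platform requirement: it tests that the seed uses only the 20 standard one-letter amino-acid codes. The tutorial does not say how the service treats ambiguous codes such as X, so relax the check if your sequences use them. It also shows that seed.encode() yields the bytes object that create_msa receives in the next step.

```python
# The 20 standard amino acids (uppercase one-letter codes).
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def is_standard_protein(seq: str) -> bool:
    """True if seq is non-empty and uses only uppercase standard amino-acid letters."""
    return bool(seq) and set(seq) <= STANDARD_AA


seed = "MDVFMKGLSKAKEGVVAAAEKTKQGVAEAAGKTKEGVLYVGSKTKEGVVHGVATVAEKTKEQVTNVGGAVVTGVTAVAQKTVEGAGSIAAATGFVKKDQLGKNEEGAPQEGILEDMPVDPDNEAYEMPSEEGYQDYEPEA"
print(len(seed), is_standard_protein(seed))

payload = seed.encode()  # create_msa in this tutorial is passed the seed as bytes
print(type(payload).__name__, payload[:10])
```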

Use the Align module to create an MSA from your seed sequence:

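# session is an authenticated session object created beforehand (setup not shown here)
msa = session.align.create_msa(seed.encode())  # submits an alignment job for the seed
print(msa)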

status=<JobStatus.SUCCESS: 'SUCCESS'> job_id='52d676fb-18bf-4803-9912-0380252b78e8' job_type=<JobType.align_align: '/align/align'> created_date=datetime.datetime(2024, 6, 13, 3, 12, 6, 555562) start_date=None end_date=datetime.datetime(2024, 6, 13, 3, 12, 6, 556046) prerequisite_job_id=None progress_message=None progress_counter=None num_records=None sequence_length=None msa_id='52d676fb-18bf-4803-9912-0380252b78e8'

Wait for the results with:

r = msa.wait()  # blocks until the alignment job finishes, then returns the resulting MSA
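Conceptually, waiting on a remote job amounts to polling its status until it reaches a terminal state. The sketch below is a generic pattern for illustration only, not the library's implementation of msa.wait(). The poll callable, the interval and timeout values, and the status strings are all hypothetical: only SUCCESS appears in the job output shown earlier, and FAILURE, PENDING, and RUNNING are assumptions.

```python
import time
from typing import Callable, Iterable


def wait_until_done(
    poll: Callable[[], str],
    done_states: Iterable[str] = ("SUCCESS", "FAILURE"),
    interval: float = 1.0,
    timeout: float = 600.0,
) -> str:
    """Call poll() until it reports a terminal state or the timeout expires."""
    terminal = set(done_states)
    deadline = time.monotonic() + timeout
    while True:
        status = poll()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} s")
        time.sleep(interval)  # pause before asking again


# Simulated job that finishes on the third poll (hypothetical statuses).
statuses = iter(["PENDING", "RUNNING", "SUCCESS"])
print(wait_until_done(lambda: next(statuses), interval=0.0))
```

A timeout keeps a stalled job from blocking a script indefinitely, and the monotonic clock makes the deadline immune to wall-clock changes.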

To view the seed sequence the job was run with:

list(msa.get_seed())
[['seed',
  'MDVFMKGLSKAKEGVVAAAEKTKQGVAEAAGKTKEGVLYVGSKTKEGVVHGVATVAEKTKEQVTNVGGAVVTGVTAVAQKTVEGAGSIAAATGFVKKDQLGKNEEGAPQEGILEDMPVDPDNEAYEMPSEEGYQDYEPEA']]

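View the resulting MSA. get_msa() returns a csv.reader object rather than the rows themselves, so wrap it in list() or iterate over it to see the aligned sequences: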

msa.get_msa()  # or msa.wait(); both return a csv.reader over the MSA rows
<_csv.reader at 0x79cbbd0bd8c0>
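Because get_msa() returns a csv.reader, which is a one-pass iterator, it is worth collecting the rows once and working from the resulting list. The sketch below uses io.StringIO as a hypothetical stand-in for the reader returned by the service and assumes two-column name,sequence rows, the shape shown in the upload example in this tutorial; the sequences are toy data, not real output.

```python
import csv
import io

# Hypothetical stand-in for msa.get_msa(): a csv.reader over "name,sequence" rows.
# The sequences are toy data, not output from the service.
reader = csv.reader(io.StringIO("seed,MKV-LAG\nhit_1,MKVELSG\nhit_2,MRV-LAG\n"))

rows = list(reader)  # consume the reader once and keep the rows
names = [name for name, _ in rows]
lengths = sorted({len(seq) for _, seq in rows})

print(f"{len(rows)} sequences: {names}")
print(f"row length(s): {lengths}")  # a single value means all rows have equal length
print(list(reader))  # the reader is now exhausted, so this is an empty list
```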

Uploading an MSA

If you have an existing MSA formatted as a .fa, .fasta, or .csv file, upload it with session.align.upload_msa(), passing the file contents as bytes.

Upload and view your MSA:

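# A minimal single-record MSA in FASTA format
f = ">101\nAAALLLPPP"

msa = session.align.upload_msa(f.encode())  # upload_msa expects bytes

# get_msa() returns a csv.reader; list() collects its [name, sequence] rows
list(msa.get_msa())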
[['101', 'AAALLLPPP']]
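For more than a single toy record, building the FASTA string by hand gets tedious. A small helper can serialize (name, sequence) pairs into the same kind of bytes the upload example passes to upload_msa. records_to_fasta_bytes is hypothetical and not part of the client library, and the file name in the final comment is a placeholder.

```python
def records_to_fasta_bytes(records: list[tuple[str, str]]) -> bytes:
    """Serialize (name, sequence) pairs as FASTA text encoded to bytes."""
    return "\n".join(f">{name}\n{seq}" for name, seq in records).encode()


payload = records_to_fasta_bytes([("101", "AAALLLPPP")])
print(payload == ">101\nAAALLLPPP".encode())  # same bytes as the inline example

# With a session available, the bytes could then be uploaded as above:
# msa = session.align.upload_msa(payload)
# To upload an existing file instead, pass its raw bytes, e.g. open("my_msa.fasta", "rb").read()
```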

Next steps

Learn more about MSAs on our MSA API page.

You can use your MSA to create a prompt and start generating, scoring, and analyzing sequences with our state-of-the-art PoET model. See Creating a prompt for instructions.

You can also use your MSA with our structure prediction tool to visualize the 3D structure of your sequence. See Structure prediction for more information.