Creating your designs
Running the Design tool
Under ‘Dataset’ on the left panel, select your desired dataset and click “Create a design”. This takes you to a new page where you define your design objectives. The platform then searches for the sequence variants most likely to achieve those objectives.
To explore tradeoffs between the number of mutations in each variant and predicted properties, choose the “Use number of mutations criteria” option and set criteria for multiple properties.
The design algorithm is set to run for 10 steps by default. However, to generate more candidate sequences and allow the algorithm more time to find potentially better variants, this number can be increased.
After adjusting the settings, click “Generate design” to initiate the algorithm. The design will be saved and can be accessed from the navigation panel. Please note that the algorithm may take some time to complete, but you will be able to view the results as they are generated.
Design Criteria
Property predictors in OpenProtein.AI are Bayesian: for each variant, they output a distribution over possible values of the property. The mean is what you would get from a typical regression model, but our models also output a standard deviation, which indicates how uncertain the model is about the value of that property. Based on this distribution, we can calculate the probability that a sequence variant meets a design goal, defined as the property value being greater than or less than some target value. This is how our design criteria are defined. The score given by the predictive models is the log probability that the sequence meets the defined design criteria, so it is always zero or negative.
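As a rough illustration of how such a score could be computed, the sketch below assumes the predicted distribution is Gaussian, which the text above does not state. The function name `log_prob_meets_criterion` and its parameters are hypothetical and are not part of the OpenProtein.AI platform or API.

```python
import math
from statistics import NormalDist


def log_prob_meets_criterion(mean, std, target, direction="greater"):
    """Log probability that a property meets a target value.

    Assumes (for illustration only) that the predicted property follows a
    normal distribution with the given mean and positive standard deviation.
    """
    dist = NormalDist(mu=mean, sigma=std)
    if direction == "greater":
        p = 1.0 - dist.cdf(target)  # P(property > target)
    elif direction == "less":
        p = dist.cdf(target)  # P(property < target)
    else:
        raise ValueError("direction must be 'greater' or 'less'")
    # Far in the tails the probability underflows to 0; report -inf there.
    return math.log(p) if p > 0.0 else float("-inf")


# A variant whose predicted mean sits exactly on the target has a 50% chance
# of meeting a "greater than" criterion, so its score is log(0.5).
example_score = log_prob_meets_criterion(mean=1.0, std=0.5, target=1.0)
```

Under this assumption, a score of 0 would mean the variant is certain to meet the criterion, and increasingly negative scores mean it is less likely to.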
This means that it is important to set reasonable target values for your design criteria, because they have a direct impact on the behaviour of the search algorithm. If the target value is set too ambitiously, the algorithm will favor exploration by proposing variants with high uncertainty. Why? Because no variant will have an expected property value at or beyond the target value, so variants with high uncertainty will be more likely to achieve the design criteria than variants with low uncertainty. In other words, the model is confident that the low-uncertainty variants will not achieve the design objective, so it explores high-uncertainty variants instead. This will generally cause the algorithm to explore variants more distant from your mutagenesis dataset.
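The effect of an overly ambitious target can be seen in a small numerical sketch. It assumes a Gaussian predictive distribution and a "greater than" criterion. The two variants, their means and standard deviations, and the helper names are all made up for illustration.

```python
import math
from statistics import NormalDist


def score(mean, std, target):
    """Log probability that a Gaussian-distributed property exceeds target."""
    p = 1.0 - NormalDist(mean, std).cdf(target)
    return math.log(p) if p > 0.0 else float("-inf")


# Two hypothetical variants: one with a higher predicted mean and low
# uncertainty, one with a lower predicted mean but high uncertainty.
variants = {
    "confident": {"mean": 1.0, "std": 0.1},
    "uncertain": {"mean": 0.8, "std": 0.5},
}


def preferred(target):
    """Name of the variant with the higher score for the given target."""
    return max(variants, key=lambda name: score(target=target, **variants[name]))


for target in (0.9, 1.5):
    print(f"target={target}: preferred variant is {preferred(target)}")
```

With the reachable target (0.9), the confident variant scores higher. With the ambitious target (1.5), which lies beyond both predicted means, the uncertain variant scores higher because more of its distribution extends past the target.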
Examining design results
This article details the suite of visualization methods available on the platform.
UMAP
Once the algorithm finishes, you’ll see the variant sequences generated by the design process overlaid on the UMAP. The designed sequences are colored by predicted properties. To distinguish between variants, adjust the color settings or change the property the new points are colored by in the color options panel.
Viewing sequences on the UMAP: select and highlight
In the UMAP, select specific sequences either by clicking on individual points or by holding down the ‘Shift’ key while dragging to select multiple points. The selected points will be highlighted in the Design Results Table.
Toggle the visibility of sequences in the UMAP by clicking the ‘eye’ icon in the Design Results Table.
Histogram
You can view histograms comparing the expected property distributions for the designs against your original library under the “Histogram” tab.
Joint plots
You can view joint plots for all of the properties under the “Joint plot” tab.
Below the plots, you can see the table of the generated sequences. Note that the design table shows all sequences, not just the best ones.
The sequences generated at each step are not guaranteed to be unique, so you can filter the table to show only unique sequences using the option in “Advanced filters.” You can sort the sequences by predicted property and by the score assigned to each according to your design criteria. For score, larger (closer to zero) is better.
To view where mutations have been introduced in your generated sequences, add a reference sequence and select ‘Show mutations’.
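If you want to reproduce this comparison outside the platform, a minimal sketch is shown below. It assumes the designed sequence and the reference have the same length (substitutions only) and reports 1-based positions. The function name `list_mutations` and the example sequences are hypothetical.

```python
def list_mutations(reference, variant):
    """List substitutions as strings like 'K2R' (reference residue, position, new residue).

    Assumes both sequences have the same length; insertions and deletions
    are not handled in this sketch.
    """
    if len(reference) != len(variant):
        raise ValueError("reference and variant must have the same length")
    return [
        f"{ref_residue}{position}{var_residue}"
        for position, (ref_residue, var_residue) in enumerate(
            zip(reference, variant), start=1
        )
        if ref_residue != var_residue
    ]


# Made-up five-residue sequences, purely for illustration.
mutations = list_mutations("MKTAY", "MRTAF")
print(mutations, len(mutations))
```

The length of the returned list is the number of mutations in the variant relative to the reference.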
The filter icon next to each column name also allows you to set simple filters that can be applied to the designs.
Exporting results
The design table can be downloaded as a CSV to work with in other software using the “Export…” button. You can choose to download the whole table or only your currently filtered rows.
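Once exported, the CSV can be processed with standard tools. The sketch below removes duplicate sequences and ranks the remainder by score, with larger (closer to zero) first. The column names `sequence` and `score` and the file name in the usage comment are assumptions; check the header row of your own export and adjust them.

```python
import csv


def load_designs(path):
    """Read an exported design table into a list of dicts, one per row."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def top_unique_designs(rows, n=10, sequence_col="sequence", score_col="score"):
    """Return up to n (sequence, score) pairs, best score first.

    Column names are assumed, not taken from the platform's export format.
    If a sequence appears more than once, its best score is kept.
    """
    best = {}
    for row in rows:
        sequence = row[sequence_col]
        score = float(row[score_col])
        if sequence not in best or score > best[sequence]:
            best[sequence] = score
    # Scores are log probabilities, so larger (closer to zero) is better.
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Usage with a hypothetical file name:
# rows = load_designs("design_results.csv")
# for sequence, score in top_unique_designs(rows, n=5):
#     print(sequence, score)
```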