Dataset upload

Preparing your dataset for upload

Upload your dataset as a CSV-formatted table with the following columns:

  • one column with the full sequence of each variant, and
  • one or more additional columns with the measurement values associated with each variant.

Not every variant needs a value in every measurement column: missing measurements are acceptable.
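As an illustration of the expected layout, the following minimal sketch builds such a table with Python's standard csv module. The column names ("sequence", "activity", "stability"), the example sequences, and the values are all made up for this example, and writing a missing measurement as an empty cell is an assumption rather than a documented requirement of the platform.

```python
import csv
import io

# Hypothetical variants and measurements; names and values are illustrative only.
rows = [
    {"sequence": "MKTAYIAKQR", "activity": 1.25, "stability": 48.0},
    {"sequence": "MKTAYIAKQL", "activity": 0.80, "stability": None},  # stability not measured
    {"sequence": "MKTGYIAKQR", "activity": None, "stability": 51.5},  # activity not measured
]


def to_csv(rows, fieldnames):
    """Serialize rows to CSV text, writing missing values as empty cells (assumption)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(
            {name: "" if row.get(name) is None else row[name] for name in fieldnames}
        )
    return buffer.getvalue()


csv_text = to_csv(rows, ["sequence", "activity", "stability"])
print(csv_text)
```

To produce a file for upload, the same text can be written to disk, for example with open("dataset.csv", "w", newline="").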

If your table identifies variants by mutation codes instead of full sequences, specify the full wildtype sequence in the “Sequence options” dropdown, and the platform will reconstruct the full sequence of each variant.
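To make the relationship between mutation codes and full sequences concrete, here is a minimal sketch of how a variant sequence can be derived from a wildtype sequence. It assumes a common substitution notation such as "A4G" (wildtype residue, 1-based position, replacement residue). The notation the platform actually accepts is not specified here, and the function name apply_mutations is hypothetical, so treat this as an illustration rather than the platform's implementation.

```python
import re

# Assumed notation: <wildtype residue><1-based position><replacement residue>, e.g. "A4G".
MUTATION_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_mutations(wildtype, codes):
    """Return the full variant sequence after applying substitution codes to the wildtype."""
    residues = list(wildtype)
    for code in codes:
        match = MUTATION_PATTERN.match(code)
        if match is None:
            raise ValueError(f"Unrecognized mutation code: {code!r}")
        original, replacement = match.group(1), match.group(3)
        index = int(match.group(2)) - 1  # convert 1-based position to 0-based index
        if not 0 <= index < len(residues):
            raise ValueError(f"Position out of range in {code!r}")
        # Check against the wildtype so a mismatched code is caught early.
        if wildtype[index] != original:
            raise ValueError(
                f"{code!r} expects {original} at position {index + 1}, "
                f"but the wildtype has {wildtype[index]}"
            )
        residues[index] = replacement
    return "".join(residues)


wildtype = "MKTAYIAKQR"  # made-up example sequence
variant = apply_mutations(wildtype, ["A4G", "R10L"])
print(variant)
```

Validating each code against the wildtype residue is a design choice that helps catch a wrong wildtype sequence or an off-by-one position before upload.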

Creating a project

Upon first login, you will be prompted to create a project. You can name the project and include a description for your reference.

Uploading a dataset

To upload your dataset, click the “Upload dataset” button in the navigation panel or on the project landing page. This opens a file explorer where you can select your dataset file.

You can edit the name of your dataset; by default, the name of the uploaded file is used. You can also add an optional description to provide more information about your dataset. To change the selected file, click the “Change…” button to return to the file explorer and select a different file.

The application automatically detects the column that contains your sequences based on the column name. If the column cannot be found, you can manually set the column type for each column.

If your table encodes variants using mutation codes, make sure to enter the wildtype sequence of your protein under “Parent sequence”.

Once you’re ready, click “Upload” to initiate the upload process.