Demo Datasets

Aliphatic amidase (AMIE_PSEAE)

This is a single-site saturation mutagenesis dataset from Wrenbeck, Azouz, and Whitehead (2017) containing activity measurements for enzyme variants against three different substrates. The parent sequence is AMIE_PSEAE (UniProt identifier P11436).

6,819 entries with 3 properties. Download
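As a rough illustration of what single-site saturation means, the sketch below enumerates every single amino acid substitution of a parent sequence. The toy parent sequence, the function name, and the `M1A`-style labels are assumptions made for this example and are not taken from the dataset itself.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical residues


def single_site_variants(parent):
    """Yield (label, sequence) for every single substitution of `parent`."""
    for i, wild_type in enumerate(parent):
        for residue in AMINO_ACIDS:
            if residue == wild_type:
                continue  # skip the unmutated residue
            label = f"{wild_type}{i + 1}{residue}"  # 1-based position, e.g. "M1A"
            yield label, parent[:i] + residue + parent[i + 1:]


# Hypothetical 4-residue parent used only for illustration (not AMIE_PSEAE).
toy_parent = "MRHG"
variants = dict(single_site_variants(toy_parent))
print(len(variants))  # 4 positions x 19 substitutions each
```

A real dataset may omit some variants or include other entries, so its entry count need not equal 19 times the sequence length.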

Antibody heavy chain (14H)

This dataset, from Li et al. (2022), contains measurements of binding affinity to the target for a random mutagenesis library of single, double, and triple mutants of an antibody heavy chain variable region. Mutations were limited to CDRs 1, 2, and 3. Measurements are reported as log base 10 of the binding Kd in nM.

7,476 entries with 1 property. Download
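To separate single, double, and triple mutants in a library like this one, a variant can be compared position by position with its parent. The sketch below assumes that variants are substitution-only and the same length as the parent, which the dataset description does not state explicitly. The sequences and function names are hypothetical.

```python
def count_substitutions(parent, variant):
    """Count positions where `variant` differs from `parent`."""
    if len(parent) != len(variant):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(parent, variant))


ORDER_NAMES = {0: "parent", 1: "single", 2: "double", 3: "triple"}


def mutation_order(parent, variant):
    """Name the mutation order, falling back to the raw count above three."""
    n = count_substitutions(parent, variant)
    return ORDER_NAMES.get(n, f"{n} substitutions")


# Hypothetical 8-residue fragment and variants, for illustration only.
toy_parent = "GFTFSSYA"
for toy_variant in ("GFTFSDYA", "GYTFSDYA", "GYTFADYA"):
    print(toy_variant, mutation_order(toy_parent, toy_variant))
```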

Antibody light chain (14L)

This dataset, from Li et al. (2022), contains measurements of binding affinity to the target for a random mutagenesis library of single, double, and triple mutants of an antibody light chain variable region. Mutations were limited to CDRs 1, 2, and 3. Measurements are reported as log base 10 of the binding Kd in nM.

14,339 entries with 1 property. Download
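Because the antibody measurements are log base 10 of Kd in nM, a value of 0 corresponds to 1 nM, and negative values are sub-nanomolar. Lower values mean tighter binding. The sketch below converts such a value back to nM and to molar units. The function names and example values are illustrative and do not come from the dataset.

```python
def log10_kd_nm_to_kd_nm(log10_kd_nm):
    """Invert the log10 transform to recover Kd in nanomolar."""
    return 10.0 ** log10_kd_nm


def kd_nm_to_molar(kd_nm):
    """Convert a Kd from nanomolar to molar (1 nM = 1e-9 M)."""
    return kd_nm * 1e-9


# Hypothetical measurement values, for illustration only.
for log10_kd in (-1.0, 0.0, 2.0):
    kd_nm = log10_kd_nm_to_kd_nm(log10_kd)
    print(f"log10(Kd/nM) = {log10_kd:+.1f} -> {kd_nm:.3g} nM = {kd_nm_to_molar(kd_nm):.3g} M")
```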

Aminoglycoside 3'-phosphotransferase (KKA2_KLEPN)

This is a single-site saturation mutagenesis dataset from Melnikov et al. (2014) containing measurements of E. coli growth in the presence of six different antibiotics. Because this protein is a kinase that confers antibiotic resistance, growth serves as a measure of the kinase's activity on each antibiotic substrate. The parent sequence is KKA2_KLEPN (UniProt identifier P00552).

5,279 entries with 6 properties. Download

References

[LGS+22]

Lin Li, Esther Gupta, John Spaeth, Leslie Shing, Rafael Jaimes, Rajmonda Sulo Caceres, Tristan Bepler, and Matthew E Walsh. Machine learning optimization of candidate antibodies yields highly diverse sub-nanomolar affinity antibody libraries. bioRxiv, pages 2022.10.07.502662, October 2022. URL: https://www.nature.com/articles/s41467-023-39022-2.

[MRW+14]

Alexandre Melnikov, Peter Rogov, Li Wang, Andreas Gnirke, and Tarjei S Mikkelsen. Comprehensive mutational scanning of a kinase in vivo reveals substrate-dependent fitness landscapes. Nucleic Acids Res., 42(14):e112, August 2014. URL: http://dx.doi.org/10.1093/nar/gku511.

[WAW17]

Emily E Wrenbeck, Laura R Azouz, and Timothy A Whitehead. Single-mutation fitness landscapes for an enzyme on multiple substrates reveal specificity is globally encoded. Nat. Commun., 8:15695, June 2017. URL: https://www.nature.com/articles/ncomms15695.