Computational and machine learning approaches for modeling the conformational
landscape of macrocyclic peptides could enable their rational design
and optimization. However, accurate, fast, and scalable methods for modeling
macrocycle geometries remain elusive. Recent deep learning approaches have
significantly accelerated protein structure prediction and the generation of
small-molecule conformational ensembles, yet comparable progress has not been made
for macrocyclic peptides because of their distinctive properties. Here, we introduce
CREMP, a resource generated for the rapid development and evaluation of machine
learning models for macrocyclic peptides. CREMP contains 36,198 unique
macrocyclic peptides and their high-quality structural ensembles generated
using the Conformer-Rotamer Ensemble Sampling Tool (CREST). Altogether, this
new dataset contains nearly 31.3 million unique macrocycle geometries, each
annotated with energies from semi-empirical extended tight-binding
(xTB) calculations. We anticipate that this dataset will enable the
development of machine learning models that can improve peptide design and
optimization for novel therapeutics.
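Per-conformer energy annotations like these are often used to Boltzmann-weight the members of an ensemble. The minimal sketch below illustrates that idea on a hypothetical in-memory record. It assumes energies are given in Hartree and uses a temperature of 298.15 K. The class and field names are illustrative assumptions, not the CREMP file format or API.

```python
import math
from dataclasses import dataclass, field

# Boltzmann constant in Hartree per kelvin (CODATA value).
K_B_HARTREE_PER_K = 3.166811563e-6


@dataclass
class ConformerEnsemble:
    """Hypothetical record for one macrocycle's conformer ensemble.

    Field names are illustrative assumptions, not the CREMP data format.
    """
    sequence: str
    energies_hartree: list[float] = field(default_factory=list)

    def boltzmann_weights(self, temperature_k: float = 298.15) -> list[float]:
        """Return normalized Boltzmann weights for the stored energies."""
        if not self.energies_hartree:
            return []
        kt = K_B_HARTREE_PER_K * temperature_k
        e_min = min(self.energies_hartree)
        # Shift by the minimum energy before exponentiating: the lowest-energy
        # conformer gets a factor of exactly 1.0 and nothing overflows.
        factors = [math.exp(-(e - e_min) / kt) for e in self.energies_hartree]
        total = sum(factors)
        return [f / total for f in factors]


if __name__ == "__main__":
    # Toy energies (Hartree) for three hypothetical conformers.
    ens = ConformerEnsemble("hypothetical-sequence", [-100.0, -100.001, -99.999])
    print([round(w, 3) for w in ens.boltzmann_weights()])
```

Subtracting the minimum energy leaves the normalized weights unchanged, because the common factor cancels. It also keeps the exponentials numerically stable for the large absolute energies typical of electronic-structure output.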