
Northwestern Medicine scientists have developed a new experimental method to analyze conformational fluctuations in protein domains on a uniquely large scale, which may improve data-driven modeling, biology and protein engineering, as detailed in a recent study published in Nature.
“Proteins move around between different structures, but understanding what the energies of those different conformations are and how rare or common those conformations are is totally unknown for most proteins. This study was really about developing a new method that let us illuminate all these different dynamics of proteins on a large scale for the first time,” said Gabriel Rocklin, PhD, assistant professor of Pharmacology, who was senior author of the study.
Nearly all biological processes rely on the folding of proteins, from conducting electrical signals in nerve cells to inducing immune responses throughout the body. All folded proteins fluctuate between different conformational states, including a low-energy native folded state, a higher-energy unfolded state, and other excited states at different energy levels that can influence protein function, interactions and aggregation.
Historically, studying conformational fluctuations in protein energy landscapes has been challenging. High-energy states are rare and transient, making them difficult to detect with current experimental methods, and current AI methods have limited ability to predict them. Additionally, even small changes to a protein’s sequence due to a mutation or intentional protein engineering can cause large changes in the populations of different conformational states.
“Previously, we could study one protein at a time, but we couldn’t look at tens or hundreds of these proteins to analyze protein dynamics in parallel. We have computational tools like molecular dynamics simulations to model how these proteins are behaving, but experimental tools have lagged in their ability to measure them at scale,” said Állan Ramos Ferrari, PhD, research assistant professor of Pharmacology and lead author of the study.

To study protein energy landscapes on a larger scale, the scientists developed a new experimental approach called multiplexed hydrogen-deuterium exchange mass spectrometry (mHDX-MS), which enables parallel analysis of hundreds of protein domains, revealing how sequence differences give rise to distinct conformational landscapes.
“There’s an unlimited number of possible different combinations of these amino acids, and if you’re designing a new drug or a new sensor for biotechnology, for example, what is the best sequence of amino acids to use for your particular function or how will that sequence change the way that protein behaves? The mHDX-MS approach now lets us examine these conformational fluctuations for thousands of different protein sequences,” Rocklin said.
Hydrogen-deuterium exchange measures the energies of individual residues transitioning from “closed” protein conformations to higher-energy “open” protein conformations, which can’t be detected by current approaches. This provides more information about the conformational landscapes of each protein than previous high-throughput approaches.
“To enable large-scale HDX analysis of both natural and designed protein domains, we leveraged DNA oligo pool library synthesis to produce customized synthetic proteomes comprising up to 1,300 small protein domains in a single mixture (28 to 64 amino acids in length). Analyzing these mixtures by mHDX-MS revealed the exchange rate distributions and approximate opening energy distributions for each protein domain,” the authors wrote.
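To illustrate how an exchange rate can be turned into an approximate opening energy, the sketch below applies the standard textbook relation for hydrogen exchange in the so-called EX2 limit, where the opening free energy is RT ln(k_int / k_obs). This is an assumption for illustration only: the article does not describe the authors' actual conversion, and the function name, the rate values and the temperature are hypothetical.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def opening_free_energy(k_obs, k_int, temp_k=298.15):
    """Approximate opening free energy in kcal/mol.

    Assumes the EX2 limit of the Linderstrom-Lang model, in which the
    observed exchange rate k_obs equals the intrinsic (fully exposed)
    rate k_int scaled by the opening equilibrium constant.
    Both rates must share the same units (for example, 1/s).
    """
    if k_obs <= 0 or k_int <= 0:
        raise ValueError("exchange rates must be positive")
    return R_KCAL * temp_k * math.log(k_int / k_obs)


# Hypothetical (k_obs, k_int) pairs, not taken from the study.
example_rates = [(1.0, 1.0), (1e-2, 1.0), (1e-4, 1.0)]
example_energies = [opening_free_energy(k_obs, k_int)
                    for k_obs, k_int in example_rates]
```

Under this assumption, a site that exchanges as fast as a fully exposed one has an opening energy of zero, and every 100-fold slowdown in exchange adds roughly 2.7 kcal/mol at room temperature.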
Using their approach, the scientists measured the opening energy distributions of more than 5,700 protein domains from ten domain families.
This dataset revealed hidden differences in energy landscapes between protein sequences with the same overall fold, differences in landscapes between domains sharing the same global folding stability, and systematic differences between domain families. The scientists also used machine learning to identify common determinants of energy landscapes across a broad range of sequences in their dataset.
The characterized proteins can be used to develop improved computational models of conformational fluctuations, providing a framework to understand how sequence variations contribute to disease and to guide protein and drug development, Ferrari said.
“There are many protein families that have mutations that are characterized to have a certain disease phenotype, so now we can take this approach and look at those variants and ask how the dynamics are correlated with the phenotypes,” Ferrari said.
Going forward, the scientists aim to improve the resolution and granularity of their approach and ultimately increase the size of their dataset. The dataset is also publicly available for use by the scientific community.
“The dream is to build a dataset on the scale of the Protein Data Bank, with the same quality and quantity needed to develop a new generation of machine learning tools. We are also working to extend this approach to larger proteins,” Ferrari said.
Co-authors of the study include Mario Garcia, Claire Phoumyvong and Cydney Martell, graduate students in the Driskill Graduate Program in Life Sciences (DGP).
This work was supported by the National Institutes of Health (NIH) Director’s New Innovator Award DP2GM140927; the São Paulo Research Foundation (FAPESP) Grant #20/14421-1; NIH grants T32GM008382, T32GM149439 and F31GM151811; the PhRMA Foundation Predoctoral Fellowship in Drug Delivery; National Science Foundation (NSF) NRT Award 2021900; Human Frontier Science Program Long-Term Fellowship; JST PRESTO Grant JPMJPR21E9; and NSF Award 2304707.





