Entry Date:
January 4, 2012

Very Large Datasets and New Models to Predict and Design Protein Interactions

Principal Investigator Amy Keating

Project Start Date September 2010

Project End Date August 2017


Specific protein-protein interactions are responsible for organizing the cell, for processing biological signals and information, and for the chemistry of life. Thus, understanding biological mechanism relies on understanding the interactions that occur between proteins. An important long-term goal is to develop methods for reliably predicting and rationally modifying protein-protein interactions. Such capabilities would provide insight into the molecular details of pathology and highlight opportunities for disease treatment. This proposal describes an integrated experimental/computational technology platform that will provide predictive models of protein interaction specificity. The experimental component involves constructing randomized libraries of proteins or peptides that will be sorted according to their affinities for binding a particular receptor. The identities and binding affinities for very large numbers of library members will be decoded using high-throughput sequencing methods.

The data, consisting of up to 10^7 {sequence, affinity} pairs per sequencing run, will be used as input to computational machine learning methods. Models will be generated that capture the relationship between sequence and interactions, and the predictive power of these models will be tested experimentally. The work described in this proposal emphasizes technology development and application of the new platform to study two general types of protein complexes. First are interactions of short helical ligands with mid-sized globular proteins, here studied using anti-apoptotic Bcl-2 and Ca2+-binding EF-hand proteins. Second are interactions of short linear peptides with modular interaction domains, here PDZ and SH3 domains. These four protein families mediate an enormous number of important molecular recognition events in human cells, and the resulting models will provide valuable support for the study of their biological functions. This work will also provide a stringent test of the capabilities of the proposed technology, which can then be applied to a much wider variety of molecular complexes, e.g., protein-protein, protein-small molecule and protein-nucleic acid assemblies. Given the paucity of high-throughput methods for accurately measuring protein-protein interactions, and the primitive capabilities of most computational models for predicting protein binding, the proposed technology platform has the potential to dramatically transform the study of protein interaction specificity.
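
As an illustration of how {sequence, affinity} pairs might feed a machine learning method, the sketch below fits a simple additive (position-specific) model by ridge-regularized least squares. The proposal does not specify a model class, so the one-hot encoding, the additive form, the function names, and the toy data are all assumptions made for this example and are not the project's actual method.

```python
import numpy as np

# The 20 standard amino acids; each residue position is one-hot encoded.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq):
    """Encode a fixed-length peptide as a flat 0/1 vector (hypothetical helper)."""
    x = np.zeros(len(seq) * len(AMINO_ACIDS))
    for pos, aa in enumerate(seq):
        x[pos * len(AMINO_ACIDS) + AA_INDEX[aa]] = 1.0
    return x


def fit_additive_model(pairs, ridge=1e-3):
    """Fit affinity ~ sum of per-position residue weights + bias.

    pairs: list of (sequence, affinity) tuples, all sequences the same length.
    ridge: small L2 penalty so the linear system is always solvable.
    """
    X = np.array([one_hot(seq) for seq, _ in pairs])
    y = np.array([affinity for _, affinity in pairs])
    X = np.hstack([X, np.ones((len(pairs), 1))])  # bias column
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)


def predict(weights, seq):
    """Predict the affinity score of a sequence under the fitted model."""
    return float(np.append(one_hot(seq), 1.0) @ weights)


# Toy, made-up data: two-residue "peptides" with arbitrary affinity scores.
toy_pairs = [("AL", 1.0), ("AV", 0.5), ("GL", 0.2), ("GV", -0.3)]
weights = fit_additive_model(toy_pairs)
print(predict(weights, "AL"))
```

An additive model of this kind ignores coupling between positions; it is shown only because it is the simplest way to relate sequence to a measured affinity, and richer models could be trained on the same input format.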

PUBLIC HEALTH RELEVANCE: Specific protein-protein interactions underlie all biological processes. Knowledge of interactions that occur in healthy vs. diseased tissues, coupled with methods for inhibiting such interactions, would dramatically expand opportunities to treat human disease. This proposal describes a new technology for advancing the measurement, prediction and design of protein complexes.