Proteins and multi-protein complexes play crucial roles in our cells. The function of a protein, including its interactions, is encoded in its amino-acid sequence, and the recent explosion of available sequences has inspired data-driven approaches to discover the principles of protein operation. At the root of these new approaches is the observation that amino-acid residues with related functional roles often evolve in a correlated way.
First, I will present two novel statistical physics-inspired methods to predict protein-protein interactions from sequence data. One method is based on the maximum-entropy inference approach that has already made it possible to infer protein structures from sequences, and the other is based on information theory. I will further discuss how correlations arising from the shared evolutionary history of interacting partners contribute to the success of these methods.
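As a rough illustration of the kind of quantity an information-theoretic approach can build on, the sketch below computes the mutual information between two alignment columns, one taken from each putative partner. This is only an assumption about the general flavor of such a method: the abstract does not specify the estimator, and the function name `mutual_information` and the toy columns are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Plug-in estimate of the mutual information (in bits) between two
    alignment columns, given as equally long sequences of residues.

    Position k of col_a and position k of col_b are assumed to come from
    the same paired (e.g. same-species) sequences.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same number of sequences")

    n = len(col_a)
    count_a = Counter(col_a)               # single-column counts
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))  # joint counts

    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p_ab * log2(p_ab / (p_a * p_b)), written in terms of raw counts
        mi += (n_ab / n) * log2(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Toy columns (hypothetical data): the first pair covaries perfectly,
# the second pair is statistically independent.
covarying = mutual_information("AACC", "GGTT")
independent = mutual_information("AACC", "GTGT")
```

In practice such raw frequency estimates are usually regularized (for example with pseudocounts) and corrected for finite-sample and phylogenetic biases; none of that is attempted here.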
Then, I will propose a simple interpretation of the origin of the "sectors" of collectively correlated amino acids that have been discovered in several protein families through statistical analyses of sequence alignments. I will show that selection acting on any functional property of a protein, represented by an additive trait, can give rise to such a sector.
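For concreteness, an additive trait can be written as a sum of independent single-site contributions. The notation below is an assumption chosen for illustration and is not taken from the abstract: $\vec{\sigma} = (\sigma_1, \dots, \sigma_L)$ denotes a sequence of length $L$, and $\Delta_i(\sigma_i)$ denotes the contribution of the amino acid $\sigma_i$ at site $i$.

```latex
% Additive trait: each site contributes independently to the trait value.
T(\vec{\sigma}) = \sum_{i=1}^{L} \Delta_i(\sigma_i)
```

In this picture, selection constrains only the total $T$, so the sites with large $|\Delta_i|$ are coupled to one another through that shared constraint.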