Structure of representations in generative models and applications to biological sequences
Biological sequences (DNA, RNA, protein) encode molecular processes that support life. However, the correspondence between sequence and function is complex, context-dependent, and often unknown. Sequence variation during evolution is constrained by conservation of function, which imprints large sequence datasets with informative signatures about this mapping.
In this talk, I will present recent work on generative models of RNA sequences. I will focus on restricted Boltzmann machines (RBMs), unsupervised neural networks that implement a data/representation duality. As a future perspective, I will argue that representations can be used to manipulate properties of generated sequences. Lastly, I will discuss how statistical mechanics methods can help us understand the structure of these representations.
Département de Physique, ENS Paris

