Structure of representations in generative models and applications to biological sequences
Jorge Fernández de Cossío Díaz
Département de Physique, ENS Paris
Thu, May 30th 2024, 14:00-15:00
Salle Claude Itzykson, Bât. 774, Orme des Merisiers
Biological sequences (DNA, RNA, protein) encode molecular processes that support life. However, the correspondence between sequence and function is complex, context-dependent, and often unknown. Sequence variation during evolution is constrained by conservation of function, which imprints large sequence datasets with informative signatures about this mapping.
In this talk, I will present recent work on generative models of RNA sequences. I will focus on restricted Boltzmann machines (RBMs), unsupervised neural networks that implement a data/representation duality. As a future perspective, I will argue that representations can be used to manipulate properties of generated sequences. Lastly, I will discuss how statistical mechanics methods can help us understand the structure of these representations.