
Analysis Of The Discriminator’s Training In A Simple GAN Model With Statistical Physics Methods

Francesca Mignacco

Supervisors: Alfredo Braunstein, Matteo Matteucci, Lenka Zdeborova. Politecnico di Torino, Master's degree programme in Physics Of Complex Systems (Fisica Dei Sistemi Complessi), 2019


Generative Adversarial Networks (GANs) are artificial neural networks belonging to the class of implicit generative models. GANs have achieved state-of-the-art performance in most unsupervised tasks that require learning to generate new samples that are statistically indistinguishable from those drawn from an unknown real-world distribution. The main idea behind GANs is to set up a competition between two networks: the generator and the discriminator. These models have emerged as one of the most powerful solutions for realistic image generation, and their use also extends to other applications, such as model-based reinforcement learning and semi-supervised learning. However, a comprehensive understanding of the theoretical conditions underlying the successful training of GANs is still missing. The statistical physics approach can provide useful insights into high-dimensional models and suggest new directions for tackling these open questions. In the present work, I apply statistical physics tools to study analytically the typical performance of batch learning in a simple GAN model, with perceptrons as generator and discriminator. This model can be seen as a combination of a rank-1 matrix factorization model and a perceptron problem with structured disorder. Therefore, the results obtained with this GAN model can be compared to the known solution of the analogous rank-1 matrix factorization problem, found without the use of neural networks. The main difference with respect to the perceptron problems previously studied in the statistical physics literature is the presence, in this case, of input-output correlations that cannot be easily removed by means of symmetry arguments. I apply replica theory to study the behaviour of the discriminator during the first step of training, varying the ratio "alpha" of the number of training examples to the dimensionality of the perceptron’s vector of tunable parameters.
I define an order parameter, the magnetization, which captures the ability of the discriminator to recover the structure within the real data, and I study it as a function of the control parameter "alpha". I also consider how different choices of the perceptron’s activation function affect performance. I find good agreement between my results and numerical simulations obtained by training the model with gradient descent. Finally, I discuss possible implementations of the training dynamics.
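
The kind of numerical experiment mentioned in the abstract can be illustrated with a minimal sketch. Since the full text is not available, every modelling detail here is an assumption and not the thesis's actual setup: real samples are Gaussian noise plus a rank-1 spike along a hidden unit vector u, fake samples are pure Gaussian noise (an untrained generator), the discriminator is a perceptron with a quadratic activation and unit-norm weights w, and the magnetization is taken to be the normalized overlap |w·u|. The names `run`, `train_discriminator`, `snr`, `steps` and `lr` are hypothetical.

```python
import math
import random


def dot(a, b):
    """Scalar product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Rescale v to unit Euclidean norm."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def magnetization(w, u):
    """Assumed order parameter: normalized overlap |w.u| / (|w| |u|)."""
    return abs(dot(w, u)) / math.sqrt(dot(w, w) * dot(u, u))


def train_discriminator(real, fake, rng, steps=40, lr=0.5):
    """Projected gradient ascent on mean_real (w.x)^2 - mean_fake (w.x)^2.

    The quadratic activation and this objective are assumptions; after
    each step w is projected back onto the unit sphere.
    """
    n = len(real[0])
    w = normalize([rng.gauss(0.0, 1.0) for _ in range(n)])
    for _ in range(steps):
        grad = [0.0] * n
        for sign, batch in ((1.0, real), (-1.0, fake)):
            for x in batch:
                # d/dw of (w.x)^2 is 2 (w.x) x, averaged over the batch.
                coeff = sign * 2.0 * dot(w, x) / len(batch)
                for i in range(n):
                    grad[i] += coeff * x[i]
        w = normalize([wi + lr * gi for wi, gi in zip(w, grad)])
    return w


def run(alpha, n=30, snr=4.0, seed=0):
    """Train on P = alpha * n real and P fake samples; return the magnetization."""
    rng = random.Random(seed)
    p = max(1, round(alpha * n))
    u = normalize([rng.gauss(0.0, 1.0) for _ in range(n)])  # hidden structure
    fake = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
    real = []
    for _ in range(p):
        c = rng.gauss(0.0, 1.0)  # random amplitude of the rank-1 spike
        real.append([math.sqrt(snr) * c * ui + rng.gauss(0.0, 1.0) for ui in u])
    w = train_discriminator(real, fake, rng)
    return magnetization(w, u)


if __name__ == "__main__":
    for alpha in (0.5, 2.0, 8.0):
        print(f"alpha = {alpha}: magnetization = {run(alpha):.3f}")
```

A quadratic activation is used in this sketch because real and fake samples both have zero mean and differ only in covariance, so an odd activation would give the same average output on both sets. With these choices the update amounts to a power iteration on the difference of the two empirical covariance matrices, which is a simplification chosen for brevity.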

Supervisors: Alfredo Braunstein, Matteo Matteucci, Lenka Zdeborova
Academic year: 2018/19
Publication type: Electronic
Number of Pages: 59
Additional Information: Thesis under embargo. Full text not available
Degree programme: Master's degree programme in Physics Of Complex Systems (Fisica Dei Sistemi Complessi)
Degree class: New organization > Master science > LM-44 - MATHEMATICAL MODELLING FOR ENGINEERING
Joint supervision institution: Université de Paris-Sud (Paris XI) (France)
Collaborating companies: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/11721