
Structured latent embeddings for generating and reposing DXA images

Mattia Delleani

Supervisor: Lia Morra. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2022

Abstract:

Recent advances in machine learning make it possible to map image features and semantic descriptions into aligned latent representations. These representations are useful because they capture the "essence" of the observed and described elements, allowing generalization to unseen domains and classes. They also make it possible both (i) to generate new images from arbitrary semantic descriptions and (ii) to generate semantic descriptions from input images. In this thesis, we study the application of such tools in the medical context, using DXA (dual-energy X-ray absorptiometry) scans. DXA scans capture subtle characteristics of patients' body structures that are difficult for humans to notice and analyze but are important for a holistic evaluation of the subject. We strive to develop a model whose latent space captures these subtle characteristics, such as pose, the orientation of body parts, and the shape and structure of the body. The task is challenging: when the differences within the data distribution are subtle, it is difficult to obtain a structured and discriminative latent space because of mode collapse. To address this, we leverage a Variational Autoencoder (VAE), a well-known probabilistic generative model. A traditional VAE alone is not sufficient to build a structured and generic latent space. Starting from the VAE, we therefore develop new architectures that exploit components of 3D human modeling (the STAR body model), specifically the parameters describing the pose, translation, and shape of the human body. These parameters are used in different ways within the architecture, both to train a pose-shape encoder and to enforce constraints on the reconstruction.
These constraints led us to extend the initial VAE into a more complex architecture, a VAE with a Pose-Shape Encoder, which reconstructs images while taking the shape and pose of the patient into account, something the original VAE could not do. Given a patient in a certain pose, the final model can also re-pose that patient in a given input pose.
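
For readers unfamiliar with the VAE that the abstract takes as its starting point, the following is a minimal sketch of the standard VAE training objective only. It does not reproduce the thesis architecture (the full text is not available, so the Pose-Shape Encoder and the STAR-based constraints are not shown). All function names, the squared-error reconstruction term, and the `beta` weight are illustrative assumptions.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), element-wise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL divergence between N(mu, diag(exp(log_var))) and N(0, I)."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def squared_error(x, x_hat):
    """Reconstruction term (assumed here to be a sum of squared errors)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Reconstruction error plus a beta-weighted KL regularizer.

    beta=1.0 gives the usual negative ELBO up to constants; other values
    are a common variant and purely an assumption here.
    """
    return squared_error(x, x_hat) + beta * kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.3, -0.1], [-1.0, -2.0]   # hypothetical encoder outputs
    z = reparameterize(mu, log_var, rng)      # latent sample passed to a decoder
    x, x_hat = [1.0, 0.0, 0.5], [0.9, 0.1, 0.4]  # toy input and reconstruction
    print(len(z), vae_loss(x, x_hat, mu, log_var))
```

The KL term pulls every encoded sample toward the same prior, which is one reason a plain VAE may struggle to separate inputs that differ only subtly; the architectures described in the abstract add pose and shape information on top of this baseline.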

Supervisors: Lia Morra
Academic year: 2022/23
Publication type: Electronic
Number of pages: 98
Additional information: Thesis under embargo. Full text not available
Subjects:
Degree programme: Master's degree programme in Data Science And Engineering
Degree class: New system > Master's degree > LM-32 - COMPUTER ENGINEERING
Joint supervision institution: INSTITUT NATIONAL POLYTECHNIQUE DE GRENOBLE (INPG) - ENSIMAG (FRANCE)
Collaborating companies: INRIA
URI: http://webthesis.biblio.polito.it/id/eprint/24693