Large-scale video scene retrieval through Transformer Encoder
Lorenzo De Nisi
Large-scale video scene retrieval through Transformer Encoder.
Supervisor: Andrea Calimera. Politecnico di Torino, Master's degree programme in Data Science And Engineering, 2021
License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract
Over the last few years, the production of multimedia content has grown rapidly. Such data is a valuable source of information, but leveraging its potential requires automating processes currently performed by humans. A large share of multimedia data consists of video. From social media and streaming services to security, video is one of the most immediate media for conveying information. Combining the expressiveness of written text with vision is the foundation of Vision-Language understanding, which is often employed for automatic supervision, moderation and anomaly detection. This thesis follows this direction, investigating different solutions for an application capable of performing retrieval and detection on a video, starting from a textual description of the desired scene.
Experiments were conducted with Transformer-based architectures, with particular attention to scale efficiency and real-time capabilities, analyzing the trade-off between latency and precision as the input resolution is increased and the architectures are altered.
