
A distributed framework for real-time ingestion of unstructured streaming data

Alessandro D'Armiento

A distributed framework for real-time ingestion of unstructured streaming data.

Supervisor: Paolo Garza. Politecnico di Torino, Master's degree programme in Computer Engineering (Ingegneria Informatica), 2018

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial Share Alike.


The explosive growth in the number of devices connected to the Internet of Things (IoT) reflects how closely the growth of big data tracks that of IoT. Managing big data in a continuously expanding network raises non-trivial concerns about data collection efficiency, data processing, analytics, and security. The Internet of Things would encode 50 to 100 trillion objects and be able to follow the movement of those objects. Human beings in surveyed urban environments are each surrounded by 1000 to 5000 trackable objects. In 2015 there were already 83 million smart devices, and this number is very likely to grow. The challenges for producers of IoT applications are to clean, process, and interpret the vast amount of data gathered by sensors, and to store this bulk data. Depending on the application, data acquisition requirements can be high, which in turn leads to high storage requirements.
The main purpose of this thesis is to analyse the main obstacles faced when developing solutions for ingesting large amounts of unstructured data received from multiple, heterogeneous sources. A dedicated framework was then developed as a working, testable proof of concept (PoC) to verify the proposed solutions. This software is not intended as a ready-to-use, "as is" product, but as an extensible framework able to efficiently cover most of the common use cases. Scalability and resilience are key requirements of the framework. The platform must support millions of devices in real time and allow subscription to filtered data flows from streaming devices for monitoring, filtering, transformation, and visualization purposes.
The framework will build on existing Big Data components and will be integrated into the WASP platform, an open-source, Cloudera-certified framework specialized in the analysis of streaming data in Big Data environments.

Supervisors: Paolo Garza
Academic year: 2018/19
Publication type: Electronic
Number of Pages: 101
Degree programme: Master's degree programme in Computer Engineering (Ingegneria Informatica)
Degree class: New organization > Master science > LM-32 - COMPUTER SYSTEMS ENGINEERING
Collaborating companies: Agile Lab S.r.l.
URI: http://webthesis.biblio.polito.it/id/eprint/9024