A combined rule-based and machine learning approach for blackout analysis using natural language processing

Yu Gao

Supervisor: Tao Huang. Politecnico di Torino, Master's degree course in Ict For Smart Societies (Ict Per La Società Del Futuro), 2022

License: Creative Commons Attribution Non-commercial No Derivatives.
Abstract:

In natural language processing, traditional information extraction relies on lexical and syntactic analysis: words and their parts of speech are extracted from sentences to establish semantics. Advances in this branch of artificial intelligence make it suitable for automatically tracing and analyzing blackouts in power systems, which are very costly to society. The purpose of this thesis is therefore to develop a model that extracts useful information from texts about the power industry in order to support effective blackout analysis. To achieve this goal, we propose an approach that combines traditional rule-based methods with machine learning. A critical step was building and cleaning the training data. We characterized blackouts by when and where they occurred and by which equipment and installations failed. The blackout dataset was generated by scraping websites and applying OCR to obtain text documents. More specifically, blackout data was first collected, and suitable training data was then created through several steps, including sentence extraction, relation extraction, and named entity extraction for tagging purposes. A recognition model for a given entity type could then be built from the constructed vocabulary. In experiments on the blackout texts, we demonstrated how to build a model that extracts the desired entities, i.e. time, location, faulty facility, etc. The best results, supported by evaluation metrics, were obtained by continuously optimizing the model. This research helps to identify useful information in reports of outage incidents, down to the specific facilities involved. The proposed framework can be transferred to other specialized fields, where it can improve the quality of incident analysis and provide practitioners with technical support for specific tasks.
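
As a rough illustration of the rule-based side of such a pipeline, the following minimal sketch tags time, location, and faulty-facility mentions in a sentence using regular expressions and small vocabularies. It is not the thesis's implementation: the term lists, the time pattern, the label names, and the function names `vocabulary_pattern` and `extract_entities` are all assumptions made for this example.

```python
import re

# Hypothetical vocabularies for illustration only; the thesis builds its own
# vocabulary from scraped and OCR-processed blackout documents.
FACILITY_TERMS = ["transformer", "transmission line", "substation", "circuit breaker"]
LOCATION_TERMS = ["Turin", "Milan", "Rome"]

# Assumed rule for clock times such as "14:30" or "3:05 pm".
TIME_PATTERN = re.compile(r"\b\d{1,2}:\d{2}(?:\s?[ap]m)?\b", re.IGNORECASE)


def vocabulary_pattern(terms):
    """Compile one case-insensitive pattern matching any term as a whole word or phrase."""
    # Longest terms first, so multi-word phrases win over shorter overlapping terms.
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


RULES = [
    ("TIME", TIME_PATTERN),
    ("LOCATION", vocabulary_pattern(LOCATION_TERMS)),
    ("FACILITY", vocabulary_pattern(FACILITY_TERMS)),
]


def extract_entities(sentence):
    """Return (start, end, label, text) tuples sorted by position in the sentence."""
    found = []
    for label, pattern in RULES:
        for match in pattern.finditer(sentence):
            found.append((match.start(), match.end(), label, match.group()))
    return sorted(found)


sentence = "At 14:30 a transformer at the Turin substation failed."
entities = extract_entities(sentence)
for start, end, label, text in entities:
    print(label, text)
```

Character-offset spans of this kind could, under the same assumptions, serve as tags when preparing training data for a statistical entity recognition model, which is the combination of rules and machine learning that the abstract describes.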

Supervisors: Tao Huang
Academic year: 2022/23
Publication type: Electronic
Number of Pages: 91
Subjects:
Degree course: Master's degree course in Ict For Smart Societies (Ict Per La Società Del Futuro)
Degree class: New organization > Master science > LM-27 - TELECOMMUNICATIONS ENGINEERING
Collaborating companies: UNSPECIFIED
URI: http://webthesis.biblio.polito.it/id/eprint/24494