Minimal parsimonious chunking of written language: investigating the storage-computation trade-off as a driving principle in chunk formation

Francesca Di Giovanni

Supervisors: Alessandro Pelizzola, Davide Crepaldi. Politecnico di Torino, Master's degree programme in Physics Of Complex Systems (Fisica Dei Sistemi Complessi), 2021

PDF (Tesi_di_laurea) - Thesis
License: Creative Commons Attribution Non-commercial No Derivatives.

Abstract:

Visual word identification is the process by which the brain recognizes a familiar, meaningful word from an ordered sequence of letters. Chunking seems to play an important role in this process: rather than jumping from single letters to whole words, the brain seems to group letters into smaller units. Some recent studies suggest that morphemes, the smallest meaning-bearing units in language, should be ousted from their role as the building blocks of chunking and replaced by letter chunks that do not necessarily have an explicit connection with semantics, but that could be explained by statistical regularities in letter co-occurrence. However, the exact principles according to which these chunks emerge in skilled readers are still unclear. The algorithm developed in the thesis tries to answer this question by looking for the set of chunks that optimizes the trade-off between the storage of many different units and the computational effort needed to process completely new words. This optimization problem is formally translated into the minimization of a one-parameter function featuring two competing terms: the number of stored chunks and the average number of chunks per word. The parameter in the objective function allows us to adjust the relative weight of the two terms in this competition, and potentially mirrors psychologically meaningful phenomena, such as the progressive mastering of literacy. Since we used a massive database to learn the chunks, several computational optimizations are introduced to accelerate the algorithm, which otherwise would not be able to compute a solution in a practical amount of time. A natural improvement of the algorithm is then to assign different weights to the chunks. We considered here the concept of “chunk productivity”, which we defined as the number of times that a chunk is used to identify a word, although this concept turned out to be fairly elusive.
Finally, in order to evaluate the algorithm’s performance against real psychological data, we tested its ability to account for priming, the time saving in the identification of a target word (e.g., deal) that is brought about by a related word (the prime, e.g., dealer). The core idea is that priming is larger when the chunking of the prime contains its target (dealer = deal + er). The algorithm acts as a naïve reader whose only goal is to find the best compromise between storage and computation, without any semantic or morphological information; surprisingly, it proves able to select many interesting affixes and chunks. Nevertheless, it only partially accounts for human performance. This can be considered a further hint supporting the hypothesis that chunks could partially emerge from a language-independent mechanism, which could take into account, among other factors, the trade-off between computation and storage.
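
The storage-computation trade-off described in the abstract can be illustrated with a small Python sketch. It is not the thesis's algorithm: the abstract does not give the exact form of the objective, so the weighting `lam * (number of chunks) + (1 - lam) * (average chunks per word)` is an assumption, and the names `segment`, `objective` and `is_chunked_onto` are hypothetical. The sketch only scores a given chunk set; it does not search for the optimal one.

```python
from typing import List, Optional, Sequence, Set


def segment(word: str, chunks: Set[str]) -> Optional[List[str]]:
    """Return a segmentation of `word` into the fewest chunks, or None.

    Dynamic programming over prefixes: best[i] holds a fewest-chunk
    segmentation of word[:i] using only strings found in `chunks`.
    """
    best: List[Optional[List[str]]] = [None] * (len(word) + 1)
    best[0] = []
    for end in range(1, len(word) + 1):
        for start in range(end):
            prev = best[start]
            piece = word[start:end]
            if prev is None or piece not in chunks:
                continue
            current = best[end]
            if current is None or len(prev) + 1 < len(current):
                best[end] = prev + [piece]
    return best[len(word)]


def objective(words: Sequence[str], chunks: Set[str], lam: float) -> float:
    """Assumed one-parameter cost: storage term plus computation term.

    Storage is the number of stored chunks; computation is the average
    number of chunks per word. Returns infinity if some word cannot be
    built from the chunk set. `words` must be non-empty.
    """
    segmentations = [segment(w, chunks) for w in words]
    if any(s is None for s in segmentations):
        return float("inf")
    mean_chunks = sum(len(s) for s in segmentations) / len(words)
    return lam * len(chunks) + (1.0 - lam) * mean_chunks


def is_chunked_onto(prime: str, target: str, chunks: Set[str]) -> bool:
    """True if the fewest-chunk segmentation of `prime` contains `target`."""
    seg = segment(prime, chunks)
    return seg is not None and target in seg


words = ["deal", "dealer", "teach", "teacher"]
shared = {"deal", "teach", "er"}   # 3 stored chunks, reusable suffix
whole = set(words)                 # 4 stored chunks, one chunk per word
cost_shared = objective(words, shared, lam=0.5)
cost_whole = objective(words, whole, lam=0.5)
```

On this toy lexicon with `lam = 0.5`, the chunk set with a shared "er" costs 2.25 and storing every word whole costs 2.5, so the shared set wins; moving `lam` toward 0 penalizes long segmentations more and favors whole-word storage instead. With the shared set, `is_chunked_onto("dealer", "deal", shared)` is true, which mirrors the dealer = deal + er example.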

Supervisors: Alessandro Pelizzola, Davide Crepaldi
Academic year: 2020/21
Publication type: Electronic
Number of pages: 87
Subjects:
Degree programme: Master's degree programme in Physics Of Complex Systems (Fisica Dei Sistemi Complessi)
Degree class: New regulations > Master's degree > LM-44 - MODELLISTICA MATEMATICO-FISICA PER L'INGEGNERIA (Mathematical and Physical Modelling for Engineering)
Collaborating institutions: SISSA
URI: http://webthesis.biblio.polito.it/id/eprint/17916