Francesca Di Giovanni
Minimal parsimonious chunking of written language: investigating the storage-computation tradeoff as a driving principle in chunk formation.
Supervisors: Alessandro Pelizzola, Davide Crepaldi. Politecnico di Torino, Master's degree programme in Physics of Complex Systems (Fisica dei Sistemi Complessi), 2021

License: Creative Commons Attribution Noncommercial No Derivatives.
Abstract: 
Visual word identification is the process that allows the brain to recognize a familiar and meaningful word from an ordered collection of letters. Chunking seems to play an important role in this process: rather than jumping from single letters to words, the brain seems to group letters into smaller units. Some recent studies suggest ousting morphemes, the smallest meaning-bearing units in language, from their role as building blocks in chunking: they should instead be replaced by letter chunks which do not necessarily have an explicit connection with semantics, but which could be explained by statistical regularities in letter co-occurrence. However, the exact principles according to which these chunks emerge in skilled readers are still unclear. The algorithm developed in the thesis tries to answer this question by looking for the set of chunks that optimizes the tradeoff between the storage of many different units and the computational effort needed to process completely new words. This optimization problem is formally translated into the minimization of a one-parameter function featuring two competing terms: the number of stored chunks and the average number of chunks per word. The parameter in the objective function allows us to adjust the relative weight of the two players in this competition, and potentially mirrors psychologically meaningful phenomena, such as the progressive mastering of literacy. Since we used a massive database to learn the chunks, many computational tricks are introduced to accelerate the algorithm, which otherwise would not be able to compute a solution in a reasonable time. A natural improvement of the algorithm is then to assign different weights to the chunks. We considered here the concept of “chunk productivity”, which we defined as the number of times that a chunk is used to identify a word, although this concept turned out to be fairly elusive.
Finally, to evaluate the algorithm’s performance against real psychological data, we tested its ability to account for priming, the time saving in the identification of a target word (e.g., deal) that is brought about by a related one (the prime, e.g., dealer). The core idea is that priming is larger when the prime is chunked onto its target (dealer = deal + er). The algorithm is a naïve reader whose only goal is to find the best compromise between storage and computation, without any semantic or morphological information; surprisingly, it proves able to select many interesting affixes and chunks. Nevertheless, it only partially accounts for human performance. This can be considered a further hint supporting the hypothesis that chunks could partially emerge from a language-independent mechanism, which could take into account, among other factors, the storage-computation tradeoff.
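As an illustration only, the tradeoff described in the abstract can be sketched in Python. The abstract does not give the exact form of the objective, so the convex combination used here (weight `lam` on the number of stored chunks, weight `1 - lam` on the mean number of chunks per word) is an assumption, as are the helper names `min_chunks` and `objective` and the four-word toy lexicon. The sketch also does not reproduce the thesis's search procedure or its speed-ups; it only scores a given chunk inventory.

```python
from typing import Iterable, List, Optional, Set


def min_chunks(word: str, chunks: Set[str]) -> Optional[List[str]]:
    """Return a fewest-chunk segmentation of `word`, or None if none exists."""
    # best[i] holds the shortest segmentation of word[:i] found so far.
    best: List[Optional[List[str]]] = [None] * (len(word) + 1)
    best[0] = []
    for end in range(1, len(word) + 1):
        for start in range(end):
            prefix = best[start]
            piece = word[start:end]
            if prefix is None or piece not in chunks:
                continue
            candidate = prefix + [piece]
            current = best[end]
            if current is None or len(candidate) < len(current):
                best[end] = candidate
    return best[len(word)]


def objective(chunks: Set[str], words: Iterable[str], lam: float) -> float:
    """Assumed cost: lam * (stored chunks) + (1 - lam) * (mean chunks per word)."""
    word_list = list(words)
    total = 0
    for w in word_list:
        seg = min_chunks(w, chunks)
        if seg is None:
            return float("inf")  # this inventory cannot spell the word
        total += len(seg)
    return lam * len(chunks) + (1.0 - lam) * total / len(word_list)


# Toy lexicon and two hypothetical inventories (not taken from the thesis).
lexicon = ["deal", "dealer", "teach", "teacher"]
whole_words = set(lexicon)             # 4 stored units, 1 chunk per word
stems_affix = {"deal", "teach", "er"}  # 3 stored units, 1.5 chunks per word

for lam in (0.1, 0.5):
    print(lam, objective(whole_words, lexicon, lam),
          objective(stems_affix, lexicon, lam))

# Priming idea from the abstract: the prime is chunked onto its target.
print(min_chunks("dealer", stems_affix))  # ['deal', 'er']
```

In this toy setting, a small `lam` (storage is cheap) favours storing whole words, while a larger `lam` favours the stem-plus-affix inventory, which is the kind of shift the single parameter is meant to control.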

Supervisors:  Alessandro Pelizzola, Davide Crepaldi 
Academic year:  2020/21 
Publication type:  Electronic 
Number of Pages:  87 
Subjects:  
Degree programme:  Master's degree programme in Physics of Complex Systems (Fisica dei Sistemi Complessi) 
Degree class:  New organization > Master science > LM-44 - MATHEMATICAL MODELLING FOR ENGINEERING 
Collaborating institutions:  SISSA 
URI:  http://webthesis.biblio.polito.it/id/eprint/17916 