Training and inference of fully connected networks with resistive memories: design and optimization of multiple conductance-based structure, and of a novel architecture to implement an arbitrary activation function

Cristiano, Giorgio

Supervisors: Carlo Ricciardi, Candido Pirri. Politecnico di Torino, Master's degree course in Nanotechnologies For Icts (Nanotecnologie Per Le Ict), 2018

Abstract:

AI now matches or even exceeds human performance in tasks such as classification and recognition, and is therefore increasingly used in everyday life. However, neural networks currently run on non-dedicated hardware such as CPUs or GPUs, and achieving these results requires far more power, time and area than necessary. Considerable effort has therefore gone into developing dedicated hardware, such as the TPU, the digital accelerator developed by Google, to drastically reduce resource usage. However, such accelerators still follow a von Neumann architecture, and the need to move data from memory to the processor remains a severe bottleneck. A possible solution is to encode the synaptic weights in analog memories and perform the operations directly where the data is stored. This can be done by exploiting emerging resistive memory technologies, such as Phase Change Memories (PCMs) or Resistive RAMs (RRAMs). These analog accelerators can provide better results than their digital counterparts, but they rely strongly on the parameters of the non-volatile memories (NVMs) used. This Master's thesis focuses on improving an analog accelerator structure. It studies the parameters needed to achieve software-equivalent accuracy on the MNIST dataset with PCM-like memories, analyzing the impact of many device parameters on network training and final accuracy. It also proposes a novel architecture to map arbitrary non-linear activation functions. This task, easy to achieve in software, is harder in hardware; accurately mapping a function while retaining flexibility in the shape of the functions used is crucial for good forward inference and training of a network. Chapter 1 provides a brief overview of Deep Neural Network (DNN) theory, for both forward inference and training.
Chapter 2 focuses on the implementation of NVM-based array cores to map a fully connected (FC) neural layer, and describes the basic structure needed to encode a signed synaptic weight that can both gradually increase and decrease. It also shows how multiple array cores are connected to build networks with more than two neural layers. Chapter 3 analyses the most promising resistive NVMs and their possible use in the analog NN accelerator with the structure described in the previous chapter. The advantages and drawbacks of different memories, such as PCMs, RRAMs and MRAMs, are shown. Chapter 4 proposes a more complex weight structure, capable of achieving both a large dynamic range and a small minimum step. This chapter also proposes a procedure to optimize the use of this structure for training the network, and evaluates the impact of many device parameters, such as the smallest achievable change and device variability, on training, by comparing the final accuracy achieved with that obtainable with state-of-the-art tools such as TensorFlow. Chapter 5 deals with the challenging task of mapping arbitrary activation functions in hardware. It provides a brief overview of a possible solution, and then proposes a novel structure capable of a highly flexible and efficient conversion that can be massively parallelized across the neurons of a single layer. Finally, the simulation results of an implementation in IBM 90 nm technology are shown, and an analysis in terms of area, time and power is provided.
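
As a rough illustration of the in-memory computation the abstract describes, the sketch below models a signed weight as the difference of two conductances and computes the multiply-accumulate of one fully connected layer as column currents. The pair encoding (G+ minus G-), the function names, and the normalized conductance bounds are assumptions made for this example, not details taken from the thesis.

```python
# Illustrative sketch only: the pair encoding, names and values below are
# assumptions for this example, not the structure used in the thesis.

G_MIN, G_MAX = 0.0, 1.0  # hypothetical normalized conductance bounds


def clip(g):
    """Keep a conductance inside the assumed device range."""
    return max(G_MIN, min(G_MAX, g))


def update_pair(g_plus, g_minus, delta_w):
    """Apply a signed weight change using only conductance increases.

    A positive change raises G+, a negative change raises G-, so the
    effective weight (G+ - G-) can move in both directions.
    """
    if delta_w >= 0:
        return clip(g_plus + delta_w), g_minus
    return g_plus, clip(g_minus - delta_w)


def crossbar_mac(voltages, g_plus, g_minus):
    """Column currents I_j = sum_i V_i * (G+_ij - G-_ij)."""
    n_cols = len(g_plus[0])
    return [
        sum(v * (g_plus[i][j] - g_minus[i][j]) for i, v in enumerate(voltages))
        for j in range(n_cols)
    ]


# Two inputs driving two output columns.
g_plus = [[0.5, 0.25], [0.25, 0.75]]
g_minus = [[0.25, 0.25], [0.5, 0.25]]
currents = crossbar_mac([1.0, 2.0], g_plus, g_minus)
```

In this toy model the saturation in `clip` hints at why device parameters such as dynamic range and minimum step matter: once either conductance reaches its bound, further updates in that direction are lost.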

Supervisors: Carlo Ricciardi, Candido Pirri
Academic year: 2018/19
Publication type: Electronic
Subjects:
Degree course: Master's degree course in Nanotechnologies For Icts (Nanotecnologie Per Le Ict)
Degree class: New regulations > Master's degree > LM-29 - Electronic Engineering
Joint supervision institution: IBM Research - Almaden (United States of America)
Collaborating companies: IBM
URI: http://webthesis.biblio.polito.it/id/eprint/8334