Master Degree Course in Electronic Engineering

Master Degree Thesis

**UVM environment for RISC-V processors**

**Advisor**
Prof. Edgar Ernesto Sanchez Sanchez
Ph.D. Annachiara Ruospo

**Candidate**
Leonardo Barraco

April 2021
Abstract

In the VLSI design flow, functional verification is the task of checking that a digital design complies with its specifications, in order to find bugs in the hardware description before the device is mass-produced. Due to the fast growth of design size and complexity, functional verification has become the bottleneck of the design flow. According to industry surveys, verification can take up to 70% of the total development time, while the design phase requires around 30%. For this reason, a proper verification framework is needed to speed up the verification phase and avoid delays in time-to-market. In this thesis, a simulation-based verification environment has been developed to verify the RISC-V RV32IMFCXpulp processor. In the UVM environment, Agents are responsible for driving input test vectors into the DUV and collecting the output transactions, and the actual results are finally compared with the expected ones. Python scripts are used to generate constrained-random stimuli according to the ISA and to extract simulation results. A good verification effort should reach a coverage greater than 90%, as this parameter represents the confidence in the verification process. Code Coverage, with its metrics, has been used to keep track of the improvements. In order to reach 90.1% coverage, it was necessary to test the processor outside normal operating conditions by injecting suitable test vectors, including illegal instructions (Fault Injection), interrupt requests and asynchronous resets.
This thesis is dedicated to Leonardo, Ignazio, Pia and Adriana,
who, together with my parents, raised me and passed on to me their values and ambitions.
# 3 UVM Testbench

3.1 Overall Structure
3.2 Top
3.3 Wrapper
3.4 Interfaces
   3.4.1 Interface in
   3.4.2 Interface out
3.5 Sequences
   3.5.1 Processor Sequence
   3.5.2 Packet out
3.6 Environment
3.7 Agents
   3.7.1 Agent in
   3.7.2 Agent out
3.8 Driver
3.9 Monitors
   3.9.1 Monitor in
   3.9.2 Monitor out
3.10 Scoreboard
    3.10.1 Decode_check
    3.10.2 Summary of simulation

# 4 Simulation Environment
4.1 ISA Database
4.2 RV Generator
4.3 UVM Env Configurator
   4.3.1 GUI Elements
   4.3.2 GUI Result Frames

# 5 Simulation and Results
5.1 Coverage and metrics
   5.1.1 Statement Coverage
   5.1.2 Branch Coverage
   5.1.3 Focused Condition Coverage
   5.1.4 Focused Expression Coverage
   5.1.5 FSM Coverage
   5.1.6 Toggle Coverage
5.2 Simulations
   5.2.1 Single Simulation
   5.2.2 Multiple Simulations

# 6 Conclusion and Future works
## A ALU Extension

- A.1 Bit Manipulation Operations
- A.2 General ALU Operations
- A.3 Immediate Branching
- A.4 MAC Operations

## B Vectorial Extension

- B.1 Vectorial ALU
- B.2 Vectorial Comparison
# List of Tables

1.1 UVM components
1.2 UVM phases
2.1 Base Immediate Encoding instructions
2.2 Base Register Encoding instructions
2.3 Control Transfer Instructions
2.4 Load and Store instructions
2.5 System instructions
2.6 Mul/Div Instructions
2.7 Compressed Instructions Quadrant 0
2.8 Compressed Instructions Quadrant 1
2.9 Compressed Instructions Quadrant 2
2.10 Register-Immediate loads with post increment
2.11 Register-Register loads with post increment
2.12 Register-Immediate stores with post increment
2.13 Register-Register stores with post increment
2.14 Hardware Loop instruction encoding
4.1 sel string encoding
5.1 Example of illegal instruction
A.1 Bit Manipulation Encoding
A.2 Bit Manipulation Encoding
A.3 General Alu Encoding
A.4 General Alu Encoding
A.5 General Alu Encoding
A.6 Immediate Branching Encoding
A.7 MAC Encoding
A.8 MAC Encoding
B.1 Vectorial General ALU Instructions
B.2 Vectorial General ALU Instructions
B.3 Vectorial Dot Product Instructions
B.4 Vectorial Shuffle-pack Instructions
B.5 Vectorial Shuffle-pack Instructions
B.6 Vectorial comparison Instructions
# List of Figures

<table>
<thead>
<tr>
<th>Figure</th>
<th>Title</th>
<th>Page</th>
</tr>
</thead>
<tbody>
<tr>
<td>1.1</td>
<td>VLSI Design Flow</td>
<td>2</td>
</tr>
<tr>
<td>1.2</td>
<td>Statistics showing the increase of time spent in verification phase[8]</td>
<td>2</td>
</tr>
<tr>
<td>1.3</td>
<td>RISCV-DV framework architecture</td>
<td>3</td>
</tr>
<tr>
<td>1.4</td>
<td>MIPS UVM framework architecture</td>
<td>4</td>
</tr>
<tr>
<td>1.5</td>
<td>Functional Verification Aspects [21]</td>
<td>6</td>
</tr>
<tr>
<td>1.6</td>
<td>Scoreboard approach</td>
<td>7</td>
</tr>
<tr>
<td>1.7</td>
<td>Phases of Random stimuli based verification</td>
<td>8</td>
</tr>
<tr>
<td>1.8</td>
<td>UVM Classes Diagram [13]</td>
<td>9</td>
</tr>
<tr>
<td>1.9</td>
<td>Transaction Level Modeling</td>
<td>11</td>
</tr>
<tr>
<td>2.1</td>
<td>RI5CY Architecture Block Diagram</td>
<td>13</td>
</tr>
<tr>
<td>2.2</td>
<td>RI5CY Architecture Block Diagram</td>
<td>24</td>
</tr>
<tr>
<td>2.3</td>
<td>General Purpose Register File</td>
<td>26</td>
</tr>
<tr>
<td>3.1</td>
<td>UVM Framework Structure</td>
<td>30</td>
</tr>
<tr>
<td>3.2</td>
<td>Processor interface block diagram</td>
<td>32</td>
</tr>
<tr>
<td>3.3</td>
<td>Processor out interface block diagram</td>
<td>34</td>
</tr>
<tr>
<td>3.4</td>
<td>Processor Environment block diagram</td>
<td>38</td>
</tr>
<tr>
<td>3.5</td>
<td>Result of AUIPC</td>
<td>41</td>
</tr>
<tr>
<td>3.6</td>
<td>Result of Branch not taken</td>
<td>41</td>
</tr>
<tr>
<td>3.7</td>
<td>Result of branch taken</td>
<td>42</td>
</tr>
<tr>
<td>3.8</td>
<td>decode and check function collapsed</td>
<td>47</td>
</tr>
<tr>
<td>3.9</td>
<td>Plot of simulation summary</td>
<td>50</td>
</tr>
<tr>
<td>4.1</td>
<td>Random Program generated by RVGEN2.py</td>
<td>58</td>
</tr>
<tr>
<td>4.2</td>
<td>UVM Env Graphical User Interface</td>
<td>59</td>
</tr>
<tr>
<td>4.3</td>
<td>Simulation Result frame</td>
<td>61</td>
</tr>
<tr>
<td>4.4</td>
<td>Single Coverage Result frame</td>
<td>62</td>
</tr>
<tr>
<td>4.5</td>
<td>Aggregate Coverage Result frame</td>
<td>62</td>
</tr>
<tr>
<td>4.6</td>
<td>Coverage Trend frame</td>
<td>63</td>
</tr>
<tr>
<td>5.1</td>
<td>Expression coverage</td>
<td>67</td>
</tr>
<tr>
<td>5.2</td>
<td>alu div FSM example</td>
<td>67</td>
</tr>
<tr>
<td>5.3</td>
<td>FSM coverage</td>
<td>68</td>
</tr>
<tr>
<td>5.4</td>
<td>Coverage trends</td>
<td>69</td>
</tr>
<tr>
<td>5.5</td>
<td>Results of the simulation</td>
<td>70</td>
</tr>
<tr>
<td>5.6</td>
<td>Coverage Report</td>
<td>72</td>
</tr>
<tr>
<td>5.7</td>
<td>Instruction Set Coverage reports</td>
<td>73</td>
</tr>
<tr>
<td>5.8</td>
<td>Instruction Set Coverage reports</td>
<td>74</td>
</tr>
<tr>
<td>5.9</td>
<td>Coverage Report</td>
<td>75</td>
</tr>
<tr>
<td>5.10</td>
<td>Coverage Report</td>
<td>77</td>
</tr>
<tr>
<td>5.11</td>
<td>Instruction Set Coverage reports</td>
<td>78</td>
</tr>
<tr>
<td>5.12</td>
<td>Coverage Report</td>
<td>79</td>
</tr>
</tbody>
</table>
List of Acronyms

ALU  Arithmetic Logic Unit
API  Application Programming Interface
APU  Auxiliary Processing Unit
AUIPC  Add Upper Immediate to Program Counter
CDV  Coverage Driven Verification
CSR  Control and Status Registers
CSV  Comma Separated Values
DUV  Device Under Verification
FCC  Focused Condition Coverage
FEC  Focused Expression Coverage
FIFO  First In First Out
FPU  Floating Point Unit
FP  Floating Point
FSM  Finite State Machine
GPR  General Purpose Register
GUI  Graphical User Interface
HDL  Hardware Description Language
IC  Integrated Circuit
ISA  Instruction Set Architecture
ISG  Instruction Stream Generator
ISS  Instruction Set Simulator
JALR  Jump and Link Register
JAL  Jump and Link
LUI  Load Upper Immediate
MOS  Metal-Oxide Semiconductor
OBI  Open Bus Interface
OOP  Object Oriented Programming
OPIMM  Operand Immediate
OVM  Open Verification Methodology
PC  Program Counter
PULP  Parallel Ultra Low Power
RAM  Random Access Memory
RD  Register Destination
RISC  Reduced Instruction Set Computer
ROM  Read Only Memory
RS  Register Source
RTL  Register Transfer Level
SIMD  Single Instruction Multiple Data
STMTS  Statements
SoC  System on Chip
TLM  Transaction Level Modeling
UVC  UVM Verification Component
UVM  Universal Verification Methodology
VECOP  Vectorial Operation
VHDL  VHSIC Hardware Description Language
VLSI  Very Large Scale Integration
eRM  e Reuse Methodology
Chapter 1

Introduction

1.1 Goal of the thesis

The goal of this thesis is to build a verification environment based on the UVM methodology to verify a RISC-V architecture that includes not only the base ISA but also standard and proprietary extensions. Code Coverage is used with all its metrics enabled. Reaching a high coverage level (> 90%) is important to ensure that both standard situations and corner cases have been verified.

1.2 Motivation

According to Moore's law [19], first stated in 1965, the number of transistors per integrated circuit doubles approximately every two years, and thanks to the technological progress in silicon manufacturing this law is still valid. Because a larger number of transistors can be integrated in the same chip area, device complexity is increasing and engineers are able to implement complex systems that provide more functionalities on a single chip (SoC).

VLSI is the process of producing an IC that integrates millions of MOS transistors on a single chip. Microprocessors and memory chips are typical examples of VLSI devices. The VLSI design cycle starts with a formal specification of a VLSI chip, follows a series of steps, and eventually leads to the production of a packaged chip. A typical design cycle may be represented by the flow chart shown in Fig. 1.1.

As complexity increases, the probability of having bugs in the hardware description increases too. According to "The 2020 Wilson Research Group Functional Verification Study" (Fig. 1.2), the average time spent in HDL coding represents 30% of the total time, and around 60-70% of the time is spent verifying that the architecture meets the required specification [8]. In general, a design rarely meets the specification at the first verification attempt, and delays in hardware verification lead to delays in time to market, which are a major issue for a company. It is clear that developing a verification framework is fundamental in order to reduce the time necessary to produce a verified hardware design.

1.2.1 State of the art

Before moving on, it is useful to review what has already been done in the field of UVM-based RISC-V verification.
RISC-V DV

RISC-V DV is an open-source SV/UVM-based RISC-V verification environment. It is available on GitHub\(^1\) and is supported by Google. The structure of the verification environment is reported in Fig. 1.3.

The RISC-V ISG produces a constrained set of assembly programs, which are then cross-compiled and fed to the ISS and to the RTL. Both the DUV and the reference model write the log of the simulation to a .csv file. Finally, the log files are compared to find any discrepancies. The Instruction Set Simulator is configurable by modifying the ISS.yaml file; the supported ISSs are SPIKE, Imperas OVPsim, Western Digital Whisper and SAIL_RISCV. Being UVM-based, it is compatible with the major HDL simulator vendors, such as Synopsys, Cadence, Mentor Graphics and Metrics. As it is an open-source project, it can be cloned from GitHub and its source code can be adapted to the DUV. The Device Under Verification has proprietary extensions, that is, custom instructions, so two modifications would be required:

- Implement custom ISA in Instruction Stream Generation;
- Implement a custom ISS capable of dealing with Pulp-proprietary extensions.

Unfortunately, the ISS developed by the RI5CY developers is not an open-source project, so a large amount of work would be necessary to develop a suitable ISS.

Processor-UVM-Verification by Anish Gupta

This project, available on GitHub\(^2\), has been the starting point of this thesis work.

---

\(^1\)https://github.com/google/riscv-dv

\(^2\)https://github.com/gupta409/Processor-UVM-Verification
It is a SystemVerilog-based verification environment for a 5-stage pipelined MIPS processor. The project embeds a simple UVM environment with UVCs derived from the base classes. Random instructions are generated inside the sequencer using SystemVerilog constraints. This project is quite different from what is needed to verify RI5CY, but it is a good example of a UVM framework in which the reference model is not an external ISS but is embedded in the scoreboard, allowing a run-time check of the results.

1.3 Introduction to Verification

According to Andrew Piziali, the most appropriate definition of functional verification is "Demonstrating the intent of a design is preserved in its implementation"[21]. It is important to remember that the first steps of the VLSI flow have a high abstraction level, while the design becomes less abstract once it reaches the HDL coding step. The main consequence is that, with each transformation during the design process, the intent is clarified, removing both ambiguity and redundancy. The implementation is the RTL realization of the design, written in an HDL such as Verilog, VHDL or SystemVerilog. Verification is a comparative process between the RTL implementation and the intent, used to find functional logic errors. Logic errors, or bugs, are differences between the observed behaviour of the DUV and its expected behaviour (intent). This kind of error can be introduced by designers who misinterpret the specifications, or can result from ambiguous specifications.

1.3.1 Verification Methods

According to Andrew Piziali [21], there are two main methods to verify a DUV:

- Static Methods;
- Dynamic Methods.

Static Methods

Static methods are not simulation-based and use a mathematical model of the design to determine whether any assertion is violated. As a result, it is not necessary to generate stimuli and drive them into the DUV to verify the design. This can be considered an advantage, as the most time-consuming step in dynamic methods is the generation of suitable test vectors. On the other hand, static methods have significant disadvantages when verifying complex architectures made up of multiple blocks. Static methods are an effective verification tool if the DUV is a small block and the verification engineer is interested in its behaviour regardless of its interaction with other blocks.

Dynamic Methods

Dynamic methods are simulation-based and require a simulation environment. They consist of simulating the DUV by applying test vectors and comparing its response against the expected behaviour. The simulation environment should be able to record verification progress using coverage metrics. Dynamic methods are advantageous because, in principle, they allow any test condition to be verified. However, for a large design they can be time-consuming, as the number of test vectors increases dramatically.

1.3.2 Verification Plan

The verification plan defines what must be verified in a hardware design, the verification strategy, and the coverage metrics that must be set and then met before moving to the next step of the design flow. The verification plan is composed of three important aspects:

- Coverage Measurements;
- Stimulus Generation;
- Response Checking.

Figure 1.5: Functional Verification Aspects [21]

Coverage Measurement

The coverage measurement section of the verification plan describes the verification scope. It is the most important section because it is not possible to determine whether all bugs have been found, so metrics are required to estimate the level of coverage that has been achieved. This section lists the kinds of coverage used (functional, code, assertion) and, where applicable, the metrics.

Stimulus generation

The stimulus generation part is responsible for generating the input test vectors required to fully exercise the DUV and exhibit all its possible behaviours. That means generating not only valid test vectors, showing that the device works as intended, but also invalid test vectors that drive the device into corner cases. An important aspect of stimulus generation is therefore verifying situations that occur only outside the normal operating parameters, in order to check the error detection logic of the DUV. The objective of stimulus generation is to produce test vectors that allow a high coverage level to be reached.

Response Checking

The response checking section is responsible for verifying that DUV responses conform to the specifications. There are two different strategies:
- Reference Model Check;
- Distributed data and temporal check.

**Reference Model Check**

This approach requires a reference model, that is, an implementation of the DUV at a higher abstraction level. The reference model runs alongside the DUV and receives the same input test vectors. The responses coming from the DUV are compared to the expected results provided by the reference model. The drawback of this approach is that building a reliable reference model can be a complex task, comparable to the design process itself.

**Distributed data and temporal check**

This second strategy uses temporal checks on monitored signals to capture the device behaviour. One common approach is based on Monitors and a Scoreboard, arranged as shown in Fig. 1.6. Input packets are captured by the input monitor and sent to the reference model residing in the scoreboard, while DUV outputs are collected by the output monitor and sent to the checker in the scoreboard. Inside the scoreboard, input packets are processed according to the specification to produce the expected outputs. Finally, the checker reports a Pass or a Fail according to the result of the comparison.

![Scoreboard approach](image_url)

Figure 1.6: Scoreboard approach
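
The scoreboard data flow described above can be illustrated with a minimal Python sketch. It is not the SystemVerilog scoreboard of this thesis: the `InputPacket` fields, the two-operation toy specification and the method names `write_input` and `write_output` are assumptions made only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputPacket:
    """Hypothetical input transaction captured by the input monitor."""
    op: str
    a: int
    b: int


def reference_model(pkt: InputPacket) -> int:
    """Expected DUV output, computed from a toy two-operation specification."""
    if pkt.op == "add":
        return (pkt.a + pkt.b) & 0xFFFFFFFF  # 32-bit wrap-around
    if pkt.op == "sub":
        return (pkt.a - pkt.b) & 0xFFFFFFFF
    raise ValueError(f"unknown operation: {pkt.op}")


class Scoreboard:
    """Holds the reference model and the checker."""

    def __init__(self) -> None:
        self.expected = []  # expected outputs, in arrival order
        self.passed = 0
        self.failed = 0

    def write_input(self, pkt: InputPacket) -> None:
        """Called by the input monitor for every captured input packet."""
        self.expected.append(reference_model(pkt))

    def write_output(self, actual: int) -> bool:
        """Called by the output monitor; returns True on Pass, False on Fail."""
        expected = self.expected.pop(0)
        ok = actual == expected
        if ok:
            self.passed += 1
        else:
            self.failed += 1
        return ok


sb = Scoreboard()
sb.write_input(InputPacket("add", 1, 2))
sb.write_input(InputPacket("sub", 0, 1))
first = sb.write_output(3)   # equals the expected result: Pass
second = sb.write_output(0)  # the expected result is 0xFFFFFFFF: Fail
```

Keeping the expected results in a queue assumes that the DUV produces its outputs in the same order as its inputs. A design that completes operations out of order would need a different matching policy.
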
1.4 The Universal Verification Methodology

The previous section explained the verification problem and made clear the need for a universal methodology to increase the speed and the efficiency of the verification process.

UVM is a standardized methodology for verifying IC designs. It is derived mainly from OVM, which was based on the eRM by Verisity Design. The advantage of using a universal methodology is that the best practices for an exhaustive verification are codified and UVCs are provided. It is open-source and compatible with all the major commercial simulators, such as those from Aldec, Cadence, Mentor Graphics and Synopsys.

1.4.1 Coverage Driven Verification

UVM provides a complete framework to achieve Coverage Driven Verification, combining automatic test-vector generation, a self-checking testbench and coverage measurements. UVM makes it possible to create a test environment capable of exploiting the "controlled randomness" of the input vectors to discover design bugs sooner. It is also possible to meet verification goals by changing testbench parameters, and in this way to run specific simulations that reach scenarios which are not easy to reach randomly (corner cases).

Fig. 1.7 shows that random tests are sufficient to reach about 50% of the coverage goal. After the first random simulations, it is necessary to adjust the input sequences and add constraints to them in order to reach corner cases.

![Figure 1.7: Phases of Random stimuli based verification](image-url)
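
The two phases of coverage-driven verification can be illustrated with a toy Python sketch. The coverage bins, the split between "normal" and corner-case stimuli and the function names are hypothetical. Real coverage is collected by the simulator on the RTL, not on a set of labels as done here.

```python
import random

# Hypothetical coverage bins: five normal scenarios and three corner cases.
BINS = ["add", "sub", "load", "store", "branch",
        "illegal", "interrupt", "reset"]
NORMAL = BINS[:5]


def random_test(rng: random.Random, allowed: list) -> str:
    """Draw one stimulus from the allowed set (the current constraint)."""
    return rng.choice(allowed)


def run_cdv(seed: int = 0, random_tests: int = 20) -> tuple:
    """Return the coverage after the random phase and after the constrained phase."""
    rng = random.Random(seed)
    covered = set()

    # Phase 1: stimuli constrained to normal operating conditions only.
    for _ in range(random_tests):
        covered.add(random_test(rng, NORMAL))
    after_random = len(covered) / len(BINS)

    # Phase 2: tighten the constraints towards the bins still uncovered.
    holes = [b for b in BINS if b not in covered]
    while holes:
        covered.add(random_test(rng, holes))
        holes = [b for b in BINS if b not in covered]

    return after_random, len(covered) / len(BINS)
```

In this model the random phase can never exceed 5/8 of the bins, because the corner cases lie outside its constraints. Coverage is closed only when the constraints are redirected at the remaining holes.
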

1.4.2 UVM Components

UVM is based on OOP, which increases reusability, a fundamental concept in the verification process. The UVM Library provides a set of useful classes from which objects and components can be derived; each class contains methods for common operations. Thanks to OOP, verification engineers can derive objects and components from the base classes and modify them to obtain customized classes.

![UVM Classes Diagram](image)

Figure 1.8: UVM Classes Diagram [13]

The uvm_object class is the base class for all UVM data and hierarchical classes. It contains a set of methods for common operations:

- create;
- copy;
- compare;

The `uvm_component` class is the base class of the UVM framework components shown in Tab. 1.1.

<table>
<thead>
<tr>
<th>Component</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>uvm_driver</td>
<td>Drives signals to DUV</td>
</tr>
<tr>
<td>uvm_monitor</td>
<td>Monitors signals</td>
</tr>
<tr>
<td>uvm_sequencer</td>
<td>Creates input vectors</td>
</tr>
<tr>
<td>uvm_agent</td>
<td>Contains Sequencer, Driver and Monitor</td>
</tr>
<tr>
<td>uvm_env</td>
<td>Contains all the components of the framework</td>
</tr>
<tr>
<td>uvm_scoreboard</td>
<td>It represents the checker</td>
</tr>
<tr>
<td>uvm_subscriber</td>
<td>Receives transactions to perform functional coverage analysis</td>
</tr>
</tbody>
</table>

Table 1.1: UVM components

In UVM, phases are used as a synchronization mechanism in the simulation: each component has to pass through the phases and must wait for the other components before moving to the next one. The main phases are shown in Tab. 1.2.

<table>
<thead>
<tr>
<th>Phase</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>build_phase</td>
<td>Components build and instantiation</td>
</tr>
<tr>
<td>connect_phase</td>
<td>Connect components through TLM ports</td>
</tr>
<tr>
<td>end_of_elaboration_phase</td>
<td>Ensures that all the connections are properly set</td>
</tr>
<tr>
<td>start_of_simulation_phase</td>
<td>Initialization of components to avoid zero time dependencies</td>
</tr>
<tr>
<td>run_phase</td>
<td>Time-consuming operations are performed during this phase</td>
</tr>
<tr>
<td>extract_phase</td>
<td>Simulation has been completed and results can be extracted</td>
</tr>
<tr>
<td>check_phase</td>
<td>Checks the extracted results against the expected ones</td>
</tr>
<tr>
<td>report_phase</td>
<td>Display result or summary of check phase</td>
</tr>
</tbody>
</table>

Table 1.2: UVM phases
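
The synchronization role of the phases can be illustrated with a simplified Python sketch. The `Component` class and the `run_all_phases` function are hypothetical. The sketch also ignores two aspects of real UVM: the top-down or bottom-up traversal order of each phase, and the fact that `run_phase` is a time-consuming task executed concurrently by all components.

```python
# Phase names taken from Tab. 1.2, in execution order.
PHASES = [
    "build_phase",
    "connect_phase",
    "end_of_elaboration_phase",
    "start_of_simulation_phase",
    "run_phase",
    "extract_phase",
    "check_phase",
    "report_phase",
]


class Component:
    """Hypothetical stand-in for a UVM component."""

    def __init__(self, name: str, log: list) -> None:
        self.name = name
        self.log = log

    def execute(self, phase: str) -> None:
        # A real component would override one method per phase;
        # here the call is only recorded to show the ordering.
        self.log.append((phase, self.name))


def run_all_phases(components: list) -> None:
    """Every component completes a phase before any component starts the next."""
    for phase in PHASES:
        for component in components:
            component.execute(phase)


log = []
run_all_phases([Component("driver", log), Component("monitor", log)])
```

After the call, `log` shows that both components complete `build_phase` before either of them enters `connect_phase`, and so on for the remaining phases.
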

UVM uses TLM APIs to facilitate the inter-communication between UVM components. Sequences and methods are combined to form a packet (transaction) and each UVM component can use predefined methods (put and get) to send or receive transactions.
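
The put/get style of communication can be mimicked in Python with a thread-safe FIFO from the standard library. This is only an analogy: `queue.Queue` stands in for a TLM FIFO, and the `Producer` and `Consumer` classes and the transaction fields are assumptions made for this example.

```python
import queue


class Producer:
    """Hypothetical component that sends transactions, e.g. a monitor."""

    def __init__(self, fifo: queue.Queue) -> None:
        self.fifo = fifo

    def send(self, transaction: dict) -> None:
        self.fifo.put(transaction)


class Consumer:
    """Hypothetical component that receives transactions, e.g. a scoreboard."""

    def __init__(self, fifo: queue.Queue) -> None:
        self.fifo = fifo

    def receive(self) -> dict:
        # get() blocks until a transaction is available.
        return self.fifo.get()


fifo = queue.Queue()
monitor = Producer(fifo)
scoreboard = Consumer(fifo)

monitor.send({"pc": 0x0, "instr": 0x00510093})
monitor.send({"pc": 0x4, "instr": 0x002081B3})
first = scoreboard.receive()
second = scoreboard.receive()
```

The two components never reference each other directly. They share only the FIFO, which is what makes each of them reusable in a different environment.
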
1.5 Introduction to RISCV Processors

The Device Under Verification in this thesis is RV32IMFCXpulp, a RISC-V processor developed by the Integrated Systems Laboratory (IIS) of ETH Zürich and the Energy-efficient Embedded Systems (EEES) group of the University of Bologna in 2013.

RISC-V is an open standard ISA based on RISC principles, developed in the EECS Department of the University of California, Berkeley. It was originally developed to support computer architecture research and education, but it has since become a standard architecture. The RISC-V ISA is provided under an open-source license.

The base integer ISA (RV32I) is sufficient to perform the basic operations typical of a modern instruction set. It contains 40 unique instructions encoded in four different formats (R/I/S/U). Each instruction has a fixed length of 32 bits and must be aligned on a four-byte boundary in memory. In order to simplify decoding, some fields (such as the opcode and the source and destination registers) keep the same position in all the formats in which they appear.
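
The fixed field positions can be illustrated with a short Python sketch. The bit ranges are those of the RISC-V base ISA. The helper names `bits`, `common_fields` and `is_aligned` are chosen only for this example.

```python
def bits(word: int, hi: int, lo: int) -> int:
    """Return word[hi:lo] (inclusive bit range) as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def common_fields(instr: int) -> dict:
    """Extract the fields that keep a fixed position across the formats.

    Not every format uses every field (a U-type instruction has no rs1,
    for example); the caller selects the relevant ones from the opcode.
    """
    return {
        "opcode": bits(instr, 6, 0),
        "rd": bits(instr, 11, 7),
        "funct3": bits(instr, 14, 12),
        "rs1": bits(instr, 19, 15),
        "rs2": bits(instr, 24, 20),
    }


def is_aligned(address: int) -> bool:
    """Check the four-byte alignment required for 32-bit instructions."""
    return address % 4 == 0
```

For instance, `common_fields(0x002081B3)`, the encoding of `add x3, x1, x2`, yields opcode `0x33`, `rd = 3`, `rs1 = 1` and `rs2 = 2`.
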

RISC-V has 32 integer registers: x0 is hardwired to 0, while x1-x31 are general purpose. Except for memory access instructions, instructions operate only on registers; load and store instructions are used to transfer data from and to memory. In addition to RV32I, other extensions have been developed and are usually identified by a letter:

- 'M' Standard Extension for Integer Multiplication and Division;
- 'A' Standard Extension for Atomic Instructions;
- 'F' Standard Extension for Single-Precision Floating-Point;
- 'D' Standard Extension for Double-Precision Floating-Point;
- 'Zicsr' Control and Status Register (CSR);
- 'Zifencei' Instruction-Fetch Fence;
- 'Q' Standard Extension for Quad-Precision Floating-Point;
- 'L' Standard Extension for Decimal Floating-Point;
- 'C' Standard Extension for Compressed Instructions;
- 'B' Standard Extension for Bit Manipulation;
- 'J' Standard Extension for Dynamically Translated Languages;
- 'T' Standard Extension for Transactional Memory;
- 'P' Standard Extension for Packed-SIMD Instructions;
- 'V' Standard Extension for Vector Operations;
- 'N' Standard Extension for User-Level Interrupts;
- 'H' Standard Extension for Hypervisor;
- 'Zam' Misaligned Atomics;
- 'Ztso' Total Store Ordering.

Some of them have been ratified, while others are still open and subject to change.
Chapter 2

RISC-V PULP

Before describing the UVM framework, it is necessary to introduce the device under verification and its main characteristics. The DUV is RV32IMFCXpulp, also known as RI5CY, a RISC-V processor core developed jointly by ETH Zürich and the University of Bologna. RI5CY is an open-source processor provided under the permissive Solderpad open-source license. As the PULP name suggests, this processor focuses on energy efficiency, limiting power consumption when it is idle. The processor block diagram is shown in Fig. 2.1.

![RI5CY Architecture Block Diagram](image)

Figure 2.1: RI5CY Architecture Block Diagram

It is a 32-bit pipelined architecture with 4 stages, clearly visible in the previous figure, separated by pipeline registers:

- Instruction Fetch [IF];
- Instruction Decode [ID];
- Execution [EX];
- WriteBack [WB].

It has a large instruction set, supporting some of the RISC-V standard extensions and some proprietary extensions. As the name 'RV32IMFCXpulp' suggests, the supported extensions are:

- I → Base Integer Instruction Set;
- M → Integer Multiplication and Division Instruction Set;
- F → Single precision Floating point Instruction Set;
- C → Compressed Instruction Set;
- Xpulp → Pulp specific extensions including:
  - Post-incrementing loads and stores;
  - Multiply and accumulate extension;
  - ALU Extensions;
  - Hardware Loops.
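
The way the name encodes the supported extensions can be sketched in Python as follows. The parser is deliberately simplified and the function name `parse_isa` is hypothetical: it handles only a base width, single-letter standard extensions and one custom extension introduced by 'X'. The full RISC-V naming convention (multi-letter 'Z' extensions, underscores, version numbers) is not covered.

```python
import re

# RV<width>, one or more single-letter extensions (X excluded),
# then an optional custom extension such as "Xpulp".
_ISA_PATTERN = re.compile(r"RV(\d+)([A-WYZ]+)(X[a-z]+)?")


def parse_isa(name: str) -> tuple:
    """Split an ISA name into (register width, standard letters, custom extension)."""
    match = _ISA_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"unsupported ISA name: {name}")
    width, letters, custom = match.groups()
    return int(width), list(letters), custom
```

Under these assumptions, `parse_isa("RV32IMFCXpulp")` separates the 32-bit base width, the letters I, M, F and C, and the custom `Xpulp` extension.
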

2.1 Complete ISA with extensions

In addition to the extensions already listed, RI5CY also supports vectorial instructions. In this section, each extension is briefly analyzed, showing the available instructions.

2.1.1 Base Integer

The base integer instruction set contains:

- Integer Computational Instructions;
- Control Transfer Instructions;
- Load and Store instructions;
- Memory Ordering Instructions;
- System Instructions.
**Integer Computational Instructions**

Integer Computational Instructions operate on 32-bit operands stored in the integer register file. They are encoded as R-type or I-type and, depending on their input operands, can be further divided into:

- Integer Register-Immediate Instructions (I-Type);
- Integer Register-Register Instructions (R-Type).

The destination is register rd for both register-immediate and register-register instructions. R-type instructions use two operands read from the register file at the source addresses specified in the instruction fields, while I-type instructions use one operand read from the register file (rs1) and one specified in the immediate field of the instruction. The immediate must be sign-extended before being used as an operand in the execution stage. The available register-immediate instructions are reported in Tab. 2.1.

<table>
<thead>
<tr>
<th>instr[31:20]</th>
<th>instr[19:15]</th>
<th>instr[14:12]</th>
<th>instr[11:7]</th>
<th>instr[6:0]</th>
<th>Instruction</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm[11:0]</td>
<td>rs1</td>
<td>000</td>
<td>rd</td>
<td>0010011</td>
<td>ADDI</td>
</tr>
<tr>
<td>imm[11:0]</td>
<td>rs1</td>
<td>010</td>
<td>rd</td>
<td>0010011</td>
<td>SLTI</td>
</tr>
<tr>
<td>imm[11:0]</td>
<td>rs1</td>
<td>011</td>
<td>rd</td>
<td>0010011</td>
<td>SLTIU</td>
</tr>
<tr>
<td>imm[11:0]</td>
<td>rs1</td>
<td>100</td>
<td>rd</td>
<td>0010011</td>
<td>XORI</td>
</tr>
<tr>
<td>imm[11:0]</td>
<td>rs1</td>
<td>110</td>
<td>rd</td>
<td>0010011</td>
<td>ORI</td>
</tr>
<tr>
<td>imm[11:0]</td>
<td>rs1</td>
<td>111</td>
<td>rd</td>
<td>0010011</td>
<td>ANDI</td>
</tr>
<tr>
<td>0000000, shamt[4:0]</td>
<td>rs1</td>
<td>001</td>
<td>rd</td>
<td>0010011</td>
<td>SLLI</td>
</tr>
<tr>
<td>0000000, shamt[4:0]</td>
<td>rs1</td>
<td>101</td>
<td>rd</td>
<td>0010011</td>
<td>SRLI</td>
</tr>
<tr>
<td>0100000, shamt[4:0]</td>
<td>rs1</td>
<td>101</td>
<td>rd</td>
<td>0010011</td>
<td>SRAI</td>
</tr>
</tbody>
</table>

Table 2.1: Base Immediate Encoding instructions

ADDI, SLTI, SLTIU, XORI, ORI and ANDI are standard operations that use the whole immediate field to obtain the immediate operand, while SLLI, SRLI and SRAI use only the 5 LSBs of the immediate to define the shift amount (SHAMT); the remaining bits of the immediate field are still needed by the decoder, for example to distinguish SRLI from SRAI.
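
The two uses of the immediate field can be illustrated with a Python sketch of a register-immediate decoder. The encodings follow the RV32I base ISA. The function names are chosen for this example and do not correspond to the decode function implemented in the scoreboard of this thesis.

```python
OP_IMM = 0b0010011  # opcode shared by all register-immediate instructions


def sign_extend(value: int, width: int) -> int:
    """Interpret the low `width` bits of value as a two's complement number."""
    sign_bit = 1 << (width - 1)
    return (value ^ sign_bit) - sign_bit


def decode_op_imm(instr: int) -> tuple:
    """Return (mnemonic, operand): a signed immediate or a shift amount."""
    if instr & 0x7F != OP_IMM:
        raise ValueError("not a register-immediate instruction")
    funct3 = (instr >> 12) & 0x7
    imm12 = (instr >> 20) & 0xFFF

    if funct3 in (0b001, 0b101):
        shamt = imm12 & 0x1F  # only the 5 LSBs hold the shift amount
        upper = imm12 >> 5    # the upper 7 bits select the operation
        if upper == 0b0000000:
            return ("SLLI" if funct3 == 0b001 else "SRLI"), shamt
        if upper == 0b0100000 and funct3 == 0b101:
            return "SRAI", shamt
        raise ValueError("reserved shift encoding")

    names = {0b000: "ADDI", 0b010: "SLTI", 0b011: "SLTIU",
             0b100: "XORI", 0b110: "ORI", 0b111: "ANDI"}
    return names[funct3], sign_extend(imm12, 12)
```

As an example, `0xFFF10093` decodes to `ADDI` with immediate -1, while `0x40315093` and `0x00315093` differ only in the upper bits of the immediate field and decode to `SRAI` and `SRLI` with a shift amount of 3.
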

The available register-register instructions are reported in Tab. 2.2.

Table 2.2: Base Register Encoding instructions

Here the two operands are specified through their register file addresses in fields \( \text{instr}[24:20] \) (rs2) and \( \text{instr}[19:15] \) (rs1), while the field \( \text{instr}[31:25] \) is used by the decoder to distinguish between ADD and SUB and between SRL and SRA.
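
As a minimal sketch of this decoding step, the following Python fragment extracts the R-type fields and uses `instr[31:25]` to separate the instruction pairs that share the same funct3. The function names and the lookup table are hypothetical and cover only the four instructions named above.

```python
R_TYPE_OPCODE = 0b0110011

# (funct7, funct3) -> mnemonic for the pairs that differ only in instr[31:25]
R_TYPE_NAMES = {
    (0b0000000, 0b000): "ADD",
    (0b0100000, 0b000): "SUB",
    (0b0000000, 0b101): "SRL",
    (0b0100000, 0b101): "SRA",
}


def encode_r_type(funct7, rs2, rs1, funct3, rd):
    """Assemble an R-type word from its fields."""
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | R_TYPE_OPCODE)


def decode_r_type(instr):
    """Return (mnemonic, rd, rs1, rs2) for an R-type word."""
    funct7 = (instr >> 25) & 0x7F
    rs2 = (instr >> 20) & 0x1F   # instr[24:20]
    rs1 = (instr >> 15) & 0x1F   # instr[19:15]
    funct3 = (instr >> 12) & 0x7
    rd = (instr >> 7) & 0x1F
    return R_TYPE_NAMES.get((funct7, funct3), "OTHER"), rd, rs1, rs2
```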

**Control Transfer Instructions**

RV32I provides two different types of control transfer instructions:

- Unconditional jumps;
- Conditional jumps (i.e. Branches).

The unconditional jump instructions are JAL and JALR. They differ in encoding (J-type for JAL, I-type for JALR) and in behaviour. Both store in register rd the address of the instruction following the jump \((pc+4)\), but the jump targets are obtained in different ways. In JAL, a 20-bit immediate encodes a signed offset in multiples of 2 bytes; it is sign-extended and added to the address of the jump instruction, so the jump target lies within a \( \pm 1\text{MiB} \) range. In JALR, the target address is obtained by adding the sign-extended 12-bit immediate to the content of register rs1 and then clearing the least-significant bit of the result.

The other type of control transfer instruction is the conditional branch. In this kind of instruction the contents of two registers are compared; if the resulting condition is true, the branch is taken. The branch target is obtained by sign-extending the 12-bit immediate, which encodes the offset in multiples of 2 bytes, and adding it to the address of the branch instruction. The branch instructions differ in the type of comparison:

- **BEQ** → Branch if equal;
- **BNE** → Branch if not equal;
- **BLT/BLTU** → Branch if lower (Signed and Unsigned);
- **BGE/BGEU** → Branch if greater or equal (Signed and Unsigned);
- **BLE/BLEU** → Branch if lower or equal (Signed and Unsigned); these are assembler pseudo-instructions obtained from BGE/BGEU by swapping the operands;
- **BGT/BGTU** → Branch if greater (Signed and Unsigned); these are assembler pseudo-instructions obtained from BLT/BLTU by swapping the operands.

All the control transfer instructions are summarized in Tab. 2.3.


<table>
<thead>
<tr>
<th>imm[11:0]</th>
<th>rs1</th>
<th>funct3</th>
<th>rd</th>
<th>opcode</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm[11:0]</td>
<td>rs1</td>
<td>000</td>
<td>rd</td>
<td>1100111</td>
<td>JALR</td>
</tr>
</tbody>
</table>

Table 2.3: Control Transfer Instructions
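
The target computations and branch conditions described in this subsection can be modelled as follows. This is a behavioural sketch with hypothetical function names; offsets are passed as already-decoded signed byte offsets for JAL, and as the raw 12-bit field for JALR.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def jal(pc, offset):
    """Return (link value written to rd, jump target) for a signed byte offset."""
    return (pc + 4) & MASK32, (pc + offset) & MASK32


def jalr(pc, rs1_value, imm12):
    """Return (link value, target); the target has its least-significant bit cleared."""
    target = (rs1_value + sign_extend(imm12, 12)) & MASK32 & ~1
    return (pc + 4) & MASK32, target


def branch_taken(name, a, b):
    """Evaluate the condition of a base branch on two 32-bit register values."""
    sa, sb = sign_extend(a, 32), sign_extend(b, 32)
    ua, ub = a & MASK32, b & MASK32
    return {
        "BEQ": ua == ub, "BNE": ua != ub,
        "BLT": sa < sb, "BGE": sa >= sb,
        "BLTU": ua < ub, "BGEU": ua >= ub,
    }[name]
```

The signed and unsigned variants differ only in how the same 32-bit patterns are interpreted: `0xFFFFFFFF` is lower than 1 for BLT but greater than 1 for BLTU.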

**Load and Store Instructions**

RISC-V is a load-store architecture: standard arithmetic instructions are not allowed to read or write data memory, and only load and store instructions can access RAM. These operations transfer values between the registers and memory. In particular, load instructions copy a value from memory to register rd, and store instructions copy the value contained in rs2 to data memory.

The effective memory address is obtained by adding the content of register rs1 to the 12-bit sign-extended offset. Load and store instructions can work not only with a complete 32-bit word but also with half-words or bytes; in those cases, the lower part of the 32-bit data is used. Load and store instructions are summarized in Tab. 2.4.
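
A small behavioural model helps to see how the address computation and the sub-word accesses interact. The sketch below assumes a byte-addressed, little-endian memory represented as a Python dictionary; this memory model and the function names are assumptions made for illustration only.

```python
MASK32 = 0xFFFFFFFF
LOAD_SIZE = {"LB": 1, "LBU": 1, "LH": 2, "LHU": 2, "LW": 4}


def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def effective_address(rs1_value, imm12):
    """rs1 plus the sign-extended 12-bit offset, wrapped to 32 bits."""
    return (rs1_value + sign_extend(imm12, 12)) & MASK32


def store(mem, size, rs1_value, imm12, rs2_value):
    """Write the lowest `size` bytes of rs2 to memory (SB = 1, SH = 2, SW = 4)."""
    addr = effective_address(rs1_value, imm12)
    data = (rs2_value & MASK32).to_bytes(4, "little")[:size]
    for i, byte in enumerate(data):
        mem[addr + i] = byte


def load(mem, name, rs1_value, imm12):
    """Return the 32-bit value written to rd by the named load instruction."""
    addr = effective_address(rs1_value, imm12)
    size = LOAD_SIZE[name]
    raw = int.from_bytes(bytes(mem.get(addr + i, 0) for i in range(size)), "little")
    if name in ("LB", "LH"):  # signed loads extend the sign bit to 32 bits
        return sign_extend(raw, 8 * size) & MASK32
    return raw                # LBU, LHU and LW are zero-extended or full width
```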

**Memory Ordering Instructions & System Instructions**

Memory ordering instructions are FENCE and FENCE.I. FENCE is used to order device I/O and memory accesses as viewed by other hardware threads, coprocessors and external devices, while FENCE.I synchronizes the instruction and data streams.

System instructions give access to system functionality and in some cases require a certain privilege level. They can be divided into two main groups:

- CSR Operations;
- Privileged.

Table 2.4: Load and Store instructions

<table>
<thead>
<tr>
<th>imm[11:0]</th>
<th>rs1</th>
<th>funct3</th>
<th>rd</th>
<th>opcode</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<td>imm[11:0]</td>
<td>rs1</td>
<td>000</td>
<td>rd</td>
<td>0000011</td>
<td>LB</td>
</tr>
<tr>
<td>imm[11:0]</td>
<td>rs1</td>
<td>001</td>
<td>rd</td>
<td>0000011</td>
<td>LH</td>
</tr>
<tr>
<td>imm[11:0]</td>
<td>rs1</td>
<td>010</td>
<td>rd</td>
<td>0000011</td>
<td>LW</td>
</tr>
<tr>
<td>imm[11:0]</td>
<td>rs1</td>
<td>100</td>
<td>rd</td>
<td>0000011</td>
<td>LBU</td>
</tr>
<tr>
<td>imm[11:0]</td>
<td>rs1</td>
<td>101</td>
<td>rd</td>
<td>0000011</td>
<td>LHU</td>
</tr>
</tbody>
</table>


Table 2.5: System instructions

<table>
<thead>
<tr>
<th>imm[11:0] / csr</th>
<th>rs1 / uimm</th>
<th>funct3</th>
<th>rd</th>
<th>opcode</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<td>fm-pred-succ</td>
<td>rs1</td>
<td>000</td>
<td>rd</td>
<td>0001111</td>
<td>FENCE</td>
</tr>
<tr>
<td>imm[11:0]</td>
<td>rs1</td>
<td>001</td>
<td>rd</td>
<td>0001111</td>
<td>FENCE.I</td>
</tr>
<tr>
<td>000000000000</td>
<td>00000</td>
<td>000</td>
<td>00000</td>
<td>1110011</td>
<td>ECALL</td>
</tr>
<tr>
<td>000000000001</td>
<td>00000</td>
<td>000</td>
<td>00000</td>
<td>1110011</td>
<td>EBREAK</td>
</tr>
<tr>
<td>001100000010</td>
<td>00000</td>
<td>000</td>
<td>00000</td>
<td>1110011</td>
<td>MRET</td>
</tr>
<tr>
<td>000000000010</td>
<td>00000</td>
<td>000</td>
<td>00000</td>
<td>1110011</td>
<td>URET</td>
</tr>
<tr>
<td>011110110010</td>
<td>00000</td>
<td>000</td>
<td>00000</td>
<td>1110011</td>
<td>DRET</td>
</tr>
<tr>
<td>000100000101</td>
<td>00000</td>
<td>000</td>
<td>00000</td>
<td>1110011</td>
<td>WFI</td>
</tr>
<tr>
<td>csr</td>
<td>rs1</td>
<td>001</td>
<td>rd</td>
<td>1110011</td>
<td>CSRRW</td>
</tr>
<tr>
<td>csr</td>
<td>rs1</td>
<td>010</td>
<td>rd</td>
<td>1110011</td>
<td>CSRRS</td>
</tr>
<tr>
<td>csr</td>
<td>rs1</td>
<td>011</td>
<td>rd</td>
<td>1110011</td>
<td>CSRRC</td>
</tr>
<tr>
<td>csr</td>
<td>uimm</td>
<td>101</td>
<td>rd</td>
<td>1110011</td>
<td>CSRRWI</td>
</tr>
<tr>
<td>csr</td>
<td>uimm</td>
<td>110</td>
<td>rd</td>
<td>1110011</td>
<td>CSRRSI</td>
</tr>
<tr>
<td>csr</td>
<td>uimm</td>
<td>111</td>
<td>rd</td>
<td>1110011</td>
<td>CSRRCI</td>
</tr>
</tbody>
</table>

Tab. 2.5 summarizes all the instructions of this type. The CSR instructions atomically read-modify-write a single Control and Status Register. The CSR address is provided as an immediate in instr[31:20]. Each of the CSRRW, CSRRS and CSRRC instructions exists in two versions: the standard one, in which register rs1 is used as the operand, and an immediate one (CSRRWI, CSRRSI, CSRRCI), in which the operand is a 5-bit immediate that is zero-extended. Privileged instructions, such as ECALL and EBREAK, are used to make a service request or to return from one.
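
The read-modify-write behaviour can be summarized with a short model. The sketch below is deliberately simplified: it ignores the special cases in which a CSR access has no write side effect, represents the CSR file as a plain dictionary, and uses hypothetical function names.

```python
MASK32 = 0xFFFFFFFF


def csr_access(csrs, op, addr, operand):
    """Apply CSRRW, CSRRS or CSRRC and return the old value (written to rd)."""
    old = csrs.get(addr, 0)
    operand &= MASK32
    if op == "CSRRW":
        new = operand                    # write
    elif op == "CSRRS":
        new = old | operand              # set the bits given in the operand
    elif op == "CSRRC":
        new = old & ~operand & MASK32    # clear the bits given in the operand
    else:
        raise ValueError(f"unknown CSR operation: {op}")
    csrs[addr] = new
    return old


def csr_access_imm(csrs, op, addr, uimm5):
    """Immediate variants (CSRRWI, CSRRSI, CSRRCI): zero-extended 5-bit operand."""
    return csr_access(csrs, op[:-1], addr, uimm5 & 0x1F)
```

In both variants the value returned for rd is the content of the register before the update, which is what makes the operation a single read-modify-write step.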

#### 2.1.2 Multiplication Extension

The multiplication extension contains instructions that multiply and divide values coming from the integer register file:

- MUL: 32-bit × 32-bit multiplication, the lower 32 bits are stored in rd;
- MULH: signed(32-bit) × signed(32-bit) multiplication, the upper 32 bits are stored in rd;
- MULHU: unsigned(32-bit) × unsigned(32-bit) multiplication, the upper 32 bits are stored in rd;
- MULHSU: signed(32-bit) × unsigned(32-bit) multiplication, the upper 32 bits are stored in rd;
- DIV: signed(32-bit) / signed(32-bit) division with rounding toward zero;
- DIVU: unsigned(32-bit) / unsigned(32-bit) division with rounding toward zero;
- REM: returns the remainder of the signed division;
- REMU: returns the remainder of the unsigned division.

<table>
<thead>
<tr>
<th>funct7</th>
<th>rs2</th>
<th>rs1</th>
<th>funct3</th>
<th>rd</th>
<th>opcode</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<td>0000001</td>
<td>rs2</td>
<td>rs1</td>
<td>000</td>
<td>rd</td>
<td>0110011</td>
<td>MUL</td>
</tr>
<tr>
<td>0000001</td>
<td>rs2</td>
<td>rs1</td>
<td>001</td>
<td>rd</td>
<td>0110011</td>
<td>MULH</td>
</tr>
<tr>
<td>0000001</td>
<td>rs2</td>
<td>rs1</td>
<td>010</td>
<td>rd</td>
<td>0110011</td>
<td>MULHSU</td>
</tr>
<tr>
<td>0000001</td>
<td>rs2</td>
<td>rs1</td>
<td>011</td>
<td>rd</td>
<td>0110011</td>
<td>MULHU</td>
</tr>
<tr>
<td>0000001</td>
<td>rs2</td>
<td>rs1</td>
<td>100</td>
<td>rd</td>
<td>0110011</td>
<td>DIV</td>
</tr>
<tr>
<td>0000001</td>
<td>rs2</td>
<td>rs1</td>
<td>101</td>
<td>rd</td>
<td>0110011</td>
<td>DIVU</td>
</tr>
<tr>
<td>0000001</td>
<td>rs2</td>
<td>rs1</td>
<td>110</td>
<td>rd</td>
<td>0110011</td>
<td>REM</td>
</tr>
<tr>
<td>0000001</td>
<td>rs2</td>
<td>rs1</td>
<td>111</td>
<td>rd</td>
<td>0110011</td>
<td>REMU</td>
</tr>
</tbody>
</table>

Table 2.6: Mul/Div Instructions
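
The difference between the multiplication variants, and the rounding-toward-zero rule for division, can be checked with a small reference model. The sketch below uses hypothetical function names and follows the standard RISC-V behaviour for the corner cases (division by zero and signed overflow); it is not taken from the thesis scoreboard.

```python
MASK32 = 0xFFFFFFFF


def s32(x):
    """Interpret a 32-bit pattern as a signed integer."""
    x &= MASK32
    return x - (1 << 32) if x >> 31 else x


def mul(a, b):
    return (s32(a) * s32(b)) & MASK32                    # lower 32 bits


def mulh(a, b):
    return ((s32(a) * s32(b)) >> 32) & MASK32            # signed x signed, upper bits


def mulhu(a, b):
    return (((a & MASK32) * (b & MASK32)) >> 32) & MASK32  # unsigned x unsigned


def mulhsu(a, b):
    return ((s32(a) * (b & MASK32)) >> 32) & MASK32      # signed x unsigned


def div(a, b):
    a, b = s32(a), s32(b)
    if b == 0:
        return MASK32                                    # division by zero: all ones
    q = abs(a) // abs(b)                                 # magnitude, rounded toward zero
    return (-q if (a < 0) != (b < 0) else q) & MASK32


def rem(a, b):
    a, b = s32(a), s32(b)
    if b == 0:
        return a & MASK32                                # division by zero: the dividend
    r = abs(a) % abs(b)
    return (-r if a < 0 else r) & MASK32                 # sign follows the dividend
```

Note that Python's own `//` operator rounds toward negative infinity, which is why the quotient is computed on magnitudes and the sign is applied afterwards.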

#### 2.1.3 Compressed extension

The compressed extension, named "C", reduces static and dynamic code size by adding short 16-bit instruction encodings for common integer operations. With compressed instructions, a code size reduction of around 25% is achieved. In general, to support the C extension while keeping the processor architecture unchanged, a compressed decoder is introduced in the fetch stage. Its role is to expand a 16-bit instruction into its 32-bit equivalent, so that the decoder in the decode stage does not need to change. Compressed instructions are reported in Tab. 2.7, Tab. 2.8 and Tab. 2.9.
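
As an illustration of what the compressed decoder does, the following sketch expands a single compressed instruction, C.ADDI (which also covers C.NOP), into the equivalent 32-bit ADDI. The function name is hypothetical and the model covers only this one encoding; the real decoder handles all three quadrants.

```python
def expand_c_addi(c_instr):
    """Expand a 16-bit C.ADDI / C.NOP word into the equivalent 32-bit ADDI."""
    if c_instr & 0x3 != 0b01 or (c_instr >> 13) & 0x7 != 0b000:
        raise ValueError("not a C.ADDI / C.NOP encoding")
    rd = (c_instr >> 7) & 0x1F                      # rs1 and rd share one field
    imm6 = (((c_instr >> 12) & 0x1) << 5) | ((c_instr >> 2) & 0x1F)
    imm12 = imm6 | 0xFC0 if imm6 & 0x20 else imm6   # sign-extend 6 bits to 12 bits
    # ADDI rd, rd, imm
    return (imm12 << 20) | (rd << 15) | (0b000 << 12) | (rd << 7) | 0b0010011
```

For example, the 16-bit word `0x147D` (C.ADDI x8, -1) expands to `0xFFF40413` (ADDI x8, x8, -1), and `0x0001` (C.NOP) expands to `0x00000013`.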

<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>001</td>
<td>uimm[5:3]</td>
<td>rs1</td>
<td>uimm[7:6]</td>
<td>rd</td>
<td>00</td>
<td>C.FLD</td>
</tr>
<tr>
<td>010</td>
<td>uimm[5:3]</td>
<td>rs1</td>
<td>uimm[7:6]</td>
<td>rd</td>
<td>00</td>
<td>C.FLW</td>
</tr>
<tr>
<td>101</td>
<td>uimm[5:3]</td>
<td>rs1</td>
<td>uimm[7:6]</td>
<td>rs2</td>
<td>00</td>
<td>C.FSD</td>
</tr>
<tr>
<td>110</td>
<td>uimm[5:3]</td>
<td>rs1</td>
<td>uimm[7:6]</td>
<td>rs2</td>
<td>00</td>
<td>C.SW</td>
</tr>
<tr>
<td>111</td>
<td>uimm[5:3]</td>
<td>rs1</td>
<td>uimm[7:6]</td>
<td>rs2</td>
<td>00</td>
<td>C.FSW</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>000</td>
<td>nzimm[5:4][9:6][2:3]</td>
<td>rd</td>
<td>00</td>
<td>C.ADDI4SPN</td>
</tr>
</tbody>
</table>

Table 2.7: Compressed Instructions Quadrant 0

<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>000</td>
<td>nzimm[5]</td>
<td>0</td>
<td>nzimm[4:0]</td>
<td>01</td>
<td>C.NOP</td>
</tr>
<tr>
<td>000</td>
<td>nzimm[5]</td>
<td>rs1/rd!=0</td>
<td>nzimm[4:0]</td>
<td>01</td>
<td>C.ADDI</td>
</tr>
<tr>
<td>011</td>
<td>nzimm[17]</td>
<td>rd!=0{0,2}</td>
<td>nzimm[16:12]</td>
<td>01</td>
<td>C.LUI</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>100</td>
<td>nzuimm[5]</td>
<td>00</td>
<td>rs1/rd</td>
<td>nzuimm[4:0]</td>
<td>01</td>
</tr>
<tr>
<td>100</td>
<td>nzuimm[5]</td>
<td>01</td>
<td>rs1/rd</td>
<td>nzuimm[4:0]</td>
<td>01</td>
</tr>
<tr>
<td>100</td>
<td>0</td>
<td>11</td>
<td>rs1/rd</td>
<td>00-rs2</td>
<td>01</td>
</tr>
<tr>
<td>100</td>
<td>0</td>
<td>11</td>
<td>rs1/rd</td>
<td>01-rs2</td>
<td>01</td>
</tr>
<tr>
<td>100</td>
<td>0</td>
<td>11</td>
<td>rs1/rd</td>
<td>10-rs2</td>
<td>01</td>
</tr>
<tr>
<td>100</td>
<td>0</td>
<td>11</td>
<td>rs1/rd</td>
<td>11-rs2</td>
<td>01</td>
</tr>
</tbody>
</table>


Table 2.8: Compressed Instructions Quadrant 1

#### 2.1.4 Post-incrementing Load and Store Instructions

Post-incrementing load and store instructions belong to the proprietary Xpulp extension. These operations perform a load or a store and, at the same time, increment the address that was used for the memory access. There are two versions that differ in the offset encoding:

- **Register-Register** (offset comes from the register file);

- **Register-Immediate** (offset is encoded as immediate).

In both of them, the modified address is written back in the register file (rs1).
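
The behaviour can be sketched with a minimal model of a post-incrementing word load. The function name, the dictionary-based little-endian memory and the order in which rs1 and rd are updated are assumptions of this sketch; the offset argument stands for either the immediate or the register value, depending on the version.

```python
MASK32 = 0xFFFFFFFF


def post_increment_load_word(regs, mem, rd, rs1, offset):
    """Load a word from the address in rs1, then add `offset` to rs1."""
    addr = regs[rs1] & MASK32                  # the access uses the old address
    value = int.from_bytes(
        bytes(mem.get(addr + i, 0) for i in range(4)), "little")
    regs[rs1] = (addr + offset) & MASK32       # updated address written back to rs1
    if rd != 0:                                # x0 stays hardwired to zero
        regs[rd] = value
    return value
```

A single instruction therefore replaces the usual load followed by an address-update ADDI when walking through an array.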

Table 2.9: Compressed Instructions Quadrant 2

<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>000</td>
<td>nzuimm[5]</td>
<td>rs1/rd!=0</td>
<td>nzuimm[4:0]</td>
<td>10</td>
<td>C.SLLI</td>
</tr>
<tr>
<td>100</td>
<td>0</td>
<td>rs1!=0</td>
<td>0</td>
<td>10</td>
<td>C.JR</td>
</tr>
<tr>
<td>100</td>
<td>0</td>
<td>rd!=0</td>
<td>rs2!=0</td>
<td>10</td>
<td>C.MV</td>
</tr>
<tr>
<td>100</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>10</td>
<td>C.EBREAK</td>
</tr>
<tr>
<td>100</td>
<td>1</td>
<td>rs1!=0</td>
<td>0</td>
<td>10</td>
<td>C.JALR</td>
</tr>
<tr>
<td>100</td>
<td>1</td>
<td>rs1/rd !=0</td>
<td>rs2!=0</td>
<td>10</td>
<td>C.ADD</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>101</td>
<td>uimm[5:3]</td>
<td>rs2</td>
<td>10</td>
<td>C.FSDSP</td>
</tr>
<tr>
<td>110</td>
<td>uimm[5:2]</td>
<td>rs2</td>
<td>10</td>
<td>C.SWSP</td>
</tr>
<tr>
<td>111</td>
<td>uimm[5:2]</td>
<td>rs2</td>
<td>10</td>
<td>C.FSWSP</td>
</tr>
</tbody>
</table>

Table 2.10: Register-Immediate loads with post increment

Table 2.11: Register-Register loads with post increment

<table>
<thead>
<tr>
<th>funct7</th>
<th>rs2</th>
<th>rs1</th>
<th>funct3</th>
<th>rd</th>
<th>opcode</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<td>0000000</td>
<td>offset</td>
<td>base</td>
<td>111</td>
<td>dest</td>
<td>0001011</td>
<td>p.lb</td>
</tr>
<tr>
<td>0100000</td>
<td>offset</td>
<td>base</td>
<td>111</td>
<td>dest</td>
<td>0001011</td>
<td>p.lbu</td>
</tr>
<tr>
<td>0010000</td>
<td>offset</td>
<td>base</td>
<td>111</td>
<td>dest</td>
<td>0001011</td>
<td>p.lh</td>
</tr>
<tr>
<td>0101000</td>
<td>offset</td>
<td>base</td>
<td>111</td>
<td>dest</td>
<td>0001011</td>
<td>p.lhu</td>
</tr>
</tbody>
</table>

Table 2.12: Register-Immediate stores with post increment


Table 2.13: Register-Register stores with post increment

<table>
<thead>
<tr>
<th>funct7</th>
<th>rs2</th>
<th>rs1</th>
<th>funct3</th>
<th>rs3</th>
<th>opcode</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<td>0000000</td>
<td>src</td>
<td>base</td>
<td>100</td>
<td>offset</td>
<td>0101011</td>
<td>p.sb</td>
</tr>
<tr>
<td>0000000</td>
<td>src</td>
<td>base</td>
<td>101</td>
<td>offset</td>
<td>0101011</td>
<td>p.sh</td>
</tr>
<tr>
<td>0000000</td>
<td>src</td>
<td>base</td>
<td>110</td>
<td>offset</td>
<td>0101011</td>
<td>p.sw</td>
</tr>
</tbody>
</table>

#### 2.1.5 Hardware Loops

The hardware loop extension aims to increase the efficiency of small loops in code: it makes it possible to execute a certain number of instructions multiple times without overhead. To set up a hardware loop, three pieces of information are required:

- start address;
- end address;
- counter.

These pieces of information are provided through hardware loop instructions. There are two ways to set up a hardware loop: the first uses long commands, in which the information is provided by three different instructions, while the second uses a single short instruction to set the three values. The main difference is that the short command allows a limited range for the number of instructions contained in the loop. RI5CY supports two levels of nested hardware loops, so when a hardware loop is set, the level (0 or 1) must be specified in `instr[7]`. Hardware loops and their encodings are summarized in Tab. 2.14.

<table>
<thead>
<tr>
<th>uimmL[11:0]</th>
<th>rs1</th>
<th>funct3</th>
<th>0000</th>
<th>L</th>
<th>opcode</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<td>000000</td>
<td>000</td>
<td>0000</td>
<td>0000</td>
<td>L</td>
<td>1111011</td>
<td>lp.starti</td>
</tr>
<tr>
<td>000000</td>
<td>000</td>
<td>0000</td>
<td>0000</td>
<td>L</td>
<td>1111011</td>
<td>lp.endi</td>
</tr>
<tr>
<td>000000000000</td>
<td>src1</td>
<td>010</td>
<td>0000</td>
<td>L</td>
<td>1111011</td>
<td>lp.count</td>
</tr>
<tr>
<td>000000</td>
<td>000</td>
<td>011</td>
<td>0000</td>
<td>L</td>
<td>1111011</td>
<td>lp.counti</td>
</tr>
<tr>
<td>000000</td>
<td>src1</td>
<td>100</td>
<td>0000</td>
<td>L</td>
<td>1111011</td>
<td>lp.setup</td>
</tr>
<tr>
<td>000000</td>
<td>uimmS[4:0]</td>
<td>101</td>
<td>0000</td>
<td>L</td>
<td>1111011</td>
<td>lp.setupi</td>
</tr>
</tbody>
</table>

Table 2.14: Hardware Loop instruction encoding
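
The effect of a hardware loop on the instruction stream can be illustrated with a toy model. The sketch below handles a single loop level and identifies instructions by their index in the program; the function name and the convention that `last` is the index of the last instruction of the loop body are assumptions of this model, not the RI5CY register semantics.

```python
def run_hwloop(n_instr, start, last, count):
    """Return the sequence of executed instruction indices.

    The body [start, last] is executed `count` times with no branch
    instruction in the program: the jump back happens implicitly when
    the last instruction of the body is reached.
    """
    trace = []
    pc = 0
    remaining = count
    while pc < n_instr:
        trace.append(pc)
        if pc == last and remaining > 1:
            remaining -= 1      # one more iteration consumed
            pc = start          # implicit jump back to the start of the loop
        else:
            pc += 1             # normal sequential execution
    return trace
```

With a five-instruction program and a loop over instructions 1 and 2 repeated three times, the trace is 0, 1, 2, 1, 2, 1, 2, 3, 4: no extra instruction is spent on the counter update or on the backward jump.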

#### 2.1.6 ALU Extension

ALU extensions belong to the Xpulp proprietary extension and aim to extend the base instruction set with:

- Bit-Manipulation instructions;
- General ALU instructions;
- Immediate Branching instructions.

As the ALU extension contains a large number of instructions, their encoding tables are reported in Appendix A.

#### 2.1.7 Vectorial

The Vectorial extension allows operations to be performed on several sub-word elements at the same time by splitting the datapath into smaller parts (SIMD). Vectorial instructions can work on either 8-bit (byte) or 16-bit (half-word) elements, and three modes determine how the second operand is formed:

- 8-bit:
  - Normal mode (vector-vector operation);
  - Scalar Replication (Operand 2 is treated as a scalar and replicated 4 times to form a complete vector);
  - Immediate Scalar Replication (Operand 2 comes from immediate and has to be replicated 4 times).

- 16-bit:
  - Normal mode (vector-vector operation);
  - Scalar Replication (Operand 2 is treated as a scalar and replicated 2 times to form a complete vector);
  - Immediate Scalar Replication (Operand 2 comes from immediate and has to be replicated 2 times).

Finally, Vectorial instructions are divided into ALU operations and comparison operations. Vectorial comparisons are done on bytes or on half-words: if the comparison result is true, all the bits of that byte or half-word are set to 1, otherwise to 0. As the Vectorial extension contains a large number of instructions, their behaviour tables are reported in Appendix B.
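
The lane-wise behaviour and the scalar replication mode can be sketched as follows. The helper names are hypothetical, and only an addition and an equality comparison are modelled; `width` is 8 for byte elements and 16 for half-word elements.

```python
def split_lanes(word, width):
    """Split a 32-bit word into lanes of `width` bits, least-significant first."""
    mask = (1 << width) - 1
    return [(word >> shift) & mask for shift in range(0, 32, width)]


def join_lanes(lanes, width):
    """Pack lanes back into a 32-bit word, truncating each lane to `width` bits."""
    mask = (1 << width) - 1
    word = 0
    for index, lane in enumerate(lanes):
        word |= (lane & mask) << (index * width)
    return word


def replicate(scalar, width):
    """Scalar replication: copy the scalar into every lane (4 or 2 times)."""
    return join_lanes([scalar] * (32 // width), width)


def vadd(a, b, width):
    """Lane-wise addition; carries do not propagate between lanes."""
    pairs = zip(split_lanes(a, width), split_lanes(b, width))
    return join_lanes([x + y for x, y in pairs], width)


def vcmp_eq(a, b, width):
    """Lane-wise comparison: all ones where the lanes are equal, zero elsewhere."""
    ones = (1 << width) - 1
    pairs = zip(split_lanes(a, width), split_lanes(b, width))
    return join_lanes([ones if x == y else 0 for x, y in pairs], width)
```

In normal mode both operands are passed as full vectors, while in the two scalar replication modes the second operand is first built with `replicate` from a register value or from the immediate.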

### 2.2 PULP Architecture

Starting from the complete block diagram shown in Fig. 2.2, this section describes each architectural block and briefly explains its functionality.

![Figure 2.2: RI5CY Architecture Block Diagram](image)

#### 2.2.1 Instruction Fetch stage

Instruction Fetch is the first stage. It is responsible for providing the address `addr_o` to the instruction memory and for getting the corresponding instruction stored at that address. Its main architectural blocks are:

- Prefetch Buffer;
- HwLoop Controller;
- Debug Unit;
- Controller.

The prefetch buffer is the component that actually fetches instructions from the instruction memory or instruction cache. It is available in two different versions:

- 32-bit prefetcher: It allocates a 3 entries FIFO which stores the fetched instruction words;
- 128-bit prefetcher: It stores a 128-bit wide cache line.

The use of the 128-bit or 32-bit prefetcher depends on the setting of `INSTR_RDATA_WIDTH`; according to its value, only one prefetcher is allocated.

The HwLoop Controller is the component responsible for handling hardware loops. Here the current program counter is compared to all the hardware loop end addresses, and the core jumps to the corresponding start address if the loop counter has not yet expired. It has a modular design and is configured through the `N_REGS` parameter; in RI5CY it is set to 2 because only 2 nested hardware loops are supported.

The Debug Unit is directly connected to the RI5CY Debug Interface and has a signal `debug_req_i`. That request signal makes the core jump to a specific address where the debug ROM is mapped. This address is defined through the `DM_HaltAddress` parameter.

The RISC-V controller is the main controller of the CPU: it receives signals from all the pipeline stages and, according to their transitions, handles exceptions, interrupts and normal execution.

#### 2.2.2 Instruction Decode stage

Even though the instruction decode stage contains only two components, it is an important part of the architecture, as it is responsible for decoding the instruction and consequently for providing the proper control signals to the Execution stage. In addition, it is responsible for providing the correct input operands to the ALU. Its components are the decoder and the GPR.

As explained in the previous section, operands can be either stored in the register file or provided as an immediate. The immediate extension is performed in this stage, and all the possible combinations of operands are connected to two multiplexers. According to the signals set inside the decoder, the correct operands are selected and delivered to the next stage.

The GPR is a register file with 32 locations, each of which can contain a 32-bit word; location x0 is hardwired to 0 and cannot be overwritten. It has 3 read ports (necessary for three-operand operations) and 2 write ports (write port A is connected to the load and store unit, while write port B is connected to the Execution stage output). If the FPU is used, an additional 32-bit floating-point register file is allocated.
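
The main properties of the register file can be captured in a small behavioural model. The class and method names are hypothetical, and the rule that port B wins when both ports write the same location in the same call is an arbitrary choice of this sketch, not a statement about the RTL.

```python
class RegisterFile:
    """Behavioural model: 32 x 32-bit registers, 3 read ports, 2 write ports."""

    def __init__(self):
        self._regs = [0] * 32

    def read(self, rs1, rs2, rs3):
        """Three read ports, as needed by three-operand instructions."""
        return self._regs[rs1], self._regs[rs2], self._regs[rs3]

    def write(self, port_a=None, port_b=None):
        """Each port is an (address, value) pair, or None when unused.

        Port A models the load and store unit, port B the execution
        stage output.
        """
        for port in (port_a, port_b):
            if port is None:
                continue
            addr, value = port
            if addr != 0:                        # x0 is hardwired to zero
                self._regs[addr] = value & 0xFFFFFFFF
```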

#### 2.2.3 Execution stage

The execution stage is responsible for the execution of the operation. It receives the proper signals and operands from the decode stage, which are then connected to the different computational blocks allocated. As each of the computational blocks computes a result, the correct result is selected through a multiplexer. In the case of load and store operations, the operands (i.e. the data to be stored and the corresponding R/W address) are forwarded to the load and store unit.

The computational blocks allocated are:

- ALU;
- Multiplier;
- FPU;
- APU;
- CSR.

The ALU is responsible for arithmetic operations, including vectorial ones, for instance: shifts, comparisons, shuffles, bit manipulation and standard logic-arithmetic operations.

The multiplier is responsible for integer multiplication, DotP multiplication (i.e. Multiply and Accumulate operations) and operations with complex numbers.

The FPU, when enabled, is used to compute the results of operations involving single-precision floating-point operands. The APU is enabled together with the external floating-point unit `fpnew_pkg`; through an OBI interface, the APU and FPU are able to communicate, and the results of operations computed outside the core are made available in the execution stage.

The Control and Status Registers are allocated in the execution stage to make it possible to execute atomic CSR instructions. In fact, a CSR instruction reads the current value stored in the control and status register and saves it in rd and, at the same time, overwrites that value.

#### 2.2.4 WB Stage

The last stage is the write-back. In reality, as shown before, the write-back stage is not responsible for the register file store operation, which is performed at the end of the execution stage. The operations done in this stage are mainly related to the load and store operations involving data memory. In the case of a misaligned memory access, the data read from data memory must be sign-extended; this operation is performed inside the load and store unit, which provides the final operand to be stored in the register file.

## Chapter 3: UVM Testbench

The starting point of this work is the realization of the UVM Framework. According to the reuse philosophy of UVM, all the components are derived from the UVM base classes exploiting inheritance.

Starting from the given UVM testbench, a large number of architectural modifications were required to fit the UVM framework to the device under verification. For instance, it was necessary to introduce an additional agent with all its sub-components in order to capture input and output transactions from the DUV. The agent in charge of capturing input transactions is also responsible for driving the input sequence; for this reason, it can be considered an active agent (as it includes the UVM Driver), while the other agent is a passive entity whose role is to collect internal signals of the DUV, pack them and send the transactions to the scoreboard. It was also necessary to define two different interfaces, each representing the set of signals to be captured from the DUV at a different time instant.

### 3.1 Overall Structure

In this section the framework structure is briefly analyzed, highlighting the main components and their role in the verification process. A schematic view is shown in Fig. 3.1. The two main components are the `tb_top` and the `uvm_test`. The wrapper included in the `tb_top` is a component that has been created to encapsulate the device under verification and the RAM. Even if the RAM is not to be verified, it is essential for the processor to work correctly. In fact, as the RISC-V processor includes load and store instructions, a \(2^{\text{XLEN}}\) memory must be inserted alongside the device. The `uvm_test` instantiates the UVM environment, which includes 3 main blocks:

- Agent_in;
- Agent_out;
- Scoreboard.

Agent_in is an active agent, capable of both driving and capturing signals. In particular, the sequencer is the component in charge of generating random test vectors according to the processor interface; the generated inputs are then sent to the driver. Once the driver receives the input transactions from the sequencer, it dispatches them to the DUV through the input interface, respecting a certain protocol. Monitor_in, on the other hand, is the input transaction collector: its role is to capture a set of signals inside the DUV and send them to the scoreboard. Agent_out is simply a passive entity that captures the output transactions (the actual results of the operation), which are then sent to the scoreboard. Once the scoreboard has received the input transaction (input stimuli) and the output transaction (results), it evaluates the expected results of the operation and compares them to the actual results, providing a pass or fail.

Figure 3.1: UVM Framework Structure
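
The checking step performed by the scoreboard can be illustrated outside UVM with a few lines of Python. This is a conceptual sketch only, not the SystemVerilog scoreboard of the framework: the reference model covers a single instruction (ADDI), the function names are hypothetical, and the transactions are plain dictionaries whose keys reuse the signal names `instr_core`, `reg_file_i` and `reg_file_o` that appear in the interfaces of Section 3.4.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def predict_reg_file(instr, reg_file):
    """Reference model: expected register file after one ADDI instruction."""
    opcode, funct3 = instr & 0x7F, (instr >> 12) & 0x7
    if (opcode, funct3) != (0b0010011, 0b000):
        raise NotImplementedError("this sketch models ADDI only")
    rd, rs1 = (instr >> 7) & 0x1F, (instr >> 15) & 0x1F
    expected = list(reg_file)
    if rd != 0:  # x0 is hardwired to zero
        expected[rd] = (reg_file[rs1] + sign_extend(instr >> 20, 12)) & MASK32
    return expected


def scoreboard_check(input_tx, output_tx):
    """Compare the predicted register file with the one observed on the DUV."""
    expected = predict_reg_file(input_tx["instr_core"], input_tx["reg_file_i"])
    return "PASS" if expected == output_tx["reg_file_o"] else "FAIL"


# ADDI x1, x2, 5 applied to a register file in which x2 = 10
before = [0] * 32
before[2] = 10
after = list(before)
after[1] = 15
addi = (5 << 20) | (2 << 15) | (1 << 7) | 0b0010011
```

The input transaction carries everything the reference model needs (the instruction and the register file before execution), while the output transaction carries only the observed result.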

### 3.2 Top

The top of the hierarchy of the UVM testbench is `tb_top_uvm.sv`. The operations performed in this module are the following:

- Clock generation;
- Reset de-assertion;
- Interfaces assignments;
- Wrapper instantiation;
- Test run.

The first two operations are fundamental to define the system clock and to de-assert `rst_n` after a certain number of cycles, defined in the const int `RESET_WAIT_CYCLES`. Interface assignments are the operations necessary to define the connection between the DUV and the UVM interfaces. All the required signals have to be written hierarchically in the interface declarations in order to be available in the UVM framework. With this approach, any UVM component that has a connection with the interfaces is able to capture signals coming from the device under verification.

After the wrapper is instantiated, the UVM test is launched using `run_test()`.

### 3.3 Wrapper

The wrapper is the component used to encapsulate the device under test, i.e. `riscv_core.sv`, and the RAM, `data_ram.sv`. Inside the wrapper, the signals used to read from and write to memory are connected to the core, and some other signals are set to their default values. In this way, the device instantiated in the `tb_top` is the wrapper, and the number of its input and output signals is reduced. The wrapper provides a clear interface in which only the signals to be driven by the UVM framework are available; the other connections are hidden inside the wrapper module.

### 3.4 Interfaces

In UVM the DUV is static; as a result, the communication between the testbench and the DUV cannot be done as in classic testbenches. UVM uses the virtual interface feature instead: an interface represents a collection of signals used to drive and monitor the DUV from the testbench. The direction of the signals is decided by the modports. In addition, clocking blocks are used to synchronize the sampling instant of the signals belonging to the same block. To better understand how virtual interfaces work, they can be considered as handles pointing to the interface instance. Using this approach, the testbench can access the DUV signals through the virtual interface and vice versa. Interfaces are defined in `processor_interface.sv` and `processor_interface_out.sv`. Each UVM component that needs to monitor or drive interface signals must declare a virtual interface handle of that interface type and get the reference to the interface from the UVM configuration database.

#### 3.4.1 Interface in

The input interface, described in `processor_interface.sv`, contains a set of signals to be driven and another set of signals to be monitored. Accordingly, two modports and their clocking blocks are defined. The input interface block scheme is shown in Fig. 3.2.

```verilog
clocking driver_cb @ (posedge clk);

//Instruction signals
output instr_in;
output instr_gnt;
output instr_rvalid;

//Interrupt signals
output irq_in;
output irq_id_in;
output irq_sec_in;
```

**Figure 3.2: Processor interface block diagram**

**driver_cb**

The signals to be driven belong to `driver_cb`. They are synchronized to the positive edge of the clock signal, which allows the driver to push input vectors to the DUV at each clock cycle. Signals in `driver_cb` are defined as output. Although this may not seem intuitive, it follows from the fact that the interface is considered from the testbench point of view: the signals modified by the driver are outputs of the testbench and inputs of the DUV.

**monitor_cb**

The signals belonging to `monitor_cb` are synchronized to the negative edge of the clock signal, to be sure that the input transaction has been delivered to the DUV. For the same reason explained above, the signals here are defined as input.

```verilog
clocking monitor_cb @ (negedge clk);
    // Instruction signals
    input instr_addr;
    input instr_req;
    input core_busy;
    // Interrupt signals
    input irq_ack_out;
    input irq_id_out;
    // Internal DUV signals
    input instr_core;
    input rs1_address;
    input rs2_address;
    input rs3_address;
    input rd_address;
    input illegal_insn;
    input pc_value;
    input reg_file_i;
    input b_mask_a;
    input b_mask_b;
    input jump_target;
    input csr_rdata;
endclocking : monitor_cb
```

#### 3.4.2 Interface out

The out interface is used to collect internal signals of the DUV after the operations have been completed. It represents the collection of the actual results provided by the DUV. Here signals go only from the device to the testbench; for this reason, there is no need for a driver clocking block. The out interface block scheme is shown in Fig. 3.3.

All the signals defined in the out interface belong to the `monitor_out_cb` clocking block.

```systemverilog
clocking monitor_out_cb @ (negedge clk);
  input reg_file_o;
  input pc_value_o;
  input wdata_mem;
  input rdata_mem;
  input csr_rdata;
  input csr_wdata;
  input hwlp_start;
  input hwlp_end;
  input hwlp_cnt;
endclocking : monitor_out_cb
```

### 3.5 Sequences

The UVM SystemVerilog library provides `uvm_sequence_item` as a base class to describe data items. Data items are the transactions either to be collected from or to be driven into the DUV. Transaction items in the verification framework are derived from the class `uvm_sequence_item`, which provides a set of useful methods to randomize transaction fields and to compare or print transaction objects.

The sequence item is composed of the data fields required to generate the stimulus; these are defined as `rand` and can have constraint ranges defined. Data fields can also represent analysis information coming from the DUV, for example responses, internal signals and error signals. In the UVM framework developed for the processor under verification, there are two different sequence objects. The first is related to the input transactions, while the other is associated with the output transactions. The two sequence objects are defined in `processor_sequence.sv` and in `packet_out.sv`.

#### 3.5.1 Processor Sequence

The transaction object in the processor sequence is `processor_transaction`; the following snippet shows the signals it contains. Some of them are declared `rand` in order to randomize input values and create input vectors. This was only a first approach, used to verify that the UVM environment was working properly. Once that was established, an external Python program was used to generate the random instructions fed to the processor. This choice simplifies the randomization procedure: since the DUV has a large instruction set, handling it with the UVM randomization features proved too complicated.

```systemverilog
class processor_transaction extends uvm_sequence_item;
    `uvm_object_utils(processor_transaction)

    bit instr_gnt;
    bit instr_rvalid;
    bit instr_addr;
    bit instr_req;
    bit irq_in;
    bit irq_id_in;
    bit irq_sec_in;
    bit irq_ack_out;
    bit irq_id_out;
    bit core_busy;
    bit [31:0] instr;
    bit [31:0] instr_core;
    bit [31:0] instrn;
    bit [4:0] rs1_address;
    bit [4:0] rs2_address;
    bit [4:0] rs3_address;
    bit [4:0] rd_address;
    bit [31:0] pc_value;
    bit [31:0] jump_target;
    bit [31:0][31:0] reg_file;
    bit [4:0] b_mask_a;
    bit [4:0] b_mask_b;
    bit [31:0] csr_rdata;
    bit illegal_insn;

    // RANDOMIZATION
    rand bit [11:0] immediate;
    rand bit [4:0] rs1;
    rand bit [4:0] rs2;
    rand bit [4:0] rs3;
    rand bit [4:0] rd;

    // Register addresses are kept inside the register file range
    constraint my_range_1 {rs1 >= 5'b00000; rs1 < 5'b11111;}
    constraint my_range_2 {rs2 > 5'b00000; rs2 < 5'b11111;}
    constraint my_range_4 {rs3 > 5'b00000; rs3 < 5'b11111;}
    constraint my_range_3 {rd > 5'b00001; rd < 5'b11111;}
    // Lower and upper bound on the immediate
    constraint myrange4 {immediate > 12'b0; immediate < 12'b00111111111;}

    function new (string name = "");
        super.new(name);
    endfunction

endclass: processor_transaction
```

`processor_sequence` also defines the sequence operations. In the body task of `inst_sequence`, an object of type `processor_transaction` is created. The random instructions written to instructions.txt by the random generator are read line by line and assigned to the transaction object. The `finish_item()` method then signals that the transaction object is complete, and the sequencer forwards it to the driver. In the last part of `processor_sequence`, the behaviour of the sequence is described through a simple for loop, so that a new transaction is available each time the driver calls `get_next_item()`.
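The file-driven flow described above can be modelled in a few lines of Python. This is only an illustrative sketch, not the thesis code: the names `Transaction` and `read_transactions` are hypothetical, and the file is assumed to hold one 32-bit instruction per line written as a binary string.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Transaction:
    """Hypothetical stand-in for processor_transaction (instruction word only)."""
    instr: int


def read_transactions(lines: Iterable[str]) -> Iterator[Transaction]:
    """Yield one transaction per non-empty line.

    Assumption: each line holds one instruction written as a binary string.
    """
    for line in lines:
        word = line.strip()
        if not word:
            continue  # skip blank lines
        yield Transaction(instr=int(word, 2))


# Each item pulled from the generator plays the role of one get_next_item() call.
program = [
    "00000000000100000000000010010011",  # addi x1, x0, 1
    "",
    "00000000000000000000000001101111",  # jal x0, 0
]
items = list(read_transactions(program))
```

Using a generator mirrors the behaviour of the sequence: a new item is produced only when the consumer asks for it.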

#### 3.5.2 Packet out

Packet out is simply the container of the output transaction and it is extended from the uvm_sequence_item base class. The signals to be captured from the DUV are shown in the following piece of code.

```systemverilog
class packet_out extends uvm_sequence_item;

    `uvm_object_utils(packet_out)

    bit [31:0][31:0] reg_file;
    bit [31:0] pc_value;
    bit [31:0] wdata_mem;
    bit [31:0] rdata_mem;
    bit [31:0] csr_rdata;
    bit [31:0] csr_wdata;
    bit [31:0] hwlp_start;
    bit [31:0] hwlp_end;
    bit [31:0] hwlp_cnt;

    function new (string name = "");
        super.new(name);
    endfunction

endclass: packet_out
```

### 3.6 Environment

In the UVM framework, the environment is the container class: in general, it contains one or more agents together with other components such as the monitor, the scoreboard, and the subscriber. The processor environment is defined in `processor_env.sv` and instantiates:

- Agent;
- Agent out;
- Scoreboard;
- Subscriber.

The environment operates only during the build and connect phases. During the build phase, each of the declared UVCs is created by calling its creator function. During the connect phase, the analysis ports of the Driver and of the Monitors are connected to the corresponding analysis port implementations in the Scoreboard. This is a very important step, because analysis ports are the way transactions move through the UVM framework. A scheme of the connections is shown in Fig. 3.4.
Note that the dots on the scoreboard represent the implementations of the analysis ports, while the diamonds represent the analysis ports themselves.
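The relationship between an analysis port and its implementation can be illustrated with a small Python model. This is a sketch of the general publish/subscribe idea only; the classes `AnalysisPort` and `Scoreboard` below are hypothetical and do not reproduce the UVM API.

```python
from typing import Callable, List


class AnalysisPort:
    """Minimal model of an analysis port: write() broadcasts to every subscriber."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[object], None]] = []

    def connect(self, callback: Callable[[object], None]) -> None:
        # Corresponds to the connection made during the connect phase.
        self._subscribers.append(callback)

    def write(self, transaction: object) -> None:
        for callback in self._subscribers:
            callback(transaction)


class Scoreboard:
    """Holds one 'implementation' (callback) per incoming port."""

    def __init__(self) -> None:
        self.from_driver: list = []
        self.from_monitor: list = []

    def write_drv(self, transaction: object) -> None:
        self.from_driver.append(transaction)

    def write_mon(self, transaction: object) -> None:
        self.from_monitor.append(transaction)


# "Connect phase": ports on the producer side, implementations on the scoreboard.
drv2sb = AnalysisPort()
mon2sb = AnalysisPort()
scoreboard = Scoreboard()
drv2sb.connect(scoreboard.write_drv)
mon2sb.connect(scoreboard.write_mon)

# "Run phase": producers only call write(); they know nothing about the consumer.
drv2sb.write("addi x1, x0, 1")
mon2sb.write("addi x1, x0, 1")
```

The producer never references the scoreboard directly, which is what allows components to be reused or replaced without touching the others.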

### 3.7 Agents

As explained in the previous section, this UVM framework requires two separate agents, both derived from the `uvm_agent` base class. The first is the active agent, used to drive and capture transactions, while the second is a passive agent, used only to monitor signals inside the DUV. An active agent typically contains a driver, a sequencer, and a monitor, whereas a passive agent contains only a monitor.

#### 3.7.1 Agent in

Agent in is the active agent of the framework and is defined in `processor_agent.sv`. It is responsible for creating the required UVCs during the build phase and for connecting the sequencer export to the driver port during the connect phase. In addition, two text files are opened during the build phase and their file descriptors are returned. The UVCs are created by calling the creator function, as shown in the following piece of code.

```systemverilog
driver = processor_driver::type_id::create("driver", this);
mon = processor_monitor::type_id::create("mon", this);
sequencer = uvm_sequencer#(processor_transaction)::type_id::create("sequencer", this);
```

The two files opened here are:

- **instruction.txt**: the file in which the generator writes the random instructions to be sent to the processor. Opening it here makes it available to the sequencer, which reads its content line by line;

- **illegal_dump.txt**: used inside the scoreboard. Each time an illegal instruction is recognized, it is written to this file. This is useful to check whether the generator is producing proper test vectors.

#### 3.7.2 Agent out

Agent out is the passive agent; as a result, it contains only the declaration of the `monitor_out` component and its creation during the build phase.

### 3.8 Driver

The driver, derived from the `uvm_driver` base class, is responsible for sending the input vectors received from the sequencer to the DUV. This must be done according to the protocol specified in the RI5CY User Manual. The signals required are:

- **instr_rdata_i**: Data read from instruction memory (i.e. the instruction to be sent);

- **instr_rvalid_i**: This signal is high for exactly one cycle per request. It signals that instr_rdata_i holds valid data;

- **instr_gnt_i**: The other side accepted the request.

During the build phase, the Drv2Sb (Driver to Scoreboard) port is created. Most of the driver operations take place in the `run_phase` task: if the request signal is high at a positive clock edge, the driver gets a new transaction object from the sequencer using the `get_next_item()` method and assigns the signals contained in the transaction to the DUV signals. Once the signals have been driven to the DUV, the transaction is sent to the scoreboard through the Drv2Sb port using the `write()` method.
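The driver loop can be summarized with a cycle-by-cycle Python model. It is a deliberately simplified sketch: the function `run_driver` is hypothetical, grant and valid are collapsed into the same cycle, and the real timing is the one defined by the RI5CY instruction fetch protocol.

```python
from collections import deque
from typing import List, Optional, Sequence, Tuple


def run_driver(
    requests: Sequence[bool], sequencer_items: Sequence[int]
) -> Tuple[List[Tuple[int, Optional[int]]], List[int]]:
    """Model the driver over a number of clock cycles.

    requests: one flag per positive clock edge (the core's request signal).
    sequencer_items: instruction words waiting in the sequencer.
    Returns the (rvalid, rdata) pair driven in each cycle and the list of
    words forwarded to the scoreboard.
    """
    pending = deque(sequencer_items)
    driven: List[Tuple[int, Optional[int]]] = []
    forwarded: List[int] = []
    for request in requests:
        if request and pending:
            word = pending.popleft()   # get_next_item()
            driven.append((1, word))   # valid data on the instruction bus
            forwarded.append(word)     # write() on the Drv2Sb port
        else:
            driven.append((0, None))   # nothing requested in this cycle
    return driven, forwarded


driven, forwarded = run_driver([True, False, True], [0x00100093, 0x0000006F])
```

The scoreboard receives exactly the words that were actually driven, in the same order, which is what makes the later comparison with the input monitor meaningful.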

### 3.9 Monitors

The monitor extracts signal information from the internal bus of the DUV and translates it into transactions. Once a transaction has been captured, it is sent to the scoreboard. Both monitors work in the same way; the only difference lies in the type of transaction object and the signals it contains.

#### 3.9.1 Monitor in

In the input monitor, a transaction object `pros_trans` is created during the run phase and, at the negative edge of the clock, the signals of the processor virtual interface are copied into it. The transaction object is then sent to the Scoreboard through the Mon2Sb port. The input monitor is of crucial importance: the scoreboard needs to know whether the transaction sent to the DUV has been received, and some internal signals are required to evaluate the expected results starting from the current state (e.g. the current value of the register file).

#### 3.9.2 Monitor out

Monitor out, on the other hand, collects signal information after an operation has completed. Using the same approach, the signals of the virtual out interface are copied into a `packet_out` transaction object, which is then sent to the Scoreboard through the `Mon2Sb_port_out`. The signals collected here represent the actual results of the instruction sent by the driver.

### 3.10 Scoreboard

The Scoreboard is the most complex component in the UVM framework. It is derived from the `uvm_scoreboard` base class. Its role is to verify that everything has worked as expected by looking at the input and output transactions. So far, we have seen that transaction objects move through the UVM framework via analysis ports. Most of them are directed to the scoreboard, where their implementation is defined. An analysis port works like a callback: each time a UVC writes a transaction to the analysis port, the corresponding callback function is executed. This is a non-blocking mechanism that avoids time delays in the verification framework, but it has a drawback: the received transactions must be synchronized before meaningful comparisons can be performed. For this reason, the callback functions of these analysis ports simply put the transactions into `uvm_tlm_fifo` objects.
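The synchronization idea can be sketched in Python with two queues. This is an illustrative model under simplifying assumptions, not the scoreboard of the thesis: the class `FifoScoreboard` is hypothetical and transactions are compared with plain equality instead of a reference model.

```python
from queue import Queue
from typing import List


class FifoScoreboard:
    """Callbacks only enqueue; the comparison pairs items in arrival order."""

    def __init__(self) -> None:
        self.expected: Queue = Queue()  # filled by the driver-side callback
        self.actual: Queue = Queue()    # filled by the output-monitor callback

    def write_expected(self, transaction: object) -> None:
        self.expected.put(transaction)  # non-blocking for the producer

    def write_actual(self, transaction: object) -> None:
        self.actual.put(transaction)

    def compare_available(self) -> List[str]:
        """Consume matching pairs, as the run phase does with get()."""
        results = []
        while not self.expected.empty() and not self.actual.empty():
            exp = self.expected.get()
            act = self.actual.get()
            results.append("PASS" if exp == act else "FAIL")
        return results


sb = FifoScoreboard()
# The callbacks may fire in any interleaving; the FIFOs restore the pairing.
sb.write_expected(0x11)
sb.write_expected(0x22)
sb.write_actual(0x11)
first = sb.compare_available()
sb.write_actual(0x33)
second = sb.compare_available()
```

Because each producer owns its queue, a late output transaction never blocks the component that produced the expected one.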

The scoreboard is responsible for decoding the instruction, performing the associated operation to compute the expected result, comparing the expected result with the actual result coming from the output transaction, and reporting PASS or FAIL. Most of the thesis work was spent on this component: since the scoreboard embeds both a decoder and the reference model, it was the most difficult component to build.

During the run phase, the transaction objects are consumed from the FIFOs using the `get()` method. Once the packets are synchronized, two void functions are executed:

- `function void print_reg(packet_out pack_out);`

- `function void decode_check(processor_transaction out_trans, processor_transaction exp_trans, packet_out pack_out)`.

The first prints the actual content of the DUV register file on the terminal. The second is the main function of the scoreboard. Before the instruction is decoded, the driven instruction is compared with the one arriving at the instruction fetch stage. The result of the comparison is reported through `uvm_info`:

- `` `uvm_info ("1_INSTRUCTION_WORD_PASS ", $sformatf("Actual Instruction=%h Expected Instruction=%h ", out_trans.instr_core, exp_trans.instrn), UVM_LOW); ``

- `` `uvm_info ("1_INSTRUCTION_ERROR ", $sformatf("Actual Instruction=%h Expected Instruction=%h ", out_trans.instr_core, exp_trans.instrn), UVM_LOW). ``

Other `uvm_info` messages are used to signal specific conditions. As an example, some simulation results are shown in Fig. 3.5, Fig. 3.6 and Fig. 3.7.

Figure 3.5: Result of AUIPC

Figure 3.6: Result of Branch not taken

The meaning of the `uvm_info` messages is explained in the final part of this chapter. The decode and check function is analyzed next.

#### 3.10.1 Decode_check

As explained before, this function receives the transaction objects. For the sake of simplicity, the signals contained in the transaction objects are first assigned to local variables before moving on to the decode and check part.

```verilog
//DECLARATION FOR EXP_TRANS
bit [4:0] exp_rs1,exp_rs2,exp_rd;

//DECLARATION FOR OUT_TRANS
bit [4:0] out_rs1,out_rs2,out_rs3,out_rd;
bit [31:0][31:0] in_reg_file;
bit illegal_found;
bit [31:0] in_pc_value;
int shamt;
bit [31:0] out_instr;
bit [31:0] csr_data;
bit [31:0] jump_target;

//DECLARATION FOR PACK_OUT
bit [31:0][31:0] out_reg_file;
bit [31:0] out_pc_value;
bit [31:0] out_wdata_mem;
bit [31:0] out_rdata_mem;
bit [31:0] out_csr_wdata;
bit [31:0] out_csr_rdata;
bit [31:0] hwlp_start;
bit [31:0] hwlp_end;
bit [31:0] hwlp_cnt;

//GENERIC VARIABLES
bit[31:0] expected_res;
bit[31:0] tmp32_0,tmp32_1,tmp32_2;
```
The previous piece of code declares the local signals, while the next one assigns the transaction signals to them.

```systemverilog
// ASSIGNMENTS EXPECTED VARIABLES
exp_rs1=exp_trans.instrn[19:15];
exp_rs2=exp_trans.instrn[24:20];
exp_rd=exp_trans.instrn[11:7];

// ASSIGNMENTS PACK_OUT VARIABLES
out_reg_file=pack_out.reg_file;
out_pc_value=pack_out.pc_value;
out_wdata_mem=pack_out.wdata_mem;
out_rdata_mem=pack_out.rdata_mem;
out_csr_rdata=pack_out.csr_rdata;
out_csr_wdata=pack_out.csr_wdata;
hwlp_start=pack_out.hwlp_start;
hwlp_end=pack_out.hwlp_end;
hwlp_cnt=pack_out.hwlp_cnt;

// ASSIGNMENTS OUT_TRANS VARIABLES
out_rs1=out_trans.rs1_address;
out_rs2=out_trans.rs2_address;
out_rs3=out_trans.rs3_address;
out_rd=out_trans.rd_address;
in_reg_file=out_trans.reg_file;
illegal_found=out_trans.illegal_insn;
in_pc_value=out_trans.pc_value;
shamt=out_rs2;
out_instr=out_trans.instr_core;
csr_data=out_trans.csr_rdata;
jump_target=out_trans.jump_target;

// ASSIGNMENTS IMMEDIATE VARIABLES
```
To avoid redundancy, the meaning of the signals has not been explained so far. Now that the UVM framework is about to use them, a brief description is required.

- **exp_rs1**: the expected 5-bit address of source register 1, coming from the driver transaction;
- **exp_rs2**: the expected 5-bit address of source register 2, coming from the driver transaction;
- **exp_rd**: the expected 5-bit address of the destination register, coming from the driver transaction;
- **out_reg_file**: the [32 bit]x[32 bit] register file coming from monitor out, used to check the actual result of the computation;
- **out_pc_value**: the program counter value coming from monitor out, useful to check whether a Control Transfer instruction has been executed correctly;
- **out_wdata_mem**: the data to be written to data memory, sampled by monitor out;
- **out_rdata_mem**: the data read from data memory, sampled by monitor out;
- **out_csr_rdata**: the data read from the Control and Status Register file, coming from monitor out;
- **out_csr_wdata**: the data written to the Control and Status Register file, coming from monitor out;
- **hwlp_start**: when a hardware loop is set, the start address is written to a CSR; this signal holds the value that has been written;
- **hwlp_end**: when a hardware loop is set, the end address is written to a CSR; this signal holds the value that has been written;
- **hwlp_cnt**: when a hardware loop is set, the counter value is written to a CSR; this signal holds the value that has been written;
- **out_rs1**: the actual 5-bit address of source register 1, coming from the input monitor;
- **out_rs2**: the actual 5-bit address of source register 2, coming from the input monitor;
- **out_rs3**: for three-operand operations, the actual 5-bit address of source register 3, coming from the input monitor;
- **out_rd**: the actual 5-bit address of the destination register, coming from the input monitor;
- **in_reg_file**: the [32 bit]x[32 bit] register file before the instruction is executed; it is used to fetch the operands and evaluate the expected result;
- **illegal_found**: this signal is raised when an illegal instruction is encountered. When this happens, the scoreboard reports it with a UVM message and the illegal instruction is written to illegal_dump.txt;
- **in_pc_value**: the program counter value sampled by the input monitor; it is used to evaluate the jump target in some instructions (JAL, JALR) and in arithmetic instructions (AUIPC);
- **shamt**: the integer shift amount used by shift instructions such as SRL and SRA; it fits in 5 bits because the maximum shift amount is 31;
- **out_instr**: the actual instruction sampled by the input monitor, which is compared with the one sent by the driver;
- **csr_data**: the value contained in the CSR before the atomic read/write operation is executed;
- **jump_target**: the jump target sampled by the input monitor; it is not used in the final checks, but it served to check some conditions during the verification process;
- **imm_i_type**: sign-extended immediate;
- **imm_iz_type**: zero-extended immediate;
- **imm_s_type**: S-type immediate;
- **imm_u_type**: U-type immediate;
- **imm_uj_type**: UJ-type immediate;
- **imm_s2_type**: S2-type immediate;
- **imm_bi_type**: BI-type immediate;
- **imm_s3_type**: S3-type immediate;
- **imm_vs_type**: VS-type immediate (sign extension for vectorial type);
- **imm_vu_type**: VU-type immediate (zero extension for vectorial type);
- **imm_shuffleb_type**: SHUFFLEB-type immediate (SHUFFLE type for byte vectorial);
- **imm_shuffleh_type**: SHUFFLEH-type immediate (SHUFFLE type for half-word vectorial);
- **imm_clip_type**: CLIP-type immediate;
- **bitmask_first**: signal used for bit-manipulation instructions;
- **bitmask**: signal used for bit-manipulation instructions; it is evaluated by left-shifting the bit mask by an amount coming from the input monitor;
- **bitmask_inverse**: signal used for bit-manipulation instructions; it is simply the bitwise negation of bitmask.
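The register addresses and the I-type immediates listed above are plain bit slices of the 32-bit instruction word, following the standard RISC-V base encoding. The Python sketch below shows the extraction; the helper names (`bits`, `sign_extend`, `decode_i_type`) are hypothetical and only the I-type fields are covered.

```python
def bits(word: int, high: int, low: int) -> int:
    """Return word[high:low] as an unsigned integer (Verilog-style slice)."""
    return (word >> low) & ((1 << (high - low + 1)) - 1)


def sign_extend(value: int, width: int) -> int:
    """Interpret a 'width'-bit value as a two's complement number."""
    sign_bit = 1 << (width - 1)
    return (value ^ sign_bit) - sign_bit


def decode_i_type(instr: int) -> dict:
    """Split an instruction word into the fields used by the scoreboard."""
    return {
        "opcode": bits(instr, 6, 0),
        "rd": bits(instr, 11, 7),
        "funct3": bits(instr, 14, 12),
        "rs1": bits(instr, 19, 15),
        "rs2": bits(instr, 24, 20),      # doubles as shamt for shift instructions
        "funct7": bits(instr, 31, 25),
        "imm_i": sign_extend(bits(instr, 31, 20), 12),  # sign-extended immediate
        "imm_iz": bits(instr, 31, 20),                  # zero-extended immediate
    }


fields = decode_i_type(0xFFF10093)  # addi x1, x2, -1
```

The same 12 bits yield -1 when sign-extended and 4095 when zero-extended, which is why the two immediates are kept as separate signals.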

The decoding phase starts with a check between the expected instruction (coming from the driver transaction) and the actual instruction (coming from the input monitor). If this check is successful, the instruction is identified using its opcode (i.e. **instr[6:0]**) and its function fields (i.e. **instr[14:12]**, funct3, and **instr[31:25]**, funct7), according to the instruction type. The scoreboard **decode_check** function is shown in Fig. 3.8; all the `case` and `if` blocks are collapsed, because the figure would otherwise be unreadable.

Figure 3.8: decode and check function collapsed

Finally, inside each opcode case, the instruction is recognized and the expected value is computed. It is then compared with the content of register rd coming from the output monitor. As an example, the complete decode and check for the OPIMM instructions is reported in the following piece of code.

```systemverilog
7'b0010011: begin // OPCODE_OPIMM
  `uvm_info("2_OPCODE_OPIMM", $sformatf("Instruction is %h\n", out_trans.instr_core), UVM_LOW)
  if (illegal_found == 1) begin
    `uvm_error("3_ILLEGAL_INSN", $sformatf("RAISED ILLEGAL SIGNAL IN CONTROLLER"))
    $fdisplay(processor_agent.dump_illegal, "%h", out_trans.instr_core);
  end
  else begin
    case (out_trans.instr_core[14:12])
      3'b000: begin // ADDI
        expected_res = imm_i_type + in_reg_file[out_rs1][31:0];
        if (expected_res == out_reg_file[out_rd][31:0]) begin
          `uvm_info("4_ADDI_SUCCESS", $sformatf("Actual Calculation=%h Expected Calculation=%h\n", out_reg_file[out_rd][31:0], expected_res), UVM_LOW)
        end
        else begin
          `uvm_info("4_ADDI_FAILED", $sformatf("Actual Calculation=%h Expected Calculation=%h\n", out_reg_file[out_rd][31:0], expected_res), UVM_LOW)
        end
      end
      3'b010: begin // SLTI (signed comparison)
        expected_res = $signed(in_reg_file[out_rs1][31:0]) < $signed(imm_i_type) ? 32'h00000001 : '0;
        if (expected_res == out_reg_file[out_rd][31:0]) begin
          `uvm_info("4_SLTS_SUCCESS", $sformatf("Actual Calculation=%h Expected Calculation=%h\n", out_reg_file[out_rd][31:0], expected_res), UVM_LOW)
        end
        else begin
          `uvm_info("4_SLTS_FAILED", $sformatf("Actual Calculation=%h Expected Calculation=%h\n", out_reg_file[out_rd][31:0], expected_res), UVM_LOW)
        end
      end
      3'b011: begin // SLTIU (unsigned comparison)
        expected_res = $unsigned(in_reg_file[out_rs1][31:0]) < $unsigned(imm_i_type) ? 32'h00000001 : '0;
        if (expected_res == out_reg_file[out_rd][31:0]) begin
          `uvm_info("4_SLTU_SUCCESS", $sformatf("Actual Calculation=%h Expected Calculation=%h\n", out_reg_file[out_rd][31:0], expected_res), UVM_LOW)
        end
        else begin
          `uvm_info("4_SLTU_FAILED", $sformatf("Actual Calculation=%h Expected Calculation=%h\n", out_reg_file[out_rd][31:0], expected_res), UVM_LOW)
        end
      end
      3'b100: begin // XORI
        expected_res = imm_i_type ^ in_reg_file[out_rs1][31:0];
        if (expected_res == out_reg_file[out_rd][31:0]) begin
          `uvm_info("4_XORI_SUCCESS", $sformatf("Actual Calculation=%h Expected Calculation=%h\n", out_reg_file[out_rd][31:0], expected_res), UVM_LOW)
        end
        else begin
          `uvm_info("4_XORI_FAILED", $sformatf("Actual Calculation=%h Expected Calculation=%h\n", out_reg_file[out_rd][31:0], expected_res), UVM_LOW)
        end
      end
      3'b110: begin // ORI
        expected_res = imm_i_type | in_reg_file[out_rs1][31:0];
        if (expected_res == out_reg_file[out_rd][31:0]) begin
          `uvm_info("4_ORI_SUCCESS", $sformatf("Actual Calculation=%h Expected Calculation=%h\n", out_reg_file[out_rd][31:0], expected_res), UVM_LOW)
        end
        else begin
          `uvm_info("4_ORI_FAILED", $sformatf("Actual Calculation=%h Expected Calculation=%h\n", out_reg_file[out_rd][31:0], expected_res), UVM_LOW)
        end
      end
      3'b111: begin // ANDI
        expected_res = imm_i_type & in_reg_file[out_rs1][31:0];
        if (expected_res == out_reg_file[out_rd][31:0]) begin
          `uvm_info("4_ANDI_SUCCESS", $sformatf("Actual Calculation=%h Expected Calculation=%h\n", out_reg_file[out_rd][31:0], expected_res), UVM_LOW)
        end
        else begin
          `uvm_info("4_ANDI_FAILED", $sformatf("Actual Calculation=%h Expected Calculation=%h\n", out_reg_file[out_rd][31:0], expected_res), UVM_LOW)
        end
      end
      3'b001: begin // SLLI
        expected_res = in_reg_file[out_rs1][31:0] << shamt;
        if (expected_res == out_reg_file[out_rd][31:0]) begin
          `uvm_info("4_SLLI_SUCCESS", $sformatf("Actual Calculation=%h Expected Calculation=%h\n", out_reg_file[out_rd][31:0], expected_res), UVM_LOW)
        end
        else begin
          `uvm_info("4_SLLI_FAILED", $sformatf("Actual Calculation=%h Expected Calculation=%h\n", out_reg_file[out_rd][31:0], expected_res), UVM_LOW)
        end
      end
      3'b101: begin
        if (out_trans.instr_core[31:25] == 7'b0) begin // SRLI (logical shift)
          expected_res = $signed(in_reg_file[out_rs1][31:0]) >> $signed(shamt);
          if (expected_res == out_reg_file[out_rd][31:0]) begin
            `uvm_info("4_SRLI_SUCCESS", $sformatf("Actual Calculation=%h Expected Calculation=%h\n", out_reg_file[out_rd][31:0], expected_res), UVM_LOW)
          end
          else begin
            `uvm_info("4_SRLI_FAILED", $sformatf("Actual Calculation=%h Expected Calculation=%h\n", out_reg_file[out_rd][31:0], expected_res), UVM_LOW)
          end
        end
        else if (out_trans.instr_core[31:25] == 7'b0100000) begin // SRAI (arithmetic shift)
          expected_res = $signed(in_reg_file[out_rs1][31:0]) >>> $signed(shamt);
          if (expected_res == out_reg_file[out_rd][31:0]) begin
            `uvm_info("4_SRAI_SUCCESS", $sformatf("Actual Calculation=%h Expected Calculation=%h\n", out_reg_file[out_rd][31:0], expected_res), UVM_LOW)
          end
          else begin
            `uvm_info("4_SRAI_FAILED", $sformatf("Actual Calculation=%h Expected Calculation=%h\n", out_reg_file[out_rd][31:0], expected_res), UVM_LOW)
          end
        end
      end
    endcase
  end
end
```

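The expected-value computations for the OPIMM group can be cross-checked with a short Python reference model. This is an independent sketch based on the standard RV32I semantics, not a translation of the scoreboard; the function `opimm_expected` and its argument names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Interpret a 32-bit pattern as a two's complement number."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def opimm_expected(funct3: int, funct7: int, rs1_value: int, imm: int, shamt: int) -> int:
    """Return the 32-bit result expected in rd.

    imm is the sign-extended 12-bit immediate as a Python int,
    rs1_value is the 32-bit content of rs1 before execution.
    """
    if funct3 == 0b000:                                   # ADDI
        return (rs1_value + imm) & MASK32
    if funct3 == 0b010:                                   # SLTI
        return int(to_signed32(rs1_value) < imm)
    if funct3 == 0b011:                                   # SLTIU
        return int((rs1_value & MASK32) < (imm & MASK32))
    if funct3 == 0b100:                                   # XORI
        return (rs1_value ^ imm) & MASK32
    if funct3 == 0b110:                                   # ORI
        return (rs1_value | imm) & MASK32
    if funct3 == 0b111:                                   # ANDI
        return (rs1_value & imm) & MASK32
    if funct3 == 0b001:                                   # SLLI
        return (rs1_value << shamt) & MASK32
    if funct3 == 0b101 and funct7 == 0b0000000:           # SRLI
        return (rs1_value & MASK32) >> shamt
    if funct3 == 0b101 and funct7 == 0b0100000:           # SRAI
        return (to_signed32(rs1_value) >> shamt) & MASK32
    raise ValueError("not a valid OPIMM encoding")


srli = opimm_expected(0b101, 0b0000000, 0x80000000, 0, 4)
srai = opimm_expected(0b101, 0b0100000, 0x80000000, 0, 4)
```

The last two calls show the only difference between SRLI and SRAI: the logical shift fills the upper bits with zeros, the arithmetic shift replicates the sign bit.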
#### 3.10.2 Summary of simulation

The summary of the simulation reports the count of each `uvm_info` message. To simplify the extraction of useful information from the summary, a number was inserted at the beginning of each message string:

- 1_INSTRUCTION_WORD_PASS or 1_INSTRUCTION_ERROR;
- 1_BRANCH_TAKEN or 1_BRANCH_NOT_TAKEN;
- 2_ILLEGAL_OPCODE or 2_OPCODE_XXX;
- 3_ILLEGAL_INSN;
- 4_OP_SUCCESS or 4_OP_FAILED.

Thanks to this approach, the `plot_report()` function included in functions.py can extract the needed information from the simulation log and display a bar plot, allowing a faster check of the results. An example plot is shown in Fig. 3.9.
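The benefit of the numeric prefix is that the summary can be filtered with a single regular expression. The sketch below is not the `plot_report()` function of the thesis: the helpers `count_report_ids` and `group_by_prefix` are hypothetical, and the summary is assumed to list one `[ID] count` pair per line, optionally preceded by `# ` as some simulators do in their transcripts.

```python
import re
from collections import Counter

# Assumed line shape: "[<digit>_<NAME>]   <count>", optionally prefixed by "# ".
ID_LINE = re.compile(r"^(?:#\s*)?\[(\d_\w+)\][ \t]+(\d+)[ \t]*$", re.MULTILINE)


def count_report_ids(log_text: str) -> Counter:
    """Collect the counts of all message IDs that start with a digit prefix."""
    counts: Counter = Counter()
    for ident, number in ID_LINE.findall(log_text):
        counts[ident] += int(number)
    return counts


def group_by_prefix(counts: Counter) -> dict:
    """Group the IDs by their leading digit (1, 2, 3 or 4)."""
    grouped: dict = {}
    for ident, number in counts.items():
        grouped.setdefault(ident[0], {})[ident] = number
    return grouped


sample_log = """\
** Report counts by id
[1_INSTRUCTION_WORD_PASS]    10
[4_ADDI_SUCCESS]     3
[4_ADDI_FAILED]     1
[RNTST]     1
"""
counts = count_report_ids(sample_log)
grouped = group_by_prefix(counts)
```

IDs without a numeric prefix, such as those emitted by the UVM library itself, are ignored, so only the scoreboard messages end up in the plot data.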

![Instruction statistics](image)

Figure 3.9: Plot of simulation summary
# 4 Simulation Environment

This chapter analyzes the Python environment developed in support of the UVM framework. To speed up the verification procedure, a random instruction generator was developed, based mainly on a database of valid instructions extracted from the RTL hardware description. Finally, a graphical user interface based on Tkinter was developed, so that the user can specify the number and type of instructions to be tested, compile and run a simulation, and collect the results. The first section explains how the ISA database was extracted and used, the second presents the features of the random generator, and the last describes the GUI functionality.

### 4.1 ISA Database

A large part of this work was dedicated to extracting the available instruction set. This was not easy, because documentation was lacking for certain kinds of instructions. The first step was to identify all the valid opcodes accepted by the device under verification, which was done by inspecting riscv_defines.sv. Using an extractor script (Extractor.py) and a database creator (db_opcode_32_creator.py), the following database is obtained.

```python
name_byopcode = {
    "1110011" : "OPCODE_SYSTEM",
    "0001111" : "OPCODE_FENCE",
    "0110011" : "OPCODE_OP",
    "0010011" : "OPCODE_OPIMM",
    "0000011" : "OPCODE_LOAD",
    "1100011" : "OPCODE_BRANCH",
    "1100111" : "OPCODE_JALR",
    "1101111" : "OPCODE_JAL",
    # ... (the listing is truncated here)
}
```

The second step was the most time-consuming, as it was necessary to find all the possible instructions. Reverse engineering riscv_decoder.sv was essential to extract the complete instruction set. For the sake of simplicity, the instruction encodings were collected in .csv files. Since the random generator is written in Python, a convenient way to obtain a database without duplicated elements is to use a set of dictionaries: Python dictionaries store data as key: value pairs and do not allow duplicate keys. The starting point is therefore two CSV databases:

- ISA.csv;
- ISA_C.csv.

These CSV files were converted into a database using db_isa_32_creator.py and db_isa_16_creator.py respectively. The resulting database is a simple structure in which instructions are divided by encoding type (R, RI, I, S, BM, IB, R4, V) and can be searched using different criteria (i.e. keys). For each instruction type, the database is generated by a snippet of code like the following one:

```python
f_write = open("database/db_ISA_32.py", "a+")
f_write.write("####################################\n")
f_write.write("#R_type_DB\n")
f_write.write("####################################\n")
# search by complete instruction and return its name
f_write.write("R_TYPE_BYINSTR_NAME={\n")
for c in range(0, len(r_instructions)):
    f_write.write("\"{}\" :\"{}\",\n".format(r_instructions[c], r_names[c]))
f_write.write("}\n")
# search by name and return the complete instruction
f_write.write("R_TYPE_BYNAMES_INSTR={\n")
for c in range(0, len(r_instructions)):
    f_write.write("\"{}\" :\"{}\",\n".format(r_names[c], r_instructions[c]))
f_write.write("}\n")
# search by name and return the OPCODE
f_write.write("R_TYPE_BYNAMES_OPCODE={\n")
for c in range(0, len(r_instructions)):
    f_write.write("\"{}\" :\"{}\",\n".format(r_names[c], r_opcodes[c]))
f_write.write("}\n")
# search by name and return the funct3 field
f_write.write("R_TYPE_BYNAMES_FUNCT3={\n")
for c in range(0, len(r_instructions)):
    f_write.write("\"{}\" :\"{}\",\n".format(r_names[c], r_funct3s[c]))
f_write.write("}\n")
```
Some of the keys are never used. In general, the random generator first retrieves all the instruction names associated with an opcode and then uses each name to look up the full instruction.
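The two-step lookup (opcode to names, names to full instruction) can be sketched directly on in-memory dictionaries instead of a generated Python file. Everything below is illustrative: the CSV column names, the three sample rows and the functions `build_database` and `instructions_for_opcode` are assumptions, not the content of ISA.csv.

```python
import csv
import io

REG = "x" * 5  # placeholder for a 5-bit register field to be filled later

# Tiny stand-in for ISA.csv; column names and rows are assumptions.
ISA_CSV = (
    "name,encoding\n"
    f"ADD,0000000{REG}{REG}000{REG}0110011\n"
    f"SUB,0100000{REG}{REG}000{REG}0110011\n"
    f"ADDI,{'x' * 12}{REG}000{REG}0010011\n"
)


def build_database(csv_text: str):
    """Build the name-keyed dictionaries from the CSV rows."""
    by_name_instr, by_name_opcode = {}, {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        name, encoding = row["name"], row["encoding"]
        by_name_instr[name] = encoding
        by_name_opcode[name] = encoding[-7:]  # the opcode is the 7 LSBs
    return by_name_instr, by_name_opcode


def instructions_for_opcode(opcode: str, by_name_instr: dict, by_name_opcode: dict) -> list:
    """First collect the names for an opcode, then fetch each full instruction."""
    names = [name for name, op in by_name_opcode.items() if op == opcode]
    return [(by_name_instr[name], name) for name in names]


by_name_instr, by_name_opcode = build_database(ISA_CSV)
op_instructions = instructions_for_opcode("0110011", by_name_instr, by_name_opcode)
```

Because the instruction name is the dictionary key, inserting the same name twice simply overwrites the previous entry, which is the property the database relies on to avoid duplicates.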

### 4.2 RV Generator

The instruction set database was the starting point for the random generator. Once the database has been created, the next step is to randomize first among the opcodes and then among the instructions of the chosen opcode. In rvgen2.py the database dictionaries are imported and then cast to lists. Lists are preferable to dictionaries here because the `random.choice()` function can be used to pick a random element from a list.

The main function in rvgen2.py is `rv_generator`, which accepts three arguments:

- num: the number of random instructions to be generated;
- sel: a string of ones and zeros used to select the subset of opcodes to be included in the randomization;
- log: either 1 or 0; it makes it possible to disable the printing of the randomization log on stdout.

num does not represent the actual number of instructions generated: to get the real number, the prologue instructions must be added, and Control Transfer instructions lead to additional instructions. sel is very important in a CDV (Coverage Driven Verification) environment, as it allows selecting only the opcodes that were not fully covered in previous simulations. Its value is set in the GUI and follows Table 4.1.

<table>
<thead>
<tr>
<th>OP</th>
<th>FENCE</th>
<th>HWLOOP</th>
<th>SYSTEM</th>
<th>VECOP</th>
<th>STOREFP</th>
<th>PULP_OP</th>
</tr>
</thead>
<tbody>
<tr>
<td>LOADFP</td>
<td>JAL</td>
<td>JALR</td>
<td>BRANCH</td>
<td>STORE</td>
<td>STOREPOST</td>
<td>LOAD</td>
</tr>
<tr>
<td>LOADPOST</td>
<td>LUI</td>
<td>AUIPC</td>
<td>OPIMM</td>
<td>OP_FP</td>
<td>FMADD</td>
<td>FNMADD</td>
</tr>
<tr>
<td>FMSUB</td>
<td>FNMSUB</td>
<td>CORNERCASES</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

Table 4.1: sel string encoding

According to the sel string provided as input, `OPCODE_LIST` is created in a for loop by appending the elements of `OPCODE_DB` to it, using sel[i] as a masking condition. The second initialization step is to create an instruction list for each opcode. The complete instruction list is obtained by combining the instruction lists coming from the database; each instruction is then appended to the correct list according to its 7 LSBs (i.e. the opcode). The result is the following set of lists, with the number of elements given in parentheses:

- OPCODE_OP (52);
- OPCODE_FENCE (2);
- OPCODE_HWLOOP (6);
- OPCODE_SYSTEM (12);
- OPCODE_VECOP (200);
- OPCODE_STORE_FP (4);
- OPCODE_PULP_OP (20);
- OPCODE_LOAD_FP (4);
- OPCODE_JAL (1);
- OPCODE_JALR (1);
- OPCODE_BRANCH (8);
- OPCODE_STORE (3);
- OPCODE_STORE_POST (6);
- OPCODE_LOAD (5);
- OPCODE_LOAD_POST (10);
- OPCODE_LUI (1);
- OPCODE_AUIPC (1);
- OPCODE_OPIMM (9);
- OPCODE_OP_FP (89);
- OPCODE_FMADD;
- OPCODE_FNMADD;
- OPCODE_FMSUB;
- OPCODE_FNMSUB.
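The masking step that builds `OPCODE_LIST` from the sel string can be written compactly in Python. The sketch below uses a shortened opcode table and a hypothetical helper `build_opcode_list`; the assumption that bit i of sel selects the i-th entry of Table 4.1, read row by row, is not confirmed by the text.

```python
import random

# Shortened table; the order is assumed to follow Table 4.1 row by row.
OPCODE_DB = ["OPCODE_OP", "OPCODE_FENCE", "OPCODE_HWLOOP", "OPCODE_SYSTEM"]


def build_opcode_list(sel: str, opcode_db: list) -> list:
    """Keep the opcodes whose position in sel holds a '1'."""
    if len(sel) != len(opcode_db):
        raise ValueError("sel must contain one character per opcode")
    return [name for flag, name in zip(sel, opcode_db) if flag == "1"]


opcode_list = build_opcode_list("1001", OPCODE_DB)

# First level of randomization: pick one of the enabled opcodes.
rng = random.Random(0)  # seeded only to make the example repeatable
chosen = rng.choice(opcode_list)
```

Restricting the list before the random choice, rather than rejecting unwanted opcodes afterwards, keeps the number of generated instructions equal to the number requested.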

The random generator can also produce wrong, illegal instructions in order to raise exception signals and cover corner cases. As explained, the generator has two levels of randomization: first a random opcode is chosen among the ones included in `OPCODE_LIST`, then a random instruction is chosen from the instruction list of that opcode. The instruction obtained from this process is a prototype containing 'x' in the fields that have to be filled. Depending on the opcode, different operations are then performed to obtain a complete instruction in which all the fields have been properly filled.

In the following snippet of code, the randomization of `OPCODE_OP` is shown:

```python
typ = random.choice(OPCODE_LIST)
x += 1
if typ == "OPCODE_OP":
    inst = random.choice(OPCODE_OP)
    instr, name = inst
    if instr[0:2] == "11":  # IMMEDIATE BIT MANIPULATION
        LS3 = random_reg()
        LS2 = random_reg()
        SRC = random_reg()
        RD = random_regd()
```

The following helper functions are used to fill the instruction fields:

- `random_reg()`: generates a random `rs1` or `rs2`; a random number in the range [0, 31] is generated and converted to its binary form;
- `random_regd()`: generates a random `rd`; a random number in the range [1, 31] is generated and converted to its binary form (rd ≠ 0);
- `random_rounding_mode()`: generates a random rounding mode; a random element is chosen from the list `ROUNDING_MODE = ["000", "001", "010", "011", "100"]`;
- `random_immediate(n)`: generates a random operand to be inserted as an immediate; it accepts an integer argument that selects the number of bits of the immediate.
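The four helpers described above could look like the following minimal sketch. The function names and the `ROUNDING_MODE` list come from the text; the `to_bin` helper, the zero-padding of register addresses to 5 bits and the use of `random.getrandbits` are assumptions made for illustration, not necessarily what the thesis code does.

```python
import random

ROUNDING_MODE = ["000", "001", "010", "011", "100"]


def to_bin(value, width):
    """Hypothetical helper: zero-padded binary string of the given width."""
    return format(value, "0{}b".format(width))


def random_reg():
    """Random rs1/rs2 address in [0, 31] (assumed to be a 5-bit string)."""
    return to_bin(random.randint(0, 31), 5)


def random_regd():
    """Random rd address in [1, 31], so that rd is never register 0."""
    return to_bin(random.randint(1, 31), 5)


def random_rounding_mode():
    """Random element of the ROUNDING_MODE list."""
    return random.choice(ROUNDING_MODE)


def random_immediate(n):
    """Random immediate on n bits (n >= 1), returned as a binary string."""
    return to_bin(random.getrandbits(n), n)


# Example: operands for one instruction prototype.
print(random_reg(), random_regd(), random_rounding_mode(), random_immediate(12))
```

Returning fixed-width strings keeps the later concatenation with the fixed fields of the prototype a plain string operation.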

After `instr_fill` has been obtained by concatenating the operands and the fixed fields of the instruction, it is always written to the `instruction.txt` file, ready to be read by the UVM Sequencer. If the log option is enabled, the instructions are also written to stdout. An example of a randomly generated program is shown in Fig. 4.1.

Finally, after the randomization process has been completed, the UVM Framework must be modified: the sequencer file has to be updated to change the number of iterations required to send all the transactions read from `instruction.txt`. This operation is performed by the function `overwrite_sequencesv(num)`.

### 4.3 UVM Env Configurator

UVM Env Configurator is the GUI developed to configure the simulation constraints and run a complete simulation. It has been developed using the Tkinter Python library and contains a set of elements that are explained in the next subsections. The GUI is shown in Fig. 4.2.

![UVM Env Graphical User Interface](image)

**Figure 4.2: UVM Env Graphical User Interface**

#### 4.3.1 GUI Elements

This subsection describes the GUI elements and their functionalities. `tk.Entry` and `tk.Checkbutton` widgets are used to configure the simulation parameters, while `tk.Button` widgets are used to run simulations and interact with the UVM Framework.

**ttk.Notebook element**

The graphical user interface is split into two tabs using `ttk.Notebook`. The first tab shows the *UVM Configurator*, which is detailed in the next paragraphs, while the second tab shows the plots created during the simulation.

**tk.Entry element**

The GUI contains a single `tk.Entry` element. It is used to specify the parameter `num`, that is, the number of random instructions to generate. The parameter is then passed as an argument to the `rv_generator` function imported from `rvgen2.py`.

**tk.Checkbutton elements**

There are 24 check buttons, one for each bit of the string `sel`. Each check button is associated with a `tk.BooleanVar` whose initial value is set before the interface is initialized. Changing the state of a check button changes the `sel` string. An additional check button sets the log parameter: if it is enabled, all the operations print on stdout; otherwise, only some information is printed.
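As an illustration of how the check-button states could map to the `sel` string, here is a minimal sketch. It works on a plain list of booleans so that it runs without a display; in the GUI the values would come from calling `get()` on each `tk.BooleanVar`. The function name `build_sel`, the "1"/"0" encoding and the bit ordering are assumptions, not taken from the thesis code.

```python
def build_sel(states):
    """Build the selection string: one character per check button.

    Assumption: a ticked button maps to "1", an unticked one to "0",
    and the string follows the order of the buttons.
    """
    return "".join("1" if state else "0" for state in states)


# Example with 24 hypothetical check-button states:
# only the first and the last opcode groups are enabled.
states = [False] * 24
states[0] = True
states[23] = True

sel = build_sel(states)
print(sel)
```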

**tk.Button elements**

There are ten `tk.Button` widgets used to interact with the UVM framework. Each of them has a callback function that is run whenever the button is pressed:

- CLEAR: this button was added to clean the terminal from the results of the previous simulation. When it is pressed, its callback function `clear()` is called;
- CLOSE: when it is pressed, the GUI is terminated by calling the `root.quit()` function;
- RANDOMIZE: when it is pressed, the callback function `randomize()` is executed. All the input parameters are passed to the `rvgen2.rv_generator()` function and the progress bar is gradually updated until it reaches 33%;
- COMPIL: after the randomization, `processor_sequence` has been modified, so the UVM framework must be recompiled. When the button is pressed, the `compile_design()` function is executed and the progress bar is updated to 66%;
- RUN_SIM: when it is pressed, the `run_sim()` function is executed. At the end of the simulation, if logging on stdout was enabled, the summary of the results is printed on stdout; otherwise, the SUMMARY button must be pressed. The progress bar is updated to 100%;
- SUMMARY: when it is pressed, a summary of the simulation is printed on stdout;
- PLOT: this button runs the `plot_report()` function imported from `functions.py`. When it is pressed, a plot is created by extracting the information about the simulation from the summary;
- COVER: when it is pressed, a coverage analysis of the last simulation is run. The result is saved in the coverage folder, a plot of the current coverage is saved in the figure folder and the coverage result labels are updated. Furthermore, the aggregate coverage is evaluated by combining the '.ucdb' coverage databases;
- POST_EDIT: when this button is pressed, a post-editing operation creates coverage reports divided by type, and then different plots are created;
- TREND: when it is pressed, the result of the post-editing step is used to create a plot showing the coverage percentage over the number of simulations.

#### 4.3.2 GUI Result Frames

This section shows and explains the 'Results' tab. Each time a plot is created using the buttons available in the UVM Configurator tab, the figure is not displayed with the `plt.show()` method; instead, it is saved in the figures folder, re-opened and shown in the Results tab.

**Simulation Results**

![Simulation Result frame](image.png)

Figure 4.3: Simulation Result frame

**Single Coverage Result**

![Single Coverage Result frame](image)

Figure 4.4: Single Coverage Result frame

**Aggregate Coverage Result**

![Aggregate Coverage Result frame](image)

Figure 4.5: Aggregate Coverage Result frame

**Coverage Trend**

Figure 4.6: Coverage Trend frame

# 5 Simulation and Results

This chapter presents and describes the results of the verification procedure. The first section describes the type of coverage that has been used and the selected metrics. The second section shows the results of the simulations through plots, to demonstrate that the DUV works as intended (i.e. the verification scope), and reports the coverage percentage together with a description of how it has been reached. This last step is important to demonstrate that the DUV has been exercised in most of its functionalities.

### 5.1 Coverage and metrics

Collecting coverage information is fundamental because it improves efficiency by identifying the areas of the design that have not been exercised. In this project, code coverage is used to determine the level of confidence in the verification. Code coverage measures how much of the code of the RTL description is executed when a simulation is run. A program with high code coverage has a lower chance of containing undetected bugs, so a high coverage level is needed to make this verification procedure meaningful. Enabling code coverage analysis is simple: a command option is added when the design is compiled and the simulation is run; after the simulation, a ".ucdb" database is returned, and another ModelSim command is used to extract the coverage information in a human-readable form.

The available metrics are listed below and explained in the following subsections:

- Statement Coverage;
- Branch Coverage;
- Focused Condition Coverage;
- Focused Expression Coverage;
- FSM Coverage;
- Toggle Coverage.

#### 5.1.1 Statement Coverage

Statement coverage reports which RTL statements have been executed and which have not. A line can contain multiple statements, and this kind of coverage can identify more than one statement for each line of code. It is useful to identify the statements that have not been executed and to investigate the reasons:

- The statement cannot be executed because the data and control flow prevent its execution;
- The statement can be executed, but the required condition has not been created.

\[
\text{Stmts} = \frac{N_{\text{of executed statements}}}{\text{Total statements}} \times 100 \quad (5.1)
\]

#### 5.1.2 Branch Coverage

The branch coverage metric counts the control-flow transfer statements, such as if, case, while, repeat, for and loop, whose branches have been executed. An if statement with a single condition provides two possible outcomes, and both must be covered to achieve 100%. In some cases, it should be checked whether the expression can actually assume both values.

\[
\text{Branch} = \frac{N_{\text{of executed branches}}}{\text{Total branches}} \times 100 \quad (5.2)
\]

#### 5.1.3 Focused Condition Coverage

Condition coverage checks the Boolean expressions in conditional statements to test and evaluate their variables or sub-expressions. The goal of condition coverage is to check the individual outcomes of each logical condition [12]. For instance, consider the Boolean expression `if (x < y and a > b)`: it contains two logical conditions, so the possible outcomes are:

- True, True;
- True, False;
- False, True;
- False, False.

To achieve 100% coverage, all the possible outcomes must be covered.

\[
\text{Condition} = \frac{N_{\text{of executed operands}}}{\text{Total number of operands}} \times 100 \quad (5.3)
\]
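The enumeration of outcomes for `x < y and a > b` can be reproduced with a few lines of Python. This is only an illustrative sketch: the stimuli are made up, and the percentage counts covered outcome combinations, whereas the simulator's focused condition metric in Eq. 5.3 counts operands, so the two figures are not expected to match.

```python
from itertools import product


def condition_outcomes(samples):
    """Set of (x < y, a > b) outcome pairs observed in the samples."""
    return {(x < y, a > b) for (x, y, a, b) in samples}


# The four possible outcome combinations of two logical conditions.
all_outcomes = set(product([True, False], repeat=2))

# Hypothetical stimuli, given as (x, y, a, b) tuples.
samples = [(1, 2, 5, 3), (1, 2, 3, 5), (2, 1, 5, 3)]

observed = condition_outcomes(samples)
missing = all_outcomes - observed
coverage = 100.0 * len(observed) / len(all_outcomes)

print("covered: {:.1f}%".format(coverage))
print("missing outcomes:", missing)
```

With these three stimuli the (False, False) combination is never produced, so a further test vector with `x >= y` and `a <= b` would be needed to close the gap.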

#### 5.1.4 Focused Expression Coverage

Expression coverage is the same as condition coverage, but it covers concurrent signal assignments instead of branch decisions [7]. As an example, an extract of an expression coverage report is shown in Fig. 5.1.

![Figure 5.1: Expression coverage](image)

#### 5.1.5 FSM Coverage

FSM coverage is divided into state coverage and transition coverage. In fact, even if all the states of the FSM are covered, some transitions may still not have been covered. Consider the FSM shown in Fig. 5.2.

![Figure 5.2: alu div FSM example](image)

The FSM is composed of three states and four arcs. State coverage provides a table with the number of visits to each state. Transition coverage provides a table listing all the possible arcs to be covered. A report like the one shown in Fig. 5.3 is provided for each FSM recognized in the design.

<table>
<thead>
<tr>
<th>FSM Coverage:</th>
</tr>
</thead>
<tbody>
<tr>
<td>Enabled Coverage</td>
</tr>
<tr>
<td>FSM:</td>
</tr>
<tr>
<td>States:</td>
</tr>
<tr>
<td>Transitions:</td>
</tr>
</tbody>
</table>

Figure 5.3: FSM coverage
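The difference between state coverage and transition coverage can be illustrated with a small sketch on a three-state, four-arc FSM. The state names and the arcs below are hypothetical placeholders (they are not read from the DUV), and the trace is an invented sequence of visited states.

```python
from collections import Counter

# Hypothetical three-state FSM with four arcs.
STATES = ["IDLE", "DIVIDE", "FINISH"]
ARCS = [
    ("IDLE", "DIVIDE"),
    ("DIVIDE", "DIVIDE"),
    ("DIVIDE", "FINISH"),
    ("FINISH", "IDLE"),
]


def fsm_coverage(trace):
    """Return visit counts and coverage percentages for states and arcs."""
    state_hits = Counter(trace)
    arc_hits = Counter(zip(trace, trace[1:]))  # consecutive state pairs
    state_cov = 100.0 * sum(1 for s in STATES if state_hits[s]) / len(STATES)
    arc_cov = 100.0 * sum(1 for a in ARCS if arc_hits[a]) / len(ARCS)
    return state_hits, arc_hits, state_cov, arc_cov


# Every state is visited once, but only two of the four arcs are taken.
trace = ["IDLE", "DIVIDE", "FINISH"]
state_hits, arc_hits, state_cov, arc_cov = fsm_coverage(trace)

print("state coverage: {:.1f}%".format(state_cov))
print("transition coverage: {:.1f}%".format(arc_cov))
```

The trace reaches full state coverage while leaving half of the arcs untaken, which is why the two tables are reported separately.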

#### 5.1.6 Toggle Coverage

Toggle coverage reports the number of times each bit of a signal has toggled its value. Basic toggle coverage is enabled with the -t option and covers the 1→0 and 0→1 transitions. QuestaSim also provides an extended toggle coverage (-x option), which additionally covers transitions from and to the undefined ('X') and tristate ('Z') values.
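Basic toggle counting can be sketched as follows for a signal sampled as a list of integers. The function names are hypothetical, and the percentage definition (two bins per bit, one for each direction) is an assumption made for the example rather than the exact formula used by the simulator.

```python
def toggle_counts(samples, width):
    """Count 0->1 and 1->0 transitions of each bit over consecutive samples."""
    rises = [0] * width
    falls = [0] * width
    for prev, curr in zip(samples, samples[1:]):
        for bit in range(width):
            p = (prev >> bit) & 1
            c = (curr >> bit) & 1
            if p == 0 and c == 1:
                rises[bit] += 1
            elif p == 1 and c == 0:
                falls[bit] += 1
    return rises, falls


def toggle_coverage(rises, falls):
    """Percentage of covered bins, assuming two bins (rise, fall) per bit."""
    covered = sum(1 for r in rises if r) + sum(1 for f in falls if f)
    return 100.0 * covered / (2 * len(rises))


# A 2-bit signal in which only bit 0 ever toggles.
samples = [0b00, 0b01, 0b00, 0b01]
rises, falls = toggle_counts(samples, 2)
print(rises, falls, toggle_coverage(rises, falls))
```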

### 5.2 Simulations

This section shows and discusses the results of the simulations. As the aim of the verification is to reach a coverage value above 90%, the steps that were necessary to reach that value are described one by one. Two different attempts were made to reach 90%: the first one required a large number of simulations, while in the second attempt a reduced number was sufficient, because only the most effective test sets were included and the useless ones were removed. As the test size is an important parameter, the second attempt is the better one, and only its results are explained. The coverage trends of both attempts are reported in Fig. 5.4 for completeness.

![Coverage Trend](image)

(a) First attempt coverage trend  
(b) Second attempt coverage trend

Figure 5.4: Coverage trends

#### 5.2.1 Single Simulation

As a first approach, a single simulation with a random set of instructions has been run.

With unconstrained randomization, the coverage is expected to be lower than 50% because of the method employed in the random generator. Some opcodes contain a large number of instructions while others contain only one or two, and the `random.choice()` method is not weighted: every opcode is selected with the same probability, so it is not possible to cover the major part of the instruction set with a single run.

This expectation has been confirmed by the results of the simulation shown in Fig. 5.5 and Fig. 5.6.

Figure 5.5: Results of the simulation

As Fig. 5.5 shows, the number of instructions per opcode is not balanced: LUI and AUIPC instructions are over-represented. At this point, there were two possible solutions:

- Modify the random generator by assigning a weight to each opcode, to increase the probability of executing different instructions and avoid repetitions of the same instruction;
- Run multiple simulations and merge the coverage databases.
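The first option, which was not the one adopted in this work, could be sketched with `random.choices`, which accepts a `weights` argument. The instruction counts below are taken from the opcode list of Section 4.2 for four opcodes only; weighting each opcode by its number of instructions is just one possible choice, and the function name `pick_opcode` is hypothetical.

```python
import random
from collections import Counter

# Number of instructions per opcode (subset of the opcode list).
INSTR_COUNT = {
    "OPCODE_OP": 52,
    "OPCODE_BRANCH": 8,
    "OPCODE_LUI": 1,
    "OPCODE_AUIPC": 1,
}


def pick_opcode(weighted, rng):
    """Pick one opcode, uniformly or weighted by its instruction count."""
    names = list(INSTR_COUNT)
    if weighted:
        weights = [INSTR_COUNT[name] for name in names]
        return rng.choices(names, weights=weights, k=1)[0]
    return rng.choice(names)


rng = random.Random(1)
uniform = Counter(pick_opcode(False, rng) for _ in range(10000))
weighted = Counter(pick_opcode(True, rng) for _ in range(10000))

print("uniform :", dict(uniform))
print("weighted:", dict(weighted))
```

With uniform selection, the single LUI instruction is drawn as often as the whole `OPCODE_OP` group, which matches the imbalance observed in Fig. 5.5; with these weights every individual instruction becomes roughly equally likely.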

The coverage report in Fig. 5.6 is a fast and convenient way to look at the coverage results, but it is not sufficient to understand which parts of the design have not been covered. For this reason, two additional textual reports are provided together with this figure:

- CovReport.txt: it shows the percentage reached by each metric for each of the components;
- CovReportlines.txt: it is an expanded version that also reports detailed information about the uncovered parts of the design.

To increase the coverage, the second technique (i.e. merging databases) has been used, so the next subsections report the results of multiple runs.

#### 5.2.2 Multiple Simulations

QuestaSim provides a useful command, `vcover merge`, to merge the coverage results obtained in previous simulations. This mechanism joins the databases without counting previously covered parts twice. From now on, the coverage efforts are explained by focusing on what has been done to obtain a certain increase and by discussing what has not been covered.
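Conceptually, merging behaves like a set union over the covered items of each run. The sketch below is only a mental model with invented item identifiers; it does not reflect how a '.ucdb' database is stored or how `vcover merge` is implemented.

```python
def merge_coverage(*runs):
    """Union of the covered items of several runs (no double counting)."""
    merged = set()
    for covered in runs:
        merged |= covered
    return merged


def percent(covered, total_items):
    """Coverage percentage over a fixed number of coverable items."""
    return 100.0 * len(covered) / total_items


TOTAL_ITEMS = 10            # hypothetical number of coverable items
run_a = {0, 1, 2, 3}        # items hit by a first simulation
run_b = {2, 3, 4, 5, 6}     # items hit by a second simulation

merged = merge_coverage(run_a, run_b)
print(percent(run_a, TOTAL_ITEMS), percent(run_b, TOTAL_ITEMS),
      percent(merged, TOTAL_ITEMS))
```

The aggregate value is lower than the sum of the single-run percentages because the items hit by both runs are counted once, which is why a new run only helps when it reaches parts of the design that earlier runs missed.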

**Instruction Set Coverage**

After the first generic simulation, a sequence of constrained simulations has been launched to cover the entire instruction set. To make sure that normal operation has been properly tested, we can look at the coverage results of `riscv_compressed_decoder.sv` and `riscv_decoder.sv` in `CovReportlines.txt`.

For each simulation, the user can select the opcodes to be included according to the coverage requirements. From standard arithmetic operations up to vectorial and privileged ones, almost all the instructions contained in the ISA have been tested. The results of some simulations are shown in Fig. 5.7 and Fig. 5.8.

Figure 5.6: Coverage Report

(a) Single run including AUIPC, OPCODE_OP, OPCODE_OPIMM
(b) Aggregate coverage after (a)
(c) Single run including OPCODE_SYSTEM
(d) Aggregate coverage after (c)
(e) Single run including PULP_OP
(f) Aggregate coverage after (e)

Figure 5.7: Instruction Set Coverage reports

At that point, coverage reached 75.6%, and a larger increase with this approach was not possible, as good results had already been obtained for the decoder and the compressed decoder. Complete coverage results are reported in Fig. 5.9.

Figure 5.8: Instruction Set Coverage reports

Figure 5.9: Coverage Report

**Exception handling**

Fig. 5.9 makes it clear that the verification effort must be directed to the `riscv_controller` and its related components. For this reason, the random generator has been modified to generate not only correct instructions but also illegal ones. The random generator is capable of generating:

- Illegal Instructions;
- Misaligned Instructions;
- Wrong Hwloop.

The idea is to cover the exceptions that are raised in the device under test whenever the rules are not respected. The already existing ISA database has been used to modify also the 'non-modifiable' fields of the instructions. For clarity, a simple example is reported in Tab. 5.1.

The first row of the table shows the correct SRAI instruction, while the second row (SRAI-ILL) shows the illegal one.

<table>
<thead>
<tr>
<th>funct7-shamt</th>
<th>rs1</th>
<th>funct3</th>
<th>rd</th>
<th>opcode</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<td>0100000-shamt</td>
<td>rs1</td>
<td>101</td>
<td>rd</td>
<td>0010011</td>
<td>SRAI</td>
</tr>
<tr>
<td>0100111-shamt</td>
<td>rs1</td>
<td>101</td>
<td>rd</td>
<td>0010011</td>
<td>SRAI-ILL</td>
</tr>
</tbody>
</table>

Table 5.1: Example of illegal instruction

The only difference is in the funct7 field, which contains an abnormal value. When the generator picks a new instruction from the database, normally only its operand fields are filled with register addresses or immediates; in this case, the functional fields (i.e. the ones used to decode the instruction) are also overwritten with random values.
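The SRAI example of Tab. 5.1 can be reproduced with the following minimal sketch. The function names are hypothetical, and the set of legal funct7 values for this opcode and funct3 (SRLI and SRAI) is the one of the base RV32I encoding, assumed here to hold for the DUV as well; the actual generator works on prototypes taken from its ISA database.

```python
import random

# Assumption: for opcode 0010011 and funct3 101, base RV32I defines
# only SRLI (0000000) and SRAI (0100000).
LEGAL_FUNCT7 = {"0000000", "0100000"}


def to_bin(value, width):
    """Hypothetical helper: zero-padded binary string."""
    return format(value, "0{}b".format(width))


def srai_like(legal, rng):
    """Build a 32-bit SRAI word; if not legal, corrupt the funct7 field."""
    shamt = to_bin(rng.randrange(32), 5)
    rs1 = to_bin(rng.randrange(32), 5)
    rd = to_bin(rng.randrange(1, 32), 5)
    funct7 = "0100000"
    if not legal:
        # Redraw until the value is not a legal funct7 for this encoding.
        while funct7 in LEGAL_FUNCT7:
            funct7 = to_bin(rng.getrandbits(7), 7)
    return funct7 + shamt + rs1 + "101" + rd + "0010011"


rng = random.Random(3)
print("legal  :", srai_like(True, rng))
print("illegal:", srai_like(False, rng))
```

Rejecting the legal values in the loop ensures that the corrupted word cannot accidentally decode as a valid shift.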

Another kind of illegal instruction is one that does not respect the 32-bit or 16-bit encoding. In this way, a misaligned access to the instruction memory is simulated, which allows that corner case to be covered.

Many other types of corner cases have been inserted to cover the exception handling of the design. With this approach, a coverage of 82% has been reached. The coverage results are reported in Fig. 5.10.

Figure 5.10: Coverage Report (Coverage Aggregate Statistics: 82.0%)

**Interrupt handling**

The final effort to increase coverage was directed to interrupt handling. As the FSMs of the interrupt controller and of the controller contain states and transitions that occur only if an interrupt request arrives and is served, a mechanism to generate interrupts is required. This generation is done in the sequencer: the signals related to interrupts were already declared in `processor_sequence.sv`, so it was only necessary to add some constraints on these signals to send interrupt requests randomly. In addition, to cover some FSM transitions in the multiplier and in the controller, it was necessary to reset the DUV at random times. After a couple of simulations, a coverage of 90.1% has been reached (Fig. 5.12). The part of the design that remains uncovered corresponds to corner cases that are hard to trigger or to transitions that never happen.

![Aggregate Coverage report](image)

Figure 5.11: Instruction Set Coverage reports
Figure 5.12: Coverage Report

# 6 Conclusion and Future works

This work aimed to verify a complex processor architecture while achieving a high level of confidence. The results provided in the previous chapter show that, apart from some corner cases that could not be covered, a coverage level of around 90% was achieved. It is important to specify that code coverage is a very strict metric, as in some cases it is really hard to increase it. That result was achieved by merging coverage results; otherwise, we have seen that for a single random program the coverage is around 50%. This shows that pure randomization is not a good approach to achieve high coverage; a better approach could be to introduce a smarter random generator that tries to improve its performance by looking at the previous coverage results. In addition, it would be better to embed an external ISS (Instruction Set Simulator) in the scoreboard as the reference model. Introducing an ISS could give two main advantages:

- It simplifies the UVM scoreboard operations;
- It increases the confidence in the pass/fail results: as ISSs are well tested, errors present in the DUV are not repeated in the reference model.

That approach could be introduced in future works: integrating it into the currently developed UVM Framework was too complicated, because the existing Instruction Set Simulator includes only certain RISC-V extensions and should be modified to include the proprietary ones.

# Appendix A ALU Extension

### A.1 Bit Manipulation Operations

<table>
<thead>
<tr>
<th>f2</th>
<th>Is3[4:0]</th>
<th>Is2[4:0]</th>
<th>rs1</th>
<th>funct3</th>
<th>rd</th>
<th>opcode</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<td>11</td>
<td>Luimm5[4:0]</td>
<td>luimm5[4:0]</td>
<td>src</td>
<td>000</td>
<td>dest</td>
<td>0110011</td>
<td>p.extract</td>
</tr>
<tr>
<td>11</td>
<td>Luimm5[4:0]</td>
<td>luimm5[4:0]</td>
<td>src</td>
<td>001</td>
<td>dest</td>
<td>0110011</td>
<td>p.extractu</td>
</tr>
<tr>
<td>11</td>
<td>Luimm5[4:0]</td>
<td>luimm5[4:0]</td>
<td>src</td>
<td>010</td>
<td>dest</td>
<td>0110011</td>
<td>p.insert</td>
</tr>
<tr>
<td>11</td>
<td>Luimm5[4:0]</td>
<td>luimm5[4:0]</td>
<td>src</td>
<td>011</td>
<td>dest</td>
<td>0110011</td>
<td>p.bclr</td>
</tr>
<tr>
<td>11</td>
<td>Luimm5[4:0]</td>
<td>luimm5[4:0]</td>
<td>src</td>
<td>100</td>
<td>dest</td>
<td>0110011</td>
<td>p.bset</td>
</tr>
<tr>
<td>10</td>
<td>00000</td>
<td>src2</td>
<td>src1</td>
<td>000</td>
<td>dest</td>
<td>0110011</td>
<td>p.extractr</td>
</tr>
<tr>
<td>10</td>
<td>00000</td>
<td>src2</td>
<td>src1</td>
<td>001</td>
<td>dest</td>
<td>0110011</td>
<td>p.extractur</td>
</tr>
<tr>
<td>10</td>
<td>00000</td>
<td>src2</td>
<td>src1</td>
<td>010</td>
<td>dest</td>
<td>0110011</td>
<td>p.insertr</td>
</tr>
<tr>
<td>10</td>
<td>00000</td>
<td>src2</td>
<td>src1</td>
<td>011</td>
<td>dest</td>
<td>0110011</td>
<td>p.bclrr</td>
</tr>
<tr>
<td>10</td>
<td>00000</td>
<td>src2</td>
<td>src1</td>
<td>100</td>
<td>dest</td>
<td>0110011</td>
<td>p.bsetr</td>
</tr>
</tbody>
</table>

Table A.1: Bit Manipulation Encoding

<table>
<thead>
<tr>
<th>funct7</th>
<th>rs2</th>
<th>rs1</th>
<th>funct3</th>
<th>rd</th>
<th>opcode</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<td>0000100</td>
<td>src2</td>
<td>src1</td>
<td>101</td>
<td>dest</td>
<td>0110011</td>
<td>p.ror</td>
</tr>
<tr>
<td>0001000</td>
<td>00000</td>
<td>src1</td>
<td>000</td>
<td>dest</td>
<td>0110011</td>
<td>p.ff1</td>
</tr>
<tr>
<td>0001000</td>
<td>00000</td>
<td>src1</td>
<td>001</td>
<td>dest</td>
<td>0110011</td>
<td>p.fl1</td>
</tr>
<tr>
<td>0001000</td>
<td>00000</td>
<td>src1</td>
<td>010</td>
<td>dest</td>
<td>0110011</td>
<td>p.clb</td>
</tr>
<tr>
<td>0001000</td>
<td>00000</td>
<td>src1</td>
<td>011</td>
<td>dest</td>
<td>0110011</td>
<td>p.cnt</td>
</tr>
</tbody>
</table>

Table A.2: Bit Manipulation Encoding

### A.2 General ALU Operations

<table>
<thead>
<tr>
<th>funct7</th>
<th>rs2</th>
<th>rs1</th>
<th>funct3</th>
<th>rd</th>
<th>opcode</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<td>0000010</td>
<td>0000</td>
<td>src1</td>
<td>000</td>
<td>dest</td>
<td>0110011</td>
<td>p.abs</td>
</tr>
<tr>
<td>0000010</td>
<td>src2</td>
<td>src1</td>
<td>010</td>
<td>dest</td>
<td>0110011</td>
<td>p.slet</td>
</tr>
<tr>
<td>0000010</td>
<td>src2</td>
<td>src1</td>
<td>011</td>
<td>dest</td>
<td>0110011</td>
<td>p.sletu</td>
</tr>
<tr>
<td>0000010</td>
<td>src2</td>
<td>src1</td>
<td>100</td>
<td>dest</td>
<td>0110011</td>
<td>p.min</td>
</tr>
<tr>
<td>0000010</td>
<td>src2</td>
<td>src1</td>
<td>101</td>
<td>dest</td>
<td>0110011</td>
<td>p.minu</td>
</tr>
<tr>
<td>0000010</td>
<td>src2</td>
<td>src1</td>
<td>110</td>
<td>dest</td>
<td>0110011</td>
<td>p.max</td>
</tr>
<tr>
<td>0000010</td>
<td>src2</td>
<td>src1</td>
<td>111</td>
<td>dest</td>
<td>0110011</td>
<td>p.maxu</td>
</tr>
<tr>
<td>0001000</td>
<td>0000</td>
<td>src1</td>
<td>100</td>
<td>dest</td>
<td>0110011</td>
<td>p.exths</td>
</tr>
<tr>
<td>0001000</td>
<td>0000</td>
<td>src1</td>
<td>101</td>
<td>dest</td>
<td>0110011</td>
<td>p.exthz</td>
</tr>
<tr>
<td>0001000</td>
<td>0000</td>
<td>src1</td>
<td>110</td>
<td>dest</td>
<td>0110011</td>
<td>p.extbs</td>
</tr>
<tr>
<td>0001000</td>
<td>0000</td>
<td>src1</td>
<td>111</td>
<td>dest</td>
<td>0110011</td>
<td>p.extbz</td>
</tr>
</tbody>
</table>

Table A.3: General Alu Encoding

<table>
<thead>
<tr>
<th>f2</th>
<th>Is3[4:0]</th>
<th>rs2</th>
<th>rs1</th>
<th>funct3</th>
<th>rd</th>
<th>opcode</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<td>00</td>
<td>Luimm5[4:0]</td>
<td>src2</td>
<td>src1</td>
<td>010</td>
<td>dest</td>
<td>1011011</td>
<td>p.adduN</td>
</tr>
<tr>
<td>10</td>
<td>Luimm5[4:0]</td>
<td>src2</td>
<td>src1</td>
<td>010</td>
<td>dest</td>
<td>1011011</td>
<td>p.adduN</td>
</tr>
<tr>
<td>00</td>
<td>Luimm5[4:0]</td>
<td>src2</td>
<td>src1</td>
<td>110</td>
<td>dest</td>
<td>1011011</td>
<td>p.adduN</td>
</tr>
<tr>
<td>10</td>
<td>Luimm5[4:0]</td>
<td>src2</td>
<td>src1</td>
<td>111</td>
<td>dest</td>
<td>1011011</td>
<td>p.addRN</td>
</tr>
<tr>
<td>00</td>
<td>Luimm5[4:0]</td>
<td>src2</td>
<td>src1</td>
<td>111</td>
<td>dest</td>
<td>1011011</td>
<td>p.adduRN</td>
</tr>
<tr>
<td>11</td>
<td>Luimm5[4:0]</td>
<td>src2</td>
<td>src1</td>
<td>110</td>
<td>dest</td>
<td>1011011</td>
<td>p.addRN</td>
</tr>
<tr>
<td>10</td>
<td>Luimm5[4:0]</td>
<td>src2</td>
<td>src1</td>
<td>111</td>
<td>dest</td>
<td>1011011</td>
<td>p.adduRN</td>
</tr>
<tr>
<td>01</td>
<td>Luimm5[4:0]</td>
<td>src2</td>
<td>src1</td>
<td>010</td>
<td>dest</td>
<td>1011011</td>
<td>p.adduN</td>
</tr>
<tr>
<td>11</td>
<td>000000</td>
<td>src2</td>
<td>src1</td>
<td>010</td>
<td>dest</td>
<td>1011011</td>
<td>p.adduN</td>
</tr>
<tr>
<td>11</td>
<td>000000</td>
<td>src2</td>
<td>src1</td>
<td>110</td>
<td>dest</td>
<td>1011011</td>
<td>p.adduRN</td>
</tr>
<tr>
<td>01</td>
<td>000000</td>
<td>src2</td>
<td>src1</td>
<td>111</td>
<td>dest</td>
<td>1011011</td>
<td>p.adduRN</td>
</tr>
<tr>
<td>11</td>
<td>000000</td>
<td>src2</td>
<td>src1</td>
<td>111</td>
<td>dest</td>
<td>1011011</td>
<td>p.adduRN</td>
</tr>
</tbody>
</table>

Table A.4: General Alu Encoding

### A.3 Immediate Branching

Table A.5: General Alu Encoding

<table>
<thead>
<tr>
<th>funct7</th>
<th>Is2[4:0]</th>
<th>rs1</th>
<th>funct3</th>
<th>rd</th>
<th>opcode</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<td>0001010</td>
<td>Iuimm5[4:0]</td>
<td>src1</td>
<td>001</td>
<td>dest</td>
<td>0110011</td>
<td>p.clip</td>
</tr>
<tr>
<td>0001010</td>
<td>Iuimm5[4:0]</td>
<td>src1</td>
<td>010</td>
<td>dest</td>
<td>0110011</td>
<td>p.clipu</td>
</tr>
<tr>
<td>0001010</td>
<td>src2</td>
<td>src1</td>
<td>010</td>
<td>dest</td>
<td>0110011</td>
<td>p.clipr</td>
</tr>
<tr>
<td>0001010</td>
<td>src2</td>
<td>src1</td>
<td>010</td>
<td>dest</td>
<td>0110011</td>
<td>p.clipur</td>
</tr>
</tbody>
</table>

Table A.6: Immediate Branching Encoding

<table>
<thead>
<tr>
<th>Imm12</th>
<th>Imm5</th>
<th>rs1</th>
<th>funct3</th>
<th>Imm12</th>
<th>opcode</th>
<th>name</th>
</tr>
</thead>
</table>

Table A.7: MAC Encoding

<table>
<thead>
<tr>
<th>funct7</th>
<th>rs2</th>
<th>rs1</th>
<th>funct3</th>
<th>rd</th>
<th>opcode</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<td>0100001</td>
<td>src2</td>
<td>src1</td>
<td>000</td>
<td>dest</td>
<td>0110011</td>
<td>p.mac</td>
</tr>
<tr>
<td>0100001</td>
<td>src2</td>
<td>src1</td>
<td>001</td>
<td>dest</td>
<td>0110011</td>
<td>p.msu</td>
</tr>
<tr>
<td>10</td>
<td>00000</td>
<td>src2</td>
<td>src1</td>
<td>000</td>
<td>dest</td>
<td>0110011</td>
</tr>
<tr>
<td>11</td>
<td>00000</td>
<td>src2</td>
<td>src1</td>
<td>000</td>
<td>dest</td>
<td>0110011</td>
</tr>
<tr>
<td>10</td>
<td>Luimm5[4:0]</td>
<td>src2</td>
<td>src1</td>
<td>000</td>
<td>dest</td>
<td>0110011</td>
</tr>
<tr>
<td>11</td>
<td>Luimm5[4:0]</td>
<td>src2</td>
<td>src1</td>
<td>000</td>
<td>dest</td>
<td>0110011</td>
</tr>
<tr>
<td>10</td>
<td>Luimm5[4:0]</td>
<td>src2</td>
<td>src1</td>
<td>100</td>
<td>dest</td>
<td>0110011</td>
</tr>
<tr>
<td>11</td>
<td>Luimm5[4:0]</td>
<td>src2</td>
<td>src1</td>
<td>100</td>
<td>dest</td>
<td>0110011</td>
</tr>
</tbody>
</table>

### A.4 MAC Operations

<table>
<thead>
<tr>
<th>f2</th>
<th>Is3[4:0]</th>
<th>rs2</th>
<th>rs1</th>
<th>funct3</th>
<th>rd</th>
<th>opcode</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<td>00</td>
<td>00000</td>
<td>src2</td>
<td>src1</td>
<td>000</td>
<td>dest</td>
<td>1011011</td>
<td>p.mulu</td>
</tr>
<tr>
<td>01</td>
<td>00000</td>
<td>src2</td>
<td>src1</td>
<td>000</td>
<td>dest</td>
<td>1011011</td>
<td>p.mullhu</td>
</tr>
<tr>
<td>00</td>
<td>Luimm5[4:0]</td>
<td>src2</td>
<td>src1</td>
<td>000</td>
<td>dest</td>
<td>1011011</td>
<td>p.muluN</td>
</tr>
<tr>
<td>01</td>
<td>Luimm5[4:0]</td>
<td>src2</td>
<td>src1</td>
<td>000</td>
<td>dest</td>
<td>1011011</td>
<td>p.mullhuN</td>
</tr>
<tr>
<td>00</td>
<td>Luimm5[4:0]</td>
<td>src2</td>
<td>src1</td>
<td>100</td>
<td>dest</td>
<td>1011011</td>
<td>p.muluRN</td>
</tr>
<tr>
<td>01</td>
<td>Luimm5[4:0]</td>
<td>src2</td>
<td>src1</td>
<td>100</td>
<td>dest</td>
<td>1011011</td>
<td>p.mullhuRN</td>
</tr>
<tr>
<td>10</td>
<td>Luimm5[4:0]</td>
<td>src2</td>
<td>src1</td>
<td>001</td>
<td>dest</td>
<td>1011011</td>
<td>p.macsN</td>
</tr>
<tr>
<td>11</td>
<td>Luimm5[4:0]</td>
<td>src2</td>
<td>src1</td>
<td>001</td>
<td>dest</td>
<td>1011011</td>
<td>p.machhsN</td>
</tr>
<tr>
<td>10</td>
<td>Luimm5[4:0]</td>
<td>src2</td>
<td>src1</td>
<td>101</td>
<td>dest</td>
<td>1011011</td>
<td>p.macsRN</td>
</tr>
<tr>
<td>11</td>
<td>Luimm5[4:0]</td>
<td>src2</td>
<td>src1</td>
<td>101</td>
<td>dest</td>
<td>1011011</td>
<td>p.machhsRN</td>
</tr>
<tr>
<td>00</td>
<td>Luimm5[4:0]</td>
<td>src2</td>
<td>src1</td>
<td>001</td>
<td>dest</td>
<td>1011011</td>
<td>p.macuN</td>
</tr>
<tr>
<td>01</td>
<td>Luimm5[4:0]</td>
<td>src2</td>
<td>src1</td>
<td>001</td>
<td>dest</td>
<td>1011011</td>
<td>p.machhuN</td>
</tr>
<tr>
<td>00</td>
<td>Luimm5[4:0]</td>
<td>src2</td>
<td>src1</td>
<td>101</td>
<td>dest</td>
<td>1011011</td>
<td>p.macuRN</td>
</tr>
<tr>
<td>01</td>
<td>Luimm5[4:0]</td>
<td>src2</td>
<td>src1</td>
<td>101</td>
<td>dest</td>
<td>1011011</td>
<td>p.machhuRN</td>
</tr>
</tbody>
</table>

Table A.8: MAC Encoding

# Appendix B Vectorial Extension

For the vectorial extension, as the instruction set is replicated for the sc, sci and normal variants and for half-words and bytes, it is better to provide a table showing the behaviour instead of the encoding.

### B.1 Vectorial ALU

<table>
<thead>
<tr>
<th>Mnemonic</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>pv.add{sc,sci}{h,b}</td>
<td>rd[i] = rs1[i] + rs2[i]</td>
</tr>
<tr>
<td>pv.sub{sc,sci}{h,b}</td>
<td>rd[i] = rs1[i] - rs2[i]</td>
</tr>
<tr>
<td>pv.avg{sc,sci}{h,b}</td>
<td>rD[i] = (rs1[i] + op2[i]) &gt;&gt;1</td>
</tr>
<tr>
<td>pv.avgu{sc,sci}{h,b}</td>
<td>rD[i] = (rs1[i] + op2[i]) &gt;&gt;1</td>
</tr>
<tr>
<td>pv.min{sc,sci}{h,b}</td>
<td>rD[i] = rs1[i] &lt; op2[i] ? rs1[i] : op2[i]</td>
</tr>
<tr>
<td>pv.minu{sc,sci}{h,b}</td>
<td>rD[i] = rs1[i] &lt; op2[i] ? rs1[i] : op2[i]</td>
</tr>
<tr>
<td>pv.max{sc,sci}{h,b}</td>
<td>rD[i] = rs1[i] &gt; op2[i] ? rs1[i] : op2[i]</td>
</tr>
<tr>
<td>pv.maxu{sc,sci}{h,b}</td>
<td>rD[i] = rs1[i] &gt; op2[i] ? rs1[i] : op2[i]</td>
</tr>
<tr>
<td>pv.srl{sc,sci}{h,b}</td>
<td>rD[i] = rs1[i] &gt;&gt; op2[i]</td>
</tr>
<tr>
<td>pv.sra{sc,sci}{h,b}</td>
<td>rD[i] = rs1[i] &gt;&gt;&gt; op2[i]</td>
</tr>
<tr>
<td>pv.sll{sc,sci}{h,b}</td>
<td>rD[i] = rs1[i] &lt;&lt; op2[i]</td>
</tr>
<tr>
<td>pv.or{sc,sci}{h,b}</td>
<td>rD[i] = rs1[i] | op2[i]</td>
</tr>
<tr>
<td>pv.xor{sc,sci}{h,b}</td>
<td>rD[i] = rs1[i] ^ op2[i]</td>
</tr>
<tr>
<td>pv.and{sc,sci}{h,b}</td>
<td>rD[i] = rs1[i] &amp; op2[i]</td>
</tr>
<tr>
<td>pv.abs{h,b}</td>
<td>rD[i] = rs1 &lt; 0 ? -rs1 : rs1</td>
</tr>
</tbody>
</table>

Table B.1: Vectorial General ALU Instructions

Table B.2: Vectorial General ALU Instructions

<table>
<thead>
<tr>
<th>Mnemonic</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>pv.abs{.h,.b}</td>
<td>rD[i] = rs1 &lt; 0 ? -rs1 : rs1</td>
</tr>
<tr>
<td>pv.extract.h</td>
<td>rD = Sext(rs1[((I+1)*16)-1 : I*16])</td>
</tr>
<tr>
<td>pv.extract.b</td>
<td>rD = Sext(rs1[((I+1)*8)-1 : I*8])</td>
</tr>
<tr>
<td>pv.extractu.h</td>
<td>rD = Zext(rs1[((I+1)*16)-1 : I*16])</td>
</tr>
<tr>
<td>pv.extractu.b</td>
<td>rD = Zext(rs1[((I+1)*8)-1 : I*8])</td>
</tr>
<tr>
<td>pv.insert.h</td>
<td>rD[((I+1)*16)-1 : I*16] = rs1[15:0]</td>
</tr>
<tr>
<td>pv.insert.b</td>
<td>rD[((I+1)*8)-1 : I*8] = rs1[7:0]</td>
</tr>
</tbody>
</table>

Table B.3: Vectorial Dot Product Instructions

<table>
<thead>
<tr>
<th>Mnemonic</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>pv.dotup[.sc,.sci].h</td>
<td>\( rD = \text{rs1}[0] \times \text{op2}[0] + \text{rs1}[1] \times \text{op2}[1] \)</td>
</tr>
<tr>
<td>pv.dotup[.sc,.sci].b</td>
<td>\( rD = \text{rs1}[0] \times \text{op2}[0] + \text{rs1}[1] \times \text{op2}[1] + \text{rs1}[2] \times \text{op2}[2] + \text{rs1}[3] \times \text{op2}[3] \)</td>
</tr>
<tr>
<td>pv.dotusp[.sc,.sci].h</td>
<td>\( rD = \text{rs1}[0] \times \text{op2}[0] + \text{rs1}[1] \times \text{op2}[1] \)</td>
</tr>
<tr>
<td>pv.dotusp[.sc,.sci].b</td>
<td>\( rD = \text{rs1}[0] \times \text{op2}[0] + \text{rs1}[1] \times \text{op2}[1] + \text{rs1}[2] \times \text{op2}[2] + \text{rs1}[3] \times \text{op2}[3] \)</td>
</tr>
<tr>
<td>pv.dotsp[.sc,.sci].h</td>
<td>\( rD = \text{rs1}[0] \times \text{op2}[0] + \text{rs1}[1] \times \text{op2}[1] \)</td>
</tr>
<tr>
<td>pv.dotsp[.sc,.sci].b</td>
<td>\( rD = \text{rs1}[0] \times \text{op2}[0] + \text{rs1}[1] \times \text{op2}[1] + \text{rs1}[2] \times \text{op2}[2] + \text{rs1}[3] \times \text{op2}[3] \)</td>
</tr>
<tr>
<td>pv.sdotup[.sc,.sci].h</td>
<td>\( rD = rD + \text{rs1}[0] \times \text{op2}[0] + \text{rs1}[1] \times \text{op2}[1] \)</td>
</tr>
<tr>
<td>pv.sdotup[.sc,.sci].b</td>
<td>\( rD = rD + \text{rs1}[0] \times \text{op2}[0] + \text{rs1}[1] \times \text{op2}[1] + \text{rs1}[2] \times \text{op2}[2] + \text{rs1}[3] \times \text{op2}[3] \)</td>
</tr>
<tr>
<td>pv.sdotusp[.sc,.sci].h</td>
<td>\( rD = rD + \text{rs1}[0] \times \text{op2}[0] + \text{rs1}[1] \times \text{op2}[1] \)</td>
</tr>
<tr>
<td>pv.sdotusp[.sc,.sci].b</td>
<td>\( rD = rD + \text{rs1}[0] \times \text{op2}[0] + \text{rs1}[1] \times \text{op2}[1] + \text{rs1}[2] \times \text{op2}[2] + \text{rs1}[3] \times \text{op2}[3] \)</td>
</tr>
<tr>
<td>pv.sdotsp[.sc,.sci].h</td>
<td>\( rD = rD + \text{rs1}[0] \times \text{op2}[0] + \text{rs1}[1] \times \text{op2}[1] \)</td>
</tr>
<tr>
<td>pv.sdotsp[.sc,.sci].b</td>
<td>\( rD = rD + \text{rs1}[0] \times \text{op2}[0] + \text{rs1}[1] \times \text{op2}[1] + \text{rs1}[2] \times \text{op2}[2] + \text{rs1}[3] \times \text{op2}[3] \)</td>
</tr>
</tbody>
</table>
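The dot-product rows above differ only in element width, operand signedness, and whether rD is accumulated. A minimal Python sketch of that structure follows. It assumes 32-bit registers with element 0 in the least significant bits, and that `op2` has already been expanded to a full 32-bit vector (for the `.sc` and `.sci` variants the scalar would be replicated beforehand). The signedness mapping (`up` = both unsigned, `usp` = unsigned rs1 with signed op2, `sp` = both signed) is an assumption of this sketch, and the helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def elements(word, width, signed):
    """Split a 32-bit word into 32 // width elements, element 0 in the low bits."""
    out = []
    for i in range(32 // width):
        elem = (word >> (i * width)) & ((1 << width) - 1)
        if signed and elem & (1 << (width - 1)):
            elem -= 1 << width
        out.append(elem)
    return out


def pv_dot(rs1, op2, width, rs1_signed, op2_signed, acc=0):
    """Dot product of two packed vectors, truncated to 32 bits.

    acc = 0 models the pv.dot* forms; acc = rD models the pv.sdot* forms.
    """
    a = elements(rs1, width, rs1_signed)
    b = elements(op2, width, op2_signed)
    return (acc + sum(x * y for x, y in zip(a, b))) & MASK32
```

With `width=16` the sum has two terms and with `width=8` it has four, matching the `.h` and `.b` rows of the table.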

### Table B.4: Vectorial Shuffle-pack Instructions

<table>
<thead>
<tr>
<th>Mnemonic</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>pv.shuffle.h</td>
<td>\( rD[31:16] = \text{rs1}[\text{rs2}[16]*16+15 : \text{rs2}[16]*16] \)<br>
\( rD[15:0] = \text{rs1}[\text{rs2}[0]*16+15 : \text{rs2}[0]*16] \)</td>
</tr>
<tr>
<td>pv.shuffle.sci.h</td>
<td>\( rD[31:16] = \text{rs1}[I[1]*16+15 : I[1]*16] \)<br>
\( rD[15:0] = \text{rs1}[I[0]*16+15 : I[0]*16] \)</td>
</tr>
<tr>
<td>pv.shuffle.b</td>
<td>\( rD[31:24] = \text{rs1}[\text{rs2}[25:24]*8+7 : \text{rs2}[25:24]*8] \)<br>
\( rD[23:16] = \text{rs1}[\text{rs2}[17:16]*8+7 : \text{rs2}[17:16]*8] \)<br>
\( rD[15:8] = \text{rs1}[\text{rs2}[9:8]*8+7 : \text{rs2}[9:8]*8] \)<br>
\( rD[7:0] = \text{rs1}[\text{rs2}[1:0]*8+7 : \text{rs2}[1:0]*8] \)</td>
</tr>
<tr>
<td>pv.shuffle0.sci.b</td>
<td>\( rD[31:24] = \text{rs1}[7:0] \)<br>
\( rD[23:16] = \text{rs1}[I[5:4]*8+7 : I[5:4]*8] \)<br>
\( rD[15:8] = \text{rs1}[I[3:2]*8+7 : I[3:2]*8] \)<br>
\( rD[7:0] = \text{rs1}[I[1:0]*8+7 : I[1:0]*8] \)</td>
</tr>
</tbody>
</table>

### Table B.5: Vectorial Shuffle-pack Instructions

<table>
<thead>
<tr>
<th>Mnemonic</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>pv.shuffle1.sci.b</td>
<td>rD[31:24] = rs1[15:8]<br>
rD[23:16] = rs1[I[5:4]*8+7 : I[5:4]*8]<br>
rD[15:8] = rs1[I[3:2]*8+7 : I[3:2]*8]<br>
rD[7:0] = rs1[I[1:0]*8+7 : I[1:0]*8]</td>
</tr>
<tr>
<td>pv.shuffle2.sci.b</td>
<td>rD[31:24] = rs1[23:16]<br>
rD[23:16] = rs1[I[5:4]*8+7 : I[5:4]*8]<br>
rD[15:8] = rs1[I[3:2]*8+7 : I[3:2]*8]<br>
rD[7:0] = rs1[I[1:0]*8+7 : I[1:0]*8]</td>
</tr>
<tr>
<td>pv.shuffle3.sci.b</td>
<td>rD[31:24] = rs1[31:24]<br>
rD[23:16] = rs1[I[5:4]*8+7 : I[5:4]*8]<br>
rD[15:8] = rs1[I[3:2]*8+7 : I[3:2]*8]<br>
rD[7:0] = rs1[I[1:0]*8+7 : I[1:0]*8]</td>
</tr>
<tr>
<td>pv.shuffle2.h</td>
<td>rD[31:16] = ((rs2[17] == 1) ? rs1 : rD)[rs2[16]*16+15 : rs2[16]*16]<br>
rD[15:0] = ((rs2[1] == 1) ? rs1 : rD)[rs2[0]*16+15 : rs2[0]*16]</td>
</tr>
<tr>
<td>pv.shuffle2.b</td>
<td>rD[31:24] = ((rs2[26] == 1) ? rs1 : rD)[rs2[25:24]*8+7 : rs2[25:24]*8]<br>
rD[23:16] = ((rs2[18] == 1) ? rs1 : rD)[rs2[17:16]*8+7 : rs2[17:16]*8]<br>
rD[15:8] = ((rs2[10] == 1) ? rs1 : rD)[rs2[9:8]*8+7 : rs2[9:8]*8]<br>
rD[7:0] = ((rs2[2] == 1) ? rs1 : rD)[rs2[1:0]*8+7 : rs2[1:0]*8]</td>
</tr>
<tr>
<td>pv.pack.h</td>
<td>rD[31:16] = rs1[15:0]<br>
rD[15:0] = rs2[15:0]</td>
</tr>
<tr>
<td>pv.packhi.b</td>
<td>rD[31:24] = rs1[7:0]<br>
rD[23:16] = rs2[7:0]</td>
</tr>
<tr>
<td>pv.packlo.b</td>
<td>rD[15:8] = rs1[7:0]<br>
rD[7:0] = rs2[7:0]</td>
</tr>
</tbody>
</table>

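The register-controlled shuffle and pack operations listed above can be summarised as "element i of rD is picked by the low bits of element i of rs2". The following minimal Python sketch illustrates that reading, assuming 32-bit registers with element 0 in the least significant bits; it covers only the register forms, and the function names are hypothetical rather than taken from the thesis environment.

```python
def get_elem(word, index, width):
    """Return element `index` (width bits wide) of a 32-bit word."""
    return (word >> (index * width)) & ((1 << width) - 1)


def pv_shuffle(rs1, rs2, width):
    """pv.shuffle.{h,b}: each rD element is the rs1 element selected by rs2."""
    sel_bits = {16: 1, 8: 2}[width]        # 2 halfwords or 4 bytes to choose from
    rd = 0
    for i in range(32 // width):
        sel = get_elem(rs2, i, width) & ((1 << sel_bits) - 1)
        rd |= get_elem(rs1, sel, width) << (i * width)
    return rd


def pv_shuffle2(rd_old, rs1, rs2, width):
    """pv.shuffle2.{h,b}: as pv_shuffle, but the bit above the selector
    chooses between rs1 (1) and the previous value of rD (0)."""
    sel_bits = {16: 1, 8: 2}[width]
    rd = 0
    for i in range(32 // width):
        ctrl = get_elem(rs2, i, width)
        sel = ctrl & ((1 << sel_bits) - 1)
        src = rs1 if (ctrl >> sel_bits) & 1 else rd_old
        rd |= get_elem(src, sel, width) << (i * width)
    return rd


def pv_pack_h(rs1, rs2):
    """pv.pack.h: rD[31:16] = rs1[15:0], rD[15:0] = rs2[15:0]."""
    return ((rs1 & 0xFFFF) << 16) | (rs2 & 0xFFFF)
```

As a usage example, `pv_shuffle(0x44332211, 0x00010203, 8)` reverses the byte order of `rs1`, because the selectors in `rs2` read 0, 1, 2, 3 from the most significant byte down.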
## B.2 Vectorial Comparison

<table>
<thead>
<tr>
<th>Mnemonic</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>pv.cmpeq[.sc,.sci]{.h,.b}</td>
<td>rD[i] = rs1[i] == op2 ? '1 : '0</td>
</tr>
<tr>
<td>pv.cmpne[.sc,.sci]{.h,.b}</td>
<td>rD[i] = rs1[i] != op2 ? '1 : '0</td>
</tr>
<tr>
<td>pv.cmpgt[.sc,.sci]{.h,.b}</td>
<td>rD[i] = rs1[i] &gt; op2 ? '1 : '0</td>
</tr>
<tr>
<td>pv.cmpge[.sc,.sci]{.h,.b}</td>
<td>rD[i] = rs1[i] &gt;= op2 ? '1 : '0</td>
</tr>
<tr>
<td>pv.cmplt[.sc,.sci]{.h,.b}</td>
<td>rD[i] = rs1[i] &lt; op2 ? '1 : '0</td>
</tr>
<tr>
<td>pv.cmple[.sc,.sci]{.h,.b}</td>
<td>rD[i] = rs1[i] &lt;= op2 ? '1 : '0</td>
</tr>
<tr>
<td>pv.cmpgtu[.sc,.sci]{.h,.b}</td>
<td>rD[i] = rs1[i] &gt; op2 ? '1 : '0</td>
</tr>
<tr>
<td>pv.cmpgeu[.sc,.sci]{.h,.b}</td>
<td>rD[i] = rs1[i] &gt;= op2 ? '1 : '0</td>
</tr>
<tr>
<td>pv.cmpltu[.sc,.sci]{.h,.b}</td>
<td>rD[i] = rs1[i] &lt; op2 ? '1 : '0</td>
</tr>
<tr>
<td>pv.cmpleu[.sc,.sci]{.h,.b}</td>
<td>rD[i] = rs1[i] &lt;= op2 ? '1 : '0</td>
</tr>
</tbody>
</table>

Table B.6: Vectorial comparison Instructions
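In the comparison table, each result element is filled with all ones when the condition holds and with all zeros otherwise; the signed and unsigned mnemonics share the same description and differ only in how the elements are interpreted. A minimal Python sketch of this behaviour is given below. It assumes 32-bit registers with element 0 in the least significant bits and an `op2` already expanded to a 32-bit vector; `pv_cmp` and `COMPARE_OPS` are hypothetical names used only for illustration.

```python
import operator

# Hypothetical lookup from the mnemonic suffix to the Python comparison.
COMPARE_OPS = {
    "eq": operator.eq, "ne": operator.ne,
    "gt": operator.gt, "ge": operator.ge,
    "lt": operator.lt, "le": operator.le,
}


def pv_cmp(op, rs1, op2, width, signed=True):
    """Element-wise compare: all-ones element if true, all-zeros otherwise.

    signed=False models the unsigned variants (cmpgtu, cmpgeu, cmpltu, cmpleu).
    """
    elem_mask = (1 << width) - 1
    sign_bit = 1 << (width - 1)
    rd = 0
    for i in range(32 // width):
        a = (rs1 >> (i * width)) & elem_mask
        b = (op2 >> (i * width)) & elem_mask
        if signed:
            a -= (a & sign_bit) << 1       # reinterpret as two's complement
            b -= (b & sign_bit) << 1
        if COMPARE_OPS[op](a, b):
            rd |= elem_mask << (i * width)
    return rd
```

For example, `pv_cmp("gt", 0xFFFF0001, 0, 16)` sets only the low halfword, since the upper element is -1 when read as signed, whereas passing `signed=False` sets both halfwords.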


Acknowledgements

To draw an analogy between this thesis work and the operation of a processor, one could say that it was like a long program executed instruction after instruction, waiting for a jump instruction that would take me to the conclusion. When I began my university studies I would never have expected an ending of this kind, with a thesis carried out entirely from home, striving every day to take a step forward in the right direction. This situation made the work harder, and what pushed me to keep going was the desire to conclude this long journey in the best possible way. Several people stood by me and sustained my willpower, especially when it faltered, and it is to them that I want to give my thanks. First of all there are my parents, who with their constant presence have always encouraged me to do my best, believing strongly in me. Then there is my older brother Ignazio who, having followed a path similar to mine, showed me that sometimes you just have to hold on a little longer to reach your goals.

Special thanks go to Monica Monticciolo, who throughout all these years has encouraged me to keep going, always helping me to see the positive side of every situation and, above all, showing me the way out when confusion and stress clouded my vision.

A big thank-you goes to all my friends from Trapani and from Torino, with whom I shared the few joyful moments of a Politecnico student: Mattia La Francesca, Marco Iuculano, Francesco Biasibetti, Gianluca Monaco, Marco Saladino, Albertino Scarlata, Matteo Ripa, Michelino Baratto, Aldo Moschetti, Francesco Minosi, Filippo Sanangelantoni, Federica Nasr, Fabio Asti. A further thank-you goes to my flatmate, Giuseppe Narducci, who with our "tea time" and "beer and crisps" helped to lighten the period I was going through. Finally, I would like to thank the two colleagues with whom I carried out most of the projects over these years: Federico Di Fazio and Manuel Capaccio, without whom I probably would not be here writing these acknowledgements. With them I shared difficult moments, knowing that sooner or later we would find a solution. I would also like to thank Professor Sanchez and Annachiara Ruospo for giving me the opportunity to work on such a stimulating project and for helping me in every phase of the work.